This is also sometimes known as keyset pagination. My favourite technical explanation of that is here: https://use-the-index-luke.com/no-offset
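For anyone who hasn't seen it, the core idea is small enough to show in a few lines. A minimal sketch (table and column names invented for illustration, SQLite for brevity): instead of `OFFSET`, you filter past the last row you already returned, so the database can seek straight to it via the index.

```python
import sqlite3

# Hypothetical "items" table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"item-{i}") for i in range(1, 101)])

PAGE_SIZE = 10

def first_page(conn):
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ?", (PAGE_SIZE,)
    ).fetchall()

def next_page(conn, last_seen_id):
    # Keyset ("seek") pagination: filter past the last row we saw,
    # rather than using OFFSET and discarding skipped rows.
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()

page1 = first_page(conn)
page2 = next_page(conn, page1[-1][0])
print(page2[0])  # → (11, 'item-11')
```

The cost of fetching page N stays flat instead of growing with N, which is exactly the point the linked article makes.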
Possibly just an implementation error in the BlockJoinParser, but it did not occur with numeric pagination.
In order to work with that, you need to “flatten” your JSON (e.g. with the @JsonUnwrapped annotation), and some structures (like arrays) may become problematic and/or require significant lexical mapping of queries onto the dataset.
No, cursor is this https://en.wikipedia.org/wiki/Cursor_(databases) https://www.postgresql.org/docs/current/plpgsql-cursors.html
I once did pagination using database cursors, which is something different from keyset pagination: the server would keep a cursor open and keep fetching more data from the same query. This gave the system an interface similar to offset pagination (you get the first page, then the second page, etc.) but without running a new query for each page and discarding the first n-1 pages every time.
The downside is that it makes the server stateful and doesn't scale (you would need to keep hundreds of cursors open if you had hundreds of simultaneous users).
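To make the statefulness concrete, here's a rough sketch of the pattern (illustrative names; SQLite's client-side cursor stands in for a real server-side one): the query runs once, and every "page" is just the next batch pulled from the same open cursor.

```python
import sqlite3

# Hypothetical "events" table; names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, f"e{i}") for i in range(1, 26)])

# One query, one open cursor.
cur = conn.execute("SELECT id, payload FROM events ORDER BY id")

# Each "page" is the next batch from that same cursor; nothing is re-run
# or skipped, but `cur` must stay alive between requests, which is the
# statefulness (and scaling problem) described above.
page1 = cur.fetchmany(10)
page2 = cur.fetchmany(10)
page3 = cur.fetchmany(10)  # only 5 rows left
```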
You can read more about it here: https://aaronfrancis.com/2022/efficient-pagination-using-def... or here https://planetscale.com/blog/fastpage-faster-offset-paginati....
There are libraries for Laravel (https://github.com/hammerstonedev/fast-paginate) and Rails (https://github.com/planetscale/fast_page) as well!
Cursor based pagination is wonderful, but sometimes you're stuck with offset/limit for whatever reason. Might as well make it fast.
MySQL is fairly predictable, though, so when you understand that it wants to nested-loop join all your rows before evaluating predicates on the parent table, it's a predictable win to stop it doing that.
The technique is still applicable even when you have no joins, because MySQL will materialize rows with every selected column before evaluating the unindexed portion of the predicate, and the order by.
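The trick those libraries use is often called a "deferred join." A sketch of the shape of it (invented table, SQLite for brevity): apply LIMIT/OFFSET to a narrow subquery over just the indexed key, then join back for the wide columns, so only the rows actually returned get materialized.

```python
import sqlite3

# Hypothetical "posts" table with a wide body column.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE posts (
    id INTEGER PRIMARY KEY, title TEXT, body TEXT)""")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, f"t{i}", "x" * 100) for i in range(1, 1001)])

# Deferred join: paginate over an index-only subquery of ids, then join
# back for the full rows. The skipped 500 rows never get materialized
# with all their columns.
rows = conn.execute("""
    SELECT p.id, p.title
    FROM posts AS p
    JOIN (SELECT id FROM posts ORDER BY id LIMIT 10 OFFSET 500) AS page
      ON page.id = p.id
    ORDER BY p.id
""").fetchall()
```

Same offset semantics as before, but the expensive per-row work only happens for the 10 rows on the page.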
A result you haven’t yet seen can be mutated so that it sorts before your current cursor; likewise, a result you’ve already seen can be mutated so that it sorts after the current cursor, causing you to see it twice.
Cursor based pagination does minimize the issue somewhat, because only the mutated rows are possibly affected, not unrelated records that happen to lie on page boundaries. But depending on the use case I’m not sure that is worth the added complexity of cursor based pagination (it does get a bit tricky once you have any non-trivial sort clause).
Expensive to implement, so this would only work for situations where you can afford to spend significant storage and computation resources to keep a specific user happy.
Personally, if you care about users not missing any item in a query, you just can't use pagination at all, and you have to give them every item in the query in a single huge dump (running the query again would be the "explicit action" mentioned above that gets the user new data). Conversely, if you use pagination, users are free to assume that they might miss some items unless they already expect the underlying data to be immutable.
You probably also need to switch from deleting records to adding an archive bit (or timestamp).
This gets complicated fast.
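The archive-bit idea in miniature (schema and names invented for illustration): instead of DELETE, stamp the row, so records referenced by outstanding cursors don't vanish out from under them.

```python
import sqlite3

# Hypothetical "docs" table with a soft-delete timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE docs (
    id INTEGER PRIMARY KEY, title TEXT, archived_at TEXT)""")
conn.executemany("INSERT INTO docs (id, title) VALUES (?, ?)",
                 [(i, f"doc{i}") for i in range(1, 6)])

# "Delete" doc 3 by stamping it rather than removing the row.
conn.execute("UPDATE docs SET archived_at = datetime('now') WHERE id = ?",
             (3,))

# Every live-data query must now remember to filter archived rows,
# which is part of why this gets complicated fast.
live = conn.execute(
    "SELECT id FROM docs WHERE archived_at IS NULL ORDER BY id").fetchall()
```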
For instance, if you have 40 entries and specify that there should be 10 items per page, you end up with 4 pages
40 entries??? Just show them all to me. My browser has a scrollbar, CTRL-F is much faster than your search box.
"but not everyone has a reliable fast connection" -- yes, which is a good reason to deliver more data per http request than breaking it up and requiring lots of slow requests.
"but the database load" -- half the queries, each returning 2x data, is almost always going to be easier on a RDBMS. If it's not then you probably need to rethink your schema.
You get to see more ads while flipping through pages.
Timing metrics for loading smaller pages make marketing happy.
Timing metrics for time spent on the website make marketing happy.
Because data retrieval costs money and resources. Unbounded API responses are susceptible to denial-of-service. Well-architected APIs will always cap the number of results they return per call.
It really depends on what kind of site we're talking about; there's loads of times I want to see loads of results. Sometimes I don't, but there's no harm in showing me more than I want, either.
In the general case this isn’t a good excuse, there’s no reason it should take that long. But you don’t always get to pick your technology.
It’ll be like Jira, pulling in literal megabytes of data and still slow.
HN is pretty minimalistic, but the point is that text data isn't really what makes requests very large. For reference, 2MB is enough to include an entire book and then some. All of Dune is about 1.7M uncompressed in epub format (and ~1.1M compressed as an epub file). HN threads, Jira discussions, etc. can get pretty large, but I have not seen any yet that have started to approach the size of Dune.
Images might add more data, but those can be loaded on-demand when scrolled into view pretty easily.
Indeed.com's search results are almost all text. Maybe a company avatar. No reason it can't provide 100+ job listings per page rather than 10-20.
I agree with OP that if you are on a powerful machine with fast internet it would be better just to load a massive HTML document of list items.
His site is a great resource for anyone wanting to take a deeper dive on SQL performance:
https://use-the-index-luke.com/sql/partial-results/fetch-nex...
https://old.reddit.com/?count=25&after=t3_wtpvdp
I noticed Reddit's pagination has that "after" parameter, which points to the last post on the current page.
It glitches out if the last item is deleted by moderators, but otherwise it works smoothly.
https://en.wikipedia.org/w/index.php?title=Category:Living_p...
No need for row value syntax and it works with MS SQL Server
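For anyone comparing the two forms, here's a sketch with an invented two-column key (SQLite for brevity; it happens to support both forms): the row-value comparison and the expanded OR/AND predicate that databases without row values (e.g. SQL Server) need are interchangeable.

```python
import sqlite3

# Hypothetical table with a compound primary key (a, b).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, PRIMARY KEY (a, b))")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(a, b) for a in range(3) for b in range(3)])

cursor = (1, 1)  # last (a, b) seen on the previous page

# Row-value syntax (PostgreSQL, MySQL, SQLite 3.15+):
rv = conn.execute(
    "SELECT a, b FROM t WHERE (a, b) > (?, ?) ORDER BY a, b", cursor
).fetchall()

# Equivalent expanded form for databases without row values:
expanded = conn.execute(
    "SELECT a, b FROM t WHERE a > ? OR (a = ? AND b > ?) ORDER BY a, b",
    (cursor[0], cursor[0], cursor[1]),
).fetchall()

assert rv == expanded  # same page either way
```

The expanded form gets verbose as the sort key grows, but it expresses exactly the same predicate.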
> As we saw, plain keyset pagination offers no facility to jump a certain percentage into the results except through client guesswork. However the PostgreSQL statistics collector maintains per-column histograms of value distribution. We can use these estimates in conjunction with limits and small offsets to get fast random-access pagination through a hybrid approach.
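To sketch what "use these estimates" might look like in practice (simulated numbers, illustrative names — the real bounds would come from `SELECT histogram_bounds FROM pg_stats WHERE tablename = ... AND attname = ...`): pg_stats keeps equal-frequency histogram bounds per column, so each adjacent pair of bounds brackets roughly the same fraction of rows, and you can map a target percentage straight to a starting key.

```python
# Simulated equal-frequency histogram bounds for an "id" column; with 11
# bounds, each adjacent pair brackets ~10% of the rows.
bounds = [1, 105, 198, 306, 395, 511, 604, 698, 807, 901, 1000]

def start_key_for_fraction(bounds, fraction):
    """Estimate a starting key near `fraction` of the way through the data."""
    bucket = int(fraction * (len(bounds) - 1))
    return bounds[bucket]

# To land ~40% in, keyset-seek to the estimated key, then correct with a
# small LIMIT/OFFSET, e.g. (PostgreSQL):
#   SELECT * FROM items WHERE id >= %s ORDER BY id LIMIT 10 OFFSET %s
key = start_key_for_fraction(bounds, 0.4)
print(key)  # → 395
```

The offset only has to cover the estimation error within one bucket, so it stays small regardless of how deep the jump is.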
Unless you can guarantee your data is static, or that rows are only ever appended in sort order and never mutated, the notion of which data belongs on which page can change every millisecond.
The name "cursor pagination" is super confusing given that "cursor" is an overloaded term in databases. I always call this "token pagination", given that the APIs I've seen usually call the value a token.
[1] https://gist.github.com/jvolkman/b8c0e3d05929a1506c99fbc9474...
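One common way APIs make the token opaque, sketched below with invented field names: serialize the sort key of the last row returned and base64 it, so clients treat it as an uninterpreted blob while the server can decode it back into a keyset predicate.

```python
import base64
import json

def encode_token(last_row):
    # The token is just the keyset cursor (sort key of the last row
    # returned), serialized and base64'd so it looks opaque to clients.
    payload = {"id": last_row["id"], "created_at": last_row["created_at"]}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

def decode_token(token):
    return json.loads(base64.urlsafe_b64decode(token.encode()))

token = encode_token({"id": 42, "created_at": "2023-01-05T00:00:00Z"})
cursor = decode_token(token)
```

The opacity also buys the server freedom to change the token's contents later without breaking clients.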
So, anyone here implement pagination via cursors? What do you find to be the drawbacks and how do you mitigate them?
One corner case would be if the cursor record is deleted. I don't see it mentioned how they handle it.
Or is there a trick here I’m missing?
Do you just crash and ask the user to start over?
Do you have to nudge open cursors on every delete?
That way it doesn't matter if the record is deleted - I can still return the next page by showing records that come after that provided cursor value.
There's an example on this page: https://latest.datasette.io/fixtures/sortable?_sort=sortable
Since the table is sorted by the "sortable" column, the next page link includes this:
?_next=15%2Cg%2Cz
15 is the last value for "sortable" on the page, then g,z are the compound primary key for that last row.

1. I can set how many items per page.
2. I can get to an arbitrary page either through a path/ query param or at least close with a pagination row that contains a way to jump around. If an item gets removed, whatever I was looking for should still be in the same vicinity.
3. As a result of #1 and #2, I can go back and find items I saw previously because their placement is at some fairly reliable position on the page I remember it being on or around.
You know, like how a physical book or catalogue with pages works.
Please stop trying to improve on this and making usability more difficult. I hate infinity scroll and I hate not being able to pick my page in an intuitive fashion.
I've had quite a few heated discussions on this point. The problem is, once your dataset gets large enough, this use case is incredibly difficult to scale; not impossible (Google does it with millions of search results), but prohibitively expensive compared to how often the need of being able to jump to any specific page arises.
Now I always try to stick to cursor based pagination as the default in order to prevent people from building workflows on top of offsets.
Google cheats, but in a way that very few users will notice. You can only access the first 20 pages of search results. Depending on user behavior, this is one way to offer navigation via page number while limiting worst-case cost.
While it can be frustrating, I doubt much revenue (relatively speaking) was lost due to this issue, which means for most apps and services, it will likely not get done.
(I'm not disputing the merit of your arguments, just explaining why this will rarely get done well in real life...)
Pagination is incredibly useful to humans. If I tell you I found something on page 15 you can relate to it, something I cannot do with infinite scroll.
Not easy to do of course, but it’s one direction you can go.
At scale you might care about duplicate or out-of-date records. But cursor-based pagination doesn't solve the problem if a results page is left open for "too long" and stuff was added behind your cursor (or after it, if navigating backwards).
It's as if making things less intuitive (compared to the article's reference to book pages) only seems easier as long as you don't think about the pitfalls.
My suggestion is to just use pages, and optimise for the right order (i.e. sequential IDs, creation date, alphabetical, etc.) that makes sense for your data.
If you REALLY must know whether results have changed, storing some delta would be best (like a timestamp that lets the server side say "hey, your results are out of date, from 7 days ago; maybe you left that page/API response unused for too long").
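That staleness check is tiny to implement. A sketch (all names illustrative; the 7-day threshold just mirrors the example above): tag each result set with the time it was produced, and flag it when the client resumes much later.

```python
from datetime import datetime, timedelta, timezone

# Threshold mirrors the "7 days ago" example above; purely illustrative.
STALE_AFTER = timedelta(days=7)

def is_stale(results_as_of, now=None):
    """True if a result set produced at `results_as_of` is past the threshold."""
    now = now or datetime.now(timezone.utc)
    return now - results_as_of > STALE_AFTER

as_of = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(is_stale(as_of, now=datetime(2023, 1, 9, tzinfo=timezone.utc)))  # → True
```

The server would include `results_as_of` alongside (or inside) the cursor token and run this check on each resumed request.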
Except that no such information appears in the article. Here's the entirety of the explanation:
>> Once you’ve been returned a cursor, you can provide it in your requests to act as a farther-along starting point within your data entries. The server is then able to efficiently skip all entries that come before your specified cursor value
Everything else has big tradeoffs. You can make sure your rows are sortable by a specific column, but that means your searches must all be sorted by that column.
You can save the results of a search and page through it, but that's a lot of I/O.
You can not do any of the above, but then you run the risk of non-deterministic pagination (I went back and because someone added/deleted a row, that page is slightly different now). You also can't really link to it.
These things might all be fine. In my experience though, you run into people who really want to do search in way A, and will use the tradeoffs for way B/C/D to argue against it, as though A has no tradeoffs.
The legacy system we're replacing used cursor pagination in a lot of areas, and that was perfectly acceptable to users, but now that we're going web-based it's not. Unfortunate, really, since it seems vastly superior.
I recommend using elastic search or a nosql database to optimise performance. Relational databases can be slow for this use case.
And the token has enough metadata to reconstruct the query from where it left off in the previous call, and to perform additional security checks so that it cannot be used by other customers to get results from a different account.
I have actually migrated databases from Aurora to DynamoDb behind the scenes where customers continued to call these APIs with these tokens and the calls would flip to the new DB and resume from the last place. No downtime or "maintenance window" which would be a non-starter at AWS anyway.
I recommend this for any external service. For internal services, if you work closely enough with your users, maybe you can opt for something more transparent.
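A sketch of how such a token could be made tamper-evident and account-scoped (key, field names, and wire format all invented for illustration; this is not AWS's actual scheme): the payload carries the resume position plus the account id, and an HMAC binds the two, so a token minted for one account fails verification for another. Because the token carries a logical position rather than a database handle, the backend behind it can change, as in the Aurora-to-DynamoDB migration described above.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # illustrative; keep real keys out of code

def make_token(account_id, last_key):
    # Payload = resume position + the account it belongs to.
    payload = json.dumps({"acct": account_id, "last": last_key}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "." +
            base64.urlsafe_b64encode(sig).decode())

def read_token(token, account_id):
    payload_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    sig = base64.urlsafe_b64decode(sig_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    data = json.loads(payload)
    # Reject both forged tokens and tokens replayed from another account.
    if not hmac.compare_digest(sig, expected) or data["acct"] != account_id:
        raise ValueError("invalid token for this account")
    return data["last"]

token = make_token("acct-1", 1234)
```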