When we started Amazon, this was precisely what I wanted to do, but using Library of Congress triple classifications instead of ISBN.
It turned out to be impossible because the data provider (a mixture of Baker & Taylor (book distributors) and Books In Print) munged the triple classification into a single string, so you could not find the boundaries reliably.
I had to abandon the idea before I even really got started on it, and it would certainly have been challenging to do this sort of "flythrough" in the 1994-1995 version of "the web".
Kudos!
I've spent quite some time looking at both the LoC Classification and the LoC Subject Headings. Sadly the LoC doesn't make either freely available in a useful machine-readable form, though it's possible to play games with the PDF versions. I was impressed by a few aspects of this; one point that particularly sticks in my mind is that the state-law section of the Classification shows a very nonuniform density of classifications among states. If memory serves, NY and CA are by far the most complex, with PA a somewhat distant third, and many of the "flyover" states having almost absurdly simple classifications, often quite similar to one another. I suspect this reflects the underlying statutory, regulatory, and judicial/case-law complexity.
Another interesting historical factoid is that the classification and its alphabetic top-level segmentation apparently spring directly from Thomas Jefferson's personal library, which formed the origin of the LoC itself.
For those interested, there's a lot of history of the development and enlargement of the Classification in the annual reports of the Librarian of Congress to Congress, which are available at Hathi Trust.
Classification: <https://www.loc.gov/catdir/cpso/lcco/>
Subject headings: <https://id.loc.gov/authorities/subjects.html>
Annual reports:
- Recent: <https://www.loc.gov/about/reports-and-budgets/annual-reports...>
- Historical archive to ~1866: <https://catalog.hathitrust.org/Record/000072049>
[0]: https://en.wikipedia.org/wiki/Library_of_Congress_Classifica...
Lines of actually working code; lines of commented-out inactive code; lines of explanatory comments. HTH!
Naah, gotcha, the other "LoC"... But only got it on about the third occurrence.
Having dealt with Baker & Taylor in the past, this doesn't surprise me in the least. It was one of the most technologically backwards companies I've ever dealt with. Purchase orders and reconciliations were still managed with paper, PDFs, and emails as of early 2020 (when I closed my account). I think at one point they even had me faxing documents in.
It was clear that they were utterly gobsmacked that a team of 3 or 4 people could have done what we did in the time that we had. They had apparently contemplated getting into online retail directly, but saw two big problems: (a) legal and moral pushback from publishers who relied on Ingram being just a distributor, and (b) the technological challenge. I think at the time their IT staff numbered about 20 or so. They just couldn't believe what they were seeing.
Good times (there weren't very many of those for me in the first 14 months) :)
There’s also the problem of books with invalid ISBNs, i.e. where the check digit doesn’t match the rest of the ISBN, but where correcting the check digit would produce the ISBN of a different book. These books would fall outside the ISBN space assumed by the blog post.
[0] https://scis.edublogs.org/2017/09/28/the-dreaded-case-of-dup...
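The check-digit arithmetic behind that failure mode is easy to sketch. Below is a minimal Python sketch (the function names are mine); the alternating 1,3 weights and the mod-10 rule are the standard ISBN-13 scheme:

```python
def isbn13_check_digit(first12: str) -> int:
    """Check digit for an ISBN-13: weight the first 12 digits 1,3,1,3,...,
    sum them, and take the amount needed to reach a multiple of 10."""
    s = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - s % 10) % 10

def is_valid_isbn13(isbn: str) -> bool:
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return int(digits[-1]) == isbn13_check_digit(digits[:12])

# A well-known example ISBN with a valid check digit:
print(is_valid_isbn13("978-0-306-40615-7"))  # True
# Flip one body digit and the printed check digit no longer matches:
print(is_valid_isbn13("978-0-306-40625-7"))  # False
```

Note that flipping the check digit of the second string to make it "valid" again would, as the comment above says, denote a different book entirely.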
Note: The presentation reflects the contents of Anna's Archive exclusively, rather than the entire ISBN catalog. There is a discernible bias toward a limited range of languages, due to the collection's bias toward those languages. The sections marked in black represent entries missing from the archive.
Black should mostly be sections that have no assigned books
Zooming in you can see the titles and the barcodes, and hovering gets you a book cover and details. Incredible, everything you could want!
Some improvement ideas: checkbox to hide the floating white panel at top left, and the thing at top right. Because I really like to "immerse" in these visualizations, those floaters lift you out of that experience to some extent, limiting fun and functionality for me a bit.
There is an awe-inspiring TED talk by Gary W. Flake demonstrating its use.
https://m.youtube.com/watch?v=LT_x9s67yWA
And here is our IEEE paper from 2011.
Really sorry this is not a web standard.
https://www.dropbox.com/scl/fi/bl8zkjs3y47q3377hh3ya/Yan_Wil...
There are more cool submissions here https://software.annas-archive.li/AnnaArchivist/annas-archiv...
Mine is at https://isbnviz.pages.dev
Out of all the VR vapourware, a real life infinite library or infinite museum is the one thing that could conceivably get me dropping cash.
It would be far more interesting as a project which tried to make all legitimately available downloadable texts accessible, say as an interface to:
The preset-year part was neat in and of itself, just for looking at how active certain regions have been in publishing. Poland's been really active lately. Norway looks very quiet by comparison. China looks like it ramped up around 2005, with huge volumes in the last decade.
The United States has some weird stuff too. I'd never heard of them, yet Blackstone Audio, Blurb Inc., and Draft2Digital put out huge numbers of ISBNs.
https://phiresky.github.io/isbn-visualization/?dataset=all&g...
It could probably be tweaked further to not show some of the texts (the N publishers part), less stuff on hover, etc.
On the data-density part, I noticed most of the books have covers too, which was a cool bit. Not sure if it's feasible, but it would be neat to show them as if they were the color of their binding in their pictures.
Makes me almost want a Skyrim-style version of this idea, where they're all little 3D books on their 3D shelves, and you can wander down the library aisles by section. Click a book, like in Skyrim, and put it in your inventory or similar. Thought this mod [1] especially was one of the coolest community additions to Skyrim when it came out. Also in the "not sure if it's feasible" category.
[1] Book Covers Skyrim, https://www.nexusmods.com/skyrimspecialedition/mods/901?tab=...
I considered two metrics that ended up being equivalent. First, minimizing loaded tiles assuming a hierarchical tiled map. The cost of moving x horizontally is just x/y tiles, using y as the side length of the viewport. Zooming from y_0 to y_1 loads abs(log_2(y_1/y_0)) tiles, which is consistent with ds = dy/y. Together this is just ds^2 = (dx^2 + dy^2)/y^2, exactly the upper-half-plane metric.
Alternatively, you could think of minimizing the "optical flow" of the viewport in some sense. This actually works out to the same metric up to scaling - panning by x without zooming, everything is just displaced by x/y (i.e. the shift as a fraction of the viewport). Zooming by a factor k moves a pixel at (u,v) to (k*u,k*v), a displacement of (u,v)*(k-1). If we go from a side length of y to y+dy, this is (u,v)*dy/y, so depending how exactly we average the displacements this is some constant times dy/y.
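Treating a viewport as a point (x, y) in the upper half-plane, that metric gives a closed-form cost for any camera move via the standard half-plane distance formula. A minimal sketch (function name and the example viewport states are mine):

```python
import math

def viewport_cost(x1: float, y1: float, x2: float, y2: float) -> float:
    """Geodesic distance between viewport states (center x, side length y)
    under the upper-half-plane metric ds^2 = (dx^2 + dy^2) / y^2."""
    return math.acosh(1 + ((x2 - x1) ** 2 + (y2 - y1) ** 2) / (2 * y1 * y2))

# A pure zoom from side 1 to side 8 costs ln(8) ~ 2.079,
# i.e. log2(8) = 3 tile levels up to a constant factor:
print(viewport_cost(0, 1, 0, 8))

# Panning 10 viewport-widths without zooming...
direct = viewport_cost(0, 1, 10, 1)
# ...is never beaten by a zoom-out / pan / zoom-in dogleg,
# since the geodesic already takes the optimal arc:
dogleg = (viewport_cost(0, 1, 0, 4)
          + viewport_cost(0, 4, 10, 4)
          + viewport_cost(10, 4, 10, 1))
print(direct < dogleg)  # True
```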
Then the geodesics you want are just semicircles centered on the line y=0 (plus vertical lines for pure zooms), although you need to do a little work to compute the motion along the curve. Once you have the arc, from θ_0 to θ_1, the total time comes from integrating ds = dθ/sin(θ), so to be exact you'd have to invert t = ln(csc(θ)-cot(θ)); it's probably better to approximate. edit: since csc(θ)-cot(θ) = tan(θ/2), this inverts cleanly to θ = 2*atan(e^t), which is not so bad at all.
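That inversion can be checked numerically. A small sketch (unit-radius arc assumed), verifying the round trip and that unit hyperbolic speed along the arc means dθ/dt = sin(θ):

```python
import math

def theta(t: float) -> float:
    # Inverting t = ln(csc θ - cot θ): since csc θ - cot θ = tan(θ/2),
    # we get θ = 2·atan(e^t).
    return 2.0 * math.atan(math.exp(t))

# Round trip: pick θ, compute t, invert back.
th = 1.0
t = math.log(1 / math.sin(th) - 1 / math.tan(th))
print(abs(theta(t) - th) < 1e-12)  # True

# Unit hyperbolic speed along the arc means dθ/dt = sin θ
# (check with a central finite difference):
h = 1e-6
dtheta_dt = (theta(t + h) - theta(t - h)) / (2 * h)
print(abs(dtheta_dt - math.sin(theta(t))) < 1e-6)  # True
```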
Comparing with the "blub space" logic, I think the effective metric there is ds^2 = dz^2 + (z+1)^2 dx^2, polar coordinates where z=1/y is the zoom level, which (using dz=dy/y^2) works out to ds^2 = dy^2/y^4 + dx^2*(1/y^2 + ...). I guess this means the existing implementation spends much more time panning at high zoom levels compared to the hyperbolic model, since zooming from 4x to 2x costs twice as much as 2x to 1x despite being visually the same.
Their zoom actually is my "y" rather than a scale factor, so the metric is ds^2 = dy^2 + (C-y)^2 dx^2 where C is a bit more than the maximal zoom level. There is some special handling for cases where their curve would want to zoom out further.
Normalizing to the same cost to pan all the way zoomed out (zoom=1), their cost for panning is basically flat once you are very zoomed in, and more than the hyperbolic model when relatively zoomed out. I think this contributes to short distances feeling like the viewport is moving very fast (very little advantage to zooming out) vs basically zooming out all the way over larger distances (intermediate zoom levels are penalized, so you might as well go almost all the way).
The problem is twofold: which path should we take through zoom level, x, and y, and how fast should we move at any given point (where "moving" includes zooming in/out as well). That's what the blub space would have been cool for, because it combines speed and path into one: when you move linearly with constant speed through blub space, you move at different speeds at different zoom levels in normal space, and the path and speed changes are smooth.
Unfortunately that turned out not to work quite as well... even though the flight path was alright (although not perfect), the movement speeds were not what we wanted...
I think that comes from the fact that blub space is a linear combination of the speed and the z component. So if you move with speed s at ground level (let's say z=1), you move with speed s·z at zoom level z (higher z means more zoomed out). But as you pointed out, normal zoom behaviour is quadratic, so at zoom level z you should move with speed s·z². I don't think there is a way to map this behaviour to a euclidean 2d/3d space (or at least I didn't find any; I can't really prove right now that it's not possible xD)
So to fix the movement speed we basically sample the flight path and just move along it according to the zoom level at different points on the curve... Basically, even though there are durations in the flight path calculation, they get overwritten by TimeInterpolatingTrajectory, which is doing all the heavy work for the speed.
For the path... maybe a quadratic form with something like x^4 and some tweaking would have been better, but the behaviour we had was good enough :) Maybe the question we should ask is not about the interesting properties of non-euclidean spaces, but about what makes a flight path + speed look good.
Or try "That Hideous Strength" by "C.S. Lewis" vs "Clive Stables Lewis", and suddenly you're arcing across a huge linear separation.
Still, given that that's what we use, this visualization is lovely. Imagine if you could open every book and read it…
It's not a recipe for getting rich! But it works for me, (and costs almost nothing).
The second section identifies a country, geographical region, or language area; it consists of a 1-5 digit number. The third section, up to 7 digits, is given on request of a publisher to the ISBN agency; larger publishers (publishers with a large expected output) are given shorter numbers, as they get more digits to play with in the 4th section. The fourth, up to 6 digits, is given to "identify a specific edition, of a publication by a specific publisher in a particular format". The last section is a single check digit, equal to 10|-+´digits×⥊6/[1‿3] where digits are the first 12 digits.
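Because these sections have variable lengths, splitting an ISBN needs the allocation tables published by the International ISBN Agency (the "RangeMessage" file). The sketch below uses a deliberately toy prefix table, so the entries and lengths are illustrative only, not the official allocation:

```python
# Toy illustration of variable-length prefix parsing. The real boundaries
# come from the International ISBN Agency's RangeMessage file; this set is
# made up for demonstration and does NOT match the official allocation.
TOY_GROUP_PREFIXES = {"0", "1", "3", "94"}

def split_group(isbn13: str) -> tuple[str, str]:
    """Peel the registration group off a hyphen-free ISBN-13 (toy table only)."""
    body = isbn13[3:12]             # drop the 978/979 prefix and the check digit
    for length in (1, 2, 3, 4, 5):  # groups are 1 to 5 digits long
        if body[:length] in TOY_GROUP_PREFIXES:
            return body[:length], body[length:]
    raise ValueError("group not in toy table")

print(split_group("9783540710288"))  # ('3', '54071028')
```

The same longest-known-prefix trick then repeats inside the remainder to peel off the publisher section, using the per-group registrant ranges from the same file.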
From this visualization it's most apparent that the publishers "Create Space" (aka Great UNpublished, BookSurge) and "Forgotten Books" should have been given a shorter number for the third section. (Though in my opinion self-published editions and low-value spam shouldn't get an ISBN, or rather they should be with the other independently published work @9798.)
They also gave Google tons of space but it appears none of it has been used as of yet.
This probably means that in the (hopefully near) future where we have extraterrestrial publishing (most likely on the Moon or Mars) we’ll need another prefix.
So 978 really is Bookland, as it used to be, and Earth, but the EAN-13 namespace as a whole really does refer to Earth as well. That said, the extraterrestrials can get a prefix just the same?
Although I don't know if this was the winning entry or not
https://news.ycombinator.com/item?id=42652577 - Visualizing All ISBNs (2025-01-10, 139 comments)
I know WorldCat has something like this when you search for a book, but the API, I assume, is only for library institutions, and I'm not a library nor an institution.
Just thank you.
https://software.annas-archive.li/AnnaArchivist/annas-archiv...
Things I love:
- How every book has a title & link to the google books
- Information density - You can see various publishers, sparse areas of the grid, and more
- Visualization of empty space -- great work at making it look like a big bookshelf!
Improvements?
- Instead of 2 floating panels, collapse to 1
- After clicking a book, the tooltip should disappear once you zoom out/move locations!
- Sort options (by year, by publisher name)
- Mid-level visualization - I feel like at the second zoom level (groupings of publishers) there's little information provided besides the names and relative sparsity, so we can remove the ISBN-related stuff on every shelf. Also, since there's a fixed width of shelves, I can tell there are 20 publishers, so no need! If we declutter, it'll make for a really nice physical experience!
[1] (Gitlab page) https://software.annas-archive.li/AnnaArchivist/annas-archiv...
When I got down to the individual book level, I found several that didn’t have any metadata, not even a title. There are hyperlinks to look up the ISBN on Google Books or WorldCat, and in the cases I tried, WorldCat had the data.
So… why not bring the worldcat data into the dataset?
Considering a specific example: "Forecasting Catastrophic Events in Technology, Nature and Medicine". The website's use of "Group 978-3: German language" is a bit of a misnomer; if it said "Group 978-3: German issued" or "German publisher" it would be clearer to users.
https://en.wikipedia.org/wiki/ISBN#ISBN-10_to_ISBN-13_conver...