When we started Amazon, this was precisely what I wanted to do, but using Library of Congress triple classifications instead of ISBN.
It turned out to be impossible because the data provider (a mixture of Baker & Taylor (book distributors) and Books In Print) munged the triple classification into a single string, so you could not find the boundaries reliably.
I had to abandon the idea before I even really got started on it, and it would certainly have been challenging to do this sort of "flythrough" in the 1994-1995 version of "the web".
Kudos!
I've spent quite some time looking at both the LoC Classification and the LoC Subject Headings. Sadly the LoC doesn't make either freely available in a useful machine-readable form, though it's possible to play games with the PDF versions. I was impressed by a few aspects of this; one point that particularly sticks in my mind is that the state-law section of the Classification shows a very nonuniform density of classifications among states. If memory serves, NY and CA are by far the most complex, with PA a somewhat distant third, and many of the "flyover" states having almost absurdly simple classifications, often quite similar to one another. I suspect this reflects the underlying statutory, regulatory, and judicial/case-law complexity.
Another interesting historical factoid is that the classification and its alphabetic top-level segmentation apparently spring directly from Thomas Jefferson's personal library, which formed the origin of the LoC itself.
For those interested, there's a lot of history of the development and enlargement of the Classification in the annual reports of the Librarian of Congress to Congress, which are available at Hathi Trust.
Classification: <https://www.loc.gov/catdir/cpso/lcco/>
Subject headings: <https://id.loc.gov/authorities/subjects.html>
Annual reports:
- Recent: <https://www.loc.gov/about/reports-and-budgets/annual-reports...>
- Historical archive to ~1866: <https://catalog.hathitrust.org/Record/000072049>
[0]: https://en.wikipedia.org/wiki/Library_of_Congress_Classifica...
Lines of actually working code; lines of commented-out inactive code; lines of explanatory comments. HTH!
Naah, gotcha, the other "LoC"... But only got it on about the third occurrence.
Having dealt with Baker & Taylor in the past, this doesn't surprise me in the least. It was one of the most technologically backwards companies I've ever dealt with. Purchase orders and reconciliations were still managed with paper, PDFs, and emails as of early 2020 (when I closed my account). I think at one point they even had me faxing documents in.
It was clear that they were utterly gobsmacked that a team of 3 or 4 people could have done what we did in the time that we had. They had apparently contemplated getting into online retail directly, but saw two big problems: (a) legal and moral pushback from publishers who relied on Ingram being just a distributor, and (b) the technological challenge. I think at the time their IT staff numbered about 20 or so. They just couldn't believe what they were seeing.
Good times (there weren't very many of those for me in the first 14 months) :)
There’s also the problem of books with invalid ISBNs, i.e. where the check digit doesn’t match the rest of the ISBN, but where correcting the check digit would produce the ISBN of a different book. These books would fall outside the ISBN space assumed by the blog post.
[0] https://scis.edublogs.org/2017/09/28/the-dreaded-case-of-dup...
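The check-digit arithmetic behind that failure mode is easy to sketch. Below is a minimal Python sketch (the function names are mine); the alternating 1,3 weights and the mod-10 rule are the standard ISBN-13 scheme:

```python
def isbn13_check_digit(first12: str) -> int:
    """Check digit for an ISBN-13: weight the first 12 digits 1,3,1,3,...,
    sum them, and take the amount needed to reach a multiple of 10."""
    s = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - s % 10) % 10

def is_valid_isbn13(isbn: str) -> bool:
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return int(digits[-1]) == isbn13_check_digit(digits[:12])

# A well-known example ISBN with a valid check digit:
print(is_valid_isbn13("978-0-306-40615-7"))  # True
# Flip one body digit and the printed check digit no longer matches:
print(is_valid_isbn13("978-0-306-40625-7"))  # False
```

Note that flipping the check digit of the second string to make it "valid" again would, as the comment above says, denote a different book entirely.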
Note: The presentation reflects the contents of Anna's Archive exclusively, rather than the entire ISBN catalog. There is a discernible bias toward a limited range of languages, due to the collection's bias toward those languages. The sections marked in black represent entries missing from the archive.
Black should mostly be sections that have no assigned books
Zooming in you can see the titles and the barcodes, and hovering gets you a book cover and details. Incredible, everything you could want!
Some improvement ideas: checkbox to hide the floating white panel at top left, and the thing at top right. Because I really like to "immerse" in these visualizations, those floaters lift you out of that experience to some extent, limiting fun and functionality for me a bit.
There is an awe-inspiring TED talk by Gary W. Flake demonstrating its use.
https://m.youtube.com/watch?v=LT_x9s67yWA
And here is our IEEE paper from 2011.
Really sorry this is not a web standard.
https://www.dropbox.com/scl/fi/bl8zkjs3y47q3377hh3ya/Yan_Wil...
There are more cool submissions here https://software.annas-archive.li/AnnaArchivist/annas-archiv...
Mine is at https://isbnviz.pages.dev
Out of all the VR vapourware, a real life infinite library or infinite museum is the one thing that could conceivably get me dropping cash.
It would be far more interesting as a project which tried to make all legitimately available downloadable texts accessible, say as an interface to:
The preset-year part was neat in and of itself, just for looking at how active certain regions have been in publishing. Poland's been really active lately. Norway looks very quiet by comparison. China looks like it ramped up around 2005, with huge volumes in the last decade.
The United States has some weird stuff too. I'd never heard of them, yet Blackstone Audio, Blurb Inc., and Draft2Digital put out huge numbers of ISBNs.
https://phiresky.github.io/isbn-visualization/?dataset=all&g...
It could probably be tweaked further to not show some of the texts (the N publishers part), less stuff on hover, etc.
On the data-density part, I noticed most of the books have covers too, which was a cool bit. Not sure if it's feasible, but it would be neat to show them as if they were the color of their binding in their pictures.
Makes me almost want a Skyrim-style version of this idea, where they're all little 3D books on their 3D shelves, and you can wander down the library aisles by section. Click a book, like in Skyrim, and put it in your inventory or similar. Thought this mod [1] especially was one of the coolest community additions to Skyrim when it came out. Also in the "not sure if it's feasible" category.
[1] Book Covers Skyrim, https://www.nexusmods.com/skyrimspecialedition/mods/901?tab=...
I considered two metrics that ended up being equivalent. First, minimizing loaded tiles assuming a hierarchical tiled map. The cost of moving x horizontally is just x/y tiles, using y as the side length of the viewport. Zooming from y_0 to y_1 loads abs(log_2(y_1/y_0)) tiles, which is consistent with ds = dy/y. Together this is just ds^2 = (dx^2 + dy^2)/y^2, exactly the upper-half-plane metric.
Alternatively, you could think of minimizing the "optical flow" of the viewport in some sense. This actually works out to the same metric up to scaling - panning by x without zooming, everything is just displaced by x/y (i.e. the shift as a fraction of the viewport). Zooming by a factor k moves a pixel at (u,v) to (k*u,k*v), a displacement of (u,v)*(k-1). If we go from a side length of y to y+dy, this is (u,v)*dy/y, so depending how exactly we average the displacements this is some constant times dy/y.
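Treating a viewport as a point (x, y) in the upper half-plane, that metric gives a closed-form cost for any camera move via the standard half-plane distance formula. A minimal sketch (function name and the example viewport states are mine):

```python
import math

def viewport_cost(x1: float, y1: float, x2: float, y2: float) -> float:
    """Geodesic distance between viewport states (center x, side length y)
    under the upper-half-plane metric ds^2 = (dx^2 + dy^2) / y^2."""
    return math.acosh(1 + ((x2 - x1) ** 2 + (y2 - y1) ** 2) / (2 * y1 * y2))

# A pure zoom from side 1 to side 8 costs ln(8) ~ 2.079,
# i.e. log2(8) = 3 tile levels up to a constant factor:
print(viewport_cost(0, 1, 0, 8))

# Panning 10 viewport-widths without zooming...
direct = viewport_cost(0, 1, 10, 1)
# ...is never beaten by a zoom-out / pan / zoom-in dogleg,
# since the geodesic already takes the optimal arc:
dogleg = (viewport_cost(0, 1, 0, 4)
          + viewport_cost(0, 4, 10, 4)
          + viewport_cost(10, 4, 10, 1))
print(direct < dogleg)  # True
```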
Then the geodesics you want are just semicircles centered on the line y=0 (plus vertical lines for pure zooms), although you need to do a little work to compute the motion along the curve. Once you have the arc, from θ_0 to θ_1, the total time comes from integrating ds = dθ/sin(θ), so to be exact you'd have to invert t = ln(csc(θ)-cot(θ)); it's probably better to approximate. edit: since csc(θ)-cot(θ) = tan(θ/2), this inverts cleanly to θ = 2*atan(e^t), which is not so bad at all.
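That inversion can be checked numerically. A small sketch (unit-radius arc assumed), verifying the round trip and that unit hyperbolic speed along the arc means dθ/dt = sin(θ):

```python
import math

def theta(t: float) -> float:
    # Inverting t = ln(csc θ - cot θ): since csc θ - cot θ = tan(θ/2),
    # we get θ = 2·atan(e^t).
    return 2.0 * math.atan(math.exp(t))

# Round trip: pick θ, compute t, invert back.
th = 1.0
t = math.log(1 / math.sin(th) - 1 / math.tan(th))
print(abs(theta(t) - th) < 1e-12)  # True

# Unit hyperbolic speed along the arc means dθ/dt = sin θ
# (check with a central finite difference):
h = 1e-6
dtheta_dt = (theta(t + h) - theta(t - h)) / (2 * h)
print(abs(dtheta_dt - math.sin(theta(t))) < 1e-6)  # True
```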
Comparing with the "blub space" logic, I think the effective metric there is ds^2 = dz^2 + (z+1)^2 dx^2, polar coordinates where z=1/y is the zoom level, which (using dz=dy/y^2) works out to ds^2 = dy^2/y^4 + dx^2*(1/y^2 + ...). I guess this means the existing implementation spends much more time panning at high zoom levels compared to the hyperbolic model, since zooming from 4x to 2x costs twice as much as 2x to 1x despite being visually the same.
Their zoom actually is my "y" rather than a scale factor, so the metric is ds^2 = dy^2 + (C-y)^2 dx^2 where C is a bit more than the maximal zoom level. There is some special handling for cases where their curve would want to zoom out further.
Normalizing to the same cost to pan all the way zoomed out (zoom=1), their cost for panning is basically flat once you are very zoomed in, and more than the hyperbolic model when relatively zoomed out. I think this contributes to short distances feeling like the viewport is moving very fast (very little advantage to zooming out) vs basically zooming out all the way over larger distances (intermediate zoom levels are penalized, so you might as well go almost all the way).
The problem is twofold: which path should we take through zoom level, x, and y, and how fast should we move at any given point (where "moving" includes zooming in/out as well). That's what the blub space would have been cool for, because it combines speed and path into one: when you move linearly with constant speed through blub space, you move at different speeds at different zoom levels in normal space, and the path and speed changes are smooth.
Unfortunately that turned out not to work quite as well... even though the flight path was alright (although not perfect), the movement speeds were not what we wanted...
I think that comes from the fact that blub space is a linear combination of the speed and the z component. So if you move with speed s at ground level (let's say z=1), you move with speed s·z at zoom level z (higher z means more zoomed out). But as you pointed out, normal zoom behaviour is quadratic, so at zoom level z you should move with speed s·z². I don't think there is a way to map this behaviour to a euclidean 2d/3d space (or at least I didn't find any; I can't really prove right now that it's not possible xD)
So to fix the movement speed we basically sample the flight path and just move along it according to the zoom level at different points on the curve... Basically, even though there are durations in the flight path calculation, they get overwritten by TimeInterpolatingTrajectory, which is doing all the heavy work for the speed.
For the path... maybe a quadratic form with something like x^4 and some tweaking would have been better, but the behaviour we had was good enough :) Maybe the question we should ask is not about the interesting properties of non-euclidean spaces, but about what makes a flight path + speed look good.
Or try "That Hideous Strength" by "C.S. Lewis" vs "Clive Stables Lewis", and suddenly you're arcing across a huge linear separation.
Still, given that that's what we use, this visualization is lovely. Imagine if you could open every book and read it…
It's not a recipe for getting rich! But it works for me, (and costs almost nothing).
The second section identifies a country, geographical region, or language area; it consists of a 1-5 digit number. The third section, up to 7 digits, is given on request of a publisher to the ISBN agency; larger publishers (publishers with a large expected output) are given shorter numbers, as they get more digits to play with in the 4th section. The fourth, up to 6 digits, is given to "identify a specific edition, of a publication by a specific publisher in a particular format". The last section is a single check digit, equal to 10|-+´digits×⥊6/[1‿3] where digits are the first 12 digits.
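Because these sections have variable lengths, splitting an ISBN needs the allocation tables published by the International ISBN Agency (the "RangeMessage" file). The sketch below uses a deliberately toy prefix table, so the entries and lengths are illustrative only, not the official allocation:

```python
# Toy illustration of variable-length prefix parsing. The real boundaries
# come from the International ISBN Agency's RangeMessage file; this set is
# made up for demonstration and does NOT match the official allocation.
TOY_GROUP_PREFIXES = {"0", "1", "3", "94"}

def split_group(isbn13: str) -> tuple[str, str]:
    """Peel the registration group off a hyphen-free ISBN-13 (toy table only)."""
    body = isbn13[3:12]             # drop the 978/979 prefix and the check digit
    for length in (1, 2, 3, 4, 5):  # groups are 1 to 5 digits long
        if body[:length] in TOY_GROUP_PREFIXES:
            return body[:length], body[length:]
    raise ValueError("group not in toy table")

print(split_group("9783540710288"))  # ('3', '54071028')
```

The same longest-known-prefix trick then repeats inside the remainder to peel off the publisher section, using the per-group registrant ranges from the same file.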
From this visualization it's most apparent that the publishers "Create Space" (aka Great UNpublished, BookSurge) and "Forgotten Books" should have been given a shorter number for the third section. (Though in my opinion self-published editions and low-value spam shouldn't get an ISBN, or rather they should be with the other independently published work @9798.)
They also gave Google tons of space but it appears none of it has been used as of yet.
This probably means that in the (hopefully near) future where we have extraterrestrial publishing (most likely on the Moon or Mars) we’ll need another prefix.
So 978 really is Bookland, as it used to be, and Earth, but the EAN-13 namespace as a whole really does refer to Earth as well. That said, the extraterrestrials can get a prefix just the same?
Although I don't know if this was the winning entry or not
https://news.ycombinator.com/item?id=42652577 - Visualizing All ISBNs (2025-01-10, 139 comments)
I know WorldCat has something like this when you search for a book, but the API, I assume, is only for library institutions, and I'm not a library nor an institution.
Just thank you.
https://software.annas-archive.li/AnnaArchivist/annas-archiv...
Things I love:
- How every book has a title & link to the google books
- Information density - You can see various publishers, sparse areas of the grid, and more
- Visualization of empty space -- great work at making it look like a big bookshelf!
Improvements?
- Instead of 2 floating panels, collapse to 1
- After clicking a book, the tooltip should disappear once you zoom out/move locations!
- Sort options (by year, by publisher name)
- Mid-level visualization - I feel like at the second zoom level (groupings of publishers) there's little information provided besides the names and relative sparsity, so we can remove the ISBN-related stuff on every shelf. Also, since there's a fixed width of shelves, I can tell there are 20 publishers, so no need! If we declutter, it'll make for a really nice physical experience!
[1] (Gitlab page) https://software.annas-archive.li/AnnaArchivist/annas-archiv...
When I got down to the individual book level, I found several that didn’t have any metadata, not even a title. There are hyperlinks to look up the ISBN on Google Books or WorldCat, and in the cases I tried, WorldCat had the data.
So… why not bring the worldcat data into the dataset?
Considering a specific example: "Forecasting Catastrophic Events in Technology, Nature and Medicine". The website's use of "Group 978-3: German language" is a bit of a misnomer; if it said "Group 978-3: German issued" or "German publisher" it would be clearer to users.
https://en.wikipedia.org/wiki/ISBN#ISBN-10_to_ISBN-13_conver...