Tell HN: Adobe took down the PDF 1.7 specification from their site

114 points · steerablesafe · 4y ago · 36 comments

I just discovered that Adobe took down the PDF 1.7 specification from their site. It used to be hosted at [1] and I can't find a replacement. Of course this doesn't mean that the specification can't be acquired freely from elsewhere [2, 3], but it's unfortunate if the authoritative source is down. Hopefully it is a mistake and it will be back up.

[1] http://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

[2] https://christianhaider.de/dokuwiki/lib/exe/fetch.php?media=pdf:pdf32000_2008.pdf

[3] https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

darrenf · 4y ago

It's still available at this Adobe URL: https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/p... "As distributed by Adobe after adoption as ISO 32000-1:2008, with permission of ISO." [0]

Not to mention ISO unsurprisingly host it, which I would also consider authoritative: https://www.iso.org/obp/ui/#iso:std:iso:32000:-1:ed-1:v1:en

[0] https://www.loc.gov/preservation/digital/formats/fdd/fdd0002...

steerablesafe OP · 4y ago

Ah, thanks for the link. For some reason that link doesn't turn up in google search for me, however hard I try. Of course the ISO one is also authoritative, but not free.

darrenf · 4y ago

Oh, mea culpa. Honestly I just found the opening page and didn't notice it needed payment to get to the rest of it. I naively assumed that since Adobe had published it with ISO's permission, it was free in both places.

steerablesafe OP · 4y ago

No worries! How did you find the "opensource.adobe.com" link? I can't hit it with google, even with aggressive searches like "site:opensource.adobe.com filetype:pdf". And I can't seem to navigate there from the main page.

svat · 4y ago

Apart from the PDF32000_2008.pdf (the ISO version), Adobe used to have a pdf_reference_1-7.pdf at https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdf_ref... which is the version before it got ISO-ized. The ISO version is "substantially the same" except for typesetting and small differences in wording, but I found the Adobe version much more of a pleasure to read. Comparing them is a good exercise in how much these small differences matter.

Anunayj · 4y ago

kinda funny you need a pdf reader to read the pdf specification :)

Cerium · 4y ago

Yeah, how did they bootstrap the first pdf reader? /s

It must have been hard guessing at the spec until you could read it properly.

cwt137 · 4y ago

Maybe they used Postscript?

mdaniel · 4y ago

I somehow thought early versions of the PDF spec were published as a .ps version for that very reason, but my duck-fu is failing me finding any such link. It may require wayback-fu and that's beyond my level-of-effort :-)

geodel · 4y ago

It's the same with the HTML spec, I think.

zdw · 4y ago

HTML is somewhat human readable in a text editor, but PDF likely is not.

pointlessone · 4y ago

How did you find the link on opensource.adobe.com? They used to host other standards, too (e.g. font formats).

iceblockderby · 4y ago

The ISO released the 2.0 version of the specification that replaces the 1.7 standard.

"Although it is an open standard, one major difference compared with prior versions of PDF is that ISO now holds the copyright to the PDF specification and thus PDF 2.0 is not freely downloadable." [0]

It looks like DMCA requests are being issued to anyone that hosted the old specification, even open source projects [1].

[0] https://www.pdfa.org/resource/iso-32000-pdf/

[1] https://github.com/Hopding/pdf-lib#git-history-rewrite

mr337 · 4y ago

Wow, I feel like that is a step back. This feels a lot like other protocols with non-free specs, like J1939, which costs over $1000 USD.

wooptoo · 4y ago

PDF 2.0 is not cheap either: https://www.iso.org/standard/75839.html Definitely a step back.

hoofedear · 4y ago

Why are these documents a paid product? Are there other ways to access it? I figured standardization documentation would be free to encourage adoption.

dorianmariefr · 4y ago

> This page was updated on 23 March 2022 as many direct links to legacy PDF specifications on adobe.com were broken. Many links now reference the Wayback Machine internet archive and thus may be slow.

https://www.pdfa.org/resource/pdf-specification-index/
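For anyone repairing similar dead links by hand: the archived link [3] in the original post has the shape "prefix, capture timestamp, optional modifier, original URL". Below is a minimal sketch that rebuilds that exact link. `wayback_url` is a hypothetical helper name, and the `if_` modifier is simply copied from link [3] rather than taken from any documented API, so treat this as an illustration of the URL shape only.

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(original_url: str, timestamp: str, modifier: str = "") -> str:
    """Build an archive URL: <prefix><timestamp><modifier>/<original_url>.

    `timestamp` is the capture time as digits (YYYYMMDDhhmmss in link [3]).
    `modifier` is an optional suffix such as "if_", copied from link [3].
    """
    if not timestamp.isdigit():
        raise ValueError("timestamp must contain only digits")
    return f"{WAYBACK_PREFIX}{timestamp}{modifier}/{original_url}"


# Rebuild link [3] from the original post.
print(wayback_url(
    "https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf",
    "20220309040754",
    modifier="if_",
))
```

This only constructs the string; whether a capture exists for a given timestamp still has to be checked against the archive itself.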

colejohnson66 · 4y ago

It's possible they're reworking their CMS and that causes files to be moved (breaking links everywhere). Microsoft loves doing that with their developer blogs.

hunter2_ · 4y ago

Not cool [0].

It's funny how CMSes tend to offer "clean URL" configurations (meaning that everything after the origin is 100% controlled by the CMS user) for requests served dynamically (database queries) but requests served statically (public files on disk) often end up containing implementation-specific junk (e.g., "/sites/" in the case of Drupal). The magic that makes clean dynamic URLs (rewrite everything that isn't a file to the boot script) should be expanded to make clean file URLs. Serving files would then need help from a script+db, but so what, that already happens for private files.

Obviously embedded assets that need to be fast (images, stylesheets, scripts, etc.) can't have a slow db query in the way. I'm only talking about files that are a first-class destination in the browser's address bar, like PDFs, and anything where the disposition is that it lands in your Downloads folder. Stuff that might be a search result or otherwise linked-to.

[0] https://www.w3.org/Provider/Style/URI
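A minimal sketch of the "script+db" idea for first-class file URLs, not how Drupal or any particular CMS implements it. The public URL is looked up in a table and mapped to wherever the file actually lives, so the storage layout can change without breaking links. The table name `files`, its columns, and both example paths are hypothetical.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory table mapping clean public URL paths to storage paths."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (url_path TEXT PRIMARY KEY, disk_path TEXT NOT NULL)"
    )
    db.executemany("INSERT INTO files VALUES (?, ?)", rows)
    return db


def resolve(db, url_path):
    """Return the storage path for a clean URL, or None if no file is registered.

    A None result means the request falls through to the normal dynamic route.
    """
    row = db.execute(
        "SELECT disk_path FROM files WHERE url_path = ?", (url_path,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical mapping: the public URL carries no implementation-specific prefix.
db = build_index([
    ("/specs/pdf-1.7.pdf", "sites/default/files/2008/PDF32000_2008.pdf"),
])
print(resolve(db, "/specs/pdf-1.7.pdf"))
print(resolve(db, "/no/such/file.pdf"))
```

Moving the file on disk then only requires updating `disk_path`; the public URL stays put.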

innocenat · 4y ago

Drupal allows you to set private file mode, which has clean URLs.

hunter2_ · 4y ago

It's kind of clean in that it uses a URL based on a db value instead of the filename on disk, but it's still got CMS-specific junk in that it always starts with "/system/" (at least in D7, I haven't explored it in D9).

jeffreportmill1 · 4y ago

Off topic, but man is that document hard to use as a reference. Ironically, I wish they would publish it as HTML broken down by chapter and section.

(I have used that document a lot to write a custom PDF generator and parser in Java, using a downloaded copy)

fivea · 4y ago

> Ironically, I wish they would publish it as HTML broken down by chapter and section.

I wish there was an EPUB version of the document. Do PDFs support reflowable content?

HWR_14 · 4y ago

I believe one of the selling points of PDFs was the absolute lack of reflowing content.

hunter2_ · 4y ago

Right, as the point is to represent a physical document, paper and ink (or canvas, toner, whatever -- stuff that doesn't reflow).

Why anyone would use such a format for these situations, where the audience definitely cares way more about consuming it on an electronic device than printing it out, is... mind-boggling.

Of course, AI+ML to the rescue: Liquid Mode [0].

> Files are processed in our secure data servers and immediately deleted from our servers after the experience is generated.

[0] https://www.adobe.com/devnet-docs/acrobat/android/en/lmode.h...

compressedgas · 4y ago

A PDF can be reflowed without reconstructive processing only if a PDF was generated as a Tagged PDF [1] and if the viewer supports reflowing.

[1]: Essentially a PDF with its own EPUB inside it, but unlike just having an attached EPUB, there is a map between the page layout of the PDF and the tags.

There are implementations of reconstructive reflowing that infer the layout block structure and reading order and can reflow a two column paper into a single column.
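A rough way to check whether a given file claims to be a Tagged PDF: the document catalog carries a mark information dictionary (`/MarkInfo`) whose `/Marked` entry is `true`. The sketch below is a byte-level heuristic, not a parser. It assumes the dictionary is written inline and uncompressed, so it will miss files where the catalog sits in a compressed object stream or where `/MarkInfo` is an indirect reference. `looks_tagged` is a hypothetical helper name.

```python
import re

# Matches an inline mark information dictionary with Marked set to true,
# e.g. "/MarkInfo << /Marked true >>". Indirect references are not handled.
MARKED_RE = re.compile(rb"/MarkInfo\s*<<[^>]*/Marked\s+true")


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw bytes contain /MarkInfo with /Marked true."""
    return MARKED_RE.search(pdf_bytes) is not None


# Hand-written catalog fragments for illustration, not complete PDF files.
tagged = b"1 0 obj\n<< /Type /Catalog /MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>\nendobj"
untagged = b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj"

print(looks_tagged(tagged))
print(looks_tagged(untagged))
```

A positive result only says the file declares itself tagged; the quality of the tags, and whether a viewer can reflow from them, is a separate question.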

zozbot234 · 4y ago

PDFs can support tables of contents with labeled chapters and sections. Not sure if the feature is standardized, but it's there.

steerablesafe OP · 4y ago

The specification does have a hierarchical outline, and you can click on cross references too. Of course navigation can still be cumbersome, and linking to chapters can also be awkward (tip: right-clicking an outline element and choosing "copy link" works in Firefox).

There are some problems with the spec though, and navigation is not the most pressing one. The spec is huge, and support for less-used parts is spotty in various PDF readers. It also has inaccuracies (not corrected in errata) and underspecified parts.

layer8 · 4y ago

> hard to use as a reference

How so? I frequently reference specific sections, tables or pages of the spec at work.

andrewmcwatters · 4y ago

Maybe one of the side effects of this is that people only continue writing against PDF 1.7.

hulitu · 4y ago

Maybe they want to sell it :)
