The arXiv of the future will not look like the arXiv (opens in new tab)

(ar5iv.labs.arxiv.org)

118 pointsdginev4y ago51 comments

51 comments

dang4y ago

Related ongoing thread:

Articles from arXiv.org as responsive HTML5 web pages - https://news.ycombinator.com/item?id=30835784 - March 2022 (8 comments)

stncls4y ago

The authors first list some issues with arXiv. Next, they describe how to fix those issues. Then the good news arrives: this improved arXiv already exists. It's called Authorea.com. All three authors are Authorea.com employees. They do disclose it as their affiliation. Still, this is essentially an ad written in LaTeX.

They correctly point out a few of the limitations of arXiv (mostly: static LaTeX and PDFs). But I profoundly dislike the other things they propose:

1. "open comments and reviews". I have no problem with open reviews on a third-party website, but arXiv is literally a "distribution service". It has one job and does it pretty well. I don't want it to turn into Reddit or (worse?) ResearchGate.

2. "alternative metrics". Enough with the metrics already. We all know they're destructive, at least all that have been tried so far. I didn't even know that arXiv showed some bibliometrics (because they are thankfully hidden behind default-disabled switches). Their proposed alternatives? "How many times a paper has been downloaded, tweeted, or blogged." I am not joking, this is what they propose to include in addition to citations. Seriously???

PS: Just a heads-up to anyone who, like me, would be wondering about the ar5iv.labs.arxiv.org link. The article is a regular paper submitted to arXiv. The authors do not belong to the organization maintaining arXiv. The usual link is: https://arxiv.org/abs/1709.07020

The ar5iv.labs.arxiv.org thing is an experimental html5 paper viewer by the arXiv people.

Edit: typos.

jimhefferon4y ago

Thanks. It was not clear to me whether this is a white paper by the arXiv people, or talk by external folks.

I now see that Wikipedia says this.

Authorea was launched in February 2013 by co-founders Alberto Pepe and Nathan Jenkins and scientific adviser Matteo Cantiello, who met while working at CERN. They recognized common difficulties in the scholarly writing and publishing process. To address these problems, Pepe and Jenkins developed an online, web-based editor to support real-time collaborative writing, and sharing and execution of research data and code. Jenkins finished the first prototype site build in less than three weeks.

Bootstrapping for almost two years, Pepe and Jenkins grew Authorea by reaching out to friends and colleagues, speaking at events and conferences, and partnering with early adopter institutions.

In September 2014, Authorea announced the successful closure of a $610K round of seed funding with the New York Angels and ff Venture Capital groups. In January 2016, Authorea closed a $1.6M round of funding led by Lux Capital and including the Knight Foundation and Bloomberg Beta. It later acquired the VC-backed company The Winnower.

In 2018 Authorea was acquired for an undisclosed amount by Atypon (part of Wiley).

sdenton44y ago

I don't really see how a for-profit preprint service is desirable, given the terrible track record of other for-profit entities in academic publishing. The extra features will be great until the gatekeeping kicks in after the first missed funding round...

wolverine8764y ago

> the PDF is not a format fit for sharing, discussing, and reading on the web. PDFs are (mostly) static, 2-dimensional and non-actionable objects. It is not a stretch to say that a PDF is merely a digital photograph of a piece of paper.

It is too far a stretch, murdering the poor subject:

PDFs are the best format available for long-term information, such as research papers. They have the advantages of digital data: Searchable, copy-able, transmittable, and data is extractable. They are also an open format, don't rely on a central service to be available, and they preserve presentation across platforms. They have metadata, and are annotatable and reviewable. And the PDF format is the best for long-term preservation, carefully designed to be readable in 50 years - partly because they preserve presentation across platforms - and that includes the metadata, annotations, and reviews.

PDFs are like paper in that they will look the same 50 years from now as they do today, unlike (almost?) any other digital format.

Yes, I wish they were a bit more dynamic in layout, and that the text was more cleanly extracted.

rhn_mk14y ago

> data is extractable

That's true for plain text (in the best case), but try extracting an equation, table or a diagram.

Stepping away from best case, PDFs in theory look the same everywhere, but turn into a mess on buggy implementations or differing rendering engines – due to the insistence on having a stable presentation, they assume positining and sizing always works, so when that fails, it fails worse than a buggy rendering of a presentation-agnostic document like an HTML page.

(In my experience, bugs either enter just before printing, or when displaying using JS-based renderers).

wolverine8764y ago

> That's true for plain text (in the best case), but try extracting an equation, table or a diagram.

Good point. From what format are tables, diagrams, and formulas extractable (while retaining format)? I've had good luck moving tables between my web browser and email applications, though it always surprises me that the html is implemented similarly enough.

> PDFs in theory look the same everywhere, but turn into a mess on buggy implementations or differing rendering engines

I don't deal with PDFs programatically, and it sounds like you might, but from the user end, and from running networks of thousands of users, I've hardly ever seen problems in practice except for the browsers' JavaScript renderers.

cxr4y ago

Of all the properties you mention, that PDFs "preserve presentation across platforms" is the only one that isn't shared with responsibly wielded HTML (e.g. the sort of thing that Zotero produces when stashing a local copy—which uses SingleFile under the hood). It's also the one property that is net undesirable—being more liability than benefit.

Being sent a PDF of an academic paper to read (or do anything with other than send it to a printer) is about ten times lower on the user preference scale than having someone send a link to a blog post on the same subject. (The other reason for that being that when people are in the mode that involves writing an academic paper, they forget how to write anything that anyone would actually want to read. Most academic writing sucks.)

Of the properties you listed that PDF does share with self-contained HTML, on the other hand, there isn't one that PDF isn't worse at—not even "transmittable". (Initially I would have put them on the same level there, but of course that's wrong. When you're in an environment where for whatever reason a file copy is not an option, PDF's binary format makes it harder to transmit the bytestream than HTML.)

Who cares if a PDF looks the same everywhere if that means everyone who encounters it bounces away rather than having to slog through any attempt to actually read it?

wolverine8764y ago

> responsibly wielded HTML

That would be fantastic, but there are no available solutions that meet the specs I listed, including long-term preservation and annotation (what annotation subsystems are there for HTML?). ePub is 'responsibly wielded HTML', but it lacks annotation and long-term preservation is iffy.

I much prefer PDFs to blog posts, personally - they are mine, I can annotate them, etc. Also, I find much more thought is put into a PDF than a blog post (which both beat Twitter!).

1 more reply

j-pb4y ago

None of those virtues hold in practice. I've worked both at public library digitization efforts and machine learning companies that did document ingestion and analytics. You always OCR the PDF visuals to get the text, because that's the only thing reliable about PDF. Everything else is often wrong, broken, or non-existent.

By separating the meaning from the visual representation there is no incentive to keep the invisible data workable.

PDF might as well be replaced with SVG, in terms of rendering consistency and metadata extraction capabilities. Because for a plain vector image format it's not that impressive.

wolverine8764y ago

If I understand correctly, your comment addresses PDFs created from scanning paper. PDFs at arXiv are converted from LaTeX inputs, per the OP, and not via scanning and OCR; therefore they contain perfect renditions of the text.

>> Searchable, copy-able, transmittable, and data is extractable. They are also an open format, don't rely on a central service to be available, and they preserve presentation across platforms. They have metadata, and are annotatable and reviewable. And the PDF format is the best for long-term preservation, carefully designed to be readable in 50 years - partly because they preserve presentation across platforms - and that includes the metadata, annotations, and reviews.

> None of those virtues hold in practice.

> You always OCR the PDF visuals to get the text, because that's the only thing reliable about PDF. Everything else is often wrong, broken, or non-existent.

Which don't hold in practice? Are they not searchable? Is presentation not preserved? I use a lot of PDFs and they hold for me. PDFs are very popular, so they must work pretty well.

> SVG

Is there a standard way to do review and annotation, and is presentation preserved, for example when printing? Also, PDFs contain various image formats; do they contain SVG?

1 more reply

mixedmath4y ago

This is an advertisement for Authorea (which I'd never heard of). I extract two passages that stand out to me.

> What is the single most important factor that has prevented the arXiv to quickly innovate? We believe it is LaTeX. The same technological advancement that has allowed the arXiv to flourish, is also, incredibly, its most important shortcoming. Indeed, the reliance of the arXiv on LaTeX is the source of all the weaknesses listed below.

> The research products hosted by the arXiv are PDFs. A title, abstract, and author list are provided by the authors upon submission as metadata, which is posted alongside the PDF, and is rendered in HTML to aid article discoverability.

It's interesting to me that the authors ignore that it is possible to read the source tex for most papers on the arxiv. The arxiv prefers to be given tex and source files, and then to compile and serve the pdf --- when this is done, you can read the source. In this way the arxiv is a repository of both the plain text source of the document and a formatted output.

In some of my papers, I deliberately include comments or extra data in the source for others. I'm not alone here; I've used the code embedded in this paper [1], for example.

While I think there would be some advantages if the arxiv required all papers to be compilable tex source files, I understand that the arxiv also accepts other formats to not exclude potential writers who do not know tex. [The other formats are pdfs (e.g. converted from Word) or HTML with jpg/png/gif images (which I have never seen in practice)].

[1]: https://arxiv.org/abs/1607.07827

tempnow9874y ago

"sharing research via PDF must inevitably come to an end."

Maybe instead of using the obsolete toolset arxiv provides, they could host their groundbreaking research on their own platform? The combination of ground breaking features and insightful commentary would draw users?

Actually, many of the negatives they list are positives in my book. The latex barrier screens out a ton of garbage in my view - I'm on some social science / word based research lists, and the quality of stuff is mind bogglingly bad.

Getting stuff it fit into a PDF (instead of the NY times new scrollable story stuff) makes grabbing or print off or even reading easy - less dynamic is good in my book.

pasadenasunset4y ago

I had a similar reaction. When you combine this concept with some of the fringe open science silliness, you are essentially telling scholars (what and how) they must share, AND (what and how) they are prohibited from sharing, leaving what is essentially a forced path:

"PDFs are prohibited, especially shared in private. All data, hypotheses, references, tables, code, must be presented in formats that are conducive to steali ^H^H^H^H replication and fostering a global science ethos of sharing."

"Authors who obey will have a beautiful platinum star printed next to their author nameplate. Extra star opportunities will exist for authors who announce their papers on Twitter with required levels of irony, hipness and verve!"

bee_rider4y ago

This seems... ambitious.

I think ArXiv (edit: Actually this is not by ArXiv, but some other group) is drastically over-estimating the desire to submit papers to their service. They are popular because they host the documents you were going to produce, in the format that the journals expect. The production of a Arxiv appropriate document is a side effect of the actual job, which is writing a paper to submit to a journal (hey, I'm as unhappy as you are that this is the actual job, but everyone hates publish-or-perish, if it could be overthrown it would have been).

"Getting academics to act in a way that is not directly in their self-interest because they just love sharing information" is a usually a pretty safe bet, but I think this would be a bit too far. Unless ArXiv can somehow get journals to expect their format (good luck!) I think this is going to be hard.

stncls4y ago

The article is not at all by the arXiv people. This is just a paper submitted to arXiv (about arXiv). The confusion is understandable, because the link is to arXiv's experimental HTML5 viewer, not the usual format (which would be: https://arxiv.org/abs/1709.07020).

The authors are from Authorea.com, a for-profit that wants to replace arXiv.

Edit: Aside from that, fully agree with you. Good luck to them.

bee_rider4y ago

Ah, thanks for the correction, that really changes things!

wcerfgba4y ago

Readers may find the Octopus project interesting:

> Designed to replace journals and papers as the place to establish priority and record your work in full detail, Octopus is free to use and publishes all kinds of scientific work, whether it is a hypothesis, a method, data, an analysis or a peer review.

> Publication is instant. Peer review happens openly. All work can be reviewed and rated.

> Your personal page records everything you do and how it is rated by your peers.

> Octopus encourages meritocracy, collaboration and a fast and effective scientific process.

> Created in partnership with the UK Reproducibility Network.

https://science-octopus.org/

cxr4y ago

Similarly, the MIT-affiliated PubPub <https://www.pubpub.org/>.

cmarschner4y ago

ArXiv has this wonderful property: It works. It‘s simple. Everybody understands it. It is steady. People build tools around it (e.g. arxiv-sanity.com, Google scholar…) which make it even more useful. It has spin-offs like biorxiv that are catered to those communities. It is like a piece of infrastructure. Who cares aboud doi’s when the arxiv URL is already the standard? Yes it has its disadvantages, but none of them seems to justify to me to turn it upside down. Rather one could just add things, slowly, just like they‘re doing it.

lazyjeff4y ago

I read this article the other day, "There are four schools of thought on reforming peer review" [1] about how there's four schools of thought about how to reform publishing and peer review. Each of them independently are fairly well received and makes sense in itself, at least among my academic circles. However, there are tensions between them, so it's hard to come up with a solution that's universally satisfying to even the majority of stakeholders.

This article about ArXiv is clearly in the "Democracy and Transparency school" as categorized article, but it doesn't yet address the other three camps. The arxiv article proposes machine-readable semantics, easier sharing and discoverability, papers + supplementary materials + reviews all open; this floods the world with even more publications with varying quality, so it's even harder to identify good quality work; and when things can be more easily aggregated by machines and measured with the alternative metrics proposed, it often leads to a more powerful winner-takes-all system that can be gamed (there's now a subtle game of increasing citations that appear on Google Scholar); finally, with an increase in submissions and materials that go along with submissions, it puts an even greater strain on the review system. These problems are not unsolvable, but almost every idea I've seen proposed so far has only been in a single camp, and there's side effects that harm the goals of the other three camps. So I'd love to see more ideas that balance the interests of all four camps that want to reform peer review and publishing.

[1]: https://blogs.lse.ac.uk/impactofsocialsciences/2022/03/24/th...

SkyMarshal4y ago

@dginev, as you guys upgrade arxiv with HTML5 tech, keep in mind you may be in a position to further implement some of Brett Victor's thinking on "explorable explanations":

http://worrydream.com/ExplorableExplanations/

http://worrydream.com/MediaForThinkingTheUnthinkable/

http://worrydream.com/ScientificCommunicationAsSequentialArt...

etc.

curiousgal4y ago

I don't usually read long articles on my phone but the design of that page on my Pixel 6 was just so perfect! I hope this becomes the norm!

periheli0n4y ago

This is precisely their point. Reading the usual Arxiv-PDF on a phone is a pain, even if you just want to glance at some key parts of the text. Their version is much, much better. It's self-promotion by the Authorea team on the platform they are competing with (ArXiv), but they have a point.

Arxiv needs to go HTML.

stncls4y ago

But the article link is arXiv's own (admittedly experimental) HTML5 viewer!! And your parent comment is praising it.

akvadrako4y ago

It's fascinating to imagine what the arxiv of the future would look like.

I imagine all scientific publications available on a distrusted block store, including raw emails, data and notes on a voluntary basis.

Stuff that could be published would include reviews, corrections in version control fashion, and enough metadata to model scientific progress.

What this article is describing sounds reasonable but not game changing.

pasadenasunset4y ago

Voluntary basis? You're an optimist, I see.

azangru4y ago

> The arXiv of the future is format-neutral and separates format from content.

Didn't this use to be Latex's tagline? Separate format from content. Which the authors of the article don't find separate enough.

How does the proper separation of format from content even work? Don't you need to markup your content in order for it to become formatted?

ssivark4y ago

It’s fairly well separated from the perspective of being able to write content fairly agnostic of a presentation template, and then swap in the required publication template in the end (with a few cosmetic tweaks very occasionally).

But LaTeX is largely an extension of TeX, and these markup languages seem not very amenable to re-implementing parsing / automated processing (given numerous attempts that have resulted in stalemates).

chaxor4y ago

One thing I would love to see from the arxiv sites is a publicly available download of an SQLite database. They have a bunch of PDFs, and latex source - but the real killer would be a database with just the text for each section, and then the ability to generate* the pdf, using various different styles. This would save an enormous amount of space, and make things far more tidy. I suppose the images could be stored in the SQLite as blobs, but there's probably a better way with vector dbs or something.

That's what the future will probably look like. With the SQLite decentralized on IPFS or torrent, where only queries get stored on each computer, making more popular queries faster to load (more peers).

*(or maybe an archive of a tons of zstd parquets for each table? - Not sure what the best way to organize several tables in parquet is yet)

enriquto4y ago

> This would save an enormous amount of space, and make things far more tidy.

Why? The output pdf is typically smaller than the input that produces it. Using rendered pdfs seems simple and very natural, and at worst can use twice the total amount of space.

chaxor4y ago

Instead of storing both pdf and source text, just store the source. The pdf is generated on demand, in whatever style you like.

Although, I had no idea PDFs were smaller than the input. I thought that they were substantially larger actually. But regardless, storing things twice is wasteful.

mistrial94y ago

for the ability to modify (forge?) contents you need the sources

gsvclass4y ago

At https://42papers.com/ we want to get more folks reading papers our focus in on surfacing papers from arXiv that our community would appreciate so we focus on trending papers, improving readability, etc

kkfx4y ago

A small proposal: why not a PopcornTime of papers? Witch means a distributed network (no matter if BitTorrent, ZeroNet, GNUNet, I2P or something else) to publish? That's the best freedom guarantee and just the mere number of nodes with a paper is a good metric about it's popularity, to avoid oblivion each uni/researcher can easily store and serve their own papers forever: files are small, so download is quick, not much resources are needed.

PeterisP4y ago

What problems does it solve for the authors? The features you describe above don't seem a problem in the current solutions; freedom and availability is a non-issue for authors, "to avoid oblivion each uni/researcher can easily store and serve their own papers forever" is a flaw not a feature (there are already far too many ways to do that, which only add extra burden to the authors if they want to "be everywhere" for the sake of availability), it doesn't seem that it would be easier than the current way; the resources/effort needed would be small but non-zero, so it sounds like just an extra annoyance, not something beneficial.

And if it solves some problems for someone else but not the authors, then how would a comprehensive majority of papers enter the system? Papers are even less interchangeable than movies; if you want to have a particular movie and it isn't available on PopcornTime, you might watch something else, for papers you just have to go elsewhere that actually does have everything.

pasadenasunset4y ago

This would and should be terrifying to any mid-career academic. The last thing needed is a complicated solution that solves no direct problem, YET offers plenty of "metrics" that one can attach all sorts of labels to, like "popularity".

Can you imagine some of the minds on academic Twitter holding a poll on article popularity? <SHUDDER> Leave science to the foul-tempered misanthropes, I say! j/k

1 more reply

surfTide4y ago

Original submission: https://arxiv.org/abs/1709.07020

> [Submitted on 20 Sep 2017]

Shouldn't this be "(2017)" - original article was submitted in 2017.

0lmer4y ago

I'm still wandering about a service that would be to arXiv what Github became to Sourceforge. Order of magnitude improvement of collaboration and interconnection between published materials.

gammarator4y ago

[2017]

surfTide4y ago

> [Submitted on 20 Sep 2017]

https://arxiv.org/abs/1709.07020

hamiltonians4y ago

ssrn is better. no need for endorsements to get stuff published.

einpoklum4y ago

They lost me at suggesting that a future ArXiv should be

> Web-native and web-first

Absolutely not. It should be "physical paper first". Any long-term archiving cannot rely on electrical devices for viewing archived material. Electrical grids fail. Technology changes. Even if ArXiv is not a print archive, the material in it must be, first and foremost, printable in a consistent manner, and with the authors targeting the physical printed form. Of course, one would need to actually print ArXiv items to physically archive them, but still.

Now, of course archiving data is useful and important; and large amounts of data are less appropriate for print archiving. But that should always be secondary to the archiving on knowledge.

j / k navigate · click thread line to collapse

51 comments

dang4y ago

Related ongoing thread:

Articles from arXiv.org as responsive HTML5 web pages - https://news.ycombinator.com/item?id=30835784 - March 2022 (8 comments)

stncls4y ago

They correctly point out a few of the limitations of arXiv (mostly: static LaTeX and PDFs). But I profoundly dislike the other things they propose:

The ar5iv.labs.arxiv.org thing is an experimental html5 paper viewer by the arXiv people.

Edit: typos.

jimhefferon4y ago

Thanks. It was not clear to me whether this is a white paper by the arXiv people, or talk by external folks.

I now see that Wikipedia says this.

Bootstrapping for almost two years, Pepe and Jenkins grew Authorea by reaching out to friends and colleagues, speaking at events and conferences, and partnering with early adopter institutions.

In 2018 Authorea was acquired for an undisclosed amount by Atypon (part of Wiley).

sdenton44y ago

wolverine8764y ago

It is too far a stretch, murdering the poor subject:

PDFs are like paper in that they will look the same 50 years from now as they do today, unlike (almost?) any other digital format.

Yes, I wish they were a bit more dynamic in layout, and that the text was more cleanly extracted.

rhn_mk14y ago

> data is extractable

That's true for plain text (in the best case), but try extracting an equation, table or a diagram.

(In my experience, bugs either enter just before printing, or when displaying using JS-based renderers).

wolverine8764y ago

> That's true for plain text (in the best case), but try extracting an equation, table or a diagram.

> PDFs in theory look the same everywhere, but turn into a mess on buggy implementations or differing rendering engines

cxr4y ago

Who cares if a PDF looks the same everywhere if that means everyone who encounters it bounces away rather than having to slog through any attempt to actually read it?

wolverine8764y ago

> responsibly wielded HTML

I much prefer PDFs to blog posts, personally - they are mine, I can annotate them, etc. Also, I find much more thought is put into a PDF than a blog post (which both beat Twitter!).

1 more reply

j-pb4y ago

By separating the meaning from the visual representation there is no incentive to keep the invisible data workable.

PDF might as well be replaced with SVG, in terms of rendering consistency and metadata extraction capabilities. Because for a plain vector image format it's not that impressive.

wolverine8764y ago

> None of those virtues hold in practice.

> You always OCR the PDF visuals to get the text, because that's the only thing reliable about PDF. Everything else is often wrong, broken, or non-existent.

Which don't hold in practice? Are they not searchable? Is presentation not preserved? I use a lot of PDFs and they hold for me. PDFs are very popular, so they must work pretty well.

> SVG

Is there a standard way to do review and annotation, and is presentation preserved, for example when printing? Also, PDFs contain various image formats; do they contain SVG?

1 more reply

mixedmath4y ago

This is an advertisement for Authorea (which I'd never heard of). I extract two passages that stand out to me.

In some of my papers, I deliberately include comments or extra data in the source for others. I'm not alone here; I've used the code embedded in this paper [1], for example.

[1]: https://arxiv.org/abs/1607.07827

tempnow9874y ago

"sharing research via PDF must inevitably come to an end."

Getting stuff it fit into a PDF (instead of the NY times new scrollable story stuff) makes grabbing or print off or even reading easy - less dynamic is good in my book.

pasadenasunset4y ago

bee_rider4y ago

This seems... ambitious.

stncls4y ago

The authors are from Authorea.com, a for-profit that wants to replace arXiv.

Edit: Aside from that, fully agree with you. Good luck to them.

bee_rider4y ago

Ah, thanks for the correction, that really changes things!

wcerfgba4y ago

Readers may find the Octopus project interesting:

> Publication is instant. Peer review happens openly. All work can be reviewed and rated.

> Your personal page records everything you do and how it is rated by your peers.

> Octopus encourages meritocracy, collaboration and a fast and effective scientific process.

> Created in partnership with the UK Reproducibility Network.

https://science-octopus.org/

cxr4y ago

Similarly, the MIT-affiliated PubPub <https://www.pubpub.org/>.

cmarschner4y ago

lazyjeff4y ago

[1]: https://blogs.lse.ac.uk/impactofsocialsciences/2022/03/24/th...

SkyMarshal4y ago

@dginev, as you guys upgrade arxiv with HTML5 tech, keep in mind you may be in a position to further implement some of Brett Victor's thinking on "explorable explanations":

http://worrydream.com/ExplorableExplanations/

http://worrydream.com/MediaForThinkingTheUnthinkable/

http://worrydream.com/ScientificCommunicationAsSequentialArt...

etc.

curiousgal4y ago

I don't usually read long articles on my phone but the design of that page on my Pixel 6 was just so perfect! I hope this becomes the norm!

periheli0n4y ago

Arxiv needs to go HTML.

stncls4y ago

But the article link is arXiv's own (admittedly experimental) HTML5 viewer!! And your parent comment is praising it.

akvadrako4y ago

It's fascinating to imagine what the arxiv of the future would look like.

I imagine all scientific publications available on a distrusted block store, including raw emails, data and notes on a voluntary basis.

Stuff that could be published would include reviews, corrections in version control fashion, and enough metadata to model scientific progress.

What this article is describing sounds reasonable but not game changing.

pasadenasunset4y ago

Voluntary basis? You're an optimist, I see.

azangru4y ago

> The arXiv of the future is format-neutral and separates format from content.

Didn't this use to be Latex's tagline? Separate format from content. Which the authors of the article don't find separate enough.

How does the proper separation of format from content even work? Don't you need to markup your content in order for it to become formatted?

ssivark4y ago

chaxor4y ago

*(or maybe an archive of a tons of zstd parquets for each table? - Not sure what the best way to organize several tables in parquet is yet)

enriquto4y ago

> This would save an enormous amount of space, and make things far more tidy.

Why? The output pdf is typically smaller than the input that produces it. Using rendered pdfs seems simple and very natural, and at worst can use twice the total amount of space.

chaxor4y ago

Instead of storing both pdf and source text, just store the source. The pdf is generated on demand, in whatever style you like.

Although, I had no idea PDFs were smaller than the input. I thought that they were substantially larger actually. But regardless, storing things twice is wasteful.

mistrial94y ago

for the ability to modify (forge?) contents you need the sources

gsvclass4y ago

kkfx4y ago

PeterisP4y ago

pasadenasunset4y ago

Can you imagine some of the minds on academic Twitter holding a poll on article popularity? <SHUDDER> Leave science to the foul-tempered misanthropes, I say! j/k

1 more reply

surfTide4y ago

Original submission: https://arxiv.org/abs/1709.07020

> [Submitted on 20 Sep 2017]

Shouldn't this be "(2017)" - original article was submitted in 2017.

0lmer4y ago

I'm still wandering about a service that would be to arXiv what Github became to Sourceforge. Order of magnitude improvement of collaboration and interconnection between published materials.

gammarator4y ago

[2017]

surfTide4y ago

> [Submitted on 20 Sep 2017]

https://arxiv.org/abs/1709.07020

hamiltonians4y ago

ssrn is better. no need for endorsements to get stuff published.

einpoklum4y ago

They lost me at suggesting that a future ArXiv should be

> Web-native and web-first

Now, of course archiving data is useful and important; and large amounts of data are less appropriate for print archiving. But that should always be secondary to the archiving on knowledge.

j / k navigate · click thread line to collapse