So then it's JSON, and I'll treat it as any other JSON: a document that is either an object or an array, that can include other objects or arrays, as well as numbers and strings. Property names don't matter, nor does the order of properties or array items, or whatever values are contained therein.
Please don't try to overload media types like this. Atom isn't served as `application/xml` precisely because it isn't XML; it's served as `application/atom+xml`. For a media type that is JSON-like but isn't JSON, you may wish to look at `application/hal+json`; incidentally there's also `application/hal+xml` for the XML variant.
Or as someone else rightly suggested, consider just using JSON-LD.
"I am a valid JSON document. So is the Number below, and in fact every line below this line."
4
null
Now it's also true that JSON doesn't define an entity that can be an object or an array but not a string, a bool, a number, or null. So it's kind of true that JSON never says that objects and arrays are the only valid root elements.
But JSON also says "JSON is built on two structures" - arrays and objects. It defines those two structures in terms of 'JSON values'. Still, it's a reasonable way to read the JSON spec to say that it defines a concept of a 'JSON structure' - an array or object, but not a plain value - and then to assume that a .json file contains a JSON 'structure'.
Basically... JSON's just not as well defined a standard as you might hope.
edit: And now I'm going to well actually myself: Turns out https://tools.ietf.org/html/rfc4627 defines a thing called a 'JSON text' which is an array or an object, and says that a 'JSON text' is what a JSON parser should be expected to parse.
So - pick a standard.
Alas, if only that were true.
RFC 4627:
> A JSON text is a serialized object or array. The MIME media type for JSON text is application/json.
RFC 7159:
> A JSON text is a serialized value. Note that certain previous specifications of JSON constrained a JSON text to be an object or an array. Implementations that generate only objects or arrays where a JSON text is called for will be interoperable in the sense that all implementations will accept these as conforming JSON texts.
IIRC, Ruby's JSON parser was written to be strictly RFC 4627 compliant, and yields a parser error for non-array non-object texts.
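For what it's worth, most modern parsers follow the RFC 7159 reading; Python's stdlib, for example, happily accepts scalar roots:

```python
import json

# Python's stdlib follows the RFC 7159/8259 reading: any JSON value is a
# valid top-level "JSON text", not just an object or array.
print(json.loads('4'))        # 4
print(json.loads('null'))     # None
print(json.loads('"hello"'))  # hello

# A strict RFC 4627 parser (as Ruby's reportedly was) would reject all
# three of these inputs with a parse error.
```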
JSON isn't versioned, so no one has any idea what "JSON" really means, or what "standard" is being followed.
Someone filed an issue and created a pull-request for this after you wrote this comment.
https://github.com/brentsimmons/JSONFeed/issues/22
https://github.com/brentsimmons/JSONFeed/pull/23
I hope they will merge it.
This is the same difference in schools that I express in a different comment [3] in this thread.
[1] https://tools.ietf.org/html/rfc6839 [2] https://tools.ietf.org/html/rfc3023#appendix-A [3] https://news.ycombinator.com/item?id=14361842
But if it were using XHTML, then the proper mime type would be application/xhtml+xml.
So the server could support all of the following:
application/jsonfeed
application/rss+xml
application/atom+xml
Who knows, maybe RSS and ATOM could be represented in JSON and have the following mime types:
application/rss+json
application/atom+json
If it's just an API response, and it is your API for an application called Widget Factory, then you can, if you want, have your own format:
application/vnd.widgetfactory+json
Generally, defining such a MIME type should come with some specification describing it; otherwise no one can reliably implement a compatible client. JSON Feed has proposed that specification.
Not sure why you want to emulate XML namespaces in JSON, but JSON schemas can include other JSON schemas and extend upon other JSON schemas. That accounts for 99.9% of the use cases for namespaces.
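For reference, that inclusion/extension is usually spelled with `$ref` and `allOf`; a hypothetical sketch (both URLs are made up):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "allOf": [
    { "$ref": "https://example.org/schemas/base-feed.json" }
  ],
  "properties": {
    "my_extension": { "type": "string" }
  }
}
```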
If this industry has a problem, it's FDD - Fad Driven Development and IIICIS (If It Isn't Cool, It Sucks) thinking.
I think with something like feeds there's the possible benefit of becoming a 'hello world' for frameworks. Many frameworks have you write a simple blogging engine or twitter copycat. I don't think I've ever seen that for a feed reader/publisher. People have said that Twitter clients were an interesting playground for new UI concepts and paradigms because the basics were so simple (back when their API keys were less restrictive). Maybe this could be that?
Maybe it's just that I work mostly with JVM languages (Java, Groovy, etc.) but I haven't had any problems with handling XML - including Atom - in years. But I admit that other platforms might not have the same degree of support.
Otherwise, I agree with the "if it ain't broke" principle. There's also cases where so much ad hoc complexity is built on top of JSON that you end up with the same problems XML has, except with less battle-tested implementations.
xmllint --xpath '//element/@attribute'
There's a good chance it's already installed on your Mac.

I agree that jq is really nice though. In particular, I still find JSON nicer than XML in the small-scale (e.g. scripts for transforming ATOM feeds) because:
- No DTDs means no unexpected network access or I/O failures during parsing
- No namespaces means names are WYSIWYG (no implicit prefixes which may/may not be needed, depending on the document)
- All text is in strings, rather than 'in between' elements
- No redundant element/attribute distinction
Even with tooling, these annoyances with XML leak through. As an example, xmlstarlet can find the authors in an ATOM file using an XPath query like '//author'; except if the document contains a default namespace, in which case it'll return no results since that XPath isn't namespaced.
This sort of silently-failing, document-dependent behaviour is really frustrating; requiring two branches (one for documents with a default-namespace, one for documents without) and text-based bash hackery to look for and dig out any default namespace prior to calling xmlstarlet :(
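The same silent failure is reproducible with Python's stdlib ElementTree, which also shows the `{*}` wildcard (Python 3.8+) that avoids the two-branch workaround:

```python
import xml.etree.ElementTree as ET

atom = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <author><name>Alice</name></author>
</feed>'''

root = ET.fromstring(atom)

# The naive query silently finds nothing: the default namespace makes the
# element's real name '{http://www.w3.org/2005/Atom}author'.
print(root.findall('.//author'))          # []

# The {*} namespace wildcard (Python 3.8+) matches regardless of the
# document's default namespace, so one query handles both cases.
print(len(root.findall('.//{*}author')))  # 1
```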
Basically, XML is to JSON as SOAP is to REST. It had its day, though it's obviously still useful, but we have better tools now. Frankly, I'm surprised we haven't seen a proposal like this sooner.
That's true. Both XML and SOAP are well defined, and well structured.
JSON and REST are both marginally defined, and thus we see constant incompatible/incomplete implementations, or weird hacks to overcome the shortcomings.
> we have better tools now
I think "the cool kids are cargo-culting something newer now" is probably more accurate.
And the other part of me is not with you - manipulating XML is not as easy as JSON in most of my development work, and sometimes I even need to write something by hand, for which JSON is much more convenient. Tons of other formats are more human-friendly than JSON - TOML, for example - but they don't have the status JSON has. So I guess JSON is the natural choice under the current state of web development.
> JSON is simpler to read and write, and it’s less prone to bugs.
* Badly formed XML? Check. There might be badly formed JSON, but I tend to think it'll be a lot less likely.
* Need to continually poll servers for updates? Miss. Without additions to enable pubsub, or dynamic queries, clients are forced to use HTTP headers to check last updates, then do a delta on the entire feed if there is new or updated content. Also, if you missed 10 updates, and the feed only contains the last 5 items, then you lose information. This is the nature of a document-centric feed meant to be served as a static file. But it's 2017 now, and it's incredibly rare that a feed isn't created dynamically. A new feed spec should incorporate that reality.
* Complete understanding of modern content types besides blog posts? Miss. The last time I went through a huge list of feeds for testing, I found there were over 50 commonly used namespaces and over 300 unique fields used. RSS is used for everything from search results to Twitter posts to Podcasts... It's hard to describe all the different forms of data it can contain. The reason is that the original RSS spec was so minimal (there are like 5 required fields) that everything else has just been bolted on. JSONFeed makes this same mistake.
* An understanding that separate but equal isn't equal. Miss. The thing that http://activitystrea.ms got right was the realization that copying content into a feed just ends up diluting the original content formatting, so instead it just contains metadata and points to the original source URL rather than trying to contain it. If JSONFeed wanted to really create a successor to RSS, it would spec out how to send formatting information along with the data. It's not impossible - look at what Google did with AMP: They specified a subset of formatting options so that each article can still contain a unique design, but limited the options to increase efficiency and limit bugs/chaos.
This stuff is just off the top of my head. If you're going to make a new feed format in 2017, I'm sorry but copying what came before it and throwing it into JSON just isn't enough.
The real challenge these days is to replicate the solutions Facebook and Twitter brought to feeds (bidirectionality and data-retention in particular) in a decentralised manner that could actually become popular. Simply replicating RSS in the data-format du jour is not going to achieve that.
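The "use HTTP headers to check last updates" flow mentioned above is plain HTTP conditional GET; a minimal sketch (the feed URL is hypothetical):

```python
import urllib.request

# A conditional GET reuses validators (ETag / Last-Modified) from the
# previous response, so an unchanged feed is never re-downloaded.
def conditional_request(url, etag=None, last_modified=None):
    req = urllib.request.Request(url)
    if etag:
        req.add_header('If-None-Match', etag)
    if last_modified:
        req.add_header('If-Modified-Since', last_modified)
    # urllib.request.urlopen(req) raises an HTTPError with code 304
    # when the resource is unchanged.
    return req

req = conditional_request('https://example.org/feed.json', etag='"abc123"')
# Note: urllib stores header names capitalized internally.
print(req.get_header('If-none-match'))  # "abc123"
```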
This is backwards, imo. The advantage of polling over pub sub is that all complexity is offloaded to the client. This comes with its own set of problems (inefficiency of reinventing the wheel across all clients, plus every client will implement that complexity differently resulting in countless bugs), but this is what drives adoption, which as someone else here has pointed out is all that matters. If you want adoption, you seemingly need to sacrifice a lot of efficiency in favour of making it stupidly easy to publish.
The "it's 2017 now" argument doesn't really address that even with dynamically generated content, you still need every dynamic serverside platform to adopt and implement your spec independently. Static is always easier. (plus with the recent trend towards static sites, "it's 2017 now" actually has the opposite implication).
It's a shame that ActivityStrea.ms hasn't had more uptake. We've added support in our enterprise social network product and think it enables some cool scenarios. But unfortunately too few other products support it these days.
The point of these syndication formats (RSS, Atom, now this) was always to act as the "I'm a static site and webhooks don't exist, so poll me" equivalent of webhooks. These "pretending to be webooks" were supposed to hook into a whole ecosystem of syndication middleware that turned the feeds into things like emails.
And that—the output-products of the middleware—was what people were supposed to consume, and what sites were meant to offer people to consume. The feed, as originally designed, was not intended for client consumption. That's why the whole model we have today, where millions of "feed-reader" clients poll these little websites that could never stand up to that load, seems so silly: it wasn't supposed to be the model. RSS feeds were supposed to be a way for static-ish content to "talk to" servers that would do the syndicating for them; not a format for clients to receive notifications in.
(And we already had a format for clients to receive notifications in: MIME email. There's no reason you can't add another MIME format beyond text/plain and text/html; and there's no reason you can't create an IMAP "feed-reader" that just filters your inbox to display only the messages containing application/rss+xml representations, and set up your regular inbox to filter out those same messages. And some messages would contain both representations, so you'd see e.g. newsletters as both text in your email client and as links in your feed client, and archiving them in one would do the same in the other, since they're the same message.)
---
The big problem I have with feeds (besides that people are using them wrong, as above) is that they have no "control-channel events" to notify a feed-consumer of something like e.g. the feed relocating to a new URL.
Right now, many feeds I follow just die, never adding a new feed item, and the reason for that is that, unbeknownst to me, the final item in the feed (that I never saw because it rotted away after 30 days, or because I "declared inbox zero" on my feeds, or whatever else) was a text post by the feed's author telling everyone to follow some new feed instead.
And other authors don't even bother with that; they use a blogging framework that generates RSS, but they're maybe not even aware that it does that for them, so instead they tell e.g. their Twitter followers, or Twitch subscribers, that they're moving to a new website, but their old website just sits there untouched forever-after, never receiving an update to point to the new site which would end up in the RSS feed my reader is subscribed to. It's nonsensical.
(And don't get me started on the fact that if you follow a Tumblr blog's RSS feed, and the blog author decides to rename their account, that not only kills the feed, but also causes all the permalinks to become invalid, rather than making them redirect... Tumblr isn't alone in this behavior, but Tumblr authors really like renaming their accounts, so you notice it a lot.)
There was also a typical Dave-Wineresque invention of replacing the old feed with some special, non-namespaced XML with the redirect: http://www.rssboard.org/redirect-rss-feed
But of course the real problem is social. As in people simply stop blogging or stop caring. And of course tool developers don't care if someone doesn't want to use their software anymore, and don't think of developing the right buttons for this edge case.
http://scripting.com/stories/2012/09/10/rssInJsonForReal.htm...
oh of course: https://xkcd.com/927/
(and I realize this doesn't exactly map, as JSON Feed isn't even trying to cover all the usecases of Atom or RSS, just switching the container format)
It's true that JSON is easier to deal with than XML. But that's relative; there are plenty of decent tools around RSS. From readers, to libraries in the most common programming languages, and extensions in the most common content management systems. JSON is slightly easier to read for humans (although that's subjective), but then how often do you need to read an RSS feed manually, unless you are the one writing those libraries, etc.? And that's a tiny share of all the people using RSS.
>>> It reflects the lessons learned from our years of work reading and publishing feeds.
Sounds like the author(s) has extensive experience in this field and knows things better than some random person on the internet (me). But the homepage of the project doesn't convey those learned lessons.
However, SGML and XML were invented as structured markup languages for authoring of rich text documents by humans, for which JSON is unsuited and sucks just as much as XML sucks for APIs.
Edit: though XML has its place in many b2b and business-to-government data exchanges (financial and tax reporting, medical data exchange, and many others) where a robust and capable up-front data format specification for complex data is required
(feed
(version https://jsonfeed.org/version/1)
(title "My Example Feed")
(home-page-url https://example.org)
(feed-url https://example.org/feed.json)
(items
(item (id 2)
(content-text "This is a second item.")
(url https://example.org/second-item))
(item (id 1)
(content-html "<p>Hello, world!</p>")
(url https://example.org/initial-post))))
This looks much nicer IMHO than their first example:

{
"version": "https://jsonfeed.org/version/1",
"title": "My Example Feed",
"home_page_url": "https://example.org/",
"feed_url": "https://example.org/feed.json",
"items": [
{
"id": "2",
"content_text": "This is a second item.",
"url": "https://example.org/second-item"
},
{
"id": "1",
"content_html": "<p>Hello, world!</p>",
"url": "https://example.org/initial-post"
}
]
}

https://github.com/edn-format/edn
Example:
https://github.com/milikicn/activity-stream-example/blob/4db...
Not S-expression-based, though.
version: https://jsonfeed.org/version/1
title: "My Example Feed"
home-page-url: https://example.org
feed-url: https://example.org/feed.json
items: [
[
id: 2
content-text: "This is a second item."
url: https://example.org/second-item
]
[
id: 1
content-html: "<p>Hello, world!</p>"
url: https://example.org/initial-post
]
]

If you really want to do a hash table, you could represent it as an alist:
(things
(key1 val1)
(key2 val2))
This all works because — whether using JSON, S-expressions or XML — ultimately you need something which can make sense of the parsed data structure. Even using JSON, nothing prevents a client submitting a feed with, say, a homepage URL of {"this": "was a mistake"}; just parsing it as JSON is insufficient to determine if it's valid. Likewise, an S-expression parser can render the example, but it still needs to be validated. One nice advantage of the S-expression example is that there's an obvious place to put all the validation, and an obvious way to turn the parsed S-expression into a valid object.

There is one pretty damn solid SSAX parser by Kiselyov that has been ported to just about every Scheme out there. It is interesting since it doesn't do the whole callback thing of most SAX parsers, but is implemented as a tree fold over the XML structure.
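The validation point above can be sketched in code: parsing (JSON here, but the same goes for a parsed S-expression) only proves the syntax, and the shape still has to be checked. Field names follow the JSON Feed example quoted in this thread; the validator itself is hypothetical.

```python
import json

def validate_feed(data):
    # Parsing succeeded; now check the shape before trusting it.
    if not isinstance(data, dict):
        raise ValueError('feed must be an object')
    if not isinstance(data.get('version'), str):
        raise ValueError('version must be a string')
    url = data.get('home_page_url')
    if url is not None and not isinstance(url, str):
        raise ValueError('home_page_url must be a string')
    return data

# Perfectly valid JSON, invalid feed -- the exact failure mode described above.
bad = json.loads('{"version": "1", "home_page_url": {"this": "was a mistake"}}')
try:
    validate_feed(bad)
except ValueError as e:
    print(e)  # home_page_url must be a string
```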
* In all cases (feed and items), the author field should be an array to allow for feeds with more than one author (for instance, a podcast might want to use this field for each of its hosts, or possibly even guests).
* external_url should probably be an array, too, in case you want to refer to multiple external resources about a specific topic, or in the case of a linkblog or podcast that discusses multiple topics, it could link to each subtopic.
* It might be nice if an item's ID could be enforced to a specific format, even if perhaps only within a single feed. Otherwise it's hard to know how to interpret posts with IDs like "potato", 1, null, "http://cheez.burger/arghlebarghle"
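One defensive option for readers (not something the spec mandates) is to normalize whatever appears in the id field down to a string key; a hypothetical sketch:

```python
# Normalize loosely-typed item ids so "potato", 1 and
# "http://cheez.burger/arghlebarghle" can at least be compared uniformly.
def normalize_id(raw):
    if raw is None:
        raise ValueError('item has no usable id')
    if isinstance(raw, bool):  # bool is an int subclass; reject explicitly
        raise ValueError('boolean ids are ambiguous')
    if isinstance(raw, (int, float, str)):
        return str(raw)
    raise ValueError(f'unsupported id type: {type(raw).__name__}')

print(normalize_id('potato'))  # potato
print(normalize_id(1))         # 1
```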
I'm going to pretend this is about music artists in a music library, but the logic is exactly the same for podcast hosts:
You tend to want fields like this to be singular, so that the field can be used in collation (i.e. "sort by artist.")
If you have multiple artists for a track, usually one can be designated the "primary" artist—the one that people best know, and would expect to find the track listed under when looking through their library. Usually, then, the rest get tacked on in the field in a freeform, maybe comma-and-space delimited fashion. The field isn't a strict strongly-typed references(Person) field, after all; it's just freeform text describing the authorship.
But as for hosts vs. guests, that's a whole can of worms. Look at the ID3 standard. Even though music library-management programs usually just surface an "Artist" field, you've actually got all of these separate (optional) fields embedded in each track:
• TCOM: Composer
• TEXT: Lyricist/Text writer
• TPE1: Lead performer(s)/Soloist(s)
• WOAR: Official artist/performer webpage
• TPE2: Band/orchestra/accompaniment
• TPE3: Conductor/performer refinement
• TPE4: Interpreted, remixed, or otherwise modified by
• TENC: Encoded by
• WOAS: Official audio source webpage
• TCOP: Copyright message
• WPUB: Publishers official webpage
• TRSN: Internet radio station name
• TRSO: Internet radio station owner
• WORS: Official internet radio station homepage
That gives you separate credits for pretty much the entire composition, production and distribution flow, which usually means that each field only needs one entry.
Would be great if people used them, wouldn't it? Maybe the semi-standard "A feat. B (C remix)" microformat could be parsed into "[TPE2] feat. [TPE1] ([TPE4])"...
Also, despite the fact this is technically not the responsibility of the spec itself, I would strongly suggest some words on the implications of the fact that the HTML fields are indeed HTML and the wisdom of passing them through some sort of HTML filter before displaying them.
In fact that's also part of why I suggest going ahead and letting titles contain HTML. All HTML is going to need to be filtered anyhow, and it's OK for clients to filter titles to a smaller valid tag list, or even filter out all tags. Suggesting (but not mandating) a very basic list of tags for that field might be a good compromise.
I agree that I see HTML in RSS titles, but I'd rather have the occasional garbled title - which the author can fix by stripping out HTML before it hits the RSS - than have every RSS reader opening up new security holes.
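As a sketch of the kind of filtering meant above, assuming a tiny hand-picked tag whitelist; a real reader should reach for a battle-tested sanitizer rather than this:

```python
from html import escape
from html.parser import HTMLParser

# Sketch only: keep a small whitelist, drop all attributes (which removes
# onclick=... and javascript: href vectors along with legitimate hrefs),
# and drop the contents of script/style entirely.
ALLOWED = {'p', 'em', 'strong', 'ul', 'ol', 'li', 'code', 'pre', 'blockquote'}
DROP = {'script', 'style'}

class TagFilter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip = 0  # >0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP:
            self.skip += 1
        elif tag in ALLOWED and not self.skip:
            self.out.append(f'<{tag}>')

    def handle_endtag(self, tag):
        if tag in DROP:
            self.skip = max(0, self.skip - 1)
        elif tag in ALLOWED and not self.skip:
            self.out.append(f'</{tag}>')

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data))  # re-escape text content

def sanitize(markup):
    f = TagFilter()
    f.feed(markup)
    f.close()
    return ''.join(f.out)

print(sanitize('<p onclick="evil()">Hi <script>alert(1)</script>there</p>'))
# <p>Hi there</p>
```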
Wow. Now that's confidence. Have you ever read the first version of a spec and thought, "That's just perfect. Any additional changes would just be a disappointment compared with the original"?
But MIDI doesn't really fit that description, since it builds on two years of work by Roland. It's my best bet, though.
As far as scenarios where it's feasible to get the answer right the first time go, this is a reasonably realistic one.
EDIT: Also, if you scroll to the bottom of the page you can see they have let a whole bunch of people look at the spec before releasing it, so there has been at least some peer review.
Less prone to bugs? How's that?
Parsing JSON is a minefield.
Yellow and light blue boxes highlight the worst situations for applications using the specified parser. Take a look at how a bunch of parsers perform with various payloads: http://seriot.ch/json/pruned_results.png
"JSON is the de facto standard when it comes to (un)serialising and exchanging data in web and mobile programming. But how well do you really know JSON? We'll read the specifications and write test cases together. We'll test common JSON libraries against our test cases. I'll show that JSON is not the easy, idealised format as many do believe. Indeed, I did not find two libraries that exhibit the very same behaviour. Moreover, I found that edge cases and maliciously crafted payloads can cause bugs, crashes and denial of services, mainly because JSON libraries rely on specifications that have evolved over time and that left many details loosely specified or not specified at all."
More details available at: http://seriot.ch/parsing_json.php
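Two of the divergences that work covers can be reproduced with Python's stdlib parser alone:

```python
import json

# 1. NaN and Infinity are not valid JSON per any of the RFCs, yet
#    json.loads accepts them by default (pass parse_constant to reject).
print(json.loads('[NaN, Infinity]'))   # [nan, inf]

# 2. Duplicate keys are left unspecified by the RFCs; Python silently
#    keeps the last occurrence, while other parsers keep the first,
#    keep both, or reject the document outright.
print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2}
```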
No it doesn't. XML is either well-formed or not, and any parser encountering non-well-formed XML will reject it outright.
Therefore all XML in use on the internet is spec-compliant.
Now try to say the same about JSON.
One example of a bug that often festered in XML parsers: https://en.wikipedia.org/wiki/Billion_laughs (there is no JSON equivalent of this)
The generalized theory, for those interested : https://en.wikipedia.org/wiki/Rule_of_least_power
> simpler to read and write
I've written a reasonably-popular podcast feed validator, and I don't understand either of these criticisms. Mind elaborating?
So I'm hoping for JSON-LD Feed 1.1 and a new war of format battles. Maybe we can even get Mark Pilgrim out of hiding!
More seriously, it's sad so to see that almost 20 years later, the dream of a decentralised and bidirectional web is in even worse shape than it was back then.
EDIT: Because I get downvoted despite stating my opinion on the topic, I adjusted the statement.
Example below filters out all URLs for a specific section of the paper.
test $# = 1 || exec echo usage: $0 section
curl -o 1.json https://static01.nyt.com/services/json/sectionfronts/$1/index.jsonp
exec sed '/\"guid\" :/!d;s/\",//;s/.*\"//' 1.json
I guess SpiderBytes could be used for older articles?

Personally, I think a protocol like netstrings/bencode is better than JSON because it better respects the memory resources of the user's computer.
Every proposed protocol will have tradeoffs.
To me, RAM is sacred. I can "parse" netstrings in one pass but I have been unable to do this with a state machine for JSON. I have to arbitrarily limit the number of states or risk a crash. As easy as it is to exhaust a user's available RAM with Javascript so too can this be done with JSON. Indeed they go well together.
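The one-pass property described above is easy to see in code: a netstring's length prefix arrives before its payload, so memory use can be capped before a single payload byte is read, whereas JSON's nesting depth and string sizes are unknown until the parser hits them. A minimal sketch (the 1 MiB cap is an arbitrary choice for illustration):

```python
MAX_LEN = 1 << 20  # refuse any payload over 1 MiB, decided up front

def parse_netstring(buf, pos=0):
    """Parse one netstring (e.g. b'5:hello,') starting at pos.
    Returns (payload, position after the trailing comma)."""
    colon = buf.index(b':', pos)
    length = int(buf[pos:colon])
    if length > MAX_LEN:
        raise ValueError('netstring payload too large')
    start, end = colon + 1, colon + 1 + length
    if buf[end:end + 1] != b',':
        raise ValueError('missing trailing comma')
    return buf[start:end], end + 1

print(parse_netstring(b'5:hello,'))  # (b'hello', 8)
```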
I'm currently creating an API where I'm asking devs to post JSON rather than a bunch of separate parameters, but I haven't seen this done in other APIs (if you have, can you point me to a few examples?). I'm curious what others thoughts are on this. It seems that with GraphQl, we're maybe starting to move in this direction.
I think that images and URLs would do well as ordered lists rather than as individual values. At the top level you have 3 URLs and an array for hubs. With type and url you could have an array for the hubs and the URLs; the same could be done for images, both at the top level and at the item level.
But even more frustrating is when a format comes out that's close to being a faithful translation of an established format, but makes small, incompatible changes that push the burden of faithful translation onto content authors, or the makers of third-party libraries.
I honestly don't intend to offer harsh targeted critique against the authors -- I assume good faith; more just voicing exasperation. There have been similar attempts over the years -- one from Dave Winer, the creator of RSS 0.92 and RSS 2.0, called RSS.js [1], which stoked some interest at first [2]; others by devs working in isolation without seeming access to a search engine and completely unaware of prior art; some who are just trying something unrelated and accidentally produce something usable [3]; finally, this question pops up from time to time on forums where people with an interest in this subject tend to congregate [4]. Meanwhile, real standards-bodies are off doing stuff that reframes the problem entirely [5] -- which seems out-of-touch at first, but I'd argue provides a better approach than similar-but-not-entirely-compatible riff on something really old.
And as a meta, "people who use JSON-based formats", as a loose aggregate, have a serious and latent disagreement about whether data should have a schema or even a formal spec. In the beginning when people first started using JSON instead of XML, it was done in a schemaless way, and making sense of it was strictly best-effort on part of the receiving party. Then a movement appeared to bring schemas to JSON, which went against the original reason for using JSON in the first place, and now we're stuck with the two camps playing in the same sandbox whose views, use-cases, and goals are contradictory. This appears to be a "classic" loose JSON format, not a strictly-schemad JSON format, not even bothering to declare its own mediatype. This invites criticism from the other camp, yet the authors are clearly not playing in that arena. What's the long-term solution here?
[1] http://scripting.com/stories/2012/09/10/rssInJsonForReal.htm... [2] https://core.trac.wordpress.org/ticket/25639 [3] http://www.giantflyingsaucer.com/blog/?p=3521 [4] https://groups.google.com/forum/#!topic/restful-json/gkaZl3A... [5] https://www.w3.org/TR/activitystreams-core/
It should just be size and duration, or size_bytes and duration_seconds (though adding units only makes sense if you could use other units). Adding _in to the mix is strange.
Implemented: http://sprout.rupy.se/feed?json
Or asked another way, what problem does this solve for you?
While not hard evidence, I think it's indicative of the kind of experience a developer has when they choose to engage with syndication.
I don't understand why suddenly people treat this like something that uniquely solves a problem. Maybe I'm missing something?
String encoded blog posts are going to be painful once people start using the `content_html` part of the spec.
That said, they're being responsive to questions in Issues, so I remain optimistic.
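The escaping pain mentioned above is easy to demonstrate with Python's stdlib encoder: every quote and newline inside the HTML gets backslash-escaped in the JSON string.

```python
import json

content_html = '<p class="intro">Hello,\n"world"</p>'
encoded = json.dumps({'content_html': content_html})
print(encoded)
# {"content_html": "<p class=\"intro\">Hello,\n\"world\"</p>"}

# Round-tripping restores the original markup exactly:
assert json.loads(encoded)['content_html'] == content_html
```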