I would love to see * gone but I must note that _ is annoyingly hard to type on a screen keyboard.
Back in the days of USENET one common choice was using a / to delimit /emphasis/ - the usual reading was that this indicated words that would normally be rendered as /italics/. You'd often see it used to indicate the titles of books and movies, as well, since the typographical convention was typically that these were italicized as well - note that both <em> and <cite> typically render as italics, for instance. I have always disliked Markdown's choice to use * as a delimiter for both italics and bold; / always implied italics to me, and * always implied bold.
Anyway. I propose that / would be a much better delimiter for emphasis than _. On a US keyboard, it can be typed without a shift key. And on a US iOS screen keyboard, it is a simple swipe on B, versus shifting to the numeric entry page and swiping on &.
https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...
So:
**strong text** -> bold "strong text"
\*\*doubled splats\*\* -> "doubled splats" w/ "**" on either side
This is also cumbersome to type, but at least there's a path to what you want to present even if the character is reserved for markup.

I'm honestly with you on this, and I'm in the middle of building a huge Markdown site where I have the freedom to change the syntax now if I want.
*this is bold*
/this is italics/
_this is underlined_
Beyond simple conventions like this, I'd just as soon drop into HTML as deal with some other markup that ends up being just as complex. We don't need to allow permutations and combinations such as bold-italics, double-weight bold, etc. These never occur in normal prose typesetting, and if you need them, just use HTML for those rare cases.

Fuck no. Same idiocy as turning -- into an em dash; it makes writing any technical post mighty annoying.
Get a better screen keyboard. On mine, _ doesn't require shift, and neither does *.
> While Markdown’s syntax has been influenced by several existing text-to-HTML filters — including Setext, atx, Textile, reStructuredText, Grutatext, and EtText — the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.
But...
There are tons of markup languages for prose that have well-defined specs.
So, why did Markdown win?
IMO, because it does not have a well-defined spec. It is highly tolerant of formatting errors, inconsistencies, etc. If an author makes a mistake when writing Markdown, you can always look at it in plain text.
Whereas a perfectly-spec'd markup language would probably evolve toward an unreadable-to-humans mess in the committee-driven pursuit of precision.
You see this theme in so many places in tech: "less is more", the Unix philosophy of everything-is-a-file, messy HTML5 over "XHTML", ML extraction vs. explicit semantic web, etc.
Same reason that JSON won.
JSON and Markdown are base standards that grew out of a market need to simplify.
JSON won because it was not overly complex and there was some flexibility. If you need more, go YAML, or use JSON as a platform for more.
Every attempt to change JSON has been, and should be, shot down. JSON really just has basic CS types: strings, numbers, booleans, objects, and arrays. From there any data or type can be serialized or filled in. With JSON you can do types via overloads/additional keys, you can add files by URL/URI or base64, and meet any additional needs using parts of basic JSON. Even large numbers can just be strings with type defs as additional keys/patterns. Financial data can just use strings, or ints with no decimal, largely because this is the safest way to store financial data and prevent float issues.
KISS is life and sometimes things are just done, no improvements needed. Now you can take JSON and add things on top of it if you want. Same with Markdown. The base doesn't need to change... ever.
Don't SOAP my JSON. Don't HTML my Markdown. Though you can add specs (JSONSchema/OpenAPI) and formatting tools on top in a processing step. For messaging and base content, they are perfect, simple, clear, concise and no need to change.
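As a concrete sketch of the financial-data point above: storing amounts as strings and parsing them with a decimal type round-trips exactly, where a float would not. (The "amount_type" sibling key here is a hypothetical convention for illustration, not any standard.)

```python
import json
from decimal import Decimal

# Store money as a string plus a sibling type key; never as a float.
# "amount_type" is a hypothetical convention for this sketch.
payload = {
    "amount": str(Decimal("19.99")),
    "amount_type": "decimal",
    "currency": "USD",
}

decoded = json.loads(json.dumps(payload))
amount = Decimal(decoded["amount"])

print(amount + Decimal("0.01"))  # 20.00, exact decimal arithmetic
```

Compare 0.1 + 0.2 as floats (0.30000000000000004): that's exactly the class of error string-encoded decimals avoid.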
JSON is very strict. It won't let you have a comma after the last element of a list, for instance (which is very annoying in many cases). It won't let you add comments in any way, shape or form. It won't let you use single quotes instead of double quotes. Or forget quotes in keys. Or mess with case in null / true / false. Or use NaN values.
Markdown is ill-defined, and will happily let you do whatever the hell you want.
JSON is made for programs, and is a PITA to write as a human (for the reasons mentioned above). But a pleasure to parse and (to some extent) generate automatically. It's not very good with text.
Markdown is made for humans, and I'd hate to have to parse a markdown file and do something with its content other than basic formatting. It's bad at anything but text.
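The strictness described above is easy to demonstrate with Python's stdlib json module; every one of these documents is rejected:

```python
import json

# Documents that lenient formats tolerate but JSON rejects outright.
invalid_documents = [
    '[1, 2, 3,]',           # trailing comma
    '{"a": 1} // comment',  # comments, in any form
    "{'a': 1}",             # single quotes
    '{a: 1}',               # unquoted key
    '{"a": True}',          # wrong case for true
]

for doc in invalid_documents:
    try:
        json.loads(doc)
    except json.JSONDecodeError as e:
        print(f"rejected {doc!r}: {e.msg}")
```

One caveat: Python's parser is non-standard about NaN and Infinity and accepts them by default, so that particular restriction needs a stricter parser (or a custom parse_constant) to observe.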
Native support for JSON parsing and stringify helped when it came later. The Selector API, which also came later, made XML parsing a little easier if you didn’t want to use XPath, but by then most things were JSON anyway.
I really wish JSON allowed for final trailing commas in arrays/objects.
It would make for more readable diffs, simpler text templating, easier writing/parsing for us humans, etc. I'd happily trade all of TOML, YAML, XML, and every other similar format in existence for that one change.
I completely agree. My favourite software is not just functional, it also is opinionated and expresses a philosophy on how to do something. Simply adding flexibility forever in a quest to be useful for everyone ends up making it useful for no-one.
But in the case of MarkDown the original implementation was just not that great. Which has nothing to do with being easier; MacFarlane’s Djot is an easier to implement and easier to describe language.
And of course your point about “committee-driven pursuit of precision” is just a made-up hypothetical which is not worth responding to. (The only committee has been on CommonMark, which is a definition of “MarkDown” (TM) that merely tries to deal with years of drift between different MarkDown implementations. With their famously long-winded spec-by-prose-enumeration style.)
I think markdown won because it was specifically made with HTML output in mind, instead of arbitrary output (docbook, in the case of AsciiDoc, which is pretty much infinitely malleable).
There are in effect two different versions of AsciiDoc, because Asciidoctor people have appropriated the name while making their own changes to it and marking what they dislike as deprecated.
AsciiDoc cannot express all of DocBook, for example figures with multiple images.
While I despise Markdown, there isn't all that much to be a fanboy of. Just the syntax is overall saner.
The requirement for Markdown is to be simple and easy. It's intended for use by people who are going to ignore whatever specs and documentation there are. They'll write a little comment, a bug ticket, or a readme and they might need things like links, bold, italic, etc. And the job is to turn that into some legible HTML. So most of its features are simple and easy to remember. Just add a blank line for a new paragraph, prefix your bullets with a -, and so on.
Markdown is undeniably simple and easy to learn. Which is why it got so popular. It has edge cases but they don't really matter. It has obscure features (e.g. tables) most people don't use, so those don't matter either. And there's a wide range of things it can't do that also don't matter. The job never was being a drop in replacement for more complex tools. It was removing the need to use those for the simple use cases and be simply good enough.
The alternatives each chase requirements that are important to their creators but not to most casual users, or indeed to the people who integrate markup tools. And of course, the more these alternatives differ from Markdown, the harder a sell they become. And the more there are, the less likely it is for any of them to become more popular than markdown. At this point, markdown is a common default in things like issue trackers, readmes on Github/Gitlab, etc. Any tool integrating some kind of markup language support in its content management is more likely to be using markdown than anything else at this point.
The reason is simply that using something else breaks the principle of least surprise for the user. Markdown is the largest common denominator. It's good enough and easy enough to deal with. So most new things would favor using that over anything else. It's a self-reinforcing thing.
Or the lowest.
This is how populist politics works. The thing that appeals to the most people isn't necessarily the thing we should be doing.
The internet and web appealed to a small percentage of people in the early 90s, and it was glorious. You had to put in effort to get anything out, which meant most people didn't bother, which meant it was a nice place. The music industry similarly had a high level of entry. Both are filled with crap now.
Elitist old man shouting at clouds? Maybe. Doesn't mean I'm wrong though.
You only need to be good enough to enter this kind of competition... and win. The reasons you might win can be many arbitrary things, like someone deciding to adopt a practice in a large organization, or dedicating efforts to writing parsers in many languages etc.
Maybe, and I mean that sincerely...but are you just saying this must happen or can you actually point to where MacFarlane's proposals would make a significantly less pleasant language?
This proposal shows us a clear step in that direction, going from something simple and easy for humans to understand, with complex implementation, to emphasize part of a word:
fan*tas*tic
To proposing a simple implementation that's... weird for humans: fan~_tas_~tic [1]

Maybe I’m being a hypocrite here? I definitely am in favor of a lot of “cutesy” ways to communicate (things that are more stylistic than necessary). But not intra-word emphasis, really.
Sure, it probably is easier to parse, and maybe there are a few edge cases that it does better, but the goal of markdown is to have text that is:
A) human readable and looks good without parsing it
B) can be parsed and presented using different themes
In djot they sacrifice a lot (e.g. we now have to insert empty lines in a nested list?!) of point A for questionable gains at point B. Guess what I as a user care more about?
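For what it's worth, the happy path of intra-word emphasis (fan*tas*tic) really is trivial; a naive sketch, deliberately ignoring escapes, nesting, and all the CommonMark edge cases that motivate djot, is one regex:

```python
import re

def emphasize(text: str) -> str:
    # Naive intra-word emphasis: *...* becomes <em>...</em>.
    # Deliberately ignores escaping, nesting, and flanking rules.
    return re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)

print(emphasize("fan*tas*tic"))  # fan<em>tas</em>tic
```

The complexity lives entirely in the cases this sketch ignores, which is exactly the human-readability vs. parser-simplicity trade-off being discussed.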
Markdown accepting a wide range of inputs is not a mistake, it is a feature. If that makes parsing more complex that is an acceptable side effect not a mistake.
I would have tried harder to find some other way to make the grammar simple.
I haven’t seen anything else (in addition) that makes it less “human readable” though.
Regarding the author's specific suggestions: he explicitly writes that he doesn't propose implementing them in the actual MD "standard", since backwards compatibility is more important. That said, there is value in making the markup less ambiguous while preserving the "writability", even if it's just a thought experiment.
If markdown had just used *bold* and _italics_ from the start, or needed a tag for HTML instead of passing it through as-is... it would be entirely fine and just as popular now. Or any other generally-agreed-upon "good" fix.
But inertia makes things like that near-impossible to change now. Only additions can sorta work and even those are hard as critical mass of dialects needs to apply them for it to work.
Now one could speculate about the reasons.
Nothing messy about HTML, whatever version. It just uses SGML features from a more civilized age, such as inferring tags not explicitly present when unambiguously required by the content model grammar.
Btw a large fragment of markdown can be implemented using SGML's SHORTREF feature, as can customizations such as GitHub-flavored markdown. John Gruber's markdown language is specified as a canonical rewriting into HTML with the option of inline HTML as fallback, making SGML SHORTREF a particularly fitting implementation model since it works just the same. It's quite striking how a technique for custom syntax invented in the 70's (however imperfectly specified, though not in a worse-is-better way lol) could foresee Wiki syntaxes and also determine the most commonly used markup language (HTML) fifty years later.
Agree with the gist of your post, though. As fantastic as MacFarlane's pandoc is, the idea to re-assign redundancies in markdown (e.g. interpret the minute presence/omission of space chars to mean something) was bound to fail, and that was very clear to me skimming only a few paragraphs of the CommonMark manifesto. When it was first discussed here back then, someone commented that this was bound to happen when a logician (MacFarlane) approached Wiki syntax.
> What if we tried to create a light markup syntax [..] revising some of the features that have led to bloat and complexity in the CommonMark spec?
Are you writing this new format to make life easier for the humans using it, or the humans programming it?
It's sad when programmers don't see the forest for the trees.
The rest of the article frequently takes the side of the users, and mentions how confusing certain existing rules are to them. I know I frequently don't know what to expect from Markdown in certain corner cases, and felt vindicated by the author calling them out here. Some of their ideas for simplification would surprisingly even let us do things that are currently not possible.
Not necessarily. Generics, and/or C++ templates are a pain to parse because they're context sensitive. But while reading/writing code it's typically obvious whether I'm writing a comparison or a generic/template.
Foo<Bar> foo;
// VS
Foo < Bar;
Likewise, in C++ you can end up with:

unordered_set<tuple<int, float>> mySet;
// >> is ambiguous here without a symbol table or context around the statement
Foo >> 5;
I think both of these are fairly obvious as a user of the language, but boy am I glad I don't have to parse that!

You are still confounding rules for writing with rules for parsing. It's absolutely possible and easy to make rules that make writing easier but parsing harder.
For example, if you make a rule that formatting markers like ** and _ are order-insensitive (so **_word**_ formats the same as **_word_**), that's much easier for the user, as they don't need to remember the order in which the operators were used, but harder to code (I assume).
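A toy sketch of why that asymmetry exists (hypothetical token lists, not a real Markdown parser): strict nesting is checkable with a plain stack, while the order-insensitive rule is trivially checkable but leaves the renderer to untangle overlapping spans into well-formed HTML.

```python
from collections import Counter

def strictly_nested(tokens):
    # A closer must match the most recent open marker (stack discipline).
    stack = []
    for t in tokens:
        if stack and stack[-1] == t:
            stack.pop()       # same marker again closes the span
        else:
            stack.append(t)   # otherwise it opens a new span
    return not stack

def order_insensitive_balanced(tokens):
    # Only requires each marker to appear an even number of times;
    # producing <strong><em>...</em></strong> from overlapping spans
    # is now the renderer's problem.
    return all(n % 2 == 0 for n in Counter(tokens).values())

print(strictly_nested(["**", "_", "_", "**"]))             # True:  **_word_**
print(strictly_nested(["**", "_", "**", "_"]))             # False: **_word**_
print(order_insensitive_balanced(["**", "_", "**", "_"]))  # True
```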
There are cases that are 100% ambiguous in the spec, which means there can be no _right_ answer. Different users will have different (and both reasonable) expectations about what the same input will do. So, in these cases "too hard" for the computer means leads directly to a negative user experience. The language becomes more unpredictable.
I agree that we shouldn't _ever_ lose focus on the end user experience. But sometimes, you have to make the spec less ambiguous to improve the end-user experience.
"In this article from 2017, I talk about dinglehoppers, which have since been improved by research from these three papers [1][2][3]. Here is where I revisit this topic in 2021."
He actually implemented these ideas: https://djot.net/
Surely riffing on Mark, Common, or Down would have been more effective.
I think we could pick one way to handle emphasis, lists, and code blocks that covers a specific and predictable 80%.
Anything that becomes hard to describe without including additional notation to the grammar is probably best suited to be left as HTML, as was the intention behind markdown to begin with.
E.g. a macro that returns today's date, today's great offer, etc. Or a "number of days until xxx" for countdowns until some event.
His attribute syntax is very close. A possible macro syntax uses {@ as a leading marker, e.g.
{@macroname position=left}
or: There are {@daysuntil date=20230710} days to launch.

It definitely is a weird choice to use *s for both bold and italics. Parsers could be implemented much more easily if both had different delimiters, as mentioned in the post.
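A minimal sketch of such a macro expander (the {@name key=value} syntax and the daysuntil macro are hypothetical, as above):

```python
import re
from datetime import date

def days_until(args):
    # Hypothetical macro: args["date"] is a YYYYMMDD string.
    d = args["date"]
    target = date(int(d[:4]), int(d[4:6]), int(d[6:8]))
    return str((target - date.today()).days)

MACROS = {"daysuntil": days_until}

def expand(text):
    # Replace each {@name key=value ...} with its macro's output.
    def run(match):
        name, rest = match.group(1), match.group(2)
        args = dict(pair.split("=", 1) for pair in rest.split())
        return MACROS[name](args)
    return re.sub(r"\{@(\w+)\s*([^}]*)\}", run, text)

print(expand("There are {@daysuntil date=20230710} days to launch."))
```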
2. The only thing I miss is support for nested numbered lists.
2. 1. (The best kind of lists.)
Markdown is meant to be simple. To represent complex things, use something else.
I don’t think I’ll ever use this and if someone tries to make me learn this instead of regular markdown, I’ll probably just not bother.
I don’t want to diminish anyone’s creativity, but this seems like a lot of work put into something unnecessary.
https://web.archive.org/web/20121017064607/http://94.249.190...
https://news.ycombinator.com/item?id=4437875
These days, it would be good to mix/match ideas from: pugjs, htmlx, jupyter, dhall
The AsciidocFX program is a good "starter's editor" for those unfamiliar with Asciidoc and lightweight markup in general - it includes a "boxed" DocBook-XSL pipeline as an alternative to the Ruby-based asciidoctor-pdf. For an actual production editor, Visual Studio Code with the Asciidoctor extension is very hard to beat. Github integration on top of VSC gives you some collaborative visibility, too.
On the PDF front, another interesting Asciidoc project is asciidoctor-web-pdf, which uses Paged.js and CSS to produce extremely complex PDFs using web technologies (Chromium + Puppeteer, I think). That, asciidoctor-pdf (Ruby/Prawn), and DocBook-XSL are the main PDF pipelines.
This is especially the case when it works for the vast majority of use cases (or can be hammered into them); ambiguities are very visible to implementers and detail-oriented folks, but most people never see these issues, or don't care about them.
And, while it sucks that it's complicated to implement, that burden is on relatively few people. See also: the HTML Priority of Constituencies.
Oh yes. I made the fun decision to write a markdown parser/contenteditable component for https://sqwok.im and ended up spending probably a month on it, largely writing endless unit tests and covering odd cases like that.
It's far from perfect and probably will still break on certain ambiguous inputs. I like his ideas for clarifying the language for the most general audience.
This is like not using bind variables on your sql library. I just don't understand it. I'm looking at you, Crockford.
For example I can write ◊bold{strong* word} and it becomes (bold “strong* word”). It’s very clear how this should be rendered.
*foo always means * followed by foo; the closing * is missing and would be flagged.
<string><em>foo</em>... uh oh missing a closing *, can't parse
Oh boy, HN mangled this. I'm leaving it as an exemplar.
You can show string-literal text *without* HN's markup interpolation by indenting the start of the line by two characters:

If you can do this, you can write manual pages for options or flags
First Term
: This is the definition of the first term.
Second Term
: This is one definition of the second term.
: This is another definition of the second term.
<https://www.markdownguide.org/extended-syntax>

I meant
-a the minus aflag text
-b the minus bflag text
Something the something text
It's basically table or grid layout without lines.

You mean a definition list, like the HTML-native one?