Beyond Markdown (2018) (opens in new tab)

(johnmacfarlane.net)

125 pointsunforswearing2y ago126 comments

126 comments

> To dramatically reduce ambiguities, we can remove the doubled character delimiters for strong emphasis. Instead, use a single _ for regular emphasis, and a single * for strong emphasis.

I would love to see * gone but I must note that _ is annoyingly hard to type on a screen keyboard.

Back in the days of USENET one common choice was using a / to delimit /emphasis/ - the usual reading was that this indicated words that would normally be rendered as /italics/. You'd often see it used to indicate the titles of books and movies, as well, since the typographical convention was typically that these were italicized as well - note that both and <cite> typically render as italics, for instance. I have always disliked Markdown's choice to use * as a delimiter for both italics and bold; / always implied italics to me, and * always implied bold.

Anyway. I propose that / would be a much better delimiter for emphasis than _. On a US keyboard, it can be typed without a shift key. And on a US IOS screen keyboard, it is a simple swipe on B, versus shifting to the numeric entry page and swiping on &.

jitl2y ago

But I like to write paths like /usr/bin/ in my text and don’t want to worry about backtick code quoting them every time

avgcorrection2y ago

This is a real problem considering that some programmers are just sloppy with code-quoting paths and such. And it’s even debatable whether (sort of stylistically) you should even code-quote paths.

anamexis2y ago

There's the same problem with variables and filenames containing 2+ underscores, resulting in unwanted italics.

ilyt2y ago

That at least is rarer than paths.

SoftTalker2y ago

Most people don't write file paths. Only a concern for programmers, who should be fine escaping them by whatever mechanism.

jitl2y ago

Most people don’t write markdown! What’s the demographic of markdown writers who don’t write file paths? Bloggers?

3 more replies

dyarosla2y ago

Uh.. what about all those non programmers that write URLs?

kej2y ago

I think many/most people use slashes more often than they use asterisks, however.

1 more reply

doubleunplussed2y ago

And in markdown you chuck them between backticks to indicate they're verbatim text to be rendered in monospace.

1 more reply

FinnKuhn2y ago

no, but most people definitely use links in their texts and they have the same problem. / is also regularly used for fractions or/and in a situation where you could use two words

cbarrick2y ago

In IPA, slashes are used for abstract phonemic notation ( pronunciation guides). Converting that to would be annoying for certain communities.

https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...

dredmorbius2y ago

There's the concept of escaping notation using a single backslash: \

So:

  **strong text** -> bold "strong text"
  
  \*\*doubled splats\*\* -> "doubled splats" w/ "**" on either side

This is also cumbersome to type, but at least there's a path to what you want to present even if the character is reserved for markup.

cbarrick2y ago

Yes. I'm less concerned about it being cumbersome to write than I am about it being cumbersome to read.

And again, this complaint will only be relevant in contexts where this specific convention is used.

ilyt2y ago

rare enough to not matter

qingcharles2y ago

I really like your proposal, but in the days of USENET the // wasn't interpreted by the machine but simply by our minds, just like * for bold. Would there be any extra issues caused by italics being / rather than *?

I'm honestly with you on this and I'm in the middle of building a huge Markdown site where I have the freedom to change the syntax now if I want.

SoftTalker2y ago

I'd like to see

  *this is bold*
  /this is italics/
  _this is underlined_

Beyond simple conventions like this, I'd just as soon drop into HTML as deal with some other markup that ends up being just as complex. We don't need to allow permutations and combinations such as bold and italics, double-weight bold, etc. these never occur in normal prose typesetting and if you need it just use HTML for those rare cases.

s1mon2y ago

Underlining is an emphasis hack for mechanical typewriters or in handwriting. There's no reason to use it typographically in something which has all the layout possibilities of a modern computer or printer.

https://practicaltypography.com/underlining.html

2 more replies

yaantc2y ago

This is what Org mode does. It's still very tied to Emacs, but there's an effort to standardize the Org format. Hopefully this will help its adoption outside of Emacs, it's a nice markup (and a lot more).

ilyt2y ago

"why /usr/bin/vi" turns /usr/bin into italic ?"

underlines is not used enough to take the commonly used character ~this is fine~, trailing tilde is rare enough

defrost2y ago

As a point of order;

\* more than 3\/4's of people may not feel that way, particularly those discussing snake\_case Vs CamelCase

3 more replies

ilyt2y ago

> Back in the days of USENET one common choice was using a / to delimit /emphasis/ - the usual reading was that this indicated words that would normally be rendered as /italics/.

Fuck no. Same idiocy as turning -- into long em, makes writing any technical posts mighty annoying

Get better screen keyboard. On mine _ doesn't require shift, neither does *

toastal2y ago

Usage of `_` and `-` keys are dead simple home-row keys on Dvorak keyboards. I’ve never switched layouts because most are focused on prose, but programming demands a lot of snake & kebab casing. …not that `/` is too far away.

loloquwowndueo2y ago

Do you have some reference for “simple swipe on B”? Doesn’t work for me (not that simple I guess?)

bobbylarrybobby2y ago

They're probably referring to the iPad keyboard (https://www.ghacks.net/2019/08/02/how-to-enable-the-swipe-ke...)

hsfzxjy2y ago

Why would a _ be harder to type than a * on a screen keyboard?

ilyt2y ago

same difficulty on mine, just switch into alternate character page and both need no shifting or button holding there, no idea what he's on about.

ajdude2y ago

I agree. I admit, I still /do this/ for emphasis, and have always wondered why Markdown didn't follow suit.

masklinn2y ago

Because Gruber used a different convention?

> While Markdown’s syntax has been influenced by several existing text-to-HTML filters — including Setext, atx, Textile, reStructuredText, Grutatext, and EtText — the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.

sqs2y ago

I understand where the author is coming from and respect their contributions to Commonmark.

But...

There are tons of markup languages for prose that have well-defined specs.

So, why did Markdown win?

IMO, because it does not have a well-defined spec. It is highly tolerant of formatting errors, inconsistencies, etc. If an author makes a mistake when writing Markdown, you can always look at it in plain text.

Whereas a perfectly-spec'd markup language would probably evolve toward an unreadable-to-humans mess in the committee-driven pursuit of precision.

You see this theme in so many places in tech: "less is more", the Unix philosophy of everything-is-a-file, messy HTML5 over "XHTML", ML extraction vs. explicit semantic web, etc.

drawkbox2y ago

> IMO, because it does not have a well-defined spec.

Same reason that JSON won.

JSON and Markdown are base standards that were generated by market need to simplify.

JSON won because it was not overly complex and there was some flexibility. If you need more go YAML or use JSON as a platform for more.

Every attempt to change JSON has and should be shot down. JSON really just has basic CS types: string, int/number, bool, object, lists. From there any data or types can be serialized or filled. With JSON you can do types via overloads/additional keys, you can add files by url/uri or base64, and any additional needs using parts of basic JSON. Even large numbers can just be strings with type defs as additional keys/patterns. Financial data can just use strings or ints with no decimal largely because this is the safest way to store financial data to prevent float issues.

KISS is life and sometimes things are just done, no improvements needed. Now you can take JSON and add things on top of it if you want. Same with Markdown. The base doesn't need to change... ever.

Don't SOAP my JSON. Don't HTML my Markdown. Though you can add specs (JSONSchema/OpenAPI) and formatting tools on top in a processing step. For messaging and base content, they are perfect, simple, clear, concise and no need to change.

sacado22y ago

I think JSON and Markdown are very different, in fact.

JSON is very strict. It won't let you have a comma after the last element of a list, for instance (which is very annoying in many cases). It won't let you add comments in any way, shape or form. It won't let you use single quotes instead of double quotes. Or forget quotes in keys. Or mess with case in null / true / false. Or use NaN values.

Markdown is ill-defined, and will happily let you do whatever the hell you want.

JSON is made for programs, and is a PITA to write as a human (for the reasons mentioned above). But a pleasure to parse and (to some extent) generate automatically. It's not very good with text.

Markdown is made for humans, and I'd hate to have to parse a markdown file and do something with its content other than basic formatting. It's bad at anything but text.

billyhoffman2y ago

JSON won because parsing it on in the browser was just was a call to “eval()”, and then you just access the object using normal JS conventions/syntax (e.g data.foo[0].bar). Whereas XML required creating a DOM parser and document fragment, and then using cumbersome HTML DOM methods like “ getElementsByTagName()” to get each value directly (or worse Xpath). It totally sucked.

Native support for JSON parsing and stringify helped when it came later. The Selector api that also came later made XML parsing a little easier if you didn’t want to use XPath, but by then most things were JSON anyway.

majkinetor2y ago

It should at least have comments. Then it can freeze.

drawkbox2y ago

You can make a separate file that has comments or even another JSON file that has descriptions by JSONPath or key.

Some libs also support comments and trim before processing but I prefer the external/metadata way. Comments add weight.

rollcat2y ago

> Every attempt to change JSON has and should be shot down.

I really wish JSON allowed for final trailing commas in arrays/objects.

It would make for more readable diffs, simpler text templating, easier writing/parsing for us humans, etc. I'd happily trade all of TOML, YAML, XML, and every other similar format in existence for that one change.

ghusto2y ago

It makes generating from templates in certain (many!) instances needlessly difficult. I say needlessly, because the rule is seemingly arbitrary. I can't see what purpose it serves.

axblount2y ago

Also, "worse is better"

https://en.wikipedia.org/wiki/Worse_is_better

mindwok2y ago

Nice, I didn't know there was a term for this.

I completely agree. My favourite software is not just functional, it also is opinionated and expresses a philosophy on how to do something. Simply adding flexibility forever in a quest to be useful for everyone ends up making it useful for no-one.

avgcorrection2y ago

This is only perhaps correct in that a loosey-goosey proposals can spread farther because it is seemingly simple to implement (less MUST and whatever) and by the time you notice inconsistencies between implementations, the thing has reached a sort of critical mass already and the things aren’t that inconsistent so you just shrug and say whatever.

But in the case of MarkDown the original implementation was just not that great. Which has nothing to do with being easier; MacFarlane’s Djot is an easier to implement and easier to describe language.

And of course your point about “committee-driven pursuit of precision” is just a made up hypoethical which is not worth responding to. (The only committee has been on CommonMark, which is a definition of “MarkDown” (TM) which merely tries to deal with years of drift between different MarkDown implementations. With their famously long-winded spec-by-prose-enumeration style.)

bobbylarrybobby2y ago

Asciidoctor has a spec, reads pretty similarly to markdown, and is infinitely better IMO. And it (well, AsciiDoc) predated markdown!

I think markdown won because it was specifically made with HTML output in mind, instead of arbitrary output (docbook, in the case of AsciiDoc, which is pretty much infinitely malleable).

premysl2y ago

The Asciidoctor flavour of AsciiDoc doesn't have a specification. There is only a working group. The parsers are a mess composed of regular expressions.

There are in effect two different versions of AsciiDoc, because Asciidoctor people have appropriated the name while making their own changes to it and marking what they dislike as deprecated.

AsciiDoc cannot express all of DocBook, for example figures with multiple images.

While I despise Markdown, there isn't all that much to be a fanboy of. Just the syntax is overall saner.

MilStdJunkie2y ago

Ah, DocBook //imageobjectco with something like calspair as well. I've been wanting it badly, but there's zero movement in the Asciidoctor group to try and tackle that beast.

With all due respect, and speaking as an amateur programmer, when it comes to lightweight markup, is there a better way to write a parser besides regular expressions? I suppose it's how the semantics are abstracted.

Asciidoc does get you conditionals and transclusion in the core spec, without needing to resort to extensions. This is what brought me over. That and the XML interoperability.

The Eclipse WG isn't published yet, but, in my opinion, it's a more stable surface to build on than the "many worlds" of Markdown.

Every time someone shows me a cool markdown trick, it requires me to pull something down from github and `npm-install` (or equivalent). But, well, that's kind of the point, isn't it? Markdown's ease of implementation allows a degree of glorious hackery that's just not possible otherwise. While Asciidoctor's great albatross - and its great asset - is Ruby . . which inevitably involves Opal at some point.

jillesvangurp2y ago

You are completely right. The underlying theme here is that the requirements matter.

The requirement for Markdown is to be simple and easy. It's intended for use by people who are going to ignore whatever specs and documentation there are. They'll write a little comment, a bug ticket, or a readme and they might need things like links, bold, italic, etc. And the job is to turn that into some legible HTML. So most of its features are simple and easy to remember. Just add a blank line for a new paragraph, prefix your bullets with a -. and so on.

Markdown is undeniably simple and easy to learn. Which is why it got so popular. It has edge cases but they don't really matter. It has obscure features (e.g. tables) most people don't use, so those don't matter either. And there's a wide range of things it can't do that also don't matter. The job never was being a drop in replacement for more complex tools. It was removing the need to use those for the simple use cases and be simply good enough.

The alternatives each chase requirements that are important to their creators but not to most casual users, or indeed the people that integrate markup tools. And of course the more these alternatives differ from Markdown, the harder of a sell it becomes. And the more there are, the less likely it is for any of them to become more popular than markdown. At this point, markdown is a common default in things like issue trackers, readme's on Github/Gitlab, etc. Any tool integrating some kind of markup language support in their content management is more likely to be using markdown than anything else at this point.

The reason is simply that using something else breaks the principle of the least amount of surprise for the user. Markdown is the largest common denominator. It's good enough and easy enough to deal with. So, most new things would favor using that over anything else. It's a self re-enforcing thing.

ghusto2y ago

> largest common denominator

Or the lowest.

This is how populist politics works. The thing that appeals to the most people isn't necessarily the thing we should be doing.

The internet and web appealed to a small percentage of people in the early 90s, and it was glorious. You had to put in effort to get anything out, which meant most people didn't bother, which meant it was a nice place. The music industry similarly had a high level of entry. Both are filled with crap now.

Elitist old man shouting at clouds? Maybe. Doesn't mean I'm wrong though.

crabbone2y ago

These things don't win on engineering merits. Markdown wasn't better than others. It was like a bunch of others. It's just natural that one form of communication becomes a monopoly because people want to be able to talk to as many people as possible.

You only need to be good enough to enter this kind of competition... and win. The reasons you might win can be many arbitrary things, like someone deciding to adopt a practice in a large organization, or dedicating efforts to writing parsers in many languages etc.

hyperpape2y ago

> Whereas a perfectly-spec'd markup language would probably evolve toward an unreadable-to-humans mess in the committee-driven pursuit of precision.

Maybe, and I mean that sincerely...but are you just saying this must happen or can you actually point to where MacFarlane's proposals would make a significantly less pleasant language?

fiddlerwoaroof2y ago

Requiring a blank line before a sublist just looks wrong.

jitl2y ago

I couldn’t figure out what was meant by a sublist. Like any hierarchy? Or just list-in-paragraph-in-list, not list-in-list? That one could use some HTML disambiguation in the article.

EDIT: yeah it’s always required. That kills me.

Izkata2y ago

> Whereas a perfectly-spec'd markup language would probably evolve toward an unreadable-to-humans mess in the committee-driven pursuit of precision.

This proposal shows us a clear step in that direction, going from something simple and easy for humans to understand, with complex implementation, to emphasize part of a word:

  fan*tas*tic

To proposing a simple implementation that's... weird for humans:

  fan~_tas_~tic

avgcorrection2y ago

It seems like a minor concession since most uses of intra-word emphasis are more cutesy than communicative[1] (it is of course sometimes very useful when there is a subtle syllable emphasis, or a suble typo that you want to point out).

[1] Maybe I’m being a hypocrite here? I definitely am in favor of a lot of “cutesy” ways to communicate (things that are more stylistic than necessary). But not intra-word emphasis, really.

atoav2y ago

I had a look at djot, which adresses all of the author's grievances and I must say... I don't like it.

Sure, it probably is easier to parse, and maybe there are a few edge cases that it does better, but the goal of markdown is to have text that is:

A) human readable and looks good without parsing it

B) can be parsed and presented using different themes

In djot they sacrifice a lot (e.g. we now have to insert empty lines in a nested list?!) of point A for questionable gains at point B. Guess what I as a user care more about?

Markdown accepting a wide range of inputs is not a mistake, it is a feature. If that makes parsing more complex that is an acceptable side effect not a mistake.

avgcorrection2y ago

I agree that empty line in front of a nested list is ugly. I very often make hierarchical descriptions of things like events or things to do or recipes and that kind of thing would be annoying to have to deal with. I like my lists tight.

I would have tried harder to find some other way to make the grammar simple.

I haven’t seen anything else (in addition) that makes it less “human readable” though.

ninepoints2y ago

I'd argue that it won the adoption it did in spite of parsing ambiguities and the lack of a spec. Not because of it. There are plenty of examples of well specified things that have gained mass adoption, so I think you are confusing cause and correlation here.

giraffe_lady2y ago

Well-specified formats that are primarily produced by humans writing them by hand? The main entries in this category are programming languages.

ninepoints2y ago

IDK, JSON? HTML and XML are markup languages also. There are obvious issues with markdown that were fixed/resolved in various markdown variants, and missing features as well that I don't think anyone could argue helped adoption. Case in point, the most commonly used markdown flavor is GFM because we all adopted GitHub and that was what they support.

eterevsky2y ago

It is true that Markdown won by putting simplicity for the users in front of simplicity for the parsers. But since it became ubiquitous, there's a lot of value in codifying the standard to make sure that it doesn't diverge into different dialects.

Regarding the specific author's suggestions, he explicitly writes that he doesn't propose to implement them in the actual MD "standard", since backwards compatibility is more important. That said there is value in making the markup less ambiguous while preserving the "writability" even if it's just a thought experiment.

ilyt2y ago

It's not really problem with being "perfectly specced" or not, it's just matter of inertia.

If markdown just used bold _italics_ at the start, or needed a tag for HTML instead of passing it as is... it would be entirely fine and just as popular now. Or any other generally agreed upon as "good" fix.

But inertia makes things like that near-impossible to change now. Only additions can sorta work and even those are hard as critical mass of dialects needs to apply them for it to work.

still_grokking2y ago

What you're saying is that the most stupid and broken "solutions" win…

Now one could speculate about the reasons.

tannhaeuser2y ago

> messy HTML5 over "XHTML"

Nothing messy about HTML, whatever version. It just uses SGML features from a more civilized age, such as inferring tags not explicitly present when unambiguously required by the content model grammar.

Btw a large fragment of markdown can be implemented using SGML's SHORTREF feature, as can customizations such as GitHub-flavored markdown. John Gruber's markdown language is specified as a canonical rewriting into HTML with the option of inline HTML as fallback, making SGML SHORTREF a particularly fitting implementation model since it works just the same. It's quite striking how a technique for custom syntax invented in the 70's (however imperfectly specified, though not in a worse-is-better way lol) could foresee Wiki syntaxes and also determine the most commonly used markup language (HTML) fifty years later.

Agree with the gist of your post, though. As fantastic as MacFarlane's pandoc is, the idea to re-assign redundancies in markdown (eg. interpret minute presence/omission of space chars to mean something) was bound to fail, and that was very clear to me skimming only through a few paragraphs of the CommonMark manifesto. When it was first discussed here back then, someone commented that this was bound to happen when a logician (McFarlane) approached Wiki syntax.

still_grokking2y ago

SGML is a hot mess. It should have died decades ago.

throwawaaarrgh2y ago

> All of these rules lead to unexpected results sometimes, and they make writing a parser for CommonMark a complex affair.

> What if we tried to create a light markup syntax [..] revising some of the features that have led to bloat and complexity in the CommonMark spec?

Are you writing this new format to make life easier for the humans using it, or the humans programming it?

It's sad when programmers don't see the forest for the trees.

boerseth2y ago

If the rules are too complicated, then they are a challenge for all parties, both users and implementers. I think it is useful to be able to imagine at least on some higher level what a parser would do to the stuff I write, so everyone benefits from the ease of understanding that comes with simpler rules. The question is just how far we can simplify without reducing usability.

The rest of the article frequently takes the side of the users, and mentions how confusing certain existing rules are to them. I know I frequently don't know what to expect from Markdown in certain corner cases, and felt vindicated by the author calling them out here. Some of their ideas for simplification would surprisingly even let us do things that are currently not possible.

_gabe_2y ago

> If the rules are too complicated, then they are a challenge for all parties, both users and implementers

Not necessarily. Generics, and/or C++ templates are a pain to parse because they're context sensitive. But while reading/writing code it's typically obvious whether I'm writing a comparison or a generic/template.

  Foo<Bar> foo;
  // VS
  Foo < Bar;

Likewise, in C++ you can end up with:

  unordered_set<tuple<int, float>> mySet;
  // >> is ambiguous here without a symbol table or context around the statement
  Foo >> 5;

I think both of these are fairly obvious as a user of the language, but boy am I glad I don't have to parse that!

ilyt2y ago

> If the rules are too complicated, then they are a challenge for all parties, both users and implementers.

You are still confouding rules for writing with rules for parsing. It's absolutely possible and easy to make rules making writing easier but parsing harder.

For example, if you make rule that makes formatting markers like **\_ be order insensitive (so **_word**_ formats same as **_word_**), much easier for user, as they don't need to remember order of which the operators were used, harder to code (I assume)

ncallaway2y ago

The problem is when it's too hard for the computers, then it negatively impacts the user experience.

There are cases that are 100% ambiguous in the spec, which means there can be no _right_ answer. Different users will have different (and both reasonable) expectations about what the same input will do. So, in these cases "too hard" for the computer means leads directly to a negative user experience. The language becomes more unpredictable.

I agree that we shouldn't _ever_ lose focus on the end user experience. But sometimes, you have to make the spec less ambiguous to improve the end-user experience.

jitl2y ago

I am flamfoozled by paragraph-in-list and list-indentation regularly in status quo markdown. Maybe it’s because the syntax is a little weird for the edge cases today? Or maybe I’m just a goof who needs to go read GitHub’s parser source.

0cf8612b2e1e2y ago

Weird that he did not reference the djot package[0] which seems like his attempt at implementing this thesis.

[0] https://djot.net/

noizejoy2y ago

Maybe because OP is from 2018?

0cf8612b2e1e2y ago

Ha! That would do it. Well, guess he stewed on the idea long enough he had to take a swing at the problem.

hinkley2y ago

There is something to be said for, not editing your old posts, but applying a preface that references later iterations on the idea. I wish I were better about that myself.

"In this article from 2017, I talk about dinglehoppers, which have since been improved by research from these three papers [1][2][3]. Here is where I revisit this topic in 2021."

eslaught2y ago

Needs a 2018 in the title.

He actually implemented these ideas: https://djot.net/

evolve2k2y ago

What a terribly named project.

Surely riffing from; Mark, Common or Down would have been more effective.

eslaught2y ago

After the controversy over naming CommonMark (where @jgm et al. caught flak over originally trying to name it Standard Markdown), I'm not surprised that he picked something totally unrelated. And really, it's not Markdown at all at this point, so being more clearly differentiated from Markdown / CommonMark seems like a plus to me.

cpach2y ago

I don’t agree. I like the sound of it. And after all it’s just a name.

anschwa2y ago

Personally, I would like to see a markdown spec that eliminates parsing ambiguity by restricting the "edge-case" features that HTML is really much better at describing in a standard and structured way.

I think we could pick one way to handle emphasis, lists, and code blocks that covers a specific and predictable 80%.

Anything that becomes hard to describe without including additional notation to the grammar is probably best suited to be left as HTML, as was the intention behind markdown to begin with.

avgcorrection2y ago

https://github.com/jgm/djot

silvestrov2y ago

One feature which I'm missing which is very useful for CMS systems etc is a standard syntax for implementation specific callbacks/macros.

E.g. a macro that returns todays date, todays great offer, etc. Or a "number of days until xxx" for countdowns until some event.

His attribute syntax is very close. A posssible macro syntax use {@ as leading marker, e.g.

    {@macroname position=left}

    There is {@daysuntil date=20230710} days to launch.

lewisjoe2y ago

We are implementing markdown support in Zoho Writer (https://zoho.com/writer) and I can confirm how difficult it is to handle bold and italics.

It definitely is a weird choice to use *s for both bold and italics. Parsers could be implemented much easier, if both had different delimiters as mentioned in the post.

dr_kiszonka2y ago

1. Markdown is great.

2. The only thing I miss is support for nested numbered lists.

2. 1. (The best kind of lists.)

prepend2y ago

I write a lot of markdown. Ive taught lots of people to use it. I’ve never encountered these problems.

Markdown is meant to be simple. To represent complex things, use something else.

I don’t think I’ll ever use this and if someone tries to make me learn this instead of regular markdown, I’ll probably just not bother.

I don’t want to diminish anyone’s creativity, but this seems like a lot of work put into something unnecessarily.

eduction2y ago

The article reads very much like a list of problems important to an implementation author rather than a user. Except maybe the nested list thing, which does sound somewhat annoying. But also rare.

hinkley2y ago

My thought is to represent complex things, use better prose or diagrams. Though José and some of his friends are warming me up to interactive tools. Livebook has some stuff I need to look at more. Currently mostly targeted at developers, of course.

JimmyRuska2y ago

I also made my own editor a long time ago, used it for personal use and on the writing site roleplay.cloud. It had lisp-like syntax, custom expansions. It also had some of these ideas, reference links, I would run code snippets with [python ...]. Normal html code would also work, like [br] instead of

https://web.archive.org/web/20121017064607/http://94.249.190...

https://news.ycombinator.com/item?id=4437875

These days, it would be good to mix/match ideas from: pugjs, htmlx, jupyter, dhall

eviks2y ago

Is there any great modern rich document alternative? That wound truly go beyond markdown

MilStdJunkie2y ago

Asciidoc. Particularly if you 1) need XML interoperability, 2) complex print outputs, 3) complex tables, 4) transclusion (partial and otherwise) in core spec, 5) conditionals in core spec.

The AsciidocFX program is a good "starter's editor" for those unfamiliar with Asciidoc and lightweight markup in general - it includes a "boxed" DocBook-XSL pipeline as an alternative to the Ruby-based asciidoctor-pdf. For an actual production editor, Visual Studio Code with the Asciidoctor extension is very hard to beat. Github integration on top of VSC gives you some collaborative visibility, too.

On the PDF front, another interesting Asciidoc project is asciidoctor-web-pdf, which uses Paged.js and CSS to product extremely complex PDFs using web technologies (Chromium + Puppeteer, I think). That, asciidoctor-pdf (Ruby/Prawn), and DocBook-XSL are the main PDF pipelines.

IshKebab2y ago

Asciidoc is the best option if you need a bit more than Markdown.

mnot2y ago

Making breaking changes to markdown is about as practical as doing it to HTML -- already existing content and mindshare give the current form massive inertia.

The is especially the case when it works for the vast majority of use cases (or can be hammered into them); ambigiuities are very visible to implementers and detail-oriented folks, but most people never see these issues, or don't care about them.

And, while it sucks that it's complicated to implement, that burden is on relatively few people. See also: the HTML Priority of Constituencies.

holler2y ago

> Consider, for example *this* text*

Oh yes. I made the fun decision to write a markdown parser/contenteditable component for https://sqwok.im and ended up spending probably a month on it, largely writing endless unit tests and covering odd cases like that.

It's far from perfect and probably will still break on certain ambiguous inputs. I like his ideas for clarifying the language for the most general audience.

1 more reply

hinkley2y ago

I have been trying to research my way out of having to write a markdown parser that doesn't allow inline html because I don't want to be a markdown author, but I categorically don't want people being able to inject things into the wiki(s) I need to create. In some languages it's a flag. In others, there's no flag.

This is like not using bind variables on your sql library. I just don't understand it. I'm looking at you, Crockford.

Decabytes2y ago

This is why I like the way Racket does this with the Pollen language. You can use Pollen mark up and create your own tags and then decide how they are converted. It all becomes a list of X-expressions that can be manipulated in any form you like. But the tree nature of an X-expression means you don’t get issues like *strong* word*.

For example I can write ◊bold{strong* word} and it becomes (bold “strong* word”). It’s very clear how this should be rendered.

wackget2y ago

One feature missing from any Markdown language is vertical table headers (i.e. headers on the left):

https://stackoverflow.com/q/60995936/1652951

cratermoon2y ago

Just make it so that literal * has to be escaped, and use a greedy parse.

*foo* always means * followed by , and the closing * is missing and would be flagged.

<string>foo... uh oh missing a closing *, can't parse

Oh boy, HN mangled this. I'm leaving it as an examplar

dredmorbius2y ago

You can show string-literal text without HN's markup interpolation by indenting the start of the line by two characters:

  You can show string-literal text *without* HN's markup interpolation by indenting the start of the line by two characters:

H8crilA2y ago

This makes parsing and rendering easier, but writing harder. Given the widespread adoption of Markdown I suspect this project to go absolutely nowhere, since it focuses on precisely the opposite thing that makes Markdown popular.

ggm2y ago

I wish more markdowns accepted some notation for keyword bullet lists. Indented lists which mark with a bolded or emphasised term.

If you can do this, you can write manual pages for options or flags

dredmorbius2y ago

There is an extended syntax for that:

  First Term
  : This is the definition of the first term.

  Second Term
  : This is one definition of the second term.
  : This is another definition of the second term.

<https://www.markdownguide.org/extended-syntax>

ggm2y ago

How this renders is suboptimal to me. It's like a variant of heading.

I meant

  -a        the minus aflag text
  -b        the minus bflag text
  Something the something text

It's basically table or grid layout without lines.

oneeyedpigeon2y ago

That's a CSS issue rather than anything to do with markdown or the markup it generates.

1 more reply

toastal2y ago

> keyword bullet lists

You mean a definition list like the HTML native one?

oneeyedpigeon2y ago

I'm sure they mean DLs, yes - this is the one thing I always bemoan that markdown lacks, even more so than tables.

toastal2y ago

### But my headings

* can be a definition list

* if I don’t actually care about semantics

yaccz2y ago

reStructuredText is way better format while only slightly more complex than markdown.

ChrisArchitect2y ago

(2018)

j / k navigate · click thread line to collapse

126 comments

egypturnash2y ago

> To dramatically reduce ambiguities, we can remove the doubled character delimiters for strong emphasis. Instead, use a single _ for regular emphasis, and a single * for strong emphasis.

I would love to see * gone but I must note that _ is annoyingly hard to type on a screen keyboard.

jitl2y ago

But I like to write paths like /usr/bin/ in my text and don’t want to worry about backtick code quoting them every time

avgcorrection2y ago

This is a real problem considering that some programmers are just sloppy with code-quoting paths and such. And it’s even debatable whether (sort of stylistically) you should even code-quote paths.

anamexis2y ago

There's the same problem with variables and filenames containing 2+ underscores, resulting in unwanted italics.

ilyt2y ago

That at least is rarer than paths.

SoftTalker2y ago

Most people don't write file paths. Only a concern for programmers, who should be fine escaping them by whatever mechanism.

jitl2y ago

Most people don’t write markdown! What’s the demographic of markdown writers who don’t write file paths? Bloggers?

3 more replies

dyarosla2y ago

Uh.. what about all those non programmers that write URLs?

kej2y ago

I think many/most people use slashes more often than they use asterisks, however.

1 more reply

doubleunplussed2y ago

And in markdown you chuck them between backticks to indicate they're verbatim text to be rendered in monospace.

1 more reply

FinnKuhn2y ago

no, but most people definitely use links in their texts and they have the same problem. / is also regularly used for fractions or/and in a situation where you could use two words

cbarrick2y ago

In IPA, slashes are used for abstract phonemic notation ( pronunciation guides). Converting that to would be annoying for certain communities.

https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...

dredmorbius2y ago

There's the concept of escaping notation using a single backslash: \

So:

  **strong text** -> bold "strong text"
  
  \*\*doubled splats\*\* -> "doubled splats" w/ "**" on either side

This is also cumbersome to type, but at least there's a path to what you want to present even if the character is reserved for markup.

cbarrick2y ago

Yes. I'm less concerned about it being cumbersome to write than I am about it being cumbersome to read.

And again, this complaint will only be relevant in contexts where this specific convention is used.

ilyt2y ago

rare enough to not matter

qingcharles2y ago

I'm honestly with you on this and I'm in the middle of building a huge Markdown site where I have the freedom to change the syntax now if I want.

SoftTalker2y ago

I'd like to see

  *this is bold*
  /this is italics/
  _this is underlined_

s1mon2y ago

https://practicaltypography.com/underlining.html

2 more replies

yaantc2y ago

ilyt2y ago

"why /usr/bin/vi" turns /usr/bin into italic ?"

underlines is not used enough to take the commonly used character ~this is fine~, trailing tilde is rare enough

defrost2y ago

As a point of order;

\* more than 3\/4's of people may not feel that way, particularly those discussing snake\_case Vs CamelCase

3 more replies

ilyt2y ago

> Back in the days of USENET one common choice was using a / to delimit /emphasis/ - the usual reading was that this indicated words that would normally be rendered as /italics/.

Fuck no. Same idiocy as turning -- into long em, makes writing any technical posts mighty annoying

Get better screen keyboard. On mine _ doesn't require shift, neither does *

toastal2y ago

loloquwowndueo2y ago

Do you have some reference for “simple swipe on B”? Doesn’t work for me (not that simple I guess?)

bobbylarrybobby2y ago

They're probably referring to the iPad keyboard (https://www.ghacks.net/2019/08/02/how-to-enable-the-swipe-ke...)

hsfzxjy2y ago

Why would a _ be harder to type than a * on a screen keyboard?

ilyt2y ago

same difficulty on mine, just switch into alternate character page and both need no shifting or button holding there, no idea what he's on about.

ajdude2y ago

I agree. I admit, I still /do this/ for emphasis, and have always wondered why Markdown didn't follow suit.

masklinn2y ago

Because Gruber used a different convention?

sqs2y ago

I understand where the author is coming from and respect their contributions to Commonmark.

But...

There are tons of markup languages for prose that have well-defined specs.

So, why did Markdown win?

Whereas a perfectly-spec'd markup language would probably evolve toward an unreadable-to-humans mess in the committee-driven pursuit of precision.

You see this theme in so many places in tech: "less is more", the Unix philosophy of everything-is-a-file, messy HTML5 over "XHTML", ML extraction vs. explicit semantic web, etc.

drawkbox2y ago

> IMO, because it does not have a well-defined spec.

Same reason that JSON won.

JSON and Markdown are base standards that were generated by market need to simplify.

JSON won because it was not overly complex and there was some flexibility. If you need more go YAML or use JSON as a platform for more.

KISS is life and sometimes things are just done, no improvements needed. Now you can take JSON and add things on top of it if you want. Same with Markdown. The base doesn't need to change... ever.

sacado22y ago

I think JSON and Markdown are very different, in fact.

Markdown is ill-defined, and will happily let you do whatever the hell you want.

JSON is made for programs, and is a PITA to write as a human (for the reasons mentioned above). But a pleasure to parse and (to some extent) generate automatically. It's not very good with text.

Markdown is made for humans, and I'd hate to have to parse a markdown file and do something with its content other than basic formatting. It's bad at anything but text.

billyhoffman2y ago

majkinetor2y ago

It should at least have comments. Then it can freeze.

drawkbox2y ago

You can make a separate file that has comments or even another JSON file that has descriptions by JSONPath or key.

Some libs also support comments and trim before processing but I prefer the external/metadata way. Comments add weight.

rollcat2y ago

> Every attempt to change JSON has and should be shot down.

I really wish JSON allowed for final trailing commas in arrays/objects.

ghusto2y ago

It makes generating from templates in certain (many!) instances needlessly difficult. I say needlessly, because the rule is seemingly arbitrary. I can't see what purpose it serves.

axblount2y ago

Also, "worse is better"

https://en.wikipedia.org/wiki/Worse_is_better

mindwok2y ago

Nice, I didn't know there was a term for this.

avgcorrection2y ago

bobbylarrybobby2y ago

Asciidoctor has a spec, reads pretty similarly to markdown, and is infinitely better IMO. And it (well, AsciiDoc) predated markdown!

I think markdown won because it was specifically made with HTML output in mind, instead of arbitrary output (docbook, in the case of AsciiDoc, which is pretty much infinitely malleable).

premysl2y ago

The Asciidoctor flavour of AsciiDoc doesn't have a specification. There is only a working group. The parsers are a mess composed of regular expressions.

There are in effect two different versions of AsciiDoc, because Asciidoctor people have appropriated the name while making their own changes to it and marking what they dislike as deprecated.

AsciiDoc cannot express all of DocBook, for example figures with multiple images.

While I despise Markdown, there isn't all that much to be a fanboy of. Just the syntax is overall saner.

MilStdJunkie2y ago

Ah, DocBook //imageobjectco with something like calspair as well. I've been wanting it badly, but there's zero movement in the Asciidoctor group to try and tackle that beast.

Asciidoc does get you conditionals and transclusion in the core spec, without needing to resort to extensions. This is what brought me over. That and the XML interoperability.

The Eclipse WG isn't published yet, but, in my opinion, it's a more stable surface to build on than the "many worlds" of Markdown.

jillesvangurp2y ago

You are completely right. The underlying theme here is that the requirements matter.

ghusto2y ago

> largest common denominator

Or the lowest.

This is how populist politics works. The thing that appeals to the most people isn't necessarily the thing we should be doing.

Elitist old man shouting at clouds? Maybe. Doesn't mean I'm wrong though.

crabbone2y ago

hyperpape2y ago

> Whereas a perfectly-spec'd markup language would probably evolve toward an unreadable-to-humans mess in the committee-driven pursuit of precision.

Maybe, and I mean that sincerely...but are you just saying this must happen or can you actually point to where MacFarlane's proposals would make a significantly less pleasant language?

fiddlerwoaroof2y ago

Requiring a blank line before a sublist just looks wrong.

jitl2y ago

I couldn’t figure out what was meant by a sublist. Like any hierarchy? Or just list-in-paragraph-in-list, not list-in-list? That one could use some HTML disambiguation in the article.

EDIT: yeah it’s always required. That kills me.

Izkata2y ago

> Whereas a perfectly-spec'd markup language would probably evolve toward an unreadable-to-humans mess in the committee-driven pursuit of precision.

This proposal shows us a clear step in that direction, going from something simple and easy for humans to understand, with complex implementation, to emphasize part of a word:

  fan*tas*tic

To proposing a simple implementation that's... weird for humans:

  fan~_tas_~tic

avgcorrection2y ago

[1] Maybe I’m being a hypocrite here? I definitely am in favor of a lot of “cutesy” ways to communicate (things that are more stylistic than necessary). But not intra-word emphasis, really.

atoav2y ago

I had a look at djot, which adresses all of the author's grievances and I must say... I don't like it.

Sure, it probably is easier to parse, and maybe there are a few edge cases that it does better, but the goal of markdown is to have text that is:

A) human readable and looks good without parsing it

B) can be parsed and presented using different themes

In djot they sacrifice a lot (e.g. we now have to insert empty lines in a nested list?!) of point A for questionable gains at point B. Guess what I as a user care more about?

Markdown accepting a wide range of inputs is not a mistake, it is a feature. If that makes parsing more complex that is an acceptable side effect not a mistake.

avgcorrection2y ago

I would have tried harder to find some other way to make the grammar simple.

I haven’t seen anything else (in addition) that makes it less “human readable” though.

ninepoints2y ago

giraffe_lady2y ago

Well-specified formats that are primarily produced by humans writing them by hand? The main entries in this category are programming languages.

ninepoints2y ago

eterevsky2y ago

ilyt2y ago

It's not really problem with being "perfectly specced" or not, it's just matter of inertia.

But inertia makes things like that near-impossible to change now. Only additions can sorta work and even those are hard as critical mass of dialects needs to apply them for it to work.

still_grokking2y ago

What you're saying is that the most stupid and broken "solutions" win…

Now one could speculate about the reasons.

tannhaeuser2y ago

> messy HTML5 over "XHTML"

still_grokking2y ago

SGML is a hot mess. It should have died decades ago.

throwawaaarrgh2y ago

> All of these rules lead to unexpected results sometimes, and they make writing a parser for CommonMark a complex affair.

> What if we tried to create a light markup syntax [..] revising some of the features that have led to bloat and complexity in the CommonMark spec?

Are you writing this new format to make life easier for the humans using it, or the humans programming it?

It's sad when programmers don't see the forest for the trees.

boerseth2y ago

_gabe_2y ago

> If the rules are too complicated, then they are a challenge for all parties, both users and implementers

  Foo<Bar> foo;
  // VS
  Foo < Bar;

Likewise, in C++ you can end up with:

  unordered_set<tuple<int, float>> mySet;
  // >> is ambiguous here without a symbol table or context around the statement
  Foo >> 5;

I think both of these are fairly obvious as a user of the language, but boy am I glad I don't have to parse that!

ilyt2y ago

> If the rules are too complicated, then they are a challenge for all parties, both users and implementers.

You are still confouding rules for writing with rules for parsing. It's absolutely possible and easy to make rules making writing easier but parsing harder.

ncallaway2y ago

The problem is when it's too hard for the computers, then it negatively impacts the user experience.

I agree that we shouldn't _ever_ lose focus on the end user experience. But sometimes, you have to make the spec less ambiguous to improve the end-user experience.

jitl2y ago

0cf8612b2e1e2y ago

Weird that he did not reference the djot package[0] which seems like his attempt at implementing this thesis.

[0] https://djot.net/

noizejoy2y ago

Maybe because OP is from 2018?

0cf8612b2e1e2y ago

Ha! That would do it. Well, guess he stewed on the idea long enough he had to take a swing at the problem.

hinkley2y ago

There is something to be said for, not editing your old posts, but applying a preface that references later iterations on the idea. I wish I were better about that myself.

"In this article from 2017, I talk about dinglehoppers, which have since been improved by research from these three papers [1][2][3]. Here is where I revisit this topic in 2021."

eslaught2y ago

Needs a 2018 in the title.

He actually implemented these ideas: https://djot.net/

evolve2k2y ago

What a terribly named project.

Surely riffing from; Mark, Common or Down would have been more effective.

eslaught2y ago

cpach2y ago

I don’t agree. I like the sound of it. And after all it’s just a name.

anschwa2y ago

I think we could pick one way to handle emphasis, lists, and code blocks that covers a specific and predictable 80%.

Anything that becomes hard to describe without including additional notation to the grammar is probably best suited to be left as HTML, as was the intention behind markdown to begin with.

avgcorrection2y ago

https://github.com/jgm/djot

silvestrov2y ago

One feature which I'm missing which is very useful for CMS systems etc is a standard syntax for implementation specific callbacks/macros.

E.g. a macro that returns todays date, todays great offer, etc. Or a "number of days until xxx" for countdowns until some event.

His attribute syntax is very close. A posssible macro syntax use {@ as leading marker, e.g.

    {@macroname position=left}

    There is {@daysuntil date=20230710} days to launch.

lewisjoe2y ago

We are implementing markdown support in Zoho Writer (https://zoho.com/writer) and I can confirm how difficult it is to handle bold and italics.

It definitely is a weird choice to use *s for both bold and italics. Parsers could be implemented much easier, if both had different delimiters as mentioned in the post.

dr_kiszonka2y ago

1. Markdown is great.

2. The only thing I miss is support for nested numbered lists.

2. 1. (The best kind of lists.)

prepend2y ago

I write a lot of markdown. Ive taught lots of people to use it. I’ve never encountered these problems.

Markdown is meant to be simple. To represent complex things, use something else.

I don’t think I’ll ever use this and if someone tries to make me learn this instead of regular markdown, I’ll probably just not bother.

I don’t want to diminish anyone’s creativity, but this seems like a lot of work put into something unnecessarily.

eduction2y ago

The article reads very much like a list of problems important to an implementation author rather than a user. Except maybe the nested list thing, which does sound somewhat annoying. But also rare.

hinkley2y ago

JimmyRuska2y ago

https://web.archive.org/web/20121017064607/http://94.249.190...

https://news.ycombinator.com/item?id=4437875

These days, it would be good to mix/match ideas from: pugjs, htmlx, jupyter, dhall

eviks2y ago

Is there any great modern rich document alternative? That wound truly go beyond markdown

MilStdJunkie2y ago

Asciidoc. Particularly if you 1) need XML interoperability, 2) complex print outputs, 3) complex tables, 4) transclusion (partial and otherwise) in core spec, 5) conditionals in core spec.

IshKebab2y ago

Asciidoc is the best option if you need a bit more than Markdown.

mnot2y ago

Making breaking changes to markdown is about as practical as doing it to HTML -- already existing content and mindshare give the current form massive inertia.

And, while it sucks that it's complicated to implement, that burden is on relatively few people. See also: the HTML Priority of Constituencies.

holler2y ago

> Consider, for example *this* text*

It's far from perfect and probably will still break on certain ambiguous inputs. I like his ideas for clarifying the language for the most general audience.

1 more reply

hinkley2y ago

This is like not using bind variables on your sql library. I just don't understand it. I'm looking at you, Crockford.

Decabytes2y ago

For example I can write ◊bold{strong* word} and it becomes (bold “strong* word”). It’s very clear how this should be rendered.

wackget2y ago

One feature missing from any Markdown language is vertical table headers (i.e. headers on the left):

https://stackoverflow.com/q/60995936/1652951

cratermoon2y ago

Just make it so that literal * has to be escaped, and use a greedy parse.

*foo* always means * followed by , and the closing * is missing and would be flagged.

<string>foo... uh oh missing a closing *, can't parse

Oh boy, HN mangled this. I'm leaving it as an examplar

dredmorbius2y ago

You can show string-literal text without HN's markup interpolation by indenting the start of the line by two characters:

  You can show string-literal text *without* HN's markup interpolation by indenting the start of the line by two characters:

H8crilA2y ago

ggm2y ago

I wish more markdowns accepted some notation for keyword bullet lists. Indented lists which mark with a bolded or emphasised term.

If you can do this, you can write manual pages for options or flags

dredmorbius2y ago

There is an extended syntax for that:

  First Term
  : This is the definition of the first term.

  Second Term
  : This is one definition of the second term.
  : This is another definition of the second term.

<https://www.markdownguide.org/extended-syntax>

ggm2y ago

How this renders is suboptimal to me. It's like a variant of heading.

I meant

  -a        the minus aflag text
  -b        the minus bflag text
  Something the something text

It's basically table or grid layout without lines.

oneeyedpigeon2y ago

That's a CSS issue rather than anything to do with markdown or the markup it generates.

1 more reply

toastal2y ago

> keyword bullet lists

You mean a definition list like the HTML native one?

oneeyedpigeon2y ago

I'm sure they mean DLs, yes - this is the one thing I always bemoan that markdown lacks, even more so than tables.

toastal2y ago

### But my headings

* can be a definition list

* if I don’t actually care about semantics

yaccz2y ago

reStructuredText is way better format while only slightly more complex than markdown.

ChrisArchitect2y ago

(2018)

j / k navigate · click thread line to collapse