It contains lots of code in code blocks and has a table of contents at the start with internal links to later pages.
I've tried lots of different Markdown-to-PDF converters like md2pdf and Pandoc, even converting through LaTeX first. However, none of them produce working internal PDF links, effective syntax highlighting for HTML, CSS, JavaScript and Python, and code wrapped to fit on the page.
I have a very long regular expression (email validation, of course) that doesn't fit on one line, but none of the solutions I have found properly break lines that overflow the page.
What tools does everyone recommend?
What worked well for me: Pandoc with a custom LaTeX template, and a decent amount of inline LaTeX to handle edge cases. We had a LaTeX theme from our publisher, but we also needed our own, totally separate theme for the free version of the book.
For one-off things like really long code lines, I found it best to manually figure out how to handle them. Sometimes there was a bit of TeX magic but, more often, I just rewrote or reorganized the code. I see the presentation and structure of code and math snippets as an integral part of how I'm communicating the underlying ideas, so manually changing things around to read better was fundamentally no different from going back and editing prose.
Unfortunately, this also means that the process was relatively hands-on. If you need something ≈completely automated, I expect Pandoc → LaTeX is going to fall a bit short. Edge cases need manual intervention, and it's easy for formatting errors to sneak in: the free version of our book has some formatting mistakes, like code bleeding into the margin, because I ran out of energy to fix all of them!
https://github.com/syntax-tree/mdast-util-from-markdown
It might work better if you parse it into an intermediary Mdast format first, do whatever processing you need to implement "pages" (not part of any Markdown dialect I'm familiar with, but it shouldn't be hard to write a custom parser for that in Mdast), output that to HTML (via https://github.com/syntax-tree/mdast-util-to-hast), and THEN convert the HTML to PDF.
The AST tools basically give you structured JSON that's much easier to work with programmatically than raw Markdown. Then you can render that semantic JSON into HTML or other outputs.
I built a customized-Markdown pipeline that normalized to Pandoc (extra features were encoded so they would pass through Pandoc), obtained the Pandoc JSON AST, and emitted HTML/LaTeX/etc. using Julia pattern matching. The code was small, and the yak shaving and husbandry were worth it to escape the struggle with a sea of crufty candidate tools, each sitting at its own arbitrarily chosen point in a high-dimensional design space, plus missing features, misfeatures, gotchas, and mazes ("maybe if I combine this unmaintained plugin with that one and add a postprocess massage step over there and then maybe..." - blech).
My guess is that either toolchain could do the job... it may just come down to personal preference: whether someone prefers to pipe together command-line tools in a bash script, or to use the npm ecosystem (mdast is all in JS).
Maybe the popularity of JS & npm means there are available mdast plugins & third party packages that can help with whatever niche transformation you might need, and custom node rendering is just a lambda away. It's all in JS for a seamless experience, and there is no separate DSL to learn (just some basic helper functions).
That might be harder to do in Pandoc... (might need a custom Lua filter or another language like your Julia pattern matching?)
As for effectiveness... it probably just depends on the particular implementer :) I'd trust a grizzled old *NIX sysadmin type over your typical bootcamp JS programmer any day, but also... the JS ecosystem is pretty mature and powerful now, and Mdast is pretty amazing. At work we use it to build one of the most important parts of our app, and its power and flexibility never cease to amaze me.
In a recent Talk Python to Me podcast [0], the Quarto [1] developers talked about how they are using Pandoc’s Lua interpreter [2] to perform transformations that aren’t part of vanilla pandoc in.md -o out.pdf.
0. https://talkpython.fm/episodes/show/493/quarto-open-source-t...
I've worked on PAIP, Paradigms of Artificial Intelligence Programming, and I might be able to help you a bit. It's around 1k pages long. I used Pandoc to generate an epub file, and then Calibre to turn that into a PDF file. I just tried using Pandoc to generate the PDF file directly, and it/LaTeX choked on some Unicode characters.
For internal ebook links, there's a Lua script. You'll have to keep anchors unique across the book for this:
* good: "chapter1#section1_1" and "chapter2#section2_1"
* bad: "chapter1#section1" and "chapter2#section1"
WIP: https://github.com/norvig/paip-lisp/pull/195
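The Lua script from that PR isn't reproduced here. As a rough pre-flight check of the uniqueness rule above, a hypothetical Python helper could scan each chapter for explicit Pandoc heading identifiers (the `## Heading {#id}` attribute syntax) and report any id used more than once. It assumes ids are written explicitly; auto-generated identifiers and headings inside code blocks are not handled.

```python
import re
from collections import defaultdict

# ATX heading carrying an explicit Pandoc identifier, e.g. "## Overview {#section1_1}"
HEADING_ID = re.compile(r"^#{1,6}[ \t].*\{#([^\s}]+)[^}]*\}[ \t]*$", re.MULTILINE)


def find_duplicate_anchors(chapters):
    """Map each heading id used more than once to the chapters that use it.

    `chapters` is a dict of chapter name -> Markdown text (a hypothetical layout).
    """
    seen = defaultdict(list)
    for chapter, text in chapters.items():
        for anchor in HEADING_ID.findall(text):
            seen[anchor].append(chapter)
    return {anchor: names for anchor, names in seen.items() if len(names) > 1}
```

For the "bad" case above this returns `{"section1": ["chapter1", "chapter2"]}`, while the "good" naming scheme returns an empty dict.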
For line wrapping of code, there's CSS. I first used it over on "Writing an Operating System in 1,000 Lines"; here's the PR: https://github.com/nuta/operating-system-in-1000-lines/pull/...
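The PR's actual CSS isn't shown here, so the following is only an assumption about what such rules look like: `white-space: pre-wrap` keeps indentation but lets long lines wrap, and `overflow-wrap: anywhere` allows breaks inside a single unbreakable token such as a long regular expression. The small Python helper just wraps a code string in a standalone HTML page using those rules; support for `overflow-wrap: anywhere` varies between renderers.

```python
import html

# Assumed print stylesheet, not the one from the linked PR.
PRINT_CSS = """
pre, code {
  white-space: pre-wrap;    /* preserve indentation, but wrap at the page edge */
  overflow-wrap: anywhere;  /* break inside long tokens (e.g. a regex) if needed */
}
"""


def code_page(code, title="snippet"):
    """Return a standalone HTML page that shows `code` with wrapping enabled."""
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title><style>{PRINT_CSS}</style></head>"
        f"<body><pre><code>{html.escape(code)}</code></pre></body></html>"
    )
```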
Google\ Chrome --no-sandbox --headless --print-to-pdf-no-header --no-pdf-header-footer --enable-logging=stderr --log-level=2 --in-process-gpu --disable-gpu --print-to-pdf=resume.pdf "file://path/to/resume.html"
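When scripting that command, one option is to build it as an argument list so that names with spaces (like `Google Chrome`) need no shell escaping, and to derive the `file://` URL from a real path instead of writing it by hand. A minimal sketch, keeping only a subset of the flags above; the executable name `google-chrome` is an assumption that differs per platform.

```python
from pathlib import Path


def chrome_print_cmd(html_file, pdf_file, chrome="google-chrome"):
    """Return an argv list that prints `html_file` to `pdf_file` with headless Chrome.

    `chrome` is an assumed executable name; on macOS it is usually the binary
    inside "/Applications/Google Chrome.app/Contents/MacOS/".
    """
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        "--no-pdf-header-footer",             # drop the date/URL header and footer
        f"--print-to-pdf={pdf_file}",
        Path(html_file).resolve().as_uri(),   # absolute file:// URL
    ]
```

It would then be run with `subprocess.run(chrome_print_cmd("resume.html", "resume.pdf"), check=True)`.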
On a tangentially related note, I guarantee you that your regex is wrong. There is only one way to validate an email address:
Send an email to it and have them respond. Otherwise you will block some valid users.
Now of course you can make a regex that gets most email addresses, and if you're ok with that, then that's fine. But if you don't want to accidentally exclude someone, then sending email is the only way to validate it.
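The confirmation approach can be sketched as a loose syntax check plus a one-time token that the user has to send back (for example by clicking a link). This is a minimal illustration: the in-memory `_pending` store and the function names are hypothetical, actually sending the email is left out, and a real system would also expire tokens.

```python
import hmac
import secrets

# Hypothetical in-memory store: pending confirmation token per address.
_pending = {}


def looks_like_email(address):
    """Deliberately loose check: something@something, nothing more."""
    local, sep, domain = address.rpartition("@")
    return bool(sep and local and domain)


def start_confirmation(address):
    """Create a one-time token; the caller emails it (e.g. in a link) to `address`."""
    if not looks_like_email(address):
        raise ValueError("not an email address")
    token = secrets.token_urlsafe(32)
    _pending[address] = token
    return token


def confirm(address, token):
    """The address only counts as valid once the emailed token comes back."""
    expected = _pending.get(address)
    # Compare as bytes so non-ASCII input can't raise, in constant time.
    if expected is None or not hmac.compare_digest(expected.encode(), token.encode()):
        return False
    del _pending[address]  # tokens are single-use
    return True
```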
Why is everyone trying to check for things they don't have to? If you need a valid email address, of course you have to send an email for confirmation. anna@example.com is perfectly syntactically valid, but isn't useful to anyone for sending emails. If you optionally want your users to enter an email address, don't overcomplicate things.
I forgot where I read it (maybe something about testing or DDD), but an idea I like a lot is not to validate data coming from an external system beyond your own internal constraints. You don't control an email account or how it was created, and the specification is messy, so if you want to check that an address exists, you query the other system. The same goes for other identifiers.
Use this, and add sed lines to inject whatever normal CSS you need to prevent unwanted breaks; the rules can be made specific to @media print as required.
$ cat ~/bin/mdview
#!/bin/bash
# markdown viewer: render a Markdown file to a temporary HTML page and open it in Firefox
set -eu
md="$1"
tmpfile=".mdview.tmp-$(uuidgen).html"
# start html (single quotes so the CSS can contain double-quoted font names)
echo '<html><head><meta charset="utf-8"><style>img{margin:20px;max-width:100%}@media print{img{max-height:90%;max-width:90%;page-break-after:always}}body{margin:6em;font-family:sans-serif}pre,code{font-weight:bold;font-size:110%;font-family:"Ubuntu Mono",monospace}</style></head><body>' >"${tmpfile}"
# duplicate markdown for modification
cp "${md}" "${md}.mdtmp"
# add an extra newline after a trailing : (optionally followed by spaces)
sed -i -e 's/: *$/:\n/' "${md}.mdtmp"
# generate HTML from markdown
# note the --html-no-skiphtml --html-no-escapehtml allows the preservation
# of <a name="blah"></a> anchors within text to allow [link](#anchorname)
lowdown --html-no-skiphtml --html-no-escapehtml -thtml "${md}.mdtmp" >>"${tmpfile}"
echo '</body></html>' >>"${tmpfile}"
# remove the temporary markdown file
rm "${md}.mdtmp"
# add newline before images
sed -i -e 's/<img/<br><img/' "${tmpfile}"
# view result
firefox "${tmpfile}" &
# give Firefox a moment to load the file before deleting it
sleep 1.25
# remove the temporary file
rm "${tmpfile}"
Convert to HTML, then use Prince (https://www.princexml.com/) to style and convert to PDF.
I switched from using MD-->(Pandoc-->(latex))--> PDF to using MD-->(Pandoc-->(typst))--> PDF.
Have you considered manually splitting the regular expression into multiple lines in the source document, using something like the `VERBOSE` mode from Python re module [1]?
[1]: https://docs.python.org/3/howto/regex.html#using-re-verbose
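With `re.VERBOSE`, whitespace and `#` comments outside character classes are ignored, so a long pattern can be laid out over several short, commented lines that never overflow the page. The pattern below is a deliberately simplified illustration, not a complete email validator; as pointed out elsewhere in the thread, any such regex will reject some valid addresses.

```python
import re

# Simplified, illustrative pattern: each piece sits on its own short line.
EMAIL_RE = re.compile(
    r"""
    [A-Za-z0-9._%+-]+        # local part
    @
    (?:[A-Za-z0-9-]+\.)+     # one or more domain labels, each followed by a dot
    [A-Za-z]{2,}             # top-level domain
    """,
    re.VERBOSE,
)


def matches(address):
    """True if the whole string matches the (simplified) pattern."""
    return EMAIL_RE.fullmatch(address) is not None
```

Because `fullmatch` is used, no `^`/`$` anchors are needed inside the pattern itself.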
1. Download the app [01]
2. Create a new empty document
3. Insert a Markdown section
4. Paste your Markdown code into the Markdown section
5. Click "Preview & Export"
6. Configure your PDF
I'm the creator of MonsterWriter. For complex markdown it probably has some shortcomings but I would love to hear what is missing for your use case.
Unfortunately Asciidoctor is written in Ruby, which makes it an arse to work with if you need to write any plugins. And the HTML output uses Google Fonts by default, so I don't think much of the authors. But it's probably the best authoring system I've found for programming-style content. For scientific content I would use LyX or maybe Typst.
I believe it's Mac only. I use it sometimes when I'm creating PDFs from my personal documentation to share more publicly, which I keep in Markdown and deploy on GitLab Pages as a static site.
Sphinx and jupyter-book support MyST Markdown.
PDF tables of contents with links to headings or page numbers are possible with MyST and reStructuredText.
Keep in mind that you'll need to install custom fonts if you're using languages other than English.
It's really just a proof of concept at this point, but it might be of interest to you (and others).
Code: https://github.com/dominicdoty/sveltedoc
Rendered: https://sveltedoc.pages.dev/
Writeup: https://www.dominicdoty.com/2025/03/02/sveltedoc/
TLDR - I've been using Asciidoc a lot at work recently and was dissatisfied with it. This was an attempt at using Svelte to generate a document as a webpage that formats well when printed (or printed to PDF). All the power of HTML+CSS+JS when you want it, but the ease of use to just write markdown when you don't.