Instead you could do this as:
```latex
\documentclass{article}
\usepackage{xparse}
% u{.}: everything up to the first "." becomes #1;
% r(): a required argument delimited by "(" and ")" becomes #2.
\NewDocumentCommand \LambdaCalc {u{.} r()} {%
  [arg:(#1) body:(#2)]
}
\DeclareUnicodeCharacter {03BB} {\LambdaCalc}
\begin{document}
λx.(2x)
\end{document}
```
> these are predefined as `\@firstoftwo` and `\@secondoftwo`
I do wish LaTeX kernel commands (which I'm assuming these are) were more widely documented. As it stands, it's pretty hard to keep track of what already exists. Is there a nice reference for those?
> Also the Unicode bytes are already active so setting their catcode is useless.
This is true for LaTeX and not TeX, correct? Originally I had written `\expandafter\let\expandafter\@firstoct\@firstoftwoλ`, but I decided not to assume that that character was already active.
> Also redefining the first octet breaks LaTeX's UTF-8 handling...
How so? (Assuming the else case weren't broken.)
>...and the else case forms an infinite loop.
If `\Firstλ` were not an active character, would this still be true, given that I store `\Firstλ` in `\lambda@first@oct` before it's declared an active character?
> and it breaks other uses of `(` and `)` in the argument.
This is not a concern for the DSL, but...
> Changing the catcodes of `(` and `)` means that this command doesn't work in the arguments of other commands
...this is. Thanks.
> Instead you could do this as
Damn :)
Thanks for the nice feedback. I suppose I should read up on xparse. In any case, I still think it's worthwhile to try to achieve the same results with primitives, to get some idea of what's breaking when a given program doesn't compile (that is usually when the primitives surface).
Not really; the traditional commands are rather messy. You can of course read source2e, but that's not really documentation. For new code it often makes sense to write the more programming-oriented parts in expl3, which is much better documented in interface3. (It provides these commands as `\use_i:nn` and `\use_ii:nn`.)
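A minimal sketch of that correspondence, assuming a LaTeX kernel recent enough (2020-10 or later) to provide `\NewDocumentCommand` and `\ExplSyntaxOn` without loading any package. `\PickFirst` and `\PickSecond` are hypothetical names used only for illustration; both output lines should read "A, B".

```latex
\documentclass{article}
% \PickFirst and \PickSecond are hypothetical wrappers for illustration.
\ExplSyntaxOn
% \use_i:nn keeps its first argument, \use_ii:nn its second.
\NewDocumentCommand \PickFirst  { m m } { \use_i:nn  {#1} {#2} }
\NewDocumentCommand \PickSecond { m m } { \use_ii:nn {#1} {#2} }
\ExplSyntaxOff
\begin{document}
\PickFirst{A}{B}, \PickSecond{A}{B}

\makeatletter
% The traditional internal equivalents, which need @ as a letter.
\@firstoftwo{A}{B}, \@secondoftwo{A}{B}
\makeatother
\end{document}
```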
> > Also the Unicode bytes are already active so setting their catcode is useless.
>
> This is true for LaTeX and not TeX, correct?
Right, this is LaTeX-specific.
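One way to check this, assuming the file is compiled with pdfLaTeX (not XeLaTeX or LuaLaTeX), is to print the catcode of 0xCE, the lead byte of λ in UTF-8. In pdfLaTeX this should show 13 (active); the same `\the\catcode"CE` in a plain pdfTeX document should show 12, since plain TeX leaves bytes 128 to 255 as "other".

```latex
\documentclass{article}
\begin{document}
% Lead byte of λ (U+03BB = CE BB); LaTeX's UTF-8 setup makes it active.
\the\catcode"CE
\end{document}
```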
> > Also redefining the first octet breaks LaTeX's UTF-8 handling...
>
> How so? (If the else case wasn't broken)
LaTeX's definition of the first byte handles any valid UTF-8 continuation bytes: it uses the corresponding definition if there is one and prints a proper error otherwise. Even a replacement definition that didn't re-trigger the active character would just output the two raw bytes. That gives no useful error message, probably prints two random characters from the font, and completely ignores any definitions made through LaTeX's mechanism for other code points that start with the same byte.
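To illustrate the dispatch, here is a sketch that declares a second code point sharing the lead byte 0xCE with λ; mapping α to `\ensuremath{\alpha}` is just an assumed example. LaTeX's own handler for the active byte serves both declarations, whereas a hand-written redefinition of 0xCE would bypass the α one. A code point with no declaration at all would instead get LaTeX's "not set up for use with LaTeX" error.

```latex
\documentclass{article}
\usepackage{xparse} % for the u{.} argument type
\NewDocumentCommand \LambdaCalc {u{.} r()} {[arg:(#1) body:(#2)]}
% In UTF-8, λ (U+03BB) is CE BB and α (U+03B1) is CE B1: same lead byte.
\DeclareUnicodeCharacter{03BB}{\LambdaCalc}
\DeclareUnicodeCharacter{03B1}{\ensuremath{\alpha}}
\begin{document}
% LaTeX's handler for 0xCE reads the next byte and picks the matching definition.
λx.(2x) and α
\end{document}
```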
> > ...and the else case forms an infinite loop.
>
> If `\Firstλ` was not an active character, would this still be true? Since I store `\Firstλ` in `\lambda@first@oct` before it's declared an active character.
You're correct: if the first byte weren't already an active character (e.g. in plain TeX), it wouldn't loop. It wouldn't expand to anything particularly useful, but that is no worse than having no definition at all, so it would be "correct".
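For comparison, here is a hypothetical plain pdfTeX sketch (run with `pdftex`) of the primitive approach in which the fallback cannot loop: the byte 0xCE is saved with `\let` while it still has catcode 12, so the else branch typesets the raw byte rather than re-invoking the active character. `\lambdafirstoct` is an illustrative name, not the author's actual code, and the fallback output (the two raw bytes) is exactly the not particularly useful result described in the reply.

```latex
% Plain pdfTeX: byte 0xCE starts out with catcode 12 ("other").
% \lambdafirstoct is a hypothetical name for the saved, non-active byte.
\let\lambdafirstoct=^^ce
\catcode"CE=\active
\def^^ce#1{% #1 is the continuation byte that follows 0xCE
  \ifx#1^^bb% CE BB is λ (U+03BB)
    $\lambda$%
  \else
    \lambdafirstoct#1% raw bytes; \lambdafirstoct is not the active char
  \fi}
λx.(2x)
\bye
```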
> I suppose I should read up on xparse.
Nowadays the core of `xparse` is preloaded in the kernel rather than being a separate package, so its documentation has moved into usrguide3. In this case you still need the package, though, because the `u` argument type has not been added to the kernel (and therefore isn't in usrguide3 either), since delimited arguments like that are not recommended for LaTeX commands. It is still documented in the old `xparse` manual. Just in case you're wondering about the split.
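A small sketch of that split, assuming a kernel from 2020-10 or later: the `r()` type works with no package loaded, while swapping in the `u{.}` type from the first example should fail unless `\usepackage{xparse}` is added. `\Body` is a hypothetical name.

```latex
\documentclass{article}
% No \usepackage{xparse}: r() is supported by the kernel's \NewDocumentCommand.
\NewDocumentCommand \Body {r()} {[body:(#1)]}
\begin{document}
\Body(2x)
\end{document}
```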
Half of the post is about handling UTF-8, which AFAIK both LuaTeX and XeTeX (you really should use one of them) do natively.
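As a sketch of the Unicode-engine route, assuming XeLaTeX or LuaLaTeX and the `newunicodechar` package: λ arrives as a single character, so there is no lead byte to intercept, and `\newunicodechar` should simply make it active and attach the definition. The `xparse` package is still loaded for the `u{.}` argument type.

```latex
% Compile with xelatex or lualatex: λ is one character, not two bytes.
\documentclass{article}
\usepackage{xparse}         % still needed for the u{.} argument type
\usepackage{newunicodechar}
\NewDocumentCommand \LambdaCalc {u{.} r()} {[arg:(#1) body:(#2)]}
% Make λ trigger \LambdaCalc.
\newunicodechar{λ}{\LambdaCalc}
\begin{document}
λx.(2x)
\end{document}
```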
LuaTeX and XeTeX usually aren't an option where LaTeX comes up, e.g. in academic submissions. This is a common discussion; see [the comments under my previous post].
> LaTeX is great for typesetting math.
Q: Ok, great! So how do I typeset this bit of common math?
A: a 20-line barrage of import statements, `\makeatletter`s and definitions that you copy-paste into your preamble and cross your fingers that it won't conflict with the half-dozen other barrages that you copied there to do other bits of common math, often hidden between other Google results with wildly different answers.
About the posted article: if all one wanted to do was "typeset this bit of common math", one can just type "\lambda x.(2x)" in math mode. Or, if one isn't constrained to stay old-school (i.e., pdfTeX), use XeTeX/LuaTeX with `\usepackage{unicode-math}` to type "λ x.(2x)" directly.
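For the purely typesetting case, a minimal sketch assuming XeLaTeX or LuaLaTeX with `unicode-math` (which loads `fontspec` and should fall back to Latin Modern Math when no math font is set). The first formula works with any engine; the second relies on `unicode-math` to map the typed λ to a math-italic lambda.

```latex
% Compile with xelatex or lualatex.
\documentclass{article}
\usepackage{unicode-math}
\begin{document}
$\lambda x.(2x)$ % the classic command form
$λ x.(2x)$       % λ typed directly in math mode
\end{document}
```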
The posted article is actually about doing some parsing using TeX: the author wants to type "λ x.(2x)" into their .tex file and have it parsed into, say, [arg:(x) body:(2x)] to be used later for whatever they're building. This is not related to typesetting at all, so why do they want to do such a thing in TeX, instead of doing it outside and using TeX just for typesetting? The motivation seems to be, as their footnote 2 indicates, that some people just enjoy being perverse. That's fine!
Even there, if you compare the author's approach with that in the comment here https://news.ycombinator.com/item?id=33296527 (by someone who knows what they're doing; cf. https://www.latex-project.org/about/team/), you'll see how the "right" way is less forbidding-looking, and also less breakage-prone. What's going on is that the author has just learned something new (how Unicode is handled in pdfTeX even though it only works with 8-bit bytes), has become excited at the possibilities, and has hacked together their own solution using the primitives, without bothering to integrate with the broader ecosystem of other packages and conventions. That is also fine; TeX will let you do that and not get in the way.
The really interesting question raised by your comment, IMO, is not about the posted article at all but about experiences like the one you describe. I can easily imagine many people doing what you did: not understanding the context, and possibly even copying ad-hoc code like this into their document and crossing their fingers. That is where we start to get into the actual problems with the LaTeX ecosystem, namely the mismatch between users' mental models and those of the (too many!) pieces of software involved. But I've exceeded the time limit I set for commenting here, so I'll stop :)
Too much to ask for, I guess. Continue waiting.
There are very few bits of software that are more arcane and broken by default than this absolute crapstraction of a platform.
And that is exactly why a more programmable platform would be good! These issues arise in the first place largely because TeX is not easily programmable, so people have to find arcane workarounds to do anything complex in it.