How to Implement a Programming Language in JavaScript

(lisperator.net)

99 points · cristiantincu · 11y ago · 34 comments

chubot 11y ago

I've written multiple parsers/interpreters in both JS and Python -- Python being my favorite language. From that experience, I've come around to the fact that they're both the wrong language for writing languages -- lexers, parsers, interpreter loops, compilers.

I have a good analogy to explain this. Take this OCaml program.

    let sum_file filename =
      let file = In_channel.create filename in
      let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in
      let sum = List.fold ~init:0 ~f:(+) numbers in
      In_channel.close file;
      sum
    ;;

This is roughly equivalent to:

    def sum_file(filename):
      with open(filename) as f:
        return sum([int(x) for x in f])

Now think of the degree to which OCaml is awkward here. That's exactly how awkward Python and JS are for writing languages :)

OCaml is based around ML-style typed records, which are exactly what you need for manipulating languages.

I am not a type safety guy, but when you write languages, there are tons of nested conditionals. In Python, JS, or C, you end up with bugs in those corners. It is quite easy to crash any non-trivial parser, like the CPython parser or Clang's parsers. The type safety that ML offers helps a lot with this.

The code for writing parsers and interpreters is just SHORTER in OCaml than in Python. I used to think Python was the most concise language. But no, it depends on the problem domain. Try it and you'll see.

Someone saying the same thing here:

http://flint.cs.yale.edu/cs421/case-for-ml.html

mightybyte 11y ago

I 100% agree that JS and Python are the wrong languages for writing languages. Haskell, however, gets you the best of both worlds. Your sum_file function in Haskell would look like this:

    sumFile :: FilePath -> IO Int
    sumFile file = sum . map read . lines <$> readFile file

I don't think many people would dispute that Haskell is at least as good as OCaml for writing languages. And strangely enough, the most advanced Perl 6 implementation is even written in Haskell!

kbenson 11y ago

> And strangely enough, the most advanced Perl 6 implementation is even written in Haskell!

That hasn't been true for a few years, but it was true for quite a while. At this point Pugs is fairly out of date and (I believe) abandoned, as its main developer discontinued development. For a long while it was the most advanced Perl 6 compiler though, and from what I understand a lot of its rapid advancement was attributed to it being written in Haskell (Perl 6 and Haskell share a good portion of advanced features, so that may have helped).

The current state of support of the various Perl 6 compilers can be seen here[1].

1: http://perl6.org/compilers/features

chubot 11y ago

Right, I was referring to all ML-based languages -- so SML, OCaml, Haskell, F#, and possibly even Rust.

Do you know how this would look in Haskell?

    # Return K most common lines in a file

    import collections

    def top_k(f, k):
      counts = collections.defaultdict(int)
      for line in f:
        counts[line] += 1

      return sorted(counts.items(), key=lambda x: x[1], reverse=True)[:k]

I was actually looking for the OCaml example which does this. I think it was in "Real World OCaml", and I remember it being horribly ugly compared to Python. I couldn't find it though.
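For comparison, Python's standard library also offers `collections.Counter`, whose `most_common` method covers the same task. A minimal sketch, where `top_k_counter` is a hypothetical name and the input is an in-memory file made up for the example:

```python
import collections
import io

def top_k_counter(f, k):
    # Count every line yielded by f, then keep the k highest counts.
    return collections.Counter(f).most_common(k)

# Any iterable of lines works; StringIO stands in for an open file.
lines = io.StringIO("a\nb\na\nc\na\nb\n")
result = top_k_counter(lines, 2)
print(result)  # [('a\n', 3), ('b\n', 2)]
```

As with `top_k` above, the trailing newline stays part of each counted line.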

l_dopa 11y ago

Your comparison isn't really fair. With similar functions from Core's In_channel you can write something like:

    let sum_file filename =
      with_file filename
        (fold_lines ~init:0 ~f:(fun a l -> a + int_of_string l))

"Using the right tool for the job" when it comes to functional vs. "mainstream" languages is a popular meme, but it doesn't hold up to scrutiny. You can always write your own higher-level code for a given domain with a reasonably expressive language. Adding a useful type system, ADTs, etc. to a language like Python is much, much harder (though that's exactly what Microsoft and Google are trying with JavaScript).

chubot 11y ago

FWIW I took it straight out of Real World OCaml, assuming that that's idiomatic OCaml.

See my other comment -- how does top K lines in OCaml look? I recall that was in the book too, but couldn't find it. I remember it being fantastically ugly.

kirse 11y ago

I stumbled on this C# 1.1 parser written in F# just yesterday. I'm not sure if a syntax tree and parser in 350 lines is decent, but I can at least see what you're talking about with regard to ML/OCaml being well-suited for the task.

http://www.fssnip.net/lf

sitkack 11y ago

While the result is probably correct, your argument is odd. :)

Hammers in nails with an orange, and this folks, is how Black and Decker makes better drills than Makita.

The single reason that OCaml excels at compilers is pattern matching. It _removes_ all those explicit conditionals by making them implicit.

http://andrej.com/plzoo/

Python has excellent affordances for writing compilers; they just have to be used.

https://github.com/mrocklin/multipledispatch/

https://github.com/Suor/patterns

chubot 11y ago

It's not just pattern matching -- it's also ADTs and the functional paradigm. (e.g. in Python, functools.partial is uglier than OCaml's application, just as working with dictionaries is uglier in OCaml than Python)

Do those Python libraries offer type safety? I don't think there is any way they can. If not, you might as well use OCaml.

As I said, Python is my favorite language, but there is no reason to write OCaml in Python or even shell scripts in Python. Why not use the real thing? The Python code still won't be shorter than OCaml, even if you use these fancy libraries.

munificent 11y ago

I agree with you in principle. After all, "ML" got its name "metalanguage" from the fact that it was explicitly designed to be a domain-specific language for implementing programming languages. It's hard to beat a DSL at its job.

At the same time, that has to be balanced with the ecosystem value of implementing your language in a language lots of others know.

It's a worthwhile investment to implement your language with 4x the code in Python/Java/JS/etc. if you get more than 4x contributions because of it.

wtetzner 11y ago

> It's a worthwhile investment to implement your language with 4x the code in Python/Java/JS/etc. if you get more than 4x contributions because of it.

Maybe, but I'd say it depends on the quality of the contributions.

chubot 11y ago

That's definitely a consideration, depending on your goals. For many language projects, I don't think you need more than one person for the front end. For libraries and code gen, you definitely need contributions.

IMO it's actually advantageous to have a compact description of a lexer, parser, and AST that one person edits or very few people edit. That's your language design.

At one point I thought: "If ML is so great, then why don't you see it more often in the real world?" And then the revelation was that Python essentially uses ML to describe its AST:

http://svn.python.org/projects/python/tags/r32b1/Parser/Pyth...

This is Zephyr ASDL, an ML-inspired DSL for describing ASTs. A lot of ML people may recognize Andrew Appel from the authors list:

https://scholar.google.com/scholar?cluster=11682730813888505...

So my favorite language's AST is described with ML!!! And has been for 10+ years. (Python has a Grammar file as well in a different syntax).
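The node types declared in that ASDL file surface in Python's standard `ast` module, so the ML-like shape can be inspected from Python itself. A small sketch, where `eval_arith` is a hypothetical helper that handles only `+` and `*` on literals:

```python
import ast

# The field names of BinOp mirror its ASDL declaration.
print(ast.BinOp._fields)  # ('left', 'op', 'right')

def eval_arith(node):
    # Dispatch on the node's constructor, much like matching on an ML variant.
    if isinstance(node, ast.Expression):
        return eval_arith(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp):
        left, right = eval_arith(node.left), eval_arith(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Mult):
            return left * right
    raise ValueError(f"unsupported node: {ast.dump(node)}")

tree = ast.parse("1 + 2 * 3", mode="eval")
print(eval_arith(tree))  # 7
```

The chain of `isinstance` checks is the kind of explicit conditional that pattern matching over variants would replace.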

I'm actually interested in a hybrid architecture: an interpreter where OCaml generates the byte code, and then C++ executes it. I think you can produce extremely compact and flexible interpreters with this architecture, and there are several other advantages to dividing it this way.

The downside is perhaps a more complex build process, but the OCaml toolchain is quite nice actually, and I have compiled it from scratch. It's head and shoulders above Haskell in that regard. OCaml can produce .o files and link with C/C++, so you still get one executable.
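To make the division of labor concrete, here is a toy sketch with both halves in Python purely for illustration; the idea above puts the front half in OCaml and the back half in C++. The instruction names `PUSH`, `ADD`, and `MUL`, the nested-tuple input format, and the functions `compile_expr` and `run` are all assumptions made up for this example.

```python
# Front end: flatten a nested-tuple expression into stack bytecode.
def compile_expr(expr, code=None):
    code = [] if code is None else code
    if isinstance(expr, int):
        code.append(("PUSH", expr))
    else:
        op, left, right = expr
        compile_expr(left, code)
        compile_expr(right, code)
        code.append(({"+": "ADD", "*": "MUL"}[op], None))
    return code

# Back end: a plain interpreter loop that knows nothing about the syntax.
def run(code):
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a * b)
    return stack.pop()

program = compile_expr(("+", 1, ("*", 2, 3)))
print(run(program))  # 7
```

The only contract between the two halves is the flat instruction list, which is what would let them live in different languages.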

I looked at your Wren language which I like a lot. I did wish the front end was more high level and not in C, but that's just me :)

There are a lot of people coming around to OCaml. Facebook is using it for Flow and pfff language manipulation:

https://github.com/facebook/pfff (I suspect Google's code analysis tools would be cut down in size by 5x if written like this in OCaml.)

And Hack, the statically typed PHP, is written in OCaml. And there are several more minor languages, like HaXe and another one, that are written in OCaml.

So if you're looking for contributions from language experts, there's definitely a lot that are familiar with OCaml.

arcatek 11y ago

Another really good course for learning the basics of programming language design:

http://nathansuniversity.com/

And if you want to actually build a "real" programming language, I advise you to try LLVM - it really takes away the pain of generating bytecode, and gives you everything you need to deal with the actual design of your language.

fredkelly 11y ago

Interesting read.

Possibly more digestible: I recommend the walkthrough of building "Egg" in the fantastic (and free!) Eloquent JavaScript by Marijn Haverbeke: http://eloquentjavascript.net/11_language.html

tinco 11y ago

Given the domain name, I was hoping they'd first implement a Lisp in 12 lines of JavaScript and then implement the new language in that Lisp. That would've been an interesting tutorial.

I get that it's a beginner's tutorial and they have some constraints, but it starts off with "let's dream up a language" and then presents a super standard language; it's basically just JavaScript with obligatory semicolons.

mishoo 11y ago

... and then it gets continuations, and that's where it becomes interesting.

In any case, the intent was purely didactic, but I did implement a Scheme dialect based on half of that code. To be released some day...

thomasfoster96 11y ago

I'm glad they didn't write a tutorial that made another Lisp - if your audience is JavaScript developers, a Lisp isn't all that appealing usually.

Anyways, this is such a detailed series of tutorials that I may be distracted for days bringing my half-finished programming language back from the dead.

moron4hire 11y ago

I do a lot of JavaScript and Lisp is extremely appealing to me. JavaScript is a functional language, but not a very good one. Lisps are much better at doing functional programming than JavaScript. So a Lisp that translates to JavaScript is right up my alley.

samatman 11y ago

You might enjoy wisp: https://github.com/Gozala/wisp

noiv 11y ago

I remember reading this the first time as a JS rookie. Every time the author mentioned "this is same in JS", a light went on in my head and all the JS quirks I had found earlier became just natural. It still deserves its place in the Christmas tree bookmark section.
