I have a good analogy to explain this. Take this OCaml program.
    let sum_file filename =
      let file = In_channel.create filename in
      let numbers = List.map ~f:Int.of_string (In_channel.input_lines file) in
      let sum = List.fold ~init:0 ~f:(+) numbers in
      In_channel.close file;
      sum
    ;;
This is roughly equivalent to:
    def sum_file(filename):
        with open(filename) as f:
            return sum([int(x) for x in f])
Now think of the degree to which OCaml is awkward here. That's exactly how awkward Python and JS are for writing languages :)
OCaml is based around ML-style typed records, which are exactly what you need for manipulating languages.
I am not a type safety guy, but when you write languages, there are tons of nested conditionals. In Python, JS, or C, you end up with bugs in those corners. It is quite easy to crash any non-trivial parser, like the CPython parser, or Clang's parsers, etc. The type safety that ML offers helps a lot with this.
The code for writing parsers and interpreters is just SHORTER in OCaml than in Python. I used to think Python was the most concise language. But no, it depends on the problem domain. Try it and you'll see.
Someone saying the same thing here:
    sumFile :: FilePath -> IO Int
    sumFile file = sum . map read . lines <$> readFile file
I don't think many people would dispute that Haskell is at least as good as OCaml for writing languages. And strangely enough, the most advanced Perl 6 implementation is even written in Haskell!
That hasn't been true for a few years, but it was true for quite a while. At this point Pugs is fairly out of date and (I believe) abandoned, as its main developer discontinued development. For a long while it was the most advanced Perl 6 compiler though, and from what I understand a lot of its rapid advancement was attributed to it being written in Haskell (Perl 6 and Haskell share a good portion of advanced features, so that may have helped).
The current state of support of the various Perl 6 compilers can be seen here[1].
Do you know how this would look in Haskell?
    import collections

    # Return K most common lines in a file
    def top_k(f, k):
        counts = collections.defaultdict(int)
        for line in f:
            counts[line] += 1
        return sorted(counts.items(), key=lambda x: x[1], reverse=True)[:k]
I was actually looking for the OCaml example which does this. I think it was in "Real World OCaml", and I remember it being horribly ugly compared to Python. I couldn't find it though.
    let sum_file filename =
      with_file filename
        (fold_lines ~init:0 ~f:(fun a l -> a + int_of_string l))
"Using the right tool for the job" when it comes to functional vs. "mainstream" languages is a popular meme, but it doesn't hold up to scrutiny. You can always write your own higher-level code for a given domain with a reasonably expressive language. Adding a useful type system, ADTs, etc. to a language like Python is much, much harder (though that's exactly what Microsoft and Google are trying with JavaScript).
See my other comment -- how does top K lines in OCaml look? I recall that was in the book too, but couldn't find it. I remember it being fantastically ugly.
Hammers in nails with an orange, and this, folks, is how Black and Decker makes better drills than Makita.
The single reason OCaml excels at compilers is pattern matching. It _removes_ all those explicit conditionals by making them implicit.
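To make the "explicit conditionals" concrete, here is a rough sketch in Python of a tiny expression evaluator written without pattern matching. The node classes (Num, Add, Mul) and the evaluate function are made up for illustration and don't come from any real parser. The point is only that each case is a hand-written isinstance check, and a forgotten case surfaces at runtime rather than at compile time.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical AST node types, invented for this sketch.
@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


def evaluate(expr: Expr) -> int:
    # Each branch is an explicit conditional on the node type.
    if isinstance(expr, Num):
        return expr.value
    elif isinstance(expr, Add):
        return evaluate(expr.left) + evaluate(expr.right)
    elif isinstance(expr, Mul):
        return evaluate(expr.left) * evaluate(expr.right)
    else:
        # Nothing checks this chain for exhaustiveness; an unhandled
        # node type only shows up here, at runtime.
        raise TypeError(f"unhandled node: {expr!r}")
```

In OCaml the same function would be a single match over a variant type, and the compiler warns when a constructor is missing from the match.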
Python has excellent affordances for writing compilers, they just have to be used.
Do those Python libraries offer type safety? I don't think there is any way they can. If not, you might as well use OCaml.
As I said, Python is my favorite language, but there is no reason to write OCaml in Python or even shell scripts in Python. Why not use the real thing? The Python code still won't be shorter than OCaml, even if you use these fancy libraries.
At the same time, that has to be balanced with the ecosystem value of implementing your language in a language lots of others know.
It's a worthwhile investment to implement your language with 4x the code in Python/Java/JS/etc. if you get more than 4x contributions because of it.
Maybe, but I'd say it depends on the quality of the contributions.
IMO it's actually advantageous to have a compact description of a lexer, parser, and AST that one person edits or very few people edit. That's your language design.
At one point I thought: "If ML is so great, then why don't you see it more often in the real world?" And then the revelation was that Python essentially uses ML to describe its AST:
http://svn.python.org/projects/python/tags/r32b1/Parser/Pyth...
This is Zephyr ASDL, an ML-inspired DSL for describing ASTs. A lot of ML people may recognize Andrew Appel from the authors list:
https://scholar.google.com/scholar?cluster=11682730813888505...
So my favorite language's AST is described with ML!!! And has been for 10+ years. (Python has a Grammar file as well in a different syntax).
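For anyone who wants to poke at this from Python itself: as far as I know, the node classes in the standard ast module are generated from that ASDL file, so the field names below mirror the ASDL definition of BinOp. A small sketch (the expression string is just an example I picked):

```python
import ast

# Parse a single expression; mode="eval" yields an ast.Expression wrapper
# whose .body is the expression node itself.
tree = ast.parse("1 + 2 * 3", mode="eval")
root = tree.body

# The top-level node is a BinOp whose operator is Add.
print(type(root).__name__)      # BinOp
print(ast.BinOp._fields)        # field names of the BinOp node
print(type(root.op).__name__)   # Add

# Full dump of the tree; the exact text varies between Python versions.
print(ast.dump(root))
```

The right-hand operand is itself a BinOp (the multiplication), which is the kind of recursive sum-of-products structure ML data types are built for.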
I'm actually interested in a hybrid architecture: an interpreter where OCaml generates the bytecode, and then C++ executes it. I think you can produce extremely compact and flexible interpreters with this architecture, and there are several other advantages to dividing it this way.
The downside is perhaps a more complex build process, but the OCaml toolchain is quite nice actually, and I have compiled it from scratch. It's head and shoulders above Haskell in that regard. OCaml can produce .o files and link with C/C++, so you still get one executable.
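A toy sketch of that split, written entirely in Python just to show the shape: a front end that flattens an expression tree into a list of instructions, and a separate loop that only knows how to execute them. In the architecture described above the first half would be OCaml and the second C++. The instruction names (PUSH, ADD, MUL) and the tuple-based tree format are invented for this example.

```python
# Front end: compile a nested-tuple expression into flat stack bytecode.
# Tree format (hypothetical): an int, or (op, left, right) with op "+" or "*".
def compile_expr(expr):
    if isinstance(expr, int):
        return [("PUSH", expr)]
    op, left, right = expr
    opcode = {"+": "ADD", "*": "MUL"}[op]
    # Post-order: emit both operands first, then the operator.
    return compile_expr(left) + compile_expr(right) + [(opcode, None)]


# Back end: a stack machine that knows nothing about the source language.
def run(code):
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


program = compile_expr(("+", 1, ("*", 2, 3)))
print(program)
print(run(program))  # 7
```

The only thing the two halves share is the instruction format, which is what would let them live in different languages.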
I looked at your Wren language which I like a lot. I did wish the front end was more high level and not in C, but that's just me :)
There are a lot of people coming around to OCaml. Facebook is using it for Flow and pfff language manipulation:
https://github.com/facebook/pfff (I suspect Google's code analysis tools would be cut down in size by 5x if written like this in OCaml.)
And Hack, the statically typed PHP, is written in OCaml. And there are several more minor languages like HaXe and another one that are written in OCaml.
So if you're looking for contributions from language experts, there are definitely a lot who are familiar with OCaml.
And if you want to actually build a "real" programming language, I advise you to try LLVM - it really takes away the pain of generating bytecode, and gives you everything you need to deal with the actual design of your language.
Possibly more digestible, I recommend the walkthrough of building "Egg" in the fantastic (and free!) Eloquent JavaScript by Marijn Haverbeke: http://eloquentjavascript.net/11_language.html
I get that it's a beginner's tutorial and they have some constraints, but it starts off with "let's dream up a language" and then presents a super standard language. It's basically just JavaScript with obligatory semicolons.
In any case, the intent was purely didactic, but I did implement a Scheme dialect based on half of that code. To be released some day...
Anyway, this is such a detailed series of tutorials that I may be distracted for days bringing my half-finished programming language back from the dead.