Java seems to fit this role very well. It is statically typed, object-oriented, and doesn't delve into memory. However, it seems to get a lot of hate (or, at least, dismissal) from many programming communities, so I am asking, why not Java? Why is it so horrible as a systems language above C? Is there any other language that fits this role in a better way?
I am asking in particular because I have been banging my head against the Python syntax for a while, but I am trying to expand what languages I can program in.
The hate against Java comes from using Java for application development: this is largely due to the kinds of applications that are typically written in Java (line of business software) and (this is the most important reason) accidental complexity and low quality of APIs like Spring or J2EE.
The recipe for programming happiness is to use the right tool for the job:
* Python (or Ruby) for web application development, development tools, and "devops" scripting.
* C (or C++) for pieces that need deterministic performance[1], provide a "native" feeling user interface, or require control over memory layout.
Note: performance and efficiency are relative to what your throughput and latency requirements are. Google's crawlers and indexers will remain in C++ for the foreseeable future, but (for example) crawlers for an intranet can get away with being in Java (or Python for that matter).
* Java (or Scala, Haskell, OCaml, Go, Erlang, or one of the many Lisps) for "userland" systems programming. If the majority of the system fits under the last bullet point, use C++.
* Avoid JNI or Swig if you can. Use JSON + REST for cross-language RPC. If you need performance guarantees of a tight binary protocol use Thrift or Protocol Buffers. If you have to use JNI, consider using JNA first.
* No matter what language you use, stick to high quality libraries and tools. For Java, you'll absolutely want to use guava, Guice, and either Netty (or NIO.2 if you are using Java 7) or Jetty + Jersey + Jackson (for REST APIs).
Pick up either emacs and cscope, netbeans, Eclipse, or IntelliJ for navigating a large Java codebase.
All Java build tools suck. Maven sucks less and is the de-facto standard in the open source community. Twitter's "pants" is also worth looking at.
* Don't touch Spring with a 60-foot pole: in the mildest terms it's unequivocal and absolute garbage. Ditto for any other buzzword you may see in a job listing for an "enterprise" Java development job (with 20 years of experience required, naturally).
[1] Java performance can be quite high, but a JIT-ted and garbage collected runtime implies a lack of determinism.
However, I also think it is a biased caricature based on "common sense" that isn't all that well founded.
Most of these languages are entirely usable in areas far outside the prescribed areas you have given for them.
If you are working on a few domains like embedded or OS stuff, low-level graphics or signal processing - or you need to interact with a specific system that is pretty language-specific like Rails or iOS - then your options narrow a lot. A few tasks are just a little forced unless your level of comfort is very high (doing 1-minute shell script jobs in C++, for example).
But it would be very hard to overstate the degree of overlap, in 2012. It no longer usually makes sense to write things in ASM for speed, for example...
In the rare situations where your favorite higher-level language is somehow not good enough for a given project, it is rare that you cannot make a mongrel project which drops to another level just where necessary. If your language does not support this then it is broken in a generally important way.
If you are not really ready or willing to fill in gaps and just want to glue existing things together, that changes things slightly - then your primary consideration is not the language but the available libraries.
The major differences between languages are mostly matters of custom and ideology rather than niche suitability.
Of course I can't a priori prove that Spring is garbage, much like I can't a priori prove that it's better to be healthy and rich than to be poor and sick. It is a judgement call, but a judgement call that I believe I'm qualified to make, having worked with a large Spring codebase for 2.5 years.
Additionally, everything you can do with Spring, you can do with JEE.
The right tool for the job. Java has its place, and it is just where strlen said it should be.
But, Java's a bit verbose, has gaps in concise support for higher-level constructs, and sometimes the static typing gets in the way. So if you don't find those parts helpful -- some do -- and think your performance targets can be met with other later optimizations/design-choices/selective-reimplementations, stick with whatever more concise language you're good at.
Or, use any of the more concise languages available on the JVM allowing intermixing of the occasional Java facility, like Jython, JRuby, Groovy, Javascript, Scala, Clojure, and others.
(If efficiently handling massive numbers of concurrent net/IO streams is a priority, the recent JVM-based project vert.x may be of interest. I haven't used it for anything but toy tests, but it seems to combine some of the best-practices for maximum JVM IO throughput with a somewhat higher-level-language-agnostic top layer well-suited for servers/proxies/crawlers.)
Although our use of Java as an implementation language was somewhat controversial when we began the project, we have not regretted the choice. Java’s combination of features — including threads, garbage collection, objects, and exceptions — made our implementation easier and more elegant. Moreover, when run under a high-quality Java runtime, Mercator’s performance compares well to other web crawlers for which performance numbers have been published.
source: [Mercator: A scalable, extensible web crawler (1999)](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.5...)
This is probably a consequence of the verbosity of Java-the-language, which made heavy tooling support a necessity. And then came Eclipse, which provides one of the tightest language integrations with Java of any IDE ever.
The sad thing is that this is not really the fault of Java-the-language or Eclipse. It did spawn a whole caste of very mediocre programmers and libraries though, which can make for a very unpleasant culture.
Used correctly, Java can be a great tool, though.
The idea is that as a programmer, you have to have an intimate understanding of what is going on in order to make the machine do your bidding quickly and correctly.
But that mediocre Eclipse user I caricatured does not have that understanding. He certainly knows how to get the job done for a certain set of tasks, but he does not know the details of how this is happening. Thus, he creates programs that follow "best practices", "conventions", "design patterns" and lots of automatically created wizard-boilerplate.
That might not be "bad code" mind you, but it almost certainly is not "great code", either. Thus, mediocre. And then these people create libraries that are mediocre and try to use only libraries that they can understand and that are hence mediocre. A culture emerges that is very consistent, but also very mediocre.
If you want to use Java (e.g.: you know it already and don't like learning other things), who cares? Why is this an issue where you have to challenge other people's opinions of Java? Use it if you want to.
Another approach could be Jython (or any other JVM language closer to the desired level of abstraction) and Java.
I don't have much love for Java the language. It's not much easier to program than with C, isn't faster and is very verbose. Still, what you are doing looks like a good match for it. And all the respect I don't have for the language, I have for the JVM.
I wouldn't use it for web app development as there are much more productive options around.
Java:
    List<String> firstNames = new ArrayList<String>();
    for (Person p : people) {
        firstNames.add(p.getFirstName());
    }
    addCallback(new Runnable() {
        public void run() {
            doSomething();
        }
    });
Python:
    first_names = [p.first_name for p in people]
    add_callback(do_something)
Scala:
    val firstNames = people.map((p) => p.firstName)
    addCallback(() => doSomething())
The Python and Scala versions do exactly what they say, while the Java code has a bunch of boilerplate that you have to mentally filter out before you can understand what it's doing. And the Scala code is fully typesafe; the compiler infers types rather than making you continually repeat them.
I also forgot about file I/O, which I don't do much of in web applications.
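For comparison, Java 8 (released after this thread) added lambdas and streams, which remove most of the boilerplate in the Java snippet above. This is only a minimal sketch, assuming Java 8 or later; the `Person` class, `addCallback`, and `doSomething` here are hypothetical stand-ins for the ones the snippet implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FirstNamesDemo {
    // Hypothetical stand-in for the Person type used in the snippet above.
    static class Person {
        private final String firstName;
        Person(String firstName) { this.firstName = firstName; }
        String getFirstName() { return firstName; }
    }

    // Hypothetical callback registry; it only stores the callbacks.
    static final List<Runnable> callbacks = new ArrayList<>();
    static void addCallback(Runnable r) { callbacks.add(r); }

    static void doSomething() { System.out.println("did something"); }

    // Equivalent of the for loop: map each person to a first name.
    static List<String> firstNames(List<Person> people) {
        return people.stream()
                     .map(Person::getFirstName)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(new Person("Ada"), new Person("Alan"));
        System.out.println(firstNames(people));

        // A method reference replaces the anonymous Runnable class.
        addCallback(FirstNamesDemo::doSomething);
        callbacks.forEach(Runnable::run);
    }
}
```

The types are still checked by the compiler, but they are inferred at the call sites rather than spelled out, which is close to what the Scala version does.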
One can write concurrent systems in Java without understanding concurrency. Languages like Scala and Clojure will give you some freedom but will also enforce certain design principles which will save you.
Similarly for web development, there are scores of frameworks in the Java world, and you can mess it up easily. Rails / Django on the other hand will provide one good, solid way to do web programming.
Finally, Java is showing its age. The need to write large files of XML to configure things and the lack of ability to treat functions as objects put developers off. Some things are being addressed by Oracle but will take time.
Seriously, there is a fairly direct translation from any Java you might want to write to completely equivalent Python. Sure, Python offers more complex techniques such as list comprehensions and iterators. But you don't need to use them. You can just write Java-like Python.
* First class functions (interfaces with one method) plus garbage collector eventually encourage a functional programming style, with lots of little objects created on the heap. Alas, the per-object memory overhead of popular Java implementations is horrendous.
* Strong emphasis on using threads for concurrency. Alas, in practice, threads are incredibly large memory hogs.
* Verbosity. While it is possible to write clean composable code in Java, it is also remarkably verbose. After a while, this gets old and people take all shortcuts they can to limit verbosity. Which is a very bad idea. To quote an esteemed colleague, "I never took a shortcut I didn't regret later". Can we have our lambdas yet, pretty please?
* My beef with threads for concurrency revolves not around memory footprint (can you substantiate threads as "memory hogs"?), but instead around the necessity to be mindful of resource sharing. Yes, the JDK gives you lots of useful tools in this quest, but it's still not all that difficult to end up with a deadlocked app.
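The deadlock risk mentioned above typically appears when two threads take the same two locks in opposite order. Below is a minimal sketch of one common mitigation, acquiring locks in a fixed global order. The `Account` class, its `id` field, and `transfer` are hypothetical names chosen for illustration, not anything from the thread.

```java
class LockOrderingDemo {
    static class Account {
        final int id;          // only used to define a global lock order; assumed unique
        private long balance;
        Account(int id, long balance) { this.id = id; this.balance = balance; }
        synchronized long getBalance() { return balance; }
    }

    // Always lock the account with the smaller id first, so two opposite
    // transfers can never each hold one lock while waiting for the other.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        // Two threads transferring in opposite directions at the same time.
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(a.getBalance() + " " + b.getBalance());
    }
}
```

If `transfer` instead locked `from` and then `to`, the two threads in `main` could each grab one lock and wait forever for the other. Nothing in the language stops you from writing that version, which is the point of the comment.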
Notice, though, that competent people have done great jobs using these languages. So you have some choices. Two of them are: wonder why people bash Java or go do something useful with it. I suggest you do the second.
The key to using programming languages is in trying to use the one which will help you the most, or get in your way the least. Sort of "the right tool for the job". Idk what jobs java is good at. If you found out that it's good for your project, then use it.
Take a look at this article: http://prog21.dadgum.com/143.html
People use all sorts of distinctly sub-optimal tools and technologies for various reasons unrelated to those technologies' merits. One of the biggest reasons is familiarity--many people do not like learning radically new things and so stick to what they know. Popularity does not imbue any sort of quality to programming languages any more than it does to anything else like music. There's a reason that trained musicians respect classical music--even if they're making pop--and there's a reason programming language people respect ML.
In short: just because many people manage to use Java does not mean it is in any sense optimal or even good.
Also, I think the oft-repeated "right tool for the right job" bromide about programming languages is deeply flawed. Programming languages overlap far more than most tools--they are all general-purpose programming languages, after all. The difference between a hammer and a screwdriver is far greater than even the difference between Java and Haskell. Choosing a programming language is more like choosing the best power drill--they overlap almost completely and can do the same jobs. It's quite plausible that some are almost always better than others, but that you could ultimately do the job with either. It will just be more difficult with one than the other.
Also, even if languages did differ significantly, there is no guarantee that any particular language has anything it's best at--it can be strictly worse than other languages for every conceivable use.
Finally, I think that surrendering to familiarity and choosing something you know over something you need to learn is rarely a good choice. Sure, if you have a hard deadline, it might be a reasonable compromise. But learning a language is essentially a constant expense, whereas its effect on your productivity is linear in how much you program. Just because it might take more effort to get started with Scala does not mean you should immediately consign yourself to the drag on productivity that is Java.
You should be learning something new all the time, and programming languages are some of the most important things to learn in CS--they affect not only what you write but how you think. So strive to find the best one you can rather than settling for something that works--in this day and age, expecting your language to be somewhat usable is too low a bar to set.
Nobody is saying the contrary. I suggested one of the many options he has (of which I mentioned two), which is using java. And I said that using java is better than going around looking for why people bash java. Which happens to be true.
Consider how he would choose a better language though.
It's surprising how cumbersome it may get to write a simple loop in a more "elaborate" language.
> Also, I think the oft-repeated "right tool for the right job" bromide about programming languages is deeply flawed. Programming languages overlap far more than most tools--they are all general-purpose programming languages, after all. The difference between a hammer and a screwdriver is far greater than even the difference between Java and Haskell. Choosing a programming language is more like choosing the best power drill--they overlap almost completely and can do the same jobs. It's quite plausible that some are almost always better than others, but that you could ultimately do the job with either. It will just be more difficult with one than the other.
There is more to choosing a language than "the language". Here are some reasons why he may wanna use java:
1) He has books about java, but not about anything else.
2) His co-workers use java.
3) He needs to work with the JVM.
4) He really wants to use java.
5) He has a bunch of minor reasons to use java.
6) Libraries, Libraries, Libraries.
7) Development tools.
8) Good implementations.
9) There is a standard for the language.
10) There is a huge community around the language.
11) He doesn't really have a choice. His boss wants him to use java.
12) He's in academia and most people in his institute use java.
(there are more!)
The "right tool for the job" sure is true. And very much true. It is complicated to select it though. That's why I didn't try to tell him HOW to select that tool. I really think he'll be better off with java if he has seen that java is good enough (another point is that a "known good enough" is usually better than an "unknown perfect").
We choose "a language" but rarely because of "the language". As you said yourself, a lot of times, these languages are general purpose languages and overlap a lot.
Have you ever noticed that a lot of the languages used today are really tied up to their implementations/running systems? C and UNIX, Java and JVM, C# and CLR, Python and CPython, PHP, Ruby, Objective-C (this one is a really good example), JavaScript, etc. This was true in the past too, think of delphi, vb6 and windows, lisp and the lisp machines, fortran and cobol and the IBM systems.
A lot of what was "language design" in the past, is libraries and implementation design today. Choosing a language is more than looking at "the language", its syntax, semantics, idioms, patterns, etc.
There is other important stuff too: building a GUI, communicating over a network, accessing files, dealing with the database, doing graphics programming. You can either use, for example, C# and get lots of these from the .NET CLR with little effort, or pick OCaml, for example, and get them too, but with a lot of work that you would not have to do in C#. Even if you port OCaml to run on the CLR, it's unlikely to be as CLR-friendly as C#.
But there are also "local" reasons for choosing a language. These are things you or I don't know because they are specific to him or his group of people.
> You should be learning something new all the time, and programming languages are some of the most important things to learn in CS--they affect not only what you write but how you think. So strive to find the best one you can rather than settling for something that works--in this day and age, expecting your language to be somewhat usable is too low a bar to set.
Learning new stuff is good advice, generally. But programming languages are one of the most irrelevant things in CS. In the long run, they are irrelevant.
"We" already teach/learn a lot of programming concepts and techniques without specific programming languages, but with concepts shared by a class of programming languages (as you even said, lots of them overlap). These are concepts and techniques that, in the past, were highly specific to particular programming languages.
Languages get obsolete. Those which remain do so usually because of practical matters (like C, or Java, or C++).
The idea, sometimes new, that a programming language may bring is important though. The language itself is not. For example, closures are really catching on now, but they were invented long ago and first implemented in languages that people do not use much (I guess it was scheme, but I am not asserting it).
And, so I can end this reply...
I'm sorry for the arrogance, but you should not lose the ability to separate the interesting from the practical, which I got the impression you cannot do very well. Some things are both (I guess haskell is one of these), but it's not usually the case.
What some people (fortunately, it doesn't seem that it's most of them) don't understand is that lots of programming does not require the sort of elaborate constructs and idioms that, for example, scheme allows you to use. I once talked to a guy who did lots of "business" software. I mentioned scheme to him. Told him lots of cool stuff about recursion, closures and macros, told him a little about lambda calculus; showing how you could do it in scheme. And he told me "It's cool, but it's also like people don't know what is useful anymore.". Well, that got me thinking back then.
You can argue all you want about whether he's right or not, or whether the software he writes is difficult or not, but it's not that he didn't see the advantage of those things. But it turns out that most repetitions do not require and are not good with recursion (they just loop over a collection or a range of values; a for+iterators or numbers would usually do), most functions do not return other functions (not that you couldn't do it that way, but it's usually the case that your program is simpler if you don't), and minimalism is not really that convenient in writing software for $$ (many people seem to reach this conclusion). Lots of applications are still single threaded, and run in only one process. Immutability is a lot more interesting in theory than in practice for a large class of programs. Static typing still catches a lot of problems, and people usually do not bother that much about having to write down the types. Beautiful techniques for managing large programs are very interesting ... for large programs. It turns out that lots of programs are not that large. And the list goes on and on.
It's not "horrible", it just has many slight-to-moderate deficiencies and annoyances that make development more work than it should be.
> Is there any other language that fits this role in a better way?
Scala is strictly superior when used as a "better Java". (If you go deep into its functional capabilities you get a different set of tradeoffs). C# is better as a language, but then you're tied to .NET.
Really we'd need to know more details of what you're doing and why you believe Python may not work. Are you concerned about performance, or do you need to do things that Python doesn't have convenient APIs for?
I think most people on HN who hate Java are talking about creating websites, and for good reason. Back in the bad ol' days, people would use Java frameworks like Struts for web apps, and it was quite painful.
For my latest project I'm using Play Framework for front-end Java, and it's quite delightful.
I am a fan of lots of languages, but recently, for anything I am supporting for a long period of time, I want static typing to catch massive refactoring issues. I'd rather use C# than Java personally, but Play is an amazing Java framework to use.
The best reason to AVOID using Java is the huge demand for Java programmers and the low supply. At my job we can barely find applicants with Java so we end up hiring .NET people and converting them.
Out of interest, how do these guys find the switch to Java from C#, assuming they're not VB.NET guys?
Python is very powerful in terms of string manipulation because it has very good language constructs (like slice syntax) which make development easy. At the beginning it might be a little bit confusing, but once you have mastered it you really feel the power.
Frameworks like Twisted also do a good job here. Twisted is well designed and asynchronous, and it is well suited to multi-tier network applications.
You can safely ignore the people who bash Java - they are generally clueless. The Java language is perfectly fine: high performance, statically typed, OOP, relatively simple and maintainable. It may not offer the most concise code and it may not have all the "trendy" language syntax features but guess what - that actually doesn't matter much in the real world (i.e. outside the realm of language designers and fanboys). If saving a few characters of typing is your major concern when choosing a language, you have much bigger problems.
But the real strength in Java is not the language but rather the overall platform - the combination of the JVM (which is an amazing high performance feat of engineering), the library ecosystem (which is the best overall for any language), the tools (great IDEs, Maven, a host of other developer-focused tools), the fact that the OpenJDK itself and most of the libraries are open source and the portability (compiled JVM code is extremely portable, and importantly doesn't need a recompile unlike some other so-called "cross-platform" languages)
So overall you can't really go wrong with choosing Java for server side applications. Although I would also give Clojure or Scala a look - if you are after "powerful" languages then these two are pretty amazing and you still get all the benefits of being on the Java platform.
Your tirade against "fanboys" is nothing but a straw man--the point of having more expressive, concise code is not "saving a few characters of typing" but making your program easier to write and easier to read (and, therefore, easier to maintain). Sure, you can get stuff done with Java, but you can generally get it done faster and better with other languages.
High-level features absolutely matter in the real world--they allow you to write code faster and give you more confidence that it is correct. Code written at a higher level is not only shorter but also more declarative and clearer. The idea that only "language designers and fanboys" care about having these features in their languages is patently absurd and rather arrogant.
To me, it seems Java is a compromise--it ignores decades of research and progress in programming language design in favor of catering to people who knew C++ and didn't want to learn something radically different. Thanks to being widely taught, it is now essentially a lowest common denominator: practically any programmer you meet will have at least learned the basics of Java at some point. But I think this is exactly the sort of compromise any good programmers should not take!
Now, the JVM is, admittedly, a good platform. It has some glaring weaknesses--poor support for functional programming, poor interoperation with native code, long start-up time and so on--but, on the whole, is very strong. Happily, you aren't bound to Java if you want to be on the JVM and you can use some of the great alternatives like Scala. But this does nothing to defend Java-the-language--having a good implementation does not make a language well-designed or particularly usable.
That being said, it doesn't really matter what language you write your crawler in: its performance will much sooner be influenced by other aspects (network latency, storage, etc) than the language you choose.
So pick the language you're most comfortable with for crawling and offload the data processing to a lower level language that is better suited for that task.
webInfo = {url: "bla.bla", title: "bla die blub", links: ["link1", "link2"]}
Notice that webInfo contains two different types, Strings and Arrays. In Java arrays or hashes you cannot easily mix types - you'll end up just putting objects everywhere, then be forced to litter the code with type casts. Or you create an unwieldy class hierarchy. That is my prediction, anyway - I am too lazy to come up with a good example :-(
You can also not simply write something like the hash above. The nearest you can get is if you have created that class hierarchy with suitable constructors, you could instantiate that in one go. At least that is my memory - I have now avoided it for so long that I am not even sure how to instantiate an Array or a Hash with data on the fly anymore.
I think instantiating an array with data goes something like
links = new String[]{"bla", "blub"}, and there is nothing like that for Hashes - you are stuck with
Map<String, Object> info = new HashMap<String, Object>(); // generics are particularly ugly and annoying
info.put("links", new String[]{"bla", "blub"});
info.put("title", "some stupid web site");
info.put("url", "undisclosed");
And so on - a far cry from the example above. (Note the Java syntax is probably wrong, created from memory - but it is something like that).
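Made compilable, the snippet above looks roughly like this. It is only a sketch of the approach the comment describes: the `info` map and its keys come from the comment, while the class and method names are hypothetical. Reading the values back requires the casts the comment predicts.

```java
import java.util.HashMap;
import java.util.Map;

class WebInfoMapDemo {
    // Build the mixed-type map; values are typed as Object because they differ.
    static Map<String, Object> buildInfo() {
        Map<String, Object> info = new HashMap<String, Object>();
        info.put("links", new String[] {"bla", "blub"});
        info.put("title", "some stupid web site");
        info.put("url", "undisclosed");
        return info;
    }

    public static void main(String[] args) {
        Map<String, Object> info = buildInfo();
        // Each read needs a cast, and a wrong cast only fails at run time.
        String title = (String) info.get("title");
        String[] links = (String[]) info.get("links");
        System.out.println(title + " has " + links.length + " links");
    }
}
```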
Even if you went through the mind numbing work of creating appropriate classes, you'd be stuck with
info = new WebInfo(title, url, new String[]{link1, link2,...});
And that is just for two different types, and notice that there is no way to see what the name of the parameters of the WebInfo constructor actually are from that snippet of code.
title: someTitle
is actually much more readable because you can instantly see that someTitle is supposed to be a title.
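One common Java workaround for the missing parameter names is a builder, which puts the names at the call site at the cost of even more boilerplate in the class itself. A minimal sketch: `WebInfo` is the class name from the comment, but the builder and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class WebInfo {
    final String title;
    final String url;
    final List<String> links;

    private WebInfo(Builder b) {
        this.title = b.title;
        this.url = b.url;
        this.links = Collections.unmodifiableList(b.links);
    }

    static Builder builder() { return new Builder(); }

    // Each setter returns the builder so that calls can be chained.
    static class Builder {
        private String title = "";
        private String url = "";
        private List<String> links = Collections.emptyList();

        Builder title(String title) { this.title = title; return this; }
        Builder url(String url) { this.url = url; return this; }
        Builder links(String... links) { this.links = Arrays.asList(links.clone()); return this; }
        WebInfo build() { return new WebInfo(this); }
    }

    public static void main(String[] args) {
        // The field names are now visible where the object is created.
        WebInfo info = WebInfo.builder()
                .title("bla die blub")
                .url("bla.bla")
                .links("link1", "link2")
                .build();
        System.out.println(info.title + " " + info.url + " " + info.links);
    }
}
```

This makes the call site readable, but it arguably reinforces the complaint: the class needs roughly twenty extra lines just to recover what `title: someTitle` gives for free.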
Also if you want to use NoSQL, I suspect converting java classes to JSON could be a pita, too.
Also beware of pseudo work: I suspect Java is partly popular because it makes you feel productive. You are constantly busy creating Tiny Types (as you call them), generating code in Eclipse (cool: one click and you have 50 lines of code in your class) and so on. It is all just pseudo work that accomplishes nothing, but maybe feeling productive is worth it.
ImmutableMap.Builder gives you almost syntax-free hash map creation.
You can use constructors in a loop or Lists.transform for data-driven construction. Not exactly like JavaScript, but it gets close.
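The two helpers named above come from Guava, a third-party library, so they are not shown here. As a rough stand-in that uses only the JDK, and assuming Java 9 or later, `Map.of` and `List.of` give literal-like immutable collections, and a stream `map` covers the data-driven construction. The class and method names below are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LiteralCollectionsDemo {
    // Immutable map with mixed value types, close to the JavaScript-style literal.
    static Map<String, Object> webInfo() {
        return Map.of(
                "url", "bla.bla",
                "title", "bla die blub",
                "links", List.of("link1", "link2"));
    }

    // Data-driven construction: turn each URL into a small map.
    static List<Map<String, Object>> fromUrls(List<String> urls) {
        return urls.stream()
                .map(u -> Map.<String, Object>of("url", u))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(webInfo().get("title"));
        System.out.println(fromUrls(List.of("a.example", "b.example")).size());
    }
}
```

The values are still typed as `Object`, so reading them back needs the same casts as with a plain `HashMap`; only the construction gets shorter.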
Unless you're building something that needs to be (1) highly dynamic (like a web-based spreadsheet where you don't know the column types until run-time), or (2) true real-time software, you're probably better off using java. Some libraries do suck as others wrote, but it's the volume of good libraries you care about. In any case, I'd argue that in many alternate languages, the code you're writing so quickly doesn't need to be written at all in java, because there's a library for it.
Verbosity is a fact in Java, but a decent IDE shields you from that as well. With Java it takes a little longer to get things done, but (in my experience) you spend less time working on performance, fixing problems in the underlying tools or language, or just dealing with your own bugs and keeping things running. Since most development is maintenance, you want to optimize for that.
You should probably check existing web crawler solutions to see if you can adapt them before rolling your own.
We may say that with the current crop of languages running on the JVM, Java is a low-level language. It is to the JVM what C is to hardware. You avoid coding in both when you have higher-level languages available which will make you more productive.
But when you want to optimize performance on the JVM for specific chunks of your application - without resorting to JVM bytecode of course - Java is the right choice.
A good example of when you can't use Java would be embedded projects or projects where you need to squeeze even more performance out of your code. Java is fast but it isn't the fastest.
Feel free to love your language of choice but I'd recommend not letting it get in the way of common sense.
Can you show some benchmarks comparing Java with JIT against other languages?
I love another language, but Java is the better choice.
In a case like a web crawler, the main issue with Java is scalability, or rather the lack of it. You need to code it yourself, but that's no different from other languages and platforms.