Chris said:
> I've seen a few posts, columns and articles which state that one of the
> advantages of Python is that code can be developed x times faster than
> languages such as <<Insert popular language name here>>.
> Does anyone have any comments on that statement from personal
> experience?
I had to work at a laboratory a few years ago which used Java
exclusively. I was coming from several years as a graduate student
using Python almost exclusively for my own work. (But I used to teach
introductory Java classes at my previous university, so I had plenty of
Java experience.)
My own work and the work that I did for the lab were quite similar,
mainly focused on training machine learning models on natural language
processing tasks. I estimated that the Java code took me about 5x as
long. Part of this is the verbosity of Java, e.g. where you have to
write an anonymous inner class instead of using a function or a class
object directly. But probably a larger part of this was using the Java
libraries, which tend to be way over-engineered, and more complicated to
use than they need to be.
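On the anonymous inner class point, here is a minimal Python sketch of what using a function or a class object directly looks like. The names (``by_length``, ``tokens``) are invented for illustration only:

```python
from collections import defaultdict


def by_length(token):
    """Sort key: the length of a token."""
    return len(token)


tokens = ["indexing", "a", "document"]

# A plain function is passed directly as the sort key; no wrapper
# class or interface implementation is needed.
ordered = sorted(tokens, key=by_length)

# A class object (int) is passed directly as the default factory.
counts = defaultdict(int)
for token in tokens:
    counts[len(token)] += 1
```

In Java at the time, the ``sorted`` call would typically have needed an anonymous ``Comparator`` implementation instead of a bare function.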
A simple example from document indexing. Using Java Lucene to index
some documents, you'd write code something like::
    Analyzer analyzer = new StandardAnalyzer();
    // true = create a new index in store_dir
    IndexWriter writer = new IndexWriter(store_dir, analyzer, true);
    for (Value value: values) {
        Document document = new Document();
        Field title = new Field("title", value.title,
                                Field.Store.YES,
                                Field.Index.TOKENIZED);
        Field text = new Field("text", value.text,
                               Field.Store.YES,
                               Field.Index.TOKENIZED);
        document.add(title);
        document.add(text);
        writer.addDocument(document);
    }
    writer.close();
Why is this code so verbose? Because the Lucene Java APIs don't like
useful defaults. So for example, even though StandardAnalyzer is
supposedly *Standard*, there's no IndexWriter constructor that includes
it automatically. Similarly, if you create a Field with a string name
and value (as above), you must specify both a Field.Store and a
Field.Index - there's no way to let them default to something reasonable.
Compare this to Python code. Unfortunately, PyLucene wraps the Lucene
APIs pretty directly, but I've wrapped PyLucene with my own wrapper that
adds useful defaults (and takes advantage of things like Python's
**kwargs). Here's what the same code looks like with my Python wrapper
to Lucene::
    writer = IndexWriter(store_dir)
    for value in values:
        document = Document(title=value.title, text=value.text)
        writer.addDocument(document)
    writer.close()
Gee, and I wonder why it took me so much longer to write things in Java. ;-)
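For anyone curious how a wrapper like that can supply defaults, here is a minimal sketch, not the actual wrapper. The ``Raw*`` classes are hypothetical stand-ins for a verbose low-level API (they are not the real PyLucene classes and write nothing to disk), and the default values ``"YES"``, ``"TOKENIZED"`` and ``"standard"`` are placeholders chosen for illustration:

```python
# Hypothetical stand-ins for a verbose low-level API.
# These are NOT the real PyLucene classes; they only record their arguments.
class RawField:
    def __init__(self, name, value, store, index):
        self.name, self.value = name, value
        self.store, self.index = store, index


class RawDocument:
    def __init__(self):
        self.fields = []

    def add(self, field):
        self.fields.append(field)


class RawIndexWriter:
    def __init__(self, store_dir, analyzer, create):
        self.store_dir, self.analyzer, self.create = store_dir, analyzer, create
        self.documents = []
        self.closed = False

    def addDocument(self, document):
        self.documents.append(document)

    def close(self):
        self.closed = True


# Thin wrappers that add defaults on top of the stand-ins.
class Document(RawDocument):
    """Each keyword argument becomes one field with default settings."""

    def __init__(self, store="YES", index="TOKENIZED", **fields):
        super().__init__()
        for name, value in fields.items():
            self.add(RawField(name, value, store, index))


class IndexWriter(RawIndexWriter):
    """The analyzer and create flag default to something reasonable."""

    def __init__(self, store_dir, analyzer="standard", create=True):
        super().__init__(store_dir, analyzer, create)


writer = IndexWriter("example_index")
writer.addDocument(Document(title="Hello", text="Hello world"))
writer.close()
```

One trade-off of this design: because ``store`` and ``index`` are ordinary keyword parameters, a document field cannot itself be named ``store`` or ``index`` without a different calling convention.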
STeVe