Mohd Hanafiah Abdullah wrote:
[ ... ]
> You can see:
> http://www.tiobe.com/tpci.htm
> for an index on programming languages in terms of popularity.
This appears highly suspect to me. Ignoring C and C++ for the moment,
they show Verilog as being behind VHDL -- yet even the staunchest
advocates of VHDL generally agree that Verilog has a much larger market
share (probably 5 times as large).
Ultimately, they're measuring one thing, but reporting it as something
else entirely. What they're measuring seems to be mostly the number of
_posts_ about a language. Unfortunately, they're _reporting_ this as
indicating the number of lines of code written in that language.
TTBOMK, nobody has ever shown a direct relationship between the two. At
least IME, the relationship is mostly inverse -- when I'm cranking out
a lot of code, I rarely have time to post much.
Likewise, revisions to a language (contemplated or recent) tend to
generate a great deal of discussion. The code is written later, after
the language is (again) well understood and the discussion has largely
died down.
Personally, I think we can do quite a bit better by searching for
strings that are likely to occur once (and only once) per source file
to get at least a vague notion of the number of files in each language:
string                    hits
"import java.io"          902,000
"include <stdio.h>"       879,000
"include <iostream>"      404,000
"include <iostream.h>"    154,000
[Note: these numbers varied over even a short period of time -- maybe
Google was doing a crawl, or updating results from its last one, while
I was doing the searches. If you re-do the searches, expect results to
vary, but only slightly, at least if you re-do them soon.]
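To put the table above in relative terms, here is a minimal Python sketch. It assumes (my assumption, not something the searches establish) that each hit is one source file, that both iostream strings indicate C++, and that no file contains both. The names `hits`, `language_of` and `shares` are made up for illustration.

```python
# Hit counts copied from the table above.
hits = {
    '"import java.io"': 902_000,
    '"include <stdio.h>"': 879_000,
    '"include <iostream>"': 404_000,
    '"include <iostream.h>"': 154_000,
}

# Assumption: which language each search string indicates.
language_of = {
    '"import java.io"': "Java",
    '"include <stdio.h>"': "C",
    '"include <iostream>"': "C++",
    '"include <iostream.h>"': "C++",
}

def shares(hits, language_of):
    """Return each language's fraction of the total hits."""
    totals = {}
    for string, count in hits.items():
        lang = language_of[string]
        totals[lang] = totals.get(lang, 0) + count
    grand_total = sum(totals.values())
    return {lang: count / grand_total for lang, count in totals.items()}

for lang, share in shares(hits, language_of).items():
    print(f"{lang:5s} {share:6.1%}")
```

On these numbers, Java and C each come out a little under 40% and C++ a little under a quarter. Since C++ files can also include <stdio.h>, the C figure is probably somewhat overstated.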
This has to be done carefully to produce meaningful results though.
Just for example, searching for "import java" produces over 2 million
hits, but a single Java source file will often start off with three or
four import lines. Likewise, this ignores the length of each piece of source
code (which probably varies with language), the percentage of code
written in that language that's visible to Google (e.g. probably a LOT
lower for COBOL than for C, C++ or Java), etc. Finally, I've made no
attempt to isolate particular dates like they're doing.
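The once-per-file point can be checked on a small scale. This is only a sketch: the two Java "files" are made up, `count_marker` is a hypothetical helper, and it counts raw occurrences, which may not be how a search engine arrives at its hit counts.

```python
def count_marker(sources, marker):
    """Return (files containing marker, total occurrences of marker)."""
    files = sum(1 for text in sources if marker in text)
    occurrences = sum(text.count(marker) for text in sources)
    return files, occurrences

# Two made-up Java files, purely for illustration.
sources = [
    "import java.io.File;\nimport java.util.List;\nimport java.util.Map;\nclass A {}\n",
    "import java.io.IOException;\nimport java.net.URL;\nclass B {}\n",
]

# The narrow marker occurs once per file, so both counts agree.
print(count_marker(sources, "import java.io"))
# The loose marker occurs several times per file, so occurrences
# overstate the number of files.
print(count_marker(sources, "import java"))
```

A marker is only useful as a file counter when the two numbers stay close to each other.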
Even with all these shortcomings, I suspect it's still a better measure
of quantity of source code than looking for mentions of the language.
As mentioned above, I'm fairly sure some languages generate a higher
ratio of discussion to source code than others. For example, searching
for "java.io" instead of "import java.io" more than triples the number
of hits. By contrast, searching for 'iostream' without the 'include'
only increases the number of hits by about 50%. I, at least, would tend
to assume that in both cases the "extra" hits are mostly in
documentation, discussions, etc.
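The arithmetic behind that comparison, as a rough sketch. It assumes the "extra" hits are all non-source text; 3.0 is a lower bound standing in for "more than triples" and 1.5 stands in for "about 50%". The function name is made up for illustration.

```python
def discussion_per_source(multiplier):
    """Extra (non-source) hits per source hit, given how much the hit
    count grows when the 'import'/'include' prefix is dropped."""
    if multiplier < 1:
        raise ValueError("dropping the prefix should not reduce the hits")
    return multiplier - 1.0

java = discussion_per_source(3.0)  # "more than triples": at least 3x
cpp = discussion_per_source(1.5)   # "about 50%" more: roughly 1.5x

print(f"Java: at least {java:.1f} extra hits per source hit")
print(f"C++:  about {cpp:.1f} extra hits per source hit")
print(f"ratio: at least {java / cpp:.0f} to 1")
```

On those figures, Java draws at least four times as much non-source text per source hit as C++ does.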
In the end, I'm a bit surprised that the results don't differ even more
greatly.