programming languages (etc) "web popularity" fun

Alex Martelli

(You need Mark Pilgrim's pygoogle, see
http://diveintomark.org/projects/pygoogle/ , and a personal license to the
google api, see http://www.google.com/apis/ , saved in a file such as
"googlekey.txt" in your home directory [pygoogle looks in several places,
see http://diveintomark.org/projects/pygoogle/readme.txt for the list].)

So, a little script such as...:

#! /usr/local/bin/python2.3
# programming languages popularity web-survey

import google
import time

def quoter(xs): return ['"%s"'%x for x in xs]
langs = '''
python ruby perl caml java haskell lisp eiffel sml scheme
fortran ada forth apl javascript ecmascript vbscript vba sql
bash awk tcsh csh zsh ksh autolisp elisp occam intercal basic
abc algol applescript assembly befunge beta chill cobol dylan
erlang pascal delphi idl limbo smalltalk squeak m4 matlab logo
foxpro turing tcl snobol simula setl self rexx rebol postscript
php oz modula ml miranda mercury mumps oberon sather stackless
functional procedural parallel hpf agile extreme database
relational rpg
'''.split() + quoter([
    'visual basic', 'object pascal', 'objective c', 'c++', 'c#', 'c',
    'stackless python', 'object oriented',
    ])

# ensure all duplications are removed
langs = dict.fromkeys(langs).keys()

print 'examining %d terms' % len(langs)
results = []
for i, lang in enumerate(langs):
    # ...compensate for frequent "internal errors" from google...
    while True:
        print '%2d: %20s' % (i, lang.strip('"'), ),
        try:
            data = google.doGoogleSearch(lang + ' programming')
        except Exception:
            print "... likely internal server error, we wait & retry... "
            time.sleep(0.5)
        else:
            results.append((data.meta.estimatedTotalResultsCount, lang))
            # give running feedback since it DOES take a while!
            print '%9d' % data.meta.estimatedTotalResultsCount
            break
results.sort()
results.reverse()
print
print
print '%20s %9s' % ("Language", "# of hits")
print

for numb, lang in results:
    print '%20s %9d' % (lang.strip('"'), numb)


Gives me the following results:

Language # of hits

c 4980000
database 3750000
basic 3750000
java 3320000
self 2000000
php 1880000
c++ 1860000
perl 1640000
sql 1150000
logo 1070000
parallel 1030000
javascript 1030000
functional 997000
object oriented 944000
visual basic 847000
beta 745000
python 729000
scheme 693000
assembly 687000
forth 591000
extreme 572000
c# 506000
relational 377000
delphi 354000
fortran 344000
pascal 329000
postscript 297000
tcl 277000
abc 259000
lisp 220000
procedural 204000
ml 201000
ada 196000
vbscript 181000
cobol 171000
foxpro 137000
vba 123000
matlab 111000
smalltalk 101000
ruby 97900
bash 87400
mercury 86800
rpg 81600
oz 78500
turing 72200
rexx 66100
agile 62700
eiffel 58300
idl 58100
haskell 55100
awk 53100
mumps 49800
chill 47600
objective c 44900
modula 39000
apl 38800
csh 31700
dylan 31500
simula 30600
erlang 29900
m4 28000
squeak 24400
miranda 24300
applescript 24000
object pascal 23900
algol 21000
ksh 17900
tcsh 17600
sml 16000
oberon 15400
caml 15300
hpf 11900
limbo 11400
rebol 10800
occam 10300
elisp 8780
ecmascript 7080
zsh 5640
autolisp 5430
sather 4260
snobol 3900
intercal 2700
setl 2010
stackless 1040
befunge 951
stackless python 431
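The script above is Python 2.3 and relies on pygoogle and Google's SOAP search API, which has since been retired, so it will not run today. Here is a minimal Python 3 sketch of the same survey logic (drop duplicates, retry on errors, sort by hits, print a table), assuming a hypothetical search_count(query) callable that returns an estimated hit count from whatever search service you have; survey and print_table are likewise hypothetical names. Unlike the original's unbounded "while True" loop, it gives up on a term after a fixed number of retries, so a query that always fails cannot hang the run.

```python
import time


def quoter(xs):
    """Wrap each term in double quotes so it is searched as an exact phrase."""
    return ['"%s"' % x for x in xs]


def survey(terms, search_count, suffix=' programming', retries=5, delay=0.5):
    """Return (hits, term) pairs sorted from most to fewest hits.

    search_count is a hypothetical callable (query string -> estimated hit
    count) standing in for whatever search API is available.
    """
    results = []
    for term in dict.fromkeys(terms):  # drop duplicates, keep first-seen order
        for _ in range(retries):
            try:
                hits = search_count(term + suffix)
            except Exception:
                time.sleep(delay)  # likely a transient server error: wait, retry
            else:
                results.append((hits, term))
                break
        # a term that fails on every attempt is left out of the results
    results.sort(reverse=True)
    return results


def print_table(results):
    print('%20s %9s' % ('Language', '# of hits'))
    for hits, term in results:
        print('%20s %9d' % (term.strip('"'), hits))


if __name__ == '__main__':
    # Made-up counts, only to exercise the code.
    fake_counts = {'java programming': 3, 'python programming': 2,
                   '"visual basic" programming': 2}
    terms = ['python', 'java'] + quoter(['visual basic']) + ['python']
    print_table(survey(terms, fake_counts.__getitem__))
```

Passing a dict's __getitem__ as the search function is just a convenient offline stand-in; any callable that maps a query string to a count will do.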

Of course there are quite a few anomalies here -- e.g. I think there is
no automatic way to "clean" the C hit count from the hits for objective c,
c++, c# -- basic from visual basic -- and so on. But then, this is for
fun, not a scientific query, which is why I've mixed other catchwords
in with the programming languages as I thought of them.

Doing some "eyeball cleanup", we can see that c, net of c++, c# etc, must
be a little below Java; basic, net of visual basic, ditto. 'self' is
alas too unlikely to refer to that little-known though interesting
language:). Similarly for 'logo', 'beta', ... -- and 'sql' is likely
to be mixed up with many other languages too.
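A minimal Python 3 sketch of that eyeball netting, using the counts from the results table above; net is a hypothetical helper. It assumes, crudely, that every hit for a longer name (c++, c#, objective c; visual basic) was also counted for the shorter one, and that no page mentions two of the longer names, so treat the results as rough figures only.

```python
# Hit counts copied from the first results table above.
hits = {
    'java': 3320000,
    'c': 4980000, 'c++': 1860000, 'c#': 506000, 'objective c': 44900,
    'basic': 3750000, 'visual basic': 847000,
}


def net(term, *overlapping):
    """Hits for term minus the hits for longer names that also match it.

    Crude by design: assumes each overlapping hit was also counted for
    term, and that no page mentions two of the overlapping names.
    """
    return hits[term] - sum(hits[name] for name in overlapping)


c_net = net('c', 'c++', 'c#', 'objective c')   # 2569100
basic_net = net('basic', 'visual basic')       # 2903000
print('c, net:     %9d' % c_net)
print('basic, net: %9d' % basic_net)
print('java:       %9d' % hits['java'])
```

Both net figures land below java's count. How c and basic compare with each other depends heavily on how much overlap the naive subtraction removes twice.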

So, I think the top ten places, in order, for actual languages, are really:
java
c (not objective/c++/c#)
basic (not visual)
php
c++
perl
javascript
visual basic
python
scheme

Not too surprising, I guess. One could explore a bit more of course
(e.g. specifically look for 'basic -visual' etc etc) but I'm running
a bit short of my daily 1000 searches so I'm gonna leave that fun to
you, o readers. Points to ponder: the preponderance of visual basic
over python, and of python over scheme, is really small; the latter
may perhaps be explained by some occurrences of 'scheme' as an ordinary
word rather than the language name, and the former by the fact that the
typical web usage of many visual basic programmers is unlikely to include
writing websites about VB, compared to the web usage of Pythonistas.

If scheme's apparent popularity does turn out to be an artefact, then
forth (or is it an artefact from "go forth" etc...?-), assembly (but IS
that used in the programming sense...?), and C# are the other possible
contenders for the coveted tenth place. After the contenders for the
top places we have a (to me!) somewhat surprising bunch -- delphi,
fortran, pascal, postscript (!), tcl, abc (!?), lisp, ml, ada, and
vbscript in this order. Wow -- how are the mighty fallen! -- cobol
is BELOW this second bunch...!

Coming to buzzwords that aren't programming languages, other
surprises await: "functional" edges out "object oriented", "extreme"
is WAY more popular than "procedural" (yeah right:), "agile"
programming isn't as popular a term as I'd have thought (but still,
more than eiffel...:).

Plenty of other food for flamewars here -- can mercury AND oz
really be THAT much more popular than haskell, erlang, caml -- the
latter badly outscored even by OLD miranda -- and ML so WAY more
popular than ALL other pure functional languages & dialects (and
indeed even more than ada, vbscript, cobol, foxpro, vba, matlab,
smalltalk, ruby, bash...)...?!

googling sure _IS_ plenty of fun!!!-)


Alex
 
Cameron Laird

.
.
.
> So, I think the top ten places, in order, for actual languages, are really:
> java
> c (not objective/c++/c#)
> basic (not visual)
> php
> c++
> perl
> javascript
> visual basic
> python
> scheme
>
> not too surprising, I guess. One could explore a bit more of course
.
.
.
> googling sure _IS_ plenty of fun!!!-)
>
>
> Alex

It's easy to imagine sources of noise for these data, including
such English-language commonplaces as "go forth" you already
mentioned. A next step might be to try to refine the queries
to eliminate classes of noise. The one that most catches my
attention is PHP; I've got to think that a lot of those are pages
that use PHP, rather than discuss it.
 
Alex Martelli

Cameron Laird wrote:
...
> It's easy to imagine sources of noise for these data, including
> such English-language commonplaces as "go forth" you already

Sure! Although juxtaposing "programming" to the search, as my
little script did, is, I believe, going to help a lot, it's no
magic. If a language was called, for example, 'and', we'd NEVER
manage to get reliable statistics about it:).

Actually there is a lesson here about "product naming for the
21st century". If you want to help people googling for your
product (firm, project, whatever), *use a made-up word* so that
all the google hits on it will be real ones. If you want to make
sure you're basically ungooglable-for, well -- take a leaf from
MS, and name your technologies "COM", ".NET" and so on:).
> mentioned. A next step might be to try to refine the queries
> to eliminate classes of noise. The one that most catches my
> attention is PHP; I've got to think that a lot of those are pages
> that use PHP, rather than discuss it.

No doubt, and google can help a little with THIS kind of artefact,
thanks to the "allintext:" qualifier. (BTW, should anybody with
any interest in web searching not have O'Reilly's book "Google
Hacks" yet, GET IT!-).

So, I've made a 2nd release of my script, more targeted at those
languages which stand a chance for the top spots and more subject
to automatic cleaning. The quoter function has gone, the langs
variable is built in more detail with:

langs = [x.strip() for x in '''
"c" -"c++" -"c#"
basic -visual
"c++"
"visual basic"
"assembly language" OR "machine code" OR "machine language"
forth -"go forth" -"and so forth"
"c#"
pascal -object
[ ...many simple unquoted single-word languages snipped... ]
smalltalk
ruby
'''.splitlines() if x.strip()]

and the search, in the loop, has become:

data = google.doGoogleSearch('allintext: %s programming' % lang)


With these refinements, we get the following top 30 languages:

Language # of hits

java 3050000
c" -"c++" -"c# 2470000
basic -visual 1880000
c++ 1710000
perl 1510000
php 1060000
javascript 939000
visual basic 758000
python 682000
scheme 642000
c# 460000
forth -"go forth" -"and so forth 325000
fortran 322000
delphi 305000
tcl 254000
postscript 236000
abc 233000
lisp 201000
ada 177000
ml 174000
vbscript 165000
cobol 157000
assembly language" OR "machine code" OR "machine language 146000
pascal -object 142000
foxpro 127000
vba 112000
matlab 103000
smalltalk 90300
ruby 88000

php has indeed lost a couple of notches, and so have forth, assembly
(most particularly), pascal, and basic. The "top 10" are still the same,
though. A few hints for would-be further-cleaner-uppers...:

abc programming gets a LOT of help from one certain TV network!-)
[all others on this list, from a simple eyeball test w/interactive
searches on 1st pages only, appear legit]

c is HEAVILY handicapped by those "-" conditions; if we did
java -"c++" -"c#" (tried interactively), we'd only get 2,370,000,
so c is in fact still quite likely to be king of the heap (same
query, interactive, with C instead of java, is over 4,000,000...)

This is also an example of the fact that these numbers don't get
reproduced when I try the same google queries interactively (in
opera) -- there may be different filtering schemes in play.

Being careful is of course particularly warranted when two contenders
appear to be very close, and there are many such pairs here --
python and scheme, forth and fortran, ada and ml, smalltalk and ruby...
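The odd-looking labels in the refined table (the c, forth and assembly rows) come from how the script prints each query: strip('"') only trims quote characters at the two ends of the string, so inner quotes survive. A minimal Python 3 sketch of the second script's query building and labelling, with the query spec abridged; query and label are hypothetical helper names.

```python
# The second script's query spec, one query per line (abridged here).
SPEC = '''
"c" -"c++" -"c#"
basic -visual
forth -"go forth" -"and so forth"
"assembly language" OR "machine code" OR "machine language"
'''

langs = [x.strip() for x in SPEC.splitlines() if x.strip()]


def query(lang):
    # allintext: asks Google to match the terms in the page text itself.
    return 'allintext: %s programming' % lang


def label(lang):
    # As printed by the script: strip('"') trims quote characters only at
    # the two ends, so the outer quotes vanish and the inner ones remain.
    return lang.strip('"')


for lang in langs:
    print('%-75s -> %s' % (query(lang), label(lang)))
```

Printing the unstripped query string as the label would avoid the mangling, at the cost of slightly noisier display for the simple one-word entries.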

Let's see what somebody else can dream up, perhaps on a very different
tack than my idea of tacking the word 'programming' on...


Alex
 
jmdeschamps

Alex Martelli said:
> (You need Mark Pilgrim's pygoogle, see
> http://diveintomark.org/projects/pygoogle/ , and a personal license to the
> google api, see http://www.google.com/apis/ , saved in a file such as
> "googlekey.txt" in your home directory [pygoogle looks in several places,
> see http://diveintomark.org/projects/pygoogle/readme.txt for the list).
>
> So, a little script such as...:
>
> #! /usr/local/bin/python2.3
> # programming languages popularity web-survey
>
...
(Bunch of lines cut out)
> not too surprising, I guess. One could explore a bit more of course
> (e.g. specifically look for 'basic -visual' etc etc) but I'm running
> a bit short of my daily 1000 searches so I'm gonna leave that fun to
> you, o readers. Points to ponder: the preponderance of visual basic
> over python, and of python over scheme, is really small; the latter
> may perhaps be explained by some occurrences of 'scheme' as an ordinary
> word rather than the language name, and

Naturally we can also ponder the fact that python is also a
snake, a Java-rat-eating thing even ;-)

http://www.guardian.co.uk/international/story/0,3604,1072265,00.html
(thanks FL and Secret Labs)

and half the name of a 'passé' UK comedy troupe! They must
make up for something (nudge, nudge)

(bunch of other lines cut out)
...



Jean-Marc
BTW Thanks for a great nutshell book!
 
Alex Martelli

jmdeschamps wrote:
...
> Naturally we can also ponder about the fact that python is also a
> snake, a java rats eating thing even ;-)

However, I think that the inclusion of 'programming' in the search cuts
down references to snakes and monty python. 'monty python programming'
does give 22,000 hits, but that includes (e.g. on the first page) lots
of pages about the python language that mention its name's origin --
eyeballing a few pages of such hits suggests over half of those are
like that; '(clever OR crafty) scheme programming' gives 23,500 hits,
which include the AHD's entry for "hacker" ("hack, practical joke, clever
scheme"), Linux Today's denunciation "MS 'Software Choice' Scheme
a Clever Fraud", etc. It's hard to gauge!

For example, one purely suggestive indicator may come from
looking just for the phrase "XXX programming language" (with
the quotes in the search). With that, focusing on the 20
languages that previously appeared most popular (java down to
ml, and, for fairness, adding a -visual to ALL searches with
the single exception of the one for visual basic)...:

Language # of hits

java 58300
c 45300
c++ 17100
perl 9090
python 7110
ada 5610
scheme 3910
visual basic 3810
basic 3610
lisp 1820

c# 1480
fortran 1320
php 1310
javascript 881
forth 829
tcl 519
ml 513
delphi 396
postscript 228
abc 95

lisp and, particularly, ada make a great recovery and push php and
javascript out of the Top 10.
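A minimal Python 3 sketch of how such exact-phrase queries can be assembled; phrase_query is a hypothetical helper. It follows the rule stated above of adding -visual to every search except the one for visual basic. Whether the "XXX programmer" searches used the same rule is not stated, so applying it there is an assumption.

```python
def phrase_query(lang, phrase='programming language'):
    """Build an exact-phrase query for one language name.

    As described in the post for the "programming language" searches,
    ' -visual' is appended to every query except visual basic's own.
    Applying the same rule to other phrases (e.g. 'programmer') is an
    assumption.
    """
    q = '"%s %s"' % (lang, phrase)
    if lang != 'visual basic':
        q += ' -visual'
    return q


for lang in ['java', 'basic', 'visual basic']:
    print(phrase_query(lang))
print(phrase_query('perl', 'programmer'))
```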

But if we look for "XXX programmer" instead of "programming
language" things change rather drastically below the top 3
unchanged places...:

Language # of hits

java 37000
c 13400
c++ 12400
visual basic 11300
perl 9160
php 7100
python 3820
delphi 2190
lisp 1990
fortran 1530

javascript 1450
ada 1370
basic 1190
c# 479
tcl 323
forth 305
scheme 283
abc 145
postscript 144
ml 123

visual basic jumps up to 4th place -- php is in again with a
vengeance -- and so are delphi and venerable fortran. lisp
holds and notches up, but scheme disappears into the depths
of the ranking! What fun!
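The rank movements described above can be checked mechanically. A minimal Python 3 sketch using the hit counts copied from the two phrase-search tables; ranks and rank_changes are hypothetical helper names. Ties, of which there are none in these tables, would keep their listed order because Python's sort is stable.

```python
# Hit counts copied from the '"XXX programming language"' and
# '"XXX programmer"' tables above.
language = {
    'java': 58300, 'c': 45300, 'c++': 17100, 'perl': 9090, 'python': 7110,
    'ada': 5610, 'scheme': 3910, 'visual basic': 3810, 'basic': 3610,
    'lisp': 1820, 'c#': 1480, 'fortran': 1320, 'php': 1310,
    'javascript': 881, 'forth': 829, 'tcl': 519, 'ml': 513, 'delphi': 396,
    'postscript': 228, 'abc': 95,
}
programmer = {
    'java': 37000, 'c': 13400, 'c++': 12400, 'visual basic': 11300,
    'perl': 9160, 'php': 7100, 'python': 3820, 'delphi': 2190, 'lisp': 1990,
    'fortran': 1530, 'javascript': 1450, 'ada': 1370, 'basic': 1190,
    'c#': 479, 'tcl': 323, 'forth': 305, 'scheme': 283, 'abc': 145,
    'postscript': 144, 'ml': 123,
}


def ranks(counts):
    """Map each name to its 1-based place, most hits first."""
    order = sorted(counts, key=counts.get, reverse=True)
    return {name: place for place, name in enumerate(order, start=1)}


def rank_changes(before, after):
    """(old place, new place) for every name present in both tables."""
    rb, ra = ranks(before), ranks(after)
    return {name: (rb[name], ra[name]) for name in rb if name in ra}


changes = rank_changes(language, programmer)
# Largest moves first, in either direction.
for name, (old, new) in sorted(changes.items(),
                               key=lambda kv: -abs(kv[1][0] - kv[1][1])):
    print('%-14s %2d -> %2d' % (name, old, new))
```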

> BTW Thanks for a great nutshell book!

You're welcome! googling for 'XXX in a Nutshell' does show
my work doing roughly as well as could be expected given the
languages' popularity (and the lack of Nutshell books for
some, which helps:):

java 10700
perl 5540
python 1140
c# 393
delphi 354
c++ 352

....:)


Alex
 
Dan Schmidt

| Plenty of other food for flamewars here -- can mercury AND oz
| really be THAT much more popular than haskell, erlang, caml -- the
| latter badly outscored even by OLD miranda --

Objective Caml (the main dialect of caml) really suffers from a naming
problem. To pick all the references to it, you'll want to search for
"ocaml" and "o'caml" as well as "caml".

Dan
 

Cameron Laird

.
.
.
> You're welcome! googling for 'XXX in a Nutshell' does show
> my work doing roughly as well as could be expected given the
> languages' popularity (and the lack of Nutshell books for
> some, which helps:):
>
> java 10700
> perl 5540
> python 1140
> c# 393
> delphi 354
> c++ 352
.
.
.
Note, too, that the Python one hasn't been on the street
as long as some of the others. It's reasonable to
speculate that the ratios haven't equilibrated yet.
 
Alex Martelli

Dan said:
> | Plenty of other food for flamewars here -- can mercury AND oz
> | really be THAT much more popular than haskell, erlang, caml -- the
> | latter badly outscored even by OLD miranda --
>
> Objective Caml (the main dialect of caml) really suffers from a naming
> problem. To find all the references to it, you'll want to search for
> "ocaml" and "o'caml" as well as "caml".

Good point, so, here's an updated search for the FP-ish crowd (each
OR-separated term is looked up in '"%s programming language"'):

# 0 scheme 4830
# 1 prolog 1080
# 2 ml OR sml 645
# 3 haskell 361
# 4 erlang 316
# 5 caml OR o'caml OR ocaml 312
# 6 mercury OR oz OR mozart 234
# 7 unlambda 121
# 8 miranda 79
# 9 clean 65

#10 pliant 63
#11 FISh 61
#12 rebol 54
#13 FP 27
#14 joy 20
#15 scala OR funnel 6
#16 mondrian 6
#17 HOP 1
#18 lemon 0
#19 Alcool-90 0

with the OR'ed terms, O'CAML is about as popular as Haskell or
Erlang -- of the "pure functional" programming languages (I stuck
in scheme and prolog just to provide some 'scaling'...:), only
ML / SML (Standard ML) is definitely more web-popular than this group,
while distinctions among them are, I suspect, within "noise" (as
are those in a lower group among Clean, Pliant, FISh and REBOL, say).
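A minimal Python 3 sketch of the lookup described above, assuming that "looked up in" means each OR-separated term is substituted into the phrase template and the resulting phrases are OR-ed together into a single query; or_phrase_query is a hypothetical helper.

```python
def or_phrase_query(terms, template='"%s programming language"'):
    """Expand OR-separated terms into OR-ed exact-phrase queries."""
    return ' OR '.join(template % t.strip() for t in terms.split(' OR '))


for terms in ['ml OR sml', "caml OR o'caml OR ocaml", 'haskell']:
    print(or_phrase_query(terms))
```

A single-term entry such as haskell simply yields one phrase, so the same helper covers every row of the table.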


Alex
 
Dang Griffith

> Note, too, that the Python one hasn't been on the street
> as long as some of the others. It's reasonable to
> speculate that the ratios haven't equilibrated yet.

Not sure I understand your comment.
Python predates Java.
--dang
 
Peter Hansen

Dang said:
> Not sure I understand your comment.
> Python predates Java.
> --dang

I'm sure Cameron is quite aware of that, and yet your comment
does nothing to invalidate his statement. Surely you can accept
that "Python hasn't been on the street as long as *some* of
the others" (emphasis added), can't you?

-Peter
 
Peter Otten

Peter said:
> I'm sure Cameron is quite aware of that, and yet your comment
> does nothing to invalidate his statement. Surely you can accept
> that "Python hasn't been on the street as long as *some* of
> the others" (emphasis added), can't you?
>
> -Peter

The point is, "the Python one" does refer to "Python in a Nutshell".
Are there any Nutshell books that came out after March 2003?

Peter
 
Alex Martelli

Peter Otten wrote:
...
> The point is, "the Python one" does refer to "Python in a Nutshell".
> Are there any Nutshell books that came out after March 2003?

Sure (e.g. Windows Server 2003 in a Nutshell in September; C# in
a Nutshell 2nd edition in August; C++ in a Nutshell in May; and
others, too, I'm sure). O'Reilly is relentless...:).


Alex
 
Alex Martelli

Dang said:
> Any chance of seeing PowerBuilder in the list?

Sure -- goes near the end of the current list of 31 languages, with
about half the hits of Ruby or smalltalk, but over twice as many as
caml of all sorts:

matlab 104000
smalltalk 91300
ruby 90000
powerbuilder 49500
caml OR ocaml OR "o'caml 19500


Alex
 
Peter Hansen

Peter said:
> The point is, "the Python one" does refer to "Python in a Nutshell".
> Are there any Nutshell books that came out after March 2003?

But while Python predates Java, the Nutshell book for it is
extremely recent, while the Java one is very old.

I'm confused now, and I don't know which of us is arguing in favor
of which position... <sigh>. Too little sleep I think. :-(

-Peter
 
Peter Otten

Alex said:
> Sure (e.g. Windows Server 2003 in a Nutshell in September; C# in
> a Nutshell 2nd edition in August; C++ in a Nutshell in May; and
> others, too, I'm sure). O'Reilly is relentless...:).
>
>
> Alex

Second editions don't count, as you cannot reliably discriminate the
references, and older references tend to stay on the web. So C# has a
head start of one year, Java 7 years, Perl 5 years, and C++ came out at
roughly the same time as PiaN...

Hey, that was a cunning trick to make me assert how well your book does,
reference-wise :)

Peter

PS: I bought it a month ago, and I like it
 
