Using the nntplib module to count Google Groups users

Steven D'Aprano

There's been a bit of a discussion about how prevalent Google Groups
users are in this forum. This is a good opportunity to use one of
Python's standard library modules to scan through the comp.lang.python
newsgroup and find out. So here's some code to do so:


import nntplib
import sys

s = nntplib.NNTP('news.internode.on.net')  # footnote [1]
resp, count, first, last, name = s.group('comp.lang.python')
print 'Group', name, 'has', count, 'articles, range', first, 'to', last
print 'Checking the most recent (approx) 5000 messages...'
last = int(last)
count = 0  # articles whose headers were fetched successfully
gg = 0     # articles with a header line mentioning both "google" and "group"
template = "\rArticle %d: found %d Google Groups headers."
try:
    for id in range(last-5000, last+1):
        try:
            # head() returns (response, number, message-id, header lines)
            resp, number, msgid, headers = s.head(str(id))
        except Exception:
            # Skip article numbers that are missing or cannot be fetched.
            continue
        count += 1
        for line in headers:
            if "google" in line and "group" in line:
                gg += 1
                sys.stdout.write(template % (id, gg))
                sys.stdout.flush()
                break
except KeyboardInterrupt:
    pass
finally:
    print

s.quit()
print "Google Groups posts: %.2f%% of %d" % (gg*100.0/count, count)



Footnote [1] For this to work, you will need to be a subscriber with the
ISP Internode. If you are not, you will need to substitute your ISP's
news server. (Or your own, if you are running your own news server.)


This is a relatively busy newsgroup, and consequently downloading all the
headers may take a while, which is why I have limited it to only the most
recent 5000. I get this output:

Group comp.lang.python has 150071 articles, range 369087 to 519157
Checking the most recent (approx) 5000 messages...
Article 519153: found 957 Google Groups headers.
'205 Transferred 12653216 bytes in 0 articles, 0 groups. Disconnecting.'
Google Groups posts: 19.14% of 5001


Note that this *definitely* over-counts Google Groups. It includes
replies to GG posts, as well as those actually sent via GG. There are
other false positives as well. But as a rough-and-ready estimate, I think
it is good evidence that fewer than 1 in 5 posts come from Google Groups,
so definitely a minority, and by a long way.
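
One way to trim the over-count, offered as a rough sketch rather than a tested fix: skip the headers that quote other people's message IDs before looking for the marker words. Replies carry the parent's Message-ID in References and In-Reply-To, so a reply to a GG post can match even when it was sent from elsewhere. The helper name `looks_like_gg` and the ignore list are invented for illustration, `header_lines` is assumed to be a list of raw header strings like the one the script loops over, and unlike the script this version compares case-insensitively.

```python
# Hypothetical helper: the name and the ignore list are assumptions,
# not part of the original script.
IGNORED = ("references", "in-reply-to", "subject")

def looks_like_gg(header_lines, ignored=IGNORED):
    """Return True if a non-ignored header mentions both 'google' and 'group'."""
    skipping = False
    for line in header_lines:
        if line[:1] in (" ", "\t"):
            # Folded continuation line: it belongs to the previous header.
            if skipping:
                continue
        else:
            name = line.split(":", 1)[0].strip().lower()
            skipping = name in ignored
            if skipping:
                continue
        lowered = line.lower()
        if "google" in lowered and "group" in lowered:
            return True
    return False

# Made-up headers for a reply to a GG post, sent from somewhere else.
reply = [
    "From: someone@example.com",
    "Subject: Re: counting Google Groups users",
    "Message-ID: <abc123@example.com>",
    "References: <111@googlegroups.com>",
    " <222@googlegroups.com>",
]
# Made-up headers for a post whose own Message-ID is a GG one.
direct = [
    "From: other@example.com",
    "Message-ID: <333@googlegroups.com>",
]
print(looks_like_gg(reply))
print(looks_like_gg(direct))
```

This still is not proof that a post came through GG; it only removes the two most obvious sources of false positives named above, plus subject lines that happen to mention Google Groups.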

Naturally this doesn't count lurkers who read via GG but never post. Nor
does it count distinct users, only distinct posts.

If anyone wants to modify the script to determine the ratio of posters,
rather than posts, using GG, be my guest. I'd be interested in the
answer, but not interested enough to actually do the work myself.
 
Chris Angelico

If anyone wants to modify the script to determine the ratio of posters,
rather than posts, using GG, be my guest. I'd be interested in the
answer, but not interested enough to actually do the work myself.

And if anyone does, do please post the result on-list. I, too, would
be mildly curious to know - but again, not enough to do the work (I'm
busy in D&D at the moment).

ChrisA
 
Zero Piraeus

:

And if anyone does, do please post the result on-list.

Taking a different tack, since I happen to have a complete[1] local
archive of python-list going back a few years ... here's a quick and
dirty script to count unique senders and Google Groups users for this
year:

- - -

import os
from email.parser import HeaderParser

LIST = "(e-mail address removed)"
MAILDIR = "/path/to/mail/archive/cur"
YEAR = "2013"

parser = HeaderParser()

found = set()
gg_users = 0

for filename in os.listdir(MAILDIR):
    with open(os.path.join(MAILDIR, filename)) as message:
        headers = parser.parse(message)
    sender = headers.get("from", "")
    dest = headers.get("to", "")
    date = headers.get("date", "")
    # Skip mail not sent to the list, not from YEAR, or from a sender already seen.
    if (LIST not in dest) or (YEAR not in date) or (sender in found):
        continue
    found.add(sender)
    if "(e-mail address removed)" in headers.get("complaints-to", ""):
        gg_users += 1
        print("GG user:")
    print(sender)
    print("Senders: %d" % len(found))
    print("GG users: %d" % gg_users)
    print("---")

- - -

It's obviously not very robust, but I reckon it's good enough to get an
idea what's going on.
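
One robustness gap worth illustrating: `sender` is the raw From header, so the same address written with two different display names would be counted as two senders. Here is a minimal sketch of keying on the bare address instead, using `email.utils.parseaddr` from the standard library. The helper name `sender_key` and the sample headers are made up for illustration, and lower-casing the whole address is a heuristic, not a rule.

```python
from email.utils import parseaddr

def sender_key(from_header):
    """Reduce a From header to a lower-cased bare address for deduplication."""
    _name, address = parseaddr(from_header)
    # Fall back to the stripped raw header if no address could be parsed.
    return address.lower() or from_header.strip().lower()

# Made-up examples: two spellings of one sender, plus a different sender.
samples = [
    "Alice Example <alice@example.com>",
    '"A. Example" <Alice@Example.com>',
    "bob@example.org (Bob)",
]
unique = set(sender_key(s) for s in samples)
print(len(unique))
```

In the script above this would amount to something like `sender = sender_key(headers.get("from", ""))` before the `sender in found` check. It would not merge one person posting from several addresses.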

The results:

Senders: 1701
GG users: 879

... so just over 50%.
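
For reference, here is the arithmetic behind the two figures reported in this thread, side by side. They measure different things over different samples: matching posts among the 5001 most recent articles on a news server, versus distinct senders in a 2013 mail archive. The helper name `percent` is just for illustration.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Figures reported above: posts matching the GG check, and distinct GG senders.
posts = percent(957, 5001)
posters = percent(879, 1701)
print("GG share of posts:   %.2f%%" % posts)    # 19.14%
print("GG share of senders: %.2f%%" % posters)  # 51.68%
```

The gap between the two is consistent with GG users being numerous but posting less often on average, though the samples differ enough that this is only suggestive.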

If anyone wants the complete output, just let me know and I'll email it
privately.

-[]z.

[1] except for spam filtered out by Gmail.
 
rusi

The results:


Senders: 1701
GG users: 879

... so just over 50%.


If anyone wants the complete output, just let me know and I'll email it
privately.

If you have a GG account just go to the 'aboutgroup' info here:
https://groups.google.com/forum/#!aboutgroup/comp.lang.python

It tells me there are 21447 members.

What that means I shall not hazard.

Instead I quote a book, 'Mathsemantics' by Edward MacNeal:

------------
In 1980 I was one passenger, ten passengers, eighteen passengers, thirty-six passengers, forty-two passengers, fifty-five passengers, seventy-two passengers and ninety-four passengers. Each of these statements is true.
-----------
.... explanation...
-----------
I was one passenger in the sense that I was a person who traveled by air in that year.
I was eighteen passengers in the sense that I made eighteen round trips.
I was forty-two passengers in the sense that on forty-two different occasions I entered and exited the system of a different carrier.
I was seventy-two passengers in the sense that on seventy-two occasions I was on board an aircraft when it took off from one place and landed at another.
I was ninety-four passengers in the sense that I made ninety-four separate entrances and exits from airport terminal buildings.
 
