print header for output


Cathy James

I managed to get output for my function, thanks much for your
direction. I really appreciate the hints. Now I have tried to place
the statement "print ("Length \t" + "Count\n")" in different places in
my code so that the function can print the headers only one time in
this manner:

Count Length
4 7
8 1
12 2


Code so far:
import string

def fileProcess(filename = open('declaration.txt', 'r')):
    """Call the program with an argument,
    it should treat the argument as a filename,
    splitting it up into words, and computes the length of each word.
    print a table showing the word count for each of the word lengths
    that has been encountered."""

    freq = {} #empty dict to accumulate word count and word length
    print ("Length \t" + "Count\n")
    for line in filename:
        punc = string.punctuation + string.whitespace #use Python's built-in punctuation and whitespace
        for word in (line.replace (punc, "").lower().split()):
            if word in freq:
                freq[word] +=1 #increment current count if word already in dict
            else:
                freq[word] = 1 #if punctuation encountered, frequency=0 word length = 0
    #print ("Length \t" + "Count\n")#print header for all numbers.
    for word, count in freq.items():
        print(len(word), count)

fileProcess()
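
For readers following along, here is a minimal sketch of one way to get the header printed a single time. It assumes Python 3, and the helper names length_counts and print_table and the sample lines are made up for illustration; they are not from the original code. The idea is to finish all the counting first, keyed by word length rather than by word, and only then print the header once, outside any loop.

```python
import string
from collections import Counter

def length_counts(lines):
    """Count how many words of each length appear in an iterable of lines."""
    # Build the translation table once; it deletes every punctuation character.
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for line in lines:
        for word in line.translate(strip_punct).lower().split():
            counts[len(word)] += 1   # key is the word length, not the word
    return counts

def print_table(counts):
    """Print the header once, then one row per word length."""
    print("Length\tCount")           # outside the loop, so it appears one time
    for length, count in sorted(counts.items()):
        print(f"{length}\t{count}")

# Hypothetical sample input; a real run would pass an open file instead.
sample = ["When in the Course of human events,", "it becomes necessary."]
table = length_counts(sample)
print_table(table)
```

A file object can be passed in place of the sample list, for example inside a `with open('declaration.txt') as f:` block, since iterating over a file yields its lines.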

Today's Topics:

  1. Re: How do you copy files from one location to another?
     (Terry Reedy)
  2. Re: Strategy to Verify Python Program is POST'ing to a web
     server. (Paul Rubin)
  3. Re: Strategy to Verify Python Program is POST'ing to a web
     server. (Terry Reedy)
  4. Re: debugging https connections with urllib2? (Roy Smith)
  5. Re: Improper creating of logger instances or a Memory Leak?
     (Chris Torek)
  6. Re: Strategy to Verify Python Program is POST'ing to a web
     server. (Chris Angelico)
  7. NEED HELP-process words in a text file (Cathy James)
  8. Re: NEED HELP-process words in a text file (Chris Rebert)
  9. Re: NEED HELP-process words in a text file (Tim Chase)


---------- Forwarded message ----------
From: Terry Reedy <[email protected]>
To: (e-mail address removed)
Date: Sat, 18 Jun 2011 16:52:26 -0400
Subject: Re: How do you copy files from one location to another?
Python is great for automating sysadmin tasks, but perhaps you should
just use rsync for this.  It comes with the benefit of only copying
the changes instead of every file every time.

"rsync -a C:\source E:\destination" and you're done.

Perhaps 'synctree' would be a candidate for addition to shutil.

If copytree did not prohibit an existing directory as destination, it could be used for synching with an 'ignore' function.
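
As a hedged aside for present-day readers: since Python 3.8, shutil.copytree accepts dirs_exist_ok=True, which lifts the restriction on an existing destination. The sketch below combines it with an 'ignore' function; the name sync_tree, the ignore patterns, and the temporary directories are illustrative assumptions. Unlike rsync, this copies every file on every run and never deletes anything from the destination.

```python
import shutil
import tempfile
from pathlib import Path

def sync_tree(src, dst):
    """Copy src into dst, allowing dst to exist already (Python 3.8+)."""
    # ignore_patterns builds an 'ignore' callable; these patterns are examples.
    skip = shutil.ignore_patterns("*.tmp", "__pycache__")
    return shutil.copytree(src, dst, ignore=skip, dirs_exist_ok=True)

# Demonstration in throwaway directories.
root = Path(tempfile.mkdtemp())
src, dst = root / "source", root / "destination"
src.mkdir()
dst.mkdir()                               # destination exists before the copy
(src / "keep.txt").write_text("data")
(src / "scratch.tmp").write_text("ignored")
sync_tree(src, dst)
copied = sorted(p.name for p in dst.iterdir())
```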

--
Terry Jan Reedy




---------- Forwarded message ----------
From: Paul Rubin <[email protected]>
To: (e-mail address removed)
Date: Sat, 18 Jun 2011 14:03:19 -0700
Subject: Re: Strategy to Verify Python Program is POST'ing to a web server.
For example, if I create a website that tracks some sort of
statistical information and don't ensure that my program is the one
that is uploading it, the statistics can be thrown off by people
entering false POST data onto the data upload page.  Any remedy?

If you're concerned about unauthorized users posting random crap, the
obvious solution is configure your web server to put password protection
on the page.

If you're saying AUTHORIZED users (those allowed to use the program to
post stuff) aren't trusted to not bypass the program, you've basically
got a DRM problem, especially if you think the users might
reverse-engineer the program to figure out the protocol.  The most
effective approaches generally involve delivering the program in the
form of a hardware product that's difficult to tamper with.  That's what
cable TV boxes amount to, for example.

What is the application, if you can say?  That might help get better
answers.



---------- Forwarded message ----------
From: Terry Reedy <[email protected]>
To: (e-mail address removed)
Date: Sat, 18 Jun 2011 17:17:09 -0400
Subject: Re: Strategy to Verify Python Program is POST'ing to a web server.
Hello Folks,

I am wondering what your strategies are for ensuring that data
transmitted to a website via a python program is indeed from that
program, and not from someone submitting POST data using some other
means.  I find it likely that there is no solution, in which case what
is the best solution for sending data to a remote server from a python
program and ensuring that it is from that program?

For example, if I create a website that tracks some sort of
statistical information and don't ensure that my program is the one
that is uploading it, the statistics can be thrown off by people
entering false POST data onto the data upload page.  Any remedy?

You have not specified all the parameters of the problem. Are there a limited number of copies of your program or are they distributed freely? What about multiple votes from one program?

Corporate proxy votes (which are a legally important type of statistical information) work as follows. Each shareholder is mailed or emailed a 'control number'. The shareholder can attend the stockholder meeting in person, mail a proxy vote, or log in from any browser with the control number. Repeat votes with the same control ID supersede the previous vote. There should be a 'thank you for voting' response for each vote. I suspect the IP address is recorded with the vote too. I have not heard of specific problems with electronic proxy voting.
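
The control-number idea can be sketched in a few lines. This is only a toy model under stated assumptions: the class name ProxyBallot, its methods, and the use of secrets.token_hex to generate control numbers are all hypothetical, and a real system would also need transport security and auditing. It shows the two properties described above: only issued numbers may vote, and a repeat vote replaces the earlier one.

```python
import secrets
from collections import Counter

class ProxyBallot:
    """Toy model of control-number voting; not a real voting system."""

    def __init__(self):
        self._votes = {}  # control number -> most recent choice (None = not voted)

    def issue_control_number(self):
        # One unguessable number per shareholder (hypothetical scheme).
        number = secrets.token_hex(8)
        self._votes[number] = None
        return number

    def vote(self, control_number, choice):
        if control_number not in self._votes:
            raise KeyError("unknown control number")
        # A repeat vote overwrites, i.e. supersedes, the earlier one.
        self._votes[control_number] = choice
        return "thank you for voting"

    def tally(self):
        return Counter(c for c in self._votes.values() if c is not None)

ballot = ProxyBallot()
alice = ballot.issue_control_number()
bob = ballot.issue_control_number()
ballot.vote(alice, "yes")
ballot.vote(bob, "no")
ballot.vote(alice, "no")      # Alice votes again; only this vote counts
result = ballot.tally()
```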

--
Terry Jan Reedy




---------- Forwarded message ----------
From: Roy Smith <[email protected]>
To: (e-mail address removed)
Date: Sat, 18 Jun 2011 17:45:42 -0400
Subject: Re: debugging https connections with urllib2?
 Irmen de Jong said:
Put a proxy between the https-service endpoint and your client app.
Let the proxy talk https and let your client talk http to the proxy.

Clever.  I like.  Thanks.



---------- Forwarded message ----------
From: Chris Torek <[email protected]>
To: (e-mail address removed)
Date: 18 Jun 2011 22:28:39 GMT
Subject: Re: Improper creating of logger instances or a Memory Leak?
I've run across a memory leak in a long running process which I can't
determine if it's my issue or if it's the logger.

You do not say what version of python you are using, but on the
other hand I do not know how much the logger code has evolved
over time anyway. :)
Each application thread gets a logger instance in its init() method
via:

       self.logger = logging.getLogger('ivr-'+str(self.rand))

where self.rand is a suitably large random number to avoid collisions
of the log file's name.

This instance will "live forever" (since the thread shares the
main logging manager with all other threads).
---------
class Manager:
   """
   There is [under normal circumstances] just one Manager instance, which
   holds the hierarchy of loggers.
   """
   def __init__(self, rootnode):
       """
       Initialize the manager with the root node of the logger hierarchy.
       """
       [snip]
       self.loggerDict = {}

   def getLogger(self, name):
       """
       Get a logger with the specified name (channel name), creating it
       if it doesn't yet exist. This name is a dot-separated hierarchical
       name, such as "a", "a.b", "a.b.c" or similar.

       If a PlaceHolder existed for the specified name [i.e. the logger
       didn't exist but a child of it did], replace it with the created
       logger and fix up the parent/child references which pointed to the
       placeholder to now point to the logger.
       """
       [snip]
                   self.loggerDict[name] = rv
       [snip]
[snip]
Logger.manager = Manager(Logger.root)
---------

So you will find all the various ivr-* loggers in
logging.Logger.manager.loggerDict[].
finally the last statements in the run() method are:

       filehandler.close()
       self.logger.removeHandler(filehandler)
       del self.logger #this was added to try and force a clean up of the logger instances.

There appears to be no __del__ handler and nothing that allows
removing a logger instance from the manager's loggerDict.  Of
course you could do this "manually", e.g.:

       ...
       self.logger.removeHandler(filehandler)
       del logging.Logger.manager.loggerDict[self.logger.name]
       del self.logger # optional

I am curious as to why you create a new logger for each thread.
The logging module has thread synchronization in it, so that you
can share one log (or several logs) amongst all threads, which is
more typically what one wants.
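
A minimal sketch of that shared-logger arrangement, assuming Python 3: the logger name "ivr", the worker function, and the in-memory stream are illustrative choices, not the original poster's code. Every thread logs through the same Logger object, so the manager's loggerDict gains one entry instead of one per thread, and the thread name in the format string still tells the records apart.

```python
import io
import logging
import threading

# One handler and one logger, created once and shared by every thread.
stream = io.StringIO()                      # stands in for a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(threadName)s %(message)s"))

log = logging.getLogger("ivr")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False                       # keep records out of the root logger

def worker(call_id):
    # No per-thread getLogger('ivr-<random>') call, so nothing accumulates.
    log.info("handling call %d", call_id)

threads = [threading.Thread(target=worker, args=(i,), name=f"ivr-{i}")
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = stream.getvalue().splitlines()
```

If each thread really needs its own file, a per-thread handler with a filter on the thread name is another option, at the cost of managing the handlers' lifetimes.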
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html



---------- Forwarded message ----------
From: Chris Angelico <[email protected]>
To: (e-mail address removed)
Date: Sun, 19 Jun 2011 09:12:13 +1000
Subject: Re: Strategy to Verify Python Program is POST'ing to a web server.
This is only true if you distribute your app with one built-in
certificate, which does indeed seem like a bad idea.  When you know
your user base though, especially if this is a situation with a small
number of deployments, then you can distribute a unique certificate to
each client, signed by your CA.

That changes it from verifying the program to verifying the user. It's
a somewhat different beast, but it still leaves the possibility of
snagging the cert and using it in another program. Same with IP
address checks. You can't prove that the other end is a particular
program.
An authentication process that involves the client executing code
supplied by the server opens up one single point of failure (server is
compromised or man-in-the-middle attack is happening) by which
arbitrary code could get executed on the client.  Yikes!

Yeah, hence the part of verifying the server's cert too. That one is a
bit safer though; nobody but you will have that certificate, so it's
not as easy to take and put into another program. But this whole
scheme was meant from the start to be ridiculous.
If ...
then you'll have to accept that you cannot trust the submitted data
100%, and just take measures to mitigate abuse.

I still stand by my original point, namely that the "if" on here is
superfluous, and the "then" is unconditional. But the measures you
describe _do_ reduce the likelihood significantly.

ChrisA



---------- Forwarded message ----------
From: Cathy James <[email protected]>
To: (e-mail address removed)
Date: Sat, 18 Jun 2011 18:21:55 -0500
Subject: NEED HELP-process words in a text file
Dear Python Experts,

First, I'd like to convey my appreciation to you all for your support
and contributions.  I am a Python newborn and need help with my
function. I commented on my program as to what it should do, but
nothing is printing. I know I am off, but not sure where. Please
help:(

import string
def fileProcess(filename):
   """Call the program with an argument,
   it should treat the argument as a filename,
   splitting it up into words, and computes the length of each word.
   print a table showing the word count for each of the word lengths
that has been encountered.
   Example:
   Length Count
   1 16
   2 267
   3 267
   4 169
   >>>"&"
   Length    Count
   0    0
   >>>
   >>>"right."
   Length    Count
   5    10
   """
   freq = [] #empty dict to accumulate words and word length
   filename=open('declaration.txt, r')
   for line in filename:
       punc = string.punctuation + string.whitespace #use Python's built-in punctuation and whitespace
       for i, word in enumerate (line.replace (punc, "").lower().split()):
           if word in freq:
               freq[word] +=1 #increment current count if word already in dict

           else:
               freq[word] = 0 #if punctuation encountered, frequency=0 word length = 0
       for word in freq.items():
           print("Length /t"+"Count/n"+ freq[word],+'/t' + len(word)) #print word count and length of word separated by a tab




   #Thanks in advance,
CJ.



---------- Forwarded message ----------
From: Chris Rebert <[email protected]>
To: Cathy James <[email protected]>
Date: Sat, 18 Jun 2011 16:30:00 -0700
Subject: Re: NEED HELP-process words in a text file
Subject: NEED HELP-process words in a text file

Dear Python Experts,

First, I'd like to convey my appreciation to you all for your support
and contributions.  I am a Python newborn and need help with my
function. I commented on my program as to what it should do, but
nothing is printing. I know I am off, but not sure where. Please
help:(

Netiquette comment: Please avoid SHOUTING and including unnecessary
entreaties in your subject lines in the future.

Cheers,
Chris



---------- Forwarded message ----------
From: Tim Chase <[email protected]>
To: Cathy James <[email protected]>
Date: Sat, 18 Jun 2011 19:09:18 -0500
Subject: Re: NEED HELP-process words in a text file
    freq = [] #empty dict to accumulate words and word length

While you say you create an empty dict, using "[]" creates an empty *list*, not a dict.  Either your comment is wrong or your code is wrong. :)  Given your usage, I presume you want a dict, not a list.
    for line in filename:
        punc = string.punctuation + string.whitespace #use Python's built-in punctuation and whitespace

Since you don't change "punc" in your loop, you'd get better performance by hoisting this outside of the loop so it's only evaluated once.  Not that it should matter *that* greatly, but it's just a bad-code-smell.
        for i, word in enumerate (line.replace (punc, "").lower().split()):

.replace() doesn't operate on sets of characters, but rather strings.  So unless your line contains the exact text in "punc" (unlikely), that replacement is a NOP.  There are a couple ways to go about removing unwanted characters:

- make a set of those characters and produce a resulting string from things not in that set:

 punc_set = set(punc)
 line = ''.join(c for c in line if c not in punc_set)

- use a regexp to strip them out...something like

 punc_re = re.compile("[" + re.escape(punc) + "]")
 ...
 line = punc_re.sub('', line)

- use string translations.  I'm not as familiar with these, but the following seemed to work for me, abusing the 2nd "deletechars" parameter for your particular use-case:

 line = line.translate(None, punc)

I don't see .translate(None) documented anywhere.  My random effort seemed to work in 2.6, but fails in 2.5 and prior.  YMMV.
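
A note for readers on Python 3, where str.translate takes a single table argument and the two-argument form above raises a TypeError: the usual equivalent is to build a deletion table with str.maketrans. The sketch below is an illustration, not part of the original reply; the sample line is made up, and it deletes only string.punctuation, on the assumption that whitespace should survive so that split() can still separate the words.

```python
import string

# The third argument to str.maketrans lists characters to delete.
delete_punc = str.maketrans("", "", string.punctuation)

line = "We hold these truths, to be self-evident: right."
cleaned = line.translate(delete_punc)
words = cleaned.lower().split()
```

Note that the hyphen is punctuation too, so "self-evident" comes out as the single word "selfevident".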
            if word in freq:
                freq[word] +=1 #increment current count if word already in dict

            else:
                freq[word] = 0 #if punctuation encountered, frequency=0 word length = 0

Again, your 2nd comment disagrees with your code.  As an aside, if you're using 2.5 or greater, I'd use collections.defaultdict(int) as the accumulator:

 freq = collections.defaultdict(int)
 ...
 freq[word] += 1
 # no need to check presence
        for word in freq.items():
            print("Length /t"+"Count/n"+ freq[word],+'/t' + len(word)) #print word count and length of word separated by a tab

Where to begin:

- Your escapes are using "/" instead of "\" for <tab> and <newline>, which I expect will mess up the formatting.

- You're also labeling them "Length/Count" but printing "count/length".

- you're iterating over freq.items() but that should be written as

 for word, count in freq.items():

or

 for word in freq:

-  Additionally, adding the bits together makes it somewhat hard to understand.

I'd use something like

 for word, count in freq.items():
   print("Word \tLength \tCount\n%s \t%i \t%i" % (
     word, len(word), count))

-tkc
 
