Regular expression worries

Discussion in 'Python' started by CSUIDL PROGRAMMEr, Oct 11, 2006.

  1. Folks,
    I am new to Python, so excuse me if I am asking stupid questions.

    I have a text file, and here are some lines from it:

    Document<Keyword<date:2006-08-19> Keyword<time:11:00:43>
    Keyword<username:YOURBOTNICK> Keyword<data:localhost.localdomain>
    Keyword<logon:localhost.localdomain
    > Keyword<date:2006-08-19> Keyword<time:11:00:44> Keyword<sender:>

    Keyword<receiver:> Keyword<data::+iwx> Keyword<mode::+iwx

    I am writing a Python program to replace the tags and the word
    Document with Doc.

    Here is my Python program:

    #! /usr/local/bin/python

    import sys
    import string
    import re

    def replace():
        filename='/root/Desktop/project/chatlog_20060819_110043.xml.txt'
        try:
            fh=open(filename,'r')
        except:
            print 'file not opened'
            sys.exit(1)
        for l in open('/root/Desktop/project/chatlog_20060819_110043.xml.txt'):
            l=l.replace("Document", "DOC")
            fh.close()

    if __name__=="__main__":
        replace()

    But it does not replace Document with Doc in the text file.

    Is there anything I am doing wrong?

    Thanks
     
    CSUIDL PROGRAMMEr, Oct 11, 2006
    #1

  2. CSUIDL PROGRAMMEr

    Guest

    You are opening the same file twice, reading its contents line-by-line
    into memory, replacing "Document" with "Doc" *in memory*, never writing
    that to disk, and then discarding the line you just read into memory.

    If your file is short, you could read the entire thing into memory as
    one string using the .read() method of fh (your file object). Then,
    call .replace on the string, and then write to disk.
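    A minimal sketch of that short-file approach, written for current
    Python 3 rather than the Python 2 of the original post. The function
    name replace_in_file and its parameters are made up for illustration.

```python
def replace_in_file(path, old, new):
    """Read the whole file into memory, replace text, write it back."""
    # Read everything into one string.
    with open(path, "r") as fh:
        text = fh.read()

    # str.replace returns a NEW string; the result must be written out,
    # otherwise the change only ever exists in memory.
    with open(path, "w") as fh:
        fh.write(text.replace(old, new))
```

    Calling replace_in_file(filename, "Document", "Doc") would then
    rewrite the file in one go. This only makes sense when the file fits
    comfortably in memory.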

    If your file is long, then you want to do the replace line by line,
    writing as you go to a second file. You can later rename that file to
    the original file's name and delete the original.
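    For the long-file case, something along these lines should work in
    Python 3. The ".tmp" suffix and the function name are assumptions
    chosen for the example, not anything from the original program.

```python
import os


def replace_line_by_line(path, old, new):
    """Stream the file line by line into a second file, then swap it in."""
    tmp_path = path + ".tmp"  # hypothetical name for the second file

    # Only one line is held in memory at a time.
    with open(path, "r") as src, open(tmp_path, "w") as dst:
        for line in src:
            dst.write(line.replace(old, new))

    # os.replace renames the second file over the original, so no
    # separate delete step is needed.
    os.replace(tmp_path, path)
```

    Note that os.replace overwrites the destination, so the rename and
    the delete of the original happen in a single step.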

    Also, you aren't using regular expressions at all. You do not
    therefore need the re module.

     
    , Oct 11, 2006
    #2

  3. CSUIDL PROGRAMMEr wrote:
    > Folks,
    > I am new to Python, so excuse me if I am asking stupid questions.


    From what I see, you seem to be new to programming in general !-)

    > I have a text file, and here are some lines from it:
    >
    > Document<Keyword<date:2006-08-19> Keyword<time:11:00:43>
    > Keyword<username:YOURBOTNICK> Keyword<data:localhost.localdomain>
    > Keyword<logon:localhost.localdomain
    > > Keyword<date:2006-08-19> Keyword<time:11:00:44> Keyword<sender:>

    > Keyword<receiver:> Keyword<data::+iwx> Keyword<mode::+iwx
    >
    > I am writing a Python program to replace the tags and the word
    > Document with Doc.
    >
    > Here is my Python program:
    >
    > #! /usr/local/bin/python
    >
    > import sys
    > import string
    > import re
    >
    > def replace():
    >     filename='/root/Desktop/project/chatlog_20060819_110043.xml.txt'
    >     try:
    >         fh=open(filename,'r')
    >     except:
    >         print 'file not opened'
    >         sys.exit(1)


    You open your file a first time, and bind the reference to the file
    object to fh.

    >     for l in open('/root/Desktop/project/chatlog_20060819_110043.xml.txt'):


    And then you open the file a second time...

    >         l=l.replace("Document", "DOC")


    This builds a new string from the one referenced by l (talk about a
    bad name) and rebinds it to the same name. Strings are immutable, so
    neither the original line nor the file is touched.

    >         fh.close()


    Then you close fh... and discard the modifications to l.

    > if __name__=="__main__":
    >     replace()
    >
    > But it does not replace Document with Doc in the text file.


    Why should it? You didn't ask for it !-)

    > Is there anything I am doing wrong?


    Yes.

    The canonical way to modify a text file is to read from original / do
    transformations / *write modifications to a tmp file* / replace the
    original with the tmp file.
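    As a rough sketch of that read / transform / write-to-tmp / replace
    cycle in Python 3, here is a helper that takes the transformation as
    a function. The names rewrite_file and transform are invented for
    this example; tempfile.mkstemp and os.replace are standard library
    calls.

```python
import os
import tempfile


def rewrite_file(path, transform):
    """Apply transform(line) to every line of path via a temp file."""
    # Create the temp file next to the original so the final rename
    # stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dst, open(path, "r") as src:
            for line in src:
                dst.write(transform(line))
        # Only replace the original once everything was written.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, leave the original untouched and clean up.
        os.remove(tmp_path)
        raise
```

    Usage would be something like
    rewrite_file(filename, lambda line: line.replace("Document", "DOC")).
    One caveat: mkstemp creates the file readable and writable only by
    its owner, so the permissions of the original are not carried over.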


    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    Bruno Desthuilliers, Oct 11, 2006
    #3
  4. CSUIDL PROGRAMMEr

    Tim Chase Guest

    > for l in open('/root/Desktop/project/chatlog_20060819_110043.xml.txt'):
    >     l=l.replace("Document", "DOC")
    >     fh.close()
    >
    > But it does not replace Document with Doc in the txt file


    In addition to closing the file handle for the loop *within* the
    loop, you're changing "l" (side note: a bad choice of names, as
    in most fonts, it's difficult to visually discern from the number
    "1"), but you're not writing it back out any place. One would do
    something like

    outfile = open('out.txt', 'w')
    infile = open(filename)
    for line in infile:
        outfile.write(line.replace("Document", "DOC"))
    outfile.close()
    infile.close()

    You could even let garbage collection take care of the file
    handle for you:


    outfile = open('out.txt', 'w')
    for line in open(filename):
        outfile.write(line.replace("Document", "DOC"))
    outfile.close()


    If needed, you can then move 'out.txt' over the original file.

    Or, you could just use

    sed 's/Document/DOC/g' $FILENAME > out.txt

    or, with a version of sed that accepts the -i option, do it in place with

    sed -i 's/Document/DOC/g' $FILENAME

    if you have sed available on your system.
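    If sed is not an option, the standard library's fileinput module can
    do a similar in-place edit from Python. This is only a sketch for
    Python 3; the function name replace_in_place is made up here.

```python
import fileinput


def replace_in_place(path, old, new):
    """Edit a file in place, roughly like sed -i with a fixed string."""
    # With inplace=True, fileinput moves the original aside and
    # redirects standard output into the file, so print() writes the
    # new contents.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            # Each line keeps its own newline, so suppress print's.
            print(line.replace(old, new), end="")
```

    Unlike the sed commands, this does plain substring replacement, not
    regular expression matching.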

    Oh...and it doesn't look like your code is using regexps for
    anything, despite the subject-line of your email :) I suspect
    they'll come in later for the "replace the tags" portion you
    mentioned, but that ain't in the code.
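    For what it's worth, here is one guess at what the regexp part might
    look like once it arrives. The original post does not say what the
    tags should be replaced with, so the name=value output format, the
    pattern, and the function name rewrite_tags are all assumptions made
    for illustration.

```python
import re

# Matches Keyword<name:value>, where the value is anything up to the
# next ">" (it may be empty or contain colons, as in the sample lines).
KEYWORD_TAG = re.compile(r"Keyword<(\w+):([^>]*)>")


def rewrite_tags(line):
    """Rewrite each Keyword<name:value> tag as name=value (assumed format)."""
    return KEYWORD_TAG.sub(r"\1=\2", line)
```

    A tag with no closing ">" on the same line, like the logon one in
    the sample, is left alone by this pattern when the file is processed
    line by line.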

    -tkc
     
    Tim Chase, Oct 11, 2006
    #4