Seeking assistance - string processing.

Discussion in 'Python' started by billpaterson2006@googlemail.com, Nov 14, 2006.

  1. Guest

    I've been working on some code to search for specific text strings and
    act upon them in some way. I've got the conversion sorted; however, there
    is 1 problem remaining.

    I am trying to work out how to make it find a string like this "==="
    and when it has found it, I want it to add "===" to the end of the
    line.

    For example.

    The text file contains this:

    ===Heading

    and I am trying to make it be processed and outputted as a .dat file
    with the contents

    ===Heading===

    Here's the code I have got so far.

    import string
    import glob
    import os

    mydir = os.getcwd()
    newdir = mydir#+"\\Test\\";

    for filename in glob.glob1(newdir,"*.txt"):
        #print "This is an input file: " + filename
        fileloc = newdir+"\\"+filename
        #print fileloc

        outputname = filename
        outputfile = string.replace(outputname,'.txt','.dat')
        #print filename
        #print a

        print "This is an input file: " + filename + ". Output file: "+outputfile

        #temp = newdir + "\\" + outputfile
        #print temp


        fpi = open(fileloc);
        fpo = open(outputfile,"w+");

        output_lines = []
        lines = fpi.readlines()

        for line in lines:
            if line.rfind("--------------------") is not -1:
                new = line.replace("--------------------","----")
            elif line.rfind("img:") is not -1:
                new = line.replace("img:","[[Image:")
            elif line.rfind(".jpg") is not -1:
                new = line.replace(".jpg",".jpg]]")
            elif line.rfind(".gif") is not -1:
                new = line.replace(".gif",".gif]]")
            else:
                output_lines.append(line);
                continue
            output_lines.append(new);

        for line in output_lines:
            fpo.write(line)

        fpi.close()
        fpo.flush()
        fpo.close()


    I hope this gets formatted correctly :p

    Cheers, hope you can help.
     
    , Nov 14, 2006
    #1

  2. wrote:

    > I am trying to work out how to make it find a string like this "==="
    > and when it has found it, I want it to add "===" to the end of the
    > line.


    how about

    if line.startswith("==="):
        line = line + "==="

    or

    if "===" in line: # anywhere
        line = line + "==="

    ?

    > if line.rfind("--------------------") is not -1:
    > new = line.replace("--------------------","----")


    it's not an error to use replace on a string that doesn't contain the
    pattern, so that rfind is rather unnecessary.
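A small check of the point about replace (Python 3 syntax; the string methods behave the same in the Python 2 used in this thread). The sample line is made up, and the guarded branch uses != rather than the original "is not", since comparing integers by identity relies on a CPython implementation detail:

```python
line = "plain text with no separator\n"

# Guarded version, in the style of the original post.
if line.rfind("-" * 20) != -1:
    guarded = line.replace("-" * 20, "----")
else:
    guarded = line

# Unguarded version: replace() just returns an unchanged copy
# when the pattern does not occur in the string.
unguarded = line.replace("-" * 20, "----")

print(guarded == unguarded == line)   # True
```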

    (and for cases where you need to look first, searching from the left
    is usually faster than searching backwards; use "pattern in line" or
    "line.find(pattern)" instead of rfind.)

    </F>
     
    Fredrik Lundh, Nov 14, 2006
    #2

  3. Guest

    Thanks so much, a really elegant solution indeed.

    I have another question actually which I'm praying you can help me
    with:

    with regards to the .jpg conversion to .jpg]] and .gif -> .gif]]

    this works, but only when .jpg/.gif is on its own line.

    i.e:

    .jpg

    will get converted to:

    .jpg]]

    but

    Image:test.jpg

    gets converted to:

    [[Image:test.jpg

    rather than

    [[Image:test.jpg]]

    ------------------

    Hope you can help again! Cheers
     
    , Nov 14, 2006
    #3
  4. Peter Otten Guest

    wrote:

    > Thanks so much, a really elegant solution indeed.
    >
    > I have another question actually which I'm praying you can help me
    > with:
    >
    > with regards to the .jpg conversion to .jpg]] and .gif -> .gif]]
    >
    > this works, but only when .jpg/.gif is on its own line.
    >
    > i.e:
    >
    > .jpg
    >
    > will get converted to:
    >
    > .jpg]]
    >
    > but
    >
    > Image:test.jpg
    >
    > gets converted to:
    >
    > [[Image:test.jpg
    >
    > rather than
    >
    > [[Image:test.jpg]]
    >
    > ------------------
    >
    > Hope you can help again! Cheers


    It does not do the right thing in all cases, but maybe you can get away with

    for line in lines:
        if line.startswith("==="):
            line = line.rstrip() + "===\n"
        line = line.replace("--------------------","----")
        line = line.replace("img:","[[Image:")
        line = line.replace(".jpg",".jpg]]")
        line = line.replace(".gif",".gif]]")
        output_lines.append(line)
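The same loop body, wrapped as a function so it can be tried on single lines. This is a Python 3 sketch; the name convert_line is hypothetical. The last test line shows one case (not necessarily the one Peter had in mind) where unconditional replacement goes wrong: text that is already converted gets a second "]]".

```python
def convert_line(line):
    """Apply the thread's substitutions to one line (hypothetical helper)."""
    if line.startswith("==="):
        # Drop the trailing newline/whitespace, close the heading, re-add "\n".
        line = line.rstrip() + "===\n"
    line = line.replace("-" * 20, "----")
    line = line.replace("img:", "[[Image:")
    line = line.replace(".jpg", ".jpg]]")
    line = line.replace(".gif", ".gif]]")
    return line

print(repr(convert_line("img:test.jpg\n")))     # '[[Image:test.jpg]]\n'
print(repr(convert_line("===Heading\n")))       # '===Heading===\n'
print(repr(convert_line("[[Image:a.jpg]]\n")))  # '[[Image:a.jpg]]]]\n' (not idempotent)
```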

    Peter
     
    Peter Otten, Nov 14, 2006
    #4
  5. Guest

    Cheers for the reply.

    But I'm still having a spot of bother with the === addition

    it would seem that if there is no whitespace after the ===test
    then the new === gets added to the next line

    e.g file contains:

    ===test (and then no whitespace/carriage returns or anything)

    and the result is:

    ===test
    ===

    I tried fiddling around trying to make it add whitespace but it didn't
    work.

    What do you think I should do?

    Cheers
     
    , Nov 14, 2006
    #5
  6. wrote:

    > But I'm still having a spot of bother with the === addition
    >
    > it would seem that if there is no whitespace after the ===test
    > then the new === gets added to the next line
    >
    > e.g file contains:
    >
    > ===test (and then no whitespace/carriage returns or anything)
    >
    > and the result is:
    >
    > ===test
    > ===


    that's probably because it *does* contain a newline. try printing the
    line with

    print repr(line)

    before and after you make the change, to see what's going on.
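An illustration of what that repr output reveals, in Python 3 (where print is a function, so the call is print(repr(line))). It assumes the heading line still carries its newline, as lines read from a text file normally do:

```python
line = "===test\n"            # a line as read from a file, newline included
print(repr(line))             # '===test\n'

naive = line + "==="          # the "===" is appended after the newline...
print(repr(naive))            # '===test\n===' ...so it shows up on the next line
```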

    > I tried fiddling around trying to make it add whitespace but it didn't
    > work.


    peter's complete example contains one way to solve that:

    if line.startswith("==="):
        line = line.rstrip() + "===\n"

    > What do you think I should do?


    reading the chapter on strings in your favourite Python tutorial once
    again might help, I think. Python has plenty of powerful tools for
    string processing, and most of them are quite easy to learn and use; a
    quick read of the tutorial and a little more trial and error before
    posting should be all you need.

    </F>
     
    Fredrik Lundh, Nov 14, 2006
    #6
  7. Peter Otten Guest

    wrote:

    > Cheers for the reply.
    >
    > But I'm still having a spot of bother with the === addition
    >
    > it would seem that if there is no whitespace after the ===test
    > then the new === gets added to the next line
    >
    > e.g file contains:
    >
    > ===test (and then no whitespace/carriage returns or anything)
    >
    > and the result is:
    >
    > ===test
    > ===


    You'd get the above with Fredrik's solution if there is a newline. That's
    why I put in the rstrip() method call (which removes trailing whitespace)
    and added an explicit "\n" (the Python way to spell newline). With my
    approach

    if line.startswith("==="):
        line = line.rstrip() + "===\n"

    you should always get

    ===test===(and then a newline)
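A quick Python 3 check of that "always": the hypothetical helper close_heading below wraps Peter's two lines and is fed a heading with a newline, with trailing spaces, and with no newline at all (as on an unterminated last line):

```python
def close_heading(line):
    """Append "===" to a heading line, ending with exactly one newline."""
    if line.startswith("==="):
        # rstrip() drops the newline and any other trailing whitespace,
        # so the result is the same whether or not the line had one.
        line = line.rstrip() + "===\n"
    return line

print(repr(close_heading("===test\n")))    # '===test===\n'
print(repr(close_heading("===test  \n")))  # '===test===\n'
print(repr(close_heading("===test")))      # '===test===\n'
print(repr(close_heading("body text\n")))  # 'body text\n' (non-headings untouched)
```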

    Peter
     
    Peter Otten, Nov 14, 2006
    #7
  8. John Machin Guest

    wrote:
    > I've been working on some code to search for specific text strings and
    > act upon them in some way. I've got the conversion sorted


    What does that mean? There is no sort in the computer sense, and if you
    mean as in "done" ...

    > however there
    > is 1 problem remaining.
    >
    > I am trying to work out how to make it find a string like this "==="
    > and when it has found it, I want it to add "===" to the end of the
    > line.


    The answer is at the end. Now take a deep breath, and read on carefully
    and calmly:

    >
    > For example.
    >
    > The text file contains this:
    >
    > ===Heading
    >
    > and I am trying to make it be processed and outputted as a .dat file
    > with the contents
    >
    > ===Heading===
    >
    > Here's the code I have got so far.
    >
    > import string


    Not needed for this task. In fact the string module has only minimal
    use these days. From what book or tutorial did you get the idea to use
    result = string.replace(source_string, old, new) instead of result =
    source_string.replace(old, new) sometimes? You should be using the
    result = source_string.replace(old, new) way all the time.

    What version of Python are you using?

    > import glob
    > import os
    >
    > mydir = os.getcwd()
    > newdir = mydir#+"\\Test\\";


    Try and make a real comment obvious; don't do what you did -- *delete*
    unwanted code; alternatively if it may be wanted in the future, put in
    a real comment to say why.

    What was the semicolon for?

    Consider using os.path.join() -- it's portable. Don't say "But my code
    will only ever be run on Windows". If you write code like that, it will
    be a self-fulfilling prophecy -- no-one will want to try to run it
    anywhere else.
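A portable version of the file-finding part, sketched in Python 3. It uses glob.glob with an os.path.join pattern instead of glob.glob1 (which, as far as I know, is not part of the glob module's documented API); the variable names follow the original post:

```python
import glob
import os

newdir = os.getcwd()

# os.path.join uses the platform's separator ("\\" on Windows, "/"
# elsewhere) instead of a hard-coded "\\".
pattern = os.path.join(newdir, "*.txt")

# glob.glob returns full paths, so there is nothing left to join by hand.
for fileloc in glob.glob(pattern):
    print("Input file name:", os.path.basename(fileloc))
```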

    >
    > for filename in glob.glob1(newdir,"*.txt"):
    > #print "This is an input file: " + filename

    No it isn't; it's a *name* of a file
    > fileloc = newdir+"\\"+filename
    > #print fileloc
    >
    > outputname = filename
    > outputfile = string.replace(outputname,'.txt','.dat')


    No again, it's not a file.

    Try outputname = filename.replace('.txt', '.dat')
    Also consider what happens if the name of the input file is foo.txt.txt
    [can happen]
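What goes wrong with foo.txt.txt, and one way around it, in Python 3. This sketch assumes the intent is to swap only the final extension; dat_name is a hypothetical helper built on os.path.splitext:

```python
import os

def dat_name(txt_name):
    """Replace only the final extension with ".dat" (hypothetical helper)."""
    root = os.path.splitext(txt_name)[0]   # "foo.txt.txt" -> "foo.txt"
    return root + ".dat"

print("foo.txt.txt".replace(".txt", ".dat"))  # foo.dat.dat (both occurrences replaced)
print(dat_name("foo.txt.txt"))                # foo.txt.dat
print(dat_name("notes.txt"))                  # notes.dat
```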

    > #print filename
    > #print a
    >
    > print "This is an input file: " + filename + ". Output file:
    > "+outputfile


    No it isn't.


    >
    > #temp = newdir + "\\" + outputfile
    > #print temp
    >
    >
    > fpi = open(fileloc);
    > fpo = open(outputfile,"w+");


    Why the "+"?
    Semi-colons?

    >
    > output_lines = []


    Why not just write as you go? What happens with a 1GB file? How much
    memory do you have on your computer?


    > lines = fpi.readlines()


    Whoops. That's now 2GB minimum of memory you need.

    >
    > for line in lines:


    No, use "for line in fpi"
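A sketch of the write-as-you-go approach in Python 3. The function name convert_file, the demo file names, and the use of str.upper as a stand-in for the real per-line conversion are all hypothetical. A with-block closes (and so flushes) both files, which also covers the close()/flush() question further down:

```python
import os
import tempfile


def convert_file(in_path, out_path, convert_line):
    """Convert in_path to out_path one line at a time (hypothetical helper)."""
    # "with" closes both files, flushing the output, even if an error occurs.
    with open(in_path) as fpi, open(out_path, "w") as fpo:
        for line in fpi:            # iterating the file reads lazily
            fpo.write(convert_line(line))


# Demo in a throwaway directory; str.upper stands in for the real conversion.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.txt")
    dst = os.path.join(tmp, "sample.dat")
    with open(src, "w") as f:
        f.write("===Heading\nbody\n")
    convert_file(src, dst, str.upper)
    with open(dst) as f:
        result = f.read()

print(repr(result))   # '===HEADING\nBODY\n'
```

Only one line is held in memory at a time, so the size of the input file no longer matters.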

    > if line.rfind("--------------------") is not -1:


    Quick, somebody please count the "-" signs in there; we'd really like
    to know what this program is doing. If there are more identical
    characters than you have fingers on your hand, don't do that. Use
    character * count (e.g. "-" * 20). Then consider giving it a name.
    Consider putting in a comment to explain what your code is doing, if
    you can, like why use rfind instead of find -- both will give the same
    result if there are 0 or 1 occurrences of the sought string, and you
    aren't using the position if there are 1 or more occurrences. Then
    consider that if you need a comment for code like that, then maybe
    your variable names are not very meaningful.

    > new = line.replace("------------------","----")


    Is that the same number of "-"? Are you sure?

    > elif line.rfind("img:") is not -1:
    > new = line.replace("img:","[[Image:")
    > elif line.rfind(".jpg") is not -1:
    > new = line.replace(".jpg",".jpg]]")


    That looks like a pattern to me. Consider setting up a list of (old,
    new) tuples and looping over it.
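A minimal Python 3 sketch of the (old, new) table John suggests; REPLACEMENTS and convert_line are hypothetical names. The pairs are applied in order, so each one sees the result of the previous ones:

```python
# (old, new) pairs applied in order; later pairs see earlier results.
REPLACEMENTS = [
    ("-" * 20, "----"),
    ("img:", "[[Image:"),
    (".jpg", ".jpg]]"),
    (".gif", ".gif]]"),
]

def convert_line(line):
    for old, new in REPLACEMENTS:
        line = line.replace(old, new)
    return line

print(repr(convert_line("img:test.jpg\n")))   # '[[Image:test.jpg]]\n'
print(repr(convert_line("img:pic.gif\n")))    # '[[Image:pic.gif]]\n'
```

Adding a new conversion then means adding one tuple rather than another elif branch.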

    > elif line.rfind(".gif") is not -1:
    > new = line.replace(".gif",".gif]]")
    > else:
    > output_lines.append(line);
    > continue
    > output_lines.append(new);
    >


    Try this:

        else:
            new = line
        fpo.write(new)

    > for line in output_lines:
    > fpo.write(line)
    >
    > fpi.close()
    > fpo.flush()


    News to me that close() doesn't automatically do flush() on a file
    that's been open for writing.

    > fpo.close()
    >
    >
    > I hope this gets formatted correctly :p
    >
    > Cheers, hope you can help.


    Answer to your question:

    string1 in string2 beats string2.[r]find(string1) for readability and
    (maybe) for speed too.

        elif "===" in line: # should be safe to assume your audience can count to 3
            new = line[:-1] + "===\n"

    HTH,
    John
     
    John Machin, Nov 14, 2006
    #8
  9. John Machin Guest

    John Machin wrote:

    > new = line[:-1] + "===\n"


    To allow for cases where the last line in the file is not terminated
    [can happen],
    this should be:

    new = line.rstrip("\n") + "===\n"
    # assuming you want to fix the unterminated problem.
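Why the rstrip("\n") version matters, shown in Python 3 on a hypothetical unterminated last line:

```python
last_line = "===test"                          # final line with no trailing newline
print(repr(last_line[:-1] + "===\n"))          # '===tes===\n' (the 't' is lost)
print(repr(last_line.rstrip("\n") + "===\n"))  # '===test===\n'

normal = "===test\n"                           # an ordinary terminated line
print(repr(normal.rstrip("\n") + "===\n"))     # '===test===\n'
```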

    Cheers,
    John
     
    John Machin, Nov 14, 2006
    #9
  10. Guest

    Thanks Fredrik, Peter and John for your help.

    John, I especially enjoyed your line-by-line assassination of my code,
    keep it up.

    I'm not a programmer, I dislike programming, I'm bad at it. I just
    agreed to do this to help someone out, I didn't even know what python
    was 3 days ago.

    In case you were wondering about all the craziness with the -------'s -
    it's because I am trying to batch convert 1600 files into new versions
    with slightly altered syntax.

    It all works for now, hurrah, now it's time to break it again.

    Cheerio fellas (for now, I'll be back I'm sure ;-D)
     
    , Nov 14, 2006
    #10
  11. wrote:


    Here's a suggestion:

    >>> import SE
    >>> Editor = SE.SE ('--------------------==---- img:=[[Image: .jpg=.jpg]] .gif=.gif]]')
    >>> Editor ('-------------------- img: .jpg .gif') # See if it works

    '------------------------ [[Image: .jpg]] .gif]]'

    It works. (Add in other replacements if the need arises.)

    Works linewise

    >>> for line in f:
    ...     new_line = Editor (line)
    ...

    Or filewise, which comes in handy in your case:

    >>> for in_filename in glob.glob (newdir+'/*.txt'):
    ...     out_filename = in_filename.replace ('.txt','.dat')
    ...     Editor (in_filename, out_filename)


    See if that helps. Find SE here: http://cheeseshop.python.org/pypi/SE/2.3
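SE is a third-party package, and its exact matching rules are not described in this thread. For readers who would rather stay in the standard library, here is a rough Python 3 sketch of a comparable single-pass multi-replacement built on re.sub; make_editor is a hypothetical name, and this is not SE's implementation:

```python
import re

def make_editor(pairs):
    """Return a function that replaces every key of `pairs` in one pass.

    Longer keys are tried first, so a long dash rule wins over any
    shorter key that could match at the same position.
    """
    keys = sorted(pairs, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Replaced text is not scanned again, so "[[Image:" is never re-edited.
    return lambda text: pattern.sub(lambda m: pairs[m.group(0)], text)

editor = make_editor({
    "-" * 20: "----",
    "img:": "[[Image:",
    ".jpg": ".jpg]]",
    ".gif": ".gif]]",
})
print(editor("img:test.jpg"))                      # [[Image:test.jpg]]
print(editor("-" * 20 + " img: .jpg .gif"))        # ---- [[Image: .jpg]] .gif]]
```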

    Frederic
     
    Frederic Rentsch, Nov 14, 2006
    #11
  12. John Machin Guest

    wrote:
    > Thanks Fredrik, Peter and John for your help.
    >
    > John, I especially enjoyed your line by line assasination of my code,
    > keep it up.
    >
    > I'm not a programmer, I dislike programming, I'm bad at it. I just
    > agreed to do this to help someone out, I didn't even know what python
    > was 3 days ago.
    >


    I would have to disagree strongly with the "I'm bad at it". Everything
    is relative. I've seen mind-bogglingly fugly incoherent messes produced
    by people who claim to be professional programmers with 3 *years*
    experience in a language (only rarely in Python). To have produced what
    you did -- it was clear enough what you were trying to do, and it
    "worked" well enough for a one-off job -- with 3 *days* experience with
    Python was a remarkable achievement IMHO.

    What's this "dislike programming" business? No such concept :)

    Cheers,
    John
     
    John Machin, Nov 14, 2006
    #12
  13. On 14 Nov 2006 03:44:43 -0800, "John Machin" <>
    declaimed the following in comp.lang.python:

    >
    > >
    > > for line in lines:

    >

    <snip>
    > Is that the same number of "-"? Are you sure?
    >
    > > elif line.rfind("img:") is not -1:
    > > new = line.replace("img:","[[Image:")
    > > elif line.rfind(".jpg") is not -1:
    > > new = line.replace(".jpg",".jpg]]")

    >
    > That looks like a pattern to me. Consider setting up a list of (old,
    > new) tuples and looping over it.
    >

    Even worse...

    If "line" contains, say, "img:something.jpg", the first "elif" will
    make "new" look like "[[Image:something.jpg"... AND ALL OTHER ELIFs are
    skipped, so the closing ]] substitution IS NOT DONE!

    Something like:

        elif "img:" in line:
            out = line.replace("img:", "[[Image:")
            if ".jpg" in line:
                out = out.replace(".jpg", ".jpg]]")
            if ".gif" in line:
                out = out.replace(".gif", ".gif]]")

    will ensure that both ends of the img: tag are processed (unless it is
    expected that the file name will wrap to the next line).
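The difference Dennis points out, reproduced in Python 3 on his example line: with an elif chain only the first matching branch runs, while independent if checks let every applicable substitution happen.

```python
line = "img:something.jpg\n"

# elif chain: only the first matching branch runs.
if "img:" in line:
    elif_result = line.replace("img:", "[[Image:")
elif ".jpg" in line:
    elif_result = line.replace(".jpg", ".jpg]]")
else:
    elif_result = line
print(repr(elif_result))   # '[[Image:something.jpg\n' (no closing ]])

# Independent checks: every applicable substitution runs.
out = line
if "img:" in out:
    out = out.replace("img:", "[[Image:")
if ".jpg" in out:
    out = out.replace(".jpg", ".jpg]]")
print(repr(out))           # '[[Image:something.jpg]]\n'
```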
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, Nov 14, 2006
    #13

Similar Threads
  1. Keith-Earl
    Replies:
    1
    Views:
    458
    Mary Chipman
    Jun 15, 2004
  2. Jeff Goslin

    Newbie seeking VB to ANSI C Conversion assistance

    Jeff Goslin, Nov 5, 2003, in forum: C Programming
    Replies:
    14
    Views:
    723
    Jimmy
    Nov 23, 2003
  3. Hubert Hung-Hsien Chang
    Replies:
    2
    Views:
    517
    Michael Foord
    Sep 17, 2004
  4. Replies:
    10
    Views:
    492
    Noah Roberts
    Oct 6, 2006
  5. Baron Samedi
    Replies:
    7
    Views:
    401
    Anand Hariharan
    Mar 30, 2009