Help needed to retrieve text from a text-file using RegEx

Discussion in 'Python' started by Bruno Desthuilliers, Feb 9, 2009.

  1. Oltmans wrote:
    > Here is the scenario:
    >
    > It's a command line program. I ask user for a input string. Based on
    > that input string I retrieve text from a text file. My text file looks
    > like following
    >
    > Text-file:
    > -------------
    > AbcManager=C:\source\code\Modules\Code-AbcManager\
    > AbcTest=C:\source\code\Modules\Code-AbcTest\
    > DecConnector=C:\source\code\Modules\Code-DecConnector\
    > GHIManager=C:\source\code\Modules\Code-GHIManager\
    > JKLConnector=C:\source\code\Modules\Code-JKLConnector
    >
    > -------------
    >
    > So now if I run the program and user enters
    >
    > DecConnector
    >
    > Then I'm supposed to show them this text "C:\source\code\Modules\Code-
    > DecConnector" from the text-file. Right now I'm retrieving using the
    > following code which seems quite ineffecient and inelegant at the same
    > time
    >
    > with open('MyTextFile.txt')


    This will look for MyTextFile.txt in the process's current working
    directory, which is not necessarily the script's directory.
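A minimal sketch of one way to resolve the file against the script's own directory instead of the working directory. It assumes Python 3, and the helper name `path_beside` is hypothetical, not something from the thread:

```python
from pathlib import Path


def path_beside(script_file, filename):
    """Return `filename` resolved against the directory holding `script_file`."""
    return Path(script_file).resolve().parent / filename


# In a real script you would typically pass __file__:
#     data_path = path_beside(__file__, 'MyTextFile.txt')
#     with open(data_path) as the_file:
#         ...
```

The helper takes the script path as an argument so it can be tried out without depending on where it is run from.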

    > as file:


    This shadows the builtin name 'file'.

    > for line in file:
    >
    > if mName in line: #mName is the string that
    > contains user input


    >
    > Path =str(line).strip('\n')


    'line' is already a string.

    > tempStr=Path
    >
    > Path=tempStr.replace(mName+'=',"",1)


    You don't need the temporary variable here. Also, you may want to use
    str.split instead:


    # NB : renaming for conformity to
    # Python's official naming conventions

    # 'name' => what the user looks for
    # 'path_to_file' => fully qualified path to the 'database' file

    target = "%s=" % name # what we are really looking for

    with open(path_to_file) as the_file:
        for line in the_file:
            # special bonus : handles empty lines and 'comment' lines
            # feel free to comment out the three following lines if
            # you're sure you don't need them !-)
            line = line.strip()
            if not line or line.startswith('#') or line.startswith(';'):
                continue

            # faster and simpler than a regexp
            if line.startswith(target):
                # since the '=' is in target, we can safely assume
                # that line.split('=', 1) will return a
                # 2-element list (the path itself may contain '=')
                path = line.split('=', 1)[1]
                # no need to look further
                break
        else:
            # target not found...
            path = None



    > I was wondering if using RegEx will make this look better.


    I don't think so. Really.
    Bruno Desthuilliers, Feb 9, 2009
    #1

  2. Bruno Desthuilliers

    Oltmans Guest

    Here is the scenario:

    It's a command line program. I ask the user for an input string. Based on
    that input string I retrieve text from a text file. My text file looks
    like the following

    Text-file:
    -------------
    AbcManager=C:\source\code\Modules\Code-AbcManager\
    AbcTest=C:\source\code\Modules\Code-AbcTest\
    DecConnector=C:\source\code\Modules\Code-DecConnector\
    GHIManager=C:\source\code\Modules\Code-GHIManager\
    JKLConnector=C:\source\code\Modules\Code-JKLConnector

    -------------

    So now if I run the program and user enters

    DecConnector

    Then I'm supposed to show them this text
    "C:\source\code\Modules\Code-DecConnector" from the text-file. Right now
    I'm retrieving it using the following code, which seems quite inefficient
    and inelegant at the same time

    with open('MyTextFile.txt') as file:
        for line in file:
            if mName in line: #mName is the string that contains user input
                Path =str(line).strip('\n')
                tempStr=Path
                Path=tempStr.replace(mName+'=',"",1)

    I was wondering if using RegEx will make this look better. If so, can
    you please suggest a Regular Expression for this? Any help is highly
    appreciated. Thank you.
    Oltmans, Feb 9, 2009
    #2

  3. Bruno Desthuilliers

    Chris Rebert Guest

    On Mon, Feb 9, 2009 at 9:22 AM, Oltmans wrote:
    > Here is the scenario:
    >
    > It's a command line program. I ask user for a input string. Based on
    > that input string I retrieve text from a text file. My text file looks
    > like following
    >
    > Text-file:
    > -------------
    > AbcManager=C:\source\code\Modules\Code-AbcManager\
    > AbcTest=C:\source\code\Modules\Code-AbcTest\
    > DecConnector=C:\source\code\Modules\Code-DecConnector\
    > GHIManager=C:\source\code\Modules\Code-GHIManager\
    > JKLConnector=C:\source\code\Modules\Code-JKLConnector
    >
    > -------------
    >
    > So now if I run the program and user enters
    >
    > DecConnector
    >
    > Then I'm supposed to show them this text "C:\source\code\Modules\Code-
    > DecConnector" from the text-file. Right now I'm retrieving using the
    > following code which seems quite ineffecient and inelegant at the same
    > time
    >
    > with open('MyTextFile.txt') as file:
    >
    > for line in file:
    >
    > if mName in line: #mName is the string that
    > contains user input
    >
    > Path =str(line).strip('\n')
    >
    > tempStr=Path
    >
    > Path=tempStr.replace(mName+'=',"",1)
    >
    > I was wondering if using RegEx will make this look better. If so, can
    > you please suggest a Regular Expression for this? Any help is highly
    > appreciated. Thank you.


    If I might repeat Jamie Zawinski's immortal quote:
    Some people, when confronted with a problem, think "I know, I'll
    use regular expressions." Now they have two problems.

    If you add one section header (e.g. "[main]") to the top of the file,
    you'll have a valid INI-format file which can be parsed by the
    ConfigParser module --
    http://docs.python.org/library/configparser.html
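A minimal sketch of that suggestion, assuming Python 3 (where the module is spelled `configparser`) and a hypothetical `[main]` header added to the sample data. The `SAMPLE` string stands in for the real file; with a file on disk, `parser.read(path)` would be used instead of `read_string`:

```python
import configparser

# Sample data from the original post, with a hypothetical [main] header added.
SAMPLE = (
    "[main]\n"
    "AbcManager=C:\\source\\code\\Modules\\Code-AbcManager\\\n"
    "DecConnector=C:\\source\\code\\Modules\\Code-DecConnector\\\n"
)

# interpolation=None keeps any '%' in a path from being treated specially.
parser = configparser.ConfigParser(interpolation=None)
# By default option names are lowercased; keep them case-sensitive instead.
parser.optionxform = str
parser.read_string(SAMPLE)

# fallback=None avoids an exception when the name is not in the file.
path = parser.get('main', 'DecConnector', fallback=None)
print(path)
```

Setting `optionxform` is a design choice here: without it, the parser lowercases option names, so the lookup would no longer distinguish `DecConnector` from `decconnector`.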

    Cheers,
    Chris

    --
    Follow the path of the Iguana...
    http://rebertia.com
    Chris Rebert, Feb 9, 2009
    #3
  4. Bruno Desthuilliers

    Guest

    Oltmans wrote:
    > Here is the scenario:
    >
    > It's a command line program. I ask user for a input string. Based on
    > that input string I retrieve text from a text file. My text file looks
    > like following
    >
    > Text-file:
    > -------------
    > AbcManager=C:\source\code\Modules\Code-AbcManager\
    > AbcTest=C:\source\code\Modules\Code-AbcTest\
    > DecConnector=C:\source\code\Modules\Code-DecConnector\
    > GHIManager=C:\source\code\Modules\Code-GHIManager\
    > JKLConnector=C:\source\code\Modules\Code-JKLConnector
    >
    > -------------
    >
    > So now if I run the program and user enters
    >
    > DecConnector
    >
    > Then I'm supposed to show them this text "C:\source\code\Modules\Code-
    > DecConnector" from the text-file. Right now I'm retrieving using the
    > following code which seems quite ineffecient and inelegant at the same
    > time
    >
    > with open('MyTextFile.txt') as file:
    >     for line in file:
    >         if mName in line: #mName is the string that contains user input
    >             Path =str(line).strip('\n')
    >             tempStr=Path
    >             Path=tempStr.replace(mName+'=',"",1)


    I've normalized your indentation and spacing, for clarity.

    > I was wondering if using RegEx will make this look better. If so, can
    > you please suggest a Regular Expression for this? Any help is highly
    > appreciated. Thank you.


    This smells like it might be homework, but I'm hoping you'll learn some
    useful python from what follows regardless of whether it is or not.

    Since your complaint is that the above code is inelegant and inefficient,
    let's clean it up. The first three lines that open the file and set up
    your loop are good, and I think you will agree that they are pretty clean.
    So, I'm just going to help you clean up the loop body.

    'line' is already a string, since it was read from a file. No need to
    wrap it in 'str':

    Path = line.strip('\n')
    tempStr=Path
    Path=tempStr.replace(mName+'=',"",1)

    'strip' removes characters from _both_ ends of the string. If you are
    trying to make sure that you _only_ strip a trailing newline, then you
    should be using rstrip. If, on the other hand, you just want to get
    rid of any leading or trailing whitespace, you could just call 'strip()'.
    Since your goal is to print the text from after the '=', I'll assume
    that stripping whitespace is desirable:

    Path = line.strip()
    tempStr=Path
    Path=tempStr.replace(mName+'=',"",1)
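A small sketch of the difference between the three calls, assuming Python 3. The sample string is made up for illustration and deliberately has a leading newline and padding spaces:

```python
# Hypothetical line with a leading newline and surrounding spaces.
text = "\n  AbcTest=C:\\source\\code  \n"

both_newlines = text.strip('\n')      # newlines removed from both ends
trailing_newline = text.rstrip('\n')  # only the trailing newline removed
all_whitespace = text.strip()         # all leading and trailing whitespace removed

print(repr(both_newlines))
print(repr(trailing_newline))
print(repr(all_whitespace))
```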

    The statement 'tempStr=Path' doesn't do what you think it does.
    It just creates an alternate name for the string pointed to by Path.
    Further, there is no need to have an intermediate variable to hold a
    value during transformation. The right hand side is computed, using
    the current values of any variables mentioned, and _then_ the left hand
    side is rebound to point to the result of the computation. So we can
    just drop that line entirely, and use 'Path' in the 'replace' statement:

    Path = line.strip()
    Path = Path.replace(mName+'=',"",1)
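The point about names and rebinding can be checked directly. This is an illustrative sketch in Python 3 with made-up lowercase variable names, not part of the original code:

```python
path = "DecConnector=C:\\source"
temp = path                  # no copy: 'temp' is a second name for the same object
same_object = temp is path   # True at this point

# The right-hand side is evaluated with the current value of 'path',
# and only then is the name 'path' rebound to the new string.
path = path.replace("DecConnector=", "", 1)

# 'temp' still refers to the original, unmodified string.
print(path)
print(temp)
```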

    However, you can also chain method calls, so really there's no need for
    two statements here, since both calls are simple:

    Path = line.strip().replace(mName+'=',"",1)

    To make things even simpler, Python strings have a 'split' method. Given the
    syntax of your input file I think we can assume that '=' never appears
    in a variable name. split returns a list of strings constructed by
    breaking the input string at the split character, and it has an optional
    argument that gives the maximum number of splits to make. So by doing
    'split('=', 1)', we will get back a list consisting of the variable name
    and the remainder of the line. The remainder of the line is exactly
    what you are looking for, and that will be the second element of the
    returned list. So now your loop body is:

    Path = line.strip().split('=', 1)[1]
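To see why the second argument matters, here is a short Python 3 sketch using a hypothetical entry whose value itself contains an '=' (no such line appears in the sample file):

```python
line = "Opt=C:\\dir\\a=b\n"   # hypothetical entry whose value contains '='

unlimited = line.strip().split('=')    # splits at every '='
limited = line.strip().split('=', 1)   # splits at the first '=' only

name, value = limited
print(unlimited)
print(name, value)
```

With the limit of one split, the list always has exactly two elements for any line that contains an '=', so unpacking into `name, value` is safe for such lines.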

    and your whole loop looks like this:

    with open('MyTextFile.txt') as file:
        for line in file:
            if mName in line:
                Path = line.strip().split('=', 1)[1]

    I think that looks pretty elegant. Oh, and you might want to add a
    'break' statement to the loop, and also an 'else:' clause (to the for
    loop) so you can issue a 'not found' message to the user if they type
    in a name that does not appear in the input file.
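A sketch of the loop with the suggested 'break' and 'else:' added, assuming Python 3. The function name `find_path` is hypothetical, and the `startswith` test is swapped in for `mName in line` because a substring test would also match a name that merely appears somewhere inside another entry:

```python
def find_path(lines, name):
    """Return the path stored under `name`, or None if no line defines it."""
    for line in lines:
        if line.startswith(name + '='):
            found = line.strip().split('=', 1)[1]
            break              # stop at the first match
    else:
        # Runs only when the loop finished without hitting 'break'.
        found = None
    return found


sample = [
    "AbcManager=C:\\source\\code\\Modules\\Code-AbcManager\\\n",
    "DecConnector=C:\\source\\code\\Modules\\Code-DecConnector\\\n",
]
print(find_path(sample, "DecConnector"))
print(find_path(sample, "Manager"))
```

Any iterable of lines works here, so an open file object can be passed in place of the `sample` list.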

    --RDM
    , Feb 9, 2009
    #4
  5. Bruno Desthuilliers

    Paul McGuire Guest

    On Feb 9, 11:22 am, Oltmans wrote:
    > Here is the scenario:
    >
    > It's a command line program. I ask user for a input string. Based on
    > that input string I retrieve text from a text file. My text file looks
    > like following
    >
    > Text-file:
    > -------------
    > AbcManager=C:\source\code\Modules\Code-AbcManager\
    > AbcTest=C:\source\code\Modules\Code-AbcTest\
    > DecConnector=C:\source\code\Modules\Code-DecConnector\
    > GHIManager=C:\source\code\Modules\Code-GHIManager\
    > JKLConnector=C:\source\code\Modules\Code-JKLConnector
    >


    Assuming the text file is under about 30 MB, I would just read
    the whole thing into a dict at startup, and then use the dict over and
    over.

    data = file(filename).read()
    lookup = dict( line.split('=',1) for line in data.splitlines() if line )

    # now no further need to access text file, just use lookup variable

    while True:
        user_entry = raw_input("Lookup key: ").strip()
        if not user_entry:
            break
        if user_entry in lookup:
            print lookup[user_entry]
        else:
            print "No entry for '%s'" % user_entry
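The code above is written for Python 2 (`file`, `raw_input`, the `print` statement). A rough Python 3 sketch of the same dict-based idea follows; the helper name `load_lookup` is hypothetical, and skipping lines without an '=' is an added assumption about how malformed lines should be handled:

```python
def load_lookup(lines):
    """Build a name -> path dict from an iterable of 'name=path' lines."""
    lookup = {}
    for line in lines:
        line = line.strip()
        if '=' not in line:
            continue            # skip blank or malformed lines
        name, path = line.split('=', 1)
        lookup[name] = path
    return lookup


sample = [
    "AbcManager=C:\\source\\code\\Modules\\Code-AbcManager\\\n",
    "\n",
    "JKLConnector=C:\\source\\code\\Modules\\Code-JKLConnector\n",
]
lookup = load_lookup(sample)
print(lookup.get("JKLConnector", "No entry"))
```

With a real file, the dict could be built once at startup with `with open(filename) as f: lookup = load_lookup(f)` and then queried for each user entry.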
    Paul McGuire, Feb 9, 2009
    #5
