Help needed to retrieve text from a text-file using RegEx

  • Thread starter Bruno Desthuilliers

Bruno Desthuilliers

Oltmans wrote:
Here is the scenario:

It's a command line program. I ask the user for an input string. Based on
that input string I retrieve text from a text file. My text file looks
like the following

Text-file:
-------------
AbcManager=C:\source\code\Modules\Code-AbcManager\
AbcTest=C:\source\code\Modules\Code-AbcTest\
DecConnector=C:\source\code\Modules\Code-DecConnector\
GHIManager=C:\source\code\Modules\Code-GHIManager\
JKLConnector=C:\source\code\Modules\Code-JKLConnector

-------------

So now if I run the program and user enters

DecConnector

Then I'm supposed to show them this text
"C:\source\code\Modules\Code-DecConnector" from the text-file. Right now
I'm retrieving it using the following code, which seems quite inefficient
and inelegant at the same time

with open('MyTextFile.txt') as file:

This will look for MyTextFile.txt in the system's current working
directory - which is not necessarily the script's directory. Also,
naming the variable 'file' shadows the builtin 'file' symbol.

    for line in file:
        if mName in line: #mName is the string that contains user input
            Path =str(line).strip('\n')

'line' is already a string.

            tempStr=Path
            Path=tempStr.replace(mName+'=',"",1)

You don't need the temporary variable here. Also, you may want to use
str.split instead:


# NB : renaming for conformity to
# Python's official naming conventions

# 'name' => what the user looks for
# 'path_to_file' => fully qualified path to the 'database' file

target = "%s=" % name # what we are really looking for

with open(path_to_file) as the_file:
    for line in the_file:
        # special bonus : handles empty lines and 'comment' lines
        # feel free to comment out the three following lines if
        # you're sure you don't need them !-)
        line = line.strip()
        if not line or line.startswith('#') or line.startswith(';'):
            continue

        # faster and simpler than a regexp
        if line.startswith(target):
            # since the '=' is in target, we can safely assume
            # that line.split('=', 1) will return a 2-element
            # list, even if the path itself contains an '='
            path = line.split('=', 1)[1]
            # no need to look further
            break
    else:
        # target not found...
        path = None
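
On the working-directory point above: one common way to make the lookup independent of where the program is started is to build the path from the script's own location. This is only a sketch, assuming the data file sits next to the script; the helper name path_next_to_script is made up for illustration.

```python
import os

def path_next_to_script(filename, script_file):
    """Return 'filename' resolved against the directory that holds 'script_file'."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, filename)

# Typical use inside a script (the file name comes from the question):
# path_to_file = path_next_to_script('MyTextFile.txt', __file__)
```

Whether the file really belongs next to the script is a design decision; a command line argument or a configuration setting may suit the program better.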


I was wondering if using RegEx will make this look better.

I don't think so. Really.
 

Oltmans

Here is the scenario:

It's a command line program. I ask the user for an input string. Based on
that input string I retrieve text from a text file. My text file looks
like the following

Text-file:
-------------
AbcManager=C:\source\code\Modules\Code-AbcManager\
AbcTest=C:\source\code\Modules\Code-AbcTest\
DecConnector=C:\source\code\Modules\Code-DecConnector\
GHIManager=C:\source\code\Modules\Code-GHIManager\
JKLConnector=C:\source\code\Modules\Code-JKLConnector

-------------

So now if I run the program and user enters

DecConnector

Then I'm supposed to show them this text
"C:\source\code\Modules\Code-DecConnector" from the text-file. Right now
I'm retrieving it using the following code, which seems quite inefficient
and inelegant at the same time

with open('MyTextFile.txt') as file:
    for line in file:
        if mName in line: #mName is the string that contains user input
            Path =str(line).strip('\n')
            tempStr=Path
            Path=tempStr.replace(mName+'=',"",1)

I was wondering if using RegEx will make this look better. If so, can
you please suggest a Regular Expression for this? Any help is highly
appreciated. Thank you.
 

Chris Rebert

If I might repeat Jamie Zawinski's immortal quote:
Some people, when confronted with a problem, think "I know, I'll
use regular expressions." Now they have two problems.

If you add one section header (e.g. "[main]") to the top of the file,
you'll have a valid INI-format file which can be parsed by the
ConfigParser module --
http://docs.python.org/library/configparser.html
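
A minimal sketch of that idea, assuming Python 3, where the module is spelled configparser (lowercase). Rather than editing the file, it prepends a "[main]" header in memory; load_paths and SAMPLE are names made up for this example. Two settings are assumptions about what you would want: optionxform = str keeps the keys case-sensitive (by default they are lowercased), and interpolation=None keeps a '%' in a path from being treated specially.

```python
import configparser

# Sample data copied from the question; a raw string keeps the backslashes literal.
SAMPLE = r"""
AbcManager=C:\source\code\Modules\Code-AbcManager\
AbcTest=C:\source\code\Modules\Code-AbcTest\
DecConnector=C:\source\code\Modules\Code-DecConnector\
GHIManager=C:\source\code\Modules\Code-GHIManager\
JKLConnector=C:\source\code\Modules\Code-JKLConnector
"""

def load_paths(text, section="main"):
    """Parse 'key=value' lines by placing them under an INI section header."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep 'DecConnector' as typed instead of lowercasing it
    parser.read_string("[%s]\n%s" % (section, text))
    return dict(parser[section])

paths = load_paths(SAMPLE)
print(paths.get("DecConnector", "not found"))
```

For a real file you would read its contents first and pass them to load_paths, or add the header to the file itself and use parser.read().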

Cheers,
Chris
 

rdmurray

Oltmans said:
Here is the scenario:

It's a command line program. I ask the user for an input string. Based on
that input string I retrieve text from a text file. My text file looks
like the following

Text-file:
-------------
AbcManager=C:\source\code\Modules\Code-AbcManager\
AbcTest=C:\source\code\Modules\Code-AbcTest\
DecConnector=C:\source\code\Modules\Code-DecConnector\
GHIManager=C:\source\code\Modules\Code-GHIManager\
JKLConnector=C:\source\code\Modules\Code-JKLConnector

-------------

So now if I run the program and user enters

DecConnector

Then I'm supposed to show them this text
"C:\source\code\Modules\Code-DecConnector" from the text-file. Right now
I'm retrieving it using the following code, which seems quite inefficient
and inelegant at the same time

with open('MyTextFile.txt') as file:
    for line in file:
        if mName in line: #mName is the string that contains user input
            Path =str(line).strip('\n')
            tempStr=Path
            Path=tempStr.replace(mName+'=',"",1)

I've normalized your indentation and spacing, for clarity.

I was wondering if using RegEx will make this look better. If so, can
you please suggest a Regular Expression for this? Any help is highly
appreciated. Thank you.

This smells like it might be homework, but I'm hoping you'll learn some
useful Python from what follows regardless of whether it is or not.

Since your complaint is that the above code is inelegant and inefficient,
let's clean it up. The first three lines that open the file and set up
your loop are good, and I think you will agree that they are pretty clean.
So, I'm just going to help you clean up the loop body.

'line' is already a string, since it was read from a file. No need to
wrap it in 'str':

Path = line.strip('\n')
tempStr=Path
Path=tempStr.replace(mName+'=',"",1)

'strip' removes characters from _both_ ends of the string. If you are
trying to make sure that you _only_ strip a trailing newline, then you
should be using rstrip. If, on the other hand, you just want to get
rid of any leading or trailing whitespace, you could just call 'strip()'.
Since your goal is to print the text from after the '=', I'll assume
that stripping whitespace is desirable:

Path = line.strip()
tempStr=Path
Path=tempStr.replace(mName+'=',"",1)
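
To see the difference between the three calls, here is a small sketch on a made-up line (the leading newline and the extra spaces are exaggerated on purpose and are not from your file):

```python
# Hypothetical line with whitespace at both ends, for illustration only.
line = "\n  DecConnector=C:\\source  \n"

both_newlines = line.strip('\n')    # newlines removed from BOTH ends, spaces kept
trailing_only = line.rstrip('\n')   # only the trailing newline removed
all_whitespace = line.strip()       # all leading and trailing whitespace removed

print(repr(both_newlines))
print(repr(trailing_only))
print(repr(all_whitespace))
```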

The statement 'tempStr=Path' doesn't do what you think it does.
It just creates an alternate name for the string pointed to by Path.
Further, there is no need to have an intermediate variable to hold a
value during transformation. The right hand side is computed, using
the current values of any variables mentioned, and _then_ the left hand
side is rebound to point to the result of the computation. So we can
just drop that line entirely, and use 'Path' in the 'replace' statement:

Path = line.strip()
Path = Path.replace(mName+'=',"",1)
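
If the point about names is not obvious, this short sketch may help. The value is shortened and made up; the names are the ones from your code.

```python
Path = "DecConnector=C:\\source\\code"   # shortened, made-up value
tempStr = Path                           # a second name for the same string object
same_object = tempStr is Path            # True: nothing was copied

# The right-hand side is evaluated first, then 'Path' is rebound to the result.
Path = Path.replace("DecConnector=", "", 1)

print(same_object)
print(tempStr)   # still refers to the original, unmodified string
print(Path)
```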

However, you can also chain method calls, so really there's no need for
two statements here, since both calls are simple:

Path = line.strip().replace(mName+'=',"",1)

To make things even simpler, Python strings have a 'split' method. Given the
syntax of your input file I think we can assume that '=' never appears
in a variable name. split returns a list of strings constructed by
breaking the input string at the split character, and it has an optional
argument that gives the maximum number of splits to make. So by doing
'split('=', 1)', we will get back a list consisting of the variable name
and the remainder of the line. The remainder of the line is exactly
what you are looking for, and that will be the second element of the
returned list. So now your loop body is:

Path = line.strip().split('=', 1)[1]
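
The effect of the maximum-splits argument only shows up when the value itself contains an '='. A quick sketch with a hypothetical line (not one from your file):

```python
# Hypothetical line whose value itself contains '=' (not from the original file).
line = "Tool=C:\\tools\\run.exe --mode=fast"

unlimited = line.split('=')      # splits at every '=': three pieces
limited = line.split('=', 1)     # splits at the first '=' only: name and remainder

name, value = limited
print(unlimited)
print(value)
```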

and your whole loop looks like this:

with open('MyTextFile.txt') as file:
    for line in file:
        if mName in line:
            Path = line.strip().split('=', 1)[1]

I think that looks pretty elegant. Oh, and you might want to add a
'break' statement to the loop, and also an 'else:' clause (to the for
loop) so you can issue a 'not found' message to the user if they type
in a name that does not appear in the input file.
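
Here is one way the 'break' and the 'else:' clause could fit together, wrapped in a function so it can be tried without a file. This is a sketch, not the only way to do it: find_path is a made-up name, it accepts any iterable of lines (an open file works too), and it matches the name at the start of the line with startswith instead of 'in', which is my assumption about what you actually want.

```python
def find_path(lines, name):
    """Return the text after 'name=' from the first matching line, or None."""
    target = name + '='
    for line in lines:
        line = line.strip()
        if line.startswith(target):
            path = line.split('=', 1)[1]
            break                      # stop at the first match
    else:
        # Runs only when the loop finished without hitting 'break'.
        path = None
    return path

# Two lines copied from the question's sample file.
sample_lines = [
    "AbcManager=C:\\source\\code\\Modules\\Code-AbcManager\\\n",
    "DecConnector=C:\\source\\code\\Modules\\Code-DecConnector\\\n",
]

for key in ("DecConnector", "Abc"):
    result = find_path(sample_lines, key)
    print(result if result is not None else "'%s' not found" % key)
```

With 'in' instead of startswith, the second lookup ("Abc") would match the AbcManager line and return text that still includes part of the key.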

--RDM
 

Paul McGuire

Here is the scenario:

It's a command line program. I ask the user for an input string. Based on
that input string I retrieve text from a text file. My text file looks
like the following

Text-file:
-------------
AbcManager=C:\source\code\Modules\Code-AbcManager\
AbcTest=C:\source\code\Modules\Code-AbcTest\
DecConnector=C:\source\code\Modules\Code-DecConnector\
GHIManager=C:\source\code\Modules\Code-GHIManager\
JKLConnector=C:\source\code\Modules\Code-JKLConnector

Assuming the text file is under 30 MB in size, I would just read
the whole thing into a dict at startup, and then use the dict over and
over.

# 'filename' is the path to the text file
with open(filename) as f:
    data = f.read()
# skip lines without '=' so blank or stray lines don't break dict()
lookup = dict(line.split('=', 1) for line in data.splitlines()
              if '=' in line)

# now no further need to access text file, just use lookup variable

while True:
    user_entry = input("Lookup key: ").strip()
    if not user_entry:
        break
    if user_entry in lookup:
        print(lookup[user_entry])
    else:
        print("No entry for '%s'" % user_entry)
 
