re.search (works)|(doesn't work) depending on for loop order

sgharvey

.... and by works, I mean works like I expect it to.

I'm writing my own cheesy config.ini parser because ConfigParser
doesn't preserve case or order of sections, or order of options w/in
sections.

What's confusing me is this:
If I try matching every line to one pattern at a time, all the
patterns that are supposed to match, actually match.
If I try to match every pattern to one line at a time, only one
pattern will match.

What am I not understanding about re.search?

Doesn't match properly:
<code>
# Iterate through each pattern for each line
for line in lines:
    for pattern in patterns:
        # Match each pattern to the current line
        match = patterns[pattern].search(line)
        if match:
            "%s: %s" % (pattern, str(match.groups()) )
</code>

_Does_ match properly:
<code>
# Let's iterate through all the lines for each pattern
for pattern in patterns:
    for line in lines:
        # Match each pattern to the current line
        match = patterns[pattern].search(line)
        if match:
            "%s: %s" % (pattern, str(match.groups()) )
</code>

Related code:
The whole src
http://pastebin.com/f63298772
regexen and delimiters (imported into whole src)
http://pastebin.com/f485ac180
 
Marc 'BlackJack' Rintsch

... and by works, I mean works like I expect it to.

I'm writing my own cheesy config.ini parser because ConfigParser
doesn't preserve case or order of sections, or order of options w/in
sections.

What's confusing me is this:
If I try matching every line to one pattern at a time, all the
patterns that are supposed to match, actually match.
If I try to match every pattern to one line at a time, only one
pattern will match.

What am I not understanding about re.search?

That has nothing to do with `re.search` but how files work. A file has a
"current position marker" that is advanced at each iteration to the next
line in the file. When it is at the end, it stays there, so you can just
iterate *once* over an open file unless you rewind it with the `seek()`
method.

That only works on "seekable" files, and it's not a good idea anyway,
because the overhead of re-reading a file is usually greater than the
time needed to iterate over in-memory data like the patterns.
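
To illustrate the position marker described above, here is a minimal sketch. It is written for Python 3, `io.StringIO` stands in for an open seekable file, and the sample text is made up. A second pass over the same file object yields nothing until it is rewound with `seek(0)`:

```python
import io

# io.StringIO stands in for an open, seekable text file (an assumption
# made so the example is self-contained).
f = io.StringIO("[section]\nkey = value\n")

first_pass = [line for line in f]    # consumes the file to the end
second_pass = [line for line in f]   # position is still at the end

f.seek(0)                            # rewind to the start
third_pass = [line for line in f]

print(len(first_pass), len(second_pass), len(third_pass))
```

Reading the lines into a list once, as `readlines()` does, avoids the issue because a list can be iterated any number of times.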

Ciao,
Marc 'BlackJack' Rintsch
 
John Machin

That has nothing to do with `re.search` but how files work. A file has a
"current position marker" that is advanced at each iteration to the next
line in the file. When it is at the end, it stays there, so you can just
iterate *once* over an open file unless you rewind it with the `seek()`
method.

That only works on "seekable" files, and it's not a good idea anyway,
because the overhead of re-reading a file is usually greater than the
time needed to iterate over in-memory data like the patterns.

Unless the OP has changed the pastebin code since you read it, that's
absolutely nothing to do with his problem -- his pastebin code slurps
in the whole .ini file using file.readlines; it is not iterating over
an open file.
 
John Machin

... and by works, I mean works like I expect it to.

You haven't told us what you expect it to do. In any case, your
subject heading indicates that the problem is 99.999% likely to be in
your logic -- the converse would require the result of re.compile() to
retain some memory of what it's seen before *AND* to act differently
depending somehow on those memorised facts.
I'm writing my own cheesy config.ini parser because ConfigParser
doesn't preserve case or order of sections, or order of options w/in
sections.

What's confusing me is this:
If I try matching every line to one pattern at a time, all the
patterns that are supposed to match, actually match.
If I try to match every pattern to one line at a time, only one
pattern will match.

What am I not understanding about re.search?

Its behaviour is not contingent on previous input.
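
A small sketch of that claim, written for Python 3. The pattern names, regexes, and sample lines are hypothetical and are not taken from the pastebin code. Swapping the loop order changes the order in which matches are found, but not which (pattern, line) pairs match:

```python
import re

# Hypothetical pattern names and regexes, made up for illustration.
compiled_regexes = {
    "section": re.compile(r"^\[(.+)\]$"),
    "setting": re.compile(r"^(\w+)\s*=\s*(.*)$"),
}
lines = ["[main]", "name = value", "# comment"]

def by_line(lines, regexes):
    """Outer loop over lines, inner loop over patterns."""
    return {(name, line) for line in lines
            for name, rx in regexes.items() if rx.search(line)}

def by_pattern(lines, regexes):
    """Outer loop over patterns, inner loop over lines."""
    return {(name, line) for name, rx in regexes.items()
            for line in lines if rx.search(line)}

print(by_line(lines, compiled_regexes) == by_pattern(lines, compiled_regexes))
```

If the two loop orders give different sets of matches in real code, the cause is likely in the surrounding logic rather than in the compiled regex objects.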

The following pseudocode is not very useful; the corrections I have
made below can be made only after reading the actual pastebin code :-(
... you are using the name "pattern" to refer both to a pattern name
(e.g. 'setting') and to a compiled regex.
Doesn't match properly:
<code>
# Iterate through each pattern for each line
for line in lines:
    for pattern in patterns:

you mean: for pattern_name in pattern_names:

        # Match each pattern to the current line
        match = patterns[pattern].search(line)

you mean: match = compiled_regexes[pattern_name].search(line)

        if match:
            "%s: %s" % (pattern, str(match.groups()) )

you mean: print pattern_name, match.groups()
</code>

_Does_ match properly:
<code> [snip]

</code>

Related code:
The whole src http://pastebin.com/f63298772

This can't be the code that you ran, because it won't even compile.
See comments in my update at http://pastebin.com/m77f0617a

By the way, you should be either (a) using *match* (not search) with a
\Z at the end of each pattern or (b) checking that there is not
extraneous guff at the end of the line ... otherwise a line like
"[blah] waffle" would be classified as a "section".
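
A minimal sketch of option (a), written for Python 3. The section regex here is a made-up stand-in, not the one from the pastebin:

```python
import re

# A made-up section pattern; the OP's real regexes live in the pastebin.
loose = re.compile(r"\[(.+?)\]")
strict = re.compile(r"\[(.+?)\]\Z")

line = "[blah] waffle"
loose_hit = loose.search(line)       # matches: trailing text is ignored
strict_hit = strict.match(line)      # None: "\Z" requires end of string

print(bool(loose_hit), bool(strict_hit))
print(strict.match("[blah]").group(1))
```

With `match` the pattern is anchored at the start of the line, and `\Z` anchors it at the end, so anything left over on the line makes the match fail.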

Have you considered leading/trailing/embedded spaces?
regexen and delimiters (imported into whole src) http://pastebin.com/f485ac180

HTH,
John
 
Brian Lane

... and by works, I mean works like I expect it to.

I'm writing my own cheesy config.ini parser because ConfigParser
doesn't preserve case or order of sections, or order of options w/in
sections.

What's confusing me is this:
If I try matching every line to one pattern at a time, all the
patterns that are supposed to match, actually match.
If I try to match every pattern to one line at a time, only one
pattern will match.

I don't see that behavior when I try your code. I had to fix your
pattern loading:

patterns[pattern] = re.compile(pattern_strings[pattern], re.VERBOSE)

I would also recommend against using both the plural and singular
forms as variable names; it's bound to cause confusion eventually.

I also changed contents to self.contents so that it would be accessible
outside the class.

The correct way to do it is run each pattern against each line. This
will maintain the order of the config.ini file. If you do it the other
way you will end up with everything ordered based on the patterns
instead of the file.
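
A short sketch of the ordering difference, written for Python 3 (where dicts keep insertion order). The regexes and sample lines are hypothetical:

```python
import re

# Hypothetical regexes and sample lines, chosen only for illustration.
regexes = {
    "setting": re.compile(r"^\w+\s*=.*$"),
    "section": re.compile(r"^\[.+\]$"),
}
lines = ["[first]", "a = 1", "[second]", "b = 2"]

# Lines in the outer loop: results come out in file order.
file_order = [line for line in lines
              for rx in regexes.values() if rx.search(line)]

# Patterns in the outer loop: results are grouped by pattern instead.
pattern_order = [line for rx in regexes.values()
                 for line in lines if rx.search(line)]

print(file_order)
print(pattern_order)
```

The first list keeps each setting next to its section; the second lists every setting before every section, so the structure of the file is lost.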

I tried it with Python2.5 on OSX from within TextMate and it ran as
expected.

Brian

 
Gabriel Genellina

... and by works, I mean works like I expect it to.

I'm writing my own cheesy config.ini parser because ConfigParser
doesn't preserve case or order of sections, or order of options w/in
sections.

Take a look at ConfigObj http://pypi.python.org/pypi/ConfigObj/

Instead of:

# Remove the '\n's from the end of each line
lines = [line[0:line.__len__()-1] for line in lines]

line.__len__() is a crazy (and ugly) way of spelling len(line). The
comment is misleading; you say you remove '\n's but you don't actually
check for them. The last line in the file might not have a trailing \n.
See this:

lines = [line.rstrip('\n') for line in lines]

Usually trailing spaces are ignored too; so you end up writing:

lines = [line.rstrip() for line in lines]
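
To make the difference concrete, here is a small sketch in Python 3 with made-up sample lines, the last of which has no trailing newline:

```python
# Sample lines as readlines() might return them; the last one has no
# trailing newline (made-up data).
lines = ["[main]\n", "key = value  \n", "last = line"]

sliced = [line[0:len(line) - 1] for line in lines]    # blind slicing
no_newline = [line.rstrip("\n") for line in lines]    # only strips "\n"
stripped = [line.rstrip() for line in lines]          # strips all trailing whitespace

print(repr(sliced[-1]))       # the final character of the last line is lost
print(repr(no_newline[1]))    # trailing spaces are kept
print(repr(stripped[1]))      # trailing spaces are removed too
```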

In this case:
# Compile the regexen
patterns = {}
for pattern in pattern_strings:
    patterns.update(pattern: re.compile(pattern_strings[pattern],
                                        re.VERBOSE))

That code does not even compile. I got lost with all those similar names;
try to choose meaningful ones. What about this:

patterns = {}
for name, regexpr in pattern_strings.iteritems():
    patterns[name] = re.compile(regexpr, re.VERBOSE)

or even:

patterns = dict((name, re.compile(regexpr, re.VERBOSE))
                for name, regexpr in pattern_strings.iteritems())

or even compile them directly when you define them.

I'm not sure you can process a config file in this unstructured way; looks
a lot easier if you look for [sections] and process sequentially lines
inside sections.
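
One way that sequential approach might look, as a rough sketch in Python 3. The function name `parse_ini`, the two regexes, and the sample data are all assumptions made for illustration; this is not the OP's code and not how ConfigObj works internally:

```python
import re

# Hypothetical patterns, kept deliberately simple.
SECTION = re.compile(r"^\[(.+)\]$")
SETTING = re.compile(r"^([^=]+?)\s*=\s*(.*)$")

def parse_ini(lines):
    """Return [(section_name, [(option, value), ...]), ...] in file order."""
    sections = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue                        # skip blank lines and comments
        m = SECTION.match(line)
        if m:
            sections.append((m.group(1), []))
            continue
        m = SETTING.match(line)
        if m and sections:                  # settings before any section are ignored
            sections[-1][1].append((m.group(1), m.group(2)))
    return sections

sample = ["[Zeta]", "Name = one", "# note", "", "[alpha]", "Empty ="]
print(parse_ini(sample))
```

Using lists of tuples rather than dicts keeps both the order and the case of section and option names, which is what the original post was after.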

if match:
    content.update({pattern: match.groups()})

I wonder where you got the idea of populating a dict that way. It's a
basic operation:
content[name] = value

The regular expressions look strange too. A comment may be empty. A
setting too. There may be spaces around the = sign. Don't try to catch all
in one go.
 
sgharvey

On Sat, 22 Mar 2008 17:27:49 -0300, sgharvey <[email protected]>
wrote:
Take a look at ConfigObj http://pypi.python.org/pypi/ConfigObj/

Thanks for the pointer; I'll check it out.
I'm not sure you can process a config file in this unstructured way; looks
a lot easier if you look for [sections] and process sequentially lines
inside sections.

It works though... now that I've fixed up all my ugly stuff, and a
dumb logic error or two.
The regular expressions look strange too. A comment may be empty. A
setting too. There may be spaces around the = sign. Don't try to catch all
in one go.

I didn't think about empty comments/settings... fixed now.
It also seemed simpler to handle surrounding spaces after the match
was found.

New version of the problematic part:
<code>
self.contents = []
content = {}
# Get the content in each line
for line in lines:
    for name in patterns:
        # Match each pattern to the current line
        match = patterns[name].search(line)
        if match:
            content[name] = match.group(0).strip()
    self.contents.append(content)
    content = {}
</code>

new iniparsing.py
http://pastebin.com/f445701d4

new ini_regexen_dicts.py
http://pastebin.com/f1e41cd3d


Much thanks to all for the constructive criticism.

Samuel Harvey
 
