re.search (works)|(doesn't work) depending on for loop order

Discussion in 'Python' started by sgharvey, Mar 22, 2008.

  1. sgharvey

    sgharvey Guest

    .... and by works, I mean works like I expect it to.

    I'm writing my own cheesy config.ini parser because ConfigParser
    doesn't preserve case or order of sections, or order of options w/in
    sections.

    What's confusing me is this:
    If I try matching every line to one pattern at a time, all the
    patterns that are supposed to match, actually match.
    If I try to match every pattern to one line at a time, only one
    pattern will match.

    What am I not understanding about re.search?

    Doesn't match properly:
    <code>
    # Iterate through each pattern for each line
    for line in lines:
        for pattern in patterns:
            # Match each pattern to the current line
            match = patterns[pattern].search(line)
            if match:
                "%s: %s" % (pattern, str(match.groups()) )
    </code>

    _Does_ match properly:
    <code>
    # Let's iterate through all the lines for each pattern
    for pattern in patterns:
        for line in lines:
            # Match each pattern to the current line
            match = patterns[pattern].search(line)
            if match:
                "%s: %s" % (pattern, str(match.groups()) )
    </code>

    Related code:
    The whole src
    http://pastebin.com/f63298772
    regexen and delimiters (imported into whole src)
    http://pastebin.com/f485ac180
     
    sgharvey, Mar 22, 2008
    #1

  2. On Sat, 22 Mar 2008 13:27:49 -0700, sgharvey wrote:

    > ... and by works, I mean works like I expect it to.
    >
    > I'm writing my own cheesy config.ini parser because ConfigParser
    > doesn't preserve case or order of sections, or order of options w/in
    > sections.
    >
    > What's confusing me is this:
    > If I try matching every line to one pattern at a time, all the
    > patterns that are supposed to match, actually match.
    > If I try to match every pattern to one line at a time, only one
    > pattern will match.
    >
    > What am I not understanding about re.search?


    That has nothing to do with `re.search` but with how files work. A file
    has a "current position marker" that is advanced to the next line at each
    iteration. Once it reaches the end, it stays there, so you can iterate
    over an open file only *once* unless you rewind it with the `seek()`
    method.

    Rewinding only works on "seekable" files, and it's not a good idea anyway:
    the overhead of re-reading a file is usually greater than the time it
    takes to iterate over in-memory data like the patterns.
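
    A minimal sketch of that behaviour, in Python 3 syntax. `io.StringIO`
    stands in for a real open file, and the sample contents are made up:

```python
import io

# io.StringIO behaves like an open text file, so the example is self-contained.
ini_file = io.StringIO("[section]\nkey = value\n")

first_pass = [line for line in ini_file]    # consumes the file
second_pass = [line for line in ini_file]   # position is at the end: nothing left

ini_file.seek(0)                            # rewind to the beginning
third_pass = [line for line in ini_file]

# Reading once into a list sidesteps the problem: a list can be
# iterated as many times as needed.
ini_file.seek(0)
lines = ini_file.readlines()
```

    The second pass comes back empty; only the rewind (or the list made by
    `readlines()`) lets you go over the lines again.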

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Mar 22, 2008
    #2

  3. John Machin

    John Machin Guest

    On Mar 23, 8:21 am, Marc 'BlackJack' Rintsch <> wrote:
    > On Sat, 22 Mar 2008 13:27:49 -0700, sgharvey wrote:
    > > ... and by works, I mean works like I expect it to.

    >
    > > I'm writing my own cheesy config.ini parser because ConfigParser
    > > doesn't preserve case or order of sections, or order of options w/in
    > > sections.

    >
    > > What's confusing me is this:
    > > If I try matching every line to one pattern at a time, all the
    > > patterns that are supposed to match, actually match.
    > > If I try to match every pattern to one line at a time, only one
    > > pattern will match.

    >
    > > What am I not understanding about re.search?

    >
    > That has nothing to do with `re.search` but how files work. A file has a
    > "current position marker" that is advanced at each iteration to the next
    > line in the file. When it is at the end, it stays there, so you can just
    > iterate *once* over an open file unless you rewind it with the `seek()`
    > method.
    >
    > That only works on "seekable" files and it's not a good idea anyway
    > because usually the files and the overhead of reading is greater than the
    > time to iterate over in memory data like the patterns.
    >


    Unless the OP has changed the pastebin code since you read it, that's
    absolutely nothing to do with his problem -- his pastebin code slurps
    in the whole .ini file using file.readlines; it is not iterating over
    an open file.
     
    John Machin, Mar 22, 2008
    #3
  4. John Machin

    John Machin Guest

    On Mar 23, 7:27 am, sgharvey <> wrote:
    > ... and by works, I mean works like I expect it to.


    You haven't told us what you expect it to do. In any case, your
    subject heading indicates that the problem is 99.999% likely to be in
    your logic -- the converse would require the result of re.compile() to
    retain some memory of what it's seen before *AND* to act differently
    depending somehow on those memorised facts.

    >
    > I'm writing my own cheesy config.ini parser because ConfigParser
    > doesn't preserve case or order of sections, or order of options w/in
    > sections.
    >
    > What's confusing me is this:
    > If I try matching every line to one pattern at a time, all the
    > patterns that are supposed to match, actually match.
    > If I try to match every pattern to one line at a time, only one
    > pattern will match.
    >
    > What am I not understanding about re.search?


    Its behaviour is not contingent on previous input.
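
    A quick way to convince yourself, sketched in Python 3 syntax. The
    pattern and the sample lines are hypothetical, not the ones from the
    pastebin:

```python
import re

# Hypothetical pattern for a "[section]" header; not the OP's actual regex.
section_re = re.compile(r"\[(\w+)\]")

lines = ["[alpha]", "key = value", "[beta]"]

# Call search() on the same lines in two different orders.
forward = {line: bool(section_re.search(line)) for line in lines}
backward = {line: bool(section_re.search(line)) for line in reversed(lines)}
```

    Both dicts hold the same answer for every line, whatever was searched
    before it.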

    The following pseudocode is not very useful; the corrections I have
    made below can be made only after reading the actual pastebin code :-(
    ... you are using the name "pattern" to refer both to a pattern name
    (e.g. 'setting') and to a compiled regex.

    > Doesn't match properly:
    > <code>
    > # Iterate through each pattern for each line
    > for line in lines:
    > for pattern in patterns:


    you mean: for pattern_name in pattern_names:

    > # Match each pattern to the current line
    > match = patterns[pattern].search(line)


    you mean: match = compiled_regexes[pattern_name].search(line)

    > if match:
    > "%s: %s" % (pattern, str(match.groups()) )


    you mean: print pattern_name, match.groups()
    > </code>
    >
    > _Does_ match properly:
    > <code>

    [snip]

    > </code>
    >
    > Related code:
    > The whole src http://pastebin.com/f63298772


    This can't be the code that you ran, because it won't even compile.
    See comments in my update at http://pastebin.com/m77f0617a

    By the way, you should be either (a) using *match* (not search) with a
    \Z at the end of each pattern or (b) checking that there is not
    extraneous guff at the end of the line ... otherwise a line like
    "[blah] waffle" would be classified as a "section".

    Have you considered leading/trailing/embedded spaces?
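
    A small sketch of the difference, in Python 3 syntax. Both patterns are
    made up for illustration; the strict one also tolerates trailing
    whitespace before the end of the line, which is an extra assumption:

```python
import re

# Hypothetical section patterns, for illustration only.
loose = re.compile(r"\[(\w+)\]")          # used with search(), no end anchor
strict = re.compile(r"\[(\w+)\]\s*\Z")    # only whitespace may follow the "]"

good = "[blah]"
bad = "[blah] waffle"

loose_hits = [bool(loose.search(s)) for s in (good, bad)]
strict_hits = [bool(strict.match(s)) for s in (good, bad)]
```

    The loose pattern accepts both strings; the anchored one rejects the
    line with extraneous text after the bracket.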

    > regexen and delimiters (imported into whole src) http://pastebin.com/f485ac180


    HTH,
    John
     
    John Machin, Mar 22, 2008
    #4
  5. Brian Lane

    Brian Lane Guest


    sgharvey wrote:
    > ... and by works, I mean works like I expect it to.
    >
    > I'm writing my own cheesy config.ini parser because ConfigParser
    > doesn't preserve case or order of sections, or order of options w/in
    > sections.
    >
    > What's confusing me is this:
    > If I try matching every line to one pattern at a time, all the
    > patterns that are supposed to match, actually match.
    > If I try to match every pattern to one line at a time, only one
    > pattern will match.


    I don't see that behavior when I try your code. I had to fix your
    pattern loading:

    patterns[pattern] = re.compile(pattern_strings[pattern], re.VERBOSE)

    I would also recommend against using both the plural and singular forms
    of a variable name; it's bound to cause confusion eventually.

    I also changed contents to self.contents so that it would be accessible
    outside the class.

    The correct way to do it is run each pattern against each line. This
    will maintain the order of the config.ini file. If you do it the other
    way you will end up with everything ordered based on the patterns
    instead of the file.
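
    To make the ordering point concrete, here is a sketch in Python 3
    syntax. The two patterns and the sample lines are hypothetical stand-ins
    for the ones in the pastebin:

```python
import re

# Hypothetical patterns; the real ones live in the OP's pastebin.
patterns = {
    "section": re.compile(r"^\[(.+)\]$"),
    "setting": re.compile(r"^(\w+)\s*=\s*(.*)$"),
}
lines = ["[first]", "a = 1", "[second]", "b = 2"]

# Lines outside, patterns inside: results follow the file order.
by_line = [(name, line) for line in lines
           for name, regex in patterns.items() if regex.search(line)]

# Patterns outside, lines inside: results are grouped by pattern.
by_pattern = [(name, line) for name, regex in patterns.items()
              for line in lines if regex.search(line)]
```

    Both loops find the same four matches. Only the first keeps "a = 1"
    between "[first]" and "[second]", which is what you need to know which
    section a setting belongs to.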

    I tried it with Python2.5 on OSX from within TextMate and it ran as
    expected.

    Brian

    - --
    - ---[Office 70.9F]--[Outside 54.5F]--[Server 103.3F]--[Coaster 68.0F]---
    - ---[ KLAHOWYA WSF (366773110) @ 47 31.2076 -122 27.2249 ]---
    Software, Linux, Microcontrollers http://www.brianlane.com
    AIS Parser SDK http://www.aisparser.com
    Movie Landmarks Search Engine http://www.movielandmarks.com

     
    Brian Lane, Mar 22, 2008
    #5
  6. On Sat, 22 Mar 2008 17:27:49 -0300, sgharvey <> wrote:

    > ... and by works, I mean works like I expect it to.
    >
    > I'm writing my own cheesy config.ini parser because ConfigParser
    > doesn't preserve case or order of sections, or order of options w/in
    > sections.


    Take a look at ConfigObj http://pypi.python.org/pypi/ConfigObj/

    Instead of:

    # Remove the '\n's from the end of each line
    lines = [line[0:line.__len__()-1] for line in lines]

    line.__len__() is a crazy (and ugly) way of spelling len(line). The
    comment is misleading; you say you remove '\n's but you don't actually
    check for them. The last line in the file might not have a trailing \n.
    See this:

    lines = [line.rstrip('\n') for line in lines]

    Usually trailing spaces are ignored too; so you end up writing:

    lines = [line.rstrip() for line in lines]
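
    A short comparison of the three spellings, with made-up sample lines
    whose last entry has no trailing newline:

```python
# The last line has no trailing newline, as often happens in real files.
raw_lines = ["[section]\n", "key = value  \n", "last = line"]

sliced = [line[:-1] for line in raw_lines]              # same effect as the __len__() slice
no_newline = [line.rstrip("\n") for line in raw_lines]  # removes only newlines
stripped = [line.rstrip() for line in raw_lines]        # also drops trailing spaces
```

    The blind slice turns "last = line" into "last = lin"; the two
    `rstrip` variants leave it intact.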

    In this case:
    # Compile the regexen
    patterns = {}
    for pattern in pattern_strings:
        patterns.update(pattern: re.compile(pattern_strings[pattern],
                                            re.VERBOSE))

    That code does not even compile. I got lost with all those similar names;
    try to choose meaningful ones. What about this:

    patterns = {}
    for name, regexpr in pattern_strings.iteritems():
        patterns[name] = re.compile(regexpr, re.VERBOSE)

    or even:

    patterns = dict((name, re.compile(regexpr, re.VERBOSE))
                    for name, regexpr in pattern_strings.iteritems())

    or even compile them directly when you define them.
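
    For readers on Python 3, where `iteritems()` no longer exists, the same
    idea can be sketched with `items()` and a dict comprehension. The
    pattern strings here are hypothetical, not the ones from the pastebin:

```python
import re

# Hypothetical pattern strings; the OP's real ones are in the pastebin.
# With re.VERBOSE, whitespace outside character classes is ignored.
pattern_strings = {
    "section": r"\[ (?P<name>[^\]]+) \]",
    "comment": r"[;#] (?P<text>.*)",
}

# Python 3 spelling: items() replaces iteritems(), and a dict
# comprehension replaces dict(generator).
patterns = {name: re.compile(regexpr, re.VERBOSE)
            for name, regexpr in pattern_strings.items()}

section_name = patterns["section"].search("[main]").group("name")
```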

    I'm not sure you can process a config file in this unstructured way; it
    looks a lot easier to look for [sections] and then process the lines
    inside each section sequentially.
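
    A rough sketch of that sequential approach, in Python 3 syntax and
    deliberately without regexes. The function name `parse_ini` and the
    sample data are made up, and the sketch simply drops comments and any
    line it does not recognise:

```python
def parse_ini(lines):
    """Return [(section_name, [(option, value), ...]), ...] in file order."""
    sections = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line[0] in ";#":
            continue                      # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = (line[1:-1].strip(), [])
            sections.append(current)
        elif "=" in line and current is not None:
            option, _, value = line.partition("=")
            current[1].append((option.strip(), value.strip()))
    return sections

sample = ["[Zeta]\n", "Key = 1\n", "; note\n", "[alpha]\n", "b=\n", "A = x y\n"]
parsed = parse_ini(sample)
```

    Lists of tuples keep both the order and the case of sections and
    options, which was the original complaint about ConfigParser.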

    if match:
        content.update({pattern: match.groups()})

    I wonder where you got the idea of populating a dict that way. It's a
    basic operation:
    content[name] = value

    The regular expressions look strange too. A comment may be empty. A
    setting too. There may be spaces around the = sign. Don't try to catch all
    in one go.
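
    One possible shape for such patterns, sketched in Python 3 syntax: one
    small regex per kind of line, each allowing an empty payload and
    optional spaces. The names `comment_re`, `setting_re` and `classify`
    are hypothetical, not taken from the pastebin:

```python
import re

# Hypothetical patterns, one per line type; not the OP's actual regexes.
comment_re = re.compile(r"^\s*[;#]\s*(.*?)\s*$")                # text may be empty
setting_re = re.compile(r"^\s*([^=\s][^=]*?)\s*=\s*(.*?)\s*$")  # value may be empty

def classify(line):
    """Return a tuple describing the line, trying one pattern at a time."""
    m = comment_re.match(line)
    if m:
        return ("comment", m.group(1))
    m = setting_re.match(line)
    if m:
        return ("setting", m.group(1), m.group(2))
    return ("other", line)
```

    For example, `classify(";")` gives an empty comment and
    `classify("empty=")` gives a setting with an empty value.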

    --
    Gabriel Genellina
     
    Gabriel Genellina, Mar 23, 2008
    #6
  7. sgharvey

    sgharvey Guest

    On Mar 22, 5:03 pm, "Gabriel Genellina" <>
    wrote:
    > On Sat, 22 Mar 2008 17:27:49 -0300, sgharvey <> wrote:
    > Take a look at ConfigObj http://pypi.python.org/pypi/ConfigObj/


    Thanks for the pointer; I'll check it out.

    > I'm not sure you can process a config file in this unstructured way; looks
    > a lot easier if you look for [sections] and process sequentially lines
    > inside sections.


    It works though... now that I've fixed up all my ugly stuff, and a
    dumb logic error or two.

    > The regular expressions look strange too. A comment may be empty. A
    > setting too. There may be spaces around the = sign. Don't try to catch all
    > in one go.


    I didn't think about empty comments/settings... fixed now.
    It also seemed simpler to handle surrounding spaces after the match
    was found.

    New version of the problematic part:
    <code>
    self.contents = []
    content = {}
    # Get the content in each line
    for line in lines:
        for name in patterns:
            # Match each pattern to the current line
            match = patterns[name].search(line)
            if match:
                content[name] = match.group(0).strip()
        self.contents.append(content)
        content = {}
    </code>

    new iniparsing.py
    http://pastebin.com/f445701d4

    new ini_regexen_dicts.py
    http://pastebin.com/f1e41cd3d

    > --
    > Gabriel Genellina



    Much thanks to all for the constructive criticism.

    Samuel Harvey
     
    sgharvey, Mar 23, 2008
    #7
