re.search slashes

Discussion in 'Python' started by pyluke, Feb 4, 2006.

  1. pyluke

    pyluke Guest

    I'm parsing a LaTeX document and want to find lines with equations delimited
    by "\[" and "\]", but not other instances of "\[" like "a & b & c \\[5pt]"

    so, in short, I want to match "\[" but not "\\["

    to add to this, I also don't want lines that start with comments.


    I've tried:
    check_eq = re.compile('(?!\%\s*)\\\\\[')
    check_eq.search(line)

    this works in finding the "\[" but also the "\\["

    so I would think this would work
    check_eq = re.compile('(?![\%\s*\\\\])\\\\\[')
    check_eq.search(line)

    but it doesn't. Any tips?
    pyluke, Feb 4, 2006
    #1
  2. Scott David Daniels

    pyluke wrote:
    > I'm parsing LaTeX document and want to find lines with equations blocked
    > by "\[" and "\]", but not other instances of "\[" like "a & b & c \\[5pt]"
    > so, in short, I want to match "\[" but not "\\[" .... I've tried:
    > check_eq = re.compile('(?!\%\s*)\\\\\[')
    > check_eq.search(line)
    > this works in finding the "\[" but also the "\\["


    If you are parsing with regular expressions, you are running a marathon.
    If you are doing regular expressions without raw strings, you are running
    a marathon barefoot.

    Notice: len('(?!\%\s*)\\\\\[') == 13
    len(r'(?!\%\s*)\\\\\[') == 15
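The length difference comes from how Python reads the literal before the re module ever sees it. A minimal sketch of that effect; the names seen_by_re and as_raw are only illustrative, and the first pattern is the original non-raw literal rewritten as the raw string that re actually receives:

```python
import re

# In a normal literal, each "\\" collapses to a single backslash.
assert len('\\\\') == 2
# In a raw literal, every backslash is kept exactly as typed.
assert len(r'\\\\') == 4

# What re receives from the non-raw literal in the original post (13 chars).
seen_by_re = r'(?!\%\s*)\\\['
# The same keystrokes with an r prefix (15 chars): a different pattern
# that demands two literal backslashes before the bracket.
as_raw = r'(?!\%\s*)\\\\\['

lengths = (len(seen_by_re), len(as_raw))

single = r'\['    # one backslash, then "["
double = r'\\['   # two backslashes, then "["

hits = {
    'seen_by_re on single': re.search(seen_by_re, single) is not None,
    'seen_by_re on double': re.search(seen_by_re, double) is not None,
    'as_raw on single': re.search(as_raw, single) is not None,
    'as_raw on double': re.search(as_raw, double) is not None,
}
```

So adding the r prefix is not a drop-in change: the doubled backslashes have to be halved at the same time.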

    > so I would think this would work
    > check_eq = re.compile('(?![\%\s*\\\\])\\\\\[')
    > check_eq.search(line)
    >
    > but it doesn't. Any tips?

    Give us examples that should work and that should not (test cases),
    and the proper results of those tests. Don't make people trying to
    help you guess about anything you know.

    --Scott David Daniels
    Scott David Daniels, Feb 4, 2006
    #2
  3. Xavier Morel

    Xavier Morel Guest

    To add to what Scott said, two pieces of advice:
    1. Use Kodos, it's a RE debugger and an extremely fine tool to generate
    your regular expressions.
    2. Read the module's documentation. Several times. In your case read the
    "negative lookbehind assertion" part "(?<! ... )" several times, until
    you understand how it may be of use to you.
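A minimal sketch of the difference, assuming the goal is just "backslash-bracket not preceded by another backslash". A lookahead inspects what comes after the current position, so it cannot veto a preceding backslash; a lookbehind inspects what comes before. The pattern names are illustrative, not from the thread:

```python
import re

# Second attempt from the original post, as re receives it: the lookahead
# forbids a backslash at the very position where one is then required,
# so this pattern can never match.
lookahead = re.compile(r'(?![\%\s*\\])\\\[')

# Negative lookbehind: a backslash-bracket NOT preceded by a backslash.
lookbehind = re.compile(r'(?<!\\)\\\[')

equation = r'\['
table_row = r'a & b \\[4pt]'

lookahead_hits = [lookahead.search(s) is not None for s in (equation, table_row)]
lookbehind_hits = [lookbehind.search(s) is not None for s in (equation, table_row)]
```

Here lookahead_hits is [False, False] and lookbehind_hits is [True, False].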
    Xavier Morel, Feb 4, 2006
    #3
  4. pyluke

    pyluke Guest

    Alright, I'll try to clarify. I'm taking a tex file and modifying some
    of the content. I want to be able to identify a block like the following:

    \[
    \nabla \cdot u = 0
    \]


    I don't want to find the following

    \begin{tabular}{c c}
    a & b \\[4pt]
    1 & 2 \\[3pt]
    \end{tabular}


    When I search a line for the first block by looking for "\[", I find it.
    The problem is that this also finds the second block, due to the "\\[".
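The false positive described here can be reproduced with a plain search for backslash-bracket over the two sample blocks. This is only a sketch of the problem, not a fix; the names naive and sample are illustrative:

```python
import re

# Plain "backslash followed by [" with no check on what precedes it.
naive = re.compile(r'\\\[')

sample = [
    r'\[',
    r'\nabla \cdot u = 0',
    r'\]',
    r'\begin{tabular}{c c}',
    r'a & b \\[4pt]',
    r'1 & 2 \\[3pt]',
    r'\end{tabular}',
]

# Lines flagged by the naive pattern: the equation opener, but also
# both table rows, because "\\[" contains "\[".
flagged = [line for line in sample if naive.search(line)]
```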

    I'm not sure what you mean by running a marathon. I do follow your
    statement on raw strings, but that doesn't seem to be the problem. The
    difference in your length example above is just from the two escaped
    slashes... not sure what my point is...

    Thanks
    Lou
    pyluke, Feb 4, 2006
    #4
  5. pyluke

    pyluke Guest


    > To add to what Scott said, two pieces of advice:
    > 1. Use Kodos, it's a RE debugger and an extremely fine tool to generate
    > your regular expressions.


    Ok, just found this. Will be helpful.

    > 2. Read the module's documentation. Several times. In your case read the
    > "negative lookbehind assertion" part "(?<! ... )" several times, until
    > you understand how it may be of use to you.


    Quite a teacher. I'll read it several times...

    Thanks anyway.
    pyluke, Feb 4, 2006
    #5
  6. pyluke

    pyluke Guest


    > 2. Read the module's documentation. Several times. In your case read the
    > "negative lookbehind assertion" part "(?<! ... )" several times, until
    > you understand how it may be of use to you.


    OK. lookbehind would be more useful/suitable here...
    pyluke, Feb 4, 2006
    #6
  7. pyluke

    pyluke Guest

    Alright, this seems to work:

    re.compile('(?<![(\%\s*)(\\\\)])\\\\\[')
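A caution on this pattern, sketched below: inside square brackets the parentheses do not group anything. The lookbehind is a single character class, so it rejects a backslash-bracket preceded by any one of "(", "%", whitespace, "*", ")" or a backslash. It therefore also rejects an equation opener that follows a space, and it does not really detect comment lines. The pattern is rewritten as the raw string that re receives; the test strings are made up for illustration:

```python
import re

# The posted pattern as re receives it, written as a raw string.
posted = re.compile(r'(?<![(\%\s*)(\\)])\\\[')

cases = [
    r'\[',             # opener at start of line
    r'a & b \\[4pt]',  # table row, should not match
    r'text \[',        # opener after a space, should match
    r'x\[',            # opener after an ordinary character
    r'% note x\[',     # comment line, should not match
]

results = [posted.search(text) is not None for text in cases]
```

The third and fifth cases come out wrong (False and True respectively), which suggests keeping the lookbehind to the backslash alone and testing for comment lines separately.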
    pyluke, Feb 4, 2006
    #7
  8. Scott David Daniels

    pyluke wrote:
    > Scott David Daniels wrote:
    >> pyluke wrote:
    >>> I... want to find lines with ... "\[" but not instances of "\\["

    >>
    >> If you are parsing with regular expressions, you are running a marathon.
    >> If you are doing regular expressions without raw strings, you are running
    >> a marathon barefoot.

    > I'm not sure what you mean by running a marathon.


    I'm referring to this quote from: http://www.jwz.org/hacks/marginal.html
    "(Some people, when confronted with a problem, think ``I know, I'll
    use regular expressions.'' Now they have two problems.)"

    > I do follow your statement on raw strings, but that doesn't seem
    > to be the problem.


    It is an issue in the readability of your code, not the cause of the
    code behavior that you don't like. In your particular case, this is
    all made doubly hard to read since your patterns and search targets
    include backslashes.

    > \[
    > \nabla \cdot u = 0
    > \]
    >
    > I don't want to find the following
    >
    > \begin{tabular}{c c}
    > a & b \\[4pt]
    > 1 & 2 \\[3pt]
    > \end{tabular}
    >


    how about: r'(^|[^\\])\\\['
    Which is:
    Find something beginning with either start-of-line or a
    non-backslash, followed (in either case) by a backslash
    and ending with an open square bracket.

    Generally, (for the example) I would have said a good test set
    describing your problem was:

    re.compile(pattern).search(r'\[ ') is not None
    re.compile(pattern).search(r' \[ ') is not None
    re.compile(pattern).search(r'\\[ ') is None
    re.compile(pattern).search(r' \\[ ') is None
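The test set above can be run as is against the suggested pattern. The sketch below does that and adds a small helper for the "skip comment lines" requirement from the original post; opens_display_math is a hypothetical name, and treating any line whose first non-blank character is "%" as a comment is an assumption:

```python
import re

# Pattern suggested above: start-of-line or a non-backslash,
# then a backslash, then an open square bracket.
pattern = r'(^|[^\\])\\\['

checks = [
    re.compile(pattern).search(r'\[ ') is not None,
    re.compile(pattern).search(r' \[ ') is not None,
    re.compile(pattern).search(r'\\[ ') is None,
    re.compile(pattern).search(r' \\[ ') is None,
]

# Unlike a lookbehind, the group consumes the preceding character,
# so the matched text includes it.
matched_text = re.search(pattern, r'x\[').group(0)


def opens_display_math(line):
    """Hypothetical helper: True if a non-comment line contains an unescaped \\[."""
    if line.lstrip().startswith('%'):
        return False  # assumed comment rule: first non-blank character is %
    return re.search(pattern, line) is not None
```

For example, opens_display_math(r'\[') is True, while opens_display_math(r'% \[ disabled') and opens_display_math(r'a & b \\[4pt]') are both False.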

    --Scott David Daniels
    Scott David Daniels, Feb 4, 2006
    #8