re.search slashes

pyluke · Feb 4, 2006

I'm parsing a LaTeX document and want to find lines with equations delimited
by "\[" and "\]", but not other instances of "\[", like "a & b & c \\[5pt]".

So, in short, I want to match "\[" but not "\\[".

To add to this, I also don't want lines that start with comments.

I've tried:
check_eq = re.compile('(?!\%\s*)\\\\\[')
check_eq.search(line)

This works in finding the "\[", but it also finds the "\\[".

So I would think this would work:
check_eq = re.compile('(?![\%\s*\\\\])\\\\\[')
check_eq.search(line)

but it doesn't. Any tips?
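A minimal sketch of why neither attempt behaves as hoped, assuming Python 3 and rewriting both patterns as raw strings that compile to the same regexes. A lookahead `(?!...)` inspects the text *after* the current position, not before it. In the first pattern the lookahead only rejects a `%` at the spot where the backslash must be, so it never fires; in the second, it forbids a backslash at the very spot the rest of the pattern requires one, so nothing can match.

```python
import re

# Raw-string equivalents of the two patterns tried above (same compiled regex).
first_try = re.compile(r'(?!\%\s*)\\\[')      # lookahead: next text does not start with %
second_try = re.compile(r'(?![\%\s*\\])\\\[')  # lookahead: next char is not %, whitespace, * or \

display = r'\['               # should match
table_row = r'a & b \\[4pt]'  # should not match

# The first lookahead is always satisfied (the next char must be a backslash,
# never a %), so the pattern behaves like a plain search for "\[".
print(bool(first_try.search(display)), bool(first_try.search(table_row)))    # True True

# The second lookahead forbids the very backslash the rest of the pattern
# needs, so it can never match anything.
print(bool(second_try.search(display)), bool(second_try.search(table_row)))  # False False
```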

Scott David Daniels · Feb 4, 2006

pyluke said:
> I'm parsing a LaTeX document and want to find lines with equations delimited
> by "\[" and "\]", but not other instances of "\[", like "a & b & c \\[5pt]".
> So, in short, I want to match "\[" but not "\\[" .... I've tried:
> check_eq = re.compile('(?!\%\s*)\\\\\[')
> check_eq.search(line)
> This works in finding the "\[", but it also finds the "\\[".

If you are parsing with regular expressions, you are running a marathon.
If you are doing regular expressions without raw strings, you are running
a marathon barefoot.

Notice: len('(?!\%\s*)\\\\\[') == 13
len(r'(?!\%\s*)\\\\\[') == 15
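To make the raw-string point concrete, a small sketch (Python 3 assumed): the fully escaped ordinary string and the raw string below spell the same 13-character regex, but only the raw spelling reads the way the regex engine sees it. Prefixing the original literal with `r` without removing the extra backslashes yields a different, 15-character regex that looks for two backslashes.

```python
import re

# Ordinary string: every backslash destined for the regex must be doubled.
escaped = '(?!\\%\\s*)\\\\\\['
# Raw string: backslashes are passed to the regex engine untouched.
raw = r'(?!\%\s*)\\\['

print(escaped == raw, len(raw))  # True 13

# Adding r to the original literal without removing the extra backslashes
# changes the pattern: \\\\ now means two literal backslashes.
doubled = r'(?!\%\s*)\\\\\['
print(len(doubled), bool(re.search(doubled, r'\[')), bool(re.search(doubled, r'\\[')))  # 15 False True
```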

> so I would think this would work
> check_eq = re.compile('(?![\%\s*\\\\])\\\\\[')
> check_eq.search(line)
>
> but it doesn't. Any tips?

Give us examples that should work and that should not (test cases),
and the proper results of those tests. Don't make people trying to
help you guess about anything you know.

--Scott David Daniels

Xavier Morel · Feb 4, 2006

To add to what Scott said, two pieces of advice:
1. Use Kodos; it's a RE debugger and an extremely fine tool for building
your regular expressions.
2. Read the module's documentation. Several times. In your case, read the
"negative lookbehind assertion" part, "(?<! ... )", several times, until
you understand how it may be of use to you.
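A minimal sketch of the negative lookbehind pointed to above, assuming Python 3's `re` module: `(?<!\\)` asserts that the character *before* the current position is not a backslash, without consuming it. Python only accepts fixed-width lookbehinds, so a variable-width guard such as `(?<!%\s*)` is rejected at compile time.

```python
import re

# Match "\[" only when it is not immediately preceded by another backslash.
display_open = re.compile(r'(?<!\\)\\\[')

for line in [r'\[', r'  \[', r'a & b \\[4pt]']:
    print(repr(line), bool(display_open.search(line)))

# A variable-width lookbehind is not allowed in Python's re module.
try:
    re.compile(r'(?<!%\s*)\\\[')
except re.error as exc:
    print('re.error:', exc)
```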

pyluke · Feb 4, 2006

Alright, I'll try to clarify. I'm taking a tex file and modifying some
of the content. I want to be able to identify a block like the following:

\[
\nabla \cdot u = 0
\]

I don't want to find the following

\begin{tabular}{c c}
a & b \\[4pt]
1 & 2 \\[3pt]
\end{tabular}

When I search a line for the first block by looking for "\[", I find it.
The problem is that this also finds the second block, because of the "\\[".
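Putting the two requirements from the original question together (skip comment lines, ignore `\\[` row breaks), a line-by-line scan could look like the sketch below. The helper name `find_display_math_lines` is hypothetical, and treating "a comment line" as one whose first non-blank character is `%` is an assumption made for illustration.

```python
import re

DISPLAY_OPEN = re.compile(r'(?<!\\)\\\[')  # "\[" not preceded by a backslash
COMMENT_LINE = re.compile(r'\s*%')         # assumption: first non-blank char is %

def find_display_math_lines(lines):
    """Return 1-based numbers of lines containing a display-math opener (hypothetical helper)."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if COMMENT_LINE.match(line):
            continue  # ignore commented-out lines entirely
        if DISPLAY_OPEN.search(line):
            hits.append(number)
    return hits

tex = r"""\[
\nabla \cdot u = 0
\]
% \[ commented out
\begin{tabular}{c c}
a & b \\[4pt]
1 & 2 \\[3pt]
\end{tabular}""".splitlines()

print(find_display_math_lines(tex))  # [1]
```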

I'm not sure what you mean by running a marathon. I do follow your
point about raw strings, but that doesn't seem to be the problem. The
difference in your length example above just comes from the two escaped
backslashes... so I'm not sure what your point is...

Thanks
Lou

pyluke · Feb 4, 2006

> To add to what Scott said, two pieces of advice:
> 1. Use Kodos; it's a RE debugger and an extremely fine tool for building
> your regular expressions.

OK, just found this. It will be helpful.

> 2. Read the module's documentation. Several times. In your case, read the
> "negative lookbehind assertion" part, "(?<! ... )", several times, until
> you understand how it may be of use to you.

Quite a teacher. I'll read it several times...

Thanks anyway.

pyluke · Feb 4, 2006

> 2. Read the module's documentation. Several times. In your case, read the
> "negative lookbehind assertion" part, "(?<! ... )", several times, until
> you understand how it may be of use to you.

OK, lookbehind would be more suitable here...

pyluke · Feb 4, 2006

Alright, this seems to work:

re.compile('(?<![(\%\s*)(\\\\)])\\\\\[')
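A quick check of this pattern suggests why "seems to work" is the right wording. Inside `[...]` the parentheses, `%`, `\s`, `*` and `\\` are just a set of single characters, so the lookbehind rejects a `\[` preceded by *any* of them, including an ordinary space. The sketch below (Python 3, with the pattern rewritten as an equivalent raw string) compares it with a lookbehind that excludes only a preceding backslash.

```python
import re

# The pattern above as a raw string: the class is the set ( % whitespace * ) \
char_class = re.compile(r'(?<![(\%\s*)(\\)])\\\[')
# Exclude only a preceding backslash.
backslash_only = re.compile(r'(?<!\\)\\\[')

for line in [r'\[', r' \[', r'x = \[', r'a & b \\[4pt]']:
    print(f'{line!r:18} class={bool(char_class.search(line))} '
          f'backslash_only={bool(backslash_only.search(line))}')
```

With these inputs, the character-class version misses `\[` whenever a space precedes it, while the backslash-only lookbehind still rejects the tabular row break.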

Scott David Daniels · Feb 4, 2006

pyluke said:
> Scott said:
>> pyluke said:
>>> I... want to find lines with ... "\[" but not instances of "\\["
>>
>> If you are parsing with regular expressions, you are running a marathon.
>> If you are doing regular expressions without raw strings, you are running
>> a marathon barefoot.
>
> I'm not sure what you mean by running a marathon.

I'm referring to this quote from: http://www.jwz.org/hacks/marginal.html
"(Some people, when confronted with a problem, think ``I know, I'll
use regular expressions.'' Now they have two problems.)"

> I do follow your statement on raw strings, but that doesn't seem
> to be the problem.

It is an issue in the readability of your code, not the cause of the
code behavior that you don't like. In your particular case, this is
all made doubly hard to read since your patterns and search targets
include back slashes.

> \[
> \nabla \cdot u = 0
> \]
>
> I don't want to find the following
>
> \begin{tabular}{c c}
> a & b \\[4pt]
> 1 & 2 \\[3pt]
> \end{tabular}

How about: r'(^|[^\\])\\\['
which is:
Find something beginning with either start-of-line or a
non-backslash, followed (in either case) by a backslash
and ending with an open square bracket.
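One practical consequence of this formulation, shown in a small Python 3 sketch: because `(^|[^\\])` is a consuming group rather than a lookbehind, the match includes the character before the backslash. That is harmless for a yes/no test, but if you need the position of `\[` itself (for example, to modify the text there), it is the end of group 1 rather than the start of the match. The variable names are illustrative only.

```python
import re

scott = re.compile(r'(^|[^\\])\\\[')

m = scott.search(r'x = \[')
print(repr(m.group(0)), repr(m.group(1)))  # the match includes the preceding space
opener_at = m.end(1)  # index of the backslash that starts "\["
print(opener_at)      # 4

# At the start of a line, group 1 matches the empty string.
print(scott.search(r'\[').end(1))  # 0
```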

Generally (for this example), I would have said a good test set
describing your problem was:

re.compile(pattern).search(r'\[ ') is not None
re.compile(pattern).search(r' \[ ') is not None
re.compile(pattern).search(r'\\[ ') is None
re.compile(pattern).search(r' \\[ ') is None
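These four checks can be turned into a tiny runnable harness. The sketch below (Python 3; the `CASES` table and the `passes` helper are hypothetical names) runs them against the pattern above, the negative-lookbehind variant Xavier suggested, and pyluke's character-class pattern from earlier in the thread.

```python
import re

# (line, should_match) pairs taken from the test set above.
CASES = [
    (r'\[ ', True),
    (r' \[ ', True),
    (r'\\[ ', False),
    (r' \\[ ', False),
]

def passes(pattern):
    """True if the pattern matches exactly the lines marked should_match."""
    compiled = re.compile(pattern)
    return all((compiled.search(line) is not None) == expected
               for line, expected in CASES)

for pattern in [r'(^|[^\\])\\\[', r'(?<!\\)\\\[', r'(?<![(\%\s*)(\\)])\\\[']:
    print(pattern, passes(pattern))
```

The first two patterns pass all four cases; the character-class pattern fails on `' \[ '`, because its lookbehind also rejects a preceding space.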

--Scott David Daniels
