How to match literal backslashes read from a text file using regular expressions?

cricfan · Jul 12, 2005

I'm parsing a text file to extract word definitions. For example, the
input text file contains the following content:

di.va.gate \'di_--v*-.ga_-t\ vb
pas.sim \'pas-*m\ adv : here and there : THROUGHOUT

I am trying to obtain the words between two literal backslashes (\ .. \). I
am not able to match them using the regexp
re.compile(r'\\[^\\]*\\').

Here is my sample script:

import re;

#slashPattern = re.compile(re.escape(r'\\[^\\]*\\'));
pattern = r'\\[^\\]*\\'
slashPattern = re.compile(pattern);

fdr = file( "parseinput",'r');
line = fdr.readline();

while (line != ""):
    if (slashPattern.match(line)):
        print line.rstrip() + " <-- matches pattern " + pattern
    else:
        print line.rstrip() + " <-- DOES not match pattern " + pattern
    line = fdr.readline();
    print;

----------
The output

C:\home\krishna\lang\python>python wsparsetest.py
python wsparsetest.py
di.va.gate \'di_--v*-.ga_-t\ vb <-- DOES not match pattern \\[^\\]*\\
pas.sim \'pas-*m\ adv : here and there : THROUGHOUT <-- DOES not match pattern \\[^\\]*\\

John Machin · Jul 13, 2005

> I'm parsing a text file to extract word definitions. For example, the
> input text file contains the following content:
>
> di.va.gate \'di_--v*-.ga_-t\ vb
> pas.sim \'pas-*m\ adv : here and there : THROUGHOUT
>
> I am trying to obtain the words between two literal backslashes (\ .. \). I
> am not able to match them using the regexp
> re.compile(r'\\[^\\]*\\').
>
> Here is my sample script:
>
> import re;

Lose the semicolons ...

> #slashPattern = re.compile(re.escape(r'\\[^\\]*\\'));
> pattern = r'\\[^\\]*\\'
> slashPattern = re.compile(pattern);
>
> fdr = file( "parseinput",'r');
> line = fdr.readline();

You should upgrade so that you have a modern Python and a modern
tutor[ial] -- then you will be writing:

for line in fdr:
    do_something_with(line)

> while (line != ""):

Lose the extraneous parentheses ...

>     if (slashPattern.match(line)):

Your main problem is that you should be using the search() method, not
the match() method. Read the section on this topic in the re docs!!

>>> import re
>>> pat = re.compile(r'\\[^\\]*\\')
>>> pat.match(r'abcd \xyz\ pqr')
>>> pat.search(r'abcd \xyz\ pqr')
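The same comparison can be run as a small script. This is only a sketch in Python 3 syntax, using one of the sample lines from the question; the variable names `anchored` and `found` are made up for illustration.

```python
import re

# Same pattern as in the thread: a backslash, any run of
# non-backslash characters, then a closing backslash.
pat = re.compile(r'\\[^\\]*\\')
line = r"di.va.gate \'di_--v*-.ga_-t\ vb"

# match() only tries the pattern at position 0, where the line has 'd',
# so it returns None.
anchored = pat.match(line)

# search() scans forward through the line and returns the first hit.
found = pat.search(line)

print(anchored)        # None
print(found.group(0))  # \'di_--v*-.ga_-t\
```

match() would succeed only if the line itself began with a backslash, which is why every line in the question was reported as not matching.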

>         print line.rstrip() + " <-- matches pattern " + pattern
>     else:
>         print line.rstrip() + " <-- DOES not match pattern " + pattern
>     line = fdr.readline();
>     print;
>
> ----------
> The output
>
> C:\home\krishna\lang\python>python wsparsetest.py
> python wsparsetest.py
> di.va.gate \'di_--v*-.ga_-t\ vb <-- DOES not match pattern \\[^\\]*\\
> pas.sim \'pas-*m\ adv : here and there : THROUGHOUT <-- DOES not match pattern \\[^\\]*\\
> -----------
>
> What should I be doing to match those literal backslashes?
>
> Thanks

George Sakkis · Jul 13, 2005

This should give you an idea of how to go about it (needs python 2.3 or
newer):

import re
slashPattern = re.compile(r'\\(.*?)\\')

for i,line in enumerate(file("parseinput")):
    print "line", i+1,
    match = slashPattern.search(line)
    if match:
        print "matched:", match.group(1)
    else:
        print "did not match"

#===== output =======================

line 1 matched: 'di_--v*-.ga_-t
line 2 matched: 'pas-*m

#====================================

George
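
For anyone reading this on Python 3, where file() and the print statement are gone, the same idea might look roughly like the sketch below. It is not taken from any of the posts above: the helper name `pronunciations` and the in-memory `sample` list are assumptions for illustration, and the pattern uses `[^\\]*` inside the group rather than `.*?` (both give the same result on these two lines).

```python
import re

# The capturing group keeps only the text between the two backslashes.
SLASH_PATTERN = re.compile(r'\\([^\\]*)\\')


def pronunciations(lines):
    """Yield (line_number, text); text is None when a line has no match."""
    for number, line in enumerate(lines, start=1):
        match = SLASH_PATTERN.search(line)
        yield number, (match.group(1) if match else None)


# Sample lines from the question, plus one line without backslashes.
sample = [
    r"di.va.gate \'di_--v*-.ga_-t\ vb",
    r"pas.sim \'pas-*m\ adv : here and there : THROUGHOUT",
    "no backslashes here",
]

results = list(pronunciations(sample))
for number, text in results:
    if text is not None:
        print("line", number, "matched:", text)
    else:
        print("line", number, "did not match")
```

To read from the real input instead, an open file object can be passed in place of `sample`, for example inside `with open("parseinput") as fdr:`, since iterating over a file yields its lines.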
