regex line by line over file

James Smith

I can't get this to work.
It runs but there is no output when I try it on a file.


#!/usr/bin/python

import os
import sys
import re
from datetime import datetime

#logDir = '/nfs/projects/equinox/platformTools/RTLG/RTLG_logs';
#os.chdir( logDir );

programName = sys.argv[0]
fileName = sys.argv[1]

#pattern = re.compile('\s*\\"SHELF-.*,SC,.*,:\\"Log Collection In Progress\\"')
re.M
p = re.compile('^\s*\"SHELF-.*,SC,.*,:\\\"Log Collection In Progress\\\"')
l = ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"'

# this works :)
m = p.match( l )
if m:
    print( l )

# this doesn't match anything (or the if doesn't work) :-(
with open(fileName) as f:
    for line in f:
        # debug code (print the line without adding a linefeed)
        # sys.stdout.write( line )
        if p.match(line):
            print(line)


The test file just has one line:
"SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"
 
Chris Angelico

re.M
p = re.compile('^\s*\"SHELF-.*,SC,.*,:\\\"Log Collection In Progress\\\"')

If you're expecting this to be parsed as a multiline regex, it won't
be. Evaluating re.M as a bare expression doesn't do anything on its
own; you have to pass it as an argument to compile.

Not sure if that's your problem or not, though.

ChrisA
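
A minimal sketch of the point about flags, using a made-up two-line string rather than the poster's data: a bare `re.M` line is evaluated and thrown away, whereas passing the flag to `re.compile` changes where `^` is allowed to match.

```python
import re

# Hypothetical two-line text; only the second line starts with SHELF-.
text = "first line\nSHELF-17 second line\n"

# A bare expression statement: evaluated, then discarded. It configures nothing.
re.M

without_flag = re.compile(r"^SHELF-\d+")
with_flag = re.compile(r"^SHELF-\d+", re.M)

# Without the flag, ^ matches only at the very start of the string.
print(without_flag.search(text))       # None
# With re.M, ^ also matches right after each newline.
print(with_flag.search(text).group())  # SHELF-17
```

When each line is matched separately with match(), as in the loop in the question, every string holds a single line, so the flag should make no difference there.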
 
James Smith

I tried the re.M in the compile and that didn't help.
 
Rustom Mody

I can't get this to work.
It runs but there is no output when I try it on a file.


Some suggestions (I'm far from an re expert!)
1. Use raw strings for re's
2. You probably need non-greedy '*' (among other things)
3. Better to hack out your re in the interpreter
For that
4. Avoid compile (at least while hacking)
5. Findall will show you what's happening better than match

Here's a 'hack-session'

>>> from re import findall

# Start simple
>>> findall(r'^\s',l)
[' ']
>>> findall(r'^\s*',l)
[' ']
>>> findall(r'^\s*"',l)
[' "']
>>> findall(r'^\s*"SHELF-',l)
[' "SHELF-']
>>> findall(r'^\s*"SHELF-.*',l)
[' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:"Log Collection In Progress",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"']
>>> findall('^\s*"SHELF-.*',l)
[' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:"Log Collection In Progress",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"']
>>> findall('^\s*"SHELF-.SC*',l)
[]
>>> findall('^\s*"SHELF-.*SC',l)
[' "SHELF-17:LOG_COLN_IP,SC']
>>> findall('^\s*"SHELF-.*?SC',l)
[' "SHELF-17:LOG_COLN_IP,SC']
>>> findall('^\s*"SHELF-.*?,SC',l)
[' "SHELF-17:LOG_COLN_IP,SC']
>>> findall('(^\s*)"SHELF-.*?,SC',l)
[' ']
>>> findall('\(^\s*\)"SHELF-.*?,SC',l)
[]
>>> findall('(^\s*)"SHELF-.*?,SC',l)
[' ']
>>> findall('(^\s*)("SHELF-.*?,SC)',l)
[(' ', '"SHELF-17:LOG_COLN_IP,SC')]
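
To make suggestions 1 and 2 concrete, here is a small sketch. The sample record is hypothetical (it has two ",SC" fields, unlike the line in the question) so that the greedy and non-greedy results actually differ.

```python
import re

# Hypothetical record with two ",SC" fields, to make the difference visible.
sample = '"SHELF-17:LOG_COLN_IP,SC,03-25,SC,tail"'

greedy = re.findall(r'"SHELF-.*,SC', sample)   # .* runs on to the LAST ",SC"
lazy = re.findall(r'"SHELF-.*?,SC', sample)    # .*? stops at the FIRST ",SC"

print(greedy)  # ['"SHELF-17:LOG_COLN_IP,SC,03-25,SC']
print(lazy)    # ['"SHELF-17:LOG_COLN_IP,SC']

# In a raw string the backslash reaches the regex engine untouched;
# in a normal string, \" collapses to a plain double quote.
print(len('\"'), len(r'\"'))  # 1 2
```

On the line from the question there is only one ",SC", which is why the greedy and non-greedy attempts in the session above return the same thing.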
 
Chris Angelico

I tried the re.M in the compile and that didn't help.

Okay. Try printing out the repr of the line at the point where you
have the commented-out write to stdout. That might tell you if there's
some other difference. At that point, you'll know if the issue is with
reading it from the file.
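
As a rough illustration of why repr() helps here (the two strings below are invented, not taken from the poster's file): characters that look identical or invisible when printed normally show up explicitly in the repr.

```python
# Two made-up lines that look almost alike when printed normally.
clean = 'status:"ok"\n'
suspect = 'status:\\"ok\\"\r\n'  # literal backslashes plus a CR LF line ending

for text in (clean, suspect):
    # repr() spells out backslashes, \r and \n instead of rendering them.
    print(len(text), repr(text))
```

The length and the repr together make stray backslashes and carriage returns hard to miss.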

Also, please either stop using Google Groups, or clean up its messes.
I don't like having to read through piles of double-spaced junk in the
quoted text. Thanks!

ChrisA
 
Steven D'Aprano

I can't get this to work.
It runs but there is no output when I try it on a file.

Simplify, simplify, simplify. Either you will find the problem, or you
will find the simplest example that demonstrates the problem.

In this case, the problem is that your regex is not matching what you
expect it to match. So eliminate all the irrelevant cruft that is
just noise, complicating the problem. Start with the simplest thing that
works and add complexity until the problem returns.

Eliminate the file. You can embed your data in a string, and try to match
the regex against the string. Eliminate all the old commented-out code,
that's just irrelevant. Eliminate reading from sys.argv, that has nothing
to do with the problem.

So we get down to this:

import re
pat = re.compile('^\s*\"SHELF-.*,SC,.*,:\\\"Log Collection In Progress\\\"')
line1 = ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"'
print(pat.match(line1))

which matches.

Now let's get rid of those leaning toothpicks. We can print the
pattern to see the characters the regex engine actually receives,
and then use a raw string to clean it up. At the interactive
interpreter:


py> print(pat.pattern)
^\s*"SHELF-.*,SC,.*,:\"Log Collection In Progress\"


Similarly for line1. I'll also use implicit concatenation to split
it over multiple source lines. Raw strings, r'' or r"", don't need
to escape the backslashes. Implicit concatenation means that two
strings with no operator between them are implicitly concatenated
into a single string:

'abc' "def"

becomes 'abcdef'. By putting the pieces inside parentheses, I can
put each piece on a separate line, which makes it easier to read
compared to one giant long line.

pat = re.compile(
    r'^\s*"SHELF-.*,SC,.*,:\"Log Collection In Progress\"'
    )

line1 = (
    ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:"'
    'Log Collection In Progress",NONE:1700000035-6364-1048,:'
    'YEAR=2014,MODE=NONE"'
    )


And at the interactive interpreter, I get a match:

py> pat.match(line1)
<_sre.SRE_Match object at 0xb721ad78>
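
A quick way to check that the raw-string rewrite really is the same pattern, sketched under one small assumption: the hand-escaped version below writes \\s where the thread's code has \s, since newer Python versions warn about \s in a non-raw string. The resulting string is identical either way.

```python
import re

# Backslashes doubled by hand in a normal (non-raw) string literal.
escaped = '^\\s*\"SHELF-.*,SC,.*,:\\\"Log Collection In Progress\\\"'
# The same pattern as a raw string: what you type is what the regex engine gets.
raw = r'^\s*"SHELF-.*,SC,.*,:\"Log Collection In Progress\"'

print(escaped == raw)           # True
print(re.compile(raw).pattern)  # the pattern text, toothpicks removed
```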


So now we move on to the content of the one-line file. I don't have
access to the file, so all I have to go by is what you state it
contains:

The test file just has one line:
"SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"
[end quote]

which I interpret like this:

line2 = ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"\n'


(note the newline at the end), or if you prefer:

line2 = (
    ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:"'
    'Log Collection In Progress",NONE:1700000035-6364-1048,:'
    'YEAR=2014,MODE=NONE"\n'
    )


Except for the newline, it equals line1, and it also matches the
pattern:

py> pat.match(line2)
<_sre.SRE_Match object at 0xb721ab48>


So now we know that the regex matches the data you think you have.
The next questions are:

- are you reading the right file?
- are you mistaken about the content of the file?

I can't help you with the first. But the second: try running this:

# line2 and pat as defined above
import sys

filename = sys.argv[1]
with open(filename) as f:
    for line in f:
        print(len(line), line==line2, repr(line))
        print(repr(pat.match(line)))


which will show you what you have and whether or not it matches
what you think it has. I expect that the file's contents are not what
you think they are, because the regex is matching the sample line.

Good luck!
 
James Smith

- are you mistaken about the content of the file?

It should match this:
(134, False, '\' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\\\\"Log Collection In Progress\\\\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"\\r\\n\'')

Is the \r\n on the end of the line screwing it up?
 
James Smith

Got it.
I needed an extra \ where I had 3 in the compile.
It's kinda weird it didn't need the extra \ when I ran it manually from the shell.
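
A sketch of what seems to have been going on, judging from the repr() output posted above: the line in the file contains a literal backslash before each inner quote, while the test string typed into the script does not, because in a normal Python string literal \" is just a double quote. The names file_line, typed_line, three and four are made up for this illustration.

```python
import re

# What the file holds, going by the repr() output in the thread:
# a real backslash before each inner quote, and a CR LF line ending.
file_line = (
    ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:'
    r'\"Log Collection In Progress\"'
    ',NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"\r\n'
)

# What the in-script test string holds: \" in a normal literal is just ".
typed_line = ' "SHELF-17:LOG_COLN_IP,SC,03-25,01-18-58,NEND,NA,,,:\"Log Collection In Progress\",NONE:1700000035-6364-1048,:YEAR=2014,MODE=NONE"'

# Three backslashes in a normal literal reach the regex engine as \" ,
# which matches a bare double quote.
three = re.compile(r'^\s*"SHELF-.*,SC,.*,:\"Log Collection In Progress\"')
# Four backslashes in a normal literal ('\\\\"') reach it as \\" ,
# which matches a backslash followed by a double quote.
four = re.compile(r'^\s*"SHELF-.*,SC,.*,:\\"Log Collection In Progress\\"')

print(bool(three.match(typed_line)), bool(three.match(file_line)))  # True False
print(bool(four.match(typed_line)), bool(four.match(file_line)))    # False True
```

So the original pattern and the typed test string agreed with each other, but neither agreed with the file. The trailing \r\n is not the culprit: match() anchors only at the start, and the pattern does not reach the end of the line.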
 
Peter Pearson

It should match this:
(134, False, '\' "SHELF-17:LOG[snip]MODE=NONE"\\r\\n\'')

Is the \r\n on the end of the line screwing it up?

Dude, you've gotten a lot of excellent advice from some extraordinarily
capable (and patient) people, and you appear to be ignoring it.
 
