It's me
I have never been very good with regular expressions. My head always hurts
whenever I need to use them.
I need to read a data file and parse each data record. Each item in a
data record begins with either a string or a list of strings. I searched
around and didn't see any existing Python package that does that.
scanf.py, for instance, can handle standard items but doesn't know about lists.
So I figure I might have to write a lex engine for it, and of course I have
to deal with REs again.
But I ran into a problem right from the start. To recognize a list, I need an
RE for a string that:
1) begins with [" (a left bracket followed by a double quote, with zero or more
spaces in between)
2) is followed by any characters up to ], but only if that right bracket is not
preceded by the escape character \.
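For concreteness, the two rules above could be sketched in Python's re module
roughly as follows. This is only a sketch under assumptions: it uses Python's
regex dialect (a lex tool may differ), it treats a backslash as escaping
whatever single character follows it, and the names LIST_RE and match_list are
made up for illustration.

```python
import re

# Rule 1: '[' then zero or more spaces then '"'.
# Rule 2: any characters up to the first ']' that is not escaped by '\'.
#   \\.       an escaped character (backslash plus any one character)
#   [^\]\\]   any character that is neither ']' nor a backslash
LIST_RE = re.compile(r'^\[ *"(?:\\.|[^\]\\])*\]')

def match_list(text):
    """Return the leading list token of text, or None if there is none."""
    m = LIST_RE.match(text)
    return m.group(0) if m else None

if __name__ == "__main__":
    for sample in (r'["This line\] works"]', '["This line fails"]'):
        print(sample, "->", match_list(sample))
```

Under those assumptions, an input whose only right bracket is escaped is
rejected rather than matched, and anything after the closing ] is left for the
next token.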
So, I tried:
^\[[" "]*"[a-z,A-Z\,, ]*(\\\])*[a-z,A-Z\,, \"]*]
and tested with:
["This line\] works"]
but it fails with:
["This line fails"]
I would have thought that:
(\\\])*
should work, because it means zero or more occurrences of the pattern \]
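That reading of the fragment can be checked in isolation. The snippet below is
a small sketch that assumes Python's re dialect (the lex tool may treat
backslashes differently); the helper name is_only_escaped_brackets is
hypothetical and exists only for this check.

```python
import re

# The fragment on its own: zero or more repetitions of a literal
# backslash followed by a literal right bracket.
ESCAPED_BRACKETS = re.compile(r'(\\\])*')

def is_only_escaped_brackets(text):
    """True if text consists solely of zero or more '\\]' pairs."""
    return ESCAPED_BRACKETS.fullmatch(text) is not None

if __name__ == "__main__":
    for sample in ("", "\\]", "\\]\\]", "]"):
        print(repr(sample), is_only_escaped_brackets(sample))
```

This only exercises the fragment by itself; it says nothing about how it
interacts with the character classes on either side of it in the full pattern.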
Any help is greatly appreciated.
Sorry for being OT. I posted this question to the lex group and didn't get
any response, so I figured maybe somebody around here would know.