tricky regular expressions

Ernesto

So regular expressions have been good to me so far, but now my problem
is a bit trickier. The string I'm getting data from looks like this:

myString =
[USELESS DATA]
Request : Play
[USELESS DATA]
Title: Beethoven's 5th
[USELESS DATA]
Request : next
[USELESS DATA]
Title: song #2
......

I'm using this code to search myString:
.....
pattern = '''(?x)
Title:\s+(.+)
'''
Titles = re.findall(pattern, myString)

.....
The problem is that I only want the "Titles" which are either:

a) Followed by "Request : Play"
b) Followed by "Request : next"

I'm not sure if I should use RE's or some other mechanism. Thanks
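For comparison, here is how the rule could be written as a single regular expression. It is already harder to read than a line-by-line approach. This is a minimal sketch that assumes three things: "followed by" means the wanted Request line is the first Title or Request line after the title, the request lines are spelled exactly "Request : Play" and "Request : next", and the text uses "\n" line endings. The names WANTED_TITLE_RE and wanted_titles, and the "Request : stop" line in the sample, are hypothetical.

```python
import re

# Hypothetical pattern. Assumptions: the wanted Request line is the first
# Title/Request line after the title, request lines are spelled exactly as
# in the sample data, and the text uses "\n" line endings.
WANTED_TITLE_RE = re.compile(
    r"""
    ^Title:[ \t]*(.+)\n            # the title line; group 1 captures the title
    (?:(?!Title:|Request).*\n)*    # any lines that are neither Title nor Request
    Request[ ]:[ ](?:Play|next)    # the wanted request; [ ] keeps spaces in verbose mode
    [ \t]*$
    """,
    re.MULTILINE | re.VERBOSE,
)


def wanted_titles(text):
    """Return the titles whose next Title/Request line is a wanted request."""
    return [title.strip() for title in WANTED_TITLE_RE.findall(text)]


# Hypothetical sample in the shape of the data above.
sample = """[USELESS DATA]
Title: Beethoven's 5th
[USELESS DATA]
Request : Play
Title: song #2
Request : stop
Title: song #3
[USELESS DATA]
Request : next"""

print(wanted_titles(sample))
```

With this sample only "Beethoven's 5th" and "song #3" are returned. "song #2" is rejected because the request after it is not Play or next. The negative lookahead is what stops the match from skipping over another Title or Request line, and it is also the part that makes the pattern hard to maintain.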
 
Xavier Morel

Ernesto said:
> I'm not sure if I should use RE's or some other mechanism. Thanks

I think a line-based state machine parser could be a better idea. Much
simpler to build and debug if not faster to execute.
 
Mitja Trampus

[hint: posting the same question to several newsgroups generally
does not help to get responses any quicker]
> The string I'm getting data from looks like this:
> [USELESS DATA]
> Request : Play
> [USELESS DATA]
> Title: Beethoven's 5th
> [USELESS DATA]
> Request : next
> [USELESS DATA]
> Title: song #2
> .....
> The problem is that I only want the "Titles" which are either:
> a) Followed by "Request : Play"
> b) Followed by "Request : next"
>
> I'm not sure if I should use RE's or some other mechanism.

I'd advise against REs - they can quickly get messy. What
I'd do is just what you have described:
1) read all lines that are not [USELESS DATA] (i.e. lines
beginning with either Title or Request) into a list
2) walk through this list, deleting all "Title" lines that
are not followed by an appropriate "Request" line.

Your description is not very exact - but if anything else
needs to be filtered out, just do so. The general idea is to
break your task into smaller steps (instead of one huge RE)
that are easier to manage, write and understand.
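The two steps above translate almost directly into Python. This is a minimal sketch with three assumptions: every line starting with "Title" or "Request" is meaningful, everything else is useless data, and "appropriate" means the exact request strings from the question. The function names keep_title_and_request_lines and wanted_titles are hypothetical, as is the "Request : stop" line in the sample.

```python
# Assumption: these are the only "appropriate" requests, spelled as in the question.
WANTED_REQUESTS = {"Request : Play", "Request : next"}


def keep_title_and_request_lines(text):
    """Step 1: drop everything except lines starting with Title or Request."""
    stripped = (line.strip() for line in text.splitlines())
    return [line for line in stripped if line.startswith(("Title", "Request"))]


def wanted_titles(text):
    """Step 2: keep a Title only if the next kept line is a wanted Request."""
    lines = keep_title_and_request_lines(text)
    titles = []
    # zip pairs each kept line with the one right after it
    for current, following in zip(lines, lines[1:]):
        if current.startswith("Title:") and following in WANTED_REQUESTS:
            titles.append(current.split(":", 1)[1].strip())
    return titles


# Hypothetical sample in the shape of the data in the question.
sample = """[USELESS DATA]
Title: Beethoven's 5th
[USELESS DATA]
Request : Play
Title: song #2
Request : stop
Title: song #3
[USELESS DATA]
Request : next"""

print(wanted_titles(sample))
```

Because step 1 throws the useless lines away first, step 2 only ever has to look one line ahead. Each step can be tested on its own.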
 
Ernesto

Xavier said:
> I think a line-based state machine parser could be a better idea. Much
> simpler to build and debug if not faster to execute.

What is a line-based state machine?
 
Petr Jakes

Try googling for "finite state machine" OR "state machine" OR FSM.

titles =["USELESS DATA","Request : Play",
"USELESS DATA","Title: Beethoven's 5th",
"USELESS DATA","Request : next","USELESS DATA",
"Title: song# 2 ","USELESS DATA","Request : Play",
"USELESS DATA","Title: Beethoven's 5th",
"USELESS DATA","Request : next","USELESS DATA",
"Title: song# 3 ","USELESS DATA","Request : Play"]


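for title in range(len(titles)):
    if titles[title].startswith("Title:"):
        x = 1
        try:
            # scan forward until the next "Play" or "next" request
            while (titles[title + x] != "Request : Play" and
                   titles[title + x] != "Request : next"):
                x += 1
            print(titles[title], titles[title + x])
        except IndexError:
            # no matching request after this title
            pass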


HTH

Petr Jakes
PS: I just wonder why you are asking the same question in two different
topics...
 
Xavier Morel

Ernesto said:
> What is a line-based state machine ?

Parse your file line by line (since that seems to be the way your data
is organized).

Keep state information somewhere.

Change your state based on the current state and the data being fed to
your parser.

For example, here you basically have 3 states:

No Title, which is the initial state of the machine (it has not
encountered any title yet, and you do stuff based on titles)

Title loaded, when you've met a title. "Title loaded" loops on itself:
if you meet a "Title: whatever" line, you change the title currently
stored but you stay in the "Title loaded" state (you change the current
state of the machine from "title loaded" to "title loaded").

Request loaded, which can be reached only when you're in the "Title
loaded" state and then encounter a line starting with "Request: ". When you
reach that stage, do your processing (you have a title loaded, which is
the latest title you encountered, and you have a request loaded, which
is the request that immediately follows the loaded title), then you go
back to the "No Title" state, since you've processed (and therefore
unloaded) the current title.

So, the state diagram could look something like this
(it's supposed to be a single state diagram, but I suck at ASCII
diagrams, so I'll create one mini-diagram for each state):

NoTitle =0> TitleLoaded

=0>
Event: on encountering a line starting with "Title: "
Action: save the title (to whatever variable you see fit)
Change state to: TitleLoaded


TitleLoaded =1> TitleLoaded
    ||
    2
    \/
  Request

=1>
Event: on encountering a line starting with "Title: "
Action: save the title (replace the current value of your title variable)
Change state to: TitleLoaded

=2>
Event: on encountering a line starting with "Request: "
Action: save the request?; immediately process the Request state
Change state to: Request


Request =3> NoTitle
   ||
   4
   \/
TitleLoaded

=3>
Event: the Request state is reached, the request is either "Play" or "Next"
Action: Do whatever you want to do; nuke the content of the title variable
Change state to: NoTitle

=4>
Event: the Request state is reached, the request is neither "Play" nor
"Next"
Action: Nuke the content of the request variable (if you saved it), do
nothing else
Change state to: TitleLoaded
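The transitions =0> to =4> map onto a short loop over the lines. This is a minimal sketch, not Xavier's own code, and it makes these assumptions:

- Requests may be written "Request : Play" or "Request: Play", so spaces around the colon are optional.
- "Play" and "Next" are compared case-insensitively.
- "Do whatever you want to do" means collecting (title, request) pairs.

The names TITLE_RE, REQUEST_RE, NO_TITLE, TITLE_LOADED and wanted_titles are hypothetical. The short-lived Request state is handled inline, as the description suggests.

```python
import re

# Hypothetical line patterns; spaces around the colon are optional (assumption).
TITLE_RE = re.compile(r"^Title\s*:\s*(.*?)\s*$")
REQUEST_RE = re.compile(r"^Request\s*:\s*(.*?)\s*$")

NO_TITLE, TITLE_LOADED = "NoTitle", "TitleLoaded"
WANTED = {"play", "next"}  # compared case-insensitively (assumption)


def wanted_titles(lines):
    """Run the NoTitle/TitleLoaded/Request machine over an iterable of lines."""
    state = NO_TITLE
    title = None
    results = []
    for line in lines:
        title_match = TITLE_RE.match(line)
        request_match = REQUEST_RE.match(line)
        if title_match:
            # =0> and =1>: save (or replace) the title, go to TitleLoaded
            title = title_match.group(1)
            state = TITLE_LOADED
        elif request_match and state == TITLE_LOADED:
            # =2>: enter the Request state and process it immediately
            request = request_match.group(1)
            if request.lower() in WANTED:
                # =3>: do the work, drop the title, back to NoTitle
                results.append((title, request))
                title = None
                state = NO_TITLE
            # =4>: otherwise forget the request and stay in TitleLoaded
        # any other line (including a Request seen in NoTitle) is ignored
    return results


# Hypothetical sample; a file object opened in text mode works the same way.
sample = """[USELESS DATA]
Title: Beethoven's 5th
[USELESS DATA]
Request : Play
Title: song #2
Request : stop
Title: song #3
[USELESS DATA]
Request : next"""

print(wanted_titles(sample.splitlines()))
```

Because the function only iterates over its argument, it can be fed an open file directly, one line at a time. Following =4>, a request that is neither Play nor Next leaves the title loaded, so a later Play or Next request still picks it up.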

As a final note, I'd recommend reading "Text Processing in Python". Even
though it puts quite a big emphasis on functional programming (which you
may or may not appreciate), it's an extremely good introduction to
text-file handling, parsing and processing.
 
Ernesto

Petr said:
> PS: I just wonder why you are asking the same question in two different
> topics...

Thanks for the help, Petr. That happened accidentally; I meant to
post it only in the Python topic. Apologies...
 
