re Insanity

Tim Daneliuk

For some reason, I am having the hardest time doing something that should
be obvious. (Note time of posting ;)

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

Ideas anyone?
 
Fredrik Lundh

Tim said:
Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]"

didn't you leave something out here? "compile" only compiles that pattern;
it doesn't match it against your string...
but does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

if the pattern can occur anywhere in the string, you need to use "search",
not "match". if you want multiple matches, you can use "findall" or, better
in this case, "finditer":

import re

s = "something [PROMPT:foo] something [PROMPT:bar] something"

for m in re.finditer(r'\[PROMPT:[^]]*\]', s):
    print(m.span(0))

prints

(10, 22)
(33, 45)

which looks reasonably correct.

(note the "[^x]*x" form, which is an efficient way to spell "non-greedy match"
for cases like this)

</F>
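For anyone following along, here is a minimal sketch of the difference between "match", "search" and "finditer" described above. It assumes Python 3 and reuses the sample string from the post; the names PROMPT_RE, at_start, first and spans are only illustrative.

```python
import re

# Same pattern as in the post: "[PROMPT:" followed by any run of non-] characters, then "]"
PROMPT_RE = re.compile(r'\[PROMPT:[^]]*\]')

s = "something [PROMPT:foo] something [PROMPT:bar] something"

# match() only tries the pattern at the very start of the string,
# so it finds nothing here because the string starts with "something".
at_start = PROMPT_RE.match(s)

# search() scans forward and returns the first place the pattern fits.
first = PROMPT_RE.search(s)

# finditer() yields one match object per non-overlapping occurrence.
spans = [m.span() for m in PROMPT_RE.finditer(s)]

print(at_start)      # None
print(first.span())  # (10, 22)
print(spans)         # [(10, 22), (33, 45)]
```

Each span is a (start, end) pair, so s[start:end] gives back the bracketed text itself.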
 
Duncan Booth

Tim said:
I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

The answer sort of depends on exactly what can be in your optional text:
>>> import re
>>> s = "something [PROMPT:foo] something [PROMPT:bar] something ..."
>>> y=re.compile(r'\[PROMPT:.*\]')
>>> y.findall(s)
['[PROMPT:foo] something [PROMPT:bar]']
>>> y=re.compile(r'\[PROMPT:.*?\]')
>>> y.findall(s)
['[PROMPT:foo]', '[PROMPT:bar]']
>>> y=re.compile(r'\[PROMPT:[^]]*\]')
>>> y.findall(s)
['[PROMPT:foo]', '[PROMPT:bar]']

.* will match as long a string as possible.

.*? will match as short a string as possible. By default this won't match
any newlines.

[^]]* will match as long a string that doesn't contain ']' as possible.
This will match newlines.
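The newline point is easy to check. Below is a small sketch, assuming Python 3 and a made-up test string whose first prompt spans two lines; re.DOTALL is the standard flag that lets "." match a newline as well.

```python
import re

# Hypothetical input: the first prompt's text contains a newline.
s = "a [PROMPT:line one\nline two] b [PROMPT:bar] c"

# Non-greedy ".*?" cannot cross the newline by default,
# so the two-line prompt is skipped entirely.
lazy = re.findall(r'\[PROMPT:.*?\]', s)

# With re.DOTALL, "." also matches newlines, so both prompts are found.
lazy_dotall = re.findall(r'\[PROMPT:.*?\]', s, re.DOTALL)

# The negated class "[^]]" matches anything except "]", newlines included.
negated = re.findall(r'\[PROMPT:[^]]*\]', s)

print(lazy)         # ['[PROMPT:bar]']
print(lazy_dotall)  # ['[PROMPT:line one\nline two]', '[PROMPT:bar]']
print(negated)      # ['[PROMPT:line one\nline two]', '[PROMPT:bar]']
```

So which form is right depends on whether the optional text is allowed to run across lines.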
 
Tim Daneliuk

Fredrik said:
Tim Daneliuk wrote:

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]"


didn't you leave something out here? "compile" only compiles that pattern;
it doesn't match it against your string...

Sorry - I thought this was obvious - I was interested more in the conceptual
part of the construction of the re itself.
but does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.


if the pattern can occur anywhere in the string, you need to use "search",
not "match". if you want multiple matches, you can use "findall" or, better
in this case, "finditer":

import re

s = "something [PROMPT:foo] something [PROMPT:bar] something"

for m in re.finditer(r'\[PROMPT:[^]]*\]', s):
    print(m.span(0))

prints

(10, 22)
(33, 45)

which looks reasonably correct.

(note the "[^x]*x" form, which is an efficient way to spell "non-greedy match"
for cases like this)

Thanks - very helpful. One followup - your re works as advertised. But
if I use: r'\[PROMPT:[^]].*\]' it seems not to. the '.*' instead of just '*'
it matches the entire string ... which seems counterintuitive to me.

Thanks,
 
Orlando Vazquez

Tim said:
For some reason, I am having the hardest time doing something that should
be obvious. (Note time of posting ;)

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

Ideas anyone?

If I understand correctly, this is what you are trying to achieve:
>>> import re
>>> temp = "something [PROMPT:foo] something [PROMPT:bar] something ..."
>>> prompt_re = re.compile(r"\[PROMPT:.*?\]")
>>> prompt_re.findall(temp)
['[PROMPT:foo]', '[PROMPT:bar]']
>>>

HTH,
 
Fredrik Lundh

Tim said:
Thanks - very helpful. One followup - your re works as advertised. But
if I use: r'\[PROMPT:[^]].*\]' it seems not to. the '.*' instead of just '*'
it matches the entire string ...

it's not "just '*'", it's "[^]]*". it's the "^]" set (anything but ]) that's repeated.

"[^]].*\]" means match a single non-] character, and then match as many
characters as you possibly can, as long as the next character is a ].

"[^]]*\]" means match as many non-] characters as possible, plus a single ].
which seems counterintuitive to me.

then you need to study RE:s a bit more.

(hint: an RE isn't a template, it's a language description, and the RE engine
is designed to answer the question "does this string belong to this language"
(for match) or "is there any substring in this string that belongs to this
language" (for search) as quickly as possible. things like match locations
etc are side effects).

</F>
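To make the two readings concrete, here is a short sketch comparing the patterns side by side. It assumes Python 3; the names broken and fixed are just labels for this example, not anything from the thread.

```python
import re

s = "something [PROMPT:foo] something [PROMPT:bar] something"

# "[^]].*\]": exactly ONE non-] character, then a greedy ".*", then "]".
broken = re.compile(r'\[PROMPT:[^]].*\]')

# "[^]]*\]": ZERO OR MORE non-] characters, then "]".
fixed = re.compile(r'\[PROMPT:[^]]*\]')

# The greedy ".*" runs to the end of the line and backtracks to the LAST "]",
# so both prompts are swallowed into one match.
print(broken.findall(s))  # ['[PROMPT:foo] something [PROMPT:bar]']

# The repeated negated class cannot step over a "]", so each prompt stops at its own bracket.
print(fixed.findall(s))   # ['[PROMPT:foo]', '[PROMPT:bar]']

# Side effect of the single "[^]]": an empty prompt no longer matches at all.
empty_broken = broken.search("[PROMPT:]")
empty_fixed = fixed.search("[PROMPT:]")
print(empty_broken)         # None
print(empty_fixed.group())  # [PROMPT:]
```

The position of the "*" decides what gets repeated: in the first pattern it repeats ".", in the second it repeats the "[^]]" set.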
 
Tim Daneliuk

Fredrik said:
Tim Daneliuk wrote:

Thanks - very helpful. One followup - your re works as advertised. But
if I use: r'\[PROMPT:[^]].*\]' it seems not to. the '.*' instead of just '*'
it matches the entire string ...


it's not "just '*'", it's "[^]]*". it's the "^]" set (anything but ]) that's repeated.

"[^]].*\]" means match a single non-] character, and then match as many
characters as you possibly can, as long as the next character is a ].

"[^]]*\]" means match as many non-] characters as possible, plus a single ].

Got it - makes perfect sense too.
then you need to study RE:s a bit more.

(hint: an RE isn't a template, it's a language description, and the RE engine
is designed to answer the question "does this string belong to this language"
(for match) or "is there any substring in this string that belongs to this
language" (for search) as quickly as possible. things like match locations
etc are side effects).

Yes, I understand this. But your clarification is most helpful. Thanks!
 
Tim Daneliuk

Orlando said:
Tim said:
For some reason, I am having the hardest time doing something that should
be obvious. (Note time of posting ;)

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

Ideas anyone?


If I understand correctly, this is what you are trying to achieve:
>>> import re
>>> temp = "something [PROMPT:foo] something [PROMPT:bar] something ..."
>>> prompt_re = re.compile(r"\[PROMPT:.*?\]")
>>> prompt_re.findall(temp)
['[PROMPT:foo]', '[PROMPT:bar]']

HTH,

Yes - that seems to be the simplest solution to the problem. I'd forgotten
entirely about non-greedy matching when I asked the question. Thanks.
 
Aahz

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

Ideas anyone?

Yeah, read the Friedl book. (Okay, so that's not gonna help right now,
but trust me, if you're going to write lots of regexes, READ THAT BOOK.)
 
Tim Daneliuk

Aahz said:
Tim Daneliuk said:
Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

Ideas anyone?


Yeah, read the Friedl book. (Okay, so that's not gonna help right now,
but trust me, if you're going to write lots of regexes, READ THAT BOOK.)

I've read significant parts of it. The problem is that I don't write
re often enough to recall all the subtle details ... plus I am getting
old and feeble... ;)
 