re Insanity

Tim Daneliuk · Jan 22, 2005

For some reason, I am having the hardest time doing something that should
be obvious. (Note time of posting.)

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

Ideas anyone?

Fredrik Lundh · Jan 22, 2005

Tim said:
Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]"

didn't you leave something out here? "compile" only compiles that pattern;
it doesn't match it against your string...
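
A minimal sketch of that distinction, assuming Python 3 and the sample strings from the question (the variable names are illustrative only). `match` only tries the pattern at the start of the string, while `search` scans for the first place it fits:

```python
import re

# Tim's original pattern; compiling alone performs no matching.
y = re.compile(r'\[PROMPT:.*\]')

exact = "[PROMPT:whatever]"
embedded = "something [PROMPT:foo] something [PROMPT:bar] something ..."

# match() succeeds only if the pattern fits starting at position 0.
match_exact = y.match(exact)
match_embedded = y.match(embedded)

# search() scans forward for the first position where the pattern fits.
search_embedded = y.search(embedded)

print(match_exact is not None)   # True
print(match_embedded is None)    # True
print(search_embedded.span())    # (10, 45): the greedy .* runs to the last ']'
```

This would explain the symptom in the question: the exact string passes with `match`, the embedded one does not, and even `search` returns one long span because of the greedy `.*`.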

but does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

if the pattern can occur anywhere in the string, you need to use "search",
not "match". if you want multiple matches, you can use "findall" or, better
in this case, "finditer":

import re

s = "something [PROMPT:foo] something [PROMPT:bar] something"

for m in re.finditer(r'\[PROMPT:[^]]*\]', s):
    # span(0) is the (start, end) offset pair of the whole match
    print(m.span(0))

prints

(10, 22)
(33, 45)

which looks reasonably correct.

(note the "[^x]*x" form, which is an efficient way to spell "non-greedy match"
for cases like this)
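
As a hedged sketch of the stated goal (locating the beginning and end of each prompt), the same negated-class pattern can carry a capturing group so the optional text comes back alongside the offsets. The group and the name `prompt_re` are additions for illustration, not part of the pattern above; Python 3 syntax is assumed.

```python
import re

s = "something [PROMPT:foo] something [PROMPT:bar] something"

# Same negated-class pattern, with a capturing group around the optional text.
prompt_re = re.compile(r'\[PROMPT:([^]]*)\]')

# Collect (start, end, text) for every prompt in the line.
found = [(m.start(), m.end(), m.group(1)) for m in prompt_re.finditer(s)]

for start, end, text in found:
    print(start, end, repr(text))
# 10 22 'foo'
# 33 45 'bar'
```

Because the class is repeated with `*`, an empty prompt such as "[PROMPT:]" would also match, with an empty string for the group.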

</F>

Duncan Booth · Jan 22, 2005

Tim said:
I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

The answer sort of depends on exactly what can be in your optional text:

>>> import re
>>> s = "something [PROMPT:foo] something [PROMPT:bar] something ..."
>>> y = re.compile(r'\[PROMPT:.*\]')
>>> y.findall(s)
['[PROMPT:foo] something [PROMPT:bar]']
>>> y = re.compile(r'\[PROMPT:.*?\]')
>>> y.findall(s)
['[PROMPT:foo]', '[PROMPT:bar]']
>>> y = re.compile(r'\[PROMPT:[^]]*\]')
>>> y.findall(s)
['[PROMPT:foo]', '[PROMPT:bar]']

.* will match as long a string as possible.

.*? will match as short a string as possible. By default this won't match
any newlines.

[^]]* will match as long a string that doesn't contain ']' as possible.
This will match newlines.
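
A small sketch of the newline difference, assuming Python 3. The input string is hypothetical, made up so that one prompt's text spans two lines; `re.DOTALL` is the standard flag that lets `.` match a newline.

```python
import re

# Hypothetical input: the second prompt's text contains a newline.
s = "a [PROMPT:foo] b [PROMPT:two\nlines] c"

# Non-greedy dot: '.' stops at the newline, so the second prompt is missed.
lazy = re.findall(r'\[PROMPT:.*?\]', s)

# Negated class: a newline is "not a ]", so it is matched like any other character.
negated = re.findall(r'\[PROMPT:[^]]*\]', s)

# Non-greedy dot with DOTALL: '.' now matches newlines as well.
lazy_dotall = re.findall(r'\[PROMPT:.*?\]', s, re.DOTALL)

print(lazy)         # ['[PROMPT:foo]']
print(negated)      # ['[PROMPT:foo]', '[PROMPT:two\nlines]']
print(lazy_dotall)  # ['[PROMPT:foo]', '[PROMPT:two\nlines]']
```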

Tim Daneliuk · Jan 23, 2005

Fredrik said:
Tim Daneliuk wrote:

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]"

didn't you leave something out here? "compile" only compiles that pattern;
it doesn't match it against your string...

Sorry - I thought this was obvious - I was interested more in the conceptual
part of the construction of the re itself.

but does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

if the pattern can occur anywhere in the string, you need to use "search",
not "match". if you want multiple matches, you can use "findall" or, better
in this case, "finditer":

import re

s = "something [PROMPT:foo] something [PROMPT:bar] something"

for m in re.finditer(r'\[PROMPT:[^]]*\]', s):
    print(m.span(0))

prints

(10, 22)
(33, 45)

which looks reasonably correct.

(note the "[^x]*x" form, which is an efficient way to spell "non-greedy match"
for cases like this)

Thanks - very helpful. One followup - your re works as advertised. But
if I use r'\[PROMPT:[^]].*\]' it does not: with '.*' instead of just '*'
it matches the entire string ... which seems counterintuitive to me.

Thanks,

Orlando Vazquez · Jan 23, 2005

Tim said:
For some reason, I am having the hardest time doing something that should
be obvious. (Note time of posting.)

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

Ideas anyone?

If I understand correctly, this is what you are trying to achieve:

>>> import re
>>> temp = "something [PROMPT:foo] something [PROMPT:bar] something ..."
>>> prompt_re = re.compile(r"\[PROMPT:.*?\]")
>>> prompt_re.findall(temp)
['[PROMPT:foo]', '[PROMPT:bar]']

HTH,

Fredrik Lundh · Jan 23, 2005

Tim said:
Thanks - very helpful. One followup - your re works as advertised. But
if I use r'\[PROMPT:[^]].*\]' it does not: with '.*' instead of just '*'
it matches the entire string ...

it's not "just '*'", it's "[^]]*". it's the "^]" set (anything but ]) that's repeated.

"[^]].*\]" means match a single non-] character, and then match as many
characters as you possibly can, as long as the next character is a ].

"[^]]*\]" means match as many non-] characters as possible, plus a single ].
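
A short sketch contrasting the two patterns on the sample string from earlier in the thread, assuming Python 3 (the variable names are illustrative):

```python
import re

s = "something [PROMPT:foo] something [PROMPT:bar] something"

# One non-] character, then a greedy ".*", then a literal "]":
# the ".*" is free to swallow "]" characters, so it runs to the last one.
one_then_any = re.findall(r'\[PROMPT:[^]].*\]', s)

# Zero or more non-] characters, then "]": the repeat cannot cross a "]",
# so each match ends at the first closing bracket.
negated = re.findall(r'\[PROMPT:[^]]*\]', s)

print(one_then_any)  # ['[PROMPT:foo] something [PROMPT:bar]']
print(negated)       # ['[PROMPT:foo]', '[PROMPT:bar]']
```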

which seems counterintuitive to me.

then you need to study RE:s a bit more.

(hint: an RE isn't a template, it's a language description, and the RE engine
is designed to answer the question "does this string belong to this language"
(for match) or "is there any substring in this string that belongs to this
language" (for search) as quickly as possible. things like match locations
etc are side effects).

</F>

Tim Daneliuk · Jan 23, 2005

Fredrik said:
Tim Daneliuk wrote:

Thanks - very helpful. One followup - your re works as advertised. But
if I use r'\[PROMPT:[^]].*\]' it does not: with '.*' instead of just '*'
it matches the entire string ...

it's not "just '*'", it's "[^]]*". it's the "^]" set (anything but ]) that's repeated.

"[^]].*\]" means match a single non-] character, and then match as many
characters as you possibly can, as long as the next character is a ].

"[^]]*\]" means match as many non-] characters as possible, plus a single ].

Got it - makes perfect sense too.

then you need to study RE:s a bit more.

(hint: an RE isn't a template, it's a language description, and the RE engine
is designed to answer the question "does this string belong to this language"
(for match) or "is there any substring in this string that belongs to this
language" (for search) as quickly as possible. things like match locations
etc are side effects).

Yes, I understand this. But your clarification is most helpful. Thanks!

Tim Daneliuk · Jan 23, 2005

Orlando said:
Tim said:

For some reason, I am having the hardest time doing something that should
be obvious. (Note time of posting.)

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

Ideas anyone?

If I understand correctly, this is what you are trying to achieve:

>>> import re
>>> temp = "something [PROMPT:foo] something [PROMPT:bar] something ..."
>>> prompt_re = re.compile(r"\[PROMPT:.*?\]")
>>> prompt_re.findall(temp)
['[PROMPT:foo]', '[PROMPT:bar]']

HTH,

Yes - that seems to be the simplest solution to the problem. I'd forgotten
entirely about non-greedy matching when I asked the question. Thanks.

Aahz · Jan 26, 2005

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

Ideas anyone?

Yeah, read the Friedl book. (Okay, so that's not gonna help right now,
but trust me, if you're going to write lots of regexes, READ THAT BOOK.)

Tim Daneliuk · Jan 26, 2005

Aahz said:
Tim Daneliuk said:

Given an arbitrary string, I want to find each individual instance of
text in the form: "[PROMPT:optional text]"

I tried this:

y=re.compile(r'\[PROMPT:.*\]')

Which works fine when the text is exactly "[PROMPT:whatever]" but
does not match on:

"something [PROMPT:foo] something [PROMPT:bar] something ..."

The overall goal is to identify the beginning and end of each [PROMPT...]
string in the line.

Ideas anyone?

Yeah, read the Friedl book. (Okay, so that's not gonna help right now,
but trust me, if you're going to write lots of regexes, READ THAT BOOK.)

I've read significant parts of it. The problem is that I don't write
re often enough to recall all the subtle details ... plus I am getting
old and feeble...
