re Insanity

Discussion in 'Python' started by Tim Daneliuk, Jan 22, 2005.

  1. Tim Daneliuk

    Tim Daneliuk Guest

    For some reason, I am having the hardest time doing something that should
    be obvious. (Note time of posting ;)

    Given an arbitrary string, I want to find each individual instance of
    text in the form: "[PROMPT:optional text]"

    I tried this:

    y=re.compile(r'\[PROMPT:.*\]')

    Which works fine when the text is exactly "[PROMPT:whatever]" but
    does not match on:

    "something [PROMPT:foo] something [PROMPT:bar] something ..."

    The overall goal is to identify the beginning and end of each [PROMPT...]
    string in the line.

    Ideas anyone?
    --
    ----------------------------------------------------------------------------
    Tim Daneliuk
    PGP Key: http://www.tundraware.com/PGP/
    Tim Daneliuk, Jan 22, 2005
    #1

  2. Re: Insanity

    Tim Daneliuk wrote:

    > Given an arbitrary string, I want to find each individual instance of
    > text in the form: "[PROMPT:optional text]"
    >
    > I tried this:
    >
    > y=re.compile(r'\[PROMPT:.*\]')
    >
    > Which works fine when the text is exactly "[PROMPT:whatever]"


    didn't you leave something out here? "compile" only compiles that pattern;
    it doesn't match it against your string...
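    a minimal sketch of what that missing step could look like, assuming
    Python 3 and the original greedy pattern. the names "exact" and "embedded"
    are only illustrative and do not come from the original post:

```python
import re

# compile() only builds a pattern object; nothing is matched yet.
y = re.compile(r'\[PROMPT:.*\]')

exact = "[PROMPT:whatever]"
embedded = "something [PROMPT:foo] something"

# match() only tries at the very start of the string.
print(y.match(exact) is not None)      # True
print(y.match(embedded) is not None)   # False

# search() scans forward until the pattern fits somewhere.
print(y.search(embedded).group(0))     # [PROMPT:foo]
```

    this is probably why the pattern seemed to work on the exact string but
    not on the longer one.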

    > but does not match on:
    >
    > "something [PROMPT:foo] something [PROMPT:bar] something ..."
    >
    > The overall goal is to identify the beginning and end of each [PROMPT...]
    > string in the line.


    if the pattern can occur anywhere in the string, you need to use "search",
    not "match". if you want multiple matches, you can use "findall" or, better
    in this case, "finditer":

    import re

    s = "something [PROMPT:foo] something [PROMPT:bar] something"

    for m in re.finditer(r'\[PROMPT:[^]]*\]', s):
        print(m.span(0))

    prints

    (10, 22)
    (33, 45)

    which looks reasonably correct.

    (note the "[^x]*x" form, which is an efficient way to spell "non-greedy match"
    for cases like this)
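    if both the positions and the optional text are wanted, one possible
    extension is sketched here, assuming Python 3. the helper name
    "find_prompts", the constant "PROMPT_RE" and the capturing group are
    illustrative additions, not part of the original suggestion:

```python
import re

# Group 1 captures the optional text between the colon and the closing bracket.
PROMPT_RE = re.compile(r'\[PROMPT:([^]]*)\]')

def find_prompts(line):
    """Return a (start, end, text) tuple for each [PROMPT:...] found in line."""
    return [(m.start(), m.end(), m.group(1)) for m in PROMPT_RE.finditer(line)]

s = "something [PROMPT:foo] something [PROMPT:bar] something"
for start, end, text in find_prompts(s):
    # s[start:end] is the whole bracketed token; text is what follows the colon.
    print(start, end, repr(s[start:end]), repr(text))
```

    the start and end values are ordinary slice indices, so s[start:end] gives
    back the bracketed token itself.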

    </F>
    Fredrik Lundh, Jan 22, 2005
    #2

  3. Duncan Booth

    Duncan Booth Guest

    Tim Daneliuk wrote:

    >
    > I tried this:
    >
    > y=re.compile(r'\[PROMPT:.*\]')
    >
    > Which works fine when the text is exactly "[PROMPT:whatever]" but
    > does not match on:
    >
    > "something [PROMPT:foo] something [PROMPT:bar] something ..."
    >
    > The overall goal is to identify the beginning and end of each [PROMPT...]
    > string in the line.
    >


    The answer sort of depends on exactly what can be in your optional text:

    >>> import re
    >>> s = "something [PROMPT:foo] something [PROMPT:bar] something ..."
    >>> y=re.compile(r'\[PROMPT:.*\]')
    >>> y.findall(s)

    ['[PROMPT:foo] something [PROMPT:bar]']
    >>> y=re.compile(r'\[PROMPT:.*?\]')
    >>> y.findall(s)

    ['[PROMPT:foo]', '[PROMPT:bar]']
    >>> y=re.compile(r'\[PROMPT:[^]]*\]')
    >>> y.findall(s)

    ['[PROMPT:foo]', '[PROMPT:bar]']
    >>>


    .* will match as long a string as possible.

    .*? will match as short a string as possible. By default this won't match
    any newlines.

    [^]]* will match as long a string that doesn't contain ']' as possible.
    This will match newlines.
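
    The newline difference can be seen with a small sketch, assuming Python 3.
    The test string and the use of the re.DOTALL flag are my own illustration
    and are not part of the session above:

```python
import re

# Two prompts on the first line, plus one whose text spans a newline.
s = "x [PROMPT:foo] y [PROMPT:bar] z [PROMPT:two\nlines]"

greedy = re.findall(r'\[PROMPT:.*\]', s)
lazy = re.findall(r'\[PROMPT:.*?\]', s)
lazy_dotall = re.findall(r'\[PROMPT:.*?\]', s, re.DOTALL)
negated = re.findall(r'\[PROMPT:[^]]*\]', s)

print(greedy)       # one long match covering the first two prompts
print(lazy)         # the two single-line prompts only
print(lazy_dotall)  # all three, because '.' may now cross the newline
print(negated)      # all three, no flag needed
```

    So if the optional text may contain line breaks, either the negated
    character class or the re.DOTALL flag seems to be needed.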
    Duncan Booth, Jan 22, 2005
    #3
  4. Tim Daneliuk

    Tim Daneliuk Guest

    Re: Insanity

    Fredrik Lundh wrote:

    > Tim Daneliuk wrote:
    >
    >
    >>Given an arbitrary string, I want to find each individual instance of
    >>text in the form: "[PROMPT:optional text]"
    >>
    >>I tried this:
    >>
    >> y=re.compile(r'\[PROMPT:.*\]')
    >>
    >>Which works fine when the text is exactly "[PROMPT:whatever]"

    >
    >
    > didn't you leave something out here? "compile" only compiles that pattern;
    > it doesn't match it against your string...


    Sorry - I thought this was obvious - I was interested more in the conceptual
    part of the construction of the re itself.

    >
    >>but does not match on:
    >>
    >> "something [PROMPT:foo] something [PROMPT:bar] something ..."
    >>
    >>The overall goal is to identify the beginning and end of each [PROMPT...]
    >>string in the line.

    >
    >
    > if the pattern can occur anywhere in the string, you need to use "search",
    > not "match". if you want multiple matches, you can use "findall" or, better
    > in this case, "finditer":
    >
    > import re
    >
    > s = "something [PROMPT:foo] something [PROMPT:bar] something"
    >
    > for m in re.finditer(r'\[PROMPT:[^]]*\]', s):
    >     print(m.span(0))
    >
    > prints
    >
    > (10, 22)
    > (33, 45)
    >
    > which looks reasonably correct.
    >
    > (note the "[^x]*x" form, which is an efficient way to spell "non-greedy match"
    > for cases like this)
    >


    Thanks - very helpful. One followup - your re works as advertised. But
    if I use r'\[PROMPT:[^]].*\]' it seems not to. With '.*' instead of just '*',
    it matches the entire string ... which seems counterintuitive to me.

    Thanks,


    --
    ----------------------------------------------------------------------------
    Tim Daneliuk
    PGP Key: http://www.tundraware.com/PGP/
    Tim Daneliuk, Jan 23, 2005
    #4
  5. Orlando Vazquez

    Tim Daneliuk wrote:
    > For some reason, I am having the hardest time doing something that should
    > be obvious. (Note time of posting ;)
    >
    > Given an arbitrary string, I want to find each individual instance of
    > text in the form: "[PROMPT:optional text]"
    >
    > I tried this:
    >
    > y=re.compile(r'\[PROMPT:.*\]')
    >
    > Which works fine when the text is exactly "[PROMPT:whatever]" but
    > does not match on:
    >
    > "something [PROMPT:foo] something [PROMPT:bar] something ..."
    >
    > The overall goal is to identify the beginning and end of each [PROMPT...]
    > string in the line.
    >
    > Ideas anyone?


    If I understand correctly, this is what you are trying to achieve:

    >>> import re
    >>> temp = "something [PROMPT:foo] something [PROMPT:bar] something ..."
    >>> prompt_re = re.compile(r"\[PROMPT:.*?\]")
    >>> prompt_re.findall(temp)

    ['[PROMPT:foo]', '[PROMPT:bar]']
    >>>


    HTH,

    --
    Orlando
    Orlando Vazquez, Jan 23, 2005
    #5
  6. Re: Insanity

    Tim Daneliuk wrote:

    > Thanks - very helpful. One followup - your re works as advertised. But
    > if I use: r'\[PROMPT:[^]].*\]' it seems not to. the '.*' instead of just '*'
    > it matches the entire string ...


    it's not "just '*'", it's "[^]]*". it's the "^]" set (anything but ]) that's repeated.

    "[^]].*\]" means match a single non-] character, and then match as many
    characters as you possibly can, as long as the next character is a ].

    "[^]]*\]" means match as many non-] characters as possible, plus a single ].

    > which seems counterintuitive to me.


    then you need to study RE:s a bit more.

    (hint: an RE isn't a template, it's a language description, and the RE engine
    is designed to answer the question "does this string belong to this language"
    (for match) or "is there any substring in this string that belongs to this
    language" (for search) as quickly as possible. things like match locations
    etc are side effects).
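
    to make the difference between the two patterns discussed in this post
    concrete, here is a small sketch, assuming Python 3. the variable names
    are illustrative only:

```python
import re

s = "something [PROMPT:foo] something [PROMPT:bar] something"

one_then_anything = re.compile(r'\[PROMPT:[^]].*\]')   # one non-], then greedy .*
only_non_brackets = re.compile(r'\[PROMPT:[^]]*\]')    # zero or more non-]

print(one_then_anything.findall(s))
# ['[PROMPT:foo] something [PROMPT:bar]']
print(only_non_brackets.findall(s))
# ['[PROMPT:foo]', '[PROMPT:bar]']

# The first pattern also insists on at least one character after the colon.
print(one_then_anything.findall("[PROMPT:]"))   # []
print(only_non_brackets.findall("[PROMPT:]"))   # ['[PROMPT:]']
```

    the two patterns describe different languages, so they accept different
    substrings even though they differ by a single character.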

    </F>
    Fredrik Lundh, Jan 23, 2005
    #6
  7. Tim Daneliuk

    Tim Daneliuk Guest

    Re: Insanity

    Fredrik Lundh wrote:

    > Tim Daneliuk wrote:
    >
    >
    >>Thanks - very helpful. One followup - your re works as advertised. But
    >>if I use: r'\[PROMPT:[^]].*\]' it seems not to. the '.*' instead of just '*'
    >>it matches the entire string ...

    >
    >
    > it's not "just '*'", it's "[^]]*". it's the "^]" set (anything but ]) that's repeated.
    >
    > "[^]].*\]" means match a single non-] character, and then match as many
    > characters as you possibly can, as long as the next character is a ].
    >
    > "[^]]*\]" means match as many non-] characters as possible, plus a single ].


    Got it - makes perfect sense too.

    >
    >
    >>which seems counterintuitive to me.

    >
    >
    > then you need to study RE:s a bit more.
    >
    > (hint: an RE isn't a template, it's a language description, and the RE engine
    > is designed to answer the question "does this string belong to this language"
    > (for match) or "is there any substring in this string that belongs to this
    > language" (for search) as quickly as possible. things like match locations
    > etc are side effects).


    Yes, I understand this. But your clarification is most helpful. Thanks!

    ----------------------------------------------------------------------------
    Tim Daneliuk
    PGP Key: http://www.tundraware.com/PGP/
    Tim Daneliuk, Jan 23, 2005
    #7
  8. Tim Daneliuk

    Tim Daneliuk Guest

    Orlando Vazquez wrote:

    > Tim Daneliuk wrote:
    >
    >> For some reason, I am having the hardest time doing something that should
    >> be obvious. (Note time of posting ;)
    >>
    >> Given an arbitrary string, I want to find each individual instance of
    >> text in the form: "[PROMPT:optional text]"
    >>
    >> I tried this:
    >>
    >> y=re.compile(r'\[PROMPT:.*\]')
    >>
    >> Which works fine when the text is exactly "[PROMPT:whatever]" but
    >> does not match on:
    >>
    >> "something [PROMPT:foo] something [PROMPT:bar] something ..."
    >>
    >> The overall goal is to identify the beginning and end of each [PROMPT...]
    >> string in the line.
    >>
    >> Ideas anyone?

    >
    >
    > If I understand correctly, this is what you are trying to achieve:
    >
    > >>> import re
    > >>> temp = "something [PROMPT:foo] something [PROMPT:bar] something ..."
    > >>> prompt_re = re.compile(r"\[PROMPT:.*?\]")
    > >>> prompt_re.findall(temp)

    > ['[PROMPT:foo]', '[PROMPT:bar]']
    > >>>

    >
    > HTH,
    >
    > --
    > Orlando


    Yes - that seems to be the simplest solution to the problem. I'd forgotten
    entirely about non-greedy matching when I asked the question. Thanks.

    --
    ----------------------------------------------------------------------------
    Tim Daneliuk
    PGP Key: http://www.tundraware.com/PGP/
    Tim Daneliuk, Jan 23, 2005
    #8
  9. Aahz

    Aahz Guest

    Tim Daneliuk wrote:
    >
    >Given an arbitrary string, I want to find each individual instance of
    >text in the form: "[PROMPT:optional text]"
    >
    >I tried this:
    >
    > y=re.compile(r'\[PROMPT:.*\]')
    >
    >Which works fine when the text is exactly "[PROMPT:whatever]" but
    >does not match on:
    >
    > "something [PROMPT:foo] something [PROMPT:bar] something ..."
    >
    >The overall goal is to identify the beginning and end of each [PROMPT...]
    >string in the line.
    >
    >Ideas anyone?


    Yeah, read the Friedl book. (Okay, so that's not gonna help right now,
    but trust me, if you're going to write lots of regexes, READ THAT BOOK.)
    --
    Aahz <*> http://www.pythoncraft.com/

    "19. A language that doesn't affect the way you think about programming,
    is not worth knowing." --Alan Perlis
    Aahz, Jan 26, 2005
    #9
  10. Tim Daneliuk

    Tim Daneliuk Guest

    Aahz wrote:
    > Tim Daneliuk wrote:
    >
    >>Given an arbitrary string, I want to find each individual instance of
    >>text in the form: "[PROMPT:optional text]"
    >>
    >>I tried this:
    >>
    >> y=re.compile(r'\[PROMPT:.*\]')
    >>
    >>Which works fine when the text is exactly "[PROMPT:whatever]" but
    >>does not match on:
    >>
    >> "something [PROMPT:foo] something [PROMPT:bar] something ..."
    >>
    >>The overall goal is to identify the beginning and end of each [PROMPT...]
    >>string in the line.
    >>
    >>Ideas anyone?

    >
    >
    > Yeah, read the Friedl book. (Okay, so that's not gonna help right now,
    > but trust me, if you're going to write lots of regexes, READ THAT BOOK.)


    I've read significant parts of it. The problem is that I don't write
    re often enough to recall all the subtle details ... plus I am getting
    old and feeble... ;)

    --
    ----------------------------------------------------------------------------
    Tim Daneliuk
    PGP Key: http://www.tundraware.com/PGP/
    Tim Daneliuk, Jan 26, 2005
    #10
