Quickie - Regexp for a string not at the beginning of the line

Discussion in 'Python' started by Rivka Miller, Oct 25, 2012.

  1. Rivka Miller

    Rivka Miller Guest

    Hello Programmers,

    I am looking for a regexp for a string not at the beginning of the
    line.

    For example, I want to find $hello$ that does not occur at the
    beginning of the string, i.e. all $hello$ except ^$hello$.

    In addition, if you have a more difficult problem along the same
    lines, I would appreciate it. For a single character, e.g. < not at the
    beginning of the line, it is easier, i.e.

    ^[^<]+<

    but I can't use the same method for a string of more than one
    character, as the character class would also exclude every
    permutation of its characters. And with more than one occurrence, the
    greedy or non-greedy version of [^<]+ would pick the first or last but
    not the middle ones, unless I break the line as I go and use the
    non-greedy version of +. I do have the non-greedy version available,
    but what if I didn't?
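
A minimal sketch of the limitation described above, written in Python as an assumption (the post names no language). It compares the anchored pattern ^[^<]+< with a negative look-behind, (?<!^)<, of the kind suggested in replies further down this thread. The sample lines and the helper name `spans` are made up for illustration.

```python
import re

anchored = re.compile(r"^[^<]+<")    # the pattern from the post
lookbehind = re.compile(r"(?<!^)<")  # "<" anywhere except position 0

def spans(pattern, line):
    """Return the (start, end) span of every match of pattern in line."""
    return [m.span() for m in pattern.finditer(line)]

for line in ("a<b<c", "<a<b"):
    print(line, spans(anchored, line), spans(lookbehind, line))
```

The anchored pattern can match at most once per line, and it finds nothing at all when the line itself starts with "<", even though a later "<" is not at the beginning. The look-behind version reports each later "<" separately.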

    If you cannot solve the problem completely, just give me a quick
    solution for the first occurrence not at the beginning of the line and
    I will go from there, as I need it in a hurry.

    Thanks
     
    Rivka Miller, Oct 25, 2012
    #1

  2. On 25.10.2012 22:53, Rivka Miller wrote:
    > Hello Programmers,
    >
    > I am looking for a regexp for a string not at the beginning of the
    > line.
    >
    > For example, I want to find $hello$ that does not occur at the
    > beginning of the string, ie all $hello$ that exclude ^$hello$.


    .hello

    The dot represents any character. But specific strings may need
    adjustments (e.g. looking for hh not at the beginning of a
    line would require something like ^[^h]+hh - ah, well, you wrote
    something similar below).

    Janis

     
    Janis Papanagnou, Oct 25, 2012
    #2

  3. Zero Piraeus

    Zero Piraeus Guest

    :

    On 25 October 2012 16:53, Rivka Miller <> wrote:
    > I am looking for a regexp for a string not at the beginning of the
    > line.


    There are probably quite a few ways to do this, but '(?<!^)PATTERN'
    has the advantage of explicitly describing what you're trying to do.
    For instance:

    >>> import re
    >>> pattern = re.compile(r"(?<!^)\b\w+\b")
    >>> re.findall(pattern, "this is some text")
    ['is', 'some', 'text']

    -[]z.
     
    Zero Piraeus, Oct 25, 2012
    #3
  4. Rivka Miller

    Rivka Miller Guest

    On Oct 25, 2:27 pm, Danny <> wrote:
    > Why you just don't give us the string/input, say a line or two, and what you want off of it, so we can tell better what to suggest


    no one has really helped yet.

    I want to search and modify.

    I don't want to be tied to a specific language etc., so I just want a
    regexp and as many versions as possible. Maybe I should try in emacs
    and so I am now posting to emacs groups also, although javascript has
    a rich set of regexp facilities.

    examples

    $hello$ should not be selected but
    not hello but all of the $hello$ and $hello$ ... $hello$ each one
    selected

     
    Rivka Miller, Oct 26, 2012
    #4
  5. Dave Angel

    Dave Angel Guest

    On 10/25/2012 09:08 PM, Rivka Miller wrote:
    > On Oct 25, 2:27 pm, Danny <> wrote:
    >> Why you just don't give us the string/input, say a line or two, and what you want off of it, so we can tell better what to suggest

    > no one has really helped yet.
    >
    > <SNIP>
    >
    > first non beginning of the line and I will go from
    > there as I need it in a hurry.
    >
    >


    Call a tow truck and tell him to jump your spare tire from his left turn
    signal. That'll be about as effective. But crying wolf to several
    towns at once is probably a mistake.

    --

    DaveA
     
    Dave Angel, Oct 26, 2012
    #5
  6. Ed Morton

    Ed Morton Guest

    On 10/25/2012 8:08 PM, Rivka Miller wrote:
    > On Oct 25, 2:27 pm, Danny <> wrote:
    >> Why you just don't give us the string/input, say a line or two, and what you want off of it, so we can tell better what to suggest

    >
    > no one has really helped yet.


    Because there is no solution - there IS no _RE_ that will match a string not at
    the beginning of a line.

    Now if you want to know how to extract a string that matches an RE in awk,
    that'd be (just one way):

    awk 'match($0,/.[$]hello[$]/) { print substr($0,RSTART+1,RLENGTH-1) }'

    and other tools would have their ways of producing the same output, but that's
    not the question you're asking.

    Ed.
     
    Ed Morton, Oct 26, 2012
    #6
  7. Rivka Miller <> writes:

    > On Oct 25, 2:27 pm, Danny <> wrote:
    >> Why you just don't give us the string/input, say a line or two, and
    >> what you want off of it, so we can tell better what to suggest

    >
    > no one has really helped yet.


    Really? I was going to reply but then I saw Janis had given you the
    answer. If it's not the answer, you should just reply saying what it is
    that's wrong with it.

    > I want to search and modify.


    Ah. That was missing from the original post. You can't expect people
    to help with questions that weren't asked! To replace, you will usually
    have to capture the single preceding character. E.g. in sed:

    sed -e 's/\(.\)\$hello\$/\1XXX/'

    but some RE engines (Perl's, for example) allow you to specify zero-width
    assertions. You could, in Perl, write

    s/(?<=.)\$hello\$/XXX/

    without having to capture whatever preceded the target string. But
    since Perl also has negative zero-width look-behind you can code your
    request even more directly:

    s/(?<!^)\$hello\$/XXX/
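
For comparison, a hedged sketch of the capture approach and the negative look-behind approach using Python's re.sub (Python is an assumption here; the post itself uses sed and Perl). The function names and sample lines are hypothetical. It also illustrates one practical difference: the capturing version consumes the preceding character, so it can skip an occurrence that directly follows another one.

```python
import re

def replace_capture(line):
    # Consume one preceding character and put it back via \1.
    return re.sub(r"(.)\$hello\$", r"\1XXX", line)

def replace_lookbehind(line):
    # Zero-width: the start-of-string check consumes nothing.
    return re.sub(r"(?<!^)\$hello\$", "XXX", line)

spaced = "$hello$ and $hello$ ... $hello$"
adjacent = "$hello$$hello$$hello$"

for line in (spaced, adjacent):
    print(replace_capture(line))
    print(replace_lookbehind(line))
```

On the spaced line both functions give the same result. On the adjacent line the capturing version leaves the third occurrence alone, because the character before it was already consumed by the previous match.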

    > I dont wanna be tied to a specific language etc so I just want a
    > regexp and as many versions as possible. Maybe I should try in emacs
    > and so I am now posting to emacs groups also, although javascript has
    > rich set of regexp facilities.


    You can't always have a universal solution because different RE
    implementations have different syntax and semantics, but you should be
    able to translate Janis's solution of matching *something* before your
    target into every RE implementation around.

    > examples
    >
    > $hello$ should not be selected but
    > not hello but all of the $hello$ and $hello$ ... $hello$ each one
    > selected


    I have taken your $s to be literal. That's not 100% obvious since $ is a
    common (universal?) RE meta-character.

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Oct 26, 2012
    #7
  8. MRAB

    MRAB Guest

    On 2012-10-26 02:08, Rivka Miller wrote:
    > On Oct 25, 2:27 pm, Danny <> wrote:
    >> Why you just don't give us the string/input, say a line or two, and what you want off of it, so we can tell better what to suggest

    >
    > no one has really helped yet.
    >
    > I want to search and modify.
    >
    > I dont wanna be tied to a specific language etc so I just want a
    > regexp and as many versions as possible. Maybe I should try in emacs
    > and so I am now posting to emacs groups also, although javascript has
    > rich set of regexp facilities.
    >
    > examples
    >
    > $hello$ should not be selected but
    > not hello but all of the $hello$ and $hello$ ... $hello$ each one
    > selected
    >

    [snip]
    To match the literal "$hello$" except at the start of a line, use:

    (?<!^)\$hello\$

    with the multiline flag set. You could set the multiline flag within
    the regex like this:

    (?m)(?<!^)\$hello\$

    re.search will find the first occurrence. In order to find all such
    occurrences in Python, you need to use re.findall or re.finditer.
    (Other languages have their own ways.)
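
A small sketch of the point about the multiline flag, using re.finditer. The sample text and the helper name `match_starts` are assumptions made for illustration, not part of the original post.

```python
import re

# Three lines; the first and third begin with the target string.
text = "$hello$ first line\nsay $hello$ twice $hello$\n$hello$ again"

def match_starts(pattern, s):
    """Return the start offset of every match of pattern in s."""
    return [m.start() for m in re.finditer(pattern, s)]

with_flag = match_starts(r"(?m)(?<!^)\$hello\$", text)
without_flag = match_starts(r"(?<!^)\$hello\$", text)

print(with_flag)
print(without_flag)
```

Without the flag, ^ only matches at the very start of the text, so the occurrence at the start of the third line is still reported. With (?m) it is excluded, since ^ then also matches just after each newline.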

    Note that there are different 'flavours' of regex, the most common form
    following the lead of Perl, and that implementations might differ in
    which features they support.
     
    MRAB, Oct 26, 2012
    #8
  9. On 25/10/2012 21:53, Rivka Miller wrote:
    > Hello Programmers,
    >
    > I am looking for a regexp for a string not at the beginning of the
    > line.
    >


    Why bother with a silly regex thingy when simple string methods will
    suffice e.g.

    'yourstring'.find('xyz', 1)

    or

    'yourstring'.index('xyz', 1)

    or

    'xyz' in 'yourstring'[1:]
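
If every occurrence is wanted rather than just the first, the same idea can be extended with a loop. This is only a sketch; the function name `find_all_not_at_start` and the sample line are hypothetical.

```python
def find_all_not_at_start(line, target):
    """Return the indexes of non-overlapping occurrences of target,
    ignoring one that begins at index 0."""
    positions = []
    i = line.find(target, 1)  # start searching at index 1
    while i != -1:
        positions.append(i)
        i = line.find(target, i + len(target))
    return positions

print(find_all_not_at_start("$hello$ and $hello$ ... $hello$", "$hello$"))
```

This treats the whole string as a single line; for multi-line input it would have to be applied to each line separately.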

    --
    Cheers.

    Mark Lawrence.
     
    Mark Lawrence, Oct 26, 2012
    #9
  10. Rivka Miller

    Guest

    On Thu, 25 Oct 2012 18:08:53 -0700 (PDT), Rivka Miller
    <> wrote in
    <>:

    >no one has really helped yet.


    We regret that you are not a satisfied customer.

    Please take your receipt to the cashier and you will receive double your
    money back according to our "you must be satisfied" guarantee.
     
    , Oct 26, 2012
    #10
  11. Rivka Miller

    Rivka Miller Guest

    Thanks everyone, esp this gentleman.

    The solution that worked best for me is just to use a DOT before the
    string, as the one at the beginning of the line does not have any char
    before it. I guess this requires the ability to ignore the caret as
    the beginning of the line.

    I am a satisfied customer. No need for returns. :)

    On Oct 25, 7:11 pm, Ben Bacarisse <> wrote:
    > Rivka Miller <> writes:
    > > On Oct 25, 2:27 pm, Danny <> wrote:
    > >> Why you just don't give us the string/input, say a line or two, and
    > >> what you want off of it, so we can tell better what to suggest

    >
    > > no one has really helped yet.

    >
    > Really?  I was going to reply but then I saw Janis had given you the
    > answer.  If it's not the answer, you should just reply saying what it is
    > that's wrong with it.
     
    Rivka Miller, Oct 26, 2012
    #11
  12. On Thu, Oct 25, 2012 at 10:00 PM, Ed Morton <> wrote:
    > Because there is no solution - there IS no _RE_ that will match a string not
    > at the beginning of a line.


    Depending on what the OP meant, the following would both work:

    - r"^(?!mystring)" (the string does not occur at the beginning)
    - r"(?!^)mystring" (the string occurs elsewhere than the beginning)
    [Someone else's interpretation]

    Both are "regular expressions" even in the academic sense, or else
    have a translation as regular expressions in the academic sense.
    They're also Python regexps. So I don't know what you mean.
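
A short sketch of how the two interpretations behave in Python. The variable names and sample strings are assumptions for illustration.

```python
import re

# Interpretation 1: the line does not start with the string.
absent_at_start = re.compile(r"^(?!mystring)")
# Interpretation 2: find the string anywhere except at the start.
elsewhere = re.compile(r"(?!^)mystring")

for line in ("mystring here", "here mystring", "mystring mystring"):
    print(line,
          absent_at_start.search(line) is not None,
          elsewhere.findall(line))
```

The first pattern only ever produces an empty match at position 0, so it answers a yes/no question about the line. The second matches the string itself, which is what a search-and-replace needs.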

    -- Devin
     
    Devin Jeanpierre, Oct 26, 2012
    #12
  13. Am 26.10.2012 06:45, schrieb Rivka Miller:
    > Thanks everyone, esp this gentleman.


    Who is "this"?

    >
    > The solution that worked best for me is just to use a DOT before the
    > string as the one at the beginning of the line did not have any char
    > before it.


    Which was what I suggested, and where you rudely answered...

    > no one has really helped yet.


    And obviously...

    > I am a satisfied custormer.


    ...your perception of yourself and of the role of us
    Usenet posters also seems not to be very sane. Good luck.
     
    Janis Papanagnou, Oct 26, 2012
    #13
  14. Rivka Miller wrote:
    > I am looking for a regexp for a string not at the beginning of the
    > line.
    >
    > For example, I want to find $hello$ that does not occur at the
    > beginning of the string, ie all $hello$ that exclude ^$hello$.


    The beginning of the string is a zero-width position, so you can use
    a negative lookahead, (?!^).
    Then the regular expression looks like:

    /(?!^)\$hello\$/g

    var str = '$hello$ should not be selected but',
        str1 = 'not hello but all of the $hello$ and $hello$ ... $hello$ each one ';

    str.match(/(?!^)\$hello\$/g); //null
    str1.match(/(?!^)\$hello\$/g); //["$hello$", "$hello$", "$hello$"]
     
    Asen Bozhilov, Oct 26, 2012
    #14
  15. Rivka Miller <> writes:

    > Thanks everyone, esp this gentleman.


    Kind of you to single me out, but it was Janis Papanagnou who first
    posted the solution that you say "works best" for you.

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Oct 26, 2012
    #15
  16. Ed Morton

    Ed Morton Guest

    On 10/25/2012 11:45 PM, Rivka Miller wrote:
    > Thanks everyone, esp this gentleman.
    >
    > The solution that worked best for me is just to use a DOT before the
    > string as the one at the beginning of the line did not have any char
    > before it.


    That's fine but do you understand that that is not an RE that matches on
    "$hello$ not at the start of a line", it's an RE that matches on "<any
    char>$hello$ anywhere in the line"? There's a difference - if you use a tool
    that prints the text that matches an RE then the output if the first RE existed
    would be "$hello$" while the output for the second RE would be "X$hello$" or
    "Y$hello$" or....

    In some tools you can use /(.)$hello$/ or similar to ignore the first part of
    the RE "(.)" and just print the second "$hello$", but that ability and its
    syntax are tool-specific; you still can't say "here's an RE that does this",
    you've got to say "here's how to find this text using tool <whatever>".
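
To make the difference concrete, here is a sketch in Python, taken as one example of "tool <whatever>". The sample line and variable names are made up for illustration.

```python
import re

line = "say X$hello$ now"

# Dot before the target: the matched text includes the extra character.
dot_match = re.search(r".\$hello\$", line)
# Same idea, with a group around the part that is actually wanted.
group_match = re.search(r".(\$hello\$)", line)
# Look-behind: the matched text is the target alone.
lookbehind_match = re.search(r"(?<!^)\$hello\$", line)

print(dot_match.group())
print(group_match.group(1))
print(lookbehind_match.group())
```

The first search matches "X$hello$", while the group and the look-behind both yield "$hello$". Group extraction and look-behind are features of this particular regex flavour rather than of regular expressions in general.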

    Ed.

     
    Ed Morton, Oct 26, 2012
    #16
  17. On Fri, Oct 26, 2012 at 8:32 AM, Ed Morton <> wrote:
    > On 10/25/2012 11:45 PM, Rivka Miller wrote:
    >>
    >> Thanks everyone, esp this gentleman.
    >>
    >> The solution that worked best for me is just to use a DOT before the
    >> string as the one at the beginning of the line did not have any char
    >> before it.

    >
    >
    > That's fine but do you understand that that is not an RE that matches on
    > "$hello$ not at the start of a line", it's an RE that matches on "<any
    > char>$hello$ anywhere in the line"? There's a difference - if you use a tool
    > that prints the text that matches an RE then the output if the first RE
    > existed would be "$hello$" while the output for the second RE would be
    > "X$hello$" or "Y$hello$" or....
    >
    > In some tools you can use /(.)$hello$/ or similar to ignore the first part
    > of the RE "(.)" and just print the second "$hello", but that ability and
    > it's syntax is tool-specific, you still can't say "here's an RE that does
    > this", you've got to say "here's how to find this text using tool
    > <whatever>".
    >
    > Ed.


    I would use str.find('your string')

    It returns -1 if not found, and the index if it finds it.

    why use regex for this?

    --
    Joel Goldstick
     
    Joel Goldstick, Oct 26, 2012
    #17