excluding search string in regular expressions

Discussion in 'Python' started by Franz Steinhaeusler, Oct 21, 2004.

  1. Hello,

    The following problem:

    find only occurrences where the line contains '::' characters and
    the former line is not equal to '**/'

    so 2) and 3) should be found and 1) not.

    1)
    """
    **/
    void C::B
    """

    2)
    """

    void C::B
    """

    3)
    """
    */
    void C::B
    """

    I tried something like
    "\*\*/\n.*::"

    But this does the opposite.

    So my question is: how can I exclude a pattern?

    Single characters can be excluded with [^ab], but I need not(ab):

    not_this_brace_pattern(\*\*/\n).*::

    thank you in advance,
    --
    Franz Steinhaeusler
    Franz Steinhaeusler, Oct 21, 2004
    #1

  2. On Thu, 21 Oct 2004 13:36:46 +0200, Franz Steinhaeusler
    <> wrote:

    >
    >single characters with [^ab] but I need not(ab)
    >
    >not_this_brace_pattern(\*\*/\n).*::


    Sorry,
    is this the solution (simply concatenating)? [^*][^*][^/]\n.*::


    The background:
    I want to scan cpp files to check whether they already have a doxygen comment.
    It should find all positions where this is missing:

    ok

    doxygen comment
    **/
    void CBs::InitButtonPanel (int progn1, int progn2)

    The problem is finding the method or function definition, and for
    that I need a regex.
    It should ignore calls such as blabla::InitButtonPanel(a, b);

    So one indicator is that if there is a semicolon at the end,
    it is not a function or method definition.

    So I would need
    [^*][^*][^/]\n.*[)]*[^;]
    but this is not working.
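
    A hedged sketch of the semicolon heuristic described above. The pattern DEF_RE and the sample text are assumptions made up for illustration, not something posted in the thread; the idea is only to anchor the test at the end of the line instead of relying on a trailing [^;].

```python
import re

# Hypothetical heuristic: a line that contains '::', has a parenthesised
# argument list, and ends with ')' rather than ';'.
DEF_RE = re.compile(r"^[^;\n]*::[^;\n]*\([^;\n]*\)[ \t]*$", re.MULTILINE)

source = (
    "void CBs::InitButtonPanel (int progn1, int progn2)\n"
    "{\n"
    "    blabla::InitButtonPanel(a, b);\n"
    "}\n"
)

# Only the definition line is expected to match; the call ends in ';'.
definitions = [m.group(0) for m in DEF_RE.finditer(source)]
print(definitions)

# The tail "[)]*[^;]" from the post still matches a call ending in ';',
# because "[)]*" may match nothing and "[^;]" accepts any earlier character.
loose_hit = re.search(r".*[)]*[^;]", "blabla::InitButtonPanel(a, b);")
print(loose_hit is not None)
```

    This is still only a heuristic: a definition whose parameter list spans several lines, or whose opening brace sits on the same line, would need a different pattern.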

    Thank you again in advance!
    --
    Franz Steinhaeusler
    Franz Steinhaeusler, Oct 21, 2004
    #2

  3. Franz Steinhaeusler wrote:
    > On Thu, 21 Oct 2004 13:36:46 +0200, Franz Steinhaeusler
    > <> wrote:
    >
    >>
    >> single characters with [^ab] but I need not(ab)
    >>
    >> not_this_brace_pattern(\*\*/\n).*::

    >
    > Sorry,
    > is this the solution (simple concatenating
    > [^*][^*][^/]\n.*:: ?


    That should do, though it's admittedly far from elegant; I, too, would like to see a nicer solution.

    > The background:
    > I want to scan cpp files to check whether they already
    > have a doxygen comment. It should find all positions
    > where this is missing:
    >
    > ok
    >
    > doxygen comment
    > **/
    > void CBs::InitButtonPanel (int progn1, int progn2)


    In this case, I'd replace \n with \s*, meaning any amount of whitespace.
    Mitja, Oct 21, 2004
    #3
  4. On Thu, 21 Oct 2004 14:40:24 +0200, "Mitja" <> wrote:

    >Franz Steinhaeusler wrote:
    >> On Thu, 21 Oct 2004 13:36:46 +0200, Franz Steinhaeusler
    >> <> wrote:
    >>
    >>>
    >>> single characters with [^ab] but I need not(ab)
    >>>
    >>> not_this_brace_pattern(\*\*/\n).*::

    >>
    >> Sorry,
    >> is this the solution (simple concatenating
    >> [^*][^*][^/]\n.*:: ?

    >
    >That should do, though it's admittedly far from elegant; I, too, would like to see a nicer solution.
    >
    >> The background:
    >> I want to scan cpp files to check whether they already
    >> have a doxygen comment. It should find all positions
    >> where this is missing:
    >>
    >> ok
    >>
    >> doxygen comment
    >> **/
    >> void CBs::InitButtonPanel (int progn1, int progn2)

    >
    >In this case, I'd replace \n with \s*, meaning any amount of whitespace.
    >


    Hello, thank you.

    Oh, this is not quite right (for finding a C function/method definition):

    [^*][^*][^/]\s*.*[)]*[^;]


    if func()
    {

    would also be found.

    A more common solution for detecting functions/methods would be fine.

    [^*][^*][^/]\s*--c-method/function/definition


    --
    Franz Steinhaeusler
    Franz Steinhaeusler, Oct 21, 2004
    #4
  5. Mitja <> wrote:
    > Franz Steinhaeusler wrote:
    > > Franz Steinhaeusler wrote:
    > > > [...]
    > > > single characters with [^ab] but I need not(ab)
    > > >
    > > > not_this_brace_pattern(\*\*/\n).*::

    > >
    > > Sorry,
    > > is this the solution (simple concatenating
    > > [^*][^*][^/]\n.*:: ?

    >
    > That should do, though it's admittedly far from elegant; I, too,
    > would like to see a nicer solution.


    It won't work correctly. Franz needs a sub-expression that
    matches anything which is not "**/". However, [^*][^*][^/]
    is a character-wise negation, not word-wise. It doesn't
    match "**/", but neither does it match "xx/", nor any other
    string which has only one or two of the characters at the
    right position.
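
    The character-wise behaviour described above can be checked with a small sketch. The test strings are assumptions modelled on the examples from the first post, plus the "xx/" case mentioned here.

```python
import re

# The concatenated character classes proposed earlier in the thread.
charwise = re.compile(r"[^*][^*][^/]\n.*::")

case_2 = '"""\n\nvoid C::B\n"""'    # blank line before: should be found
case_3 = '"""\n*/\nvoid C::B\n"""'  # "*/" before: should be found
case_x = 'xx/\nvoid C::B'           # "xx/" before: should be found

# Each class rejects its character on its own, so "*/" and "xx/" are
# rejected although neither of them is "**/".
results = {
    "2": bool(charwise.search(case_2)),
    "3": bool(charwise.search(case_3)),
    "xx/": bool(charwise.search(case_x)),
}
print(results)
```

    Only case 2 is reported as a match, although all three lines should be found.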

    What you need is a "negative look-behind assertion". The
    following Python-RE will do: (?<!\*\*/)\n.*::
    Remember to use raw string notation, or you need to double
    the backslashes:

    import re
    my_re_str = r"(?<!\*\*/)\n.*::"
    my_re_obj = re.compile(my_re_str)

    Note that you might want to use \s* instead of \n, so any
    amount of whitespace (including newlines) is matched, not
    just one single newline.
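
    A minimal sketch that runs the look-behind pattern against the three cases from the first post. The dictionary of test strings and the name loose are assumptions added for illustration. One caveat that seems worth checking before swapping \n for \s*: since \s* can match nothing, the search may begin anywhere on the line, and the look-behind then no longer excludes case 1.

```python
import re

lookbehind = re.compile(r"(?<!\*\*/)\n.*::")

cases = {
    1: '"""\n**/\nvoid C::B\n"""',  # preceded by "**/": should not be found
    2: '"""\n\nvoid C::B\n"""',     # should be found
    3: '"""\n*/\nvoid C::B\n"""',   # should be found
}
found = {n: bool(lookbehind.search(text)) for n, text in cases.items()}
print(found)

# With \s* the match may start at "void" itself, where the three preceding
# characters are "*/\n" rather than "**/", so case 1 is found after all.
loose = re.compile(r"(?<!\*\*/)\s*.*::")
loose_finds_case_1 = bool(loose.search(cases[1]))
print(loose_finds_case_1)
```

    Keeping the literal \n (or otherwise anchoring the pattern at a line boundary) avoids that problem. Note also that Python's re module only accepts fixed-width patterns inside a look-behind.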

    For more information about regular expressions supported by
    Python, refer to the Library Reference manual:

    http://docs.python.org/lib/re-syntax.html

    Best regards
    Oliver

    --
    Oliver Fromme, Konrad-Celtis-Str. 72, 81369 Munich, Germany

    ``All that we see or seem is just a dream within a dream.''
    (E. A. Poe)
    Oliver Fromme, Oct 21, 2004
    #5
  6. >
    > A more common solution for detecting functions/Methods would be fine.


    Maybe you should go for a real parser here, together with a
    C syntax grammar. Trying to cram this stuff into regexps is bound to
    miss special cases. And it's generally difficult to have a regexp _not_
    match a certain word.

    Another approach would be to look for closing comments and function
    definitions with several regexes, and use Python logic:

    if doxy_close_rex.match(line):
        line = lines.next()
        if fun_def_rex.match(line):
            ....
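
    One way the several-regexes idea might look when filled in. Both patterns, the function name undocumented_definitions, and the sample source are hypothetical; the names doxy_close_rex and fun_def_rex are reused from the outline above, but their definitions here are only a guess at what would be useful. The sketch reports definitions that are not directly preceded by a closing "**/".

```python
import re

# Both patterns are illustrative assumptions, not taken from the thread.
doxy_close_rex = re.compile(r"\s*\*\*/\s*$")
fun_def_rex = re.compile(r"[^;]*::[^;]*\([^;]*\)\s*$")

def undocumented_definitions(lines):
    """Yield (line_number, line) for definitions not preceded by '**/'."""
    previous = ""
    for number, line in enumerate(lines, start=1):
        if fun_def_rex.match(line) and not doxy_close_rex.match(previous):
            yield number, line.rstrip()
        previous = line

source = """\
/** doxygen comment
**/
void CBs::InitButtonPanel (int progn1, int progn2)
{
    blabla::InitButtonPanel(a, b);
}

void CBs::Other ()
{
}
""".splitlines()

missing = list(undocumented_definitions(source))
print(missing)
```

    Splitting the check into two simple patterns plus ordinary control flow keeps each regex readable, at the cost of handling the text line by line.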


    --
    Regards,

    Diez B. Roggisch
    Diez B. Roggisch, Oct 21, 2004
    #6
  7. On Thu, 21 Oct 2004 13:36:46 +0200, Franz Steinhaeusler <> wrote:

    >Hello,
    >
    >Following Problem:
    >
    >find only occurrences where the line contains '::' characters and
    >the former line is not equal to '**/'
    >
    >so 2) and 3) should be found and 1) not.
    >
    >1)
    >"""
    >**/
    >void C::B
    >"""
    >
    >2)
    >"""
    >
    >void C::B
    >"""
    >
    >3)
    >"""
    >*/
    >void C::B
    >"""
    >
    >I tried something
    >"\*\*/\n.*::"
    >
    >But this is the opposite.
    >
    >So my question is: how can I exclude a pattern?
    >
    >single characters with [^ab] but I need not(ab)
    >
    >not_this_brace_pattern(\*\*/\n).*::
    >
    >thank you in advance,


    To look back a line, I think I'd just use a generator, and test current
    and last lines to get what I wanted. E.g., perhaps you can adapt this:
    (I am just going literally by
    """
    find only occurrences where the line contains '::' characters and
    the former line is not equal to '**/'
    """
    which doesn't need a regex)

    >>> def findem(lineseq):
    ...     getline = iter(lineseq).next
    ...     curr = getline().rstrip()
    ...     while True:
    ...         last, curr = curr, getline().rstrip()
    ...         if '::' in curr and last != '**/': yield curr
    ...

    I made a file, modifying your data a little:

    >>> print '----\n%s----'% file('franz.txt').read()

    ----
    1)
    """
    **/
    void C::B -- no (1)
    """

    2)
    """

    void C::B -- yes (2)
    """

    3)
    """
    */
    void C::B -- yes (3)
    """
    ----

    Here's what the generator returns:

    >>> for line in findem(file('franz.txt')): print repr(line)

    ...
    'void C::B -- yes (2)'
    'void C::B -- yes (3)'
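
    The session above is Python 2. A sketch of the same generator for Python 3 follows, where iterators no longer have a .next attribute and a StopIteration escaping from inside a generator is turned into a RuntimeError. The in-memory sample list stands in for franz.txt and is an assumption for illustration.

```python
def findem(lineseq):
    """Yield each line containing '::' whose previous line is not '**/'."""
    last = None
    for raw in lineseq:
        curr = raw.rstrip()
        # Like the original, the very first line is never reported,
        # because it has no former line to compare against.
        if last is not None and "::" in curr and last != "**/":
            yield curr
        last = curr

sample = [
    "**/\n",
    "void C::B -- no (1)\n",
    "\n",
    "void C::B -- yes (2)\n",
    "*/\n",
    "void C::B -- yes (3)\n",
]
for line in findem(sample):
    print(repr(line))
```

    Any iterable of lines works, so an open file object can be passed in place of the list.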


    Regards,
    Bengt Richter
    Bengt Richter, Oct 21, 2004
    #7
  8. On Thu, 21 Oct 2004 15:32:37 +0200, "Diez B. Roggisch"
    <> wrote:

    >>
    >> A more common solution for detecting functions/Methods would be fine.

    >
    >Maybe you should go for a real parser here, together with a
    >C syntax grammar. Trying to cram this stuff into regexps is bound to
    >miss special cases. And it's generally difficult to have a regexp _not_
    >match a certain word.
    >


    Hello Diez,

    thanks, yes, it is difficult to "not" find a search string in a regex ;)

    I only want to find a regex for an editor (which is written in Python)
    to have a common function (of course it cannot be as accurate as a
    parser) to find a function/method definition.

    >Another approach would be to look for closing comments and function
    >definitions in several rexes, and use python-logic:
    >
    >if doxy_close_rex.match(line):
    >    line = lines.next()
    >    if fun_def_rex.match(line):
    >        ....


    --
    Franz Steinhaeusler
    Franz Steinhaeusler, Oct 22, 2004
    #8
  9. On 21 Oct 2004 13:28:28 GMT, Oliver Fromme <>
    wrote:

    >Mitja <> wrote:
    > > Franz Steinhaeusler wrote:
    > > > Franz Steinhaeusler wrote:
    > > > > [...]
    > > > > single characters with [^ab] but I need not(ab)
    > > > >
    > > > > not_this_brace_pattern(\*\*/\n).*::
    > > >
    > > > Sorry,
    > > > is this the solution (simple concatenating
    > > > [^*][^*][^/]\n.*:: ?

    > >
    > > That should do, though it's admittedly far from elegant; I, too,
    > > would like to see a nicer solution.

    >


    Hello Oliver,

    >It won't work correctly. Franz needs a sub-expression that
    >matches anything which is not "**/". However, [^*][^*][^/]
    >is a character-wise negation, not word-wise. It doesn't
    >match "**/", but neither does it match "xx/", nor any other
    >string which has only one or two of the characters at the
    >right position.


    yes, you are right, the approach above is wrong.

    >
    >What you need is a "negative look-behind assertion".


    ??, sounds interesting ;)

    >The
    >following Python-RE will do: (?<!\*\*/)\n.*::
    >Remember to use raw string notation, or you need to double
    >the backslashes:
    >
    >my_re_str = r"(?<!\*\*/)\n.*::"
    >my_re_obj = re.compile(my_re_str)
    >
    >Note that you might want to use \s* instead of \n, so any
    >amount of whitespace (including newlines) is matched, not
    >just one single newline.
    >
    >For more information about regular expressions supported by
    >Python, refer to the Library Reference manual:
    >
    >http://docs.python.org/lib/re-syntax.html
    >


    (?<!...)
    Matches if the current position in the string is not preceded by a
    match for ...

    That is it.

    Many thanks for your helpful reply,

    --
    Franz Steinhaeusler
    Franz Steinhaeusler, Oct 22, 2004
    #9
  10. On Thu, 21 Oct 2004 22:38:00 GMT, (Bengt Richter) wrote:

    >
    >To look back a line, I think I'd just use a generator, and test current
    >and last lines to get what I wanted. E.g., perhaps you can adapt this:
    >(I am just going literally by
    > """
    > find only occurrences where the line contains '::' characters and
    > the former line is not equal to '**/'
    > """
    >which doesn't need a regex)
    >
    > >>> def findem(lineseq):
    > ...     getline = iter(lineseq).next
    > ...     curr = getline().rstrip()
    > ...     while True:
    > ...         last, curr = curr, getline().rstrip()
    > ...         if '::' in curr and last != '**/': yield curr
    > ...
    >[...]
    >
    >Regards,
    >Bengt Richter


    Hello Bengt,

    thank you for suggesting this interesting approach,

    regards
    --
    Franz Steinhaeusler
    Franz Steinhaeusler, Oct 22, 2004
    #10