Need help with an advanced? regular expression

Discussion in 'Perl Misc' started by Martin Gill, Feb 18, 2005.

  1. Martin Gill

    Martin Gill Guest

    Hi,

    I'm trying to write a regular expression which parses the following string:

    blah blah items 1234, 4567, 4345, and 3245 blah blah blah

    I want to be able to pick up the numbers following the "items" label.

    I thought the following might work, but it doesn't seem to:

    /ORs (\b(\d+)\b)+/

    I want it to match:
    1234
    4567
    4345
    3245

    Any help is greatly appreciated.


    --
    Martin Gill
     
    Martin Gill, Feb 18, 2005
    #1

  2. Martin Gill

    Martin Gill Guest

    Thanks for the quick reply,

    Bernard El-Hagin wrote:
    > Martin Gill <> wrote:
    >
    >
    >>Hi,
    >>
    >>I'm trying to write a regular expression which parses the
    >>following string:
    >>
    >>blah blah items 1234, 4567, 4345, and 3245 blah blah blah
    >>
    >>I want to be able to pick up the numbers following the "items"
    >>label.
    >>
    >>I thought the following might work, but it doesn't seem to
    >>
    >>/ORs (\b(\d+)\b)+/

    >
    > ^^^
    >


    Replace ORs with items. I'm using the regex in several different places,
    and I pasted the pattern from one of the other cases.

    >
    > What is that supposed to do?
    >
    >
    >
    >>i want it to match:
    >>1234
    >>4567
    >>4345
    >>3245

    >
    >
    >
    > With the input and specification you've provided this will work for
    > you:
    >
    >
    > print "$_\n" for m/(\d+)/g;
    >
    >


    The problem I have is that the target string could be something like this:

    Over the next 10 days I'll deliver 4 items 1234, 1234, 5321 and 2345.

    I want to use "items" as the key phrase to identify the list of items.
    The example you gave will also find 10 and 4, which I don't want.

    In English, the regex I need is: find all numbers after "items".


    --
    Martin Gill
     
    Martin Gill, Feb 18, 2005
    #2

  3. Martin Gill wrote:
    > The problem I have is that the target string could be something like this:
    >
    > Over the next 10 days i'll deliver 4 items 1234, 1234, 5321 and 2345.
    >
    > I want to use items as the key phrase to identify the list of times.
    > The example you gave will also find 10 and 4 which i don't want.
    >
    > In english, the regex i need is: Find all numbers after "items".


    You don't necessarily need a pure regex, do you?

    print "$_\n" for substr($_, index $_, 'items') =~ /\d+/g;
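One caveat with the one-liner above: when 'items' does not occur, index returns -1, and substr($_, -1) is then just the last character of the string. A minimal sketch that guards against this is below. The helper name numbers_after is hypothetical (not from the thread), and the code assumes the label should be matched as a literal substring.

```perl
use strict;
use warnings;

# Hypothetical helper: return every run of digits that follows the first
# occurrence of $label in $text, or an empty list if the label is absent.
sub numbers_after {
    my ($text, $label) = @_;
    my $pos = index($text, $label);
    return () if $pos < 0;    # label not found: nothing to extract
    # Skip past the label itself, then collect all digit runs in the tail.
    return substr($text, $pos + length $label) =~ /\d+/g;
}

my $line = "Over the next 10 days i'll deliver 4 items 1234, 1234, 5321 and 2345.";
print "$_\n" for numbers_after($line, 'items');
```

Called in list context, the helper returns the four numbers after "items" and skips the 10 and the 4 that come before it.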

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Feb 18, 2005
    #3
  4. Martin Gill <> writes:
    > The problem I have is that the target string could be something like this:
    >
    > Over the next 10 days i'll deliver 4 items 1234, 1234, 5321 and 2345.
    >
    > I want to use items as the key phrase to identify the list of times.
    > The example you gave will also find 10 and 4 which i don't want.
    >
    > In english, the regex i need is: Find all numbers after "items".


    I would first extract the substring beginning with "items" and then
    apply the regexp to find the numbers.

    Maybe it can be done in one single regexp (I don't think it can), but
    even if so, would it be worth the effort?
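For what it's worth, a single pattern does seem workable using the \G anchor, which matches where the previous /g match left off. The sketch below is only one possible approach; the sub name items_numbers is made up for illustration, and it assumes every number between "items" and the end of the string is wanted.

```perl
use strict;
use warnings;

# Hypothetical wrapper around one global match.
# First iteration: the "items" alternative finds the label.
# Later iterations: \G continues from the end of the previous match.
# (?!\A) stops \G from matching at the very start of the string,
# where it would otherwise pick up numbers that precede "items".
sub items_numbers {
    my ($text) = @_;
    return $text =~ /(?:\bitems\b|\G(?!\A))\D*(\d+)/g;
}

my $line = "Over the next 10 days i'll deliver 4 items 1234, 1234, 5321 and 2345.";
print "$_\n" for items_numbers($line);
```

Whether this is easier to maintain than the two-step version is debatable; the two-step approach is probably clearer to most readers.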
     
    Arndt Jonasson, Feb 18, 2005
    #4
  5. Martin Gill

    Guest

    Martin Gill wrote:
    >
    > I'm trying to write a regular expression which parses the following
    > string:
    >
    > blah blah items 1234, 4567, 4345, and 3245 blah blah blah
    >
    > I want to be able to pick up the numbers following the "items" label.
    >
    > I thought the following might work, but it doesn't seem to
    >
    > /ORs (\b(\d+)\b)+/
    >
    > i want it to match:
    > 1234
    > 4567
    > 4345
    > 3245



    Well, for one thing, you want to pick up the numbers following the
    "items" string, but in your regular expression you are searching for
    "ORs" instead (which doesn't appear in your string at all).

    If you have a $string variable:

    $string = "blah blah items 1234, 4567, 4345, and 3245 blah blah blah";

    you can print out all the numbers by first matching "items" and then by
    matching all the numbers in the postmatch (the $' variable), like this:

    if ($string =~ /items/)
    {
        # Everything after "items" (the postmatch) is now in $'
        # so extract all the numbers in $' :
        print "$_\n" foreach $' =~ m/\d+/g;
    }

    But be warned! Use of $' carries a performance penalty, which makes many
    Perl programmers avoid it. If this performance penalty bothers you, you
    can avoid it with the following similar code:

    if ($string =~ /items(.*)/)
    {
        # Everything after "items" is now in $1
        # so extract all the numbers in $1 :
        print "$_\n" foreach $1 =~ m/\d+/g;
    }

    If this is the only regular expression in your program, or if the
    other regular expressions operate on relatively small strings, then
    using $' should be nothing to worry about. In such cases, I think you
    should use whatever method is more readable to you.
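As a third option, on Perl 5.10 or later the /p modifier together with ${^POSTMATCH} gives the same text as $' without the program-wide cost. This is a sketch under that version assumption, not something proposed in the thread; the variable names $tail and @numbers are illustrative.

```perl
use strict;
use warnings;

my $string = "blah blah items 1234, 4567, 4345, and 3245 blah blah blah";

my @numbers;
if ($string =~ /items/p) {
    # ${^POSTMATCH} holds what $' would hold for this match.
    # Copy it first so the next match cannot disturb it.
    my $tail = ${^POSTMATCH};
    @numbers = $tail =~ /\d+/g;
}
print "$_\n" for @numbers;
```

Unlike /items(.*)/, this also keeps text after an embedded newline, because nothing depends on what the dot matches.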

    I hope this helps.

    -- Jean-Luc
     
    , Feb 18, 2005
    #5
