help with a regex

Discussion in 'Perl Misc' started by donebrowsers, Mar 12, 2008.

  1. donebrowsers

    donebrowsers Guest

    I have the following dataset:
    zoo-2.10.1p1
    mutt-1.4.2.3-compressed
    lha-1.14i.ac20050924.1
    mysql-server-5.0.45
    p5-Archive-Tar-1.30
    php5-gd-5.2.3-no_x11

    These are package listings on an OpenBSD machine. I want to parse the
    package name, the version, and if there is a flavor, that too. I want
    the following:
    [zoo] [2.10.1p1]
    [mutt] [1.4.2.3] [compressed]
    [lha] [1.14i.ac20050924.1]
    [mysql-server] [5.0.45]
    [p5-Archive-Tar] [1.30]
    [php5-gd] [5.2.3] [no_x11]

    I currently have this regex (which is close but doesn't quite work):
    /(^.*)-(.*)($-\w)?/

    It currently gives me:
    [zoo] [2.10.1p1]
    [mutt-1.4.2.3] [compressed]
    [lha] [1.14i.ac20050924.1]
    [mysql-server] [5.0.45]
    [p5-Archive-Tar] [1.30]
    [php5-gd-5.2.3] [no_x11]

    As you can see I'm having issues with the flavor (the last part of the
    package name which not every package has) part. Any help?
     
    donebrowsers, Mar 12, 2008
    #1

  2. donebrowsers wrote:
    > I have the following dataset:
    > zoo-2.10.1p1
    > mutt-1.4.2.3-compressed
    > lha-1.14i.ac20050924.1
    > mysql-server-5.0.45
    > p5-Archive-Tar-1.30
    > php5-gd-5.2.3-no_x11
    >
    > There are package listings on an OpenBSD machine. I want to parse the
    > package name, the version, and if there is a flavor, that too. I want
    > the following:
    > [zoo] [2.10.1p1]
    > [mutt] [1.4.2.3] [compressed]
    > [lha] [1.14i.ac20050924.1]
    > [mysql-server] [5.0.45]
    > [p5-Archive-Tar] [1.30]
    > [php5-gd] [5.2.3] [no_x11]


    $ echo "zoo-2.10.1p1
    mutt-1.4.2.3-compressed
    lha-1.14i.ac20050924.1
    mysql-server-5.0.45
    p5-Archive-Tar-1.30
    php5-gd-5.2.3-no_x11" | \
    perl -lne'
    print join " ", map $_ ? "[$_]" : (), split /-?(\d[\w.]*\d)-?/, $_, 2
    '
    [zoo] [2.10.1p1]
    [mutt] [1.4.2.3] [compressed]
    [lha] [1.14i.ac20050924.1]
    [mysql-server] [5.0.45]
    [p5-Archive-Tar] [1.30]
    [php5-gd] [5.2.3] [no_x11]



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
     
    John W. Krahn, Mar 12, 2008
    #2

  3. donebrowsers

    donebrowsers Guest

    While that works and I appreciate it, I was just using the []s as a
    placeholder. I'm actually using PHP's <a href="http://us3.php.net/
    manual/en/function.preg-match.php">preg_match()</a> function which
    uses PERL style regular expressions. I submitted it to this group
    because PERL programmers tend to be better with regular expressions
    than anyone else.

    This function essentially matches parts and adds them to an array with
    [0] matching the whole string, [1]... matching the ()s. So what I
    really have is:
    array(3) {
    [0]=>
    string(12) "zoo-2.10.1p1"
    [1]=>
    string(3) "zoo"
    [2]=>
    string(8) "2.10.1p1"
    }
    array(3) {
    [0]=>
    string(23) "mutt-1.4.2.3-compressed"
    [1]=>
    string(12) "mutt-1.4.2.3"
    [2]=>
    string(10) "compressed"
    }
    array(3) {
    [0]=>
    string(22) "lha-1.14i.ac20050924.1"
    [1]=>
    string(3) "lha"
    [2]=>
    string(18) "1.14i.ac20050924.1"
    }
    array(3) {
    [0]=>
    string(19) "mysql-server-5.0.45"
    [1]=>
    string(12) "mysql-server"
    [2]=>
    string(6) "5.0.45"
    }
    array(3) {
    [0]=>
    string(19) "p5-Archive-Tar-1.30"
    [1]=>
    string(14) "p5-Archive-Tar"
    [2]=>
    string(4) "1.30"
    }
    array(3) {
    [0]=>
    string(20) "php5-gd-5.2.3-no_x11"
    [1]=>
    string(13) "php5-gd-5.2.3"
    [2]=>
    string(6) "no_x11"
    }

    What I want is, e.g.:
    array(4) {
    [0]=>
    string(20) "php5-gd-5.2.3-no_x11"
    [1]=>
    string(7) "php5-gd"
    [2]=>
    string(5) "5.2.3"
    [3]=>
    string(6) "no_x11"
    }

    Thanks for your suggestion though, sorry for the confusion.
     
    donebrowsers, Mar 12, 2008
    #3
  4. Uri Guttman

    Uri Guttman Guest

    >>>>> "d" == donebrowsers <> writes:

    d> While that works and I appreciate it, I was just using the []s as a
    d> placeholder. I'm actually using PHP's <a href="http://us3.php.net/
    d> manual/en/function.preg-match.php">preg_match()</a> function which
    d> uses PERL style regular expressions. I submitted it to this group
    d> because PERL programmers tend to be better with regular expressions
    d> than anyone else.

    it is Perl, never PERL. preg is NOT perl, nor is it compatible with
    perl. that is why we use Perl and not php. the answer you got was valid
    perl and that will likely be all you will get here.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Architecture, Development, Training, Support, Code Review ------
    ----------- Search or Offer Perl Jobs ----- http://jobs.perl.org ---------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Mar 12, 2008
    #4
  5. donebrowsers

    donebrowsers Guest

    Fine. How would I use perl to do what I am trying to do? Strip out
    those parts and add them to an array?
     
    donebrowsers, Mar 12, 2008
    #5
  6. Uri Guttman

    Uri Guttman Guest

    >>>>> "d" == donebrowsers <> writes:

    d> Fine. How would I use perl to do what I am trying to do? Strip out
    d> those parts and add them to an array?

    what was wrong with the answer you got? it split the package names into
    the parts you wanted. grabbing those and making them into arrays is
    trivial. just assign the grabs to an array or put the split into [] to
    make an anon array. then you can build up the data structure from that.
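A minimal sketch of what that could look like, reusing the split from the earlier answer. The sub name parse_pkg and the variable @pkgs are made up for this illustration; the split pattern is the one already posted, so it carries the same assumption that the version starts and ends with a digit.

```perl
use strict;
use warnings;

# parse_pkg is a hypothetical helper name; the pattern is the split
# posted earlier in the thread.
sub parse_pkg {
    my ($pkg) = @_;
    # Limit 2 gives at most: name, captured version, remainder (flavor or "").
    my @parts = split /-?(\d[\w.]*\d)-?/, $pkg, 2;
    # Putting the list in [] makes an anonymous array; the grep drops the
    # empty trailing field left behind when there is no flavor.
    return [ grep { defined($_) && length($_) } @parts ];
}

# Build an array of array refs, one per package.
my @pkgs = map { parse_pkg($_) } qw(
    zoo-2.10.1p1
    mutt-1.4.2.3-compressed
    php5-gd-5.2.3-no_x11
);

print join(' ', map { "[$_]" } @$_), "\n" for @pkgs;
```

Each element of @pkgs is then a reference to a two- or three-element array, so $pkgs[1][2] would be the flavor of the second package.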

    uri

     
    Uri Guttman, Mar 12, 2008
    #6
  7. On Wed, 12 Mar 2008 12:31:20 -0700, donebrowsers wrote:

    > I have the following dataset:
    > zoo-2.10.1p1
    > mutt-1.4.2.3-compressed
    > lha-1.14i.ac20050924.1
    > mysql-server-5.0.45
    > p5-Archive-Tar-1.30
    > php5-gd-5.2.3-no_x11
    >
    > There are package listings on an OpenBSD machine. I want to parse the
    > package name, the version, and if there is a flavor, that too. I want
    > the following:
    > [zoo] [2.10.1p1]
    > [mutt] [1.4.2.3] [compressed]
    > [lha] [1.14i.ac20050924.1]
    > [mysql-server] [5.0.45]
    > [p5-Archive-Tar] [1.30]
    > [php5-gd] [5.2.3] [no_x11]
    >
    > I currently have this regex (which is close but doesn't quite work):
    > /(^.*)-(.*)($-\w)?/
    >
    > It currently gives me:
    > [zoo] [2.10.1p1]
    > [mutt-1.4.2.3] [compressed]
    > [lha] [1.14i.ac20050924.1]
    > [mysql-server] [5.0.45]
    > [p5-Archive-Tar] [1.30]
    > [php5-gd-5.2.3] [no_x11]
    >
    > As you can see I'm having issues with the flavor (the last part of the
    > package name which not every package has) part. Any help?


    How do you determine what part is what in p5-Archive-Tar-1.30? The easy
    solution (non-greedy regexpen, look it up in perldoc perlre) will
    misparse this entry, it will give [p5-Archive] [Tar] [1.30].

    Assuming the version always starts with a digit, and never has a dash,
    you may want to try (untested):
    /^(.*?)-(\d[^-]*)(?:-(.*))?$/
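As a rough illustration of how that pattern behaves on the sample data (it is marked untested above, so treat this as a sketch rather than a guarantee), the captures can be collected with a list assignment. The sub name split_pkg is made up for this example.

```perl
use strict;
use warnings;

# Pattern from the post above; it assumes the version starts with a digit
# and contains no dash. split_pkg is a hypothetical helper name.
sub split_pkg {
    my ($pkg) = @_;
    my ($name, $version, $flavor) = $pkg =~ /^(.*?)-(\d[^-]*)(?:-(.*))?$/
        or return;                      # empty list when nothing matches
    return ($name, $version, $flavor);  # $flavor is undef when absent
}

for my $pkg (qw(zoo-2.10.1p1 mutt-1.4.2.3-compressed p5-Archive-Tar-1.30)) {
    my @parts = grep { defined $_ } split_pkg($pkg);
    print join(' ', map { "[$_]" } @parts), "\n";
}
```

The non-greedy name group stops at the first dash that is followed by a digit, which is why p5-Archive-Tar stays in one piece. Nothing in the pattern looks Perl-specific, so it should also be usable from preg_match(), although that is not tested here.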

    HTH,
    M4
     
    Martijn Lievaart, Mar 12, 2008
    #7
  8. donebrowsers

    donebrowsers Guest

    That's it! Thanks Martijn.
     
    donebrowsers, Mar 12, 2008
    #8
  9. donebrowsers <> wrote:
    >I have the following dataset:
    >zoo-2.10.1p1
    >mutt-1.4.2.3-compressed
    >lha-1.14i.ac20050924.1
    >mysql-server-5.0.45
    >p5-Archive-Tar-1.30
    >php5-gd-5.2.3-no_x11
    >
    >There are package listings on an OpenBSD machine. I want to parse the
    >package name, the version, and if there is a flavor, that too. I want
    >the following:
    >[zoo] [2.10.1p1]
    >[mutt] [1.4.2.3] [compressed]
    >[lha] [1.14i.ac20050924.1]
    >[mysql-server] [5.0.45]
    >[p5-Archive-Tar] [1.30]
    >[php5-gd] [5.2.3] [no_x11]


    The examples are nice, but an additional verbal description of what you are
    trying to do would help a lot.
    Are you trying to split the string at each dash (minus sign) and store the
    pieces in an array?

    That is trivial:
    @pieces = split /-/, $string;

    jue
     
    Jürgen Exner, Mar 12, 2008
    #9
  10. donebrowsers

    donebrowsers Guest

    Yes and no. There are -s in the name of the package, for example php5-
    core. I want the package name php5-core, and the version, 5.2.3; and
    if there is a flavor, for example the mutt package, I want that
    separate. Martijn's regex works perfectly.
     
    donebrowsers, Mar 12, 2008
    #10
  11. Ben Morrow

    Ben Morrow Guest

    Quoth donebrowsers <>:
    > I have the following dataset:
    > zoo-2.10.1p1
    > mutt-1.4.2.3-compressed
    > lha-1.14i.ac20050924.1
    > mysql-server-5.0.45
    > p5-Archive-Tar-1.30
    > php5-gd-5.2.3-no_x11
    >
    > There are package listings on an OpenBSD machine. I want to parse the
    > package name, the version, and if there is a flavor, that too. I want
    > the following:
    > [zoo] [2.10.1p1]
    > [mutt] [1.4.2.3] [compressed]
    > [lha] [1.14i.ac20050924.1]


    Assuming the 'flavour' never contains a dot, and 'version' always does
    (otherwise it's impossible to distinguish 'flavour' from 'version'
    without more information)

    my @pkg = /^(.+?)-([^-]+\.[^-]+)(?:-([^.-]+))?$/;

    or, better,

    my @pkg = m{
    ^ (.+?) - # name
    ( [^-]+ \. [^-]+ ) # version must contain a dot
    (?: - ( [^.-]+ ) )? $ # optional flavour mustn't
    }x;

    It would probably be simpler to do this in multiple steps:

    my $flavour = s/-([^.-]+)$// ? $1 : undef; # s/// returns a count, so take $1
    my $version = s/-([^-]+)$// ? $1 : undef;
    my $name = $_; # whatever is left is the name

    (correct for American spellings to taste :) ).

    Doesn't OpenBSD provide tools to do this sort of thing, that reference
    the package database and know what the right answer is?

    Ben
     
    Ben Morrow, Mar 12, 2008
    #11
  12. donebrowsers

    donebrowsers Guest

    Unfortunately as far as I can tell no. I've searched and can't find
    anything.

    More info on the naming scheme:
    The stem part identifies the package. It may contain some dashes, but
    its form is mostly conventional. For instance, japanese packages
    usually start with a `ja' prefix, e.g., "ja-kterm-6.2.0".

    The version part starts at the first digit that follows a `-', and
    goes on up to the following `-', or to the end of the package name, if
    no flavor modifier is present. It is highly recommended that all
    packages have a version number. Normally, the version number directly
    matches the original software distribution version number, or release
    date. In case there are substantial changes in the OpenBSD package, a
    patch level marker should be appended, e.g., `p1', `p2 ...' For
    example, assuming that the screen package for release 2.8 was named
    "screen-2.9.8" and that an important security patch led to a newer
    package, the new package would be called "screen-2.9.8p1". Obviously,
    these specific markers are reserved for OpenBSD purposes.

    Flavored packages will also contain a list of flavors after the
    version identifier, in a canonical order determined by FLAVORS in the
    corresponding port's Makefile. For instance, kterm has an xaw3d
    flavor: "ja-kterm-xaw3d".

    Note that, to uniquely identify the version part, no flavor shall ever
    start with a digit. Usually, flavored packages are slightly different
    versions of the same package that offer very similar functionalities.
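Read literally, the rules quoted above can be sketched in two splits: cut once at the first dash that is followed by a digit, then cut the remainder at every dash. This is only one reading of the quoted text, and the sub name parse_by_rule is made up for the example.

```perl
use strict;
use warnings;

# parse_by_rule is a hypothetical helper name. It follows the quoted rule
# that the version starts at the first digit following a dash.
sub parse_by_rule {
    my ($pkg) = @_;
    # The stem ends at the first dash that is directly followed by a digit.
    my ($stem, $rest) = split /-(?=\d)/, $pkg, 2;
    return unless defined $rest;    # no version part found
    # The version runs up to the next dash; anything after it is flavors.
    my ($version, @flavors) = split /-/, $rest;
    return ($stem, $version, @flavors);
}

for my $pkg (qw(ja-kterm-6.2.0 screen-2.9.8p1 mutt-1.4.2.3-compressed)) {
    my ($stem, $version, @flavors) = parse_by_rule($pkg);
    print "stem=$stem version=$version flavors=@flavors\n";
}
```

Because no flavor may start with a digit, the lookahead cannot mistake a flavor for the version. A name without any version part returns an empty list here.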
     
    donebrowsers, Mar 12, 2008
    #12
  13. Ben Morrow

    Ben Morrow Guest

    Quoth donebrowsers <>:
    > Unfortunately as far as I can tell no. I've searched and can't find
    > anything.

    <snip>
    > The version part starts at the first digit that follows a `-',
    > and goes on up to the following `-', or to the end of the package
    > name, if no flavor modifier is present.

    <snip>
    > Flavored packages will also contain a list of flavors after the
    > version identifier, in a canonical order determined by FLAVORS in
    > the corresponding port's Makefile.


    You didn't say there could be more than one flavour.

    <snip>
    > Note that, to uniquely identify the version part, no flavor shall
    > ever start with a digit.


    So why didn't you post that the first time? The simple solution now
    becomes

    my @parts = split /-/, $pkgname;
    my @flavours;
    unshift @flavours, pop @parts while @parts && $parts[-1] =~ /^\D/;
    my $version = pop @parts;
    my $name = join '-', @parts;

    Translating that into an evil regex I leave as an exercise. Probably you
    can just add a * in the right place in one of the ones you have already
    been given.
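One possible answer to that exercise, as a sketch only: take the digit-led version pattern posted earlier in the thread and let the trailing group repeat, requiring each flavour to start with a character that is neither a digit nor a dash. The sub name parse_flavoured and the two-flavour package name in the usage lines are invented for illustration.

```perl
use strict;
use warnings;

# parse_flavoured is a hypothetical helper name.
sub parse_flavoured {
    my ($pkgname) = @_;
    my ($name, $version, $tail) =
        $pkgname =~ /^(.*?)-(\d[^-]*)((?:-[^-\d][^-]*)*)$/
        or return;                  # empty list when nothing matches
    # $tail looks like "-a-b" or ""; drop the empty leading field.
    my @flavours = grep { length $_ } split /-/, $tail;
    return ($name, $version, \@flavours);
}

# "foo-1.0-a-b" is a made-up name used only to show two flavours.
for my $pkgname (qw(mutt-1.4.2.3-compressed foo-1.0-a-b)) {
    my ($name, $version, $flavours) = parse_flavoured($pkgname);
    print "$name | $version | @$flavours\n";
}
```

The flavours come back as an array reference so that zero, one, or several of them can be handled the same way.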

    Ben
     
    Ben Morrow, Mar 12, 2008
    #13
  14. Steve K.

    Steve K. Guest

    Uri Guttman wrote:
    >>>>>> "d" == donebrowsers <> writes:
    >
    > d> While that works and I appreciate it, I was just using the []s as a
    > d> placeholder. I'm actually using PHP's <a href="http://us3.php.net/
    > d> manual/en/function.preg-match.php">preg_match()</a> function which
    > d> uses PERL style regular expressions. I submitted it to this group
    > d> because PERL programmers tend to be better with regular expressions
    > d> than anyone else.
    >
    > it is Perl, never PERL. preg is NOT perl, nor is it compatible with
    > perl. that is why we use Perl and not php. the answer you got was
    > valid perl and that will likely be all you will get here.


    1) What right do you have to speak for everyone? You should of said "and
    that will likely be all you will get from me" as there are plenty of
    people who actually offer help without the attitude people like your
    self feel they must attach. You may not like PHP, but it has a place,
    just as Perl does.

    2) Yes we all know it's Perl. The world will cease to rotate properly on
    it's axis and we will die well in advance of 2012 (right...) if someone
    says PERL... one could always leave a friendly little note about it if
    it really bothers you that much rather than the rude fucktardery your
    types like to share.
     
    Steve K., Mar 16, 2008
    #14
  15. Uri Guttman

    Uri Guttman Guest

    >>>>> "SK" == Steve K <> writes:

    SK> Uri Guttman wrote:
    >>>>>>> "d" == donebrowsers <> writes:
    >>
    >> d> While that works and I appreciate it, I was just using the []s as a
    >> d> placeholder. I'm actually using PHP's <a href="http://us3.php.net/
    >> d> manual/en/function.preg-match.php">preg_match()</a> function which
    >> d> uses PERL style regular expressions. I submitted it to this group
    >> d> because PERL programmers tend to be better with regular expressions
    >> d> than anyone else.
    >>
    >> it is Perl, never PERL. preg is NOT perl, nor is it compatible with
    >> perl. that is why we use Perl and not php. the answer you got was
    >> valid perl and that will likely be all you will get here.


    SK> 1) What right do you have to speak for everyone? You should of
    SK> said "and that will likely be all you will get from me" as there
    SK> are plenty of people who actually offer help without the attitude
    SK> people like your self feel they must attach. You may not like PHP,
    SK> but it has a place, just as Perl does.

    not in this group. that is the whole point. this group is about perl and
    not php. you can find php help over there --->

    SK> 2) Yes we all know it's Perl. The world will cease to rotate
    SK> properly on it's axis and we will die well in advance of 2012
    SK> (right...) if someone says PERL... one could always leave a
    SK> friendly little note about it if it really bothers you that much
    SK> rather than the rude fucktardery your types like to share.

    i would rather correct it when and how i please. you can uncorrect it as
    you wish.

    uri

     
    Uri Guttman, Mar 16, 2008
    #15
  16. Steve K. <> wrote:
    > Uri Guttman wrote:
    >>>>>>> "d" == donebrowsers <> writes:
    >>
    >> d> While that works and I appreciate it, I was just using the []s as a
    >> d> placeholder. I'm actually using PHP's <a href="http://us3.php.net/
    >> d> manual/en/function.preg-match.php">preg_match()</a> function which
    >> d> uses PERL style regular expressions. I submitted it to this group
    >> d> because PERL programmers tend to be better with regular expressions
    >> d> than anyone else.
    >>
    >> it is Perl, never PERL. preg is NOT perl, nor is it compatible with
    >> perl. that is why we use Perl and not php. the answer you got was
    >> valid perl and that will likely be all you will get here.

    >
    > 1) What right do you have to speak for everyone? You should of said "and

                                                           ^^^^^^^^^
                                                           ^^^^^^^^^

    "Who would cross the Bridge of Death must answer me these questions three"


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Mar 16, 2008
    #16
  17. Tad J McClellan wrote:
    > Steve K. <> wrote:
    >> Uri Guttman wrote:
    >>>>>>>> "d" == donebrowsers <> writes:
    >>> d> While that works and I appreciate it, I was just using the []s as a
    >>> d> placeholder. I'm actually using PHP's <a href="http://us3.php.net/
    >>> d> manual/en/function.preg-match.php">preg_match()</a> function which
    >>> d> uses PERL style regular expressions. I submitted it to this group
    >>> d> because PERL programmers tend to be better with regular expressions
    >>> d> than anyone else.
    >>>
    >>> it is Perl, never PERL. preg is NOT perl, nor is it compatible with
    >>> perl. that is why we use Perl and not php. the answer you got was
    >>> valid perl and that will likely be all you will get here.

    >> 1) What right do you have to speak for everyone? You should of said "and
    >                                                       ^^^^^^^^^
    >                                                       ^^^^^^^^^
    >
    > "Who would cross the Bridge of Death must answer me these questions three"


    Blue. No yel-- Auuuuuuuugh!


    John
     
    John W. Krahn, Mar 16, 2008
    #17
