Regexp, Strings and spaces

Discussion in 'Perl Misc' started by Florent Carli, Jun 23, 2004.

  1. Hello experts,

    I'm looking for a regexp to get the information from something like this:

    field1="value with or without spaces" field2=valuewithoutspaces

    My only concern is that I don't want to match the quote characters.
    For now I came up with:
    my (@res) = $line =~ m/=(?:((?<=["])[^"]+(?=["])|(?<!["])\S+(?!["])))/g;
    But the lookbehinds do not work ...

    Any way to do this without using lookbehinds ?

    Thanks!
     
    Florent Carli, Jun 23, 2004
    #1

  2. Anno Siegel (Guest)

    Florent Carli wrote in comp.lang.perl.misc:
    > Hello experts,
    >
    > I'm looking for a regexp to get the information from something like this:
    >
    > field1="value with or without spaces" field2=valuewithoutspaces
    >
    > My only concern is that I don't want to match the quote characters.
    > For now I came up with:
    > my (@res) = $line =~ m/=(?:((?<=["])[^"]+(?=["])|(?<!["])\S+(?!["])))/g;
    > But the lookbehinds do not work ...
    >
    > Any way to do this without using lookbehinds ?


    Sure: /"?([^"]*)/

    Take a look at one or another of the csv modules too.

    Anno
     
    Anno Siegel, Jun 23, 2004
    #2

  3. J. Romano (Guest)

    Florent Carli wrote:
    > I'm looking for a regexp to get the information from something like this:
    >
    > field1="value with or without spaces" field2=valuewithoutspaces
    >
    > My only concern is that I don't want to match the quote characters.
    > For now I came up with:
    > my (@res) = $line =~ m/=(?:((?<=["])[^"]+(?=["])|(?<!["])\S+(?!["])))/g;
    > But the lookbehinds do not work ...


    This is easier done without lookbehinds:

    my @res;
    my $line =
        'field1="value with or without spaces" field2=valuewithoutspaces';

    # Each pass of the loop finds the next "=value" and fills $1 or $2.
    while ( $line =~ m/="([^"]*)"|=(\w*)/g )
    {
        push @res, $1 if defined $1;
        push @res, $2 if defined $2;
    }

    Essentially, the code above loops over every occurrence of
    either
    ="some text"
    or
    =some_text
    The first form is matched by m/="[^"]*"/ and the second by
    m/=\w*/ . I joined the two with the "|" symbol and put capturing
    parentheses around the text I'm interested in, which gives the
    regular expression m/="([^"]*)"|=(\w*)/g .

    The "/g" is used to loop through every match, populating either $1
    or $2 every time through the loop. Inside the loop, I push either $1
    or $2 into the @res array, depending on which one is defined (that is,
    which one happened to match).

    I hope this helps.

    -- Jean-Luc
     
    J. Romano, Jun 23, 2004
    #3
  4. >
    > Sure: /"?([^"]*)/
    >

    This does not work since 'field=hello field2="world"' would get you
    'hello field2=' into $1.
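
    A minimal sketch of that failure, for anyone who wants to reproduce it. It assumes the suggested pattern is placed right after the "=" sign, which is how the post above reads it; the sub name first_value is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the suggested pattern right after the first "=".
# (Assumption: the pattern was meant to follow the "=" sign.)
sub first_value {
    my ($line) = @_;
    return $line =~ m/="?([^"]*)/ ? $1 : undef;
}

# [^"]* runs across the space and the next field name,
# and stops only at the first double quote.
print first_value('field=hello field2="world"'), "\n";    # prints: hello field2=
```

    When the first value is quoted the pattern does behave, which is probably why the problem is easy to miss.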
     
    Florent Carli, Jun 24, 2004
    #4
  5. > $line =
    > 'field1="value with or without spaces" field2=valuewithoutspaces'
    >
    > while ( $line =~ m/="([^"]*)"|=(\w*)/g )
    > {
    > push @res, $1 if defined $1;
    > push @res, $2 if defined $2;
    > }
    >


    I think my specifications were bad.
    A line can be arbitrarily long, with any number of fields.
    It can be
    field1="test" field2=test2 field3="test 3" field4="testagain"
    and the next line could be
    field1="test 4" field2="test 5" field3=test_6 field4="test n°7"

    What I need is the value of field2, whatever form it takes:
    "value with space", "valuewithoutspace", valuewithoutspace, or
    even empty or "".
    In all cases, the value alone (without quotes) must go into $1 and $1
    only.

    For now, the only regexp able to do this I have found is :
    field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))
    But like I said, the software I use to parse is using a version of
    perl that does not support lookbehinds ...

    I'm trying to do basically what Windows does when you type:
    copy "my file.doc" "d:\my documents"
    or
    copy myfile.doc d:\

    But only with one regexp (and no second pass in perl to remove the
    quotes for instance ;) )
    any idea ?
     
    Florent Carli, Jun 24, 2004
    #5
  6. Anno Siegel (Guest)

    Florent Carli wrote in comp.lang.perl.misc:
    > >
    > > Sure: /"?([^"]*)/
    > >

    > This does not work since 'field=hello field2="world"' would get you
    > 'hello field2=' into $1.


    I didn't read your original specification that way.

    The best solution is probably a module (Text::Balanced, or one of
    the CSV modules). For background information, see the FAQ:

    How can I split a [character] delimited string except when inside [character]
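
    As a sketch of the module route: the core module Text::ParseWords is not named in this thread, so treat it as a swapped-in alternative to the modules mentioned above. Its quotewords function splits on a delimiter outside quotes and can drop the quotes. The sub name parse_fields and the sample line are made up for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: split a line into a hash of field => value.
sub parse_fields {
    my ($line) = @_;
    my %value_of;

    # Split on whitespace outside quotes; the 0 asks quotewords to
    # remove the quotes from the returned words.
    for my $word ( quotewords( '\s+', 0, $line ) ) {
        next unless defined $word;
        my ( $name, $value ) = split /=/, $word, 2;
        $value_of{$name} = defined $value ? $value : '';
    }
    return \%value_of;
}

my $fields = parse_fields('field1="value with spaces" field2=plain field3=""');
print "$_ => '$fields->{$_}'\n" for sort keys %$fields;
```

    One caveat: Text::ParseWords treats a backslash as an escape character, so values such as d:\ would need extra care.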

    Anno
     
    Anno Siegel, Jun 24, 2004
    #6
  7. Florent Carli wrote:

    > For now, the only regexp able to do this I have found is :
    > field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))
    > But like I said, the software I use to parse is using a version of
    > perl that does not support lookbehinds ...
    >
    > I'm trying to do basically the same thing windows does when you type :
    > copy "my file.doc" "d:\my documents"
    > or
    > copy myfile.doc d:\
    >
    > But only with one regexp (and no second pass in perl to remove the
    > quotes for instance ;) )
    > any idea ?


    Is this just out of curiosity?

    If there is some other purpose to this, take a look at Text::Balanced.
    The few times I needed this type of functionality, that module worked
    very well for me.

    --
    A. Sinan Unur
     
    A. Sinan Unur, Jun 24, 2004
    #7
  8. The problem is that I have to enter a regex into the config file of a
    piece of software that does not understand lookbehinds (probably an old
    version of Perl, since I get a "bad pattern <?...").
    Anyway, I'm not using Perl directly for this; I have to find a regex
    to do that, without lookbehinds, that's it.
    That's why I cannot code a second pass to remove the quotes after a
    /field2=("[^"]*"|\S*)/ for instance, or something that would give me
    the one backreference I need after a /field2=(?:"([^"]*)"|(\S*))/.
    I can't use a Perl module either, of course.
    In fact, I cannot code at all; the only thing I can control is one
    regexp.
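
    For readers whose regex engine is Perl 5.10 or later (clearly not the case for the tool described here, which already rejects "(?<" constructs), the branch-reset group "(?|...)" is one way to get the value into $1 from either alternative. A minimal sketch; the sub name field2_value and the sample lines are only for illustration.

```perl
use strict;
use warnings;
use 5.010;    # the branch-reset group (?|...) needs Perl 5.10 or later

# Hypothetical helper: return the value of field2, without quotes.
sub field2_value {
    my ($line) = @_;

    # Inside (?|...), each alternative numbers its groups from the same
    # starting point, so whichever branch matches fills $1.
    return $line =~ m/\bfield2=(?|"([^"]*)"|(\S*))/ ? $1 : undef;
}

for my $line (
    'field1="test 4" field2="test 5" field3=test_6',
    'field1="test" field2=test2 field3="test 3"',
    'field1=x field2="" field3=y',
) {
    printf "[%s]\n", field2_value($line);
}
```

    This does not help with an engine that predates lookbehind support; it only shows what the single-capture requirement looks like where the feature exists.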

    Thanks!

    > Is this just out of curiosity?
    >
    > If there is some other purpose to this, take a look at Text::Balanced.
    > The few times I needed this type of functionality, that module worked
    > very well for me.
     
    Florent Carli, Jun 25, 2004
    #8
  9. J. Romano (Guest)

    Florent Carli wrote:
    >
    > I think my specifications were bad.


    I'm sorry, but did you even try out my code? It does exactly what
    you want. I even tested it.

    > The "line" can be as long as it wants with so many fields.
    > It can be field1="test" field2=test2 field3="test 3"
    > field4="testagain"
    > and the next line could be
    > field1="test 4" field2="test 5" field3=test_6 field4="test n°7"


    It does exactly that. I even created a short script for you to run
    to show you that it works. Here, try this:

    #!/usr/bin/perl -w
    use strict;

    my @res;    # results will be stored here

    # Process the sample input lines from the DATA section below:
    while (<DATA>)
    {
        while ( m/="([^"]*)"|=(\w*)/g )
        {
            push @res, $1 if defined $1;
            push @res, $2 if defined $2;
        }
    }

    # Print out the @res array to show the results:
    for my $i ( 0 .. $#res )
    {
        print "\$res[$i] = \"$res[$i]\"\n";
    }

    __DATA__
    field1="test" field2=test2 field3="test 3" field4="testagain"
    field1="test 4" field2="test 5" field3=test_6 field4="test n°7"
    field1=""


    > What I need was to get value of field2 for any type of field2 I can
    > get : "value with space", "valuewithoutspace", valuewithoutspace, or
    > even empty or "".
    > In all cases, the value alone (without quotes) must go into $1 and $1
    > only.


    No, I think you are mistaken. The value alone (without quotes)
    must go into the @res array, and not necessarily into $1. The match
    will either temporarily be in $1 or $2, but regardless of which it
    goes into, it WILL be placed into the @res array, which is what you
    want.

    > For now, the only regexp able to do this I have found is :
    > field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))
    > But like I said, the software I use to parse is using a version of
    > perl that does not support lookbehinds ...


    Don't use look-behinds. They are not needed for your task. And
    please test the code I gave you before saying that it doesn't do what
    you want.

    -- Jean-Luc
     
    J. Romano, Jun 25, 2004
    #9
  10. J. Romano (Guest)

    Florent Carli wrote:
    > The problem is that I have to enter a regex into a config file of a
    > software which does not understand lookbehinds (probably an old version
    > of Perl, since I get a "bad pattern <?...").


    Oh, so that's why you had all those restrictions. Without the
    knowledge of your restrictions, we couldn't really give you a complete
    answer.

    > Anyway, I'm not using perl directly for this, I have to find a regex
    > to do that, without lookbehinds, that's it.


    Are you sure you are using Perl for this? I've done similar things
    myself (that is, putting a regular expression in a config file), but I
    don't think it was Perl that was evaluating them. It could be that
    Perl has nothing to do with this.

    > That's why I cannot code a second pass to remove quotes after a
    > /field2=("[^"]*"|\S*)/ for instance, or something that would give me
    > the one backreference I need after a /field2=(?:"([^"]*)"|(\S*))/.
    > I can't use a Perl module either, of course.
    > In fact, I cannot code at all; the only thing I can control is one
    > regexp.


    The main problem is that you are searching for different patterns,
    depending on what your delimiter is. If you have 'value="some text"',
    then you will be looking for the next '"' character to signal the end
    of your pattern. But if you have 'value=some_text', then you will be
    looking for whitespace to signal the end of your pattern. This flow
    of logic (if-then-else) is something that regular expressions alone
    weren't made to handle.
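
    To make that concrete, here is a minimal sketch of where the if-then-else ends up when code is allowed: the regex offers both alternatives with two capture groups, and an ordinary conditional picks the one that was filled. The sub name field_value is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the regex tries the quoted form first, then the
# bare form; plain code decides afterwards which capture was filled.
sub field_value {
    my ( $name, $line ) = @_;
    if ( $line =~ m/\b\Q$name\E=(?:"([^"]*)"|(\S*))/ ) {
        return defined $1 ? $1 : $2;    # $1 is set only for the quoted form
    }
    return;    # field not present
}

print field_value( 'field2', 'field1="a b" field2=plain' ), "\n";    # prints: plain
print field_value( 'field1', 'field1="a b" field2=plain' ), "\n";    # prints: a b
```

    The choice between $1 and $2 is exactly the step that is unavailable when only a single pattern can be supplied.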

    I don't think your problem has a working solution because regular
    expressions lack the ability to carry out the above logic. So let me
    propose two work-arounds:

    1. You could modify the program that reads the config files to handle
    the logic you need.

    or

    2. You can write a simple Perl script to convert your config file so
    that all the fields have quotes around the values (whether they need
    them or not). In other words, your script would change all instances
    of:

    field1=some_text

    to:

    field1="some_text"

    Then you could just set your regular expression to be:

    m/field[0-9]+="([^"]*)"/

    and then all your fields would be extracted. Problem solved.
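
    The second work-around could be sketched as follows. This is only an illustration under the assumption that values never contain a double quote themselves; the sub names are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: leave an already quoted value alone,
# otherwise wrap it in double quotes.
sub quoted {
    my ($value) = @_;
    return $value =~ m/^"/ ? $value : qq{"$value"};
}

# Hypothetical helper: make every value on a line a quoted value.
sub quote_all_values {
    my ($line) = @_;

    # Quoted values are matched first, so an "=" inside quotes is skipped;
    # an empty value becomes "".
    $line =~ s/=("[^"]*"|\S*)/'=' . quoted($1)/ge;
    return $line;
}

print quote_all_values('field1="test 4" field2=test2 field3='), "\n";
# prints: field1="test 4" field2="test2" field3=""
```

    After this conversion every field has the ="..." shape that the pattern above expects.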

    Of course, I would imagine that the second work-around will be much
    easier for you to implement, unless there is some other restriction
    that you haven't shared with us.

    Hopefully you'll find a solution that works for you.

    -- Jean-Luc
     
    J. Romano, Jun 25, 2004
    #10