How to substitute everything but something?

Discussion in 'Perl Misc' started by Eric.Medlin@gmail.com, Jul 19, 2006.

  1. Guest

    I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
    < including > and < with nothing. But I want to replace everything but
    what is inside > and <. How can I negate what I have?
    , Jul 19, 2006
    #1

  2. Paul Lalli Guest

    wrote:
    > I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
    > < including > and < with nothing. But I want to replace everything but
    > what is inside > and <. How can I negate what I have?


    TIMTOWTDI

    $rawData[$i] =~ s/.*?(>.*<).*/$1/;

    $rawData[$i] =~ /(>.*<)/ and $rawData[$i] = $1;

    ... and probably others.
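
To make the two forms above concrete, here is a minimal sketch run on a made-up sample string (the string and the variable names are assumptions; the OP's real data is not shown in the thread). Both forms keep the > and < delimiters, and both rely on the greedy .* running from the first > to the last < on a single line.

```perl
use strict;
use warnings;

# Hypothetical sample; not taken from the original post.
my $sample = 'abc>def<ghi';

# Form 1: replace the whole string with the captured ">...<" part.
(my $by_subst = $sample) =~ s/.*?(>.*<).*/$1/;

# Form 2: match first, then assign the capture only if the match succeeded.
my $by_match = $sample;
$by_match =~ /(>.*<)/ and $by_match = $1;

print "$by_subst\n";   # >def<
print "$by_match\n";   # >def<
```

If the string contains no >...< pair, neither form changes it: the substitution simply fails, and the assignment after "and" is skipped.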

    Paul Lalli
    Paul Lalli, Jul 19, 2006
    #2

  3. Ted Zlatanov Guest

    On 19 Jul 2006, wrote:

    > I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
    > < including > and < with nothing. But I want to replace everything but
    > what is inside > and <. How can I negate what I have?


    If you are trying to extract text from SGML/HTML/XML/etc. there are
    easier ways. The way you are attempting will not work in many common
    cases. See 'perldoc -q html' to get started.

    In any case, it may help to think of the problem as "extraction" of
    what's between '>' and '<', rather than "elimination" of everything
    except what's between those two delimiters. I hope I understood your
    request correctly.

    You could do something like what's below. Again, consider using a
    parser specific to your data instead of grabbing text like this.

    Ted

    #!/usr/bin/perl

    use warnings;
    use strict;
    use Data::Dumper;

    my $text = join '', <DATA>;
    my @data = ($text =~ m/>(.*?)</g);
    print Dumper \@data;
    __DATA__
    <html><head></head><body>HTML text here</body></html>
    >just text here<

    plain text here
    <><><>text here<><
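
The script above also returns an empty string for every adjacent "><" pair. One possible variation, not from the original post, is to swap the non-greedy .*? for a negated character class so that only non-empty runs are captured. The sample string below is made up, modeled on the __DATA__ lines.

```perl
use strict;
use warnings;

# Made-up sample modeled on the __DATA__ lines above.
my $text = "<body>HTML text here</body>\n>just text here<\n<><><>text here<><\n";

# [^<>]+ needs at least one character that is neither < nor >,
# so empty pairs such as "><" produce no capture at all.
my @data = ($text =~ m/>([^<>]+)</g);

print "$_\n" for @data;
```

Unlike ".", the class [^<>] also matches newlines, so text that spans several lines between a > and a < would be captured as one item. Whether that is desirable depends on the data.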
    Ted Zlatanov, Jul 19, 2006
    #3
  4. John W. Krahn

    wrote:
    > I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
    > < including > and < with nothing. But I want to replace everything but
    > what is inside > and <. How can I negate what I have?


    s/.*>//, s/<.*// for $rawData[ $i ];
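
For readers unfamiliar with the statement-modifier form, the one-liner above can be unrolled roughly as follows. This is only a sketch on a hypothetical sample string; $item stands in for $rawData[$i].

```perl
use strict;
use warnings;

# Hypothetical sample with a single >...< pair.
my $item = 'abc>def<ghi';

# "for" aliases $_ to $item, so both substitutions edit it in place.
for ($item) {
    s/.*>//;   # delete everything up to and including the last ">"
    s/<.*//;   # delete from the first remaining "<" to the end
}

print "$item\n";   # def
```

Here the delimiters themselves are removed, so only the text between them survives.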


    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Jul 19, 2006
    #4
  5. Ted Zlatanov Guest

    On 19 Jul 2006, wrote:

    wrote:
    >> I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
    >> < including > and < with nothing. But I want to replace everything but
    >> what is inside > and <. How can I negate what I have?

    >
    > $rawData[$i] =~ /(>.*<)/ and $rawData[$i] = $1;


    He asked for what's inside > <, so the above should be

    $rawData[$i] =~ />(.*)</ and $rawData[$i] = $1;

    Also, while the OP didn't specifically say it, he probably wants the
    non-greedy match

    $rawData[$i] =~ />(.*?)</ and $rawData[$i] = $1;

    so the extracted data doesn't have < and > pairs inside it.
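
A small sketch of the difference, using a made-up string with two >...< pairs (the sample and variable names are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample containing two >...< pairs.
my $sample = 'a>b<c>d<e';

# Greedy: .* runs from the first ">" to the LAST "<".
my ($greedy) = $sample =~ />(.*)</;

# Non-greedy: .*? stops at the FIRST "<" after the first ">".
my ($lazy) = $sample =~ />(.*?)</;

print "greedy: $greedy\n";   # greedy: b<c>d
print "lazy:   $lazy\n";     # lazy:   b
```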

    Ted
    Ted Zlatanov, Jul 19, 2006
    #5
  6. Paul Lalli Guest

    Ted Zlatanov wrote:
    > On 19 Jul 2006, wrote:
    >
    > wrote:
    > >> I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
    > >> < including > and < with nothing. But I want to replace everything but
    > >> what is inside > and <. How can I negate what I have?

    > >
    > > $rawData[$i] =~ /(>.*<)/ and $rawData[$i] = $1;

    >
    > He asked for what's inside > <, so the above should be
    >
    > $rawData[$i] =~ />(.*)</ and $rawData[$i] = $1;


    He also said he wants to "negate what I have". The two requirements
    are contradictory, as what he has *does* replace > and <, so the
    negation of that should *not* replace > and <.

    I chose to abide by his final requirement. You chose to abide by his
    first. Only the OP knows which one he meant.

    > Also, while the OP didn't specifically say it, he probably wants the
    > non-greedy match
    >
    > $rawData[$i] =~ />(.*?)</ and $rawData[$i] = $1;
    >
    > so the extracted data doesn't have < and > pairs inside it.


    Now you're just being a mind reader.

    Paul Lalli
    Paul Lalli, Jul 19, 2006
    #6
  7. Ted Zlatanov Guest

    On 19 Jul 2006, wrote:

    wrote:
    >> I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
    >> < including > and < with nothing. But I want to replace everything but
    >> what is inside > and <. How can I negate what I have?

    >
    > s/.*>//, s/<.*// for $rawData[ $i ];


    I think the OP's code will match the biggest >xyz< pair, while your
    code will extract the last >xyz< pair. My followup will extract all
    the >xyz< data. I don't think the problem as specified can be solved
    exactly right, so maybe the OP should help us a little :)
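
The three behaviours can be compared side by side on one made-up input with two >...< pairs. The sample string and variable names below are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample containing two >...< pairs.
my $sample = 'a>b<c>d<e';

# The OP's substitution deletes the biggest span, first ">" to last "<".
(my $op_result = $sample) =~ s/>.*<//;

# The two-step substitution keeps only the text of the last pair.
my $last_pair = $sample;
s/.*>//, s/<.*// for $last_pair;

# A non-greedy global match in list context collects every pair.
my @all_pairs = $sample =~ />(.*?)</g;

print "OP's code leaves: $op_result\n";   # ae
print "last pair only:   $last_pair\n";   # d
print "all pairs:        @all_pairs\n";   # b d
```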

    Ted
    Ted Zlatanov, Jul 19, 2006
    #7
  8. Ted Zlatanov Guest

    On 19 Jul 2006, wrote:

    Ted Zlatanov wrote:
    > On 19 Jul 2006, wrote:
    >
    > wrote:
    >>>> I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
    >>>> < including > and < with nothing. But I want to replace everything but
    >>>> what is inside > and <. How can I negate what I have?
    >>>
    >>> $rawData[$i] =~ /(>.*<)/ and $rawData[$i] = $1;

    >>
    >> He asked for what's inside > <, so the above should be
    >>
    >> $rawData[$i] =~ />(.*)</ and $rawData[$i] = $1;

    >
    > He also said he wants to "negate what I have". The two requirements
    > are contradictory, as what he has *does* replace > and <, so the
    > negation of that should *not* replace > and <.
    >
    > I chose to abide by his final requirement. You chose to abide by his
    > first. Only the OP knows which one he meant.


    Yeah, see my followup to John Krahn; we don't really know what the
    requirements are. I didn't read the last requirement the way you did,
    obviously.

    >> Also, while the OP didn't specifically say it, he probably wants the
    >> non-greedy match
    >>
    >> $rawData[$i] =~ />(.*?)</ and $rawData[$i] = $1;
    >>
    >> so the extracted data doesn't have < and > pairs inside it.

    >
    > Now you're just being a mind reader.


    Er, you can certainly interpret it that way :) I read

    "everything but what is inside > and <"

    as "the first < should terminate 'what is inside'". Confusing
    requirements breed confusion, I guess. Sorry for that, as I
    perpetuated the confusion.

    Ted
    Ted Zlatanov, Jul 19, 2006
    #8