How to substitute everything but something?

Eric.Medlin

I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
and <, including the > and <, with nothing. But I want to replace everything
but what is inside > and <. How can I negate what I have?
 
Paul Lalli

I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
and <, including the > and <, with nothing. But I want to replace everything
but what is inside > and <. How can I negate what I have?

TIMTOWTDI

$rawData[$i] =~ s/.*?(>.*<).*/$1/;

$rawData[$i] =~ /(>.*<)/ and $rawData[$i] = $1;

.... and probably others.

Paul Lalli
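
To make the two suggestions above concrete, here is a minimal sketch that runs both on one sample string. The contents of $line are a made-up example (the thread never shows what @rawData actually holds), and the variable names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line; the real @rawData contents are not shown in the thread.
my $line = 'junk before >wanted< junk after';

# Approach 1: match the whole string and replace it with the captured group.
(my $via_subst = $line) =~ s/.*?(>.*<).*/$1/;

# Approach 2: match first, then overwrite the variable only if the match succeeded.
my $via_match = $line;
$via_match =~ /(>.*<)/ and $via_match = $1;

print "$via_subst\n";   # prints: >wanted<
print "$via_match\n";   # prints: >wanted<
```

Both forms keep the > and < delimiters in the result. Neither pattern uses /s, so a newline inside the data would stop .* from matching across it.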
 
Ted Zlatanov

I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
and <, including the > and <, with nothing. But I want to replace everything
but what is inside > and <. How can I negate what I have?

If you are trying to extract text from SGML/HTML/XML/etc. there are
easier ways. The way you are attempting will not work in many common
cases. See 'perldoc -q html' to get started.

In any case, it may help to think of the problem as "extraction" of
what's between '>' and '<', rather than "elimination" of everything
except what's between those two delimiters. I hope I understood your
request correctly.

You could do something like what's below. Again, consider using a
parser specific to your data instead of grabbing text like this.

Ted

#!/usr/bin/perl

use warnings;
use strict;
use Data::Dumper;

my $text = join '', <DATA>;
my @data = ($text =~ m/>(.*?)</g);
print Dumper \@data;
__DATA__
just text here<
plain text here
<><><>text here<><
 
John W. Krahn

I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
and <, including the > and <, with nothing. But I want to replace everything
but what is inside > and <. How can I negate what I have?

s/.*>//, s/<.*// for $rawData[ $i ];


John
 
Ted Zlatanov

I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
and <, including the > and <, with nothing. But I want to replace everything
but what is inside > and <. How can I negate what I have?

$rawData[$i] =~ /(>.*<)/ and $rawData[$i] = $1;

He asked for what's inside > <, so the above should be

$rawData[$i] =~ />(.*)</ and $rawData[$i] = $1;

Also, while the OP didn't specifically say it, he probably wants the
non-greedy match

$rawData[$i] =~ />(.*?)</ and $rawData[$i] = $1;

so the extracted data doesn't have < and > pairs inside it.

Ted
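
The difference between the greedy and non-greedy captures discussed above can be seen in a small sketch. The input string is a hypothetical example with two >...< pairs, chosen only to make the difference visible.

```perl
use strict;
use warnings;

# Hypothetical input containing two >...< pairs.
my $line = 'a>first<b>second<c';

# Greedy: .* grabs as much as possible, so the capture runs to the last '<'.
my ($greedy) = $line =~ />(.*)</;

# Non-greedy: .*? grabs as little as possible, so the capture stops at the first '<'.
my ($lazy) = $line =~ />(.*?)</;

print "greedy: $greedy\n";   # prints: greedy: first<b>second
print "lazy: $lazy\n";       # prints: lazy: first
```

With only one >...< pair in the input, both patterns capture the same text, so the choice only matters when several pairs can occur on a line.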
 
Paul Lalli

Ted said:
I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
and <, including the > and <, with nothing. But I want to replace everything
but what is inside > and <. How can I negate what I have?

$rawData[$i] =~ /(>.*<)/ and $rawData[$i] = $1;

He asked for what's inside > <, so the above should be

$rawData[$i] =~ />(.*)</ and $rawData[$i] = $1;

He also said he wants to "negate what I have". The two requirements
are contradictory, as what he has *does* replace > and <, so the
negation of that should *not* replace > and <.

I chose to abide by his final requirement. You chose to abide by his
first. Only the OP knows which one he meant.

Also, while the OP didn't specifically say it, he probably wants the
non-greedy match

$rawData[$i] =~ />(.*?)</ and $rawData[$i] = $1;

so the extracted data doesn't have < and > pairs inside it.

Now you're just being a mind reader.

Paul Lalli
 
Ted Zlatanov

On 19 Jul 2006, (e-mail address removed) wrote:

I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
and <, including the > and <, with nothing. But I want to replace everything
but what is inside > and <. How can I negate what I have?

s/.*>//, s/<.*// for $rawData[ $i ];

I think the OP's code will match the biggest >xyz< pair, while your
code will extract the last >xyz< pair. My followup will extract all
the >xyz< data. I don't think the problem as specified can be solved
exactly right, so maybe the OP should help us a little :)

Ted
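
As a rough illustration of the comparison above, the following sketch runs the OP's substitution, John's pair of substitutions, and the global non-greedy match on the same string. The sample input and the variable names are assumptions made for this example, not data from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample with two >...< pairs.
my $line = 'a>one<b>two<c';

# OP's original code: deletes the widest span from the first '>' to the last '<'.
(my $op = $line) =~ s/>.*<//;

# John's version: strip everything through the last '>', then everything
# from the first remaining '<' onward, leaving the last pair's contents.
my $john = $line;
s/.*>//, s/<.*// for $john;

# Global non-greedy match: collects the contents of every >...< pair.
my @all = $line =~ />(.*?)</g;

print "op:   $op\n";     # prints: op:   ac
print "john: $john\n";   # prints: john: two
print "all:  @all\n";    # prints: all:  one two
```

Which of these is "right" depends on what the input looks like and on whether one value or all values are wanted, which is the open question in the thread.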
 
Ted Zlatanov

I have $rawData[$i] =~ s/>.*<//; That will replace everything inside >
and <, including the > and <, with nothing. But I want to replace everything
but what is inside > and <. How can I negate what I have?

$rawData[$i] =~ /(>.*<)/ and $rawData[$i] = $1;

He asked for what's inside > <, so the above should be

$rawData[$i] =~ />(.*)</ and $rawData[$i] = $1;

He also said he wants to "negate what I have". The two requirements
are contradictory, as what he has *does* replace > and <, so the
negation of that should *not* replace > and <.

I chose to abide by his final requirement. You chose to abide by his
first. Only the OP knows which one he meant.

Yeah, see my followup to John Krahn, we don't really know what the
requirements are. I didn't read the last requirement the way you did,
obviously.

Also, while the OP didn't specifically say it, he probably wants the
non-greedy match

$rawData[$i] =~ />(.*?)</ and $rawData[$i] = $1;

so the extracted data doesn't have < and > pairs inside it.

Now you're just being a mind reader.

Er, you can certainly interpret it that way :) I read

"everything but what is inside > and <"

as "the first < should terminate 'what is inside'". Confusing
requirements breed confusion, I guess. Sorry for that, as I
perpetuated the confusion.

Ted
 
