Please help me find the easiest way to extract text between some variable text

Mladen

Please help me find the easiest way to extract text between some variable text



Original text



<TH class=name width=100>New name</TH>    need to extract: New name

<TH class=name width=50>Test name </TH>   need to extract: Test name

<TH class=name width=65>Name 2</TH>       need to extract: Name 2



Thanks in advance
 
Jürgen Exner

Mladen said:
Please help me find the easiest way to extract text between some variable text

Original text
<TH class=name width=100>New name</TH>    need to extract: New name

<TH class=name width=50>Test name </TH>   need to extract: Test name

<TH class=name width=65>Name 2</TH>       need to extract: Name 2

You have a well-defined data structure. Treating it and analysing it as
if it were plain text would be foolish. Instead take advantage of the
existing structure and use a parser that can parse this data structure.
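Following that advice, here is a minimal sketch using HTML::TokeParser, which ships in the CPAN HTML-Parser distribution (it is not core Perl and must be installed separately). The helper name `th_texts` is hypothetical. Because `get_trimmed_text` trims and collapses whitespace, 'Test name ' comes back as 'Test name'.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # from the CPAN HTML-Parser distribution

# Hypothetical helper: return the text of every <th> cell in $html.
sub th_texts {
    my ($html) = @_;
    # A reference to a scalar is parsed as a literal HTML document.
    my $p = HTML::TokeParser->new(\$html);
    my @texts;
    # get_tag('th') skips ahead to the next <th> start tag; HTML::Parser
    # lowercases tag names, so <TH> matches too.
    while ($p->get_tag('th')) {
        # All text up to the closing </th>, with whitespace trimmed.
        push @texts, $p->get_trimmed_text('/th');
    }
    return @texts;
}

my $html = <<'HTML';
<TH class=name width=100>New name</TH> need to
<TH class=name width=50>Test name </TH> need to
<TH class=name width=65>Name 2</TH> need
HTML

print "'$_'\n" for th_texts($html);
```

The parser handles attribute order, quoting, and letter case for you, which is exactly the structure a plain-text approach has to reinvent.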

jue
 
sharma__r

Please help me find the easiest way to extract text between some variable text

Original text

<TH class=name width=100>New name</TH>    need to extract: New name

<TH class=name width=50>Test name </TH>   need to extract: Test name

<TH class=name width=65>Name 2</TH>       need to extract: Name 2

Thanks in advance



#!/usr/local/bin/perl
use strict;
use warnings;
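local $\ = qq{\n};    # every print ends with a newline

# $np matches one <...> tag. Nested <...> pairs are allowed because the
# postponed subexpression (??{ $np }) re-enters the same pattern.
my $np;
$np =
qr{
    [<]
    (?:
        (?> [^<>]+ )    # atomic run of characters other than < and >
        |
        (??{ $np })     # or a nested <...>
    )*
    [>]
}xms
;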

my $var ='
original text
<TH class=name width=100>New name</TH>
<TH class=name width=50>Test name </TH>
need to
<TH class=name width=65>Name 2</TH>
need
Thanks in advance
';
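# Step over each tag; right after it (\G), capture the text up to </TH>.
# /c keeps pos() where it was if the inner match fails.
while ($var =~ m/ $np /xmsg) {
    print $1 if $var =~ m/\G(.*?)<\/TH>/xmscg;
}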
__END__
 
ccc31807

Please help me find the easiest way to extract text between some variable text
<TH class=name width=100>New name</TH>       need to extract: New name

A couple of weeks back, hymie! posted a thread entitled 'table -->
pre'. He wanted to extract the content of an HTML table and preformat
it. I posted the following script and output.

Perl gives you a number of ways to do what you want, many of them
simple-minded and primitive, others pretty sophisticated. I generally
prefer the former: the more simple-minded and primitive, the better.
You should probably approach a problem like this incrementally: first
match the smallest piece of what you want, then add to the match little
by little until you have everything you need. You don't even need a
regular expression; index() and substr() will do the same kind of
thing.
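A minimal sketch of the index()/substr() approach, assuming the tags are written literally as `<TH` and `</TH>` (same letter case, not nested; a `<THEAD>` tag would also be picked up). The helper name `between_tags` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between each '<TH ...>' opening
# tag and the next '</TH>', using only index() and substr().
sub between_tags {
    my ($text) = @_;
    my @found;
    my $pos = 0;
    while ((my $open = index($text, '<TH', $pos)) >= 0) {
        my $gt = index($text, '>', $open);         # end of the opening tag
        last if $gt < 0;
        my $close = index($text, '</TH>', $gt);    # start of the closing tag
        last if $close < 0;
        # Everything strictly between '>' and '</TH>'.
        push @found, substr($text, $gt + 1, $close - $gt - 1);
        $pos = $close + length('</TH>');           # resume after '</TH>'
    }
    return @found;
}

my $sample = <<'END';
<TH class=name width=100>New name</TH> need to
<TH class=name width=50>Test name </TH> need to
<TH class=name width=65>Name 2</TH> need
END

print "'$_'\n" for between_tags($sample);
```

Each step only locates fixed markers, which is the incremental style described above: find the start, then the end of the opening tag, then the closing tag.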

Other technologies will do the same kind of thing. I routinely do this
in vi (vim), when I want to transfer some content from one function to
another function, for instance, converting a SQL query to a hash
declaration.

CC.

SCRIPT
#! perl
use strict;
use warnings;

my $content = '';
while (<DATA>)
{
    next unless /\w/;
    chomp;
    if ($_ =~ m!<(\/?)table!)
    {
        # <table> becomes <pre>, </table> becomes </pre>
        $content .= "<$1pre>";
        next;
    }
    elsif ($_ =~ m!<\/?tr!)
    {
        # every <tr> and </tr> ends the current output line
        $content .= "\n";
        next;
    }
    elsif ($_ =~ m!<t[dh]>([^<]*)<\/t[dh]>!)
    {
        # pad each cell to a 20-character column
        $content .= sprintf("%-20s", $1);
        next;
    }
    else
    {
        warn "ERROR: $_\n";
    }
}

print $content;

exit(0);

__DATA__
<table>
<tr>
<td>George</td>
<td>Washington</td>
<td>Virginia</td>
<td>1788</td>
</tr>
<tr>
<td>George</td>
<td>Washington</td>
<td>Virginia</td>
<td>1792</td>
</tr>
<tr>
<td>John</td>
<td>Adams</td>
<td>Massachesetts</td>
<td>1796</td>
</tr>
<tr>
<td>Thomas</td>
<td>Jefferson</td>
<td>Virginia</td>
<td>1800</td>
</tr>
<tr>
<td>Thomas</td>
<td>Jefferson</td>
<td>Virginia</td>
<td>1804</td>
</tr>
</table>

OUTPUT
<pre>
George              Washington          Virginia            1788

George              Washington          Virginia            1792

John                Adams               Massachesetts       1796

Thomas              Jefferson           Virginia            1800

Thomas              Jefferson           Virginia            1804
</pre>
 
sln

Please help me find the easiest way to extract text between some variable text

Output:
'New name'
'Test name '
'Name 2'

If you wish to run the @Contents elements through a sub-container to extract
more, you must set up a sub that redefines the 'Container Expression' regex
for each sub-container you need. There are variations on the theme of
container expressions, but this should get you started.

-sln

-------------
i.e.:
my ($open, $close, $rx);
my $comment = qr{ ... see below ... };
my $attrib  = qr{ ... see below ... };
...
defineContainer( '(?i:TH)' );
...
defineContainer( '(?i:TR)' );
...
sub defineContainer {
    my $tag = shift;
    $open  = qr{ ... <$tag ... };
    $close = qr{ ... </$tag ... };
    $rx    = qr{ ... see below ... };
}
-------------------------

use strict;
use warnings;
use 5.010;    # possessive ++ and (?1) recursion need Perl 5.10 or later

# Primitive Definitions
#
my $comment = qr{(?xs)
    <! (?:\[CDATA\[.*?\]\]|--.*?--|\[[A-Z][A-Z\ ]*\[.*?\]\]) >
};

my $attrib = qr{(?x)
    (?:\s+ (?: [^>"'\/]* (?:"[^"]*"|'[^']*'|["']|(?:\/(?!>))?))++)
};

my $open  = qr{(?x) <TH (?: \s*|$attrib ) > };
my $close = qr{(?x) </TH \s*> };

# Container Expression
#
my $rx = qr{(?xs)
    $comment
  | (                 # Recursion group, the 'container'
        $open
        (             # Container 'contents' to capture
            (?:
                $comment
              | (?:(?!$open|$close|$comment).)++
              | (?1)
            )*
        )
        $close
    )
};

# Parse Code
#
my $tog;
my $text = join '', <DATA>;

# In list context each match contributes ($1, $2). $tog flips on every
# element, so only the second of each pair (the contents) is kept, and
# only when defined (a matched comment leaves both groups undef).
my @Contents = map { !($tog=!$tog) && defined() ? $_ : () } $text =~ /$rx/g;

for (@Contents) {
    print "'$_'\n";
}


__DATA__
<TH class=name width=100>New name</TH> need to
extract: New name

<TH class=name width=50>Test name </TH> need to
extract: Test name

<TH class=name width=65>Name 2</TH> need
to extract: Name 2
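A runnable sketch of the defineContainer() idea from the pseudocode above. This is a hypothetical variant: instead of setting the globals $open, $close and $rx, it returns a fresh container regex, and a hypothetical helper `contents_of` collects group 2 from each match. The primitive $comment and $attrib regexes are copied unchanged from the script, and tag names are passed as '(?i:TH)' style patterns so matching is case-insensitive.

```perl
use strict;
use warnings;
use 5.010;    # possessive ++ and (?1) recursion need Perl 5.10+

# Primitive definitions, copied unchanged from the script above.
my $comment = qr{(?xs)
    <! (?:\[CDATA\[.*?\]\]|--.*?--|\[[A-Z][A-Z\ ]*\[.*?\]\]) >
};
my $attrib = qr{(?x)
    (?:\s+ (?: [^>"'\/]* (?:"[^"]*"|'[^']*'|["']|(?:\/(?!>))?))++)
};

# Hypothetical variant: return a container regex for $tag.
# Group 1 is the whole element, group 2 its contents.
sub defineContainer {
    my $tag   = shift;
    my $open  = qr{(?x) <$tag (?: \s*|$attrib ) > };
    my $close = qr{(?x) </$tag \s*> };
    return qr{(?xs)
        $comment
      | (
            $open
            (
                (?:
                    $comment
                  | (?:(?!$open|$close|$comment).)++
                  | (?1)
                )*
            )
            $close
        )
    };
}

# Hypothetical helper: contents of every top-level $tag element in $html.
sub contents_of {
    my ($tag, $html) = @_;
    my $rx = defineContainer($tag);
    my @found;
    while ($html =~ /$rx/g) {
        push @found, $2 if defined $2;    # skip matches that were comments
    }
    return @found;
}

# First pull out each row, then run each row through a TH container.
my $table = '<TR><TH class=name>New name</TH><th>Name 2</th></TR>';
for my $row (contents_of('(?i:TR)', $table)) {
    print "'$_'\n" for contents_of('(?i:TH)', $row);
}
```

Returning the regex rather than assigning globals lets the row and cell containers coexist, which is what nesting one container inside another requires.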
 
Peter Scott

The "m" modifier affects only the "^" and "$" anchors. It is a no-op if
your pattern does not contain those anchors.

The "s" modifier affects only the "." metacharacter. It is a no-op if
your pattern does not contain that character.

You should not enable special treatment if you are not going to make use
of that special treatment.
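A small sketch of where each modifier does and does not change a match (the sample string and variable names are illustrative only):

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# ^ without /m anchors only at the very start of the string.
my @plain = $text =~ /^(\w+)/g;        # ('first')

# ^ with /m also anchors right after every embedded "\n".
my @multi = $text =~ /^(\w+)/mg;       # ('first', 'second')

# . without /s does not match "\n", so the capture stops at the newline.
my ($no_s)   = $text =~ /(line.*)/;    # 'line'

# . with /s matches "\n" too, so the capture runs to the end.
my ($with_s) = $text =~ /(line.*)/s;   # "line\nsecond line"

print join(', ', @plain), "\n";
print join(', ', @multi), "\n";
print "[$no_s]\n[$with_s]\n";
```

A pattern such as `m/\G(.*?)<\/TH>/` does use `.`, so /s matters there, while /m has no effect because the pattern contains no `^` or `$`.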

The poster is following the principles in Damian Conway's "Perl Best
Practices," which state: "Use the /xms flags on every regular expression
you ever write [...] It takes about a week to accustom your fingers to
automatically typing /xms on every [regex]..."
 
