Please help me how is easiest way to extract text between some variable text

Mladen · Feb 20, 2011

Please help me: what is the easiest way to extract the text between some variable text?

Original text

<TH class=name width=100>New name</TH> need to extract: New name

<TH class=name width=50>Test name </TH> need to extract: Test name

<TH class=name width=65>Name 2</TH> need to extract: Name 2

Thanks in advance

Jürgen Exner · Feb 21, 2011

Mladen said:
Please help me how is easiest way to extract text between some variable text

Original text
<TH class=name width=100>New name</TH> need to
extract: New name

<TH class=name width=50>Test name </TH> need to
extract: Test name

<TH class=name width=65>Name 2</TH> need
to extract: Name 2

You have a well-defined data structure. Treating it and analysing it as
if it were plain text would be foolish. Instead take advantage of the
existing structure and use a parser that can parse this data structure.

jue
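
For readers who want to see what "use a parser" could look like here, the following is a minimal sketch, not something posted in the thread. It assumes the CPAN module HTML::Parser is installed (it is not part of core Perl), and the sub name th_texts is made up for illustration.

```perl
use strict;
use warnings;

# Collect the text content of every <th> element with HTML::Parser.
# Assumption: HTML::Parser is installed; it is loaded when the sub is called.
sub th_texts {
    my ($html) = @_;
    require HTML::Parser;

    my @texts;
    my $in_th = 0;    # true while the parser is inside a <th> element

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Tag names are reported in lower case, so <TH> arrives as 'th'.
        start_h => [
            sub { if ( $_[0] eq 'th' ) { $in_th = 1; push @texts, '' } },
            'tagname'
        ],
        end_h => [ sub { $in_th = 0 if $_[0] eq 'th' }, 'tagname' ],
        # 'dtext' is the text with HTML entities decoded; it may arrive
        # in several chunks, so it is appended to the current element.
        text_h => [ sub { $texts[-1] .= $_[0] if $in_th }, 'dtext' ],
    );
    $parser->parse($html);
    $parser->eof;
    return @texts;
}

# Example call (needs HTML::Parser):
#   print "'$_'\n" for th_texts('<TH class=name width=100>New name</TH>');
```

The parser copes with the unquoted attributes (class=name width=100), so the extraction no longer depends on how the opening tag is written.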

sharma__r · Feb 21, 2011

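#!/usr/local/bin/perl
use strict;
use warnings;
local $\ = qq{\n};    # append a newline to every print

# Recursive pattern: one balanced <...> group, possibly nested.
my $np;
$np = qr{
    [<]
    (?:
        (?> [^<>]+ )     # run of non-bracket characters, no backtracking
        |
        (??{ $np })      # or a nested <...> group
    )*
    [>]
}xms;

my $var = '
original text
<TH class=name width=100>New name</TH>
<TH class=name width=50>Test name </TH>
need to
<TH class=name width=65>Name 2</TH>
need
Thanks in advance
';

# Find each tag, then capture from the end of that tag (\G) up to </TH>.
while ( $var =~ m/ $np /xmsg ) {
    print $1 if $var =~ m/\G(.*?)<\/TH>/xmscg;
}
__END__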

ccc31807 · Feb 21, 2011

Please help me how is easiest way to extract text between some variable text
<TH class=name width=100>New name</TH> need to extract: New name

A couple of weeks back, hymie! posted a thread entitled 'table -->
pre'. He wanted to extract the content of an HTML table to preformat
it. I posted the following script and output.

Perl gives you a number of ways to do what you want, many of them
simple-minded and primitive, others pretty sophisticated. I generally
prefer the former: the more simple-minded and primitive, the better.
You probably should approach a problem like this incrementally,
first matching the least possible amount of what you want and adding
to it little by little until you get everything you want. You don't
need to use a regular expression; index() and substr() will do
the same kind of thing.
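
As an illustration of the index() and substr() route, here is a rough sketch that was not part of the original post. The sub name th_texts_by_index is hypothetical, and the code assumes upper-case, non-nested TH tags exactly as in the question. A tag such as THEAD would also match the '<TH' prefix, so treat this as a starting point only.

```perl
use strict;
use warnings;

# Return the text of every <TH ...>...</TH> element in $html, located
# with index() and substr() instead of a regular expression.
sub th_texts_by_index {
    my ($html) = @_;
    my @texts;
    my $pos = 0;
    while ( ( my $start = index( $html, '<TH', $pos ) ) >= 0 ) {
        # End of the opening tag, e.g. the '>' after width=100.
        my $open_end = index( $html, '>', $start );
        last if $open_end < 0;

        # Start of the next closing tag.
        my $close = index( $html, '</TH>', $open_end );
        last if $close < 0;

        # Everything between the opening tag and the closing tag.
        push @texts, substr( $html, $open_end + 1, $close - $open_end - 1 );
        $pos = $close + length('</TH>');
    }
    return @texts;
}

my $sample = join "\n",
    '<TH class=name width=100>New name</TH> need to extract: New name',
    '<TH class=name width=50>Test name </TH> need to extract: Test name',
    '<TH class=name width=65>Name 2</TH> need to extract: Name 2';

print "'$_'\n" for th_texts_by_index($sample);
```

Each pass moves $pos past the closing tag it just consumed, which is the same "match a little, then add to it" approach described above.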

Other technologies will do the same kind of thing. I routinely do this
in vi (vim), when I want to transfer some content from one function to
another function, for instance, converting a SQL query to a hash
declaration.

CC.

SCRIPT
#! perl
use strict;
use warnings;

my $content = '';
while (<DATA>)
{
    next unless /\w/;    # skip blank lines
    chomp;
    if ($_ =~ m!<(\/?)table!)
    {
        # <table> becomes <pre>, </table> becomes </pre>
        $content .= "<$1pre>";
        next;
    }
    elsif ($_ =~ m!<\/?tr!)
    {
        # every row tag, opening or closing, starts a new line
        $content .= "\n";
        next;
    }
    elsif ($_ =~ m!<t[dh]>([^<]*)<\/t[dh]>!)
    {
        # left-justify each cell in a 20-character column
        $content .= sprintf("%-20s", $1);
        next;
    }
    else
    {
        warn "ERROR: $_\n";
    }
}

print $content;

exit(0);

__DATA__
<table>
<tr>
<td>George</td>
<td>Washington</td>
<td>Virginia</td>
<td>1788</td>
</tr>
<tr>
<td>George</td>
<td>Washington</td>
<td>Virginia</td>
<td>1792</td>
</tr>
<tr>
<td>John</td>
<td>Adams</td>
<td>Massachusetts</td>
<td>1796</td>
</tr>
<tr>
<td>Thomas</td>
<td>Jefferson</td>
<td>Virginia</td>
<td>1800</td>
</tr>
<tr>
<td>Thomas</td>
<td>Jefferson</td>
<td>Virginia</td>
<td>1804</td>
</tr>
</table>

OUTPUT
<pre>
George              Washington          Virginia            1788

George              Washington          Virginia            1792

John                Adams               Massachusetts       1796

Thomas              Jefferson           Virginia            1800

Thomas              Jefferson           Virginia            1804
</pre>

sln · Feb 21, 2011

Please help me how is easiest way to extract text between some variable text

Output:
'New name'
'Test name '
'Name 2'

If you wish to run the @Contents elements through a sub-container to extract
more, you must set up a sub that re-defines the 'Container Expression' regex
for each sub-container you need. There are variations on the theme of the
container expressions, but this should superficially get you started.

-sln

-------------
ie:
my ($open, $close, $rx);
my $comment = qr{ see below };
my $attrib = qr{ see below };
...
defineContainer ( '(?i:TH)' );
...
defineContainer ( '(?i:TR)' );
...
sub defineContainer {
    my $tag = shift;
    $open = qr{ see below <$tag ... };
    $close = qr{ see below </$tag ... };
    $rx = qr{ see below };
}
-------------------------
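
To make the sub-container idea concrete, here is a deliberately simplified sketch. It is not sln's recursive expression: it swaps in a plain non-greedy match with no comment or CDATA handling and no support for nesting the same tag. The sub names container_rx and contents_of are invented for this example.

```perl
use strict;
use warnings;

# Build a regex that captures the contents of one <tag ...>...</tag> element.
# Simplification: case-insensitive, no comments, no nesting of the same tag.
sub container_rx {
    my ($tag) = @_;
    return qr{ <\Q$tag\E \b [^>]* > (.*?) </\Q$tag\E \s* > }xis;
}

# Return the contents of every $tag element found in $text.
sub contents_of {
    my ( $tag, $text ) = @_;
    my $rx = container_rx($tag);
    my @found = $text =~ /$rx/g;
    return @found;
}

my $html = '<TR><TH class=name width=100>New name</TH><TH>Test name </TH></TR>'
         . '<TR><TH class=name width=65>Name 2</TH></TR>';

# Outer pass extracts each TR; inner pass extracts the TH cells of that row.
for my $row ( contents_of( 'TR', $html ) ) {
    print join( ' | ', map { "'$_'" } contents_of( 'TH', $row ) ), "\n";
}
```

The point is only the shape: the regex is rebuilt per tag, and the captured contents of the outer container become the input for the inner one.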

use strict;
use warnings;

# Primitive Definitions
#
my $comment = qr{(?xs)
<! (?:\[CDATA\[.*?\]\]|--.*?--|\[[A-Z][A-Z\ ]*\[.*?\]\]) >
};

my $attrib = qr{(?x)
(?:\s+ (?: [^>"'\/]* (?:"[^"]*"|'[^']*'|["']|(?:\/(?!>))?))++)
};

my $open = qr{(?x) <TH (?: \s*|$attrib ) > };
my $close = qr{(?x) </TH \s*> };

# Container Expression
#
my $rx = qr{(?xs)
    $comment
  | (                          # Recursion group, the 'container'
      $open
      (                        # Container 'contents' to capture
        (?:
            $comment
          | (?:(?!$open|$close|$comment).)++
          | (?1)
        )*
      )
      $close
    )
};

# Parse Code
#
my $tog;
my $text = join '', <DATA>;

my @Contents = map { !($tog=!$tog) && defined() ? $_ : () } $text =~ /$rx/g;

for (@Contents) {
print "'$_'\n";
}

__DATA__
<TH class=name width=100>New name</TH> need to
extract: New name

<TH class=name width=50>Test name </TH> need to
extract: Test name

<TH class=name width=65>Name 2</TH> need
to extract: Name 2

Peter Scott · Feb 22, 2011

The "m" modifier affects only the "^" and "$" anchors. It is a no-op if
your pattern does not contain those anchors.

The "s" modifier affects only the "." metacharacter. It is a no-op if
your pattern does not contain that character.

You should not enable special treatment if you are not going to make use
of that special treatment.
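
A small sketch of the difference, added for illustration (the sample string and variable names are made up):

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ anchors only at the start of the string.
my @no_m = $text =~ /^(\w+)/g;
# With /m, ^ also matches right after each embedded newline.
my @with_m = $text =~ /^(\w+)/mg;

# Without /s, '.' does not match a newline, so .* stays on the first line.
my ($no_s) = $text =~ /(first.*line)/;
# With /s, '.' matches newlines too, so the greedy .* reaches the last "line".
my ($with_s) = $text =~ /(first.*line)/s;

# A pattern with no ^, $ or '.' gives the same result with or without /ms.
my $plain    = () = $text =~ /line/g;
my $plain_ms = () = $text =~ /line/msg;

print "without /m: @no_m\n";
print "with /m:    @with_m\n";
print "without /s: ", length($no_s),   " characters matched\n";
print "with /s:    ", length($with_s), " characters matched\n";
print "plain pattern: $plain vs $plain_ms matches\n";
```

Only the patterns containing ^ or '.' change behaviour; the literal /line/ pattern counts the same matches either way.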

The poster is following the principles in Damian Conway's "Perl Best
Practices," which state: "Use the /xms flags on every regular expression
you ever write [...] It takes about a week to accustom your fingers to
automatically typing /xms on every [regex]..."
