seminex
Hi all,
I have an HTML page (~5 MB) like this:
<HTML>
<BODY BGCOLOR=#FFFFFF LINK=000066 VLINK=000066 TOPMARGIN=0
LEFTMARGIN=0 MARGINWIDTH=0 MARGINHEIGHT=0><font size=12
color=#000000>Date debut: 07/07/2007<br>Date fin: 08/07/2007<br>Heure
debut : 01:00:00<br>Heure fin: 01:00:00<br>FTI : access<br><br><TABLE
BORDER=>
<tr bgcolor=#FFFFE8>
<th width=100 align=CENTER>login</th>
<th width=70 align=CENTER>ip</th>
<th width=230 align=CENTER>Num. appelant</th>
<th width=80 align=CENTER>J debut</th>
<th width=60 align=CENTER>H debut</th>
<th width=80 align=CENTER>J fin</th>
<th width=60 align=CENTER>H fin</th>
</tr>
<tr>
<td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
<td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.26</td>
<td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-06</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>23:59:50</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>00:00:00</td>
</tr>
<tr>
<td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
<td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.41</td>
<td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>00:00:02</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>00:00:12</td>
</tr>
</HTML>
I want to extract only the start times; in this example, "23:59:50" and
"00:00:02".
I've tried several Perl programs that use a regular expression
( /^(\d\d):(\d\d):(\d\d)/ ) to extract the times, but often the cells are
empty (an error in the HTML), and I get this:
[..]
1 <tr>
2 <td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
3 <td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.41</td>
4 <td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
5 <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
6 <td width=60 align=LEFT bgcolor=#FFFFE8> </td>
7 <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
8 <td width=60 align=LEFT bgcolor=#FFFFE8> </td>
9 </tr>
[..]
So I'd like to ask whether anybody has a sample that extracts _only_ line 6.
Because this:
sub tparse {
    my @input = @_;
    chomp(@input);
    # Keep only text that starts with a time (HH:MM:SS)
    if ($input[0] =~ /^(\d\d):(\d\d):(\d\d)/) {
        push(@tableau, $input[0]);
    }
}
my $p = HTML::Parser->new( api_version => 3,
                           text_h => [\&tparse, "dtext"]);
$p->parse_file(shift || die "Ne peut ouvrir le fichier ! ($!)\n") ||
    die $!;
extracts lines 6 and 8, but ONLY when they contain times like 00:00:01.
When a cell is empty, my script picks up the next value instead, which
throws off the rest of the script.
Thanks in advance.
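One possible direction, as a rough sketch only: instead of filtering text by its shape, count the <td> cells within each <tr> so that an empty cell still occupies its column. The helper name extract_column, the 0-based column index 4 for "H debut", and the shortened sample rows are all assumptions made for illustration; the sketch uses HTML::Parser's start_h, text_h and end_h handlers with the "tagname" and "dtext" argspecs.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return one entry per table row for the requested
# column (0-based). Empty or non-time cells come back as undef, so the
# result stays aligned with the rows.
sub extract_column {
    my ($html, $wanted_col) = @_;
    my @values;
    my ($col, $in_td, $text) = (-1, 0, '');

    my $p = HTML::Parser->new(
        api_version => 3,
        # A new row resets the cell counter; a new cell advances it.
        start_h => [ sub {
            my ($tag) = @_;
            if    ($tag eq 'tr') { $col = -1 }
            elsif ($tag eq 'td') { $col++; $in_td = 1; $text = '' }
        }, 'tagname' ],
        # Accumulate text only while inside a <td>.
        text_h => [ sub { $text .= $_[0] if $in_td }, 'dtext' ],
        # When the cell closes, record it if it is the wanted column.
        end_h => [ sub {
            my ($tag) = @_;
            return unless $tag eq 'td' && $in_td;
            $in_td = 0;
            if ($col == $wanted_col) {
                (my $v = $text) =~ s/^[\s\xA0]+|[\s\xA0]+$//g;
                push @values, ($v =~ /^\d\d:\d\d:\d\d$/ ? $v : undef);
            }
        }, 'tagname' ],
    );
    $p->parse($html);
    $p->eof;
    return @values;
}

# Shortened sample rows (assumed): the second row has empty time cells.
my $html = <<'HTML';
<table>
<tr><td>login/access</td><td>192.168.30.26</td><td>Supervision</td><td>2007-07-06</td><td>23:59:50</td><td>2007-07-07</td><td>00:00:00</td></tr>
<tr><td>login/access</td><td>192.168.30.41</td><td>Supervision</td><td>2007-07-07</td><td> </td><td>2007-07-07</td><td> </td></tr>
</table>
HTML

my @starts = extract_column($html, 4);
print defined($_) ? $_ : '(empty)', "\n" for @starts;
```

For the real 5 MB file, the same handlers should work with $p->parse_file($filename) in place of parse and eof. This relies on every <td> having a closing </td>, as in the sample above.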