NNTP Subject Parsing


$_

Does anyone know where I could find some information
about parsing NNTP subject fields?

Pseudocode and/or regex advice would be ideal.

I'm looking to parse out multipart messages, e.g.:
Test Subject (1/1) - file.bin [01/10]
Another test.bin (1/2)

Then store them until all the parts have been gathered.

Thanks, any advice is appreciated.
 

Walter Roberson

There is no standard formatting for multipart messages.

When I did this a couple of years ago, I had to just look to see what
was coming down and tweak it from time to time. As I recall, there were
some complications involving pasting the binaries back together again
automatically, due to the different ways that posters had of storing
the binaries. And there are complications around detecting duplicates
because people tend to use similar subjects for different binaries.
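One way to spot that kind of collision, sketched under assumptions: each article has already been reduced to a (subject, part, total, message-id) record, and both the sample data and the find_collisions name below are hypothetical. If two different message-ids claim the same part number under the same subject and total, the subject is probably shared by two different binaries (or a part was reposted).

```perl
use strict;
use warnings;

# Hypothetical records: [subject, part, total, message-id]
my @articles = (
    ['file.bin', 1, 2, '<a1@example>'],
    ['file.bin', 2, 2, '<a2@example>'],
    ['file.bin', 1, 2, '<b1@example>'],    # same subject and part, different article
);

# Return "subject (part/total)" for every part claimed by more than one message-id.
sub find_collisions {
    my @records = @_;
    my %seen;    # subject -> total -> part -> [message-ids]
    for my $r (@records) {
        my ($subject, $part, $total, $id) = @$r;
        push @{ $seen{$subject}{$total}{$part} }, $id;
    }
    my @collisions;
    for my $subject (sort keys %seen) {
        for my $total (sort { $a <=> $b } keys %{ $seen{$subject} }) {
            my $parts = $seen{$subject}{$total};
            for my $part (sort { $a <=> $b } keys %$parts) {
                push @collisions, "$subject ($part/$total)"
                    if @{ $parts->{$part} } > 1;
            }
        }
    }
    return @collisions;
}

print "$_\n" for find_collisions(@articles);
```

With the sample records this reports file.bin (1/2), since part 1 of 2 arrived under two different message-ids.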

I probably still have the code around. I haven't looked at it in
years. It's probably not my best code, but it worked.
 

Chris Mattern

$_@_.%_ said:
:Does anyone know where I could find some information
:about parsing NNTP subject fields?

How do you parse something that's freeform text?

Chris Mattern
 

$_

:There is no standard formatting for multipart messages.

Nod. The standard gives a lot of freedom to the poster.

I am very happy to hear from someone who has experience with
this sort of thing; your help is really helpful. Thank you.
Here is the regex I'm thinking about using:
m/(.+)([(\[\{]+?\d+[\/-]+?(\d+)[)\]\}]+?)/

Does this regex look OK?

There are three capture groups:
1) the main subject text
2) the proof that this is part of a multi-part message
3) the number of parts for this message
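A quick way to check what those three groups actually capture is to run the pattern over the two sample subjects. Inside m/.../ or qr/.../, a / in a character class still has to be written \/, or Perl ends the pattern there. This is only a small test harness, not a claim about every subject format posters use:

```perl
use strict;
use warnings;

# The pattern under discussion, with "/" escaped inside the character class.
my $re = qr/(.+)([(\[\{]+?\d+[\/-]+?(\d+)[)\]\}]+?)/;

my @subjects = (
    'Test Subject (1/1) - file.bin [01/10]',
    'Another test.bin (1/2)',
);

for my $s (@subjects) {
    if ($s =~ $re) {
        print "subject: '$1'  marker: '$2'  total: $3\n";
    }
}
```

Because (.+) is greedy, it backtracks only as far as the last bracketed counter: for the first subject, group 1 is 'Test Subject (1/1) - file.bin ' (trailing space included), group 2 is '[01/10]' and group 3 is 10. The part number itself is not captured unless the first \d+ is wrapped in parentheses.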

I'm planning on creating a hash which has the message-ids as keys
and an array ref as the value; the array would contain the total number
of parts expected and which part this message-id is.
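An alternative layout, offered as a suggestion rather than a correction: key on the subject text and the total, and map each part number to its message-id. Then "which part is this" is a hash lookup, and the number of parts gathered so far is a count of keys. The %overview input (article number -> [subject, message-id]) is a hypothetical stand-in for the XOVER data, and the regex is the capturing form used later in this thread:

```perl
use strict;
use warnings;

# Hypothetical overview data: article number -> [subject, message-id]
my %overview = (
    101 => ['Another test.bin (1/2)', '<p1@example>'],
    102 => ['Another test.bin (2/2)', '<p2@example>'],
    103 => ['Unrelated post',         '<x@example>'],
);

# subject -> total -> part -> message-id
my %parts;
for my $art (sort { $a <=> $b } keys %overview) {
    my ($subject, $id) = @{ $overview{$art} };
    next unless $subject =~ m/(.+)[(\[\{]+?(\d+)[\/\-]+?(\d+)[)\]\}]+?/;
    # Copy the captures before any other match can reset them.
    my ($base, $part, $total) = ($1, $2 + 0, $3 + 0);   # +0 turns "01" into 1
    $base =~ s/\s+\z//;                                   # drop trailing blanks
    $parts{$base}{$total}{$part} = $id;
}

for my $base (sort keys %parts) {
    for my $total (sort { $a <=> $b } keys %{ $parts{$base} }) {
        my $have = keys %{ $parts{$base}{$total} };
        print "$base: $have of $total parts\n";
    }
}
```

Subjects without a counter are simply skipped, so a single-part post would need its own handling.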

If this regex is OK, I will still need to find a way to know when all parts have
been gathered, then pass the message-ids in the correct order to the hash
that populates the Tk::HList, which displays the messages.
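For the "all parts gathered" check and the ordering, a minimal sketch assuming one subject's parts are held as part number -> message-id, with part numbers already converted to plain integers (so "01" is 1). The sub names is_complete and ordered_ids are hypothetical:

```perl
use strict;
use warnings;

# True when every part 1..$total has a message-id in %$parts.
sub is_complete {
    my ($parts, $total) = @_;
    for my $n (1 .. $total) {
        return 0 unless exists $parts->{$n};
    }
    return 1;
}

# Message-ids in part order; numeric sort so part 10 follows part 9.
sub ordered_ids {
    my ($parts) = @_;
    return map { $parts->{$_} } sort { $a <=> $b } keys %$parts;
}

my %parts = (2 => '<b@example>', 1 => '<a@example>', 10 => '<j@example>');
print is_complete(\%parts, 10) ? "complete\n" : "still waiting\n";
print join(' ', ordered_ids(\%parts)), "\n";
```

Sorting with <=> matters once a post has ten or more parts: a plain string sort puts 10 before 2.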

Then, if a message is selected for download, I will pass the message-ids to
Convert-BulkDecoder.

I'm still trying to get my head around this... more to follow (hopefully).

Help would be greatly appreciated.
Thanks in advance for any tips/suggestions/pseudocode/regex advice.
 

Gerard Lanois

My program doesn't store all the parts, but it will assemble
all the parts if they happen to all be present on the server.

See http://ubh.sourceforge.net/

Here is some code which shows how ubh does this.

# untested code follows...
use strict;
use warnings;

my $subject = 'Test Subject (1/1) - file.bin [01/10]';

# Does it look like it contains a filename with an extension?
if ($subject =~ /\b(.+\.(\w+))\b/) {

    # Is it multipart? [x/y] or (x/y)
    # Requires at least 2 chars in extension; this avoids
    # problems with people posting with size like "10.4 Meg"
    # after the filename, and matching after the .4
    if ($subject =~ /^(.+\.(\w\w+))\b.*[\(\[](\d+)\/(\d+)[\)\]]/) {
        my ($subject_part, $part, $total) = ($1, $3, $4);

        # ... etc.
    }
}
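The two-character extension rule described in the comments can be checked with a made-up subject that has a size after the filename. This compares a one-character-extension pattern with a two-character one in the style of ubh's; it is an illustration, not ubh's code:

```perl
use strict;
use warnings;

my $subject = 'file.bin 10.4 Meg [01/10]';

# One-character extensions allowed: the greedy .+ stops at the last dot, in "10.4".
my ($loose) = $subject =~ /^(.+\.\w+)\b/;

# At least two characters required: ".4 " cannot match, so ".bin" wins.
my ($strict) = $subject =~ /^(.+\.\w\w+)\b/;

print "loose:  $loose\n";
print "strict: $strict\n";
```

The loose pattern captures "file.bin 10.4", while the two-character version captures "file.bin".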


-Gerard
 

Peter Scott

Are you trying to duplicate the functionality of this:

http://linux.maruhn.com/sec/aub.html
http://yukidoke.org/~mako/projects/aub/

Written in Perl to boot.
 

$_

Well, I've had a look at both of those pieces of code,
and I must say that the programming is very impressive indeed!
I've learned quite a bit from the examples. Thank you all
very much for the helpful input.

I've made some progress with this, but I've run into a tricky bit.
Here it is: how do I print this HoHoA (hash of hashes of arrays) so that I can test the result?

# ToDo... combine multi-part articles
# $xover{$_}[0]  subject       $xover{$_}[4]  references
# $xover{$_}[1]  from          $xover{$_}[5]  bytes
# $xover{$_}[2]  date          $xover{$_}[6]  lines
# $xover{$_}[3]  message-id    $xover{$_}[7]  xref:full
# m/(.+)[(\[\{]+?(\d+)[\/\-]+?(\d+)[)\]\}]+?/
# $1 is: subject, $2 is: part, $3 is: total parts
# (HoHoA) subject -> total parts -> current part, msg id

my %HoHoA;
for my $k (sort keys %xover) {
    if ($xover{$k}[0] =~
        m/(.+)[(\[\{]+?(\d+)[\/\-]+?(\d+)[)\]\}]+?/) {
        push @{$HoHoA{$1}{$3}}, "$2";
        push @{$HoHoA{$1}{$3}}, "$xover{$k}[3]";
    }
}
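For testing, one quick way to print a nested structure like %HoHoA is the core Data::Dumper module. A minimal sketch with a small hypothetical %HoHoA of the same shape (subject -> total parts -> flat list of part, message-id pairs):

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical data in the same shape as %HoHoA above.
my %HoHoA = (
    'Another test.bin' => { 2 => ['1', '<p1@example>', '2', '<p2@example>'] },
);

$Data::Dumper::Sortkeys = 1;   # stable key order between runs
$Data::Dumper::Indent   = 1;   # compact indentation
print Dumper(\%HoHoA);
```

Passing a reference (\%HoHoA) keeps the whole structure together in one dump instead of flattening the hash into a list.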
 

$_

Never mind, I got it :)

open my $fh, '>', 'test' or die "Cannot open test: $!";
for my $k1 (keys %HoHoA) {
    for my $k2 (keys %{ $HoHoA{$k1} }) {
        print $fh "subject: $k1\n";
        print $fh "has $k2 parts total\n";
        print $fh "this is the information for this subject\n";
        print $fh "$_\n" for @{ $HoHoA{$k1}{$k2} };
        print $fh "\n";
    }
}
close $fh or die "Cannot close test: $!";


subject: Att:CHARLI 320bps[04/14] - "The Smoky Mountain Players - Smoky Moumtain Old Time Favorites - 03 - The Great Speckled Bird.mp3" yEnc
has 8 parts total
this is the information for this subject
1
<nMBUb.182379$Rc4.1349880@attbi_s54>
2
<BMBUb.184720$sv6.955576@attbi_s52>
3
<OMBUb.182381$Rc4.1350709@attbi_s54>
4
<0NBUb.182384$Rc4.1350590@attbi_s54>
5
<eNBUb.184723$sv6.954877@attbi_s52>
6
<rNBUb.182385$Rc4.1350702@attbi_s54>
7
<ENBUb.182386$Rc4.1350712@attbi_s54>
8
<QNBUb.185030$5V2.895547@attbi_s53>
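The parts above appear in order here, but nothing in the loop guarantees that, since they are pushed in whatever order sort keys %xover visits the articles. One way to get the message-ids in part order from a single flat (part, message-id, part, message-id, ...) list is to pair them up and sort numerically; the @flat data here is hypothetical:

```perl
use strict;
use warnings;

# Flat list as stored in $HoHoA{$subject}{$total}: part, id, part, id, ...
my @flat = ('2', '<b@example>', '10', '<j@example>', '1', '<a@example>');

# Pair each part number with its message-id.
my @pairs;
for (my $i = 0; $i < @flat; $i += 2) {
    push @pairs, [ $flat[$i], $flat[$i + 1] ];
}

# Sort numerically by part number ("10" would sort before "2" as a string).
my @ordered = map { $_->[1] } sort { $a->[0] <=> $b->[0] } @pairs;

print "$_\n" for @ordered;
```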
 
