NNTP Subject Parsing

$_ · Feb 5, 2004

Does anyone know where I could find some information
about parsing NNTP subject fields?

Pseudocode and/or regexp advice would be ideal.

I'm looking to parse out multipart messages,
e.g.: Test Subject (1/1) - file.bin [01/10]
      Another test.bin (1/2)

Then store them until all the parts have been gathered.

Thanks, any advice is appreciated.

Walter Roberson · Feb 5, 2004

:Does anyone know where I could find some information
:about parsing NNTP subject fields?

:Pseudocode and/or regexp advice would be ideal.

:I'm looking to parse out multipart messages,
:e.g.: Test Subject (1/1) - file.bin [01/10]
:      Another test.bin (1/2)

:Then store them until all the parts have been gathered.

There is no standard formatting for multipart messages.

When I did this a couple of years ago, I had to just look to see what
was coming down and tweak it from time to time. As I recall, there were
some complications involving pasting the binaries back together again
automatically, due to the different ways that posters had of storing
the binaries. And there are complications around detecting duplicates
because people tend to use similar subjects for different binaries.
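
One way to reduce the duplicate problem described here is to group on more than the subject text alone. The following is only a sketch, assuming the poster's From header is available alongside the subject. The helper name group_key and the sample values are invented for illustration, and this is not the code mentioned in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: build a grouping key from poster, subject text
# (with the part counter removed) and total part count.
sub group_key {
    my ($from, $subject) = @_;
    return unless $subject =~ /^(.*)[(\[](\d+)\/(\d+)[)\]]/;
    my ($text, $total) = ($1, $3);
    $text =~ s/\s+$//;                       # drop space left before the counter
    return join "\0", lc($from), $text, $total + 0;
}

# Invented sample data: two posters using the same subject line.
my $key_a1 = group_key('alice@example.invalid', 'file.bin (1/2)');
my $key_a2 = group_key('alice@example.invalid', 'file.bin (2/2)');
my $key_b1 = group_key('bob@example.invalid',   'file.bin (1/2)');

print "same poster, same group\n"      if $key_a1 eq $key_a2;
print "different poster, new group\n"  if $key_a1 ne $key_b1;
```

Two posts only land in the same group when poster, subject text and total all agree. Reposts by the same person under an identical subject would still collide, so this narrows the problem rather than solving it.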

I probably still have the code around. I haven't looked at it in
years. It's probably not my best code, but it worked.

Chris Mattern · Feb 5, 2004

$_@_.%_ said:
Does anyone know where I could find some information
about parsing NNTP subject fields?

How do you parse something that's freeform text?

Chris Mattern

$_ · Feb 5, 2004

:There is no standard formatting for multipart messages.

Nod, the standard gives a lot of freedom to the poster.

:When I did this a couple of years ago, I had to just look to see what
:was coming down and tweak it from time to time. As I recall, there were
:some complications involving pasting the binaries back together again
:automatically, due to the different ways that posters had of storing
:the binaries. And there are complications around detecting duplicates
:because people tend to use similar subjects for different binaries.

:I probably still have the code around. I haven't looked at it in
:years. It's probably not my best code, but it worked.

I am very happy to hear from someone who has experience with
this sort of thing; your help is really appreciated. Thank you.
Here is the regex I'm thinking about using:
m/(.+)([(\[\{]+?\d+[\/-]+?(\d+)[)\]\}]+?)/

Does this regex look OK?

There are three capture groups:
1) the main subject text
2) the proof that this is part of a multi-part message
3) the total number of parts for this message
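
To make the groups concrete, here is a small sketch that runs a pattern of this shape over the two sample subjects from the start of the thread. It is a variant rather than the exact regex above: it also captures the part number, and the helper name parse_counter is made up for the example.

```perl
use strict;
use warnings;

# Variant pattern: subject text, part number, total parts.
my $COUNTER_RE = qr/
    ^(.+)       # 1: subject text, greedy so the last counter wins
    [(\[{]      # opening bracket of the counter
    (\d+)       # 2: this part
    [\/-]       # separator
    (\d+)       # 3: total parts
    [)\]}]      # closing bracket of the counter
/x;

# Hypothetical helper: returns (text, part, total) or an empty list.
sub parse_counter {
    my ($subject) = @_;
    return unless $subject =~ $COUNTER_RE;
    my ($text, $part, $total) = ($1, $2, $3);
    $text =~ s/\s+$//;                  # trim space left before the counter
    return ($text, $part + 0, $total + 0);
}

for my $s ('Test Subject (1/1) - file.bin [01/10]', 'Another test.bin (1/2)') {
    my ($text, $part, $total) = parse_counter($s);
    print "$text => part $part of $total\n";
}
```

Because the first group is greedy, a subject with two counters resolves to the last one, so the (1/1) in the first sample stays part of the subject text.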

I'm planning on creating a hash which has the message-ids for keys
and an array ref as the value; the array would contain the total number
of parts expected and which part this message-id is.
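
A rough sketch of that hash, under the assumption that subjects are already available keyed by message-id. The names %subject_for and %part_info and the sample message-ids are invented for the example.

```perl
use strict;
use warnings;

# Assumed input: message-id => subject (sample values are made up).
my %subject_for = (
    '<a@example.invalid>' => 'Another test.bin (1/2)',
    '<b@example.invalid>' => 'Another test.bin (2/2)',
    '<c@example.invalid>' => 'Just a text post',
);

my %part_info;    # message-id => [ total parts, this part ]
while (my ($id, $subject) = each %subject_for) {
    # Only subjects ending in a counter such as (1/2) or [01/10] qualify.
    next unless $subject =~ /[(\[{](\d+)[\/-](\d+)[)\]}]\s*$/;
    $part_info{$id} = [ $2 + 0, $1 + 0 ];
}

for my $id (sort keys %part_info) {
    my ($total, $part) = @{ $part_info{$id} };
    print "$id is part $part of $total\n";
}
```

Keying on message-id alone does not say which messages belong together, so some grouping by subject text is still needed on top of this.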

If this regex is OK, I will still need a way to know when all the parts have
been gathered, and then pass the message-ids in the correct order to the hash
that populates the Tk::HList, which displays the messages.
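
One possible way to do the "all parts gathered" check, sketched under the assumption that each subject's parts are collected as part number => message-id. The helper name ordered_ids_if_complete and the sample ids are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given the expected total and a hash ref of
# part number => message-id, return the message-ids in part order,
# or an empty list while any part is still missing.
sub ordered_ids_if_complete {
    my ($total, $id_for_part) = @_;
    my @ids = map { $id_for_part->{$_} } 1 .. $total;
    return if grep { !defined } @ids;   # at least one part not seen yet
    return @ids;
}

# Parts may arrive in any order; the hash keys restore the sequence.
my %seen = (2 => '<b@example.invalid>', 1 => '<a@example.invalid>');

my @ready   = ordered_ids_if_complete(2, \%seen);
my @pending = ordered_ids_if_complete(3, \%seen);

print "ready: @ready\n";
print "a 3-part post would still be incomplete\n" unless @pending;
```

Storing the part number as a hash key means a repeated part simply overwrites the earlier entry, which may or may not be what is wanted when reposts are common.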

Then, if the message is selected for download, I will pass the message-ids to
Convert-BulkDecoder.

I'm still trying to get my head around this... more to follow (hopefully).

Help would be greatly appreciated.
Thanks in advance for any tips/suggestions/pseudocode/regex advice.

Gerard Lanois · Feb 6, 2004

$_@_.%_ said:
Does anyone know where I could find some information
about parsing NNTP subject fields?

Pseudocode and/or regexp advice would be ideal.

I'm looking to parse out multipart messages,
e.g.: Test Subject (1/1) - file.bin [01/10]
      Another test.bin (1/2)

Then store them until all the parts have been gathered.

Thanks, any advice is appreciated.

My program doesn't store all the parts, but it will assemble
all the parts if they happen to all be present on the server.

See http://ubh.sourceforge.net/

Here is some code which shows how ubh does this.

# untested code follows...

my $subject = 'Test Subject (1/1) - file.bin [01/10]';

# Does it look like it contains a filename with an extension?
if ($subject =~ /\b(.+\.(\w+))\b/) {

    # Is it multipart? [x/y] or (x/y)
    # Requires at least 2 chars in extension, this avoids
    # problems with people posting with size like "10.4 Meg"
    # after the filename, and matching after the .4
    if ($subject =~ /^(.+\.(\w\w+))\b.*[\(\[](\d+)\/(\d+)[\)\]]/) {
        my ($subject_part, $part, $total) = ($1, $3, $4);

        # ... etc.
    }
}
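
To see how the two-character extension rule behaves, here is a small harness around a pattern of the same shape as the multipart check above. The sample subjects and the %parsed hash are invented for the test, and this is not taken from ubh itself.

```perl
use strict;
use warnings;

# Filename with a 2+ character extension, followed later by an
# (x/y) or [x/y] counter.
my $multipart_re = qr/^(.+\.(\w\w+))\b.*[(\[](\d+)\/(\d+)[)\]]/;

my @subjects = (
    'Test Subject (1/1) - file.bin [01/10]',
    'file.bin 10.4 Meg (1/2)',
    'no filename here (1/2)',
);

my %parsed;    # subject => { name, part, total }
for my $subject (@subjects) {
    if ($subject =~ $multipart_re) {
        $parsed{$subject} = { name => $1, part => $3 + 0, total => $4 + 0 };
        print "$1: part $3 of $4\n";
    }
    else {
        print "skipped: $subject\n";
    }
}
```

In the "10.4 Meg" subject the single-character ".4" cannot satisfy the extension test, so the match falls back to "file.bin" as the name.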

-Gerard

Peter Scott · Feb 6, 2004

:Does anyone know where I could find some information
:about parsing NNTP subject fields?

:Pseudocode and/or regexp advice would be ideal.

:I'm looking to parse out multipart messages,
:e.g.: Test Subject (1/1) - file.bin [01/10]
:      Another test.bin (1/2)

:Then store them until all the parts have been gathered.

Are you trying to duplicate the functionality of this?

http://linux.maruhn.com/sec/aub.html
http://yukidoke.org/~mako/projects/aub/

Written in Perl to boot.

$_ · Feb 7, 2004

Well, I've had a look at both of those pieces of code,
and I must say that the programming is very impressive indeed!
I've learned quite a bit looking at the examples. Thank you all
very much for the helpful input.

I've made some progress with this, but I've run into a tricky bit:
how do I print this HoHoA so that I can test the result?

# ToDo... combine multi-part articles
# $xover{$_}[0]  subject       $xover{$_}[4]  references
# $xover{$_}[1]  from          $xover{$_}[5]  bytes
# $xover{$_}[2]  date          $xover{$_}[6]  lines
# $xover{$_}[3]  message-id    $xover{$_}[7]  xref:full
# m/(.+)[(\[\{]+?(\d+)[\/\-]+?(\d+)[)\]\}]+?/
# $1 is: subject, $2 is: part, $3 is: total parts
# (HoHoA) subject -> total parts -> [ part, msg id, part, msg id, ... ]

my %HoHoA;
for my $k (sort keys %xover) {
    if ($xover{$k}[0] =~
        m/(.+)[(\[\{]+?(\d+)[\/\-]+?(\d+)[)\]\}]+?/) {
        push @{ $HoHoA{$1}{$3} }, $2, $xover{$k}[3];
    }
}

$_ · Feb 7, 2004

Never mind, I got it:

# Write one block per subject/total pair to the file "test".
open(my $fh, '>', 'test') or die "Cannot open 'test': $!";
for my $k1 (keys %HoHoA) {
    for my $k2 (keys %{ $HoHoA{$k1} }) {
        print $fh "subject: $k1\n";
        print $fh "has $k2 parts total\n";
        print $fh "this is the information for this subject\n";
        print $fh "$_\n" for @{ $HoHoA{$k1}{$k2} };
        print $fh "\n";
    }
}
close $fh;

subject: Att:CHARLI 320bps[04/14] - "The Smoky Mountain Players - Smoky Moumtain Old Time Favorites - 03 - The Great Speckled Bird.mp3" yEnc
has 8 parts total
this is the information for this subject
1
<nMBUb.182379$Rc4.1349880@attbi_s54>
2
<BMBUb.184720$sv6.955576@attbi_s52>
3
<OMBUb.182381$Rc4.1350709@attbi_s54>
4
<0NBUb.182384$Rc4.1350590@attbi_s54>
5
<eNBUb.184723$sv6.954877@attbi_s52>
6
<rNBUb.182385$Rc4.1350702@attbi_s54>
7
<ENBUb.182386$Rc4.1350712@attbi_s54>
8
<QNBUb.185030$5V2.895547@attbi_s53>
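
For a quick look at a nested structure like this one, the core Data::Dumper module is another option besides hand-written loops. A minimal sketch with a small stand-in hash; the subject and message-ids in it are invented, not taken from the output above.

```perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for the real structure:
# subject => total parts => [ part, message-id, part, message-id, ... ]
my %HoHoA = (
    'example.bin' => {
        2 => [ 1, '<a@example.invalid>', 2, '<b@example.invalid>' ],
    },
);

$Data::Dumper::Sortkeys = 1;    # stable key order between runs
$Data::Dumper::Indent   = 1;    # fixed, compact indentation

my $dump = Dumper(\%HoHoA);
print $dump;
```

Passing a reference (\%HoHoA) keeps the whole hash as one dumped value instead of a flat list of keys and values.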
