NNTP Subject Parsing

Discussion in 'Perl Misc' started by $_@_.%_, Feb 5, 2004.

  1. $_@_.%_

    $_@_.%_ Guest

    Does anyone know where I could find some information
    about parsing NNTP subject fields?

    Pseudocode and/or regexp advice would be ideal.

    I'm looking to parse out multipart messages,
    e.g.: Test Subject (1/1) - file.bin [01/10]
    Another test.bin (1/2)

    Then store them until all the parts have been gathered.

    Thanks, any advice is appreciated.
    $_@_.%_, Feb 5, 2004
    #1

  2. In article <IBwUb.13166$>, <$_@_.%_> wrote:
    :Does anyone know where I could find some information
    :about parsing NNTP subject fields?

    :Pseudocode and/or regexp advice would be ideal.

    :I'm looking to parse out multipart messages,
    :e.g.: Test Subject (1/1) - file.bin [01/10]
    : Another test.bin (1/2)

    :Then store them until all the parts have been gathered.

    There is no standard formatting for multipart messages.

    When I did this a couple of years ago, I had to just look to see what
    was coming down and tweak it from time to time. As I recall, there were
    some complications involving pasting the binaries back together again
    automatically, due to the different ways that posters had of storing
    the binaries. And there are complications around detecting duplicates
    because people tend to use similar subjects for different binaries.

    I probably still have the code around. I haven't looked at it in
    years. It's probably not my best code, but it worked.
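
    A minimal sketch of one way to narrow (not eliminate) the duplicate
    problem described above: group articles by poster plus subject text
    plus announced total, instead of by subject alone. The helper name
    group_key and the choice of fields are assumptions for illustration,
    and the pattern only handles a (x/y) or [x/y] marker at the end of
    the subject.

```perl
use strict;
use warnings;

# Hypothetical helper: build a grouping key from the poster, the subject
# text with its trailing "(x/y)" or "[x/y]" marker removed, and the total.
# Returns an empty list when the subject has no such marker.
sub group_key {
    my ($from, $subject) = @_;
    return unless $subject =~ m/^(.*)[(\[]\d+\/(\d+)[)\]]\s*$/;
    my ($stem, $total) = ($1, $2 + 0);
    $stem =~ s/\s+/ /g;     # collapse runs of whitespace
    $stem =~ s/^ | $//g;    # trim leading and trailing space
    return join "\t", lc($from), lc($stem), $total;
}

my $first  = group_key('poster@example.invalid', 'Another test.bin (1/2)');
my $second = group_key('poster@example.invalid', 'Another test.bin (2/2)');
print "same group\n" if $first eq $second;
```

    Two different binaries posted by the same person under the same
    subject text and part count would still collide, so this is only a
    first filter.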
    --
    Ceci, ce n'est pas une idée.
    Walter Roberson, Feb 5, 2004
    #2

  3. $_@_.%_ wrote:
    > Does anyone know where I could find some information
    > about parsing NNTP subject fields?


    How do you parse something that's freeform text?

    Chris Mattern
    Chris Mattern, Feb 5, 2004
    #3
  4. $_@_.%_

    $_@_.%_ Guest

    -cnrc.gc.ca (Walter Roberson) Wrote:
    > In article <IBwUb.13166$>, <$_@_.%_> wrote:
    > :Does anyone know where I could find some information
    > :about parsing NNTP subject fields?
    >
    > :Pseudocode and/or regexp advice would be ideal.
    >
    > :I'm looking to parse out multipart messages,
    > :e.g.: Test Subject (1/1) - file.bin [01/10]
    > : Another test.bin (1/2)
    >
    > :Then store them until all the parts have been gathered.
    >
    > There is no standard formatting for multipart messages.


    Nod, the standard gives a lot of freedom to the poster.
    >
    > When I did this a couple of years ago, I had to just look to see what
    > was coming down and tweak it from time to time. As I recall, there were
    > some complications involving pasting the binaries back together again
    > automatically, due to the different ways that posters had of storing
    > the binaries. And there are complications around detecting duplicates
    > because people tend to use similar subjects for different binaries.
    >
    > I probably still have the code around. I haven't looked at it in
    > years. It's probably not my best code, but it worked.


    I am very happy to hear from someone who has experience with
    this sort of function. Your help is really useful, thank you.
    >


    Here is the regex I'm thinking about using:
    m/(.+)([(\[\{]+?\d+[\/-]+?(\d+)[)\]\}]+?)/

    Does this regex look OK?

    There are three memory groups
    1) the main subject text
    2) the proof that this is part of a multi-part message
    3) the number of parts for this message
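
    A minimal sketch of how capture groups like these could be pulled
    out of the two sample subjects. The helper name parse_subject is
    hypothetical, and the pattern is a simplified variant that also
    captures the part number and allows only a single bracket and a
    single separator character.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (stem, part, total), or an empty list
# when the subject carries no part marker.
sub parse_subject {
    my ($subject) = @_;
    # The greedy (.+) means the LAST "(x/y)" or "[x/y]" marker wins.
    if ($subject =~ m/^(.+)[(\[{](\d+)[\/-](\d+)[)\]}]/) {
        my ($stem, $part, $total) = ($1, $2 + 0, $3 + 0);
        $stem =~ s/\s+$//;    # drop the space before the marker
        return ($stem, $part, $total);
    }
    return;
}

for my $s ('Test Subject (1/1) - file.bin [01/10]', 'Another test.bin (1/2)') {
    my ($stem, $part, $total) = parse_subject($s);
    print "stem='$stem' part=$part total=$total\n";
}
```

    For the first sample the trailing [01/10] is taken as the part
    marker and the earlier (1/1) stays in the subject text, which may or
    may not be what is wanted.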

    I'm planning on creating a hash which has the message-ids for keys
    and an array ref as a value; the array may contain the total number
    of parts expected and which part this message-id is.

    If this regex is OK, I will still need a way to know when all parts have
    been gathered, and then pass the message-ids in the correct order to the hash
    which populates the Tk::HList that displays the messages.
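
    One possible shape for the "are all parts here yet?" check, as a
    sketch only. It keys on the subject text rather than on the
    message-id, which is a different layout from the plan above, and the
    names %parts, add_part, is_complete and ordered_ids are all made up
    for illustration.

```perl
use strict;
use warnings;

# %parts: subject stem => { total => N, ids => { part number => message-id } }
my %parts;

# Record one article's part; the first copy of a reposted part is kept.
sub add_part {
    my ($stem, $part, $total, $msgid) = @_;
    $parts{$stem}{total} = $total;
    $parts{$stem}{ids}{$part} = $msgid
        unless exists $parts{$stem}{ids}{$part};
}

# True once every part from 1 to total has been seen.
sub is_complete {
    my ($stem) = @_;
    my $entry = $parts{$stem} or return 0;
    my @missing = grep { !exists $entry->{ids}{$_} } 1 .. $entry->{total};
    return @missing ? 0 : 1;
}

# Message-ids in numeric part order, ready to hand to a decoder.
sub ordered_ids {
    my ($stem) = @_;
    my $ids = $parts{$stem}{ids};
    return map { $ids->{$_} } sort { $a <=> $b } keys %$ids;
}

add_part('file.bin', 2, 2, '<b@example.invalid>');
print "complete after one part? ", (is_complete('file.bin') ? "yes" : "no"), "\n";
add_part('file.bin', 1, 2, '<a@example.invalid>');
print join(' ', ordered_ids('file.bin')), "\n" if is_complete('file.bin');
```

    Storing parts in a hash keyed by part number means articles can
    arrive in any order and the numeric sort happens only once, when the
    set is complete.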

    Then, if the message is selected for download, I will pass the message-ids to
    Convert::BulkDecoder.

    I'm still trying to get my head around this. More to follow (hopefully).

    Help would be greatly appreciated.
    Thanks in advance for any tips/suggestions/pseudocode/regex advice.
    $_@_.%_, Feb 5, 2004
    #4
  5. $_@_.%_ writes:

    > Does anyone know where I could find some information
    > about parsing NNTP subject fields?
    >
    > Pseudocode and/or regexp advice would be ideal.
    >
    > I'm looking to parse out multipart messages,
    > e.g.: Test Subject (1/1) - file.bin [01/10]
    > Another test.bin (1/2)
    >
    > Then store them until all the parts have been gathered.
    >
    > Thanks, any advice is appreciated.


    My program doesn't store all the parts, but it will assemble
    all the parts if they happen to all be present on the server.

    See http://ubh.sourceforge.net/

    Here is some code which shows how ubh does this.

    # untested code follows...

    my $subject = 'Test Subject (1/1) - file.bin [01/10]';

    # Does it look like it contains a filename with an extension?
    if ($subject =~ /\b(.+\.(\w+))\b/) {

        # Is it multipart? [x/y] or (x/y)
        # Requires at least 2 chars in the extension; this avoids
        # problems with people posting a size like "10.4 Meg"
        # after the filename, and matching after the .4
        if ($subject =~ /^(.+\.(\w\w+))\b.*[\(\[](\d+)\/(\d+)[\)\]]/) {
            my ($subject_part, $part, $total) = ($1, $3, $4);

            # ... etc.
        }
    }


    -Gerard
    Gerard Lanois, Feb 6, 2004
    #5
  6. $_@_.%_

    Peter Scott Guest

    In article <IBwUb.13166$>,
    $_@_.%_ writes:
    >Does anyone know where I could find some information
    >about parsing NNTP subject fields?
    >
    >Pseudocode and/or regexp advice would be ideal.
    >
    >I'm looking to parse out multipart messages,
    >e.g.: Test Subject (1/1) - file.bin [01/10]
    > Another test.bin (1/2)
    >
    >Then store them until all the parts have been gathered.


    Are you trying to duplicate the functionality of this:

    http://linux.maruhn.com/sec/aub.html
    http://yukidoke.org/~mako/projects/aub/

    Written in Perl to boot.

    --
    Peter Scott
    http://www.perldebugged.com/
    *** NEW *** http://www.perlmedic.com/
    Peter Scott, Feb 6, 2004
    #6
  7. $_@_.%_

    $_@_.%_ Guest

    Well, I've had a look at both of those pieces of code,
    and I must say that the programming is very impressive indeed!
    I've learned quite a bit looking at the examples. Thank you all
    very much for the helpful input.

    I've made some progress with this, but I've run into a tricky bit:
    how do I print this HoHoA so that I can test the result?

    # ToDo: combine multi-part articles
    # $xover{$_}[0]  subject       $xover{$_}[4]  references
    # $xover{$_}[1]  from          $xover{$_}[5]  bytes
    # $xover{$_}[2]  date          $xover{$_}[6]  lines
    # $xover{$_}[3]  message-id    $xover{$_}[7]  xref:full
    # m/(.+)[(\[\{]+?(\d+)[\/\-]+?(\d+)[)\]\}]+?/
    # $1 is: subject, $2 is: part, $3 is: total parts
    # (HoHoA) subject -> total parts -> [current part, msg id, ...]

    my %HoHoA;
    for my $k (sort keys %xover) {
        if ($xover{$k}[0] =~
            m/(.+)[(\[\{]+?(\d+)[\/\-]+?(\d+)[)\]\}]+?/) {
            # Append the part number and its message-id as a flat pair.
            push @{$HoHoA{$1}{$3}}, $2, $xover{$k}[3];
        }
    }
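
    For a quick look at a nested structure like this, the core
    Data::Dumper module is one option. A minimal sketch follows; the
    sample data is made up and only mimics the shape of %HoHoA (subject,
    then total parts, then a flat list of part number and message-id).

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up sample: subject -> total parts -> [part, msg-id, part, msg-id]
my %HoHoA = (
    'file.bin ' => {
        2 => [ 1, '<a@example.invalid>', 2, '<b@example.invalid>' ],
    },
);

$Data::Dumper::Sortkeys = 1;    # stable key order between runs
$Data::Dumper::Indent   = 1;    # more compact indentation

my $dump = Dumper(\%HoHoA);     # pass a reference to dump the whole hash
print $dump;
```

    Passing a reference (\%HoHoA) rather than the hash itself keeps the
    output as a single structure instead of one variable per key and
    value.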
    $_@_.%_, Feb 7, 2004
    #7
  8. $_@_.%_

    $_@_.%_ Guest

    Never mind, I got it :)

    open(my $fh, '>', 'test') or die "Cannot open 'test': $!";
    for my $k1 (keys %HoHoA) {
        for my $k2 (keys %{$HoHoA{$k1}}) {
            print $fh "subject: $k1\n";
            print $fh "has $k2 parts total\n";
            print $fh "this is the information for this subject\n";
            print $fh "$_\n" for @{$HoHoA{$k1}{$k2}};
            print $fh "\n";
        }
    }
    close $fh or die "Cannot close 'test': $!";


    subject: Att:CHARLI 320bps[04/14] - "The Smoky Mountain Players - Smoky Moumtain Old Time Favorites - 03 - The Great Speckled Bird.mp3" yEnc
    has 8 parts total
    this is the information for this subject
    1
    <nMBUb.182379$Rc4.1349880@attbi_s54>
    2
    <BMBUb.184720$sv6.955576@attbi_s52>
    3
    <OMBUb.182381$Rc4.1350709@attbi_s54>
    4
    <0NBUb.182384$Rc4.1350590@attbi_s54>
    5
    <eNBUb.184723$sv6.954877@attbi_s52>
    6
    <rNBUb.182385$Rc4.1350702@attbi_s54>
    7
    <ENBUb.182386$Rc4.1350712@attbi_s54>
    8
    <QNBUb.185030$5V2.895547@attbi_s53>
    $_@_.%_, Feb 7, 2004
    #8
