Need help to find byte offsets for regexps in a file

Discussion in 'Perl Misc' started by Robert Dodier, Jul 8, 2006.

  1. Hello,

    I am hoping to find byte offsets of regular expressions in a file.

    I'm working on the built-in doc system for Maxima, an open-source
    computer algebra system. The doc text is a Texinfo output file.
    I want to find the strings " -- Function: FOO (x, y, z) ..."
    and print their byte offsets, and the number of bytes from each such
    string to the end of the corresponding documentation item
    (which might be the next " -- Function: " item or a different regex).

    Here is some pseudocode to illustrate what I am attempting --

    let re1 = " --Function: <some name>"
    let re2 = FOO (not sure what to put here yet)
    slurp file into string S (this is OK, Texinfo limits each file to 300 KB)
    let byte_offset_1 = 0
    while search for re1 beginning from byte_offset_1 succeeds
        extract <some name> from re1 match
        search for re2 beginning from byte_offset_1
        let byte_offset_2 = byte offset of re2 match
        print <some name>, byte_offset_1, byte_offset_2
        let byte_offset_1 = byte_offset_2


    I'm planning to slurp the resulting output into another program
    that will then carry out matching on the list of <some name> strings
    and use file seek to grab the corresponding texts. That program
    will be written in another programming language so let's not worry
    about that now.

    If anyone has some advice about making a workable Perl
    program from this pseudocode, I'll be very grateful.
    Thanks in advance & all the best.

    Robert Dodier
     
    Robert Dodier, Jul 8, 2006

  2. Xicheng Jia (Guest)

    Robert Dodier wrote:
    > I am hoping to find byte offsets of regular expressions in a file.


    You can use *closures* and a subroutine; see another, similar problem
    in this group:

    http://groups.google.com/group/comp...1ff2f39de4d?q=&rnum=14&hl=en#2c0f61ff2f39de4d

    The detailed solution will be different, but the approach is quite
    similar. The thing you want to change, as I understand it, is to
    count the characters before the function-definition point instead of
    the newlines, so replace the newline count tr/\n// with a character
    count such as length(). Also change the $pattern and the s///
    expression to suit your problem.
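    To make the counting idea concrete, here is a minimal sketch (not the
    code from the linked thread, which is not reproduced here). After a
    successful match, $-[0] already holds the number of characters before
    the match, and counting newlines in that prefix with tr/\n// gives a
    0-based line number instead. The sub name offset_and_line is
    hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the character offset of the first match of
# $re in $text, plus how many newlines precede it (a 0-based line number).
# Returns an empty list if there is no match.
sub offset_and_line {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    my $offset = $-[0];                      # characters before the match
    my $prefix = substr($text, 0, $offset);
    my $line   = ($prefix =~ tr/\n//);       # counts newlines, changes nothing
    return ($offset, $line);
}

my ($offset, $line) =
    offset_and_line("intro\n -- Function: foo (x)\n", qr/ -- Function: /);
print "offset $offset, line $line\n";        # offset 6, line 1
```

    The character offset is what matters for seeking; the line count is
    only useful if you want line numbers instead.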

    You might also try the /g and /c modifiers of m// together with the \G
    anchor; that combination can help here too.
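    A minimal sketch of the m//gc plus \G style, assuming each header sits
    at the start of a line: the loop tries the header pattern at the
    current pos(), and if that fails (/c keeps pos() where it was) it
    consumes the rest of the line and tries again. The sub name
    header_offsets_gc and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: walk $text line by line with m/\G.../gc and record
# [name, offset] for every line that starts with " -- Function: NAME".
sub header_offsets_gc {
    my ($text) = @_;
    my @found;
    pos($text) = 0;
    while (pos($text) < length $text) {
        if ($text =~ /\G -- Function: (\S+)/gc) {
            # $-[0] is the offset at which this header begins
            push @found, [ $1, $-[0] ];
        }
        elsif ($text =~ /\G[^\n]*\n?/gc) {
            # no header here: skip to the start of the next line
        }
    }
    return @found;
}

my $sample = "intro\n -- Function: foo (x)\n body\n -- Function: bar (y)\n";
printf "%s at %d\n", @$_ for header_offsets_gc($sample);
```

    The fallback pattern always consumes at least one character while
    pos() is short of the end, so the loop cannot stall.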

    Good luck,
    Xicheng
     
    Xicheng Jia, Jul 8, 2006

  3. Robert Dodier <> wrote:

    > I am hoping to find byte offsets of regular expressions in a file.



    perldoc -f pos
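    To illustrate pos(): a minimal sketch, assuming the file has been
    slurped into one string. After each successful m//g match, pos() holds
    the offset just past the match and $-[0] its start. These are character
    offsets; they equal byte offsets only when the data is read without
    decoding, for example through the :raw layer. The sub names
    match_spans and slurp_raw are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: [start, end] offsets of every match of $re in $text.
sub match_spans {
    my ($text, $re) = @_;
    my @spans;
    while ($text =~ /$re/g) {
        # $-[0] is where this match began; pos($text) is where it ended
        push @spans, [ $-[0], pos($text) ];
    }
    return @spans;
}

# Hypothetical helper: slurp a file as raw bytes, so that every character
# in the returned string is one byte and offsets can be used with seek().
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                                # undef $/ means slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
}

for my $span (match_spans("ab -- Function: x\n -- Function: y\n",
                          qr/ -- Function: /)) {
    print "match from $span->[0] to $span->[1]\n";
}
```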


    > Here is some pseudocode to illustrate what I am attempting --
    >
    > let re1 = " --Function: <some name>"



    Why _pseudo_ when making it Real Perl is so darn easy?


    my $re1 = " --Function: <some name>";
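    Taking that further, a minimal sketch of the whole pseudocode as real
    Perl. It assumes (the thread leaves re2 open) that an item starts at a
    line beginning with " -- Function: NAME" and ends where the next such
    line starts, or at end of file. It prints the name, the byte offset and
    the byte count of each item, reading the file with the :raw layer so
    that the offsets are bytes suitable for seeking from another program.
    The sub name function_items and the script name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed item header: " -- Function: NAME" at the start of a line.
my $re1 = qr/^ -- Function: (\S+)/m;

# Return [name, start offset, length] for every item in $text. An item is
# assumed to run until the next header or the end of the text.
sub function_items {
    my ($text) = @_;
    my @starts;                              # [name, offset] per header
    while ($text =~ /$re1/g) {
        push @starts, [ $1, $-[0] ];
    }
    my @items;
    for my $i (0 .. $#starts) {
        my ($name, $start) = @{ $starts[$i] };
        my $end = $i < $#starts ? $starts[$i + 1][1] : length $text;
        push @items, [ $name, $start, $end - $start ];
    }
    return @items;
}

# Usage: perl offsets.pl maxima.info  ->  NAME<TAB>OFFSET<TAB>LENGTH
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    print join("\t", @$_), "\n" for function_items($text);
}
```

    Collecting all header offsets first and then taking differences avoids
    searching for a separate end pattern; if re2 turns out to be something
    other than the next header, only the computation of $end needs to change.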


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Jul 9, 2006

Similar Threads
  1. Phillip Farber
    Replies:
    0
    Views:
    432
    Phillip Farber
    Aug 20, 2003
  2. gelonida
    Replies:
    1
    Views:
    789
    Gabriel Genellina
    May 6, 2010
  3. Muhammad Adeel
    Replies:
    2
    Views:
    332
    Muhammad Adeel
    Aug 6, 2010
  4. Ironhide

    checksum calculation for file offsets

    Ironhide, Apr 26, 2010, in forum: Perl Misc
    Replies:
    2
    Views:
    249
    Steve C
    Apr 27, 2010
  5. Ironhide
    Replies:
    5
    Views:
    155
    Xho Jingleheimerschmidt
    Mar 26, 2011