How can I add tokens at arbitrary positions on a line in a file?

Discussion in 'Perl Misc' started by John Howard, Aug 13, 2005.

  1. John Howard

    John Howard Guest

    I need to edit a file and add tokens at arbitrary positions on a line or
    lines.

    I tried doing this with sed but it's clumsy (sed can do it for fixed
    locations but not arbitrary).

    For example, if I have a file with lines that may be up to 255 chars
    wide, I need to place tokens at, say, near the 100 char position, the
    150th, the 200th, etc.

    The position is arbitrary because it depends on the nearest comma
    before that position. Basically, the lines consist of words separated
    by commas. I need to place tokens just after the nearest comma prior to
    those positions. The positions are relative because the lines vary in
    length (if that makes sense).

    The lines also contain trailing blanks and some tabs. I'm stripping
    the trailing blanks and changing the tabs to spaces with sed but it
    would probably be better if I did that at the same time with the same
    bit of Perl. (Nothing wrong with using sed here but it makes sense to
    just do it all at once with Perl. With sed I am using an intermediate
    file but I understand I can edit the file in question in situ with
    Perl.)

    Can anyone help me with an example?

    Thanks in advance.
    John Howard, Aug 13, 2005
    #1

  2. Matt Garrish

    Matt Garrish Guest

    "John Howard" <> wrote in message
    news:...
    >I need to edit a file and add tokens at aribrary positions on a line or
    > lines.
    >
    > I tried doing this with sed but its clumsy (sed can do it for fixed
    > locations but not arbitrary).
    >
    > For example, if I have a file with lines that may be up to 255 chars
    > wide, I need to place tokens at, say, near the 100 char position, the
    > 150th, the 200th, etc.
    >
    > The position is arbitrary because it depends on the nearest comma
    > before that
    > position. Basically, the lines consist of words separated by commas. I
    > need to place tokens just after the nearest comma prior to those
    > positions. The positions are relative because the lines vary in length
    > (if that makes sense).
    >


    Sounds like you're making the problem harder than it needs to be. Just break
    the line into fifty character chunks and insert your token where appropriate
    (watch for wrapping in the data section). Processing the tabs and spaces at
    the end of the line I leave to you:

    use strict;
    use warnings;

    while (my $line = <DATA>) {

        chomp $line;
        my $cnt     = 0;
        my $newline = '';

        # .{1,50} also keeps the final chunk when it is shorter than 50 chars
        foreach my $chunk ($line =~ /(.{1,50})/g) {

            # don't place a marker at the 50 or 250+ positions
            if (($cnt == 0) || ($cnt >= 4)) {
                $newline .= $chunk;
                $cnt += 1;
                next;
            }

            # greedy .* puts the marker after the last comma in the chunk
            $chunk =~ s/(.*),/$1,<mark>/;
            $newline .= $chunk;

            $cnt += 1;

        }

        print "$newline\n";

    }

    __DATA__
    word1, word2, word3, word4, word5, word6, word7, word8, word8, word6, word7,
    word5, word5, word6, word7, word8, word1, word2, word3, word4, word5, word6,
    word7, word8, word5, word6, word7, word8, word1, word2, word3, word4, word5,
    word6, word7, word8, word5, word6, word7, word8, word1, word2, word3, word4,
    word5, word6, word7, word8
    Matt Garrish, Aug 13, 2005
    #2
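The whitespace cleanup that the reply above leaves to the reader could look something like the following. This is only a minimal sketch: the helper name `clean_line` is made up for illustration, and it assumes that every tab should become exactly one space and that all trailing whitespace (including the line ending) should go.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): normalise whitespace on one line.
sub clean_line {
    my ($line) = @_;
    $line =~ tr/\t/ /;     # every tab becomes a single space
    $line =~ s/\s+\z//;    # drop trailing blanks and any line ending
    return $line;
}

# The "|" makes the end of the cleaned line visible.
print clean_line("word1,\tword2, word3 \t  \n"), "|\n";
```

Calling this on each line before the chunking loop would keep the 50-character chunks from being skewed by trailing blanks.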

  3. John Howard wrote:
    > I need to edit a file and add tokens at aribrary positions on a line or
    > lines.
    >
    > I tried doing this with sed but its clumsy (sed can do it for fixed
    > locations but not arbitrary).
    >
    > For example, if I have a file with lines that may be up to 255 chars
    > wide, I need to place tokens at, say, near the 100 char position, the
    > 150th, the 200th, etc.
    >
    > The position is arbitrary because it depends on the nearest comma
    > before that
    > position. Basically, the lines consist of words separated by commas. I
    > need to place tokens just after the nearest comma prior to those
    > positions. The positions are relative because the lines vary in length
    > (if that makes sense).
    >
    > The lines also consist of trailing blanks and some tabs. I'm stripping
    > the trailing blanks and changing the tabs to spaces with sed but it
    > would probably be better if I did that at the same time with the same
    > bit of Perl. (Nothing wrong with using sed here but it makes sense to
    > just do it all at once with Perl. With sed I am using an intermediate
    > file but I understand I can edit the file in question in situ with
    > Perl.)
    >
    > Can anyone help me with an example?


    perl -i.bak -lpe'
    s/(\s+)$/ ($a = $1) =~ y!\t ! !d; $a /e;
    for my $pos ( 200, 150, 100 ) { s/(^.{1,$pos},)/$1 token / }
    ' yourfile
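The positional loop in the one-liner above can be unpacked into a small function, which may make the idea easier to follow. This is a sketch rather than a drop-in replacement: the name `add_tokens` and its argument order are assumptions, and it differs slightly from the one-liner in requiring the comma itself to sit within the first `$pos` characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): insert $token after the last
# comma that lies within the first $pos characters, for each $pos given.
sub add_tokens {
    my ($line, $token, @positions) = @_;
    # Work from the highest position down, so that a token inserted
    # further right never shifts the text a later, lower position refers to.
    for my $pos (sort { $b <=> $a } @positions) {
        next if $pos < 1;
        my $max = $pos - 1;    # at most $max characters before the comma
        # Greedy match: the last comma inside the window wins.
        $line =~ s/\A(.{0,$max},)/$1$token/;
    }
    return $line;
}

print add_tokens("aa,bb,cc,dd,ee", "|", 7, 12), "\n";
```

Two caveats under these assumptions: if no comma falls between two target positions, both tokens end up after the same comma, and the line should already be chomped because `.` does not match a newline.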



    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Aug 14, 2005
    #3
  4. John Howard

    John Howard Guest

    Matt Garrish wrote:
    > "John Howard" <> wrote in message
    > news:...
    > >I need to edit a file and add tokens at aribrary positions on a line or
    > > lines.
    > >
    > > I tried doing this with sed but its clumsy (sed can do it for fixed
    > > locations but not arbitrary).
    > >
    > > For example, if I have a file with lines that may be up to 255 chars
    > > wide, I need to place tokens at, say, near the 100 char position, the
    > > 150th, the 200th, etc.
    > >
    > > The position is arbitrary because it depends on the nearest comma
    > > before that
    > > position. Basically, the lines consist of words separated by commas. I
    > > need to place tokens just after the nearest comma prior to those
    > > positions. The positions are relative because the lines vary in length
    > > (if that makes sense).
    > >

    >
    > Sounds like you're making the problem harder than it needs to be.


    Thanks for the reply. Unfortunately, it is.

    Think of it as some primitive typesetting. The input file is a CSV
    file. I need to skip the first 3 fields and use the 4th field. I asked
    in another post how to get the length of a line but I realise now I
    should have asked how to get the length of a field instead.

    The input will look something like this -

    AAA,BBB,CCC,"A very long line of text,with embedded commas in it"

    It will end up being displayed like this -

    AAA BBB CCC A very long line
    of text that will use
    embedded tokens to determine
    where to wrap around.

    There will be several lines like that. The tokens need to go after the
    last comma before each possible line break location. The text after
    that comma then counts toward the length of the next line, and so on.
    Hence the arbitrary positions. I realise I should have made that part
    clearer.

    So, not as simple as it seems. I was thinking of using the line length
    to determine how many tokens I might need but I think it might be
    better now just to scan thru the last field of each line and work out
    the positions. I could do this easily in C but I am supposed to do it
    in Perl.
    John Howard, Aug 19, 2005
    #4
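One way to read the clarified requirement above is as a greedy wrap that may only break after a comma. The sketch below follows that reading. The helper `mark_breaks`, the `<br>` token, and the width of 10 are all invented for illustration (the thread never says what the token is), and it ignores the first three fields when measuring the first display line. `Text::ParseWords` is a core module and is used here only to split the quoted CSV line.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);    # core module; copes with quoted fields

# Hypothetical helper (not from the thread): insert $token after a comma
# wherever the next comma-terminated piece would push the current display
# line past $width characters.
sub mark_breaks {
    my ($text, $width, $token) = @_;
    my @pieces = split /(?<=,)/, $text;  # each piece keeps its trailing comma
    my $out = '';
    my $len = 0;                         # length of the current display line
    for my $piece (@pieces) {
        if ($len > 0 && $len + length($piece) > $width) {
            $out .= $token;              # break before this piece
            $len = 0;
        }
        $out .= $piece;
        $len += length $piece;
    }
    return $out;
}

my $line   = 'AAA,BBB,CCC,"one,two,three,four,five,six"';
my @fields = parse_line(',', 0, $line);  # 0 = strip the surrounding quotes
$fields[3] = mark_breaks($fields[3], 10, '<br>');
print join(' ', @fields), "\n";
```

A piece longer than the width is left on a line of its own, since the stated rule only allows breaks after commas.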
  5. "John Howard" <> wrote in
    news::

    > Matt Garrish wrote:
    >> "John Howard" <> wrote in message
    >> news:...
    >> >I need to edit a file and add tokens at aribrary positions on a line
    >> >or
    >> > lines.
    >> >
    >> > I tried doing this with sed but its clumsy (sed can do it for fixed
    >> > locations but not arbitrary).
    >> >
    >> > For example, if I have a file with lines that may be up to 255
    >> > chars wide, I need to place tokens at, say, near the 100 char
    >> > position, the 150th, the 200th, etc.
    >> >
    >> > The position is arbitrary because it depends on the nearest comma
    >> > before that
    >> > position. Basically, the lines consist of words separated by
    >> > commas. I need to place tokens just after the nearest comma prior
    >> > to those positions. The positions are relative because the lines
    >> > vary in length (if that makes sense).
    >> >

    >>
    >> Sounds like you're making the problem harder than it needs to be.

    >
    > Thanks for the reply. Unfortunately, it is.
    >
    > Think of it as some primative typesetting. The input file is a CSV
    > file. I need to skip the first 3 fields and use the 4th field. I asked
    > in another post how to get the length of a line but I realise now I
    > should have asked how to get the length of a field instead.


    It would be extremely helpful to us if you could nail the specification
    down. Above, you say you only need to use the fourth field. Below, you
    use all the fields.

    Second, is there a fixed amount of space allocated to each field in each
    row?

    >
    > The input will look something like this -
    >
    > AAA,BBB,CCC,"A very long line of text,with embedded commas in it"
    >
    > It will end up being displayed like this -
    >
    > AAA BBB CCC A very long line
    > of text that will use
    > embedded tokens to determine
    > where to wrap around.


    What are those embedded tokens?

    > There will be several lines like that. The tokens need to go after the
    > first comma nearest but before a possible line break location. The
    > text after that comma must be included in the size of the next line
    > etc. Hence the arbitrary positions. I realise I should have made that
    > part clearer.


    You might benefit from looking into

    Text::CSV_XS

    for parsing and

    Text::Wrap

    for wrapping each field to a specific width.

    > So, not as simple as it seems. I was thinking of using the line length
    > to determine how many tokens I might need but I think it might be


    I am really very confused about what you mean by tokens.

    Here is something that might help:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Text::CSV_XS;
    use Text::Table;
    use Text::Wrap;

    # set to 17 to avoid line wrapping in newsreader
    $Text::Wrap::columns = 17;

    my @data = @{ read_data() };

    my $table = Text::Table->new;
    for my $row ( @data ) {
        $table->add(@$row);
    }

    print $table->table;

    sub read_data {
        my @data;

        my $csv = Text::CSV_XS->new;

        while( my $line = <DATA> ) {
            chomp $line;
            length $line or last;
            if( $csv->parse($line) ) {
                my @fields = $csv->fields;
                $_ = wrap '', '', $_ for @fields;
                push @data, \@fields;
            } else {
                warn "Malformatted CSV line.";
            }
        }
        return \@data;
    }


    __DATA__
    AAA,BBB,CCC,"A very long line of text,with embedded commas in it"
    AAA,BBB,"A very long line of text,with embedded commas in it",CCC
    AAA,"A very long line of text,with embedded commas in it",BBB,CCC
    "A very long line of text,with embedded commas in it",AAA,BBB,CCC

    When you run this, it outputs:

    D:\Home\asu1\UseNet\clpmisc> table
    AAA BBB CCC A very long line
    of text,with
    embedded commas
    in it
    AAA BBB A very long line CCC
    of text,with
    embedded commas
    in it
    AAA A very long line BBB CCC
    of text,with
    embedded commas
    in it
    A very long line AAA BBB CCC
    of text,with
    embedded commas
    in it

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Aug 19, 2005
    #5