How can I add tokens at arbitrary positions on a line in a file?

John Howard

I need to edit a file and add tokens at arbitrary positions on a line or
lines.

I tried doing this with sed but it's clumsy (sed can do it for fixed
locations but not arbitrary).

For example, if I have a file with lines that may be up to 255 chars
wide, I need to place tokens at, say, near the 100 char position, the
150th, the 200th, etc.

The position is arbitrary because it depends on the nearest comma
before that position. Basically, the lines consist of words separated
by commas. I need to place tokens just after the nearest comma prior to
those positions. The positions are relative because the lines vary in
length (if that makes sense).

The lines also contain trailing blanks and some tabs. I'm stripping
the trailing blanks and changing the tabs to spaces with sed but it
would probably be better if I did that at the same time with the same
bit of Perl. (Nothing wrong with using sed here but it makes sense to
just do it all at once with Perl. With sed I am using an intermediate
file but I understand I can edit the file in question in situ with
Perl.)

Can anyone help me with an example?

Thanks in advance.
 
Matt Garrish

John Howard said:
I need to edit a file and add tokens at arbitrary positions on a line or
lines.

I tried doing this with sed but it's clumsy (sed can do it for fixed
locations but not arbitrary).

For example, if I have a file with lines that may be up to 255 chars
wide, I need to place tokens at, say, near the 100 char position, the
150th, the 200th, etc.

The position is arbitrary because it depends on the nearest comma
before that position. Basically, the lines consist of words separated
by commas. I need to place tokens just after the nearest comma prior to
those positions. The positions are relative because the lines vary in
length (if that makes sense).

Sounds like you're making the problem harder than it needs to be. Just break
the line into fifty character chunks and insert your token where appropriate
(watch for wrapping in the data section). Processing the tabs and spaces at
the end of the line I leave to you:

use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;

    my $cnt     = 0;
    my $newline = '';

    # .{1,50} rather than .{50}, so the final, shorter chunk is kept
    foreach my $chunk ($line =~ /(.{1,50})/g) {

        # only chunks 2 to 4 (columns 51-200) get a marker, placed
        # after the last comma in the chunk
        if ($cnt >= 1 && $cnt <= 3) {
            $chunk =~ s/(.*),/$1,<mark>/;
        }

        $newline .= $chunk;
        $cnt += 1;
    }

    print "$newline\n";
}

__DATA__
word1, word2, word3, word4, word5, word6, word7, word8, word8, word6, word7,
word5, word5, word6, word7, word8, word1, word2, word3, word4, word5, word6,
word7, word8, word5, word6, word7, word8, word1, word2, word3, word4, word5,
word6, word7, word8, word5, word6, word7, word8, word1, word2, word3, word4,
word5, word6, word7, word8
 
John W. Krahn

John said:
I need to edit a file and add tokens at arbitrary positions on a line or
lines.

I tried doing this with sed but it's clumsy (sed can do it for fixed
locations but not arbitrary).

For example, if I have a file with lines that may be up to 255 chars
wide, I need to place tokens at, say, near the 100 char position, the
150th, the 200th, etc.

The position is arbitrary because it depends on the nearest comma
before that position. Basically, the lines consist of words separated
by commas. I need to place tokens just after the nearest comma prior to
those positions. The positions are relative because the lines vary in
length (if that makes sense).

The lines also contain trailing blanks and some tabs. I'm stripping
the trailing blanks and changing the tabs to spaces with sed but it
would probably be better if I did that at the same time with the same
bit of Perl. (Nothing wrong with using sed here but it makes sense to
just do it all at once with Perl. With sed I am using an intermediate
file but I understand I can edit the file in question in situ with
Perl.)

Can anyone help me with an example?

perl -i.bak -lpe'
s/(\s+)$/ ($a = $1) =~ y!\t ! !d; $a /e;
for my $pos ( 200, 150, 100 ) { s/(^.{1,$pos},)/$1 token / }
' yourfile
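
The same idea can be written as a standalone routine, which may be easier to test than a one-liner. This is only a sketch and not John's code: the helper name add_tokens, the '<T>' token and the sample positions are assumptions made for illustration, and the whitespace cleanup is a plain "strip trailing whitespace, turn tabs into spaces" reading of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): for each requested position,
# insert $token after the last comma found within the first $pos+1
# characters of the line.
sub add_tokens {
    my ($line, $token, @positions) = @_;

    $line =~ s/\s+$//;    # strip trailing blanks, tabs and the newline
    $line =~ tr/\t/ /;    # change any remaining tabs to spaces

    # Work from the highest position down, so that text inserted further
    # right does not shift the columns that are still to be processed.
    for my $pos (sort { $b <=> $a } @positions) {
        $line =~ s/^(.{1,$pos},)/$1$token/;
    }
    return $line;
}

# Small positions are used here only to keep the sample line short.
print add_tokens("aaa,bbb,ccc,ddd,eee \t", '<T>', 5, 10), "\n";
```

For the file described in the question, the call would presumably look like add_tokens($line, ' token ', 100, 150, 200) inside a loop over the input lines.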



John
 
John Howard

Matt said:
Sounds like you're making the problem harder than it needs to be.

Thanks for the reply. Unfortunately, it is.

Think of it as some primitive typesetting. The input file is a CSV
file. I need to skip the first 3 fields and use the 4th field. I asked
in another post how to get the length of a line but I realise now I
should have asked how to get the length of a field instead.

The input will look something like this -

AAA,BBB,CCC,"A very long line of text,with embedded commas in it"

It will end up being displayed like this -

AAA BBB CCC A very long line
            of text that will use
            embedded tokens to determine
            where to wrap around.

There will be several lines like that. The tokens need to go after the
first comma nearest but before a possible line break location. The text
after that comma must be included in the size of the next line etc.
Hence the arbitrary positions. I realise I should have made that part
clearer.

So, not as simple as it seems. I was thinking of using the line length
to determine how many tokens I might need but I think it might be
better now just to scan thru the last field of each line and work out
the positions. I could do this easily in C but I am supposed to do it
in Perl.
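
On the narrower question of getting the length of one field rather than of the whole line, a minimal sketch follows. It assumes that the core module Text::ParseWords is acceptable for splitting a quoted CSV line; the variable names are illustrative and not from the post.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = 'AAA,BBB,CCC,"A very long line of text,with embedded commas in it"';

# parse_line(delimiter, keep, string): a false "keep" drops the surrounding
# quotes, and commas inside the quoted field do not split it.
my @fields = parse_line(',', 0, $line);

my $text = $fields[3];    # the 4th field; the first three are skipped
print length($text), "\n";
```

Input with escaped quotes or embedded newlines is better handled by a dedicated CSV parser.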
 
A. Sinan Unur

John Howard said:
Thanks for the reply. Unfortunately, it is.

Think of it as some primitive typesetting. The input file is a CSV
file. I need to skip the first 3 fields and use the 4th field. I asked
in another post how to get the length of a line but I realise now I
should have asked how to get the length of a field instead.

It would be extremely helpful to us if you could nail the specification
down. Above, you say you only need to use the fourth field. Below, you
use all the fields.

Second, is there a fixed amount of space allocated to each field in each
row?

John Howard said:
The input will look something like this -

AAA,BBB,CCC,"A very long line of text,with embedded commas in it"

It will end up being displayed like this -

AAA BBB CCC A very long line
            of text that will use
            embedded tokens to determine
            where to wrap around.

What are those embedded tokens?

John Howard said:
There will be several lines like that. The tokens need to go after the
first comma nearest but before a possible line break location. The
text after that comma must be included in the size of the next line
etc. Hence the arbitrary positions. I realise I should have made that
part clearer.

You might benefit from looking into

Text::CSV_XS

for parsing and

Text::Wrap

for wrapping each field to a specific width.

John Howard said:
So, not as simple as it seems. I was thinking of using the line length
to determine how many tokens I might need but I think it might be

I am really very confused about what you mean by tokens.

Here is something that might help:

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV_XS;
use Text::Table;
use Text::Wrap;

# set to 17 to avoid line wrapping in newsreader
$Text::Wrap::columns = 17;

my @data = @{ read_data() };

my $table = Text::Table->new;
for my $row ( @data ) {
    $table->add(@$row);
}

print $table->table;

sub read_data {
    my @data;

    my $csv = Text::CSV_XS->new;

    while( my $line = <DATA> ) {
        chomp $line;
        length $line or last;
        if( $csv->parse($line) ) {
            my @fields = $csv->fields;
            $_ = wrap '', '', $_ for @fields;
            push @data, \@fields;
        } else {
            warn "Malformatted CSV line.";
        }
    }
    return \@data;
}


__DATA__
AAA,BBB,CCC,"A very long line of text,with embedded commas in it"
AAA,BBB,"A very long line of text,with embedded commas in it",CCC
AAA,"A very long line of text,with embedded commas in it",BBB,CCC
"A very long line of text,with embedded commas in it",AAA,BBB,CCC

When you run this, it outputs:

D:\Home\asu1\UseNet\clpmisc> table
AAA              BBB              CCC              A very long line
                                                   of text,with
                                                   embedded commas
                                                   in it
AAA              BBB              A very long line CCC
                                  of text,with
                                  embedded commas
                                  in it
AAA              A very long line BBB              CCC
                 of text,with
                 embedded commas
                 in it
A very long line AAA              BBB              CCC
of text,with
embedded commas
in it
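
For the comma-only break points described earlier in the thread (the token goes after the last comma that still fits, and the text after it counts towards the next line), a rough sketch follows. The helper name mark_breaks, the width of 12 and the '<br>' token are all assumptions made for illustration; the thread never says what the real token is. The token itself is not counted towards the line width.

```perl
use strict;
use warnings;

# Hypothetical helper: return $text with $token inserted after the last
# comma that still fits within $width characters of the current line.
sub mark_breaks {
    my ($text, $width, $token) = @_;

    my @pieces = split /(?<=,)/, $text;    # each piece keeps its comma
    my $out    = '';
    my $used   = 0;                        # characters on the current line

    for my $piece (@pieces) {
        # Start a new line when this piece would overflow the current one.
        # A piece longer than $width still gets a line of its own.
        if ($used > 0 && $used + length($piece) > $width) {
            $out .= $token;
            $used = 0;
        }
        $out  .= $piece;
        $used += length $piece;
    }
    return $out;
}

print mark_breaks('alpha,beta,gamma,delta,epsilon', 12, '<br>'), "\n";
```

Because the running count restarts after every token, later break points depend on where the earlier ones fell, which is the "arbitrary positions" behaviour the question asks for.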
 
