Trouble with Regexps

evlika · Feb 7, 2005

Hi all,
Can't seem to find the right way to extract what I need

Here is what I have:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP ASCP [PACK 1,(2,3) FLO 21-50-04-00
DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Here is what I need:

ASCP ASCP [PACK 1,(2,3) FLO DISAG] 21-50-04-00
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00

The columns are separated by spaces, not tabs. I have no problem with
the second example. The first one has data on a second line that should
be on the first line, appended to the second column. Any thoughts?

Thanks!

Gunnar Hjalmarsson · Feb 7, 2005


Maybe something along these lines:

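my @rec;
my $i = -1;
while (<>) {
    # A line starting with '@' ends the current record.
    if( substr($_, 0, 1) eq '@' ) {
        # Trim trailing whitespace from the fields collected so far.
        map { s/\s*$// } @{ $rec[$i] } if $rec[$i];
        $i++;
        next;
    }
    no warnings qw(substr uninitialized);
    # Append each fixed-width column; continuation lines extend the fields.
    $rec[$i][0] .= substr($_, 0, 27);
    $rec[$i][1] .= substr($_, 27, 22);
    $rec[$i][2] .= substr($_, 57, 11);
}
print join("\n", @$_), "\n\n" for @rec;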
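
The same fixed-width idea can also be sketched with unpack, which splits a line by column width in one call. This is only an illustration under assumptions: the widths of 27 and 30 characters are guesses at the real layout, parse_fixed_width is a hypothetical helper name, and the sample lines are built with sprintf rather than taken from the actual file.

```perl
use strict;
use warnings;

# Assumed layout: column 1 is 27 characters wide, column 2 is 30,
# and column 3 takes whatever is left on the line.
my $TEMPLATE = 'A27 A30 A*';

# Hypothetical helper: returns one array ref of three fields per record.
sub parse_fixed_width {
    my @lines = @_;
    my (@records, $current);
    for my $line (@lines) {
        chomp(my $text = $line);
        if ($text =~ /^\@/) {              # separator line closes the record
            push @records, $current if $current;
            undef $current;
            next;
        }
        $current ||= ['', '', ''];
        my @cols = unpack $TEMPLATE, $text;
        for my $i (0 .. 2) {
            my $piece = defined $cols[$i] ? $cols[$i] : '';
            $piece =~ s/^\s+//;            # 'A' already strips trailing spaces
            next if $piece eq '';
            # Continuation text is appended with a single space.
            $current->[$i] = $current->[$i] eq ''
                ? $piece
                : "$current->[$i] $piece";
        }
    }
    push @records, $current if $current;   # no trailing separator
    return @records;
}

my $sep    = '@' x 68;
my @sample = (
    $sep,
    sprintf('%-27s%-30s%s', 'ASCP', 'ASCP [PACK 1,(2,3) FLO', '21-50-04-00'),
    sprintf('%-27s%s', '', 'DISAG]'),
    $sep,
    sprintf('%-27s%-30s%s', 'AUTO DISABLE RL', 'AUTO DISABLE RL', '21-31-04-00'),
    $sep,
);

print join('  ', @$_), "\n" for parse_fixed_width(@sample);
```

If the real columns have different widths, only the template string needs to change.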

A. Sinan Unur · Feb 7, 2005


This is one of those cases where the lowly substr comes in handy. I have
a feeling someone will post something infinitely neater, but if the
column widths are always the same, then you can do something along the
lines of:

#! /usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my @parsed;

{
    # Read one record at a time: the separator line is the record terminator.
    local $/ = "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n";
    while(my $line = <DATA>) {
        chomp $line;
        my @segments = split "\n", $line;
        next unless @segments;

        # The first line of the record holds all three columns.
        my $segment = shift @segments;

        my $left = substr $segment, 0, 26;
        my $mid = substr $segment, 27, 30;
        my $right = substr $segment, 58;

        $left =~ s/\s+$//g;
        $right =~ s/^\s+//g;
        $mid =~ s/^\s+//g;
        $mid =~ s/\s+$//g;

        # Any remaining lines are continuations of the middle column.
        for my $s (@segments) {
            $s =~ s/^\s+//g;
            $s =~ s/\s+$//g;
            $mid .= $s;
        }
        push @parsed, {
            field1 => $left,
            field2 => $mid,
            field3 => $right,
        };
    }
}

print Dumper \@parsed;

__DATA__

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP ASCP [PACK 1,(2,3) FLO 21-50-04-00
DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

D:\Home\asu1\UseNet\clpmisc> b
$VAR1 = [
          {
            'field1' => 'ASCP',
            'field2' => 'ASCP [PACK 1,(2,3) FLODISAG]',
            'field3' => '1-50-04-00'
          },
          {
            'field1' => 'AUTO DISABLE RL',
            'field2' => 'AUTO DISABLE RL',
            'field3' => '1-31-04-00'
          }
        ];

A. Sinan Unur · Feb 7, 2005

my $right = substr $segment, 58;

Despite my assertions, it seems like I really don't know how to count.
That should be:

my $right = substr $segment, 57;

Sinan
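
The off-by-one is easy to see on a line built to a known layout. A minimal sketch, assuming (as the offsets in the code suggest) a 27-character first column and a 30-character second column; the sample line is constructed with sprintf for illustration and is not the real input:

```perl
use strict;
use warnings;

# Assumed layout: column 1 is 27 characters wide, column 2 is 30.
my $line = sprintf '%-27s%-30s%s',
    'ASCP', 'ASCP [PACK 1,(2,3) FLO', '21-50-04-00';

# substr offsets are zero-based, so column 3 starts at 27 + 30 = 57.
my $right_ok  = substr $line, 57;
my $right_off = substr $line, 58;   # one too far: drops the leading "2"

print "$right_ok\n";    # 21-50-04-00
print "$right_off\n";   # 1-50-04-00
```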

Jeffrey Ross · Feb 7, 2005


Assuming that your data is in a file called infile, that the 'columns'
are always separated by at least 2 consecutive spaces, that 2 consecutive
spaces are not present within a 'column', that the optional continuation
line starts with a space, and that "@" in column 1 indicates a separation
line:

awk -F"[ ][ ]+" '
    $0 ~ /^@/ {print f1, f2, f3; f1=f2=f3=""; next}
    f1 == "" {f1=$1; f2=$2; f3=$3; next}
    # continuation line: drop its leading spaces, then append to f2
    {sub(/^ +/, ""); f2=f2 " " $0; next}
    END {print f1, f2, f3}
' <infile

In English this may be interpreted as:

Use awk with two or more spaces as the field separator.
If a line starts with "@", print f1, f2, and f3, clear them, and skip to
the next line.
If f1 is empty, store field1 in f1, field2 in f2, and field3 in f3, and
skip to the next line.
Otherwise, append this line (which better be the continuation line!) to
f2 and skip to the next line.
Once the last line has been processed, print f1, f2, and f3 (in case
there's no final separator line).
The data is read from infile.

Note that I have not tested this, so it may not be quite right, but it
should give you a start. It will print a blank line at the beginning and
maybe another at the end. You can avoid that by using
"if (f1 != "") print ...". If the assumptions above do not match your
data, this solution probably won't work.
It's probably cleaner in Perl, but I'm more of an awk expert. It would be
much better if you could generate your data with clearer column divisions.
Regards,
Jeffrey.
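
For comparison, the same record-joining logic can be sketched in Perl. This is an illustration under the assumptions listed in this post (two or more spaces between columns, "@" in column 1 marks a separator); join_continuations is a hypothetical helper name and the sample lines are constructed in the script, not read from infile.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one joined output line per record.
sub join_continuations {
    my @lines = @_;
    my (@out, @f);
    for my $line (@lines) {
        chomp(my $text = $line);
        if ($text =~ /^\@/) {                 # separator: flush the record
            push @out, join(' ', @f) if @f;   # skip empty records
            @f = ();
            next;
        }
        if (!@f) {                            # first line of a record
            @f = split / {2,}/, $text;
            next;
        }
        # Anything else is a continuation of the second column.
        $text =~ s/^\s+|\s+$//g;
        if (length $text) {
            $f[1] = defined $f[1] ? "$f[1] $text" : $text;
        }
    }
    push @out, join(' ', @f) if @f;           # no final separator line
    return @out;
}

my $gap    = ' ' x 3;                          # at least two spaces
my $sep    = '@' x 68;
my @sample = (
    $sep,
    join($gap, 'ASCP', 'ASCP [PACK 1,(2,3) FLO', '21-50-04-00'),
    $gap . 'DISAG]',
    $sep,
    join($gap, 'AUTO DISABLE RL', 'AUTO DISABLE RL', '21-31-04-00'),
    $sep,
);

print "$_\n" for join_continuations(@sample);
```

Checking for an empty record before printing avoids the blank lines that the awk version emits at the first separator.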

Bob Walton · Feb 7, 2005


Well, the actual format of your incoming data is not apparent, so
any responses will have to be based upon assumptions. For example,
with my assumptions indicated in []:

Is the "second field" the only one that can be continued onto the
next line, or can the "first field" and "third field" also be
continued sometimes? [the first and second fields may be
continued; the third may not, since there is no way to specify a
third-field continuation without a non-empty second-field
continuation, unless the fields are column-based]

Are "records" always separated with a line containing nothing but
a bunch of @'s? [yes]

Can there be two, three, or more continuation lines, or is it
limited to just one? [indefinite number]

Are the input "fields" delimited by two or more space characters,
or do they occur within specific "columns"? [two or more space
characters, implies a field cannot contain two or more
consecutive space characters]

Is there always a @-line at the start of the data? At the end of
the data? [yes, always @-line at beginning and end]

When continuations are appended, is a space character inserted? [yes]

Given those assumptions:

use strict;
use warnings;
my @fields;
while(<DATA>){
    chomp;
    if(/^\@+$/){
        #remove unwanted extra spaces
        for my $f(@fields){
            $f=~s/^ +//;
            $f=~s/ +$//;
            $f=~s/ +/ /g;
        }
        print "$fields[0] $fields[1] $fields[2]\n"
            if @fields;
        @fields=();
        next;
    }
    else{
        my @pf=$_=~/(.*?)(?: ?$| {2,}(.*?)(?: ?$| {2,}(.*)))/;
        die "Input error" unless @pf;
        no warnings 'uninitialized';
        for my $i(0..2){
            $fields[$i].=' '.$pf[$i];
        }
    }
}
__END__
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ASCP ASCP [PACK 1,(2,3) FLO 21-50-04-00
DISAG]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
AUTO DISABLE RL AUTO DISABLE RL 21-31-04-00
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

generates the output you say you want (with some liberty taken to
prevent wrapping of your data lines -- I shortened them a bit).
....
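
The field-splitting pattern is dense, so it may help to see what it captures for each kind of line. A small sketch using the same pattern; the helper name split_fields is made up for illustration, and the input lines are built with explicit two- and three-space gaps rather than copied from the real data.

```perl
use strict;
use warnings;

# Same pattern as in the script above, wrapped in an illustrative helper.
# In list context the match returns the three captures.
sub split_fields {
    my ($line) = @_;
    return $line =~ /(.*?)(?: ?$| {2,}(.*?)(?: ?$| {2,}(.*)))/;
}

# A full line: three fields separated by two spaces each.
my @full = split_fields(
    'ASCP' . '  ' . 'ASCP [PACK 1,(2,3) FLO' . '  ' . '21-50-04-00'
);

# A continuation line: leading spaces, then text for the second field.
my @cont = split_fields('   ' . 'DISAG]');

no warnings 'uninitialized';    # the third capture is undef for @cont
print join('|', @full), "\n";   # ASCP|ASCP [PACK 1,(2,3) FLO|21-50-04-00
print join('|', @cont), "\n";   # |DISAG]|
```

The empty first capture on the continuation line is what lets the loop append "DISAG]" to the second field while leaving the first untouched.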

ioneabu · Feb 7, 2005


#!/usr/bin/perl

use strict;
use warnings;

my $arrayref = [ ];
my @record;
while (<>)
{
    if ($_ !~ /^@+$/)
    {
        @record = /(.+)\s{2,}(.+)\s{2,}(.+)/;
    }
    push @$arrayref, [ @record ];
}

for (@$arrayref)
{
    print "$_->[0]\t$_->[1]\t$_->[2]\n" if scalar @$_ == 3;
}

Gunnar Hjalmarsson · Feb 8, 2005

if ($_ !~ /^@+$/)

--------------------^^^^^^
What do you think that does? You probably mean:

if ( $_ !~ /^\@+$/ )

or (cleaner)

unless ( /^\@+$/ )

{
@record = /(.+)\s{2,}(.+)\s{2,}(.+)/;
}
push @$arrayref, [ @record ];

The push() statement should be in the inner block, shouldn't it?

{
@record = /(.+)\s{2,}(.+)\s{2,}(.+)/;
push @$arrayref, [ @record ];
}

print "$_->[0]\t$_->[1]\t$_->[2]\n" if scalar @$_ == 3;

---------------------------------------------^^^^^^^^^^^^^^^^^^^
With the above changes, that condition is redundant.

Nevertheless, I think you missed the point. What's now $arrayref->[0]
and $arrayref->[1] should be merged to one record. See above quote from
the OP.
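
One way the merge could look, keeping the array-of-arrays approach: a line that follows a separator starts a new record, and any other line is appended to the second field of the record already open. This is a sketch under the assumption that columns are separated by two or more spaces; merge_records is a hypothetical name and the sample input is built in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [field1, field2, field3] records.
sub merge_records {
    my @lines = @_;
    my @records;
    my $new_record = 1;                  # true until a record's first line is seen
    for my $line (@lines) {
        chomp(my $text = $line);
        $text =~ s/\s+$//;
        if ($text =~ /^\@+$/) {          # separator: next data line starts a record
            $new_record = 1;
            next;
        }
        if ($new_record) {
            my @fields = $text =~ /^(.+?) {2,}(.+?) {2,}(.+)$/;
            next unless @fields == 3;    # skip lines that do not parse
            push @records, [@fields];
            $new_record = 0;
            next;
        }
        # Continuation line: append to field 2 of the open record.
        $text =~ s/^\s+//;
        $records[-1][1] .= " $text" if @records && length $text;
    }
    return @records;
}

my $sep    = '@' x 68;
my @sample = (
    $sep,
    'ASCP' . '  ' . 'ASCP [PACK 1,(2,3) FLO' . '  ' . '21-50-04-00',
    '      ' . 'DISAG]',
    $sep,
    'AUTO DISABLE RL' . '  ' . 'AUTO DISABLE RL' . '  ' . '21-31-04-00',
    $sep,
);

print join("\t", @$_), "\n" for merge_records(@sample);
```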

ioneabu · Feb 8, 2005


Thanks for the tips. I was just looking at the visual format of the
input and desired output so I didn't totally get what he wanted.

wana

evlika · Feb 9, 2005

Outstanding suggestions. I took a little from each one. Thanks much!