sorting text

jamasd · Jun 16, 2004

Here is a sample of my data (each column is separated by tabs):

1234123 jaesdf ytkyk 345234
1264345 ghgfdf ghjhg 657658
3456765 sdasdf ytkyk 456543
1231232 assffg werwe 123454
5447454 asdqfr ytkyk 254364

I am interested in creating a hash with two of the elements in the
list ("ytkyk" and "ghjhg"). I would like to create a program that reads
only the third column and prints the line (row) if it contains one of
those items. Can anyone help me write such a program? Here is what I
have so far; I would like to make it more efficient (I am
going to use it in a larger program later):

open( File, '<', 'file.txt' ) or die "$!\n";
while ( <File> ) {
next unless ( index($_, 'ytkyk') >= 0 );
next unless ( index($_, 'ghjhg') >= 0 );
print;
}
close( File );

Thank you very much.

Gunnar Hjalmarsson · Jun 16, 2004

What makes you believe that what you have is not efficient?

John Bokma · Jun 16, 2004

my $filename = 'file.txt';
open my $fh, $filename or die "Can't open '$filename' for reading:$!";

next unless index($_, 'ytkyk');

The >= 0 test can be replaced, since it's clear it's not the first
position. Even better, (I guess) check the string at the exact position

close $fh or die "Can't close '$filename' after reading: $!";

What makes you believe that what you have is not efficient?

Maybe the OP forgot to explain the "sorting" part :-D.

Web Surfer · Jun 16, 2004

[This followup was posted to comp.lang.perl.misc]

### Try this untested code ###

#!/usr/bin/perl
use strict;
use warnings;

my ( $buffer , @fields , $filename , %hash1 );

$filename = "file.txt";
open(INPUT,"<$filename") or
    die("Can't open file \"$filename\" : $!\n");

# Values to look for in the third column
%hash1 = ( "ytkyk" => 1 , "ghjhg" => 1 );

while ( $buffer = <INPUT> ) {
    chomp $buffer;
    @fields = split(/\t+/,$buffer);
    if ( 3 > @fields ) { # Ignore if less than 3 fields
        next;
    }
    unless ( exists $hash1{$fields[2]} ) {
        next;
    }
    print "$buffer\n";
}
close INPUT;

John Bokma · Jun 16, 2004

while ( $buffer = <INPUT> ) {
chomp $buffer;

Why? Now you have to add back the \n in the print.

@fields = split(/\t+/,$buffer);
if ( 3 > @fields ) { # Ignore if less than 3 fields
next;

Silly, the OP never specified that could happen. There are 4 fields, btw, so
I would test for inequality, not less than.
Don't see any point in putting the constant to the left, btw. Silly C
coding convention IIRC.

Jeff 'japhy' Pinyan · Jun 16, 2004

Silly, the OP never specified that could happen. There are 4 fields, btw, so
I would test for inequality, not less than.

Because it was the *third* field that contained the string the OP is
searching for. Thus, skip any line that doesn't have enough fields.

Don't see any point in putting the constant to the left, btw. Silly C
coding convention IIRC.

There's nothing wrong with it. It's not "silly". There is a point to it.
It stops you from accidentally writing = instead of == if you mean to do a
comparison. Compare:

if ($foo = 2) { ... }

to

if (2 = $foo) { ... }

The coder *meant* to write ==, but only did =. The first one is not an
error, and the if block is reached all the time. The second one IS an
error.
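
A minimal sketch of that difference, assuming Perl 5 with warnings enabled. The two snippets are compiled through string eval only so that the compile-time warning and error can be captured and printed; the variable names ($reached, $hit, $ok) are illustrative and not from the thread.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Assignment written where a comparison was meant: this compiles, and the
# condition is always true because the assigned value (2) is true.
my $reached = eval q{
    my $foo = 0;
    my $hit = 0;
    if ($foo = 2) { $hit = 1 }
    $hit;
};

# Constant on the left: the same typo fails to compile, so eval returns
# undef and the message lands in $@.
my $ok = eval q{
    my $foo = 0;
    if (2 = $foo) { 1 }
    1;
};
my $error = $@;

print "if block reached: $reached\n";
print "warning: $warnings[0]" if @warnings;
print "error: $error" unless $ok;
```

With warnings on, the first form is still flagged, but only as a warning; the second form does not run at all.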

Gunnar Hjalmarsson · Jun 16, 2004

John said:
next unless index($_, 'ytkyk');

The >= 0 test can be replaced, since it's clear it's not the first
position.

No, it can't. If the string is not found in $_, index() returns -1
which is a true value.
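
A small sketch of why the bare truth test fails in both directions. The sample line is hypothetical and deliberately starts with the search string, so that position 0 shows up as well.

```perl
use strict;
use warnings;

# Hypothetical line that begins with one of the search strings.
my $line = "ytkyk\tjaesdf\t345234\n";

my $pos_hit  = index($line, 'ytkyk');   # found at the first position
my $pos_miss = index($line, 'ghjhg');   # not found

# Bare truth test: position 0 is false, and the "not found" value is true.
my $bare_hit  = $pos_hit  ? 1 : 0;
my $bare_miss = $pos_miss ? 1 : 0;

# Explicit comparison: the intended answer in both cases.
my $cmp_hit  = $pos_hit  >= 0 ? 1 : 0;
my $cmp_miss = $pos_miss >= 0 ? 1 : 0;

print "positions:  hit=$pos_hit miss=$pos_miss\n";
print "bare test:  hit=$bare_hit miss=$bare_miss\n";
print ">= 0 test:  hit=$cmp_hit miss=$cmp_miss\n";
```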

Maybe the OP forgot to explain the "sorting" part :-D.

Maybe. But it just struck me that the code will not print anything. I
would believe that this is what the OP meant to do:

while ( <File> ) {
print and next if index($_, 'ytkyk') >= 0;
print and next if index($_, 'ghjhg') >= 0;
}

Gunnar Hjalmarsson · Jun 16, 2004

Would a hash creation and involving the regex engine (through split())
be more efficient? What would a benchmark result in?
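
One way to find out is the core Benchmark module. The following is only a sketch under stated assumptions: it uses the five sample rows from the original post held in memory rather than a real file, and the sub names by_index and by_split are made up for illustration. Note that the two approaches are not equivalent: index() matches the string anywhere in the line, while the split version looks only at the third column. No timings are shown here because they depend on the machine and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample rows from the original post, tab-separated.
my @lines = map { "$_\n" } (
    "1234123\tjaesdf\tytkyk\t345234",
    "1264345\tghgfdf\tghjhg\t657658",
    "3456765\tsdasdf\tytkyk\t456543",
    "1231232\tassffg\twerwe\t123454",
    "5447454\tasdqfr\tytkyk\t254364",
);

my %wanted = map { $_ => 1 } qw(ytkyk ghjhg);

# Approach 1: substring search anywhere in the line.
sub by_index {
    my @hits = grep { index($_, 'ytkyk') >= 0 || index($_, 'ghjhg') >= 0 } @lines;
    return @hits;
}

# Approach 2: split on tabs and look up the third column in a hash.
sub by_split {
    my @hits = grep {
        my @fields = split /\t/, $_;
        @fields >= 3 && exists $wanted{ $fields[2] };
    } @lines;
    return @hits;
}

# Run each approach for about one CPU second and print a comparison table.
cmpthese(-1, {
    'index_scan' => sub { my @hits = by_index() },
    'split_hash' => sub { my @hits = by_split() },
});
```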

John Bokma · Jun 17, 2004

Jeff said:
Because it was the *third* field that contained the string the OP is
searching for. Thus, skip any line that doesn't have enough fields.

Did the specification ever say that there could be fewer than 4
fields? No.

There's nothing wrong with it. It's not "silly". There is a point to it.
It stops you from accidentally writing = instead of == if you mean to do a
comparison. Compare:

if ($foo = 2) { ... }

Found = in conditional, should be ==

The coder *meant* to write ==, but only did =. The first one is not an
error, and the if block is reached all the time. The second one IS an
error.

No, it's an error if your compiler, interpreter, etc. doesn't *WARN*
you. And a programmer turning off those warnings is silly.

Most C and C++ compilers do warn, as does Perl (with use strict, use
warnings). It is IMNSHO a stupid coding convention: illogical,
unreadable, weird. Especially with *inequalities*, as the prev post used.

John Bokma · Jun 17, 2004

Gunnar said:
No, it can't. If the string is not found in $_, index() returns -1
which is a true value.

Arrgh, stupid of me.

Eric Bohlman · Jun 18, 2004

Found = in conditional, should be ==

No, it's an error if your compiler, interpreter, etc. doesn't *WARN*
you. And a programmer turning off those warnings is silly.

Most C and C++ compilers do warn, as does Perl (with use strict, use
warnings). It is IMNSHO a stupid coding convention: illogical,
unreadable, weird. Especially with *inequalities*, as the prev post
used.

Whether or not one adopts (or is forced by local coding standards to adopt)
that particular convention with regard to tests for equality, it's
ridiculously rigid to invert the sense of relational comparisons for no
other reason than "putting the constant on the left." That really smacks
of a failure to think abstractly leading to an inability to distinguish
means from ends. In this case, the means that *may* help to achieve the
end of making equality comparisons less error-prone winds up, when applied
blindly, making other kinds of comparisons *more* error-prone.
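
A short illustration of how that can go wrong, assuming the intent is "fewer than 3 fields". The two-element row and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical row that has only 2 fields.
my @fields = ('1231232', 'assffg');

# Intended test: "fewer than 3 fields".
my $natural  = @fields < 3;   # reads the way the intent is stated
my $flipped  = 3 > @fields;   # constant on the left, operator reversed correctly
my $mistaken = 2 < @fields;   # constant on the left, but this means "more than 2"

printf "natural=%d flipped=%d mistaken=%d\n",
    $natural ? 1 : 0, $flipped ? 1 : 0, $mistaken ? 1 : 0;
```

Moving the constant to the left of a relational operator means the operator has to be reversed as well, which is one more thing to get wrong.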
