sorting text

jamasd

Here is a sample of my data (each column is separated by tabs):

1234123 jaesdf ytkyk 345234
1264345 ghgfdf ghjhg 657658
3456765 sdasdf ytkyk 456543
1231232 assffg werwe 123454
5447454 asdqfr ytkyk 254364

I am interested in creating a hash with two of the elements in the
list ("ytkyk" and "ghjhg"). I would like to create a program to read
only the third column and print the line (row) if it contains one of
those items. Can anyone help me write such a program? Here is what I
have so far; I would like to make it more efficient (I am
going to use it in a larger program later):

open( File, '<', 'file.txt' ) or die "$!\n";
while ( <File> ) {
    next unless ( index($_, 'ytkyk') >= 0 );
    next unless ( index($_, 'ghjhg') >= 0 );
    print;
}
close( File );

Thank you very much.
 
Gunnar Hjalmarsson

What makes you believe that what you have is not efficient?
 
John Bokma

my $filename = 'file.txt';
open my $fh, $filename or die "Can't open '$filename' for reading:$!";

next unless index($_, 'ytkyk');

The >= 0 test can be replaced, since it's clear it's not the first
position. Even better (I guess), check the string at the exact position.

close $fh or die "Can't close '$filename' after reading: $!";

Gunnar said:
What makes you believe that what you have is not efficient?

Maybe the OP forgot to explain the "sorting" part :-D.
 
Web Surfer

[This followup was posted to comp.lang.perl.misc]

### Try this untested code ###

#!/usr/bin/perl
use strict;
use warnings;

my ( $buffer , @fields , $filename , %hash1 );

$filename = "file.txt";
open(INPUT,"<$filename") or
    die("Can't open file \"$filename\" : $!\n");

%hash1 = ( "ytkyk" => 1 , "ghjhg" => 1 );

while ( $buffer = <INPUT> ) {
    chomp $buffer;
    @fields = split(/\t+/,$buffer);
    if ( 3 > @fields ) { # Ignore if less than 3 fields
        next;
    }
    unless ( exists $hash1{$fields[2]} ) {
        next;
    }
    print "$buffer\n";
}
close INPUT;
 
John Bokma

Web said:
while ( $buffer = <INPUT> ) {
chomp $buffer;

Why? Now you have to add back the \n in the print.

Web said:
@fields = split(/\t+/,$buffer);
if ( 3 > @fields ) { # Ignore if less than 3 fields
next;

Silly, the OP never specified that could happen. There are 4 fields, btw, so
I would test for inequality, not less than.
Don't see any point in putting the constant to the left, btw. Silly C
coding convention, IIRC.
 
Jeff 'japhy' Pinyan

John said:
Silly, the OP never specified that could happen. There are 4 fields, btw, so
I would test for inequality, not less than.

Because it was the *third* field that contained the string the OP is
searching for. Thus, skip any line that doesn't have enough fields.

John said:
Don't see any point in putting the constant to the left, btw. Silly C
coding convention, IIRC.

There's nothing wrong with it. It's not "silly". There is a point to it.
It stops you from accidentally writing = instead of == if you mean to do a
comparison. Compare:

if ($foo = 2) { ... }

to

if (2 = $foo) { ... }

The coder *meant* to write ==, but only did =. The first one is not an
error, and the if block is reached all the time. The second one IS an
error.
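
The difference described above can be checked with a small sketch. It compiles both forms at run time with a string eval, so that a compile-time failure is captured in $@ instead of stopping the script. The helper name compile_error is made up for this illustration, and warnings are switched off inside it only to keep the output quiet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet with string eval and
# return whatever error message Perl produced (empty string if none).
sub compile_error {
    my ($code) = @_;
    no warnings;    # silence the "Found = in conditional" warning here
    eval $code;
    return $@;
}

# Variable on the left: a legal assignment, so the typo goes unnoticed.
my $err_var_left = compile_error('my $foo = 0; if ($foo = 2) { 1 }');

# Constant on the left: assigning to a constant is rejected at compile time.
my $err_const_left = compile_error('my $foo = 0; if (2 = $foo) { 1 }');

print "variable on the left: ", ( $err_var_left   ? "error" : "compiles" ), "\n";
print "constant on the left: ", ( $err_const_left ? "error" : "compiles" ), "\n";
```

With warnings left on, the first form would still compile, but Perl would report it as a warning rather than an error.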
 
Gunnar Hjalmarsson

John said:
next unless index($_, 'ytkyk');

The >= 0 test can be replaced, since it's clear it's not the first
position.

No, it can't. If the string is not found in $_, index() returns -1,
which is a true value.

John said:
Maybe the OP forgot to explain the "sorting" part :-D.

Maybe. But it just struck me that the code will not print anything. I
would believe that this is what the OP meant to do:

while ( <File> ) {
    print and next if index($_, 'ytkyk') >= 0;
    print and next if index($_, 'ghjhg') >= 0;
}
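
The point about index() returning -1 can be checked with a short sketch. The sample strings are illustrative only: the first is a row from the question that contains neither search string, the second is made up to show a match at position 0.

```perl
use strict;
use warnings;

# A row that contains neither search string.
my $line = "1231232\tassffg\twerwe\t123454\n";

my $pos_missing = index( $line, 'ytkyk' );          # not found
my $pos_start   = index( 'ytkyk rest', 'ytkyk' );   # found at the very start

# A bare boolean test gets both cases wrong: -1 is true and 0 is false.
my $bare_missing = $pos_missing ? 1 : 0;
my $bare_start   = $pos_start   ? 1 : 0;

# The explicit >= 0 test gets both cases right.
my $checked_missing = $pos_missing >= 0 ? 1 : 0;
my $checked_start   = $pos_start   >= 0 ? 1 : 0;

print "missing: index=$pos_missing bare=$bare_missing checked=$checked_missing\n";
print "start:   index=$pos_start bare=$bare_start checked=$checked_start\n";
```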
 
Gunnar Hjalmarsson

Would a hash creation and involving the regex engine (through split())
be more efficient? What would a benchmark result in?
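
One way to find out is the core Benchmark module. The sketch below is only a starting point: the sub names filter_index and filter_hash are invented here, the data is the five sample rows from the question, and timings on such a tiny input say little about a real file. Note also that the two approaches are not strictly equivalent, since index() matches the string anywhere in the line while the hash lookup checks the third column only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample rows from the question, rebuilt as tab-separated lines.
my @lines = map { join( "\t", @$_ ) . "\n" } (
    [qw(1234123 jaesdf ytkyk 345234)],
    [qw(1264345 ghgfdf ghjhg 657658)],
    [qw(3456765 sdasdf ytkyk 456543)],
    [qw(1231232 assffg werwe 123454)],
    [qw(5447454 asdqfr ytkyk 254364)],
);

my %wanted = map { $_ => 1 } qw(ytkyk ghjhg);

# Approach 1: substring search anywhere in the line.
sub filter_index {
    return grep { index( $_, 'ytkyk' ) >= 0 || index( $_, 'ghjhg' ) >= 0 } @_;
}

# Approach 2: split on tabs and look up the third column in a hash.
sub filter_hash {
    return grep {
        my @fields = split /\t/, $_;
        @fields >= 3 && exists $wanted{ $fields[2] };
    } @_;
}

# Run each approach for at least one CPU second and print a comparison table.
cmpthese( -1, {
    index_scan  => sub { my @hits = filter_index(@lines) },
    hash_lookup => sub { my @hits = filter_hash(@lines) },
} );
```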
 
John Bokma

Jeff said:
Because it was the *third* field that contained the string the OP is
searching for. Thus, skip any line that doesn't have enough fields.

Did the specification ever say that there could be fewer than 4
fields? No.

Jeff said:
There's nothing wrong with it. It's not "silly". There is a point to it.
It stops you from accidentally writing = instead of == if you mean to do a
comparison. Compare:

if ($foo = 2) { ... }

Found = in conditional, should be ==

Jeff said:
The coder *meant* to write ==, but only did =. The first one is not an
error, and the if block is reached all the time. The second one IS an
error.

No, it's an error if your compiler, interpreter, etc. doesn't *WARN*
you. And a programmer turning off those warnings is silly.

Most C, C++ compilers do warn, as does Perl (with use strict, use
warnings). It is IMNSHO a stupid coding convention, illogical,
unreadable, weird. Especially with *inequalities* as the prev post used.
 
Eric Bohlman

John said:
Found = in conditional, should be ==

No, it's an error if your compiler, interpreter, etc. doesn't *WARN*
you. And a programmer turning off those warnings is silly.

Most C, C++ compilers do warn, as does Perl (with use strict, use
warnings). It is IMNSHO a stupid coding convention, illogical,
unreadable, weird. Especially with *inequalities* as the prev post
used.

Whether or not one adopts (or is forced by local coding standards to adopt)
that particular convention with regard to tests for equality, it's
ridiculously rigid to invert the sense of relational comparisons for no
other reason than "putting the constant on the left." That really smacks
of a failure to think abstractly leading to an inability to distinguish
means from ends. In this case, the means that *may* help to achieve the
end of making equality comparisons less error-prone winds up, when applied
blindly, making other kinds of comparisons *more* error-prone.
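
A minimal sketch of how that can go wrong, using a made-up two-field row: moving the constant to the left of a relational test only preserves the meaning if the operator is flipped as well.

```perl
use strict;
use warnings;

# Hypothetical row with only two fields, i.e. too short to have a third column.
my @fields = qw(1234123 jaesdf);

# Natural form: "fewer than 3 fields".
my $natural = @fields < 3 ? 1 : 0;

# Constant moved to the left with the operator flipped: same meaning.
my $mirrored_right = 3 > @fields ? 1 : 0;

# Constant moved to the left without flipping: this asks "more than 2 fields".
my $mirrored_wrong = 2 < @fields ? 1 : 0;

print "natural=$natural mirrored_right=$mirrored_right mirrored_wrong=$mirrored_wrong\n";
```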
 
