sorting text

Discussion in 'Perl Misc' started by jamasd@hotmail.com, Jun 16, 2004.

  1. Guest

    Here is a sample of my data (each column is separated by tabs):

    1234123 jaesdf ytkyk 345234
    1264345 ghgfdf ghjhg 657658
    3456765 sdasdf ytkyk 456543
    1231232 assffg werwe 123454
    5447454 asdqfr ytkyk 254364

    I am interested in creating a hash with two of the elements in the
    list ("ytkyk" and "ghjhg"). I would like to create a program to read
    only the third column and print the line (row) if it contains one of
    the latter items. Can anyone help me write a program? Here is what I
    have so far and I would like to create a more efficient program (I am
    going to use it for writing a larger program later):

    open( File, '<', 'file.txt' ) or die "$!\n";
    while ( <File> ) {
    next unless ( index($_, 'ytkyk') >= 0 );
    next unless ( index($_, 'ghjhg') >= 0 );
    print;
    }
    close( File );

    Thank you very much.
     
    , Jun 16, 2004
    #1

  2. wrote:
    > Here is a sample of my data (each column is separated by tabs):
    >
    > 1234123 jaesdf ytkyk 345234
    > 1264345 ghgfdf ghjhg 657658
    > 3456765 sdasdf ytkyk 456543
    > 1231232 assffg werwe 123454
    > 5447454 asdqfr ytkyk 254364
    >
    > I am interested in creating a hash with two of the elements in the
    > list ("ytkyk" and "ghjhg"). I would like to create a program to read
    > only the third column and print the line (row) if it contains one of
    > the latter items. Can anyone help me write a program? Here is what I
    > have so far and I would like to create a more efficient program (I am
    > going to use it for writing a larger program later):
    >
    > open( File, '<', 'file.txt' ) or die "$!\n";
    > while ( <File> ) {
    > next unless ( index($_, 'ytkyk') >= 0 );
    > next unless ( index($_, 'ghjhg') >= 0 );
    > print;
    > }
    > close( File );


    What makes you believe that what you have is not efficient?

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jun 16, 2004
    #2

  3. John Bokma Guest

    Gunnar Hjalmarsson wrote:

    > wrote:
    >
    >> Here is a sample of my data (each column is separated by tabs):
    >>
    >> 1234123 jaesdf ytkyk 345234
    >> 1264345 ghgfdf ghjhg 657658
    >> 3456765 sdasdf ytkyk 456543
    >> 1231232 assffg werwe 123454
    >> 5447454 asdqfr ytkyk 254364
    >>
    >> I am interested in creating a hash with two of the elements in the
    >> list ("ytkyk" and "ghjhg"). I would like to create a program to read
    >> only the third column and print the line (row) if it contains one of
    >> the latter items. Can anyone help me write a program? Here is what I
    >> have so far and I would like to create a more efficient program (I am
    >> going to use it for writing a larger program later):
    >>
    >> open( File, '<', 'file.txt' ) or die "$!\n";


    my $filename = 'file.txt';
    open my $fh, $filename or die "Can't open '$filename' for reading:$!";

    >> while ( <File> ) {


    while ( <$fh> ) {

    >> next unless ( index($_, 'ytkyk') >= 0 );

    next unless index($_, 'ytkyk');

    The >= 0 test can be replaced, since it's clear it's not the first
    position. Even better (I guess): check the string at the exact position.

    >> next unless ( index($_, 'ghjhg') >= 0 ); print;
    >> }
    >> close( File );


    close $fh or die "Can't close '$filename' after reading: $!";

    > What makes you believe that what you have is not efficient?


    Maybe the OP forgot to explain the "sorting" part :-D.

    --
    John MexIT: http://johnbokma.com/mexit/
    personal page: http://johnbokma.com/
    Experienced Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Jun 16, 2004
    #3
  4. Web Surfer Guest

    [This followup was posted to comp.lang.perl.misc]

    In article <>,
    says...
    > Here is a sample of my data (each column is separated by tabs):
    >
    > 1234123 jaesdf ytkyk 345234
    > 1264345 ghgfdf ghjhg 657658
    > 3456765 sdasdf ytkyk 456543
    > 1231232 assffg werwe 123454
    > 5447454 asdqfr ytkyk 254364
    >
    > I am interested in creating a hash with two of the elements in the
    > list ("ytkyk" and "ghjhg"). I would like to create a program to read
    > only the third column and print the line (row) if it contains one of
    > the latter items. Can anyone help me write a program? Here is what I
    > have so far and I would like to create a more efficient program (I am
    > going to use it for writing a larger program later):
    >
    > open( File, '<', 'file.txt' ) or die "$!\n";
    > while ( <File> ) {
    > next unless ( index($_, 'ytkyk') >= 0 );
    > next unless ( index($_, 'ghjhg') >= 0 );
    > print;
    > }
    > close( File );
    >
    > Thank you very much.
    >


    ### Try this untested code ###

    #!/usr/bin/perl
    use strict;
    use warnings;

    my ( $buffer , @fields , $filename , %hash1 );

    $filename = "file.txt";
    open(INPUT,"<$filename") or
        die("Can't open file \"$filename\" : $!\n");

    %hash1 = ( "ytkyk" => 1 , "ghjhg" => 1 );

    while ( $buffer = <INPUT> ) {
        chomp $buffer;
        @fields = split(/\t+/,$buffer);
        if ( 2 < @fields ) { # Ignore if less than 3 fields
            next;
        }
        unless ( exists $hash1{$fields[2]} ) {
            next;
        }
        print "$buffer\n";
    }
    close INPUT;
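
For readers who want something to run, here is a minimal sketch of the same idea (split on tabs, look up the third column in a hash). It is not the poster's code: the sample rows are held in memory instead of `file.txt`, `filter_rows` is a hypothetical helper name, and the field-count test is written as `@fields < 3` so that it agrees with the comment in the post (the posted `2 < @fields` is true when a line has three or more fields).

```perl
use strict;
use warnings;

# Sample rows from the original post, rebuilt with explicit tabs.
my @rows = (
    [qw(1234123 jaesdf ytkyk 345234)],
    [qw(1264345 ghgfdf ghjhg 657658)],
    [qw(3456765 sdasdf ytkyk 456543)],
    [qw(1231232 assffg werwe 123454)],
    [qw(5447454 asdqfr ytkyk 254364)],
);
my $data = '';
$data .= join("\t", @$_) . "\n" for @rows;

# The two values of interest, as hash keys for constant-time lookup.
my %wanted = map { $_ => 1 } qw(ytkyk ghjhg);

# filter_rows() is a hypothetical helper, not something from the thread.
# It returns the lines whose third tab-separated field is a key of %$wanted.
sub filter_rows {
    my ($fh, $wanted) = @_;
    my @hits;
    while ( my $line = <$fh> ) {
        chomp( my $row = $line );        # split a chomped copy, keep $line intact
        my @fields = split /\t/, $row;
        next if @fields < 3;             # skip lines with fewer than 3 fields
        push @hits, $line if exists $wanted->{ $fields[2] };
    }
    return @hits;
}

# Open the string as an in-memory file; a real run would open file.txt.
open my $fh, '<', \$data or die "Can't open in-memory data: $!";
my @hits = filter_rows($fh, \%wanted);
close $fh or die "Can't close in-memory data: $!";

print @hits;
```

Because the original line is kept untouched, there is no need to add the newline back when printing.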
     
    Web Surfer, Jun 16, 2004
    #4
  5. John Bokma Guest

    Web Surfer wrote:

    > [This followup was posted to comp.lang.perl.misc]
    >
    > In article <>,
    > says...
    >
    >>Here is a sample of my data (each column is separated by tabs):
    >>
    >>1234123 jaesdf ytkyk 345234


    > while ( $buffer = <INPUT> ) {
    > chomp $buffer;


    Why? Now you have to add back the \n in the print.

    > @fields = split(/\t+/,$buffer);
    > if ( 2 < @fields ) { # Ignore if less than 3 fields
    > next;


    Silly; the OP never specified that could happen. There are 4 fields, btw, so
    I would test for inequality, not less than.
    Don't see any point in putting the constant to the left, btw. Silly C
    coding convention IIRC.

    --
    John MexIT: http://johnbokma.com/mexit/
    personal page: http://johnbokma.com/
    Experienced Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Jun 16, 2004
    #5
  6. On Wed, 16 Jun 2004, John Bokma wrote:

    >Web Surfer wrote:
    >
    >> if ( 2 < @fields ) { # Ignore if less than 3 fields

    >
    >Silly; the OP never specified that could happen. There are 4 fields, btw, so
    >I would test for inequality, not less than.


    Because it was the *third* field that contained the string the OP is
    searching for. Thus, skip any line that doesn't have enough fields.

    >Don't see any point in putting the constant to the left, btw. Silly C
    >coding convention IIRC.


    There's nothing wrong with it. It's not "silly". There is a point to it.
    It stops you from accidentally writing = instead of == if you mean to do a
    comparison. Compare:

    if ($foo = 2) { ... }

    to

    if (2 = $foo) { ... }

    The coder *meant* to write ==, but only did =. The first one is not an
    error, and the if block is reached all the time. The second one IS an
    error.
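
The difference described above can be checked with a short sketch. Both snippets are compiled at run time with string `eval`, so the compile error in the second one does not stop the script; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Snippet 1: assignment where a comparison was meant. It compiles, and the
# block is entered because the assigned value (2) is true.
my $assign_result = eval q{
    no warnings 'syntax';      # silence "Found = in conditional" for this demo
    my $foo     = 0;
    my $entered = 0;
    if ($foo = 2) { $entered = 1 }
    $entered;
};
my $assign_err = $@;

# Snippet 2: constant on the left. The same typo no longer compiles.
my $reversed_result = eval q{
    my $foo = 0;
    if (2 = $foo) { }
    1;
};
my $reversed_err = $@;

print "if (\$foo = 2): ",
    ( $assign_err ? "failed: $assign_err" : "compiled, block entered = $assign_result\n" );
print "if (2 = \$foo): ",
    ( defined $reversed_result ? "compiled\n" : "failed: $reversed_err" );
```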

    --
    Jeff Pinyan RPI Acacia Brother #734 RPI Acacia Corp Secretary
    "And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
    years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
    Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)
     
    Jeff 'japhy' Pinyan, Jun 16, 2004
    #6
  7. John Bokma wrote:
    > Gunnar Hjalmarsson wrote:
    >> wrote:
    >>> Here is a sample of my data (each column is separated by tabs):
    >>>
    >>>
    >>> 1234123 jaesdf ytkyk 345234
    >>> 1264345 ghgfdf ghjhg 657658
    >>> 3456765 sdasdf ytkyk 456543
    >>> 1231232 assffg werwe 123454
    >>> 5447454 asdqfr ytkyk 254364
    >>>
    >>> I am interested in creating a hash with two of the elements in
    >>> the list ("ytkyk" and "ghjhg"). I would like to create a
    >>> program to read only the third column and print the line (row)
    >>> if it contains one of the latter items. Can anyone help me
    >>> write a program? Here is what I have so far and I would like to
    >>> create a more efficient program (I am going to use it for
    >>> writing a larger program later):


    <snip>

    >>> next unless ( index($_, 'ytkyk') >= 0 );

    >
    > next unless index($_, 'ytkyk');
    >
    > The >= 0 test can be replaced, since it's clear it's not the first
    > position.


    No, it can't. If the string is not found in $_, index() returns -1
    which is a true value.
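
A quick sketch of that point, using one of the sample rows that does not contain 'ytkyk' (variable names are illustrative):

```perl
use strict;
use warnings;

my $line = "1231232\tassffg\twerwe\t123454\n";   # no 'ytkyk' in this row

my $pos = index($line, 'ytkyk');                 # not found

# "next unless index(...)" would keep this line, because -1 is true.
my $kept_by_bare_test = $pos      ? 1 : 0;
my $kept_by_explicit  = $pos >= 0 ? 1 : 0;

print "index returned $pos\n";
print "bare truth test keeps the line: $kept_by_bare_test\n";
print "explicit >= 0 keeps the line:   $kept_by_explicit\n";

# The opposite trap: a match at the very start returns 0, which is false.
my $at_start = index("ytkyk\tfoo\n", 'ytkyk');
print "match at start of string returns $at_start\n";
```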

    >> What makes you believe that what you have is not efficient?

    >
    > Maybe the OP forgot to explain the "sorting" part :-D.


    Maybe. But it just struck me that the code will not print anything. I
    would believe that this is what the OP meant to do:

    while ( <File> ) {
    print and next if index($_, 'ytkyk') >= 0;
    print and next if index($_, 'ghjhg') >= 0;
    }
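
To see why the original loop prints nothing, here is a small check against the sample rows from the first post. Two `next unless` tests in a row keep a line only when both strings occur in it, whereas the stated goal needs either one. The array names are illustrative.

```perl
use strict;
use warnings;

# Sample rows from the original post, with explicit tabs.
my @lines = (
    "1234123\tjaesdf\tytkyk\t345234\n",
    "1264345\tghgfdf\tghjhg\t657658\n",
    "3456765\tsdasdf\tytkyk\t456543\n",
    "1231232\tassffg\twerwe\t123454\n",
    "5447454\tasdqfr\tytkyk\t254364\n",
);

# Equivalent to the two consecutive "next unless" tests: both must match.
my @both_required = grep {
    index($_, 'ytkyk') >= 0 && index($_, 'ghjhg') >= 0
} @lines;

# Equivalent to "print and next if ..." for each string: either may match.
my @either_accepted = grep {
    index($_, 'ytkyk') >= 0 || index($_, 'ghjhg') >= 0
} @lines;

printf "both required:   %d line(s)\n", scalar @both_required;
printf "either accepted: %d line(s)\n", scalar @either_accepted;
print @either_accepted;
```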

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jun 16, 2004
    #7
  8. Web Surfer wrote:
    > says:
    >> Here is a sample of my data (each column is separated by tabs):
    >>
    >> 1234123 jaesdf ytkyk 345234
    >> 1264345 ghgfdf ghjhg 657658
    >> 3456765 sdasdf ytkyk 456543
    >> 1231232 assffg werwe 123454
    >> 5447454 asdqfr ytkyk 254364
    >>
    >> I am interested in creating a hash with two of the elements in
    >> the list ("ytkyk" and "ghjhg"). I would like to create a program
    >> to read only the third column and print the line (row) if it
    >> contains one of the latter items. Can anyone help me write a
    >> program? Here is what I have so far and I would like to create a
    >> more efficient program (I am going to use it for writing a larger
    >> program later):
    >>
    >> open( File, '<', 'file.txt' ) or die "$!\n";
    >> while ( <File> ) {
    >> next unless ( index($_, 'ytkyk') >= 0 );
    >> next unless ( index($_, 'ghjhg') >= 0 );
    >> print;
    >> }
    >> close( File );

    >
    > ### Try this untested code ###
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    >
    > my ( $buffer , @fields , $filename , %hash1 );
    >
    > $filename = "file.txt";
    > open(INPUT,"<$filename") or
    > die("Can't open file \"$filename\" : $!\n");
    >
    > %hash1 = ( "ytkyk" => 1 , "ghjhg" => 1 );
    >
    > while ( $buffer = <INPUT> ) {
    > chomp $buffer;
    > @fields = split(/\t+/,$buffer);
    > if ( 2 < @fields ) { # Ignore if less than 3 fields
    > next;
    > }
    > unless ( exists $hash1{$fields[2]} ) {
    > next;
    > }
    > print "$buffer\n";
    > }
    > close INPUT;


    Would creating a hash and involving the regex engine (through split())
    really be more efficient? What would a benchmark show?
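
One way to look into the benchmark question is the core Benchmark module. The sketch below is only a starting point under stated assumptions: it times both approaches over the five sample rows held in memory, the sub names are made up for illustration, and the two approaches are not strictly equivalent (index() searches the whole line, the hash lookup tests only the third column). No timings are claimed here; results will depend on the machine, the Perl version and, above all, on realistic input data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample rows from the original post, with explicit tabs.
my @lines = (
    "1234123\tjaesdf\tytkyk\t345234\n",
    "1264345\tghgfdf\tghjhg\t657658\n",
    "3456765\tsdasdf\tytkyk\t456543\n",
    "1231232\tassffg\twerwe\t123454\n",
    "5447454\tasdqfr\tytkyk\t254364\n",
);
my %wanted = ( ytkyk => 1, ghjhg => 1 );

# Approach 1: plain substring search anywhere in the line.
sub by_index {
    return grep {
        index($_, 'ytkyk') >= 0 || index($_, 'ghjhg') >= 0
    } @lines;
}

# Approach 2: split on tabs and look up the third field in a hash.
sub by_split_hash {
    return grep {
        my @f = split /\t/;
        @f >= 3 && exists $wanted{ $f[2] };
    } @lines;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, {
    index      => \&by_index,
    split_hash => \&by_split_hash,
} );
```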

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jun 16, 2004
    #8
  9. John Bokma Guest

    Jeff 'japhy' Pinyan wrote:
    > On Wed, 16 Jun 2004, John Bokma wrote:
    >
    >
    >>Web Surfer wrote:
    >>
    >>
    >>> if ( 2 < @fields ) { # Ignore if less than 3 fields

    >>
    >>Silly; the OP never specified that could happen. There are 4 fields, btw, so
    >>I would test for inequality, not less than.

    >
    >
    > Because it was the *third* field that contained the string the OP is
    > searching for. Thus, skip any line that doesn't have enough fields.


    Did the specification ever say that there could be fewer than 4
    fields? No.

    >>Don't see any point in putting the constant to the left, btw. Silly C
    >>coding convention IIRC.

    >
    > There's nothing wrong with it. It's not "silly". There is a point to it.
    > It stops you from accidentally writing = instead of == if you mean to do a
    > comparison. Compare:
    >
    > if ($foo = 2) { ... }


    Found = in conditional, should be ==

    > The coder *meant* to write ==, but only did =. The first one is not an
    > error, and the if block is reached all the time. The second one IS an
    > error.


    No, it's an error if your compiler, interpreter, etc. doesn't *WARN*
    you. And a programmer turning off those warnings is silly.

    Most C, C++ compilers do warn, as does Perl (with use strict, use
    warnings). It is IMNSHO a stupid coding convention, illogical,
    unreadable, weird. Especially with *inequalities* as the prev post used.
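
The Perl warning mentioned here can be captured with a short sketch. The snippet is compiled with string `eval` while a `__WARN__` handler collects whatever perl reports; in this sketch it is `use warnings` that enables the message, and the variable names are illustrative.

```perl
use strict;
use warnings;

my @warnings;
{
    # Collect warnings instead of letting them go to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    eval q{
        use warnings;
        my $foo = 0;
        if ($foo = 2) { }      # the typo under discussion: = instead of ==
        1;
    } or die "snippet failed to compile: $@";
}

print "perl reported: $_" for @warnings;
```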

    --
    John MexIT: http://johnbokma.com/mexit/
    personal page: http://johnbokma.com/
    Experienced Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Jun 17, 2004
    #9
  10. John Bokma Guest

    John Bokma, Jun 17, 2004
    #10
  11. Eric Bohlman Guest

    John Bokma <> wrote in
    news:40d0fc45$0$205$:

    >>>Don't see any point in putting the constant to the left, btw. Silly C
    >>>coding convention IIRC.

    >>
    >> There's nothing wrong with it. It's not "silly". There is a point
    >> to it. It stops you from accidentally writing = instead of == if you
    >> mean to do a comparison. Compare:
    >>
    >> if ($foo = 2) { ... }

    >
    > Found = in conditional, should be ==
    >
    >> The coder *meant* to write ==, but only did =. The first one is not
    >> an error, and the if block is reached all the time. The second one
    >> IS an error.

    >
    > No, it's an error if your compiler, interpreter, etc. doesn't *WARN*
    > you. And a programmer turning off those warnings is silly.
    >
    > Most C, C++ compilers do warn, as does Perl (with use strict, use
    > warnings). It is IMNSHO a stupid coding convention, illogical,
    > unreadable, weird. Especially with *inequalities* as the prev post
    > used.


    Whether or not one adopts (or is forced by local coding standards to adopt)
    that particular convention with regard to tests for equality, it's
    ridiculously rigid to invert the sense of relational comparisons for no
    other reason than "putting the constant on the left." That really smacks
    of a failure to think abstractly leading to an inability to distinguish
    means from ends. In this case, the means that *may* help to achieve the
    end of making equality comparisons less error-prone winds up, when applied
    blindly, making other kinds of comparisons *more* error-prone.
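
The test quoted earlier in the thread is a concrete case of this. As a sketch (the sub names are made up for illustration), compare the condition as posted, `2 < @fields`, with the condition its comment describes, "fewer than 3 fields":

```perl
use strict;
use warnings;

# The condition as posted: true when the line has MORE than 2 fields.
sub skips_as_posted {
    my @fields = @_;
    return 2 < @fields ? 1 : 0;
}

# The condition the comment describes: true when there are fewer than 3.
sub skips_as_commented {
    my @fields = @_;
    return @fields < 3 ? 1 : 0;
}

for my $count ( 1 .. 4 ) {
    my @fields = ('x') x $count;
    printf "%d field(s): posted test skips = %d, commented intent skips = %d\n",
        $count, skips_as_posted(@fields), skips_as_commented(@fields);
}
```

With the constant moved to the left, the direction of the comparison has to be flipped as well; here it was not, so every well-formed four-field line would be skipped.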
     
    Eric Bohlman, Jun 18, 2004
    #11