Removing lines containing same first string boundaries?

Discussion in 'Perl Misc' started by Tuxedo, Mar 17, 2014.

  1. Tuxedo

    Tuxedo Guest

    I have a plain text file with each line in the format:

    Start of line followed immediately by a string of character(s), a
    whitespace, another string, a newline.

    -------- file.txt -------

    SOMESTRING XXX
    SOMESTRING ZZZ
    SOMEOTHERSTRING YYYZZ23
    DIFFERENTSTRING HELLO

    -----------

    I would like to output each line that contains a string of a first
    character sequence but not repeat any line(s) with the same string as a
    first character sequence that appear further down the file. The output of
    running a perl procedure against the above file would then be:

    SOMESTRING XXX
    SOMEOTHERSTRING YYYZZ23
    DIFFERENTSTRING HELLO

    In other words, no repetition should occur of any first word boundary on
    each line in case the sequence happens to reappear on other line(s) as a
    first character boundary before each line's first whitespace.

    Alternatively, if given a parameter such as '^SOMESTRING' the output
    against the file would be narrowed down to:

    SOMESTRING XXX

    The second character string boundary (XXX) after the whitespace is
    arbitrary and should not affect the result but can be included in the
    output even if it happens to match SOMESTRING. So the output becomes the
    first occurence of ^SOMESTRING plus the remaining characters on the same
    line up until newline.

    Or if for example '^SOME' is passed as a parameter, the result would be:

    SOMESTRING XXX
    SOMEOTHERSTRING YYYZZ23

    In which ways can this be done efficiently in Perl?

    Many thanks for any ideas.

    Tuxedo
    Tuxedo, Mar 17, 2014
    #1
    1. Advertising

  2. Tuxedo

    Tuxedo Guest

    Jürgen Exner wrote:

    [...]

    > What have you tried so far? Where are you stuck? What doesn't work as
    > expected?


    Nothing so far, as I didn't try anything yet.

    > I would simply use split() to isolate the leading word and then use a
    > hash to track which words have already been printed.


    [...]

    > perldoc -f grep


    Thanks for the above pointers. I will look into these.

    Tuxedo
    Tuxedo, Mar 17, 2014
    #2
    1. Advertising

  3. Tuxedo

    John Black Guest

    In article <lg7eo7$7kc$>, says...
    >
    > I have a plain text file with each line in the format:
    >
    > Start of line followed immediately by a string of character(s), a
    > whitespace, another string, a newline.
    >
    > -------- file.txt -------
    >
    > SOMESTRING XXX
    > SOMESTRING ZZZ
    > SOMEOTHERSTRING YYYZZ23
    > DIFFERENTSTRING HELLO
    >
    > -----------
    >
    > I would like to output each line that contains a string of a first
    > character sequence but not repeat any line(s) with the same string as a
    > first character sequence that appear further down the file. The output of
    > running a perl procedure against the above file would then be:
    >
    > SOMESTRING XXX
    > SOMEOTHERSTRING YYYZZ23
    > DIFFERENTSTRING HELLO
    >
    > In other words, no repetition should occur of any first word boundary on
    > each line in case the sequence happens to reappear on other line(s) as a
    > first character boundary before each line's first whitespace.
    >
    > Alternatively, if given a parameter such as '^SOMESTRING' the output
    > against the file would be narrowed down to:
    >
    > SOMESTRING XXX
    >
    > The second character string boundary (XXX) after the whitespace is
    > arbitrary and should not affect the result but can be included in the
    > output even if it happens to match SOMESTRING. So the output becomes the
    > first occurence of ^SOMESTRING plus the remaining characters on the same
    > line up until newline.
    >
    > Or if for example '^SOME' is passed as a parameter, the result would be:
    >
    > SOMESTRING XXX
    > SOMEOTHERSTRING YYYZZ23
    >
    > In which ways can this be done efficiently in Perl?
    >
    > Many thanks for any ideas.
    >
    > Tuxedo


    Just use a regex to match each line to your input parameter.

    if ($line =~ /($param\w*)\s+.*/) { # If the first string matches input parameter
    $string1 = $1; # Extract string1 from the current line

    if (grep {$string1 ne $_} @matched_array { # If string1 has not been seen before
    print $line; # Print the line
    push (@matched_array, $string1); # Put the string in the matched array
    }
    }

    I have not tested this for errors but that is what I would start with. Not shown is the loop
    to grab each line of the file into $line.

    John Black
    John Black, Mar 17, 2014
    #3
  4. Tuxedo <> writes:
    > I have a plain text file with each line in the format:
    >
    > Start of line followed immediately by a string of character(s), a
    > whitespace, another string, a newline.
    >
    > -------- file.txt -------
    >
    > SOMESTRING XXX
    > SOMESTRING ZZZ
    > SOMEOTHERSTRING YYYZZ23
    > DIFFERENTSTRING HELLO
    >
    > -----------
    >
    > I would like to output each line that contains a string of a first
    > character sequence but not repeat any line(s) with the same string as a
    > first character sequence that appear further down the file.


    [...]

    > Alternatively, if given a parameter such as '^SOMESTRING' the output
    > against the file would be narrowed down to:
    >
    > SOMESTRING XXX
    >
    > The second character string boundary (XXX) after the whitespace is
    > arbitrary and should not affect the result


    [...]

    -----------
    my (%seen, $filter, $tag);

    $filter = $ARGV[0] // '.';

    while (<STDIN>) {
    ($tag) = /^(\S+)/ or next;

    next if $seen{$tag} || !/$filter/o;

    $seen{$tag} = 1;
    print;
    }
    -----------

    NB: This will silently ignore lines which don't start with a sequence of
    non-whitespace characters.
    Rainer Weikusat, Mar 17, 2014
    #4
  5. John Black <> writes:
    > In article <lg7eo7$7kc$>, says...
    >> I have a plain text file with each line in the format:
    >>
    >> Start of line followed immediately by a string of character(s), a
    >> whitespace, another string, a newline.
    >>
    >> -------- file.txt -------
    >>
    >> SOMESTRING XXX
    >> SOMESTRING ZZZ
    >> SOMEOTHERSTRING YYYZZ23
    >> DIFFERENTSTRING HELLO
    >>
    >> -----------
    >>
    >> I would like to output each line that contains a string of a first
    >> character sequence but not repeat any line(s) with the same string as a
    >> first character sequence


    [...]

    >> Alternatively, if given a parameter such as '^SOMESTRING' the output
    >> against the file would be narrowed down to:
    >>
    >> SOMESTRING XXX


    [...]

    > Just use a regex to match each line to your input parameter.
    >
    > if ($line =~ /($param\w*)\s+.*/) { # If the first string matches input parameter
    > $string1 = $1; # Extract string1 from the current line
    >
    > if (grep {$string1 ne $_} @matched_array { # If string1 has not been seen before
    > print $line; # Print the line
    > push (@matched_array, $string1); # Put the string in the matched array
    > }
    > }


    This is atrociously inefficient/ unscalable as the running time of the
    algorithm is proportional to the square of the number of lines which are
    printed.
    Rainer Weikusat, Mar 17, 2014
    #5
  6. >>>>> "HL" == Henry Law <> writes:

    HL> On 17/03/14 18:26, Tuxedo wrote:
    >> I would like to output each line that contains


    HL> Strong smell of homework assignment here ...

    Indeed. Highly detailed and precise specification combined with "I
    haven't tried anything yet...."

    Tuxedo, go and try. Come back when you have a specific, concrete
    question that isn't "Solve my problem for me."

    Charlton

    --
    Charlton Wilbur
    Charlton Wilbur, Mar 17, 2014
    #6
  7. Tuxedo <> writes:
    > Rainer Weikusat wrote:
    >
    > [...]
    >
    >> -----------
    >> my (%seen, $filter, $tag);
    >>
    >> $filter = $ARGV[0] // '.';
    >>
    >> while (<STDIN>) {
    >> ($tag) = /^(\S+)/ or next;
    >>
    >> next if $seen{$tag} || !/$filter/o;
    >>
    >> $seen{$tag} = 1;
    >> print;
    >> }
    >> -----------
    >>
    >> NB: This will silently ignore lines which don't start with a sequence of
    >> non-whitespace characters.

    >
    > Many thanks for the above procedure.
    >
    > How exactly should it be run against a file and keyword parameter?


    It expects data to process on stdin and a keyword, if any, as first
    argument, eg, assuming that the script text is in a file named a.pl and
    the test text you posted in a file named text,

    [rw@sable]/tmp#perl a.pl SOME <text
    SOMESTRING XXX
    SOMEOTHERSTRING YYYZZ23

    NB: while (<>) doesn't localize $_ implicitly. If this is supposed to be
    part of a larger program, a

    local $_;

    might need to be added in a suitable scope to avoid overwriting someone
    else's $_.
    Rainer Weikusat, Mar 17, 2014
    #7
  8. Tuxedo

    John Black Guest

    In article <>, says...
    >
    > John Black <> wrote:
    >
    > > if (grep {$string1 ne $_} @matched_array { # If string1 has not been seen before
    > > print $line; # Print the line
    > > push (@matched_array, $string1); # Put the string in the matched array
    > > }

    >
    > Why are you using an O(n^2) algorithm when the same can be achieved with
    > an O(n) algorithm using a hash, which on top of everything else is even
    > simpler to write?
    >
    > jue


    Are you saying that a lookup for a key in a hash array does not have to search the hash for a
    key match in a similar way to how grep is searching an array?

    John Black
    John Black, Mar 17, 2014
    #8
  9. On 2014-03-17, John Black wrote:
    > In article <>, says...
    >>
    >> John Black <> wrote:
    >>
    >> > if (grep {$string1 ne $_} @matched_array { # If string1 has not been seen before
    >> > print $line; # Print the line
    >> > push (@matched_array, $string1); # Put the string in the matched array
    >> > }

    >>
    >> Why are you using an O(n^2) algorithm when the same can be achieved with
    >> an O(n) algorithm using a hash, which on top of everything else is even
    >> simpler to write?
    >>
    >> jue

    >
    > Are you saying that a lookup for a key in a hash array does not have to search the hash for a
    > key match in a similar way to how grep is searching an array?
    >
    > John Black


    Yes. The idea of a hash is that it can transform (aka hash) the key
    (an O(1) operation) into something like an array index, which allows
    an index-based lookup (another O(1) operation).

    The downside of using a hash is memory overhead in setting up the
    hash table.
    Randy Westlund, Mar 17, 2014
    #9
  10. John Black <> writes:
    > In article <>, says...
    >>
    >> John Black <> wrote:
    >>
    >> > if (grep {$string1 ne $_} @matched_array { # If string1 has not been seen before
    >> > print $line; # Print the line
    >> > push (@matched_array, $string1); # Put the string in the matched array
    >> > }

    >>
    >> Why are you using an O(n^2) algorithm when the same can be achieved with
    >> an O(n) algorithm using a hash, which on top of everything else is even
    >> simpler to write?
    >>
    >> jue

    >
    > Are you saying that a lookup for a key in a hash array does not have
    > to search the hash for a key match in a similar way to how grep is
    > searching an array?


    Of course not. A 'hash' is not an array. It's an associative array
    implemented as hash table with separate chaining. This means it is an
    array of pointers to linked lists containing the actual entries. When
    searching for a key, the key is transformed to a number by using it as
    input for the so-called 'hash function' and 'truncating' the resulting
    number down to the current size of the list pointer array via
    mod. Assuming that H(key) is the hash function, an array index is
    calculated as

    ndx = H(key) % @hash_array

    Nowadays, using hash arrays whose sizes are powers of two is common
    because the modulo-operation can then be performed with a binary
    and. The list $hash_array[ndx] points to is then searched for the actual
    key. In ideal conditions (when the table is large enough), all list
    sizes will be 1 which means the lookup requires only a single string
    comparison. Even if the list has a number of entries (because of
    so-called 'hash collisions'), it will still be much shorter than a list
    containing all key/value-pairs in the hash if a 'reasonable' hash
    function is used (it is supposed to produce a 'uniform' distribution of
    values).
    Rainer Weikusat, Mar 17, 2014
    #10
  11. Tuxedo

    Tuxedo Guest

    Rainer Weikusat wrote:

    [...]

    > It expects data to process on stdin and a keyword, if any, as first
    > argument, eg, assuming that the script text is in a file named a.pl and
    > the test text you posted in a file named text,
    >
    > [rw@sable]/tmp#perl a.pl SOME <text
    > SOMESTRING XXX
    > SOMEOTHERSTRING YYYZZ23
    >
    > NB: while (<>) doesn't localize $_ implicitly. If this is supposed to be
    > part of a larger program, a
    >
    > local $_;
    >
    > might need to be added in a suitable scope to avoid overwriting someone
    > else's $_.


    Many thanks for the detailed how-to! I will test.

    Tuxedo
    Tuxedo, Mar 17, 2014
    #11
  12. Rainer Weikusat <> writes:

    [...]

    > -----------
    > my (%seen, $filter, $tag);
    >
    > $filter = $ARGV[0] // '.';
    >
    > while (<STDIN>) {
    > ($tag) = /^(\S+)/ or next;
    >
    > next if $seen{$tag} || !/$filter/o;
    >
    > $seen{$tag} = 1;
    > print;
    > }
    > -----------
    >
    > NB: This will silently ignore lines which don't start with a sequence of
    > non-whitespace characters.


    There are a couple of more possible gotchas with this, in particular,

    - the /o flag means the regex will be compiled exactly once per program
    run. This might become an issue in conditions where the value of
    $filter can change, ie, if thas was put into a subroutine.

    $filter = qr/$filter/

    could be used to pre-compile the current $filter in such cases.

    - the filter match is performed on the whole input line, not only on the
    tag, meaning, it will match in the 2nd and subsequent fields if not
    anchored to the beginning of the line.

    $tag !~ /$filter/

    could be used instead.
    Rainer Weikusat, Mar 17, 2014
    #12
  13. Jürgen Exner <> writes:
    > John Black <> wrote:
    >
    >>Are you saying that a lookup for a key in a hash array does not have to search the hash for a
    >>key match in a similar way to how grep is searching an array?

    >
    > Yes. That's why it's called a hash in the first place. Accessing a hash
    > element is typically O(1).


    This is a little too simplistic. The the amount of work necessarily to
    locate a random key which is stored in the hash is proportional to the
    average length of each used hash chain divided by 2 and the time to
    determine that some key is not in the hash is proportional to the
    average length of a hash chain. Assuming a 'good' hash function is being
    used (which also covers the case of 'degenerated' key sets better known
    as 'algorithmic complexity attack), the average length of a hash chain
    should be the number of keys in the hash divided by the number of entries
    in the array holding the chain pointers. That still proportional to the
    number of entries in the hash, aka O(n), except that it is possible to
    make the table sufficiently large to ensure that the average length of a
    hash chain will be small.

    As an example:

    [rw@sable]~#perl -MDevel::peek -e '%a = 'aaaa' .. 'zzzz'; Dump(\%a)'
    SV = RV(0x817b824) at 0x817b818
    REFCNT = 1
    FLAGS = (TEMP,ROK)
    RV = 0x8195d28
    SV = PVHV(0x8180984) at 0x8195d28
    REFCNT = 2
    FLAGS = (SHAREKEYS)
    ARRAY = 0xb6807008 (0:110767, 1:94524, 2:40904, 3:12427, 4:2864, 5:545, 6:99, 7:12, 8:2)
    hash quality = 98.6%
    KEYS = 228488
    FILL = 151377
    MAX = 262143
    RITER = -1
    EITER = 0x0
    Elt "utei" HASH = 0x8ccc0001
    SV = PV(0xaffa700) at 0xb70b0bf8
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0xb000720 "utej"\0
    CUR = 4
    LEN = 8
    Elt "vqmw" HASH = 0xe2100002
    SV = PV(0xb07d430) at 0xb70cf8c8
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0xb0843d0 "vqmx"\0
    CUR = 4
    LEN = 8
    Elt "yehi" HASH = 0x8d680003
    SV = PV(0xb22c828) at 0xb230360
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0xb233fc0 "yehj"\0
    CUR = 4
    LEN = 8

    There are 228,488 keys in the hash and the chain pointer table has a
    size of 262,143 which means the average length of a chain should be
    about 0.87. However, only 151,377 table entries are in use so
    'average chain length' for the 'locate key which exists' case is about
    1.5 and about 42% of the available table slots are unused (the 'ARRAY ='
    line shows the actual distribution of chain lengths).

    Assuming the key set is small, there's also the problem that calculating
    the hash value is usually more expensive than comparing two keys and
    locating an item on a linked-list is more expensive than locating one in
    an array.

    The general idea behind 'using a hash' is that determining
    exactly when this become better than doing linear searches in an array
    isn't worth the effort for a small number of entries and that hash
    lookups will very likely perform better than most linear searches as the
    key set gets larger (at the expense of wasting a possibly 'large' amount
    of memory).
    Rainer Weikusat, Mar 18, 2014
    #13
  14. Martijn Lievaart <> writes:

    > On Mon, 17 Mar 2014 13:50:25 -0700, Jürgen Exner wrote:
    >
    >> John Black <> wrote:
    >>
    >>>Are you saying that a lookup for a key in a hash array does not have to
    >>>search the hash for a key match in a similar way to how grep is
    >>>searching an array?

    >>
    >> Yes. That's why it's called a hash in the first place. Accessing a hash
    >> element is typically O(1).
    >> Occasionally it may be a little bit worse, and in very extreme cases it
    >> could even be O(n). But those extreme cases are usually artificial and
    >> don't happen in real life, at last not with a reasonable implementation
    >> of the hashing algorithm.

    >
    > And Perl has a very "reasonable" implementation, much effort has been put
    > into that (if only to defeat DOS-attacks).


    This is similar to the rotating hash, but it actually mixes the
    internal state. It takes 9n+9 instructions and produces a full
    4-byte result. Preliminary analysis suggests there are no
    funnels.

    This hash was not in the original Dr. Dobb's article. I
    implemented it to fill a set of requirements posed by Colin
    Plumb. Colin ended up using an even simpler (and weaker) hash
    that was sufficient for his purpose.

    http://burtleburtle.net/bob/hash/doobs.html

    [SCNR]
    Rainer Weikusat, Mar 18, 2014
    #14
  15. Tuxedo

    Kaz Kylheku Guest

    On 2014-03-17, Tuxedo <> wrote:
    > I have a plain text file with each line in the format:
    >
    > Start of line followed immediately by a string of character(s), a
    > whitespace, another string, a newline.
    >
    > -------- file.txt -------
    >
    > SOMESTRING XXX
    > SOMESTRING ZZZ
    > SOMEOTHERSTRING YYYZZ23
    > DIFFERENTSTRING HELLO


    This can be implemented using a very simple, clear on-liner in awk, right from
    your shell prompt.

    The lines marked <- are my tty input; the others are awk output:

    $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
    A B <-
    A B
    A C <-
    A D <-
    D 1 <-
    D 1
    D 2 <-
    D 3 <-
    A Z <-
    Kaz Kylheku, Mar 18, 2014
    #15
  16. Kaz Kylheku <> writes:
    > On 2014-03-17, Tuxedo <> wrote:
    >> I have a plain text file with each line in the format:
    >>
    >> Start of line followed immediately by a string of character(s), a
    >> whitespace, another string, a newline.
    >>
    >> -------- file.txt -------
    >>
    >> SOMESTRING XXX
    >> SOMESTRING ZZZ
    >> SOMEOTHERSTRING YYYZZ23
    >> DIFFERENTSTRING HELLO

    >
    > This can be implemented using a very simple, clear on-liner in awk, right from
    > your shell prompt.
    >
    > The lines marked <- are my tty input; the others are awk output:
    >
    > $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'


    perl -ane 'print, $seen{$F[0]} = 1 unless $seen{$F[0]}'
    Rainer Weikusat, Mar 18, 2014
    #16
  17. Tuxedo

    Tim McDaniel Guest

    In article <>,
    Rainer Weikusat <> wrote:
    >Kaz Kylheku <> writes:
    >> On 2014-03-17, Tuxedo <> wrote:
    >>> I have a plain text file with each line in the format:
    >>>
    >>> Start of line followed immediately by a string of character(s), a
    >>> whitespace, another string, a newline.
    >>>
    >>> -------- file.txt -------
    >>>
    >>> SOMESTRING XXX
    >>> SOMESTRING ZZZ
    >>> SOMEOTHERSTRING YYYZZ23
    >>> DIFFERENTSTRING HELLO

    >>
    >> This can be implemented using a very simple, clear on-liner in awk, right from
    >> your shell prompt.
    >>
    >> The lines marked <- are my tty input; the others are awk output:
    >>
    >> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'

    >
    >perl -ane 'print, $seen{$F[0]} = 1 unless $seen{$F[0]}'


    perl -ane 'print if !$seen{$F[0]}++'

    perl -ane '$seen{$F[0]}++ || print'

    --
    Tim McDaniel,
    Tim McDaniel, Mar 18, 2014
    #17
  18. (Tim McDaniel) writes:
    > In article <>,
    > Rainer Weikusat <> wrote:
    >>Kaz Kylheku <> writes:
    >>> On 2014-03-17, Tuxedo <> wrote:
    >>>> I have a plain text file with each line in the format:
    >>>>
    >>>> Start of line followed immediately by a string of character(s), a
    >>>> whitespace, another string, a newline.
    >>>>
    >>>> -------- file.txt -------
    >>>>
    >>>> SOMESTRING XXX
    >>>> SOMESTRING ZZZ
    >>>> SOMEOTHERSTRING YYYZZ23
    >>>> DIFFERENTSTRING HELLO
    >>>
    >>> This can be implemented using a very simple, clear on-liner in awk, right from
    >>> your shell prompt.
    >>>
    >>> The lines marked <- are my tty input; the others are awk output:
    >>>
    >>> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'

    >>
    >>perl -ane 'print, $seen{$F[0]} = 1 unless $seen{$F[0]}'

    >
    > perl -ane 'print if !$seen{$F[0]}++'
    >
    > perl -ane '$seen{$F[0]}++ || print'


    Well, there are quite a few more more-or-less bizarre variants, eg,

    perl -ape '$_=$seen{$F[0]}++?"":$_'

    or

    perl -ape '$seen{$F[0]}++&&undef$_'
    Rainer Weikusat, Mar 18, 2014
    #18
  19. Tuxedo

    Kaz Kylheku Guest

    On 2014-03-18, Tim McDaniel <> wrote:
    > In article <>,
    > Rainer Weikusat <> wrote:
    >>Kaz Kylheku <> writes:
    >>> On 2014-03-17, Tuxedo <> wrote:
    >>>> I have a plain text file with each line in the format:
    >>>>
    >>>> Start of line followed immediately by a string of character(s), a
    >>>> whitespace, another string, a newline.
    >>>>
    >>>> -------- file.txt -------
    >>>>
    >>>> SOMESTRING XXX
    >>>> SOMESTRING ZZZ
    >>>> SOMEOTHERSTRING YYYZZ23
    >>>> DIFFERENTSTRING HELLO
    >>>
    >>> This can be implemented using a very simple, clear on-liner in awk, right from
    >>> your shell prompt.
    >>>
    >>> The lines marked <- are my tty input; the others are awk output:
    >>>
    >>> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'

    >>
    >>perl -ane 'print, $seen{$F[0]} = 1 unless $seen{$F[0]}'

    >
    > perl -ane 'print if !$seen{$F[0]}++'
    >
    > perl -ane '$seen{$F[0]}++ || print'


    gawk '{if(!seen[$1]++)print}'

    (GNU Awk has bignums, which takes care of the overflow.)
    Kaz Kylheku, Mar 18, 2014
    #19
  20. Rainer Weikusat <> writes:
    > (Tim McDaniel) writes:
    >> In article <>,
    >> Rainer Weikusat <> wrote:
    >>>Kaz Kylheku <> writes:
    >>>> On 2014-03-17, Tuxedo <> wrote:
    >>>>> I have a plain text file with each line in the format:
    >>>>>
    >>>>> Start of line followed immediately by a string of character(s), a
    >>>>> whitespace, another string, a newline.
    >>>>>
    >>>>> -------- file.txt -------
    >>>>>
    >>>>> SOMESTRING XXX
    >>>>> SOMESTRING ZZZ
    >>>>> SOMEOTHERSTRING YYYZZ23
    >>>>> DIFFERENTSTRING HELLO
    >>>>
    >>>> This can be implemented using a very simple, clear on-liner in awk, right from
    >>>> your shell prompt.
    >>>>
    >>>> The lines marked <- are my tty input; the others are awk output:
    >>>>
    >>>> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
    >>>
    >>>perl -ane 'print, $seen{$F[0]} = 1 unless $seen{$F[0]}'

    >>
    >> perl -ane 'print if !$seen{$F[0]}++'
    >>
    >> perl -ane '$seen{$F[0]}++ || print'

    >
    > Well, there are quite a few more more-or-less bizarre variants, eg,
    >
    > perl -ape '$_=$seen{$F[0]}++?"":$_'
    >
    > or
    >
    > perl -ape '$seen{$F[0]}++&&undef$_'


    Coming to think of that, no self-respecting quibbler would ever use a
    hash named %seen. And this auto-split thing is also much too
    straight-forward. So what about

    perl -pe '$£{(/(\S+)/)[0]}++&&undef$_'

    ?
    Rainer Weikusat, Mar 18, 2014
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    1
    Views:
    597
    Craig Deelsnyder
    Oct 25, 2003
  2. Chris  Chiasson
    Replies:
    6
    Views:
    615
    Richard Tobin
    Nov 14, 2006
  3. Replies:
    1
    Views:
    447
    Jonathan Mcdougall
    Dec 6, 2005
  4. Umesh
    Replies:
    8
    Views:
    459
    Charlton Wilbur
    May 30, 2007
  5. Ò»Ê×Ê«

    Passing string over DLL boundaries.

    Ò»Ê×Ê«, Aug 24, 2008, in forum: C++
    Replies:
    4
    Views:
    416
    Juha Nieminen
    Aug 24, 2008
Loading...

Share This Page