Removing lines containing same first string boundaries?

John Bokma

Rainer Weikusat said:
Coming to think of that, no self-respecting quibbler would ever use a
hash named %seen. And this auto-split thing is also much too
straight-forward. So what about

perl -pe '$£{(/(\S+)/)[0]}++&&undef$_'

Still too easy to read ;-)
 
Ben Bacarisse

Kaz Kylheku said:
I have a plain text file with each line in the format:

Start of line followed immediately by a string of character(s), a
whitespace, another string, a newline.

-------- file.txt -------

SOMESTRING XXX
SOMESTRING ZZZ
SOMEOTHERSTRING YYYZZ23
DIFFERENTSTRING HELLO

This can be implemented using a very simple, clear one-liner in awk, right from
your shell prompt.

The lines marked <- are my tty input; the others are awk output:

$ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'

There are one-line Perl versions as well of course. Maybe

perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'

<snip>
 
Rainer Weikusat

Ben Bacarisse said:
Kaz Kylheku said:
I have a plain text file with each line in the format:

Start of line followed immediately by a string of character(s), a
whitespace, another string, a newline.

-------- file.txt -------

SOMESTRING XXX
SOMESTRING ZZZ
SOMEOTHERSTRING YYYZZ23
DIFFERENTSTRING HELLO

This can be implemented using a very simple, clear one-liner in awk, right from
your shell prompt.

The lines marked <- are my tty input; the others are awk output:

$ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'

There are one-line Perl versions as well of course. Maybe

perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'

That's a neat idea. Obvious extension of that:

perl -ane '$seen{$F[0]} //= print'
 
John Black

Ben Bacarisse said:
Kaz Kylheku said:
I have a plain text file with each line in the format:

Start of line followed immediately by a string of character(s), a
whitespace, another string, a newline.

-------- file.txt -------

SOMESTRING XXX
SOMESTRING ZZZ
SOMEOTHERSTRING YYYZZ23
DIFFERENTSTRING HELLO

This can be implemented using a very simple, clear one-liner in awk, right from
your shell prompt.

The lines marked <- are my tty input; the others are awk output:

$ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'

There are one-line Perl versions as well of course. Maybe

perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'

That's a neat idea. Obvious extension of that:

perl -ane '$seen{$F[0]} //= print'

I've written many utilities and tools in Perl and I don't understand these one-liners at
all...

John Black
 
Tim McDaniel

John Black said:
Ben Bacarisse said:
I have a plain text file with each line in the format:

Start of line followed immediately by a string of character(s), a
whitespace, another string, a newline.

-------- file.txt -------

SOMESTRING XXX
SOMESTRING ZZZ
SOMEOTHERSTRING YYYZZ23
DIFFERENTSTRING HELLO

This can be implemented using a very simple, clear one-liner in
awk, right from your shell prompt.

The lines marked <- are my tty input; the others are awk output:

$ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'

There are one-line Perl versions as well of course. Maybe

perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'

That's a neat idea. Obvious extension of that:

perl -ane '$seen{$F[0]} //= print'

I've written many utilities and tools in Perl and I don't understand
these one-liners at all...

I can't speak for other people's motives, but for me, I tend to see
Perl one-liners as humor, in most cases. Occasionally there's a
clever technique that's useful and maintainable, and of course I'm
excluding simple education in powerful features that I didn't know
about (e.g., s///r is relatively new). But in most cases, I consider
it to be humor, and of course showing off one's Perl l33t skyllZ (or
however the hep cats express it to-day).

But I suggest that you try to decrypt them, just as a learning
exercise. If there are specific points that still confuse you, please
ask about them here.

"man perlrun" on most systems explains the command line. "perl -p -e"
is something I use occasionally; "perl -p -i -e" even less often; I've
never had to use "perl -a".

This subthread has used the fact that "print" is a function that
returns true if the printing succeeded, which it really ought to do.
"perldoc -f print" should give you its docco.

"//" is a newish operator: "man perlop". "||" would have worked just
as well in this case, I think -- the return values of print on my
system appear to be 1 and undef.
 
Eric Pozharski

John Black said:
*SKIP*
$ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
There are one-line Perl versions as well of course. Maybe
perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'
That's a neat idea. Obvious extension of that:
perl -ane '$seen{$F[0]} //= print'
I've written many utilities and tools in Perl and I don't understand
these one-liners at all...

I didn't either. Then I learned that I have to practice a skill before
I can understand it.
 
Rainer Weikusat

Martijn Lievaart said:
[...]
perl -ane '$seen{$F[0]} //= print'

As someone already said, %seen is boring.

perl -ane '$_{$F[0]} //= print'

At the expense of some 'loss in generality' (aka 'risk of causing weird
effects in case of name collisions with Perl special variables'), the
symbol table already provides a perfectly usable hash:

perl -ane '${$F[0]} //= print'
 
Kaz Kylheku

We're getting there. Combining with Rainer's suggestion and removing
whitespace, we get

perl -lanE '${$F[0]}//=say'

I doubt that can be shortened any further.

Note that this symbol table hack has obvious flaws: namely that you don't own
the symbol space, and that space has content already:

Here is an obvious failing test case for the Perl version:

$ perl -lanE '${$F[0]}//=say'
_ foo
[Ctrl-D]

Some other nonworking test cases:

( foo
? foo
$ foo

The output is blank instead of the expected _ foo. The fix is to properly
namespace the keys so that they don't clash with existing symbols; but the
broken solution is already over par for the course as it is:

perl -lanE '${$F[0]}//=say'

awk '{if(!s[$1]++)print;}'

Also, this is almost normal, everyday awk, representing how that tool is
typically used: all that is lacking is whitespace and a meaningful name for the
hash instead of s.
 
Rainer Weikusat

Kaz Kylheku said:
We're getting there. Combining with Rainer's suggestion and removing
whitespace, we get

perl -lanE '${$F[0]}//=say'

I doubt that can be shortened any further.

Note that this symbol table hack has obvious flaws: namely that you don't own
the symbol space, and that space has content already:

"At the expense of some 'loss in generality' (aka 'risk of
causing weird effects in case of name collisions with Perl
special variables'),"

A more interesting test case would be

\ bla

(when using print instead of say).

[...]
the
broken solution is already over par for the course as it is:

perl -lanE '${$F[0]}//=say'

awk '{if(!s[$1]++)print;}'

Also, this is almost normal, everyday awk, representing how that tool is
typically used:

awk '!s[$1]--'

(tested with gawk)

But this is sort of off topic in a Perl news group. Also,

$_{$F[0]}//=print # [*]

is really a nicer algorithm because it doesn't do anything if the test
succeeds.

[*] I don't particularly like the idea of using a special command-line
option telling perl to strip newlines on input just because this means
that a builtin which unconditionally adds a newline can be used.
 
