One-liner removing duplicate lines

Discussion in 'Ruby' started by Damien Wyart, Oct 5, 2005.

  1. Damien Wyart

    Damien Wyart Guest

    Hello,

    Converting from Perl to Ruby, I am trying to find an equivalent to this
    Perl one-liner removing duplicate lines in a file (without sorting it at
    first) :

    perl -ne'$s{$_}++||print' infile >outfile

    I guess uniq method could be used, but I can't find how.


    Many thanks in advance,

    --
    Damien Wyart
    Damien Wyart, Oct 5, 2005
    #1

  2. On 10/5/05, Damien Wyart <> wrote:
    > Hello,
    >
    > Converting from Perl to Ruby, I am trying to find an equivalent to this
    > Perl one-liner removing duplicate lines in a file (without sorting it at
    > first) :
    >
    > perl -ne'$s{$_}++||print' infile >outfile
    >
    > I guess uniq method could be used, but I can't find how.


    I tried creating a version that mimics the Perl one (because Ruby also
    has the -n option), but in the end this seemed easier (and much more
    readable):

    ruby -e "puts IO.readlines(ARGV[0]).uniq" infile > outfile

    So you are right about using uniq.

    Ryan
    Ryan Leavengood, Oct 5, 2005
    #2

  3. Damien Wyart

    Stefan Lang Guest

    On Wednesday 05 October 2005 22:25, Ryan Leavengood wrote:
    > On 10/5/05, Damien Wyart <> wrote:
    > > Hello,
    > >
    > > Converting from Perl to Ruby, I am trying to find an equivalent
    > > to this Perl one-liner removing duplicate lines in a file
    > > (without sorting it at first) :
    > >
    > > perl -ne'$s{$_}++||print' infile >outfile
    > >
    > > I guess uniq method could be used, but I can't find how.

    >
    > I tried creating a version that mimics the Perl one (because Ruby
    > also has the -n option), but in the end this seemed easier (and
    > much more readable):
    >
    > ruby -e "puts IO.readlines(ARGV[0]).uniq" infile > outfile


    or:
    ruby -e 'puts ARGF.readlines.uniq' infile > outfile

    --
    Stefan
    Stefan Lang, Oct 5, 2005
    #3
  4. Damien Wyart

    Eric Mahurin Guest

    Here is a pretty close translation that does what you want:

    ruby -ne 's||={};s[$_]||print;s[$_]=true'

    --- Damien Wyart <> wrote:

    > Hello,
    >
    > Converting from Perl to Ruby, I am trying to find an equivalent to this
    > Perl one-liner removing duplicate lines in a file (without sorting it at
    > first) :
    >
    > perl -ne'$s{$_}++||print' infile >outfile
    >
    > I guess uniq method could be used, but I can't find how.
    >
    >
    > Many thanks in advance,
    >
    > --
    > Damien Wyart




    Eric Mahurin, Oct 5, 2005
    #4
  5. On 10/5/05, Ryan Leavengood <> wrote:
    >
    > I tried creating a version that mimics the Perl one (because Ruby also
    > has the -n option), but in the end this seemed easier (and much more
    > readable):
    >
    > ruby -e "puts IO.readlines(ARGV[0]).uniq" infile > outfile
    >
    > So you are right about using uniq.


    Just for sake of comparison, here is the more "Perl-like" version:

    ruby -ne "s||={};s[$_]||print;s[$_]=1" infile > outfile

    Maybe some Ruby golfers can shorten it some more, but since Ruby lacks
    some of the more terse (and obfuscating) features of Perl, it may not
    be possible.

    Ryan
    Ryan Leavengood, Oct 5, 2005
    #5
  6. On Oct 5, 2005, at 3:25 PM, Ryan Leavengood wrote:

    > On 10/5/05, Damien Wyart <> wrote:
    >
    >> Hello,
    >>
    >> Converting from Perl to Ruby, I am trying to find an equivalent to
    >> this
    >> Perl one-liner removing duplicate lines in a file (without sorting
    >> it at
    >> first) :
    >>
    >> perl -ne'$s{$_}++||print' infile >outfile
    >>
    >> I guess uniq method could be used, but I can't find how.
    >>

    >
    > I tried creating a version that mimics the Perl one (because Ruby also
    > has the -n option), but in the end this seemed easier (and much more
    > readable):
    >
    > ruby -e "puts IO.readlines(ARGV[0]).uniq" infile > outfile
    >
    > So you are right about using uniq.


    That slurps the file though, of course, so mind your memory
    requirements.

    Here's a more direct translation (untested):

    ruby -ne 'BEGIN { $lines = Hash.new(0) }; print if ($lines[$_] += 1) == 1' infile > outfile
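
    As a rough illustration of the memory point above, here is a minimal
    sketch of the line-at-a-time approach written out as a script. The
    method name dedup_stream is hypothetical, not something from this
    thread, and the StringIO objects only stand in for real files. Note
    that the set still grows with the number of distinct lines; only the
    duplicates are never held in memory.

```ruby
require 'set'
require 'stringio'

# Hypothetical helper: copies each line of `input` to `output` the first
# time it is seen, reading one line at a time instead of slurping the file.
def dedup_stream(input, output)
  seen = Set.new
  input.each_line do |line|
    # Set#add? returns nil when the element is already in the set.
    output.print(line) if seen.add?(line)
  end
end

# Small demonstration with in-memory IO objects.
out = StringIO.new
dedup_stream(StringIO.new("b\na\nb\nc\na\n"), out)
puts out.string  # prints b, a, c on separate lines
```

    For real files, something like dedup_stream(ARGF, $stdout) should
    behave the same way as the -n one-liner.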

    James Edward Gray II
    James Edward Gray II, Oct 5, 2005
    #6
  7. Damien Wyart wrote:

    > Hello,
    >
    > Converting from Perl to Ruby, I am trying to find an equivalent to this
    > Perl one-liner removing duplicate lines in a file (without sorting it at
    > first) :
    >
    > perl -ne'$s{$_}++||print' infile >outfile
    >
    > I guess uniq method could be used, but I can't find how.


    true,

    File.open('outfile', 'w') { |out| out << File.readlines('infile').uniq.join }

    cheers

    Simon
    Simon Kröger, Oct 5, 2005
    #7
  8. Damien Wyart

    Stefan Lang Guest

    On Wednesday 05 October 2005 22:34, Ryan Leavengood wrote:
    > On 10/5/05, Ryan Leavengood <> wrote:
    > > I tried creating a version that mimics the Perl one (because Ruby
    > > also has the -n option), but in the end this seemed easier (and
    > > much more readable):
    > >
    > > ruby -e "puts IO.readlines(ARGV[0]).uniq" infile > outfile
    > >
    > > So you are right about using uniq.

    >
    > Just for sake of comparison, here is the more "Perl-like" version:
    >
    > ruby -ne "s||={};s[$_]||print;s[$_]=1" infile > outfile
    >
    > Maybe some Ruby golfers can shorten it some more, but since Ruby
    > lacks some of the more terse (and obfuscating) features of Perl, it
    > may not be possible.


    ruby -ne 'a||={};a[$_]||=(print;1)' infile > outfile
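
    Spelled out long-hand, the golfed version above does roughly the
    following. This is only a sketch: the method name and the StringIO
    input are made up for illustration, and the explicit loop stands in
    for what -n provides.

```ruby
require 'stringio'

# Hypothetical long-hand equivalent of: a||={};a[$_]||=(print;1)
def print_first_occurrences(input, output)
  seen = {}                    # plays the role of `a`
  input.each_line do |line|    # -n supplies this loop, with the line in $_
    next if seen[line]         # ||= skips its right-hand side when a[$_] is already truthy
    output.print(line)         # print returns nil, so it cannot mark the line as seen itself
    seen[line] = 1             # hence the trailing 1 in (print;1)
  end
end

out = StringIO.new
print_first_occurrences(StringIO.new("x\ny\nx\nz\ny\n"), out)
puts out.string  # prints x, y, z on separate lines
```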

    --
    Stefan
    Stefan Lang, Oct 5, 2005
    #8
  9. How about the uniq(1) program? uniq infile > outfile
    Vincent Foley, Oct 5, 2005
    #9
  10. Stefan Lang wrote:

    > On Wednesday 05 October 2005 22:34, Ryan Leavengood wrote:
    >
    >>On 10/5/05, Ryan Leavengood <> wrote:
    >>
    >>>I tried creating a version that mimics the Perl one (because Ruby
    >>>also has the -n option), but in the end this seemed easier (and
    >>>much more readable):
    >>>
    >>>ruby -e "puts IO.readlines(ARGV[0]).uniq" infile > outfile
    >>>
    >>>So you are right about using uniq.

    >>
    >>Just for sake of comparison, here is the more "Perl-like" version:
    >>
    >>ruby -ne "s||={};s[$_]||print;s[$_]=1" infile > outfile
    >>
    >>Maybe some Ruby golfers can shorten it some more, but since Ruby
    >>lacks some of the more terse (and obfuscating) features of Perl, it
    >>may not be possible.

    >
    >
    > ruby -ne 'a||={};a[$_]||=(print;1)' infile > outfile


    ruby -ne 'a||={};a[$_]||=print|1' infile > outfile

    cheers

    Simon
    Simon Kröger, Oct 5, 2005
    #10
  11. Simon Kröger wrote:

    > Stefan Lang wrote:
    >
    >> On Wednesday 05 October 2005 22:34, Ryan Leavengood wrote:
    >>
    >>> On 10/5/05, Ryan Leavengood <> wrote:
    >>>
    >>>> I tried creating a version that mimics the Perl one (because Ruby
    >>>> also has the -n option), but in the end this seemed easier (and
    >>>> much more readable):
    >>>>
    >>>> ruby -e "puts IO.readlines(ARGV[0]).uniq" infile > outfile
    >>>>
    >>>> So you are right about using uniq.
    >>>
    >>>
    >>> Just for sake of comparison, here is the more "Perl-like" version:
    >>>
    >>> ruby -ne "s||={};s[$_]||print;s[$_]=1" infile > outfile
    >>>
    >>> Maybe some Ruby golfers can shorten it some more, but since Ruby
    >>> lacks some of the more terse (and obfuscating) features of Perl, it
    >>> may not be possible.

    >>
    >>
    >>
    >> ruby -ne 'a||={};a[$_]||=(print;1)' infile > outfile

    >
    >
    > ruby -ne 'a||={};a[$_]||=print|1' infile > outfile


    ruby -ne 'a||={};a[$_]||=!print' infile > outfile

    >
    > cheers
    >
    > Simon
    Simon Kröger, Oct 5, 2005
    #11
  12. On 10/5/05, Vincent Foley <> wrote:
    >
    > How about the uniq(1) program? uniq infile > outfile



    > Converting from Perl to Ruby, I am trying to find an equivalent to this
    > Perl one-liner removing duplicate lines in a file (without sorting it at
    > first) :



    He doesn't want to sort the file first =)
    Louis J Scoras, Oct 5, 2005
    #12
  13. ruby -ne 'BEGIN{$s={}};$s[$_]=nil;END{puts $s.keys}' infile > outfile

    On 10/6/05, Damien Wyart <> wrote:
    > Hello,
    >
    > Converting from Perl to Ruby, I am trying to find an equivalent to this
    > Perl one-liner removing duplicate lines in a file (without sorting it at
    > first) :
    >
    > perl -ne'$s{$_}++||print' infile >outfile
    >
    > I guess uniq method could be used, but I can't find how.
    >
    >
    > Many thanks in advance,
    >
    > --
    > Damien Wyart
    >
    >



    --
    http://nohmad.sub-port.net
    Gyoung-Yoon Noh, Oct 5, 2005
    #13
  14. On 10/5/05, Louis J Scoras <> wrote:
    >
    > On 10/5/05, Vincent Foley <> wrote:
    > >
    > > How about the uniq(1) program? uniq infile > outfile

    >
    >
    > > Converting from Perl to Ruby, I am trying to find an equivalent to this
    > > Perl one-liner removing duplicate lines in a file (without sorting it at
    > > first) :

    >
    >
    > He doesn't want to sort the file first =)
    >
    >

    Actually, to do this straight up Unix, you'd need something like this
    (probably doesn't work perfectly in all cases; play with the sed part at
    the end):

    $ cat -n input | sort -k2 | uniq -f1 | sort -n | sed -e 's/^ *[0-9]*\t//' > output
    Louis J Scoras, Oct 5, 2005
    #14
  15. Damien Wyart wrote:
    > Hello,
    >
    > Converting from Perl to Ruby, I am trying to find an equivalent to this
    > Perl one-liner removing duplicate lines in a file (without sorting it at
    > first) :
    >
    > perl -ne'$s{$_}++||print' infile >outfile


    awk '!a[$0]++' infile >outfile
    William James, Oct 6, 2005
    #15
  16. On Oct 5, 2005, at 7:11 PM, William James wrote:
    > Damien Wyart wrote:
    >> Converting from Perl to Ruby, I am trying to find an equivalent to
    >> this
    >> Perl one-liner removing duplicate lines in a file (without sorting
    >> it at
    >> first) :
    >>
    >> perl -ne'$s{$_}++||print' infile >outfile
    >>

    >
    > awk '!a[$0]++' infile >outfile


    My head a splode. Old school.

    Regards,
    jeremy
    Jeremy Kemper, Oct 6, 2005
    #16
  17. Jeremy Kemper wrote:

    > On Oct 5, 2005, at 7:11 PM, William James wrote:
    >
    >> awk '!a[$0]++' infile >outfile

    >
    > My head a splode. Old school.


    Seriously. Awe.

    Here's a different way. Not a golf-winner, but maybe more Rubyish?
    (It only drops adjacent duplicates, like uniq(1).)
    ruby -e"o=nil; ARGF.each {|l| puts l or o=l unless o==l}" infile > outfile

    Devin
    Or disgust. Not sure.
    Devin Mullins, Oct 6, 2005
    #17
  18. Damien Wyart

    Damien Wyart Guest

    * "William James" <> in comp.lang.ruby:
    > awk '!a[$0]++' infile >outfile


    This one is very nice, thanks ! I had an Awk version which was slightly
    longer.

    --
    Damien Wyart
    Damien Wyart, Oct 6, 2005
    #18
  19. Damien Wyart

    Damien Wyart Guest

    * "Vincent Foley" <> in comp.lang.ruby:
    > How about the uniq(1) program? uniq infile > outfile


    Using uniq(1) does not work here: it only removes adjacent duplicates,
    so you have to run sort(1) first, and then the initial order of the
    lines is not kept.
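
    To make the difference concrete, here is a small sketch comparing the
    three behaviours on a made-up list of lines. It assumes Ruby 2.3 or
    later for chunk_while; the variable names are only illustrative.

```ruby
# Made-up sample data standing in for the lines of a file.
lines = %w[b a a b c a]

# What uniq(1) does on unsorted input: collapse runs of adjacent duplicates only.
adjacent_only = lines.chunk_while { |x, y| x == y }.map(&:first)

# What the one-liners in this thread do: keep the first occurrence of each line.
first_seen = lines.uniq

# sort(1) followed by uniq(1): every duplicate goes, but so does the original order.
sorted_unique = lines.sort.uniq

p adjacent_only  # => ["b", "a", "b", "c", "a"]
p first_seen     # => ["b", "a", "c"]
p sorted_unique  # => ["a", "b", "c"]
```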

    --
    Damien Wyart
    Damien Wyart, Oct 6, 2005
    #19
  20. Damien Wyart

    Damien Wyart Guest

    Many thanks to everyone who responded, the answers are very interesting
    and enlightening !

    --
    Damien Wyart
    Damien Wyart, Oct 6, 2005
    #20