File deduplication

Discussion in 'Perl Misc' started by George Mpouras, Sep 2, 2013.

  1. George Mpouras, Sep 2, 2013
    #1

  2. Justin C

    On 2013-09-02, George Mpouras <> wrote:
    > here is a Perl function to deduplicate your files. Not perfect but works
    >
    >
    > http://antarcticasurfer.wordpress.com/2013/09/02/deduplicate-files-contents/


    I think fdupes is much more likely to serve your
    purpose correctly and efficiently.

    Stop trying to re-invent the wheel, and stop pushing
    your code here, no one is asking for it, no one else
    does it. If you've a perl problem then post a snippet
    and explain what you expect it to do, you'll get any
    help you need. But I for one am fed up with what you
    keep posting, it's not helpful, useful, or wanted[1].

    You're circling the black hole that is my KF, unless
    you alter your trajectory you won't escape it.

    Justin.

    1. Please correct me if I'm wrong. If you look
    forward to the next installment of George's code
    posting please say and I'll re-align what I consider
    this group to be.

    --
    Justin C, by the sea.
    Justin C, Sep 2, 2013
    #2

  3. Justin C <> writes:
    > On 2013-09-02, George Mpouras <> wrote:
    >> here is a Perl function to deduplicate your files. Not perfect but works
    >>
    >>
    >> http://antarcticasurfer.wordpress.com/2013/09/02/deduplicate-files-contents/

    >
    > I think fdupes is much more likely to serve your
    > purpose correctly and efficiently.
    >
    > Stop trying to re-invent the wheel,


    Some people believe that they've accomplished a technical feat
    equivalent to inventing the wheel whenever they've managed to tack
    three lines of code together which do something else than 'crash
    immediately'. I'd call this another nice example of the
    Dunning-Kruger effect in action.
    Rainer Weikusat, Sep 2, 2013
    #3
  4. On 2/9/2013 17:44, Justin C wrote:
    > On 2013-09-02, George Mpouras <> wrote:
    >> here is a Perl function to deduplicate your files. Not perfect but works


    > I think fdupes is much more likely to serve your
    > purpose correctly and efficiently.




    Technically speaking, fdupes finds identical files;
    the code I posted deduplicates the content of multiple files, in place.

    I think you wrote your reply without spending even 5 seconds reading what
    the post was about.
    George Mpouras, Sep 2, 2013
    #4
  5. On 2/9/2013 18:52, Rainer Weikusat wrote:
    > Justin C <> writes:
    >> On 2013-09-02, George Mpouras <> wrote:
    >>> here is a Perl function to deduplicate your files. Not perfect but works
    >>>
    >>>
    >>> http://antarcticasurfer.wordpress.com/2013/09/02/deduplicate-files-contents/

    >>
    >> I think fdupes is much more likely to serve your
    >> purpose correctly and efficiently.
    >>
    >> Stop trying to re-invent the wheel,

    >
    > Some people believe that they've accomplished a technical feat
    > equivalent to inventing the wheel whenever they've managed to tack
    > three lines of code together which do something else than 'crash
    > immediately'. I'd call this another nice example of the
    > Dunning-Kruger effect in action.
    >



    I do not know of any Perl "wheel" that deduplicates the content of a set of files.
    George Mpouras, Sep 2, 2013
    #5
  6. George Mpouras <> writes:
    > On 2/9/2013 18:52, Rainer Weikusat wrote:
    >> Justin C <> writes:
    >>> On 2013-09-02, George Mpouras <> wrote:
    >>>> here is a Perl function to deduplicate your files. Not perfect but works
    >>>>
    >>>>
    >>>> http://antarcticasurfer.wordpress.com/2013/09/02/deduplicate-files-contents/
    >>>
    >>> I think fdupes is much more likely to serve your
    >>> purpose correctly and efficiently.
    >>>
    >>> Stop trying to re-invent the wheel,

    >>
    >> Some people believe that they've accomplished a technical feat
    >> equivalent to inventing the wheel whenever they've managed to tack
    >> three lines of code together which do something else than 'crash
    >> immediately'. I'd call this another nice example of the
    >> Dunning-Kruger effect in action.

    >
    > I do not know of any Perl "wheel" that deduplicates the content of a set of files.


    This 're-invent the wheel' statement is incredibly stupid for two
    reasons:

    1. 'The wheel' is not some 'static' piece of technology but new kinds
    of wheels are constantly being developed and different kinds, eg,
    wheels used in high-speed trains vs wheels used for wheelbarrows are
    very much different.

    2. The basic design of 'the wheel' represents a very simple way to solve a
    particular problem 'perfectly' and has thus been unchanged for a few
    thousand years. In contrast to this, software which hasn't either
    vanished altogether or undergone a serious redesign for, say, thirty
    years, is extremely rare. The same is true for most other 'human
    inventions': Usually, they're useless trifles and vanish quickly.
    Rainer Weikusat, Sep 2, 2013
    #6
  7. Justin C

    On 2013-09-02, George Mpouras <> wrote:
    > On 2/9/2013 17:44, Justin C wrote:
    >> On 2013-09-02, George Mpouras <> wrote:
    >>> here is a Perl function to deduplicate your files. Not perfect but works

    >
    >> I think fdupes is much more likely to serve your
    >> purpose correctly and efficiently.

    >
    >
    >
    > Technically speaking, fdupes finds identical files;
    > the code I posted deduplicates the content of multiple files, in place.
    >
    > I think you wrote your reply without spending even 5 seconds reading what
    > the post was about.


    You go ahead and think what you like, but, for once,
    I'm with Rainer, your posts appear to be no more than
    Dunning-Kruger in action.


    Justin.

    --
    Justin C, by the sea.
    Justin C, Sep 3, 2013
    #7
  8. On 3/9/2013 11:14, Justin C wrote:
    > On 2013-09-02, George Mpouras <> wrote:
    >> On 2/9/2013 17:44, Justin C wrote:
    >>> On 2013-09-02, George Mpouras <> wrote:
    >>>> here is a Perl function to deduplicate your files. Not perfect but works

    >>
    >>> I think fdupes is much more likely to serve your
    >>> purpose correctly and efficiently.

    >>
    >>
    >>
    >> Technically speaking, fdupes finds identical files;
    >> the code I posted deduplicates the content of multiple files, in place.
    >>
    >> I think you wrote your reply without spending even 5 seconds reading what
    >> the post was about.

    >
    > You go ahead and think what you like, but, for once,
    > I'm with Rainer, your posts appear to be no more than
    > Dunning-Kruger in action.
    >
    >
    > Justin.
    >


    I do not "think"; I rely on facts, such as the man pages.
    George Mpouras, Sep 3, 2013
    #8
  9. George Mpouras <> writes:
    > here is a Perl function to deduplicate your files. Not perfect but works
    >
    >
    > http://antarcticasurfer.wordpress.com/2013/09/02/deduplicate-files-contents/


    This looks like a URL to me. As a comment which is not a flame: You're
    doing 'OS detection' at runtime and executing different code based
    on that:

    if ($^O=~/(?i)MSWin/) {
        unless (0 == system("RD /Q /S \"$temp/$_\"")) {
            die "Could not delete \"$temp/$_\" directory because\"$^E\"\n"
        }
    } else {
        unless (0 == system("rm -rf \"$temp/$_\"")) {
            die "Could not delete \"$temp/$_\" directory because \"$^E\"\n"
        }
    }

    but the OS will rarely ever change at runtime. You should rather move
    this into a BEGIN block and create 'a suitable function' you could
    then call from the main code. I think you should also consider using
    the 'list form' of system so that the runtime doesn't have to parse
    your commands in order to determine how to execute them, especially
    as this would also get around the (broken) 'quoted text
    interpolation'. Example:

    ---------------
    BEGIN {
        if ($^O eq 'linux') {
            *rmtree = sub {
                system(qw(rm -rf), $_[0]) == 0 and return;
                die("could not delete '$_[0]': $?");
            };
        }
    }

    rmtree($ARGV[0]);
    ---------------
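
The example above only defines rmtree on Linux. As a hedged alternative, and not something proposed in the thread, the core File::Path module can remove a tree without starting an external command at all, which sidesteps both the OS check and the quoting. The sketch below assumes File::Path 2.x (bundled with Perl since 5.10); the helper name rmtree_portable and the scratch directory layout are hypothetical.

```perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use File::Temp qw(tempdir);

# Hypothetical helper: delete a directory tree without shelling out.
sub rmtree_portable {
    my ($dir) = @_;
    # With the 'error' option, remove_tree collects problems instead of warning.
    remove_tree($dir, { error => \my $errors });
    # $errors references an array of single-pair hashes: { path => message }.
    for my $diag (@$errors) {
        my ($path, $message) = %$diag;
        $path = $dir if $path eq '';    # empty path means a general error
        die "could not delete '$path': $message\n";
    }
    return 1;
}

# Usage: build a scratch tree, then remove it again.
my $base = tempdir(CLEANUP => 1);
make_path("$base/a/b/c");
rmtree_portable("$base/a");
print -e "$base/a" ? "still there\n" : "removed\n";
```

Because no child process is involved, the error messages here come from the failing operation itself.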

    Using $^E/ $! here doesn't make much sense because this will only
    contain information about a problem which caused system to fail, not
    about one encountered by the program which was started.
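
A minimal sketch of that distinction, assuming a Unix-like platform: $! only matters when system returns -1 (the command could not be started), while the outcome of a command that did run is encoded in $?. The helper name run_or_explain is hypothetical, and $^X (the running perl binary) stands in for a real command.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form and describe the result.
sub run_or_explain {
    my @cmd = @_;
    # The indirect-object form never passes the command through a shell.
    my $rc = system { $cmd[0] } @cmd;
    return 'ok' if $rc == 0;
    # -1 means system itself failed to start the program; $! says why.
    return "failed to start: $!" if $rc == -1;
    # Otherwise the program ran; the low 7 bits of $? hold a signal number.
    return sprintf('killed by signal %d', $? & 127) if $? & 127;
    # The exit status of the program is in the high byte of $?.
    return sprintf('exited with status %d', $? >> 8);
}

print run_or_explain($^X, '-e', 'exit 3'), "\n";   # exited with status 3
print run_or_explain($^X, '-e', '1'), "\n";        # ok
```

Even then, $? carries only a number; the reason rm or RD failed is whatever that program printed on its own standard error.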

    You should also consider getting rid of the 'inverted comparisons'
    habit: This isn't even theoretically useful when both compared objects
    are lvalues and mainly communicates a certain mathetic refusal to
    accept reality: A lot of programming languages use == as comparison
    operator, have been doing so for forty years, and partisan syntax
    won't change that. Also, natural western languages work such that
    questions are asked in order to determine properties of objects ('Is
    the car blue?') and not objects of properties ('Is blue the colour of
    the car?'). This latter is just awkward and outlandish style.
    Rainer Weikusat, Sep 3, 2013
    #9
  10. Jürgen Exner, Sep 3, 2013
    #10
  11. John Bokma

    Justin C <> writes:

    > On 2013-09-02, George Mpouras <> wrote:
    >> here is a Perl function to deduplicate your files. Not perfect but works
    >>
    >>
    >> http://antarcticasurfer.wordpress.com/2013/09/02/deduplicate-files-contents/

    >
    > I think fdupes is much more likely to serve your
    > purpose correctly and efficiently.
    >
    > Stop trying to re-invent the wheel,


    There's plenty that can be improved about fdupes. For example limiting
    it to certain file extensions, skipping directories. Personally I would
    like to have a program which I can give a list of dirs I want to "keep"
    and a list of dirs I want to "empty". The program will remove all files
    that are in the "to empty" list that have a duplicate in the "keep"
    list. And no, that's not the same as the auto-delete option that fdupes has.
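
The "keep" / "empty" idea described above could be sketched roughly as follows. This is an illustration under stated assumptions, not an existing tool: the helper names (file_digest, files_under, removable_duplicates, write_file) are hypothetical, content is compared by SHA-256 digest using the core Digest::SHA and File::Find modules, and the sketch only reports candidates instead of deleting anything.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Find qw(find);
use File::Temp qw(tempdir);

# Hypothetical helper: hex SHA-256 digest of a file's contents.
sub file_digest {
    my ($path) = @_;
    return Digest::SHA->new(256)->addfile($path, 'b')->hexdigest;
}

# Hypothetical helper: all regular files (not symlinks) below the given dirs.
sub files_under {
    my @dirs = @_;
    my @files;
    find({ no_chdir => 1,
           wanted   => sub { push @files, $_ if -f $_ && !-l $_ } }, @dirs);
    return @files;
}

# Files under the "empty" dirs whose content also exists under a "keep" dir.
sub removable_duplicates {
    my ($keep_dirs, $empty_dirs) = @_;
    my %kept;
    $kept{ file_digest($_) } = 1 for files_under(@$keep_dirs);
    return grep { $kept{ file_digest($_) } } files_under(@$empty_dirs);
}

# Hypothetical helper used only to set up the demonstration.
sub write_file {
    my ($path, $text) = @_;
    open my $fh, '>', $path or die "$path: $!";
    print {$fh} $text;
    close $fh or die "$path: $!";
}

my $keep  = tempdir(CLEANUP => 1);
my $empty = tempdir(CLEANUP => 1);
write_file("$keep/original.txt", "same bytes\n");
write_file("$empty/copy.txt",    "same bytes\n");
write_file("$empty/unique.txt",  "different bytes\n");

my @dupes = removable_duplicates([$keep], [$empty]);
print "$_\n" for @dupes;    # only the path of copy.txt
```

The two directory lists must not overlap, otherwise every kept file would be reported as a duplicate of itself. A cautious version would also compare the files byte for byte before removing anything.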

    > and stop pushing your code here,


    If the OP just drops links to his site, report him for spam. Otherwise I
    suggest you use a kill file.

    --
    John Bokma j3b

    Blog: http://johnbokma.com/ Perl Consultancy: http://castleamber.com/
    Perl for books: http://johnbokma.com/perl/help-in-exchange-for-books.html
    John Bokma, Sep 3, 2013
    #11

  12. >
    > If the OP just drops links to his site, report him for spam. Otherwise I
    > suggest you use a kill file.
    >


    What else can I say after that, "Please donate me 10 boxes" !
    George Mpouras, Sep 4, 2013
    #12
  13. George Mpouras <> writes:

    >
    > I do not "think"


    That much is obvious.

    --
    "We will need a longer wall when the revolution comes."
    --- AJS, quoting an uncertain source.
    Mart van de Wege, Sep 4, 2013
    #13
  14. On 3/9/2013 17:36, Jürgen Exner wrote:
    > if ($#dirs == -1)
    >
    > You must be kidding....





    # which is the less funny ?


    my @array;

    print "Array is blank\n" if (
    (0 == scalar @array) ||
    (0 == @array) ||
    (-1 == $#array)
    );
    George Mpouras, Sep 4, 2013
    #14
  15. George Mpouras <> writes:
    > On 3/9/2013 17:36, Jürgen Exner wrote:
    >> if ($#dirs == -1)
    >>
    >> You must be kidding....

    >
    >
    >
    >
    > # which is the less funny ?
    >
    > my @array;
    >
    > print "Array is blank\n" if (
    > (0 == scalar @array) ||
    > (0 == @array) ||
    > (-1 == $#array)
    > );


    The least funny would be

    print "Array is blank" unless @array;

    which can also be written as

    @array or print "Where did all the flowers go?";

    The inverted comparisons are also just bizarre if none of the
    operands is an lvalue because then, an accidental assignment will
    result in an error either way.
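
As a small illustration of why the short form suffices: an array evaluated in boolean or numeric context yields its element count, so all the spellings discussed in this thread give the same answer. The sketch below is only a demonstration; the labels and variable names are made up for it.

```perl
use strict;
use warnings;

my @array;

# Each entry spells "is @array empty?" in a different way.
my %tests = (
    'not @array'          => sub { !@array },
    '@array == 0'         => sub { @array == 0 },
    'scalar(@array) == 0' => sub { scalar(@array) == 0 },
    '$#array == -1'       => sub { $#array == -1 },
);

my %verdicts_for;
for my $state ('empty', 'one element') {
    push @array, 'x' if $state eq 'one element';
    # Normalise every result to 1 (empty) or 0 (not empty).
    my @verdicts = map { $tests{$_}->() ? 1 : 0 } sort keys %tests;
    $verdicts_for{$state} = "@verdicts";
    print "$state: @verdicts\n";
}
# prints:
# empty: 1 1 1 1
# one element: 0 0 0 0
```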
    Rainer Weikusat, Sep 4, 2013
    #15
  16. >
    > @array or print "Where did all the flowers go?";



    Interesting: plain @array is also faster than scalar @array.





    use Benchmark;
    my @array;
    my $results = Benchmark::timethese(5_000_000, {
        method1 => sub { @array ? 1 : 0 },
        method2 => sub { scalar @array ? 1 : 0 },
    });
    Benchmark::cmpthese($results);
    George Mpouras, Sep 4, 2013
    #16
