File deduplication

Justin C

George Mpouras said:
Here is a Perl function to deduplicate your files. Not perfect, but it works.


http://antarcticasurfer.wordpress.com/2013/09/02/deduplicate-files-contents/

I think fdupes is much more likely to serve your
purpose correctly and efficiently.

Stop trying to re-invent the wheel, and stop pushing
your code here. No one is asking for it, and no one else
does it. If you have a Perl problem, post a snippet
and explain what you expect it to do, and you'll get any
help you need. But I, for one, am fed up with what you
keep posting; it's not helpful, useful, or wanted[1].

You're circling the black hole that is my kill file; unless
you alter your trajectory, you won't escape it.

Justin.

[1] Please correct me if I'm wrong. If you look
forward to the next installment of George's code
postings, please say so and I'll re-align what I consider
this group to be.
 
Rainer Weikusat

Justin C said:
I think fdupes is much more likely to serve your
purpose correctly and efficiently.

Stop trying to re-invent the wheel,

Some people believe that they've accomplished a technical feat
equivalent to inventing the wheel whenever they've managed to tack
three lines of code together which do something other than 'crash
immediately'. I'd call this another nice example of the
Dunning-Kruger effect in action.
 
George Mpouras

On 2/9/2013 17:44, Justin C wrote:
I think fdupes is much more likely to serve your
purpose correctly and efficiently.



Technically speaking, fdupes finds identical files.
The code I posted deduplicates the contents of multiple files, in place.

I think you wrote your reply without spending even 5 seconds reading
what the post was about.
 
George Mpouras

On 2/9/2013 18:52, Rainer Weikusat wrote:
Some people believe that they've accomplished a technical feat
equivalent to inventing the wheel whenever they've managed to tack
three lines of code together which do something other than 'crash
immediately'. I'd call this another nice example of the
Dunning-Kruger effect in action.


I do not know of any Perl "wheel" that deduplicates the contents of a set of files.
 
Rainer Weikusat

George Mpouras said:
On 2/9/2013 18:52, Rainer Weikusat wrote:

I do not know of any Perl "wheel" that deduplicates the contents of a set of files.

This 're-invent the wheel' statement is incredibly stupid for two
reasons:

1. 'The wheel' is not some 'static' piece of technology: new kinds
of wheels are constantly being developed, and different kinds, e.g.,
wheels used in high-speed trains vs. wheels used for wheelbarrows, are
very much different.

2. The basic design of 'the wheel' represents a very simple way to solve a
particular problem 'perfectly' and has thus been unchanged for a few
thousand years. In contrast to this, software which hasn't either
vanished altogether or undergone a serious redesign for, say, thirty
years, is extremely rare. The same is true for most other 'human
inventions': usually, they're useless trifles and vanish quickly.
 
Justin C

George Mpouras said:
On 2/9/2013 17:44, Justin C wrote:

Technically speaking, fdupes finds identical files.
The code I posted deduplicates the contents of multiple files, in place.

I think you wrote your reply without spending even 5 seconds reading
what the post was about.

You go ahead and think what you like but, for once,
I'm with Rainer: your posts appear to be no more than
Dunning-Kruger in action.


Justin.
 
George Mpouras

On 3/9/2013 11:14, Justin C wrote:
You go ahead and think what you like but, for once,
I'm with Rainer: your posts appear to be no more than
Dunning-Kruger in action.

Justin.

I do not "think"; I base what I say on facts, such as man pages.
 
Rainer Weikusat

George Mpouras said:
Here is a Perl function to deduplicate your files. Not perfect, but it works.


http://antarcticasurfer.wordpress.com/2013/09/02/deduplicate-files-contents/

This looks like a URL to me. As a comment which is not a flame: you're
doing 'OS detection' at runtime and executing different code based
on that:

if ($^O=~/(?i)MSWin/) {
    unless (0 == system("RD /Q /S \"$temp/$_\"")) {
        die "Could not delete \"$temp/$_\" directory because\"$^E\"\n"
    }
} else {
    unless (0 == system("rm -rf \"$temp/$_\"")) {
        die "Could not delete \"$temp/$_\" directory because \"$^E\"\n"
    }
}

but the OS will rarely ever change at runtime. You should rather move
this into a BEGIN block and create 'a suitable function' you could
then call from the main code. I think you should also consider using
the 'list form' of system so that the runtime doesn't have to parse
your commands in order to determine how to execute them, especially
as this would also get around the (broken) 'quoted text
interpolation'. Example:

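---------------
BEGIN {
    # Decide once, at compile time, which implementation to install.
    if ($^O eq 'linux') {
        *rmtree = sub {
            # List form of system: no shell, no quoting problems.
            system(qw(rm -rf), $_[0]) == 0 and return;
            die("could not delete '$_[0]': $?");
        };
    }
}

rmtree($ARGV[0]);
---------------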

Using $^E/ $! here doesn't make much sense because this will only
contain information about a problem which caused system to fail, not
about one encountered by the program which was started.
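
To make that distinction concrete, here is a minimal sketch of how the status that system leaves in $? can be decoded, following the conventions described in perlfunc: -1 means the program could not be started (only then is $! meaningful), the low 7 bits hold a signal number, and the high byte holds the exit code. The helper name describe_status is made up for this illustration, and the child process is simply the running perl itself (via $^X).

```perl
use strict;
use warnings;

# Hypothetical helper: turn the value of $? after system() into a message.
sub describe_status {
    my ($status) = @_;
    return 'failed to start' if $status == -1;
    return 'killed by signal ' . ($status & 127) if $status & 127;
    return 'exited with code ' . ($status >> 8);
}

# Run a child that exits with code 3. List form, so no shell is involved.
system($^X, '-e', 'exit 3');
print describe_status($?), "\n";
```

A die message built from this says something about the program that was started, which is what the rm or RD case above actually needs.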

You should also consider getting rid of the 'inverted comparisons'
habit: this isn't even theoretically useful when both compared objects
are lvalues and mainly communicates a certain mathetic refusal to
accept reality. A lot of programming languages use == as the comparison
operator, have been doing so for forty years, and partisan syntax
won't change that. Also, natural western languages work such that
questions are asked in order to determine properties of objects ('Is
the car blue?') and not objects of properties ('Is blue the colour of
the car?'). The latter is just awkward and outlandish style.
 
John Bokma

Justin C said:
I think fdupes is much more likely to serve your
purpose correctly and efficiently.

Stop trying to re-invent the wheel,

There's plenty that can be improved about fdupes. For example limiting
it to certain file extensions, skipping directories. Personally I would
like to have a program which I can give a list of dirs I want to "keep"
and a list of dirs I want to "empty". The program will remove all files
that are in the "to empty" list that have a duplicate in the "keep"
list. And no, that's not the same as the auto-delete option that fdupes has.
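
The "keep" / "to empty" idea could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing tool: the function names digest_of and find_removable are invented here, files are treated as duplicates when their MD5 digests match (a careful tool would also compare the bytes), and the sketch only lists candidates rather than deleting anything. It also assumes the two directory lists do not overlap.

```perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Hypothetical helper: hex MD5 digest of a file's content.
sub digest_of {
    my ($path) = @_;
    open(my $fh, '<', $path) or die("cannot open '$path': $!");
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Return the files under @$empty whose content also exists under @$keep.
sub find_removable {
    my ($keep, $empty) = @_;
    my (%seen, @removable);

    # With no_chdir, $_ holds the full path of each entry.
    find({ no_chdir => 1, wanted => sub {
        $seen{digest_of($_)} = 1 if -f $_;
    } }, @$keep);

    find({ no_chdir => 1, wanted => sub {
        push(@removable, $_) if (-f $_) && $seen{digest_of($_)};
    } }, @$empty);

    return sort @removable;
}

# Dry run, e.g.: perl sketch.pl keep-dir dir-to-empty
if (@ARGV == 2) {
    print "$_\n" for find_removable([$ARGV[0]], [$ARGV[1]]);
}
```

Printing instead of calling unlink is deliberate: the list can be reviewed before anything is removed.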

Justin C said:
and stop pushing your code here,

If the OP just drops links to his site, report him for spam. Otherwise I
suggest you use a kill file.
 
George Mpouras

If the OP just drops links to his site, report him for spam. Otherwise I
suggest you use a kill file.

What else can I say after that? "Please donate me 10 boxes"!
 
George Mpouras

On 3/9/2013 17:36, Jürgen Exner wrote:
if ($#dirs == -1)

You must be kidding....

# Which is the least funny?

my @array;

print "Array is blank\n" if (
    (0 == scalar @array) ||
    (0 == @array) ||
    (-1 == $#array)
);
 
Rainer Weikusat

George Mpouras said:
On 3/9/2013 17:36, Jürgen Exner wrote:

# Which is the least funny?

my @array;

print "Array is blank\n" if (
    (0 == scalar @array) ||
    (0 == @array) ||
    (-1 == $#array)
);

The least funny would be

print "Array is blank" unless @array;

which can also be written as

@array or print "Where did all the flowers go?";

The inverted comparisons are also just bizarre if none of the
operands is an lvalue because then, an accidental assignment will
result in an error either way.
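
As a quick check that the emptiness tests discussed here agree with one another, a small sketch follows. The helper name emptiness_checks and its hash keys are invented for this illustration; it evaluates the plain boolean test, the element count, and the last-index test on the same array.

```perl
use strict;
use warnings;

# Hypothetical helper: report how each emptiness test sees the same array.
sub emptiness_checks {
    my @array = @_;
    return (
        boolean => (@array ? 0 : 1),          # array in boolean context
        count   => (@array == 0 ? 1 : 0),     # number of elements
        last    => ($#array == -1 ? 1 : 0),   # index of the last element
    );
}

my %empty = emptiness_checks();
my %full  = emptiness_checks('a', 'b');

print 'empty: ', join(' ', @empty{qw(boolean count last)}), "\n";   # empty: 1 1 1
print 'full:  ', join(' ', @full{qw(boolean count last)}), "\n";    # full:  0 0 0
```

Since all three give the same answer, the choice between them is a matter of readability, which is the argument for the plain boolean form.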
 
George Mpouras

@array or print "Where did all the flowers go?";


Plain @array is also faster than scalar @array. Interesting.

use Benchmark;
my @array;
my $results = Benchmark::timethese(5_000_000, {
    method1 => sub { @array ? 1 : 0 },
    method2 => sub { scalar @array ? 1 : 0 },
});
Benchmark::cmpthese($results);
 
