Trouble with embedded whitespace in filenames using File::Find

Mike Scott

RW> In this case, it was about answering a question someone asked
RW> which happened to be related to perl. If that someone should
RW> perhaps have asked a different question in another newsgroup is
RW> for him to decide.

> In other words, it really isn't about being helpful, as far as you're
> concerned.

Thank you. I used to have a work colleague who'd suggest that 'if you
want to go there, don't start here'. He was too often right; partial
solutions can be worse than none, and this seems a case in point.

I do notice the OP has gone silent. Would be good to know his progress.
 
Rainer Weikusat

Charlton Wilbur said:
RW> In this case, it was about answering a question someone asked
RW> which happened to be related to perl. If that someone should
RW> perhaps have asked a different question in another newsgroup is
RW> for him to decide.

> In other words, it really isn't about being helpful, as far as you're
> concerned.

My opinion on 'being helpful' apparently differs from your opinion on
that. And - this being the important aspect here - what pre-existing
program could possibly solve the problem the OP was trying to solve is
a discussion of its own and one which belongs elsewhere.
 
Peter J. Holzer

RW> MD5 (or any other hashing algorithm) is a lot more expensive
RW> than a comparison and especially so if MD5 needs to process 2G
RW> of data while the comparison would only need 8K.

> You make several unfounded assumptions here. [...]
> Two, that the number of comparisons is small. The more comparisons you
> have, the more the advantage goes to the hashing algorithm. If you have
> 2 files, it is best to read the first 8K of each and compare them,
> since, as you note, odds are that any differences will appear early on.
> If you have 1000 files, reading the first 8K of each file for
> comparison purposes means a great deal of seeking and reading;

It's about the same amount of seeking and a lot less reading than
computing a hash of each of the 1000 files. At least if the files are a
lot larger than 8k.

> and then you either store the first 8K, leading to a large working set
> (and the first time you swap, you've lost anything you won by avoiding
> calculating hashes),

8k * 1000 is 8 MB. That's negligible. And you only have to store this if
there are actually 1000 files of the same size.

There is also a hybrid approach:

For each group of files of the same size, you could initially read only
the first 8k (or some other size large enough to find the first
difference with a high probability, but small enough to be dwarfed by
the overhead of open(2)), and if those are identical, switch to
computing a hash (and as Ben said, you can use something like SHA512 -
where a collision is IMHO less likely than a false positive due to a
hardware or software error).
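
The hybrid approach described above could be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the helper names (`_group`, `_prefix`, `_sha512`, `find_duplicates`) and the 1 MiB read chunk are assumptions, and the 8 KiB prefix size is simply the figure used in the discussion. Files are grouped by size first, then by their first 8 KiB, and only the survivors are hashed in full with SHA-512.

```python
import hashlib
import os
from collections import defaultdict

PREFIX_SIZE = 8 * 1024  # the "first 8k" from the discussion; an assumption to tune


def _group(paths, key):
    """Bucket paths by key(path) and keep only buckets with two or more members."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[key(path)].append(path)
    return [group for group in buckets.values() if len(group) > 1]


def _prefix(path):
    """Read only the first PREFIX_SIZE bytes of the file."""
    with open(path, "rb") as fh:
        return fh.read(PREFIX_SIZE)


def _sha512(path):
    """Hash the whole file in 1 MiB chunks so large files are not held in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.digest()


def find_duplicates(paths):
    """Return lists of paths whose contents are, with very high probability, identical."""
    duplicates = []
    for same_size in _group(paths, os.path.getsize):
        # Cheap check: most differing files already differ in the first 8 KiB.
        for same_prefix in _group(same_size, _prefix):
            # Expensive check: only files that survived both filters are hashed.
            duplicates.extend(_group(same_prefix, _sha512))
    return duplicates
```

Because the paths are passed around as list elements and never through a shell, names with embedded whitespace need no special handling in this sketch. The prefixes are kept in memory only for files that share a size, which matches the estimate of about 8 MB for 1000 same-sized files.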

hp
 
Clint O

> That's easy. Step one: find a real news client. Step two: find a real
> news server.
>
> Google Groups is unusable as a posting interface to Usenet.

I used to use slrn. Is there anything better than that these days? I do have a subscription to Supernews.

Thanks,

-Clint
 
Peter J. Holzer

> I used to use slrn. Is there anything better than that these days? I do have a subscription to Supernews.

I guess that depends on your quality criteria.

I use slrn and think it's pretty decent. It has about all the navigation
and filter capabilities I need, displays text/plain in just about any
charset ok (as long as you stick to left-to-right scripts; but since I
can read neither Hebrew nor Arabic that's rarely a problem) and it lets
me edit my postings with my favourite editor (in fact it insists on it).

It is seriously deficient in the display of non-text/plain articles, but
those aren't common in any newsgroup I currently read.

I have also tried Thunderbird, KNode and Pan (although none of them
recently) but wasn't impressed.

hp
 
