How to identify unpaired files in a list

Paul

Hello there. I have a problem where I am trying to identify unpaired
files in a directory.

The directory may have files like the following:
----
abcd0001.txt
abcd0001.def.bak
abcd0002.txt
abcd0003.txt
abcd0003.ghi.bak
abcd0004.xyz.bak
----

What I'd like to do is identify the unpaired files 'abcd0002.txt' and
'abcd0004.xyz.bak' in the list above.

I've tried bringing all the filenames into a single array, but I'm not
sure how to delete items that have duplicate *root* names. Deleting
exact duplicates is trivial, of course, but the extension on the second
file of each pair varies.
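
One way to compare names regardless of extension is to reduce each one to its root first. A minimal sketch, assuming the root is everything before the first dot; `root_name` is a hypothetical helper name, not something from this thread:

```ruby
# Hypothetical helper: treat everything before the first dot as the root.
# Assumption: root names themselves never contain a dot.
def root_name(filename)
  # Strip any directory part, then split at most once on the first dot.
  File.basename(filename).split('.', 2).first
end

root_name('abcd0001.txt')      # => "abcd0001"
root_name('abcd0001.def.bak')  # => "abcd0001"
```

Two files then belong to the same pair when their roots are equal, whatever their extensions are.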

I also tried creating two separate arrays (one for each file type) so
that I can compare the root filenames, but then I'm left with looping
through one array many times, which will cause a huge performance hit
when the filenames number in the thousands.
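
The repeated looping in the two-array approach can probably be avoided by putting each array's roots into a Set, so that every membership check is a hashed lookup rather than a scan. A rough sketch, assuming one file type ends in `.txt` and everything else is the other type; `unpaired_by_type` is a made-up name for illustration:

```ruby
require 'set'

# Hypothetical helper: split files into two types, then look each root up
# in a Set built from the other type instead of scanning the other array.
def unpaired_by_type(filenames)
  root = ->(name) { name.split('.', 2).first }

  # Assumption: '.txt' marks one type, anything else is the other type.
  txt_files, other_files = filenames.partition { |name| name.end_with?('.txt') }

  txt_roots   = txt_files.map(&root).to_set
  other_roots = other_files.map(&root).to_set

  # Keep only the files whose root has no counterpart in the other type.
  txt_files.reject { |name| other_roots.include?(root.call(name)) } +
    other_files.reject { |name| txt_roots.include?(root.call(name)) }
end

files = %w[abcd0001.txt abcd0001.def.bak abcd0002.txt
           abcd0003.txt abcd0003.ghi.bak abcd0004.xyz.bak]
unpaired_by_type(files)  # => ["abcd0002.txt", "abcd0004.xyz.bak"]
```

Each filename is visited a constant number of times, so a few thousand files should not be a problem.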

Does anyone have any good suggestions that they can offer?

Please let me know. Thanks. Paul.
 
Roger Pack

Perhaps nested hashes.
=r
 
Roger Pack

Oops, I meant a hash of arrays, i.e. a final data structure like:

{'abcd0001' => ['abcd0001.txt', 'abcd0001.def.bak'], 'abcd0002' =>
['abcd0002.txt']}

Then iterate through it, looking for arrays of length 1 only.
GL.
=r
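
The hash-of-arrays idea might look something like the sketch below. It assumes the root is everything before the first dot; `group_by_root` and `unpaired_files` are hypothetical names chosen for illustration:

```ruby
# Build the hash of arrays: root name => every filename sharing that root.
def group_by_root(filenames)
  groups = Hash.new { |hash, key| hash[key] = [] }
  filenames.each do |name|
    root = name.split('.', 2).first  # assumption: root ends at the first dot
    groups[root] << name
  end
  groups
end

# An unpaired file is the sole member of its group.
def unpaired_files(filenames)
  group_by_root(filenames).values.select { |names| names.length == 1 }.flatten
end

files = %w[abcd0001.txt abcd0001.def.bak abcd0002.txt
           abcd0003.txt abcd0003.ghi.bak abcd0004.xyz.bak]
unpaired_files(files)  # => ["abcd0002.txt", "abcd0004.xyz.bak"]
```

This makes a single pass to build the hash and a single pass to filter it. For a real directory, the list could come from something like `Dir.children(path)`, which also returns subdirectory names, so those may need filtering out first.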
 
Steven Hirsch

I had the same requirement and ended up using Roger's method, except
with an array of arrays:
[['abcd0001.txt', 'abcd0001.def.bak'], ['abcd0002.txt'], etc.]
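
An array of arrays like this can be obtained by grouping on the root and discarding the keys. A short sketch, again assuming the root ends at the first dot; `unpaired_from_groups` is a hypothetical name:

```ruby
# Group filenames by root, keep only the groups (an array of arrays),
# then pick the groups that contain a single file.
def unpaired_from_groups(filenames)
  groups = filenames.group_by { |name| name.split('.', 2).first }.values
  groups.select { |names| names.length == 1 }.map(&:first)
end

files = %w[abcd0001.txt abcd0001.def.bak abcd0002.txt
           abcd0003.txt abcd0003.ghi.bak abcd0004.xyz.bak]
unpaired_from_groups(files)  # => ["abcd0002.txt", "abcd0004.xyz.bak"]
```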

 
Paul Carvalho

That's cool. I'll give it a try. Thanks.
 
