check file exists with case sensitive on a case insensitive filesystem

X

Xah Lee

been a while i coded in perl...

how to check if a file exists with case sensitiveness on os x?

a quick google search says call system ls & grep. That seems too slow.
I have some over 10 thousand files to check.

Xah
∑ http://xahlee.org/

☄
 
U

Uri Guttman

XL> been a while i coded in perl...
XL> how to check if a file exists with case sensitiveness on os x?

XL> a quick google search says call system ls & grep. That seems too slow.
XL> I have some over 10 thousand files to check.

sorry we don't help self declared delusional geniuses who denigrate perl
in every insane cross post they make. my suggestion is to rtfm. or do it
in assembler. or write your own filesystem and plug it in.

uri
 
T

Tad J McClellan

been a while i coded in perl...


Thank you.

how to check if a file exists with case sensitiveness on os x?

a quick google search says call system ls & grep. That seems too slow.
I have some over 10 thousand files to check.


Bummer Dude.
 
A

A. Sinan Unur

been a while i coded in perl...

how to check if a file exists with case sensitiveness on os x?

a quick google search says call system ls & grep. That seems too slow.
I have some over 10 thousand files to check.

How is that too slow?

On Windows XP:

C:\Temp\test> dir
....
38077 File(s) 466 bytes

C:\Temp\test> cat check.bat
ls | grep %1

C:\Temp\test> timethis check ZZCTWD8DIS

C:\Temp\test> ls | grep ZZCTWD8DIS

TimeThis : Elapsed Time : 00:00:00.375

C:\Temp\test> dir ZZCTWD8DIS

2009/04/05 01:21 PM 0 zzctwd8dis

Less than half a second for 38,077 files. Of course, if you wanted to
use this in a loop, it would get very slow very fast.

Using Perl functions to replace ls and grep increases performance to 6
or 7 checks per second (against the same set of 38,077 randomly
generated filenames). I am not providing details because someone of your
stature should be able to write the appropriate benchmarks with no
effort.

Finally, if the contents of the directory are fixed, and you just want
to check the existence of many files case sensitively, just do it the
Perl way: Use a hash.

Sinan

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 
S

sln

been a while i coded in perl...

how to check if a file exists with case sensitiveness on os x?

a quick google search says call system ls & grep. That seems too slow.
I have some over 10 thousand files to check.

Xah
? http://xahlee.org/

?

This will do it for you Zah Zah.
Somewhere in here is the answer to your problem.

-sln

-----------------------------------------
#!/usr/bin/perl

package SMG;
use strict;

use File::stat;
use File::path;
use File::Copy;
use File::Find;
use File::Spec;
use sort 'stable';


require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(SafeMerge GetCommonElementsNxM GetCommonElements);
my $VERSION = 1.00;


#my $current = sort::current();



####################################################################################
# SAFE MERGE
# The two major options -
# 1. Merge 2 directories into 1, preserving/renaming duplicate named files.
# 2. Compare 2 directories and just rename the duplicates of 1 directory.
# Also -
# - The prefix level will be on a per-file monitored basis.
# - File/Dir names are cached and renamed in the "From" array,
# - Move is done via os-rename function into the "ToDir".
# - Remove "From" directory option (after rename).
# - Duplicate's-renaming-only is done in-place in the "FromDir".
# - Report output is created in the current directory (for now).
# - Itteration scheme are within-directory, then cross-directory until no more dups.
#-----------------------------------------------------------------------------------
sub SafeMerge ($$$$$$$)
{
return 0 if (@_ < 2 || @_ > 7);
my ($ToDir,$FromDir,$PrefixName,$PrefixLevel,$Exclude,$Duponly,$RmvFrom) = @_;

$PrefixName = '' unless defined $PrefixName; # Renaming Prefix
$PrefixLevel = 0 unless defined $PrefixLevel; # Number of times to += prefix

$Exclude = [] unless defined $Exclude; # Exclude list (must be re)
$Duponly = 0 unless defined $Duponly; # Flag to just rename dups in "from"
$RmvFrom = 0 unless defined $RmvFrom; # Flag to remove "from" after operation

my @To = ();
my @From = ();
my @Xfound = ();
my $dupsize = 0;
my $passes = 0;

print "\n\n *******************************\n";
print "********[ Safe Merge ]*********\n";
print " *******************************\n";
print "\nPrefix\t$PrefixName\nLevel\t$PrefixLevel\nChecks ... ";

$ToDir = File::Spec->canonpath( "$ToDir" );
$FromDir = File::Spec->canonpath( "$FromDir" );
#print "$ToDir\n$FromDir";
if ($ToDir eq $FromDir) { print "\rFrom/To are identical: $ToDir\n"; return 0; }
if (!-e $ToDir) { print "\rTo directory does not exist: $ToDir\n"; return 0; }
if (!-e $FromDir) { print "\rFrom directory does not exist: $FromDir\n"; return 0;}

# Glob seems to be bothered by spaces in dir path names
my $Sep = "\\".File::Spec->canonpath( "/" );
{
my $td = File::Spec->canonpath( "$ToDir/*" );
my $fd = File::Spec->canonpath( "$FromDir/*" );
if ($td =~ / /) {
$td = "'$td'";
}
if ($fd =~ / /) {
$fd = "'$fd'";
}
foreach (glob ("$td")) {
/.+$Sep(.+)$/;
# /.+\/(.+)$/;
push (@To, [$1, 0]);
}
foreach (glob ("$fd")) {
/.+$Sep(.+)$/;
# /.+\/(.+)$/;
my $ftmp = $1;
for (@{$Exclude}) {
if ($ftmp =~ /$_/) {
# save in @Xfound if $RmvFrom is set (see below)
print "\rExclude - $ftmp\n";
push (@Xfound, $ftmp) if ($RmvFrom);
$ftmp = undef;
last;
}
}
push (@From, [$ftmp, 0, '']) if defined ($ftmp);
}
}

# Will return if "From" empty. Technically not an error.
# If it is empty through exclusions (regexp filter), "From" could be rmdir'd below.
if (@From == 0) {
print "\rEmpty dir: $FromDir\n";
return 0;
}
print "OK\n";


## ============================================================================
## Iterate until there are no more dups across directories after (D2) rename
##
do {
my $Dupref = GetCommonElementsNxM_HashMethod(\@To,\@From, 1, 0);
$dupsize = @{$Dupref};
$passes++;

print "\nP A S S \# $passes\n---------------------\n";
print "Found $dupsize dups:\n";

#**********************************
# Check for cross-directory dups
# -------------
if ($dupsize)
{
my $errflg = 0;
print "\n<DIR> $FromDir\n\n";
for my $ndx (@{$Dupref})
{
my $Fromref = $From[$ndx];
my $pname = $PrefixName;

print "\t$Fromref->[0] ... $PrefixName";

#*****************************************
# Rename and check for in-directory dups
# ----------
while (1)
{
# Level check
if ($Fromref->[1] >= $PrefixLevel)
{
print "$Fromref->[0] <<level $Fromref->[1] exceeded>>\n";
# Clear the name in From so this file doesn't get moved
$Fromref->[0] = "";
$errflg = 1;
last;
}
my $stmp = uc($pname.$Fromref->[0]);
my $found = 0;
for (@From) { if (uc($_->[0]) eq $stmp) { $found = 1; last; } }

if (!$found) {
# OK to rename! Dup not found in "From"
# Change the name in "From" to this new one
# so it gets moved later. Increment its level of prefix
# ---------------------
print "$Fromref->[0]";
print "\n";
if ($Fromref->[1] == 0) { $Fromref->[2] = $Fromref->[0]; }
$Fromref->[0] = $pname.$Fromref->[0];
$Fromref->[1] += 1;
last;
}
else {
# File exist with that name, add another level of prefix.
# ---------------------
print "$PrefixName";
$pname = $pname.$PrefixName;
if ($Fromref->[1] == 0) { $Fromref->[2] = $Fromref->[0]; }
$Fromref->[1] += 1;
}
}
}
if ($errflg) {
print "!!! Can't rename some files, raise the level or change the prefix.\n";
#return 0;
}
}
} while ($dupsize > 0);
##
## End cross-dir iteration
## =============================


#*************************************************************
# Check for rename of duplicates in D2 (only) without moving
# ------------
if ($Duponly)
{
my $rentried = 0;
my $renok = 0;
print "\nRename only ... ";
for (@From) {
my $res = 0;
if ($_->[1] > 0 && $_->[1] < $PrefixLevel) {
$rentried++;
my $fr = File::Spec->canonpath( "$FromDir/$_->[2]" );
my $fr2 = File::Spec->canonpath( "$FromDir/$_->[0]" );
if (!($res = rename ("$fr", "$fr2"))) {
print "Rename error:\t$_->[2] ... $_->[0]\n";
} else {
$renok++; }
}
}
if ($rentried) { print "$renok out of $rentried files OK\n"; }
else { print "No duplicates found\n"; }
return 1;
}

#**********************
# Move "From" into "To"
# -------------------
#print @From."@From\n";
my $sep = File::Spec->canonpath( "/" );
print "\nMoving:\t$FromDir$sep* to $ToDir$sep\n";
my $movcnt = 0;

for (@From) {
my $res = 0;
if ($_->[1] > 0 && $_->[1] < $PrefixLevel) {
my $fr = File::Spec->canonpath( "$FromDir/$_->[2]" );
my $to = File::Spec->canonpath( "$ToDir/$_->[0]" );
$res = File::Copy::move ("$fr", "$to");
if ($res == 0) { print "Move error:\t$_->[2] ... $to\n"; }
}
if ($_->[1] == 0 && $_->[0] gt '') {
my $fr = File::Spec->canonpath( "$FromDir/$_->[0]" );
my $to = File::Spec->canonpath( "$ToDir/$_->[0]" );
$res = File::Copy::move ("$fr", "$to");
if ($res == 0) { print "Move error:\t$fr\n" }
}
$movcnt++ if ($res);
}
my $notmoved = @From - $movcnt;
print "Moved:\t$movcnt out of ".@From." files.\n";

#***************************************
# Check if "FromDir" is to be deleted
# --------------
if ($RmvFrom) {
print "Remove:\t$FromDir ... ";
if ($notmoved > 0) {
print "not deleted, contains $notmoved file(s) that couldn't be moved!\n";
}
else {
# check if @Xfound has values (and their not directories)
# if so, delete these first before trying to remove the From directory
# for now, don't want to "unlink" a directory, and we're not doing a tree here
# ----------------
for (@Xfound) {
my $fr = File::Spec->canonpath( "$FromDir/$_" );
unlink ("$fr") if (!-d $fr) ; }
my $fr = File::Spec->canonpath( "$FromDir" );
if (!rmdir "$fr") {print "$!\n" } else {print "OK\n"; }
}
}
return 1;
}


#######################################################
# Get Common Elements (from two N-dimensioned Array's)
# IN - Refs to the NxN arrays to compare,
# sort flag and the compare field.
# OUT - Ndx's into Right_Array of matching elements
# ---------------------------------------------------
# Notes -
# 1. Elements are assumed textual and case insensitive
# 2. Ignores in-array duplicates
# 3. Sort will be done if sort flag > 0
#
sub GetCommonElementsNxM($$$$)
{
my ($A_Left,$A_Right,$Srtflg,$Fld) = @_;
$Srtflg = 0 unless defined $Srtflg;
$Fld = 0 unless defined $Fld;
# my @Dup = ();
my @Ndx = ();

if ($Srtflg > 0) {
@{$A_Left} = sort {uc($a->[$Fld]) cmp uc($b->[$Fld])} @{$A_Left};
@{$A_Right} = sort {uc($a->[$Fld]) cmp uc($b->[$Fld])} @{$A_Right};
} else {print "==> Common Elements : Not sorting arrays\n";}

my $rpos = 0;
my $rend = @{$A_Right};
my $cnt = 0;
my $llast = undef;
my $rlast = undef;
foreach my $left_element (@{$A_Left})
{
next if (uc($left_element->[$Fld]) eq uc($llast->[$Fld]));

$rpos += $cnt;
$cnt = 0;
foreach my $right_element (@{$A_Right}[$rpos..($rend-1)])
{
last if (uc($left_element->[$Fld]) lt uc($right_element->[$Fld]));
$cnt++;
next if (uc($right_element->[$Fld]) eq uc($rlast->[$Fld]));
if (uc($left_element->[$Fld]) eq uc($right_element->[$Fld]))
{
# push (@Dup, $right_element->[$Fld]); # the string
push (@Ndx, $rpos+$cnt-1); # the index into R_Array
last;
}
$rlast = $right_element;
}
$llast = $left_element;
last if ($rpos >= $rend);
}
# return (\@Dup);
return (\@Ndx);
}

sub GetCommonElementsNxM_HashMethod($$$$)
{
my ($A_Left,$A_Right,$Srtflg,$Fld) = @_;
$Srtflg = 0 unless defined $Srtflg;
$Fld = 0 unless defined $Fld;

my ($element,%count);
#my @Dup = ();
my @Ndx = ();
my $idx = 0;

foreach $element (@{$A_Left}) {
$count{uc $element->[$Fld]} = 1;
}
foreach $element (@{$A_Right}) {
if ($count{uc $element->[$Fld]} == 1) {
#push @Dup, $element->[$Fld];
push @Ndx, $idx;
$count{$element->[$Fld]}++;
}
$idx++;
}
# return (\@Dup);
return (\@Ndx);
}


#######################################################
# Get Common Elements from single Array's
# IN - Refs to the Nx1 arrays to compare, sort flag
# OUT - Ndx's into Right_Array of matching elements
# ---------------------------------------------------
# Notes -
# 1. Elements are assumed textual and case insensitive
# 2. Ignores in-array duplicates
# 3. Sort will be done if sort flag > 0
#######################################################
sub GetCommonElements($$$)
{
my ($A_Left,$A_Right,$Srtflg) = @_;
$Srtflg = 0 unless defined $Srtflg;
# my @Dup = ();
my @Ndx = ();

if ($Srtflg > 0) {
@{$A_Left} = sort {uc($a) cmp uc($b)} @{$A_Left};
@{$A_Right} = sort {uc($a) cmp uc($b)} @{$A_Right};
} else {print "==> Common Elements : Not sorting arrays\n";}

my $rpos = 0;
my $rend = @{$A_Right};
my $cnt = 0;
my $llast = '';
my $rlast = '';
foreach my $left_element (@{$A_Left})
{
next if (uc($left_element) eq uc($llast));

$rpos += $cnt;
$cnt = 0;
foreach my $right_element (@{$A_Right}[$rpos..($rend-1)])
{
last if (uc($left_element) lt uc($right_element));
$cnt++;
next if (uc($right_element) eq uc($rlast));
if (uc($left_element) eq uc($right_element))
{
# push (@Dup, $right_element); # the string
push (@Ndx, $rpos+$cnt-1); # the index into R_Array
last;
}
$rlast = $right_element;
}
$llast = $left_element;
last if ($rpos >= $rend);
}
# return (\@Dup);
return (\@Ndx);
}

sub GetCommonElements_HashMethod($$$$)
{
my ($A_Left,$A_Right,$Srtflg) = @_;
$Srtflg = 0 unless defined $Srtflg;

my ($element,%count);
#my @Dup = ();
my @Ndx = ();
my $idx = 0;

foreach $element (@{$A_Left}) {
$count{uc $element} = 1;
}
foreach $element (@{$A_Right}) {
if ($count{uc $element} == 1) {
#push @Dup, $element;
push @Ndx, $idx;
$count{$element}++;
}
$idx++;
}
# return (\@Dup);
return (\@Ndx);
}

1;
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top