Regex help needed for complicated mass renaming

BBQ

I have inherited a series of files in the following format:

TestTest1.txt
TestTest2.jpg
Test_Test3.txt
Test_TestTest4.txt
NASATest.txt
NASATest5.txt
TestTestTestTest.mpg
Test6Test.txt
Test.txt

which I want to rename as follows:

test_test1.txt
test_test2.jpg
test_test3.txt
test_test_test4.txt
nasa_test.txt
nasa_test5.txt
test_test_test_test.mpg
test6_test.txt
test.txt

In other words: everything lower case (this bit I can do) and
underscores before each uppercase letter (as long as it is not the
first character).

I've been trying (and failing) to put a '_' before every capital
letter. I was then going to chop off the first character if it is a '_'
(which I can do) and then lower-case everything (which I can do). As
for not inserting multiple '_'s between several capitals in a row
(NASATest.txt -> nasa_test.txt), I'm not sure where to start. I was
going to just alter those by hand.

Better solutions are most welcome (and if they are quite cryptic, a
few comments also would be great).
 
Jay Tilton

(e-mail address removed) (BBQ) wrote:

: I have inherited a series of files in the following format:
:
: TestTest1.txt
: TestTest2.jpg
: Test_Test3.txt
: Test_TestTest4.txt
: NASATest.txt
: NASATest5.txt
: TestTestTestTest.mpg
: Test6Test.txt
: Test.txt
:
: which I want to rename as follows:
:
: test_test1.txt
: test_test2.jpg
: test_test3.txt
: test_test_test4.txt
: nasa_test.txt
: nasa_test5.txt
: test_test_test_test.mpg
: test6_test.txt
: test.txt
:
: In other words: everything lower case (this bit I can do) and
: underscores before each uppercase letter (as long as it is not the
: first character).
:
: I've been trying (and failing) to put a '_' before every capital
: letter. I was then going to chop off the first character if it is a '_'
: (which I can do) and then lower-case everything (which I can do). As
: for not inserting multiple '_'s between several capitals in a row
: (NASATest.txt -> nasa_test.txt), I'm not sure where to start. I was
: going to just alter those by hand.

A zero-width negative lookahead assertion is just the ticket.

s/
    ( [[:upper:]] )     # An upper-case character
    (?! [[:upper:]] )   # that is not followed by another
/_$1/xg;                # Insert a preceding underscore
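
To see what is still left to clean up after that substitution, here is a small sketch that runs it over three of the sample names. The sub name `mark_words` is made up for illustration, and the pattern is the one above written without /x.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution above to a copy of the name.
sub mark_words {
    my ($name) = @_;
    # An upper-case character that is not followed by another one
    # gets an underscore put in front of it.
    $name =~ s/([[:upper:]])(?![[:upper:]])/_$1/g;
    return $name;
}

print mark_words($_), "\n" for qw(TestTest1.txt NASATest.txt Test_Test3.txt);
```

This prints `_Test_Test1.txt`, `NASA_Test.txt` and `_Test__Test3.txt`: the acronym case is handled, but the leading underscore and the doubled underscore still need separate treatment.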

In fact, zero-width lookahead/lookbehind assertions can take care of most
of the other requirements.

s/
    (?! ^)              # Don't match at the string's beginning.
    (?<! _ )            # A character not preceded by an underscore,
    (?=
        [[:upper:]]     # ...that is a UC letter,
        (?!
            [[:upper:]] # ...but is not followed by another.
        )
    )
/_/xg;                  # Jam an underscore in the matched position

All that's left to do after that is to lower-case the string.
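
Put together, the two steps might look like the sketch below. The sub name `snake_name` is hypothetical, and the name list is simply the one from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: the lookaround substitution followed by lc().
sub snake_name {
    my ($name) = @_;
    $name =~ s/
        (?! ^ )                 # not at the start of the string
        (?<! _ )                # not right after an existing underscore
        (?= [[:upper:]]         # the next character is upper case...
            (?! [[:upper:]] )   # ...and the one after it is not
        )
    /_/xg;
    return lc $name;
}

my @names = qw(
    TestTest1.txt TestTest2.jpg Test_Test3.txt
    Test_TestTest4.txt NASATest.txt NASATest5.txt
    TestTestTestTest.mpg Test6Test.txt Test.txt
);
print $_, ' -> ', snake_name($_), "\n" for @names;
```

The substitution only ever inserts at zero-width positions, so existing characters are never consumed and the later lc() is the only step that changes them.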
 
fifo

(e-mail address removed) (BBQ) wrote:

: I have inherited a series of files in the following format:
:
: TestTest1.txt
: TestTest2.jpg
: Test_Test3.txt
: Test_TestTest4.txt
: NASATest.txt
: NASATest5.txt
: TestTestTestTest.mpg
: Test6Test.txt
: Test.txt
:
: which I want to rename as follows:
:
: test_test1.txt
: test_test2.jpg
: test_test3.txt
: test_test_test4.txt
: nasa_test.txt
: nasa_test5.txt
: test_test_test_test.mpg
: test6_test.txt
: test.txt
:
[snip]

Jay Tilton wrote:

: In fact, zero-width lookahead/lookbehind assertions can take care of most
: of the other requirements.
:
: s/
:     (?! ^)              # Don't match at the string's beginning.
:     (?<! _ )            # A character not preceded by an underscore,
:     (?=
:         [[:upper:]]     # ...that is a UC letter,
:         (?!
:             [[:upper:]] # ...but is not followed by another.
:         )
:     )
: /_/xg;                  # Jam an underscore in the matched position
:
: All that's left to do after that is to lower-case the string.

This works for the given list, but it doesn't handle a trailing run of
capitals: "TestNASA.txt" comes out as "testnas_a.txt", which may or may
not be a problem. Here's another WTDI:

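# A "word" is either an acronym (two or more capitals or digits that do not
# run into a lower-case letter) or a capital followed by lower case/digits.
my $word = '[A-Z][A-Z][A-Z0-9]*(?![a-z])|[A-Z][a-z0-9]*';
s/($word)/_$1/og;           # Put an underscore before every word.
s/(^|_)_($word)/$1$2/og;    # Drop the extra one at the start or after a '_'.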
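
As a rough check of the acronym handling, the sketch below wraps those two substitutions in a sub and adds the final lower-casing. The sub name `acronym_aware` is made up, and `qr//` is used in place of the string plus /o, which is a substitution of mine rather than part of the snippet above.

```perl
use strict;
use warnings;

# Same "word" pattern as above, precompiled with qr// instead of /o.
my $word = qr/[A-Z][A-Z][A-Z0-9]*(?![a-z])|[A-Z][a-z0-9]*/;

# Hypothetical helper: mark every word, tidy the underscores, lower-case.
sub acronym_aware {
    my ($name) = @_;
    $name =~ s/($word)/_$1/g;           # underscore before every word
    $name =~ s/(^|_)_($word)/$1$2/g;    # remove leading and doubled ones
    return lc $name;
}

print acronym_aware($_), "\n" for qw(TestNASA.txt NASATest5.txt Test_Test3.txt);
```

This prints `test_nasa.txt`, `nasa_test5.txt` and `test_test3.txt`, so the trailing acronym stays in one piece.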
 
Steven Vasilogianis

: I have inherited a series of files in the following format:
:
: TestTest1.txt
: TestTest2.jpg
: Test_Test3.txt
: Test_TestTest4.txt
: NASATest.txt
: NASATest5.txt
: TestTestTestTest.mpg
: Test6Test.txt
: Test.txt

my @file_names = qw(
    TestTest1.txt TestTest2.jpg Test_Test3.txt
    Test_TestTest4.txt NASATest.txt NASATest5.txt
    TestTestTestTest.mpg Test6Test.txt Test.txt
);

foreach ( @file_names ) {
    # Split at underscores, and before any capital (not the first
    # character) that is not followed by another capital.
    my @parts = split /(?=(?<!^)[A-Z](?![A-Z]))|_/;
    print lc(join '_', @parts), "\n";
}

gives me the output you wanted:
test_test1.txt
test_test2.jpg
test_test3.txt
test_test_test4.txt
nasa_test.txt
nasa_test5.txt
test_test_test_test.mpg
test6_test.txt
test.txt

I came up with the regular expression independently of Jay Tilton's, but
they seem similar enough that I don't need to explain it again. Using
split just seemed to make sense to me here.
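
For the renaming itself, something along these lines might do. It is only a sketch: the sub names `new_name` and `rename_all` are made up, and it assumes bare file names in the current directory passed on the command line (a directory part would get mangled too).

```perl
use strict;
use warnings;

# Hypothetical helper: split into words as above, rejoin in lower case.
sub new_name {
    my ($old) = @_;
    my @parts = split /(?=(?<!^)[A-Z](?![A-Z]))|_/, $old;
    return lc join '_', @parts;
}

# Hypothetical helper: rename each given file, skipping any whose new
# name is unchanged or already taken by an existing file.
sub rename_all {
    for my $old (@_) {
        my $new = new_name($old);
        next if $new eq $old;
        if (-e $new) {
            warn "skipping $old: $new already exists\n";
            next;
        }
        rename $old, $new or warn "cannot rename $old to $new: $!\n";
    }
}

rename_all(@ARGV);
```

The -e check is there so that two old names mapping to the same new name don't silently overwrite each other. On a case-insensitive filesystem it will also skip names that differ only in case (Test.txt -> test.txt), so those would need a two-step rename through a temporary name.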

HTH,
 
