capitalizing words

Peter Bailey

Hi,
I need to capitalize the words in a string I find in XML files.

The string that's in (.*) below is what I need to change. I just want to
capitalize the first letter of each word in the string.

I'm trying this, in a test:

Dir.chdir("C:/users/pb4072/documents")
file = File.read("test1.txt")
file.gsub(/^<row><entry><text><emph face="b">(.*)<\/emph>/) do |match|
array = $1.split
array.each do |word|
word.capitalize!
end
newfile = File.open("c:/users/pb4072/documents/test1.txt", "w") { |f|
f.print array }
end

And, I'm getting this:

#(.*)<\/emph>theQuickBrownFoxJumpedOverTheLazyDog.

I want this:

<row><entry><text><emph face="b">The Quick Brown Fox Jumped Over The
Lazy Dog.<\/emph>/


Thanks,
Peter
 
Todd Benson

I don't know what the original text looks like in test1.txt, but this
might point you in the right direction...

irb(main):001:0> s = "the quick brown fox"
=> "the quick brown fox"
irb(main):002:0> s.split.map {|w| w.capitalize}.join ' '
=> "The Quick Brown Fox"

Todd
 
Rob Biedenharn

Dir.chdir("C:/users/pb4072/documents") do |d|
file = File.read("test1.txt")
output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(</emph>)}m) do |match|
"#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
end
File.open("test1.txt", "w") { |f| f.write output }
end

Note the use of three capture groups to get the unchanged initial and
final parts as well as the middle part that is altered. The %r{\b\w+\b}
is a Regexp that matches words: \b is a word boundary and \w is a
word character (short for [a-zA-Z0-9_]). Also note that
String#capitalize! returns nil if no change is made.
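
To illustrate the last two points, here is a minimal sketch. The helper name capitalize_words is made up for this example and is not part of the original code; it only shows the word-matching Regexp combined with the non-bang capitalize.

```ruby
# capitalize! changes the string in place and returns nil when there was
# nothing to change, so its return value is not a reliable replacement.
unchanged = "Fox".dup.capitalize!  # nil: "Fox" was already capitalized
changed   = "fox".dup.capitalize!  # "Fox": the string was modified
p [unchanged, changed]

# Hypothetical helper: capitalize every run of word characters.
def capitalize_words(text)
  text.gsub(/\b\w+\b/) { |w| w.capitalize }
end

puts capitalize_words("THE QUICK brown fox.")
```

Because capitalize always returns a string, it is safe to use as the block value of gsub, whereas capitalize! could hand back nil.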

-Rob

Rob Biedenharn http://agileconsultingllc.com
(e-mail address removed)
 
Peter Bailey

Thanks, Todd.
The original text is just:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.<\/emph>/

Should I just make your "s" equal to $1 from my original gsub?

-Peter
 
Peter Bailey

Thanks, Rob. This works beautifully, except that I need that last
</emph> in my output. It's being stripped with your code. I don't see
why, because it's just your $3, isn't it?
 
Rob Biedenharn

You said to Todd:
The original text is just:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.<\/emph>/

I assumed that the "<\/emph>/" part was a cut-n-paste of a regexp for
the email (which is one reason that I changed from // to the %r{}
construction of the Regexp, so the / wouldn't have to be escaped). You
may have to change the second group to (.*?) [reluctant match rather
than greedy match] or adjust the third group to exactly match your
input.

-Rob

Rob Biedenharn http://agileconsultingllc.com
(e-mail address removed)
 
Peter Bailey

Rob,
So, here's my original file:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>

Here's my code, from you:
Dir.chdir("C:/users/pb4072/documents") do |d|
file = File.read("test1.txt")
output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(<\/emph>)}m) do |match|
"#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
end
File.open("test1.txt", "w") { |f| f.write output }
end

Here's what I get. It works great, but, I don't understand why the $3
text is simply blown away.
<row><entry><text><emph face="b">The Quick Brown Fox Jumped Over The
Lazy Dog.

Thanks,
Peter
 
Jens Wille

hi peter!

Peter Bailey [2008-04-09 20:04]:
Here's what I get. It works great, but, I don't understand why
the $3 text is simply blown away.
because it's reset when you're doing that gsub on $2. the capture
variables only refer to the *last* match. so you have to capture
them into local variables first (can't think of a better way right now).
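
A minimal sketch of that effect, using a made-up method name and a toy pattern rather than the XML from the thread: the value of $3 is read once before and once after an inner gsub on $2.

```ruby
# Hypothetical demonstration: an inner gsub replaces the match data
# that $1, $2 and $3 are derived from.
def third_group_before_and_after(text)
  text =~ /(a)(b)(c)/                 # sets $1, $2, $3
  before = $3                         # still refers to the outer match
  $2.gsub(/\w/) { |ch| ch.upcase }    # the inner match has no group 3
  [before, $3]
end

p third_group_before_and_after("abc")
```

The second element comes back as nil, which is why the closing tag disappears from the interpolated string.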

cheers
jens

--
Jens Wille, Dipl.-Bibl. (FH)
prometheus - Das verteilte digitale Bildarchiv für Forschung & Lehre
Kunsthistorisches Institut der Universität zu Köln
Albertus-Magnus-Platz, D-50923 Köln
Tel.: +49 (0)221 470-6668, E-Mail: (e-mail address removed)
http://www.prometheus-bildarchiv.de/
 
Rob Biedenharn

Rob,
So, here's my original file:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>
OK, change this to a regexp:
1. surround with the regexp literal bits
%r{<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>}m

2. add the grouping ()'s
%r{(<row><entry><text><emph face="b">)(THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.)(</emph>)}m

3. replace text with wildcards .* or .*?
%r{(<row><entry><text><emph face="b">)(.*?)(</emph>)}m

4. (optional?) add anchor ^
%r{^(<row><entry><text><emph face="b">)(.*?)(</emph>)}m

I'm assuming that is not the WHOLE file since the <row><entry><text>
tags are not closed. It is quite likely that .* is slurping a lot
more than you think, so that's why I've changed this to .*?, which
matches as little as possible while continuing to succeed.
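
The difference can be sketched on a two-row sample. The constant names and the sample text below are invented for illustration; only the two patterns mirror the ones discussed above.

```ruby
# Two rows on separate lines; with the m flag, . also matches newlines.
SAMPLE = %(<emph face="b">FIRST ROW</emph>\n<emph face="b">SECOND ROW</emph>)

GREEDY = %r{<emph face="b">(.*)</emph>}m   # runs on to the LAST </emph>
LAZY   = %r{<emph face="b">(.*?)</emph>}m  # stops at the FIRST </emph>

p SAMPLE[GREEDY, 1]
p SAMPLE[LAZY, 1]
```

With the greedy group, the captured text swallows the first closing tag and the whole second opening tag; the reluctant group captures only FIRST ROW.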

-Rob

Rob Biedenharn http://agileconsultingllc.com
(e-mail address removed)
 
Rob Biedenharn

Ah yes! Good catch, Jens.

Peter, you only *need* to capture $3, but it would make sense to get
them all:

head, content, tail = $1, $2, $3
"#{head}#{content.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{tail}"

-Rob

Rob Biedenharn http://agileconsultingllc.com
(e-mail address removed)
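
Putting the pieces of the thread together, a sketch of the whole substitution might look like the following. It works on an in-memory string instead of test1.txt, uses the reluctant (.*?) suggested earlier, and the names EMPH_ROW and capitalize_emph are assumptions made for this example.

```ruby
# Pattern from the thread, with the reluctant middle group.
EMPH_ROW = %r{^(<row><entry><text><emph face="b">)(.*?)(</emph>)}m

# Hypothetical helper: capitalize the words between the emph tags.
def capitalize_emph(xml)
  xml.gsub(EMPH_ROW) do
    # Save the captures before the inner gsub overwrites them.
    head, content, tail = $1, $2, $3
    "#{head}#{content.gsub(/\b\w+\b/) { |w| w.capitalize }}#{tail}"
  end
end

sample = %(<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE\nLAZY DOG.</emph>)
puts capitalize_emph(sample)
```

To apply it to a file, the result of capitalize_emph(File.read(path)) could be written back with File.open(path, "w"), as in the earlier code.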
 
Jens Wille

Peter Bailey [2008-04-09 20:04]:
output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(<\/emph>)}m) do |match|
"#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
end
oh, and for the fun of it, here's what you can do with oniguruma:

Oniguruma::ORegexp.new(
'(?<=^<row><entry><text><emph face="b">).+(?=</emph>)', 'm'
).gsub(file) { |md|
md[0].gsub(%r{\b\w+\b}) { |w| w.capitalize }
}

(note that i needed to change '.*' to '.+')

cheers
jens
 
Peter Bailey

Sorry, Jens, but, I have no idea what you're referring to here. I
googled oniguruma. I see what it is. I installed it, but, it didn't seem
to install successfully. Do I do a "require oniguruma" at the top of my
script?
 
Jens Wille

Peter Bailey [2008-04-10 14:26]:
Do I do a "require oniguruma" at the top of my script?
sure. but you really don't need it to solve your task at hand.

it's just the new regexp engine for ruby 1.9 and sometimes i like to
do some stuff with it that the default engine of 1.8 can't do
(zero-width look-behind in this case).

you can still simplify your substitution by using the look-ahead
(which 1.8 *does* understand), so you get rid of the third capture:

file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(?=<\/emph>)}m) {
"#{$1}#{$2.gsub(%r{\b\w+\b}) { |w| w.capitalize }}"
}
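
For readers on Ruby 1.9 or later, where the built-in Regexp understands look-behind as well, a sketch without any extra library could look like this. The method name is made up, and the pattern is shortened to the emph tag and uses a reluctant .+? as assumptions for the example.

```ruby
# Hypothetical helper: look-behind and look-ahead leave the tags out of
# the match, so no capture groups are needed at all.
def capitalize_emph_lookaround(xml)
  xml.gsub(%r{(?<=<emph face="b">).+?(?=</emph>)}m) do |content|
    content.gsub(/\b\w+\b/) { |w| w.capitalize }
  end
end

puts capitalize_emph_lookaround('<emph face="b">LAZY DOG.</emph>')
```

Since the block receives the matched text as its parameter, nothing depends on $1..$3 surviving the inner gsub.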

cheers
jens
 
Peter Bailey

Thanks. But, again, do I need to do a "require" for oniguruma at the
top?
Cheers,
Peter
 
Jens Wille

Peter Bailey [2008-04-10 16:20]:
Jens said:
Peter Bailey [2008-04-10 14:26]:
Do I do a "require oniguruma" at the top of my script?
sure. but you really don't need it to solve your task at hand.
Thanks. But, again, do I need to do a "require" for oniguruma at
the top?
if you want to use oniguruma, then yes, you have to require it first.
 
Peter Bailey

OK. Thanks!
 
