capitalizing words

Peter Bailey · Apr 8, 2008

Hi,
I need to capitalize the words in a string I find in XML files.

The string that's in (.*) below is what I need to change. I just want to
capitalize the first letter of each word in the string.

I'm trying this, in a test:

Dir.chdir("C:/users/pb4072/documents")
file = File.read("test1.txt")
file.gsub(/^<row><entry><text><emph face="b">(.*)<\/emph>/) do |match|
array = $1.split
array.each do |word|
word.capitalize!
end
newfile = File.open("c:/users/pb4072/documents/test1.txt", "w") { |f|
f.print array }
end

And, I'm getting this:

#(.*)<\/emph>theQuickBrownFoxJumpedOverTheLazyDog.

I want this:

<row><entry><text><emph face="b">The Quick Brown Fox Jumped Over The
Lazy Dog.<\/emph>/

Thanks,
Peter

Todd Benson · Apr 8, 2008

I don't know what the original text looks like in test1.txt, but this
might point you in the right direction...

irb(main):001:0> s = "the quick brown fox"
=> "the quick brown fox"
irb(main):002:0> s.split.map {|w| w.capitalize}.join ' '
=> "The Quick Brown Fox"

Todd
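One thing to watch with the split/join approach: split with no arguments breaks on any run of whitespace, so a line break inside the text is turned into a single space when the words are joined again. A minimal sketch, assuming sample text modelled on the thread (the variable names are only illustrative):

```ruby
# Sample text is an assumption; note the embedded newline.
text = "THE QUICK BROWN FOX\nJUMPED OVER THE LAZY DOG."

# split with no arguments breaks on whitespace runs, so join(' ')
# replaces the newline with a plain space.
joined = text.split.map { |w| w.capitalize }.join(' ')

# Replacing each word in place leaves the original whitespace untouched.
in_place = text.gsub(/\b\w+\b/) { |w| w.capitalize }

puts joined
puts in_place
```

If the line breaks in the file matter, the in-place form is probably the safer of the two.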

Rob Biedenharn · Apr 9, 2008

Dir.chdir("C:/users/pb4072/documents") do |d|
  file = File.read("test1.txt")
  output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(</emph>)}m) do |match|
    "#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
  end
  File.open("test1.txt", "w") { |f| f.write output }
end

Note the use of three capture groups to get the unchanged initial and
final parts as well as the middle part that is altered. %r{\b\w+\b}
is a Regexp that matches words: \b is a word boundary and \w is a
word character (short for [a-zA-Z0-9_]). Note also that
String#capitalize!, which you used, returns nil if no change is made.
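A small sketch of the two points above, with made-up sample strings: capitalize! returns nil when the string is already capitalized, while capitalize always returns a String, and scan with the same word pattern shows what \b\w+\b picks out.

```ruby
already = "Fox".dup
bang_result = already.capitalize!   # nothing to change, so this is nil

lower = "fox".dup
changed = lower.capitalize!         # modifies lower in place and returns it

plain_result = "Fox".capitalize     # the non-bang form always returns a String

# The word pattern skips whitespace and punctuation.
words = "the quick, brown fox.".scan(/\b\w+\b/)

p bang_result, changed, plain_result, words
```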

-Rob

Rob Biedenharn http://agileconsultingllc.com
(e-mail address removed)

Peter Bailey · Apr 9, 2008

Thanks, Todd.
The original text is just:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.<\/emph>/

Should I just make your "s" equal to $1 from my original gsub?

-Peter

Peter Bailey · Apr 9, 2008

Thanks, Rob. This works beautifully, except that I need that last
</emph> in my output. It's being stripped with your code. I don't see
why, because it's just your $3, isn't it?

Rob Biedenharn · Apr 9, 2008

You said to Todd:
The original text is just:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.<\/emph>/

I assumed that the "<\/emph>/" part was a cut-n-paste of a regexp for
the email (which is one reason that I changed from // to the %r{}
construction of the Regexp, so the / wouldn't have to be escaped). You
may have to change the second group to (.*?) [a reluctant match rather
than a greedy match] or adjust the third group to exactly match your
input.

-Rob


Peter Bailey · Apr 9, 2008

Rob,
So, here's my original file:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>

Here's my code, from you:
Dir.chdir("C:/users/pb4072/documents") do |d|
  file = File.read("test1.txt")
  output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(<\/emph>)}m) do |match|
    "#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
  end
  File.open("test1.txt", "w") { |f| f.write output }
end

Here's what I get. It works great, but, I don't understand why the $3
text is simply blown away.
<row><entry><text><emph face="b">The Quick Brown Fox Jumped Over The
Lazy Dog.

Thanks,
Peter

Jens Wille · Apr 9, 2008

hi peter!

Peter Bailey [2008-04-09 20:04]:

Here's what I get. It works great, but, I don't understand why
the $3 text is simply blown away.

because it's reset when you're doing that gsub on $2. the capture
variables only refer to the *last* match. so you have to capture
them into local variables first (can't think of a better way right now).
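A minimal sketch of that reset, using a shortened tag and sample text that are assumptions rather than Peter's real data:

```ruby
line = '<b>hello world</b>'

line =~ %r{(<b>)(.*)(</b>)}
before = $3                      # "</b>" from the outer match

# Any later successful match replaces the match data, groups included.
$2.gsub(/\w+/) { |w| w.capitalize }
after = $3                       # nil, because /\w+/ has no third group

p before, after
```

Inside the interpolated string the same thing happens: $1 and $2 are read before the inner gsub runs, but $3 is read after it.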

cheers
jens

--
Jens Wille, Dipl.-Bibl. (FH)
prometheus - Das verteilte digitale Bildarchiv für Forschung & Lehre
Kunsthistorisches Institut der Universität zu Köln
Albertus-Magnus-Platz, D-50923 Köln
Tel.: +49 (0)221 470-6668, E-Mail: (e-mail address removed)
http://www.prometheus-bildarchiv.de/

Rob Biedenharn · Apr 9, 2008

Rob,
So, here's my original file:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>

OK, change this to a regexp:
1. surround with the regexp literal bits
%r{<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>}m

2. add the grouping ()'s
%r{(<row><entry><text><emph face="b">)(THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.)(</emph>)}m

3. replace text with wildcards .* or .*?
%r{(<row><entry><text><emph face="b">)(.*?)(</emph>)}m

4. (optional?) add anchor ^
%r{^(<row><entry><text><emph face="b">)(.*?)(</emph>)}m

I'm assuming that is not the WHOLE file since the <row><entry><text>
tags are not closed. It is quite likely that .* is slurping a lot
more than you think, so that's why I've changed this to .*?, which
matches as little as possible while still letting the overall match succeed.
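To see the difference, here is a rough sketch on a made-up string with two <emph> elements (the sample is an assumption, not taken from the real file):

```ruby
xml = %(<emph face="b">FIRST</emph> filler <emph face="b">SECOND</emph>)

# Greedy: .* runs to the LAST </emph> it can still match against.
greedy = xml[%r{<emph face="b">(.*)</emph>}m, 1]

# Reluctant: .*? stops at the FIRST </emph>.
reluctant = xml[%r{<emph face="b">(.*?)</emph>}m, 1]

p greedy
p reluctant
```

With the /m flag the greedy form can also run across line breaks, which is how it can end up covering most of a file.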

-Rob


Rob Biedenharn · Apr 9, 2008

Ah yes! Good catch, Jens.

Peter, you only *need* to capture $3, but it would make sense to get
them all:

head, content, tail = $1, $2, $3
"#{head}#{content.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{tail}"

-Rob
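Putting the pieces from this thread together, a self-contained sketch might look like the following. The method name capitalize_emph, the PATTERN constant and the sample string are hypothetical, and the reluctant (.*?) is used as suggested earlier in the thread:

```ruby
PATTERN = %r{^(<row><entry><text><emph face="b">)(.*?)(</emph>)}m

def capitalize_emph(xml)
  xml.gsub(PATTERN) do
    # Copy the groups before the inner gsub replaces the match data.
    head, content, tail = $1, $2, $3
    "#{head}#{content.gsub(/\b\w+\b/) { |w| w.capitalize }}#{tail}"
  end
end

sample = %(<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE\nLAZY DOG.</emph>)
result = capitalize_emph(sample)
puts result
```

Reading the file and writing the result back would work as in the earlier snippet; that part is left out here so the sketch does not touch the disk.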


Jens Wille · Apr 9, 2008

Rob Biedenharn [2008-04-09 20:46]:

head, content, tail = $1, $2, $3
"#{head}#{content.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{tail}"

now here's a quick implementation that passes the MatchData object
into the block:

<http://prometheus.khi.uni-koeln.de/svn/scratch/ruby-nuggets/lib/nuggets/string/sub_with_md.rb>

so that code effectively becomes:

str.gsub_with_md(re) { |md|
"#{md[1]}#{md[2].gsub(%r{\b\w+\b}){|w|w.capitalize}}#{md[3]}"
}

;-)
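Without the extra library, something similar can probably be had by grabbing the MatchData inside the block with Regexp.last_match, which is part of core Ruby. A sketch with a shortened tag and sample text, both assumptions:

```ruby
re  = %r{(<b>)(.*?)(</b>)}
str = '<b>hello world</b>'

out = str.gsub(re) do
  md = Regexp.last_match   # the outer match, taken before any nested gsub runs
  "#{md[1]}#{md[2].gsub(/\w+/) { |w| w.capitalize }}#{md[3]}"
end

puts out
```

Because md is a local variable, the nested gsub can update $~ without affecting it.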

cheers
jens

Jens Wille · Apr 9, 2008

Peter Bailey [2008-04-09 20:04]:

output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(<\/emph>)}m) do |match|
  "#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
end

oh, and for the fun of it, here's what you can do with oniguruma:

Oniguruma::ORegexp.new(
'(?<=^<row><entry><text><emph face="b">).+(?=</emph>)', 'm'
).gsub(file) { |md|
md[0].gsub(%r{\b\w+\b}) { |w| w.capitalize }
}

(note that i needed to change '.*' to '.+')
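On Ruby 1.9 and later the built-in Regexp understands look-behind as well, so a similar pattern should work with plain gsub and no extra library. A sketch under a few assumptions: the sample string is made up, the ^ anchor is dropped to keep the look-behind simple, and a reluctant .+? is used so the match stops at the first closing tag.

```ruby
file = %(<row><entry><text><emph face="b">THE QUICK BROWN FOX</emph>\n)

# Only the text between the tags is matched; the tags sit in zero-width
# look-behind and look-ahead, so no capture groups are needed.
output = file.gsub(%r{(?<=<emph face="b">).+?(?=</emph>)}m) do |inner|
  inner.gsub(/\b\w+\b/) { |w| w.capitalize }
end

puts output
```

Since the block works on its own parameter rather than on $1..$3, the nested gsub cannot clobber anything it needs.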

cheers
jens

Peter Bailey · Apr 10, 2008

Sorry, Jens, but, I have no idea what you're referring to here. I
googled oniguruma. I see what it is. I installed it, but, it didn't seem
to install successfully. Do I do a "require oniguruma" at the top of my
script?

Jens Wille · Apr 10, 2008

Peter Bailey [2008-04-10 14:26]:

Do I do a "require oniguruma" at the top of my script?

sure. but you really don't need it to solve your task at hand.

it's just the new regexp engine for ruby 1.9 and sometimes i like to
do some stuff with it that the default engine of 1.8 can't do
(zero-width look-behind in this case).

you can still simplify your substitution by using the look-ahead
(which 1.8 *does* understand), so you get rid of the third capture:

file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(?=<\/emph>)}m) {
  "#{$1}#{$2.gsub(%r{\b\w+\b}) { |w| w.capitalize }}"
}

cheers
jens

Peter Bailey · Apr 10, 2008

Thanks. But, again, do I need to do a "require" for oniguruma at the
top?
Cheers,
Peter

Jens Wille · Apr 10, 2008

Peter Bailey [2008-04-10 16:20]:

Thanks. But, again, do I need to do a "require" for oniguruma at
the top?

if you want to use oniguruma, then yes, you have to require it first.

Peter Bailey · Apr 10, 2008

OK. Thanks!
