More Regexp and file load problems

Don Levan

Hi All,

I am struggling here and would be most appreciative of any help.

I have a regular expression problem I can't seem to figure my way out
of. I sent a similar request this morning and gratefully
received two emails from David Black and Axel Etzold. Unfortunately,
I am still having difficulty.

I am trying to extract the font name out of the following
XML (this is an excerpt of a larger file). The weird thing is that
my regular expression correctly matches the text if I use TextMate's
regular expression 'Find in Project' feature and the freeware
Reggy regular expression test tool found on Google Code. It also
matches correctly if I test against a subset of the XML being
parsed, as in this first example.


#!/usr/bin/env ruby
#
# Created by Don Levan on 2007-06-26.
# Copyright (c) 2007. All rights reserved.



string = %q(</TextObj>
</Object>
<ObjectStyle id="0" fontHeight="11" graphicFormat="3843"
fieldBorders="132">
<CharacterStyle mask="16183">
<Font-family codeSet="Roman" fontId="0">Helvetica</Font-family>
<Font-size>9</Font-size>
<Face>256</Face>
<Color>#000000</Color>
</CharacterStyle>)


regexp = Regexp.new(/^\s*<Font-family codeSet=\"\w*\" fontId=\"\d*\">(\w*)<\/Font-family>\s*$/)

if string =~ regexp
  puts "#{$1}"
end


Result: Helvetica




However, if I run this script, there is no result.


#!/usr/bin/env ruby
#
# Created by Don Levan on 2007-06-26.
# Copyright (c) 2007. All rights reserved.




regexp = Regexp.new(/^\s*<Font-family codeSet=\"\w*\" fontId=\"\d*\">(\w*)<\/Font-family>\s*$/)
file = File.new('/Users/donlevan/Desktop/DDRs/Apple Dealer Price List.xml')

file.each do |line|
  if line =~ regexp
    puts "#{$1}"
  end
end



I have looked at Hpricot and XmlSimple. Unfortunately, I cannot get
TextMate configured correctly, so I keep getting LoadErrors when I try
to use RubyGems.

Thanks,

Don
 
Peña, Botp

From: Don Levan:
# file = File.new('/Users/donlevan/Desktop/DDRs/Apple Dealer Price List.xml')
#
# file.each do |line|
# if line =~ regexp

Since your regex passes the string test, I would suspect the contents of
the file itself. Possibly the expression is broken across multiple lines.

# puts "#{$1}"
# end
# end


Try grepping the file first:

root@pc4all:~# grep -i Helvetica test.txt
<Font-family codeSet="Roman" fontId="0">Helvetica</Font-family>

Then pass the result to a simplified Ruby program:

root@pc4all:~# cat test.rb
regexp = Regexp.new(/^\s*<Font-family codeSet=\"\w*\" fontId=\"\d*\">(\w*)<\/Font-family>\s*$/)
ARGF.each do |line|
  if line =~ regexp
    puts "#{$1}"
  end
end


root@pc4all:~# grep -i Helvetica test.txt | ruby test.rb
Helvetica
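
If the element really is split across lines in the file, a line-by-line match will never see the whole tag at once. Here is a minimal sketch of one way around that, assuming the file is small enough to read into memory: scan the full text with a pattern that has no ^ or $ anchors and allows whitespace (including newlines) around the font name. The sample string and the constant name FONT_RE are made up for illustration and are not taken from the original file.

```ruby
# Hypothetical sample standing in for the real file. The Font-family element
# is split across two physical lines, which defeats a line-by-line match.
xml = <<XML
<CharacterStyle mask="16183">
<Font-family codeSet="Roman" fontId="0">
Helvetica</Font-family>
<Font-size>9</Font-size>
</CharacterStyle>
XML

# No ^ or $ anchors, and \s* on either side of the capture also matches
# newlines, so the pattern does not care where the line breaks fall.
FONT_RE = /<Font-family codeSet="\w*" fontId="\d*">\s*([\w ]*?)\s*<\/Font-family>/

# scan walks the whole string and returns one array per match;
# flatten turns [["Helvetica"]] into ["Helvetica"].
fonts = xml.scan(FONT_RE).flatten
puts fonts
```

For the real file, something like File.read(path).scan(FONT_RE).flatten should behave the same way, where path is wherever the XML lives.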
 
Greg

You could use an XML library, possibly REXML.
Don't forget require 'rubygems' if you load a gem.
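
As a rough illustration of the REXML route, here is a small sketch that pulls the text of every Font-family element out of a document. The sample XML is adapted from the excerpt in the question, with a closing ObjectStyle tag added so that it parses; the variable names are only for illustration. It assumes REXML is available via require 'rexml/document', which is the case for a standard Ruby install.

```ruby
require 'rexml/document'

# Sample adapted from the excerpt in the question; the closing
# </ObjectStyle> tag is added here so the fragment is well-formed.
xml = <<XML
<ObjectStyle id="0" fontHeight="11" graphicFormat="3843" fieldBorders="132">
<CharacterStyle mask="16183">
<Font-family codeSet="Roman" fontId="0">Helvetica</Font-family>
<Font-size>9</Font-size>
<Face>256</Face>
<Color>#000000</Color>
</CharacterStyle>
</ObjectStyle>
XML

doc = REXML::Document.new(xml)

# XPath '//Font-family' selects every Font-family element at any depth;
# text returns the character data inside the element.
font_names = REXML::XPath.match(doc, '//Font-family').map { |el| el.text.strip }
puts font_names
```

To run this against the real file, REXML::Document.new also accepts an open File object in place of the string.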
 
