Alex & Sebastian,
Thanks for taking the time to reply. The
string.gsub(/start.*end/m, 'some_value') did indeed help, but I am
afraid my problem is a bit more complicated.
I am basically trying to clean up a long XML file. A typical part of
the string looks like this:
<div class="field field-type-text field-field-audience">
<h3 class="field-label">audience</h3>
<div class="field-items">
<div class="field-item">Public</div>
</div>
</div>
<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>
I am trying to format it like this:
<audience>Public</audience>
<creator>Tom Jones</creator>
The problem is that the values in the XML change throughout the
string, so I cannot match on them directly. Any ideas would be hugely
appreciated!
Jan
Without knowing the whole problem it is difficult to say what the
best solution is, but for the string you posted above, I would clean
it up and parse it with something like Hpricot:
require 'rubygems'
require 'hpricot'
string = DATA.read #read in string
string.gsub!(/&lt;/,'<') #Convert lt and gt entities to real <>
string.gsub!(/&gt;/,'>')
string.gsub!(/&quot;/,'"') #Put in quotes
doc = Hpricot(string) #Parse with Hpricot
fields = ['audience','creator'] #Create array of 'fields' to extract
fields.each do |f| #For each field...
  #...find appropriate divs
  el = doc.search("//div[@class='field field-type-text field-field-#{f}']")
  el.each do |e| # for each field div...
    #print data
    puts "<#{f}>" + e.at("//div[@class='field-item']").inner_html + "</#{f}>"
  end
end
__END__
<div class="field field-type-text field-field-audience">
<h3 class="field-label">audience</h3>
<div class="field-items">
<div class="field-item">Public</div>
</div>
</div>
<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>
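If installing a parser is not an option, the same extraction can be roughly sketched with a single regular expression and capture groups, which also means the field names do not have to be listed up front. This is only a sketch: it assumes every field follows exactly the layout in the sample (one h3 label followed by one field-item div), that the entities have already been converted to real angle brackets and quotes, and the names FIELD_PATTERN and extract_fields are made up for illustration. Regexes over markup are fragile, so a real parser is still the safer choice if the input varies.

```ruby
# Sketch: pull label/value pairs out of the field markup with one regex.
# Assumes each field has exactly one label and one item, as in the sample.
FIELD_PATTERN = %r{
  <h3\s+class="field-label">(.*?)</h3>     # capture 1: the field name
  .*?                                      # lazily skip to the first item
  <div\s+class="field-item">(.*?)</div>    # capture 2: the field value
}mx

# extract_fields is a hypothetical helper, not part of any library.
# It returns one "<name>value</name>" string per field found.
def extract_fields(html)
  html.scan(FIELD_PATTERN).map do |name, value|
    "<#{name.strip}>#{value.strip}</#{name.strip}>"
  end
end

sample = <<~HTML
  <div class="field field-type-text field-field-audience">
  <h3 class="field-label">audience</h3>
  <div class="field-items">
  <div class="field-item">Public</div>
  </div>
  </div>
  <div class="field field-type-text field-field-creator">
  <h3 class="field-label">creator</h3>
  <div class="field-items">
  <div class="field-item">Tom Jones</div>
  </div>
  </div>
HTML

puts extract_fields(sample)
```

The /m flag lets .*? run across line breaks, and the lazy quantifiers keep each match inside a single field rather than swallowing everything up to the last closing div.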
Alex Gutteridge
Bioinformatics Center
Kyoto University