gsub stretching over several lines

Jan Ask · Aug 4, 2007

Hi,

I am trying to do a replace on a long (multiple-line) string:

string = string.gsub(/<h3
class="field-label>audience<\/h3>
<div class="field-items>
<div class="field-item>/, '</audience>')

It somehow doesn't seem to work.

I would like to know how to use a wildcard such as '*', '%' or '...',
like below:

string = string.gsub(/<h ... tem>/, '</audience>')

Thanks!
Ask

Shai Rosenfeld · Aug 5, 2007

string = string.gsub(/<h.*tem>/, '</audience>')

.*

# the . regexp wildcard means 'any character' (space, symbol, letter)
# the * is an operator saying: the regexp element just before me can
appear 0 or more times

i.e., you get any characters, as many of them as you like, between the
'<h' string and the 'tem>' string.
hth

happy sunday btw
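One detail worth adding to the explanation above: .* is greedy, so it matches as much as it can. If the string holds several fields, a pattern like /<h.*tem>/ runs from the first '<h' to the last 'tem>' and swallows everything in between. The lazy form .*? stops at the first possible match instead. A minimal sketch of the difference, using a made-up one-line sample shaped like Jan's markup (the sample string is an assumption for illustration only):

```ruby
# Made-up one-line sample in the shape of Jan's markup (two fields).
line = '<h3>audience</h3><div class="field-item">Public</div>' \
       '<h3>creator</h3><div class="field-item">Tom Jones</div>'

# Greedy .* matches as much as it can: from the FIRST "<h" to the LAST
# 'item">', so both fields are swallowed in one replacement.
greedy = line.sub(/<h.*item">/, '')

# Lazy .*? matches as little as it can: it stops at the first 'item">'.
lazy = line.sub(/<h.*?item">/, '')

puts greedy # prints: Tom Jones</div>
puts lazy   # prints: Public</div><h3>creator</h3><div class="field-item">Tom Jones</div>
```

With gsub instead of sub, the lazy pattern would strip each field's prefix separately, while the greedy one would still treat both fields as a single match.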

Alex Gutteridge · Aug 5, 2007

.* won't match over multiple lines without the m modifier on the
Regexp, which I think is the OP's problem:

irb(main):021:0> string = "Hi\nJan\nAsk"
=> "Hi\nJan\nAsk"
irb(main):022:0> string.gsub(/Hi.*Ask/,'Hi Jane Ask')
=> "Hi\nJan\nAsk"
irb(main):023:0> string.gsub(/Hi.*Ask/m,'Hi Jane Ask')
=> "Hi Jane Ask"

Alex Gutteridge

Bioinformatics Center
Kyoto University
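The irb session above can be applied to the first field of Jan's markup. In Ruby, the m flag is what many other regex engines (Perl, PCRE) call "dotall" or /s: it lets . match a newline. Ruby's ^ and $ already match at every line boundary without any flag. A small sketch, assuming the three lines below stand in for the real input; it uses a lazy .*? so that only the first field would be consumed:

```ruby
# The first field from Jan's markup, spread over three lines.
block = <<~HTML
  <h3 class="field-label">audience</h3>
  <div class="field-items">
  <div class="field-item">Public</div>
HTML

# Without /m, . stops at "\n", so nothing between "<h3" and 'item">' matches.
without_m = block.sub(/<h3.*?item">/, '<audience>')

# With /m (Ruby's "dot matches newline" flag), the lazy .*? can cross lines.
with_m = block.sub(/<h3.*?item">/m, '<audience>')

puts without_m == block # prints: true (string unchanged)
puts with_m             # prints: <audience>Public</div>
```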

Sebastian Hungerecker · Aug 5, 2007

Alex said:
.* won't match over multiple lines without the m modifier on the
RegExp, which I think is the OP's problem:

That can't be the OP's problem since the OP doesn't actually use .* in his
regexp (or any other kind of wildcard). He was asking how to use wildcards
so he could simplify his regexp (and make it work).
To know why his original regexp didn't work, we'd have to see the string it's
supposed to match, I suppose.

Jan Ask · Aug 5, 2007

Alex & Sebastian,

Thanks for taking the time to reply. The string.gsub(/start.*end/m,
'some_value') did indeed help, but I am afraid my problem is a bit more
complicated.

I am basically trying to clean up a long XML file. A typical part of the
string looks like this:

<div class="field field-type-text field-field-audience">

<h3 class="field-label">audience</h3>

<div class="field-items">

<div class="field-item">Public</div>

</div>

</div>

<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>

I am trying to format it like this:
<audience>Public</audience>
<creator>Tom Jones</creator>

So the problem is that the values in the XML change throughout the
string, so I cannot match them with a fixed pattern. Any ideas
would be hugely appreciated!

Jan
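A regex-only sketch of that conversion, assuming every field follows exactly the h3 / field-item layout posted above (one plain-text value per field, no nested tags inside it). FIELD is a name made up for this example. It combines the m flag (so . crosses newlines), a lazy .*? (which stops at the first possible match) and two capture groups, and uses String#scan rather than gsub to pull out each label/value pair:

```ruby
# The two fields from the post above, blank lines included.
xml = <<~HTML
  <div class="field field-type-text field-field-audience">

  <h3 class="field-label">audience</h3>

  <div class="field-items">

  <div class="field-item">Public</div>

  </div>

  </div>

  <div class="field field-type-text field-field-creator">
  <h3 class="field-label">creator</h3>
  <div class="field-items">
  <div class="field-item">Tom Jones</div>
  </div>
  </div>
HTML

# Capture the label, skip lazily (.*?) across newlines (/m) to the
# field-item div, then capture its text up to the next "</div>".
FIELD = %r{<h3 class="field-label">(\w+)</h3>.*?<div class="field-item">(.*?)</div>}m

# scan returns one [label, value] array per match.
result = xml.scan(FIELD).map { |name, value| "<#{name}>#{value}</#{name}>" }.join("\n")
puts result
# prints:
# <audience>Public</audience>
# <creator>Tom Jones</creator>
```

If the markup varies (extra attributes, nested tags inside a value), a real HTML parser is the sturdier option.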

Alex Gutteridge · Aug 5, 2007

Without knowing the whole problem it is difficult to say what the
best solution is, but for the string you posted above, I would clean it
up and parse it with something like Hpricot:

require 'rubygems'
require 'hpricot'

string = DATA.read # read in string

string.gsub!(/&lt;/, '<') # convert &lt; and &gt; entities to real < and >
string.gsub!(/&gt;/, '>')
string.gsub!(/&quot;/, '"') # put in real quotes

doc = Hpricot(string) # parse with Hpricot

fields = ['audience', 'creator'] # array of 'fields' to extract

fields.each do |f| # for each field...
  el = doc.search("//div[@class='field field-type-text field-field-#{f}']") # ...find the matching divs
  el.each do |e| # for each field div...
    puts "<#{f}>" + e.at("//div[@class='field-item']").inner_html + "</#{f}>" # print data
  end
end

__END__
<div class="field field-type-text field-field-audience">

<h3 class="field-label">audience</h3>

<div class="field-items">

<div class="field-item">Public</div>

</div>

</div>

<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>

Alex Gutteridge
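The three entity-replacing gsub! calls in the script above could also be folded into a single pass using the hash form of String#gsub, where each matched text is looked up as a key in the hash. This is a sketch: decode_entities and ENTITIES are hypothetical names, and only the four entities listed are handled. Decoding in one pass also means an escaped ampersand such as &amp;lt; becomes &lt; rather than being decoded twice into <.

```ruby
# Hypothetical table of the few entities this data uses.
ENTITIES = { '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&amp;' => '&' }.freeze

# Hypothetical helper: gsub with a Hash replaces each match with hash[match].
def decode_entities(str)
  str.gsub(/&(?:lt|gt|quot|amp);/, ENTITIES)
end

escaped = '&lt;div class=&quot;field-item&quot;&gt;Public &amp;amp; more&lt;/div&gt;'
puts decode_entities(escaped)
# prints: <div class="field-item">Public &amp; more</div>
```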

Jan Ask · Aug 5, 2007

Alex said:
Without knowing the whole problem it is difficult to say what the
best solution is, but for the string you post above, I would clean it
up and parse with something like Hpricot:

Thanks, I will give it a try.

By the way, I see you are in Kyoto. I am studying at Tsukuba University
(about an hour from Tokyo), so if you come to the big city, I owe you a
beer!

Similar Threads

Mini Web Server in C++ (Part One)	4	Oct 2, 2025
Need assistance finetuning HTML, CSS, Javascript - sticky header issue	3	Feb 24, 2022
Why is this WordPress comments form not submitting?	1	Jan 12, 2020
Filter table rows based on multiple checkboxes value	2	Jan 13, 2023
Only one table shows up with the information	2	Mar 29, 2023
iomanip to escape xml	6	Jan 18, 2010
[ANN] Tenjin 0.6.1 - a fast and full-featured template engine	3	Feb 7, 2008
Checking dynamically populated data using ajax with user entered value	5	Apr 11, 2020
