gsub stretching over several lines

Jan Ask

Hi,

I am trying to replace a long (multiple-line) string:

string = string.gsub(/&lt;h3
class=&quot;field-label>audience&lt;\/h3&gt;
&lt;div class=&quot;field-items>
&lt;div class=&quot;field-item>/, '</audience>')

It somehow doesn't seem to work.

I would like to know how to use a wildcard like '*', '%' or '...' like
below:

string = string.gsub(/&lt;h ... tem>/, '</audience>')

Thanks!
Ask
 
Shai Rosenfeld

string = string.gsub(/&lt;h.*tem>/, '</audience>')

.*

# the . regexp wildcard means 'any' character (space, symbol, letter)
# the * regexp operator means: whatever comes right before me can
# appear 0 or more times

i.e, you get whatever character you want, however many times you want
it, between the '&lt;h' string and the 'tem>' string.
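
A small sketch of the same idea on a single-line string. The sample text and the variable names are made up for illustration, and note that by default . matches any character except a newline:

```ruby
# Hypothetical single-line sample; the entity-encoded markup mimics the OP's data.
line = 'before &lt;h3 class=&quot;field-item> after'

# . matches any single character except a newline (by default);
# * repeats the preceding element zero or more times.
result = line.gsub(/&lt;h.*tem>/, '</audience>')

puts result
```

Everything from '&lt;h' up to the last 'tem>' on the line is replaced, and the text around it is left alone.
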
hth

happy sunday btw
 
Alex Gutteridge

.* won't match over multiple lines without the m modifier on the
RegExp, which I think is the OP's problem:

irb(main):021:0> string = "Hi\nJan\nAsk"
=> "Hi\nJan\nAsk"
irb(main):022:0> string.gsub(/Hi.*Ask/,'Hi Jane Ask')
=> "Hi\nJan\nAsk"
irb(main):023:0> string.gsub(/Hi.*Ask/m,'Hi Jane Ask')
=> "Hi Jane Ask"
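
One thing to watch once the m modifier is on: .* is greedy, so with several similar blocks in one string it will run from the first start marker to the last end marker. A rough sketch of the difference, using a made-up sample string and .*? as the non-greedy form:

```ruby
# Made-up sample with two blocks, each spanning several lines.
text = "<a>\none\n</a>\n<a>\ntwo\n</a>"

# Greedy: .* runs to the LAST </a>, so both blocks collapse into one match.
greedy = text.gsub(/<a>.*<\/a>/m, 'X')

# Non-greedy: .*? stops at the FIRST </a>, so each block is replaced separately.
lazy = text.gsub(/<a>.*?<\/a>/m, 'X')

p greedy
p lazy
```
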

Alex Gutteridge

Bioinformatics Center
Kyoto University
 
Sebastian Hungerecker

Alex said:
.* won't match over multiple lines without the m modifier on the
RegExp, which I think is the OP's problem:

That can't be the OP's problem since the OP doesn't actually use .* in his
regexp (or any other kind of wildcard). He was asking how to use wildcards
so he could simplify his regexp (and make it work).
To know why his original regexp didn't work, we'd have to see the string it's
supposed to match, I suppose.
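
For what it is worth, a line break inside a Ruby regexp literal is a literal newline, so a multi-line pattern without wildcards can only match if the target has its line breaks (and every other character) in exactly the same places. A minimal sketch with a made-up target string, since the real one is not shown in the thread:

```ruby
# Hypothetical target; the real string from the OP's file is not shown.
target = "<h3>audience</h3>\n<div>"

# The line break inside this regexp literal matches exactly one "\n".
same_layout = /<h3>audience<\/h3>
<div>/

# One extra space of indentation in the target is enough to make the match fail.
indented = "<h3>audience</h3>\n <div>"

p same_layout.match?(target)
p same_layout.match?(indented)
```
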
 
Jan Ask

Alex & Sebastian,

Thanks for taking the time to reply. The string.gsub(/start.*end/m,
'some_value') did indeed help, but I am afraid my problem is a bit more
complicated.

I am basically trying to clean up a long XML file. A typical part of the
string looks like this:

&lt;div class=&quot;field field-type-text field-field-audience&quot;&gt;

&lt;h3 class=&quot;field-label&quot;&gt;audience&lt;/h3&gt;

&lt;div class=&quot;field-items&quot;&gt;

&lt;div class=&quot;field-item&quot;&gt;Public&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;

&lt;div class=&quot;field field-type-text field-field-creator&quot;&gt;
&lt;h3 class=&quot;field-label&quot;&gt;creator&lt;/h3&gt;
&lt;div class=&quot;field-items&quot;&gt;
&lt;div class=&quot;field-item&quot;&gt;Tom Jones&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

I am trying to format it like this:
<audience>Public</audience>
<creator>Tom Jones</creator>

So the problem is that the values in the xml change throughout the
string, so I cannot do a pattern match for them directly. Any ideas
would be hugely appreciated!

Jan
 
Alex Gutteridge

Without knowing the whole problem it is difficult to say what the
best solution is, but for the string you post above, I would clean it
up and parse with something like Hpricot:

require 'rubygems'
require 'hpricot'

string = DATA.read # read in the text after __END__

string.gsub!(/&lt;/, '<')   # convert lt and gt entities to real <>
string.gsub!(/&gt;/, '>')
string.gsub!(/&quot;/, '"') # put in quotes

doc = Hpricot(string) # parse with Hpricot

fields = ['audience', 'creator'] # array of 'fields' to extract

fields.each do |f| # for each field...
  # ...find the appropriate divs
  el = doc.search("//div[@class='field field-type-text field-field-#{f}']")
  el.each do |e| # for each field div...
    # print data
    puts "<#{f}>" + e.at("//div[@class='field-item']").inner_html + "</#{f}>"
  end
end

__END__
&lt;div class=&quot;field field-type-text field-field-audience&quot;&gt;

&lt;h3 class=&quot;field-label&quot;&gt;audience&lt;/h3&gt;

&lt;div class=&quot;field-items&quot;&gt;

&lt;div class=&quot;field-item&quot;&gt;Public&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;

&lt;div class=&quot;field field-type-text field-field-creator&quot;&gt;
&lt;h3 class=&quot;field-label&quot;&gt;creator&lt;/h3&gt;
&lt;div class=&quot;field-items&quot;&gt;
&lt;div class=&quot;field-item&quot;&gt;Tom Jones&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
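
If installing a parser is not an option, a regexp-only sketch along the lines discussed earlier in the thread might look like the following. It assumes every h3 label is followed by a single field-item div, as in the sample; the variable names are made up for illustration, and a real parser is likely to be more robust on messier input.

```ruby
# Sample copied from the thread, entities still encoded.
raw = <<~XML
  &lt;div class=&quot;field field-type-text field-field-audience&quot;&gt;
  &lt;h3 class=&quot;field-label&quot;&gt;audience&lt;/h3&gt;
  &lt;div class=&quot;field-items&quot;&gt;
  &lt;div class=&quot;field-item&quot;&gt;Public&lt;/div&gt;
  &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;field field-type-text field-field-creator&quot;&gt;
  &lt;h3 class=&quot;field-label&quot;&gt;creator&lt;/h3&gt;
  &lt;div class=&quot;field-items&quot;&gt;
  &lt;div class=&quot;field-item&quot;&gt;Tom Jones&lt;/div&gt;
  &lt;/div&gt;
  &lt;/div&gt;
XML

# Decode the three entities that appear in the sample.
text = raw.gsub('&lt;', '<').gsub('&gt;', '>').gsub('&quot;', '"')

# Capture the label from the h3 and the value from the field-item div.
# The m modifier lets . cross line breaks; .*? keeps each match short.
pattern = /<h3 class="field-label">(.*?)<\/h3>.*?<div class="field-item">(.*?)<\/div>/m

output = text.scan(pattern).map { |label, value| "<#{label}>#{value}</#{label}>" }
puts output
```

Because the pattern has two capture groups, scan yields one [label, value] pair per field, so new field names are picked up without listing them in advance.
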

Alex Gutteridge

Bioinformatics Center
Kyoto University
 
Jan Ask

Alex said:
Without knowing the whole problem it is difficult to say what the
best solution is, but for the string you post above, I would clean it
up and parse with something like Hpricot:


Thanks, I will have a try.

By the way, I see you are in Kyoto. I am studying at Tsukuba University
(about an hour from Tokyo), so if you come to the big city, I owe you a
beer!
 
