Cutting a piece of text

Zdebel · Feb 12, 2006

Hello!
I've started to learn Ruby and I'm amazed by it. Now I have a problem
that I can't solve. If I have a string like this:
"<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>", how can I
cut the " artist=XXX album=XXX title=XXX" part so that it looks like
"<lyrics> Lalalalala </lyrics>"? Could you please help me?

James Edward Gray II · Feb 12, 2006

You can do it with a regular expression like the following, but I
must stress that this isn't very robust:

"<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".sub(/<(\w+)[^>]+>/, "<\\1>")
=> "<lyrics> Lalalalala </lyrics>"
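
To make the robustness caveat concrete, here is a small sketch. The sample inputs, the names, and the stricter pattern are illustrative assumptions, not something posted in the thread. It shows two inputs where the pattern misbehaves: a tag with no attributes at all, and an attribute value that contains a > character.

```ruby
# The pattern from the post above.
ORIGINAL = /<(\w+)[^>]+>/

# Hypothetical variant (not from the thread): \b pins the end of the tag name
# and [^>]* allows the attribute part to be empty.
STRICTER = /<(\w+)\b[^>]*>/

def strip_attributes(markup, pattern)
  markup.sub(pattern, "<\\1>")
end

with_attrs    = "<lyrics artist=XXX> La </lyrics>"
without_attrs = "<lyrics> La </lyrics>"
quoted_gt     = '<lyrics title="a > b"> La </lyrics>'

p strip_attributes(with_attrs, ORIGINAL)     # "<lyrics> La </lyrics>"

# With no attributes, \w+ gives back one letter so that [^>]+ can match it:
p strip_attributes(without_attrs, ORIGINAL)  # "<lyric> La </lyrics>"
p strip_attributes(without_attrs, STRICTER)  # "<lyrics> La </lyrics>"

# Neither pattern understands quoting, so a > inside a value ends the tag early:
p strip_attributes(quoted_gt, STRICTER)      # "<lyrics> b\"> La </lyrics>"
```

The last case is the kind of input where a real parser is the safer choice.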

Hope that helps.

James Edward Gray II

David Vallner · Feb 12, 2006

The very geeky, and most probably least error-prone way would be whacking the
string with a DOM parser, clearing the attributes, and then printing it out
again. Unfortunately, I haven't been doing any DOM manipulation in Ruby, so I
can't provide code.

David Vallner

James Edward Gray II · Feb 12, 2006

The following is how you do it for valid XML, but the posted example
wasn't quite:

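#!/usr/local/bin/ruby -w

require "rexml/document"

# Attribute values must be quoted for the string to be well-formed XML.
doc = "<lyrics artist='XXX' album='XXX' title='XXX'> Lalalalala </lyrics>"
xml = REXML::Document.new(doc)
xml.root.attributes.clear  # drop every attribute on the root element
xml.write                  # print the document to standard output
puts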

__END__

James Edward Gray II

samuel.murphy · Feb 12, 2006

Learn regular expressions. Here's a not great example:

a = "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>"
b = a.gsub(/\w*=\w*/ , "")
c = b.gsub(/\s/, "")
print c, "\n"

<lyrics>Lalalalala</lyrics>

A slightly (yes very slightly) more realistic example:

a = '<lyrics artist="Prince" album="purplerain" title="computerblue"> Lalalalala </lyrics>'
b = a.gsub(/\w*="\w*"/ , "")
c = b.gsub(/\s/, "")
print c, "\n"

<lyrics>Lalalalala</lyrics>

And what if there are spaces in an attribute value:

a = '<lyrics artist="Prince" album="purplerain" title="Computer Blue"> Lalalalala </lyrics>'
b = a.gsub(/\w*=".*"/ , "")
c = b.gsub(/\s/, "")
print c, "\n"

<lyrics>Lalalalala</lyrics>
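
A side note on the last pattern: .* is greedy and runs to the last double quote on the line, and the \s pass also removes whitespace inside the lyrics themselves. The sketch below is one possible alternative, under the assumptions that every attribute value is double-quoted and contains no double quote. The method name and the two-tag sample are made up for illustration.

```ruby
# Removes name="value" pairs together with the whitespace in front of them.
# Assumption: values are double-quoted and contain no " character.
def drop_quoted_attributes(markup)
  markup.gsub(/\s+\w+="[^"]*"/, "")
end

two_tags = '<lyrics title="Computer Blue"> La la </lyrics> <lyrics title="Kiss"> Mwah </lyrics>'

# Greedy .* swallows everything between the first and the last quote:
p two_tags.gsub(/\w*=".*"/, "")
# "<lyrics > Mwah </lyrics>"

# [^"]* stops at the closing quote of each value and leaves the text alone:
p drop_quoted_attributes(two_tags)
# "<lyrics> La la </lyrics> <lyrics> Mwah </lyrics>"
```

This is still pattern matching on markup, so it would also remove text that merely looks like name="value" inside the lyrics.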

James Edward Gray II · Feb 12, 2006

Zdebel said:
I wish I knew how this (/<(\w+)[^>]+>/, "<\\1>")
regular expression works.

It reads:

/ <      # find a < character
  (      # capture this next part into $1 (\\1 in the replacement string)
  \w+    # followed by one or more word characters
  )      # end capture
  [^>]+  # followed by one or more non > characters
  >      # and finally a > character
/x

The replacement just restores the <\w+> and leaves out the [^>]+ part
(the space and attributes).
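
The breakdown above can be run as written by using Ruby's /x flag, which ignores whitespace and allows comments inside the pattern. The following is a minimal sketch; the constant and method names are assumptions chosen for illustration.

```ruby
# The same pattern, written with the /x flag so the comments live inside it.
TAG_WITH_ATTRIBUTES = /
  <        # find a < character
  (\w+)    # capture one or more word characters into group 1
  [^>]+    # followed by one or more characters that are not >
  >        # and finally a > character
/x

def cut_attributes(markup)
  # "<\\1>" rebuilds the tag from the captured name only.
  markup.sub(TAG_WITH_ATTRIBUTES, "<\\1>")
end

p cut_attributes("<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>")
# "<lyrics> Lalalalala </lyrics>"
```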

Hope that helps.

James Edward Gray II

Zdebel · Feb 12, 2006

A big thank you to all of you for such a response. This helped me
a lot and my script is working, but I will practice more using your
advice.

Marcin Mielżyński · Feb 12, 2006

James said:
"<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".sub(/<(\w+)[^>]+>/, "<\\1>")
=> "<lyrics> Lalalalala </lyrics>"

A reluctant quantifier would be a bit faster:

p "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".gsub(/<(\w+).*?>/, "<\\1>")

lopex

David Vallner · Feb 12, 2006

On Sunday, 12 February 2006 at 19:30, James Edward Gray II wrote:

James said:
"<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".sub(/<(\w+)[^>]+>/, "<\\1>")
=> "<lyrics> Lalalalala </lyrics>"

Marcin said:
A reluctant quantifier would be a bit faster:

p "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".gsub(/<(\w+).*?>/, "<\\1>")

Are you sure?

$ ruby regexp_time.rb
Rehearsal -------------------------------------------------
/<(w+)[^>]+>/   7.210000   0.030000   7.240000 (  7.266166)
/<(w+).*?>/     7.710000   0.020000   7.730000 (  7.757304)
--------------------------------------- total: 14.970000sec

                    user     system      total        real
/<(w+)[^>]+>/   7.170000   0.030000   7.200000 (  7.227075)
/<(w+).*?>/     7.730000   0.020000   7.750000 (  7.777196)
$ cat regexp_time.rb
#!/usr/local/bin/ruby -w

require "benchmark"

tests = 1000000
data = "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>"

Benchmark.bmbm do |x|
  x.report("/<(\w+)[^>]+>/") do
    tests.times { data.sub(/<(\w+)[^>]+>/, "<\\1>") }
  end
  x.report("/<(\w+).*?>/") do
    tests.times { data.sub(/<(\w+).*?>/, "<\\1>") }
  end
end

__END__

James Edward Gray II

The nongreedy match has to "back up" and retry on every character after the
tag name, whereas James' [^>] doesn't ever have to back up. In fact, even a
greedy .* would probably be faster than a nongreedy one in this case.

Gotta love the black art that is optimizing regexps.
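
For anyone who wants to try the three variants side by side, here is a rough sketch. It times with Process.clock_gettime rather than Benchmark only to keep the example dependency-free, and the iteration count, labels, and names are arbitrary assumptions. Timings depend on the Ruby version and its regex engine, so none are reproduced here.

```ruby
SAMPLE = "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>"

CANDIDATES = {
  "negated class" => /<(\w+)[^>]+>/,
  "reluctant .*?" => /<(\w+).*?>/,
  "greedy .*"     => /<(\w+).*>/
}

# Returns the elapsed wall-clock seconds for `iterations` substitutions.
def time_sub(pattern, iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { SAMPLE.sub(pattern, "<\\1>") }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

CANDIDATES.each do |label, pattern|
  seconds = time_sub(pattern, 50_000)
  # Print the result next to the timing, since speed alone says nothing
  # about whether the pattern cut the right part.
  puts format("%-14s %.4fs  %p", label, seconds, SAMPLE.sub(pattern, "<\\1>"))
end
```

One thing the output makes visible: on this input the greedy .* runs on to the last > in the string, so it returns just "<lyrics>" and is not a drop-in replacement for the other two, whatever its speed.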

David Vallner

Marcin Mielżyński · Feb 12, 2006

Oops... You are right!

But as I read, greedy quantifiers do backtrack as well (though not in the
case above).

/a+aa/ =~ "aaaaa"
will backtrack two characters.

Only a possessive quantifier (in Oniguruma, for example) consumes in the
truly greedy way, never giving anything back.

So
/a++aa/ =~ "aaaaa"
won't match.
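
The two cases can be checked directly. This is a minimal sketch assuming Ruby 1.9 or later, where the built-in engine supports possessive quantifiers. The constant names are illustrative, and the atomic-group line is an extra that was not in the post.

```ruby
TEXT = "aaaaa"

GREEDY     = /a+aa/      # a+ grabs all five, then gives two back
POSSESSIVE = /a++aa/     # a++ grabs all five and never gives any back
ATOMIC     = /(?>a+)aa/  # atomic group: same no-backtracking behaviour

p(GREEDY =~ TEXT)        # 0
p(GREEDY.match(TEXT)[0]) # "aaaaa"
p(POSSESSIVE =~ TEXT)    # nil
p(ATOMIC =~ TEXT)        # nil
```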

lopex

William James · Feb 13, 2006

p " <lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".
sub(/\s+[^<>]*(?=>)/, '' )

p " <lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".
scan( /\G ( [^<]+ ) | \G ( < \S* ) [^>]* ( > ) /x ).
flatten.compact.join
