Extract/Parse String?

tuyet.ctn

How do I extract "treeframe1120266500902" from this String
and store it in a variable to be used later?

irb(main):205:0> puts c

<FRAMESET border=0 frameSpacing=0 rows=26,* frameBorder=0 onload=onLoad(); cols=* onunload=onUnload()>
<FRAME border=0 name=sidebar_header marginWidth=0 marginHeight=0 src="/araneae/PortfolioAdmin/Sidebar/showSidebarFiltersB?&amp;filterId=0&amp;showHelp=true&amp;common.sessionId=sGCq3td6d5iQGx94yZ9DxA99" frameBorder=0 noResize scrolling=no>
<FRAME border=0 name=treeframe1120266500902 marginWidth=4 marginHeight=0 src="/include/frameReady.html" frameBorder=0 noResize>
</FRAMESET>



irb(main):206:0> puts c.class

String

=> nil
 
Assaph Mehr

Use regular expressions
(http://www.ruby-doc.org/docs/ProgrammingRuby/html/intro.html#S5), then
#scan the string for something that matches. E.g., assuming the format is
always 'treeframe' followed by digits:

irb(main):038:0> c.scan /treeframe\d+/
=> ["treeframe1120266500902"]

You'll get an array with all the results. If you know you have only one
occurrence, you can use String#slice (or String#[]) to get the first
match directly as a String:

irb(main):037:0> c[/treeframe\d+/]
=> "treeframe1120266500902"
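
Since the question also asks about keeping the result in a variable, here is a minimal sketch of that step. The string below is a shortened stand-in for c (an assumption, trimmed for readability), and the variable names frame_name and frame_id are only illustrative.

```ruby
# Shortened stand-in for the string c from the question (an assumption,
# trimmed so the example stays readable).
c = '<FRAME border=0 name=sidebar_header marginWidth=0>' \
    '<FRAME border=0 name=treeframe1120266500902 marginWidth=4>'

# String#[] with a regexp returns the first match, or nil if nothing matches.
frame_name = c[/treeframe\d+/]

# With a capture group and an index, String#[] returns only the captured part.
frame_id = c[/treeframe(\d+)/, 1]

if frame_name
  puts "frame name: #{frame_name}"
  puts "frame id:   #{frame_id}"
else
  puts 'no treeframe found'
end
```

Checking for nil before using the result is worth the extra lines, because String#[] does not raise when the pattern is absent.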

HTH,
Assaph
 
Devin Mullins

tuyet.ctn said:
How do I extract "treeframe1120266500902" from this String
and store it in a variable to be used later?

(Almost) Everything in Ruby is an Object, so what you're asking for is
another String object. "treeframe112..." is just a human-readable
representation of that object, and a variable is just a pointer to that
object.

Like Assaph said, you can use regexes to get such a String. Look up
String#match, String#scan, or StringScanner with ri, for instance.
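
A rough sketch of the String#match and StringScanner routes, for comparison. The input is a shortened stand-in for c, and the pattern assumes the name always appears as name=treeframe followed by digits.

```ruby
require 'strscan'

# Shortened stand-in for the string c from the question (an assumption).
c = '<FRAME border=0 name=sidebar_header marginWidth=0>' \
    '<FRAME border=0 name=treeframe1120266500902 marginWidth=4>'

# String#match returns a MatchData object, or nil when nothing matches.
# Index 1 is the first capture group.
m = c.match(/name=(treeframe\d+)/)
name_via_match = m && m[1]

# StringScanner walks the string with a moving position. scan_until advances
# to the end of the next match; #matched then holds the matched text.
scanner = StringScanner.new(c)
name_via_scanner = scanner.scan_until(/treeframe\d+/) ? scanner.matched : nil

# Whatever follows the match is still available for further scanning.
rest = scanner.rest

puts name_via_match
puts name_via_scanner
```

StringScanner mostly pays off when you need to pull several pieces out of the same string in sequence; for a single value, String#[] or String#match is shorter.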

If you plan on parsing a lot of HTML, there are some Ruby HTML parsers.
Michael Neumann's Mechanize has been recommended on this list before,
but that's as much as I know about it.

Devin
 
Robert Klemme

Assaph Mehr said:
Use regular expressions
(http://www.ruby-doc.org/docs/ProgrammingRuby/html/intro.html#S5),
then #scan the string for something that matches. E.g., assuming the
format is always 'treeframe' followed by digits:

irb(main):038:0> c.scan /treeframe\d+/
=> ["treeframe1120266500902"]

You'll get an array with all the results. If you know you have only
one occurrence you can use String#slice (or String#[]) to get the first
value:

irb(main):037:0> c[/treeframe\d+/]
=> "treeframe1120266500902"

HTH,
Assaph

Although that'll work for this particular string, I'd rather think this is a
case for an HTML parser. Apparently the name of a frame is wanted, and an HTML
parser is the safest way to get that info.

Kind regards

robert
 
tuyet.ctn

Thank you Assaph!

c[/treeframe\d+/] works beautifully!

I also appreciate your link to the intro.html although I couldn't find
examples of regular expressions.

Thanks everyone else for your suggestions. I appreciate it.
 
Mark Thomas

Robert Klemme said:
Although that'll work for this particular string, I'd rather think this is a
case for an HTML parser. Apparently the name of a frame is wanted, and an HTML
parser is the safest way to get that info.

Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name of
the frame would be '//frame/@name'.

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?

- Mark.
 
Robert Klemme

Mark said:
Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name
of the frame would be '//frame/@name'.

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?

REXML can, but then again, it's "just" an XML parser.

Kind regards

robert
 
James Britt

Mark said:
Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name of
the frame would be '//frame/@name'.

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?

REXML, part of the standard library, does XPath. If the source HTML is
not also XML, then you'll need to coerce it so REXML can load it.
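
As a minimal sketch of the REXML route: the markup below is a hand-written, well-formed version of the frameset (lowercase tags, quoted attributes, self-closed frames), which is an assumption about what the HTML would look like after being coerced to XML. The src of the first frame is a placeholder. Note that XPath is case-sensitive, so '//frame/@name' would not match the uppercase FRAME tags of the original string. On recent Rubies REXML ships as a bundled gem rather than a default library, so it may need to be installed separately in some setups.

```ruby
require 'rexml/document'

# Hand-written, well-formed stand-in for the frameset (an assumption about
# what the HTML looks like once coerced to XML; the first src is a placeholder).
xhtml = <<~XHTML
  <frameset rows="26,*">
    <frame name="sidebar_header" src="/sidebar.html"/>
    <frame name="treeframe1120266500902" src="/include/frameReady.html"/>
  </frameset>
XHTML

doc = REXML::Document.new(xhtml)

# '//frame/@name' selects the name attribute of every frame element.
names = REXML::XPath.match(doc, '//frame/@name').map(&:value)

# Pick the frame whose name starts with 'treeframe'.
tree_name = names.find { |n| n.start_with?('treeframe') }

puts names.inspect
puts tree_name
```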

Michael Neumann's Mechanize lib bundles up this behavior so that you can
grab an HTML doc and operate on select sections; you can also grab the
resulting REXML document and run arbitrary XPath calls on it too.
Search the ruby-talk archives as this was discussed not too long ago.

James

--

http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
 
Brad Wilson

Mark said:
Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath?

I used tidy to turn HTML into XHTML, and then REXML to navigate and
modify it. I could've turned it back into HTML with tidy again, but
leaving it as XHTML was acceptable for me (parsing HTML elements from
RSS and modifying them for import into a new blog engine).
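
A small sketch of the "navigate and modify" half of that workflow, assuming tidy has already produced well-formed XHTML. The fragment and the /old/ to /new/ link rewrite are made up for illustration; they are not taken from the original import job.

```ruby
require 'rexml/document'

# Made-up XHTML fragment standing in for tidy's output (an assumption).
xhtml = '<div><p>Hello</p>' \
        '<a href="/old/post/1">first</a>' \
        '<a href="/old/post/2">second</a></div>'

doc = REXML::Document.new(xhtml)

# Visit every <a> element and rewrite its href attribute in place.
doc.elements.each('//a') do |link|
  link.attributes['href'] = link.attributes['href'].sub('/old/', '/new/')
end

# Collect the rewritten values to confirm the change.
hrefs = doc.elements.to_a('//a').map { |a| a.attributes['href'] }

puts hrefs.inspect
puts doc.to_s # serialize the modified document back to markup
```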
 
Mark Thomas

James Britt said:
Michael Neumann's Mechanize lib bundles up this behavior so that you can
grab an HTML doc and operate on select sections; you can also grab the
resulting REXML document and run arbitrary XPath calls on it too.
Search the ruby-talk archives as this was discussed not too long ago.

I saw that comment, but wasn't able to find any documentation for
Mechanize. Sorry if I'm being stupid, but where can I find the
documentation?
- Google searches bring up nothing
- RAA search doesn't find Mechanize
- Rubyforge search brings up project Wee, docs tab is empty, wiki is
blank, homepage has Wee docs but no Mechanize docs.

Sigh... http://search.cpan.org/ makes finding documentation for Perl
modules very easy. Is there an equivalent for Ruby Gems?

- Mark.
 
Michael Neumann

Mark said:
I saw that comment, but wasn't able to find any documentation for
Mechanize. Sorry if I'm being stupid, but where can I find the
documentation?

Nowhere, as there isn't any. I do not plan to document it, but
I've been told that the www.ruby-web.org project will adopt Mechanize,
and maybe they'll document and improve it.

Take a look at the examples.

Regards,

Michael
 
