Extract/Parse String?

tuyet.ctn · Jul 6, 2005

How do I extract "treeframe1120266500902" from this String
and store it in a variable to be used later?

irb(main):205:0> puts c
<FRAMESET border=0 frameSpacing=0 rows=26,* frameBorder=0
onload=onLoad(); cols=* onunload=onUnload()><FRAME border=0
name=sidebar_header marginWidth=0 marginHeight=0
src="/araneae/PortfolioAdmin/Sidebar/showSidebarFiltersB?&filterId=0&amp;showHelp=true&common.sessionId=sGCq3td6d5iQGx94yZ9DxA99"
frameBorder=0 noResize scrolling=no><FRAME border=0
name=treeframe1120266500902 marginWidth=4 marginHeight=0
src="/include/frameReady.html" frameBorder=0 noResize></FRAMESET>
irb(main):206:0> puts c.class
String
=> nil

Assaph Mehr · Jul 6, 2005

Use regular expressions
(http://www.ruby-doc.org/docs/ProgrammingRuby/html/intro.html#S5), then
#scan the string for something that matches. E.g., assuming the format is
always 'treeframe' followed by digits:

irb(main):038:0> c.scan /treeframe\d+/
=> ["treeframe1120266500902"]

You'll get an array with all the results. If you know you have only one
occurrence you can use String#slice (or String#[]) to get the first
match:

irb(main):037:0> c[/treeframe\d+/]
=> "treeframe1120266500902"
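
To answer the "store it in a variable" part of the question, here is a minimal sketch of both calls outside irb. The string assigned to c is a shortened stand-in for the real frameset markup (an assumption for illustration), and the variable names are made up.

```ruby
# Shortened stand-in for the string c from the question (assumption).
c = '<FRAME border=0 name=sidebar_header marginWidth=0>' \
    '<FRAME border=0 name=treeframe1120266500902 marginWidth=4>'

# String#scan returns an Array holding every match.
all_matches = c.scan(/treeframe\d+/)

# String#[] with a Regexp returns the first match, or nil if nothing matches.
frame_name = c[/treeframe\d+/]

if frame_name
  puts "Found frame: #{frame_name}"
else
  puts 'No treeframe name found'
end
```

Because String#[] returns nil when nothing matches, checking the result before using it avoids a NoMethodError later on.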

HTH,
Assaph

Devin Mullins · Jul 6, 2005

How do I extract "treeframe1120266500902" from this String
and store it in a variable to be used later?

(Almost) Everything in Ruby is an Object, so what you're asking for is
another String object. "treeframe112..." is just a human-readable
representation of that object, and a variable is just a pointer to that
object.

Like Assaph said, you can use regexes to get such a String. Look up
String#match, String#scan, or StringScanner with ri, for instance.
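
As a rough sketch of two of those options: String#match with capture groups, and StringScanner from the standard strscan library. The string c is again a shortened stand-in for the real markup, and the patterns assume that attribute values are unquoted word characters, as in the original output.

```ruby
require 'strscan'

# Shortened stand-in for the string c from the question (assumption).
c = '<FRAME border=0 name=sidebar_header marginWidth=0>' \
    '<FRAME border=0 name=treeframe1120266500902 marginWidth=4>'

# String#match returns a MatchData object (or nil when nothing matches).
# Group 1 is the whole frame name, group 2 only the digits.
m = c.match(/name=(treeframe(\d+))/)
frame_name = m && m[1]
frame_id   = m && m[2]

# StringScanner walks through the string; scan_until advances past the
# next match and returns nil once there are no more matches.
scanner = StringScanner.new(c)
names = []
names << scanner[1] while scanner.scan_until(/name=(\w+)/)

puts frame_name
puts frame_id
puts names.inspect
```

Capture groups are handy when only part of the match is wanted, such as the numeric suffix.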

If you plan on parsing a lot of HTML, there are some Ruby HTML parsers.
Michael Neumann's Mechanize has been recommended on this list before,
but that's as much as I know about it.

Devin

Robert Klemme · Jul 6, 2005

Assaph Mehr said:
Use regular expressions
(http://www.ruby-doc.org/docs/ProgrammingRuby/html/intro.html#S5),
then #scan the string for something that matches. E.g., assuming the
format is always 'treeframe' followed by digits:

irb(main):038:0> c.scan /treeframe\d+/
=> ["treeframe1120266500902"]

You'll get an array with all the results. If you know you have only
one occurrence you can use String#slice (or String#[]) to get the first
match:

irb(main):037:0> c[/treeframe\d+/]
=> "treeframe1120266500902"

HTH,
Assaph

Although that'll work for this particular string, I think this is really a
case for an HTML parser. Apparently the name of a frame is wanted, and an HTML
parser is the safest way to get that info.

Kind regards

robert

tuyet.ctn · Jul 6, 2005

Thank you Assaph!

c[/treeframe\d+/] works beautifully!

I also appreciate your link to the intro.html although I couldn't find
examples of regular expressions.

Thanks everyone else for your suggestions. I appreciate it.

Mark Thomas · Jul 7, 2005

Although that'll work for this particular string, I think this is really a
case for an HTML parser. Apparently the name of a frame is wanted, and an HTML
parser is the safest way to get that info.

Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name of
the frame would be '//frame/@name'.

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?

- Mark.

Robert Klemme · Jul 7, 2005

Mark said:
Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name
of the frame would be '//frame/@name'.

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?

REXML can, but then again, it's "just" an XML parser.

Kind regards

robert

James Britt · Jul 7, 2005

Mark said:
Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name of
the frame would be '//frame/@name'.

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?

REXML, part of the standard library, does XPath. If the source HTML is
not also XML, then you'll need to coerce it so REXML can load it.
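
A minimal sketch of the REXML route, assuming the markup has already been coerced into well-formed XML: lowercase tag names, quoted attribute values, self-closed frame elements. The cleaned-up fragment below is written by hand for illustration and the sidebar src is a placeholder; it is not the output of any particular tool.

```ruby
require 'rexml/document'

# Hand-cleaned, well-formed version of the frameset (assumption).
xhtml = <<~XML
  <frameset rows="26,*" cols="*">
    <frame name="sidebar_header" src="/sidebar.html"/>
    <frame name="treeframe1120266500902" src="/include/frameReady.html"/>
  </frameset>
XML

doc = REXML::Document.new(xhtml)

# XPath.match returns the matching attribute nodes; value gives the text.
names = REXML::XPath.match(doc, '//frame/@name').map(&:value)

# Pick out the frame whose name starts with "treeframe".
tree_frame = names.find { |name| name.start_with?('treeframe') }

puts names.inspect
puts tree_frame
```

XPath is case-sensitive, so '//frame/@name' will not match the uppercase FRAME tags of the original output unless the coercion step lowercases them.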

Michael Neumann's Mechanize lib bundles up this behavior so that you can
grab an HTML doc and operate on select sections; you can also grab the
resulting REXML document and run arbitrary XPath calls on it too.
Search the ruby-talk archives as this was discussed not too long ago.

James


--

http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys

Brad Wilson · Jul 7, 2005

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath?

I used tidy to turn HTML into XHTML, and then REXML to navigate and
modify it. I could've turned it back into HTML with tidy again, but
leaving it as XHTML was acceptable for me (parsing HTML elements from
RSS and modifying them for import into a new blog engine).

Mark Thomas · Jul 7, 2005

Michael Neumann's Mechanize lib bundles up this behavior so that you can
grab an HTML doc and operate on select sections; you can also grab the
resulting REXML document and run arbitrary XPath calls on it too.
Search the ruby-talk archives as this was discussed not too long ago.

I saw that comment, but wasn't able to find any documentation for
Mechanize. Sorry if I'm being stupid, but where can I find the
documentation?
- Google searches bring up nothing
- RAA search doesn't find Mechanize
- RubyForge search brings up project Wee; the docs tab is empty, the wiki is
blank, and the homepage has Wee docs but no Mechanize docs.

Sigh... http://search.cpan.org/ makes finding documentation for Perl
modules very easy. Is there an equivalent for Ruby Gems?

- Mark.

Michael Neumann · Jul 7, 2005

Mark said:
I saw that comment, but wasn't able to find any documentation for
Mechanize. Sorry if I'm being stupid, but where can I find the
documentation?

Nowhere, as there isn't any. I do not plan to document it, but
I've been told that the www.ruby-web.org project will adopt Mechanize,
and maybe they'll document and improve it.

Take a look at the examples.

Regards,

Michael
