Problems with mechanize and fields embedded in tables

Todd A. Jacobs

I'm working with the following versions:

ruby 1.8.2
libwww-mechanize-ruby 0.6.10

and have run across an odd problem. One site that I'm trying to scrape
has started embedding form fields inside of tables, and mechanize no
longer recognizes them as fields.

The fields are there in the HTML code, but aren't accessible to
mechanize. I've tried a couple of work-arounds, but field_add! doesn't
seem to support adding check boxes or file upload fields (is there
another way to add them explicitly?), and I can't see any other way to
find those embedded fields.

If this is a bug in mechanize, how do I report it? If it's a bug in the
coder, what can I do to resolve the problem?
 
Todd A. Jacobs

In the course of debugging, I tried this:

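require 'pp'         # pp is not loaded by default on Ruby 1.8
require 'mechanize'

agent = WWW::Mechanize.new
selection = 'http://seeker.dice.com/jobsearch/se.../46ab274a1ab667a09cd9aac11c6bef37@endecaindex'

# Fetch the job listing, then follow the application link.
page = agent.get(selection)
page = agent.click page.links.text('Click Here to Apply')

# Grab the application form by name and dump what mechanize parsed.
reply_form = page.forms.with.name('APPLICATION_FORM').first
pp reply_form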

In the resulting output, the SEEKER_CC checkbox and RESUME_FILE file upload fields
aren't showing up, but they ARE in the HTML. I suppose it helps if you
have access to the data sources and the methodology of the (error-prone)
programmer that's accessing them. :)
 
Todd A. Jacobs

I said:
[...] I've tried a couple of work-arounds, but field_add! doesn't
seem to support adding check boxes or file upload fields [...]

I've managed to add the fields explicitly:

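# Build the two missing fields by hand. A RadioButton object stands in
# for the SEEKER_CC check box; 'foo' is a placeholder file name.
carbon = WWW::Mechanize::RadioButton.new('SEEKER_CC', nil, true, reply_form)
upload_field = WWW::Mechanize::FileUpload.new('RESUME_FILE', 'foo')

# Attach them to the form's own collections so they are submitted with it.
reply_form.checkboxes.push(carbon)
reply_form.file_uploads.push(upload_field)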

but this seems kind of kludgy. I'm still looking for a better way.
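
One way to make the manual approach less ad hoc might be to scan the raw HTML for input tags and compare their names with the ones the parsed form already knows about. The sketch below is only an illustration in plain Ruby: the helper names input_fields_in and missing_fields are hypothetical, the EMAIL field in the sample markup is made up, and the regular expressions are deliberately simple, so they will not cope with every kind of markup.

```ruby
# Hypothetical helper: collect [name, type] pairs for every <input> tag in
# the raw HTML, no matter where the tag sits relative to <form> or <table>.
def input_fields_in(html)
  fields = []
  html.scan(/<input\b[^>]*>/i) do |tag|
    name = tag[/\sname\s*=\s*["']?([^"'\s>]+)/i, 1]
    type = tag[/\stype\s*=\s*["']?([^"'\s>]+)/i, 1] || 'text'
    fields << [name, type.downcase] if name
  end
  fields
end

# Hypothetical helper: keep only the inputs whose names are not in
# known_names (the names the parsed form object already exposes).
def missing_fields(html, known_names)
  input_fields_in(html).reject { |name, _type| known_names.include?(name) }
end

# Made-up sample markup; only SEEKER_CC and RESUME_FILE come from the thread.
sample_html = <<HTML
<form name="APPLICATION_FORM">
<table><tr><td>
<input type="text" name="EMAIL">
<input type="checkbox" name="SEEKER_CC" checked>
<input type="file" name="RESUME_FILE">
</td></tr></table>
</form>
HTML

missing_fields(sample_html, ['EMAIL']).each do |name, type|
  puts "#{name} (#{type}) is in the HTML but not in the parsed form"
end
```

With mechanize, the html argument would presumably be page.body, and known_names could be gathered from reply_form.fields, reply_form.checkboxes and reply_form.file_uploads. The type of each missing input then tells you which kind of object to build and push onto the form.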
 
7stud --

Todd said:
aren't showing up, but they ARE in the HTML. I suppose it helps if you
have access to the data sources and the methodology of the (error-prone)
programmer that's accessing them. :)

It sounds like JavaScript may be adding the fields you want. When you
load the page in a browser, the browser's JavaScript engine runs and can
add HTML to the page. When you grab a page with mechanize, however, you
get the pre-JavaScript page, and as far as I know mechanize cannot
interpret JavaScript or apply the changes it would make to the HTML.
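
A rough way to test this guess is to look at the raw HTML the server sent and see whether a field name appears in ordinary markup or only inside a script block. The following is a minimal sketch in plain Ruby, not anything mechanize provides: field_origin is a hypothetical helper, and the sample strings are made up around the field names from this thread.

```ruby
# Hypothetical helper: report where a field name shows up in raw HTML.
#   :static_markup -> a name="..." attribute exists outside any <script> block
#   :script_only   -> the name appears, but not as static markup
#   :absent        -> the name does not appear at all
def field_origin(html, field_name)
  pattern = /name\s*=\s*["']?#{Regexp.escape(field_name)}\b/i
  # Drop every <script>...</script> block before looking for static markup.
  without_scripts = html.gsub(/<script\b.*?<\/script>/im, '')
  if without_scripts =~ pattern
    :static_markup
  elsif html.include?(field_name)
    :script_only
  else
    :absent
  end
end

static   = '<form><input type="checkbox" name="SEEKER_CC"></form>'
scripted = %q{<form></form><script>document.write('<input type="file" name="RESUME_FILE">');</script>}

puts field_origin(static, 'SEEKER_CC')      # field is in the plain markup
puts field_origin(scripted, 'RESUME_FILE')  # field only exists inside a script
```

If a field comes back as static markup, JavaScript is probably not the cause and the problem is more likely in how the form was parsed. If it only turns up inside a script, the page really does depend on the browser running that script.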

Well-designed websites serve simpler pages to users without JavaScript
enabled, with all the HTML required for the forms and for navigating
around the site. The trick is getting the server to send you those
pages. You have to be good with HTML and JS and dig around a bit to
figure it out. Or, if the site has a lot of traffic, there might be an
article on how to do it.
 
Devi Web Development

7stud said:
Well-designed websites serve simpler pages to users without JavaScript
enabled, with all the HTML required for the forms and for navigating
around the site. The trick is getting the server to send you those
pages. You have to be good with HTML and JS and dig around a bit to
figure it out.

Every browser will let you turn off JavaScript. That's the easy way to
get served the simple version, which is probably what you want.
 
