Unexpected behavior of com.gargoylesoftware.htmlunit

Roy27

Hi,

If anybody is using the "com.gargoylesoftware.htmlunit" packages, would you
please share your experience with the following issue?

Say we have an html file (test1.html) like the one below, where the "<form>"
tag is badly placed: the form opens inside a table cell but is closed only
after that cell and row have been closed. However, I think it is still
acceptable HTML.

<html>
<head><title>Testing com.gargoylesoftware.htmlunit</title></head>
<body>
<table>
<tr><td>
<form name="frmTest" method="post" action="test2.php">
<table>
<tr><td>Testing com.gargoylesoftware.htmlunit's html processing
behaviour</td></tr>
</table>
</td></tr>
<input type="hidden" name="hidXTNUM" value="50">
</form>
</table>
</body>
</html>

And say we have code like the following to download and process the html
file:

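//
import java.net.URL;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

String strUrl = "http://some.domain.com/test1.html";
WebClient webClient = new WebClient();

try {
    // Download and parse the page. Doing the lookups inside the same try
    // block avoids a NullPointerException when the download fails.
    HtmlPage page = (HtmlPage) webClient.getPage(new URL(strUrl));

    // Find the form by name, then the hidden input that should be inside it
    HtmlForm frmPage = page.getFormByName("frmTest");
    frmPage.getInputByName("hidXTNUM").setAttributeValue("value", "100");
} catch (Exception ex) {
    System.out.println(ex.toString());
}
//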

What I get when I run the code:

1. It downloads the html page.
2. It also finds the form: HtmlForm frmPage =
page.getFormByName("frmTest");
3. It cannot set the "hidXTNUM" value in the last statement (the
getInputByName("hidXTNUM").setAttributeValue(...) line).

I found that WebClient has processed the <form> tag differently from what I
expected and has put the "hidXTNUM" hidden element outside of the form.
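To illustrate why the input can end up outside the form, here is a minimal sketch of a generic stack-based tree builder. It is only a toy model and an assumption on my part, not HtmlUnit's actual parser; the class name ImplicitCloseSketch and the simplified token list are made up for this example. The idea is that closing "</td>" also closes every element still open inside that cell, including the unclosed form.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class ImplicitCloseSketch {

    /**
     * Walks simplified tag tokens ("td" opens, "/td" closes, "input" is a
     * void element) and reports whether "form" is still open when the
     * first "input" token is reached.
     */
    static boolean inputSeenInsideForm(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>();
        for (String tok : tokens) {
            if (tok.equals("input")) {
                return open.contains("form");
            } else if (tok.startsWith("/")) {
                String name = tok.substring(1);
                // Closing an element pops everything nested inside it,
                // so an unclosed <form> in a <td> is closed with the <td>.
                if (open.contains(name)) {
                    while (!open.pop().equals(name)) {
                        // implicitly close the nested element
                    }
                }
                // An end tag with no matching open element is ignored.
            } else {
                open.push(tok);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Tag order as in test1.html: </td></tr> come before <input> and </form>
        List<String> asPosted = Arrays.asList("table", "tr", "td", "form",
                "table", "/table", "/td", "/tr", "input", "/form", "/table");
        // Properly nested variant: <input> and </form> come before </td>
        List<String> nested = Arrays.asList("table", "tr", "td", "form",
                "table", "/table", "input", "/form", "/td", "/tr", "/table");

        System.out.println(inputSeenInsideForm(asPosted)); // false
        System.out.println(inputSeenInsideForm(nested));   // true
    }
}
```

In this model the input only stays inside the form when it appears before the enclosing cell is closed, which matches the behaviour described above.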

Dumping the html (test1.html) after HtmlUnit has processed it, I get the
text below, where the "hidXTNUM" hidden input is outside of the <form>.

<html>
<head><title>Testing com.gargoylesoftware.htmlunit</title></head>
<body>
<table>
<tr><td>
<form name="frmTest" method="post" action="test2.php">
<table>
<tr><td>Testing com.gargoylesoftware.htmlunit's html processing
behaviour</td></tr>
</table>
</form>
</td></tr>
<input type="hidden" name="hidXTNUM" value="50">
</table>
</body>
</html>
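A quick way to confirm this on a dumped page is a plain text check like the sketch below. It is not an HTML parser and uses no HtmlUnit calls; the class FormScopeCheck and the method isInsideForm are hypothetical names, and it assumes the attributes are quoted exactly as in the listings in this post.

```java
public class FormScopeCheck {

    /**
     * Returns true if name="inputName" occurs after the opening tag of the
     * named form and before the first </form> that follows it.
     */
    static boolean isInsideForm(String html, String formName, String inputName) {
        int formStart = html.indexOf("<form name=\"" + formName + "\"");
        if (formStart < 0) {
            return false; // form not found at all
        }
        int formEnd = html.indexOf("</form>", formStart);
        int inputPos = html.indexOf("name=\"" + inputName + "\"", formStart);
        return formEnd >= 0 && inputPos >= 0 && inputPos < formEnd;
    }

    public static void main(String[] args) {
        // Shortened versions of the two listings from this post
        String original = "<table><tr><td><form name=\"frmTest\" method=\"post\">"
                + "<table></table></td></tr>"
                + "<input type=\"hidden\" name=\"hidXTNUM\" value=\"50\"></form></table>";
        String dumped = "<table><tr><td><form name=\"frmTest\" method=\"post\">"
                + "<table></table></form></td></tr>"
                + "<input type=\"hidden\" name=\"hidXTNUM\" value=\"50\"></table>";

        System.out.println(isInsideForm(original, "frmTest", "hidXTNUM")); // true
        System.out.println(isInsideForm(dumped, "frmTest", "hidXTNUM"));   // false
    }
}
```

Running the check on both the file as served and the dump makes it easy to see at which stage the hidden input leaves the form.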

I want "HtmlPage" to tolerate malformed html and keep the hidden input
inside the <form>. By the way, browsers handle this sort of malformed
html as I expect. Can anyone help me solve this issue? Does
"HtmlPage" support malformed html?

Thanks in advance
Manik
 
