storing the text of an HTML page

JCD

Hello.
In my application, I need to store the text of an HTML page.
For example:
<!DOCTYPE ht....
....
....
</HTML>
I modify it afterwards to create a new HTML page that I open in a web
browser.
I would like to store this text in my application without creating a
file on the hard disc.
I want to keep the line feeds, and there are many " characters in the text.
Is there a way of storing this text, and if so, how?
Thank you.
 
Stefan Ram

JCD said:
Is there a way of storing this text and how?

Yes, usually as an object of any class implementing
java.lang.CharSequence or as an array of code points.

Text also can be stored as an object of java.lang.BigDecimal
or so, but this would be unusual and more difficult to use.
 
Lord Zoltar

Hello.
In my application, I need to store the text of an HTML page.
For example:
<!DOCTYPE ht....
...
...
</HTML>
I modify it after, to create a new HTML page that I open in a web
browser.
I would like to store this text in my application without creating a
file on the hard disc.
I want to keep line feeds and there are many " in the text.
Is there a way of storing this text and how?
thank you.

Why do you think you need to keep it in a file on the hard disc?
Normally, getting HTML data from an internet source would result in
the HTML data being stored into some sort of in-memory structure, such
as a String. Writing that to disc seems like it would be extra work.
You say you have to modify the HTML data... what sort of
modifications? If they're fairly simple, you could just use regular
expressions to find/replace substrings in the big string. For
complicated modifications, maybe build a tree out of the HTML nodes,
modify the tree, then turn your tree back into a string.
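
For the simple case, a minimal sketch of the find/replace idea might look like the following. The class name, the template text and the `@@RESULT@@` placeholder are all invented for illustration; `String.replace` does a literal replacement, while `String.replaceAll` takes a regular expression.

```java
import java.util.regex.Matcher;

class HtmlTemplateDemo {
    // Hypothetical template; "@@RESULT@@" is an invented placeholder.
    static final String TEMPLATE =
        "<html>\n"
        + "<body>\n"
        + "<p class=\"result\">@@RESULT@@</p>\n"
        + "</body>\n"
        + "</html>\n";

    // Literal replacement: no regex metacharacters to worry about.
    static String fill(String template, String value) {
        return template.replace("@@RESULT@@", value);
    }

    // Regex replacement: swaps the contents of every single-line
    // <p class="result"> element. quoteReplacement keeps '$' and '\'
    // in the value from being read as group references.
    static String fillByRegex(String html, String value) {
        return html.replaceAll(
            "(?<open><p class=\"result\">).*?(?<close></p>)",
            "${open}" + Matcher.quoteReplacement(value) + "${close}");
    }

    public static void main(String[] args) {
        System.out.println(fill(TEMPLATE, "42"));
    }
}
```

The regex variant only copes with well-formed, predictable markup; for anything more irregular the tree approach is probably safer.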
 
Donkey Hot

Why do you think you need to keep it in a file on the hard disc?
Normally, getting HTML data from an internet source would result in
the HTML data being stored into some sort of in-memory structure, such
as a String. Writing that to disc seems like it would be extra work.
You say you have to modify the HTML data... what sort of
modifications? If they're fairly simple, you could just use regular
expressions to find/replace substrings in the big string. For
complicated modifications, maybe build a tree out of the HTML nodes,
modify the tree, then turn your tree back into a string.

The first thing that came to my mind is that he wants to create a
man-in-the-middle that edits HTML between a website and a browser, without
leaving traces on the disk that some kind of antivirus might be able to
scan.

Of course that's funny, but it popped into my mind.
 
Roedy Green

I would like to store this text in my application without creating a
file on the hard disc.

The two most likely ways are with a simple giant String and a parse
tree.

Unfortunately most of the stuff out on the web is malformed. Usually
the only stuff you can fully parse is stuff you validated yourself.

see http://mindprod.com/jgloss/parser.html
 
Mark Space

JCD said:
I would like to store this text in my application without creating a
file on the hard disc.

A file on disc probably would be the best way. Look into JSP.

Absent that, no, I don't know of any type of convenient storage
mechanism. A resource file would be good, but it's basically a file on
disc anyway. A property would likely be wildly inappropriate, unless
the string you are storing is very short.
 
JCD

Actually, I don't want to get HTML data from an internet source: I
already have the source code and the modifications are very simple. I
only have to change a few lines that depend on the results of my
application. I don't need to create a tree or a parser: I want to
store this HTML text in my Java source code. The problem is that the
text is very long and it contains many " characters and many line feeds.
Of course, I could create, for example, an array containing each line of
the HTML page, but it would take too long to write.
Is there a way of storing a giant String with " characters and line feeds?
 
Philipp

JCD said:
Actually, I don't want to get HTML data from an internet source: I
already have the source code and the modifications are very simple. I
only have to change a few lines that depend on the results of my
application. I don't need to create a tree or a parser: I want to
store this HTML text in my Java source code. The problem is that the
text is very long and it contains many " characters and many line feeds.
Of course, I could create, for example, an array containing each line of
the HTML page, but it would take too long to write.
Is there a way of storing a giant String with " characters and line feeds?

You can store " and line feeds in a String object. No problem there.
Phil
 
RedGrittyBrick

Philipp said:
You can store " and line feeds in a String object. No problem there.
Phil

Java string constants cannot span multiple lines. Java has no equivalent
of the "here document" in Shell or Perl. Sometimes I miss these features.
Neither of the following is valid Java:

static final String HTML = "
<html>
<head>
...
</body>
</html>
";

String html = <<END;
<html>
<head>
...
</body>
</html>
END

The Java-ish solution seems to be properties files.
 
Philipp

RedGrittyBrick said:
Java string constants cannot span multiple lines.

Yep. I didn't understand the OP's request correctly.

At this point I can only recommend using a decent text editor (e.g. try
textpad.com) to replace every newline character with \n (or its
cross-platform equivalent) and every " with \".

For example,
String s = "hello \n\"world\"";

Phil
 
Philipp

Lew said:
Using the + operator seems simpler somehow.

Using the + operator does not change the fact that you need to replace
line breaks with \n. I wasn't implying that you need to write everything
on one line of code.
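
To make the point concrete, here is a small sketch (class and constant names are made up) showing that the one-line form and the split form produce the same string, and that the \n and \" escapes are needed in both:

```java
class ConcatDemo {
    // One long literal with escaped line feeds and quotes.
    static final String ONE_LINE = "<p class=\"x\">\nhello\n</p>";

    // The same text split over several source lines with +.
    // The \n escapes are still required; the source line breaks
    // themselves add nothing to the string.
    static final String SPLIT =
        "<p class=\"x\">\n"
        + "hello\n"
        + "</p>";

    public static void main(String[] args) {
        System.out.println(ONE_LINE.equals(SPLIT));
    }
}
```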
 
RedGrittyBrick

Lew said:
You just use the + operator:

static final String HTML =
"<html>\n"
+" <head>\n"
...
+" </body>\n"
+"</html>";

What irks me a little is it looks less like HTML, especially as you have
to escape any special characters. An IDE makes it easier to type in such
a string - e.g. Eclipse inserts <quote><newline><indent><plus><quote>
whenever you press the enter key in a string.

However if you already have a file of HTML (say) which you want to
include as a string constant, I haven't found a way of inserting it into
the Java source without also having to add quotes etc to the start and
end of every line and hunting down and escaping any special characters.
Maybe Eclipse and other IDEs have some clever way of automating this but
I haven't found it.
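
Absent an IDE feature, a throwaway converter is one option. The following is only a sketch under simple assumptions: the class and method names are hypothetical, and it escapes backslashes, double quotes and tabs but does nothing special for other control or non-ASCII characters.

```java
class JavaLiteralWriter {
    // Escapes one line of text so it can sit between double quotes in Java source.
    static String escape(String line) {
        StringBuilder sb = new StringBuilder();
        for (char c : line.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Turns raw text into a "..." + "..." expression, one source line per input line.
    static String toJavaLiteral(String text) {
        String[] lines = text.split("\r?\n", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            boolean last = (i == lines.length - 1);
            // A trailing empty piece means the text ended with a line break,
            // which the previous literal already carries as \n.
            if (last && lines[i].isEmpty() && lines.length > 1) break;
            if (sb.length() > 0) sb.append("\n+ ");
            sb.append('"').append(escape(lines[i]));
            if (!last) sb.append("\\n");
            sb.append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaLiteral("<a href=\"x\">\nhi"));
    }
}
```

The output is meant to be pasted after `static final String HTML =` and followed by a semicolon.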

I often find it easier to add a text file to my final jar and have the
program read it.

For large amounts of text this is probably more appropriate than
including it in a .java source as a long series of concatenated strings.
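
A rough sketch of that approach, assuming the file is packaged at the root of the jar under the hypothetical name `/page.html` and is encoded as UTF-8 (the class and method names are also invented):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ResourceText {
    // Reads the whole stream as UTF-8 text; line feeds and quotes
    // come through unchanged, with no escaping needed.
    static String readAll(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads a resource from the classpath (for example, from inside the jar).
    // "/page.html" is an assumed name, not one taken from this thread.
    static String loadPage() {
        InputStream in = ResourceText.class.getResourceAsStream("/page.html");
        if (in == null) {
            throw new IllegalStateException("resource /page.html not found");
        }
        return readAll(in);
    }
}
```

The text returned by `loadPage()` is an ordinary String, so the few lines that depend on the application's results can be changed with the usual String methods before the page is handed to the browser.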
 
JCD

I would replace the line breaks with ' ' rather than '\n', for readability.

Actually, I would use a templating engine or JSP rather than embed HTML as a
String.

Which latter, BTW, does not constitute "storing the text".  "Loading" it, maybe.

Hello. Thank you for your answers.
In the end, it seems more difficult to store a huge text in the source
code than to store it in a file on the hard disc. I will add the file
to my JAR.
 
