storing the text of an HTML page

Discussion in 'Java' started by JCD, May 28, 2008.

  1. JCD

    JCD Guest

    Hello.
    In my application, I need to store the text of an HTML page.
    For example:
    <!DOCTYPE ht....
    ....
    ....
    </HTML>
    I modify it afterwards to create a new HTML page that I open in a web
    browser.
    I would like to store this text in my application without creating a
    file on the hard disc.
    I want to keep the line feeds, and the text contains many " characters.
    Is there a way to store this text, and if so, how?
    Thank you.
     
    JCD, May 28, 2008
    #1

  2. JCD

    Stefan Ram Guest

    JCD writes:
    >Is there a way of storing this text and how?


    Yes, usually as an object of any class implementing
    java.lang.CharSequence or as an array of code points.

    Text also can be stored as an object of java.lang.BigDecimal
    or so, but this would be unusual and more difficult to use.
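
A minimal sketch of the CharSequence and code-point options mentioned above. The class name HtmlStorage, its method names, and the sample markup are hypothetical and do not come from the thread.

```java
// Hypothetical illustration; names are not from the thread.
final class HtmlStorage {
    // A short sample standing in for the real page.
    static final String SAMPLE = "<html>\n<body class=\"main\">Hi</body>\n</html>";

    // StringBuilder implements CharSequence and can be edited in place.
    static CharSequence asMutableText(String html) {
        return new StringBuilder(html);
    }

    // Store the text as an array of Unicode code points.
    static int[] asCodePoints(String html) {
        return html.codePoints().toArray();
    }

    // Turn a code point array back into a String.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }
}
```

For most applications a plain String or StringBuilder is likely enough; the code point array mainly matters when individual characters outside the Basic Multilingual Plane have to be addressed.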
     
    Stefan Ram, May 28, 2008
    #2

  3. JCD

    Lord Zoltar Guest

    On May 28, 1:40 pm, JCD wrote:
    > Hello.
    > In my application, I need to store the text of an HTML page.
    > For example:
    > <!DOCTYPE ht....
    > ...
    > ...
    > </HTML>
    > I modify it after, to create a new HTML page that I open in a web
    > browser.
    > I would like to store this text in my application without creating a
    > file on the hard disc.
    > I want to keep line feeds and there are many " in the text.
    > Is there a way of storing this text and how?
    > thank you.


    Why do you think you need to keep it in a file on the hard disc?
    Normally, getting HTML data from an internet source would result in
    the HTML data being stored into some sort of in-memory structure, such
    as a String. Writing that to disc seems like it would be extra work.
    You say you have to modify the HTML data... what sort of
    modifications? If they're fairly simple, you could just use regular
    expressions to find/replace substrings in the big string. For
    complicated modifications, maybe build a tree out of the HTML nodes,
    modify the tree, then turn your tree back into a string.
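
A minimal sketch of the simple find/replace case described above, assuming the page is already held in a String. The class HtmlEdit and the method setTitle are hypothetical names, and replacing the title element is only an example of a small modification.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; names are not from the thread.
final class HtmlEdit {
    // Replace the text inside the first <title> element; the rest of the page is untouched.
    static String setTitle(String html, String newTitle) {
        Pattern title = Pattern.compile("(<title>).*?(</title>)",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        // quoteReplacement keeps '$' and '\' in newTitle from being read as group references.
        return title.matcher(html)
                .replaceFirst("$1" + Matcher.quoteReplacement(newTitle) + "$2");
    }
}
```

Note that the sketch does not HTML-escape newTitle, and a regular expression like this is only reasonable for markup you control; for malformed or arbitrary pages a parser is the safer route.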
     
    Lord Zoltar, May 28, 2008
    #3
  4. JCD

    Donkey Hot Guest

    Lord Zoltar wrote:

    > On May 28, 1:40 pm, JCD wrote:
    >> Hello.
    >> In my application, I need to store the text of an HTML page.
    >> For example:
    >> <!DOCTYPE ht....
    >> ...
    >> ...
    >> </HTML>
    >> I modify it after, to create a new HTML page that I open in a web
    >> browser.
    >> I would like to store this text in my application without creating a
    >> file on the hard disc.
    >> I want to keep line feeds and there are many " in the text.
    >> Is there a way of storing this text and how?
    >> thank you.

    >
    > Why do you think you need to keep it in a file on the hard disc?
    > Normally, getting HTML data from an internet source would result in
    > the HTML data being stored into some sort of in-memory structure, such
    > as a String. Writing that to disc seems like it would be extra work.
    > You say you have to modify the HTML data... what sort of
    > modifications? If they're fairly simple, you could just use regular
    > expressions to find/replace substrings in the big string. For
    > complicated modifications, maybe build a tree out of the HTML nodes,
    > modify the tree, then turn your tree back into a string.
    >


    The first thing that came to my mind is that he wants to create a
    man-in-the-middle that edits HTML between a website and a browser
    without leaving traces on the disk that some kind of antivirus might
    be able to scan.

    Of course that's funny, but it popped into my mind.
     
    Donkey Hot, May 28, 2008
    #4
  5. JCD

    Roedy Green Guest

    On Wed, 28 May 2008 10:40:36 -0700 (PDT), JCD wrote, quoted or
    indirectly quoted someone who said:

    >I would like to store this text in my application without creating a
    >file on the hard disc.


    The two most likely ways are with a simple giant String and a parse
    tree.

    Unfortunately most of the stuff out on the web is malformed. Usually
    the only stuff you can fully parse is stuff you validated yourself.

    see http://mindprod.com/jgloss/parser.html
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, May 28, 2008
    #5
  6. JCD

    Mark Space Guest

    JCD wrote:

    > I would like to store this text in my application without creating a
    > file on the hard disc.


    A file on disc probably would be the best way. Look into JSP.

    Absent that, no, I don't know of any type of convenient storage
    mechanism. A resource file would be good, but it's basically a file on
    disc anyway. A property would likely be wildly inappropriate, unless
    the string you are storing is very short.
     
    Mark Space, May 28, 2008
    #6
  7. JCD

    JCD Guest

    Actually, I don't want to get HTML data from an internet source: I
    already have the source code, and the modifications are very simple: I
    only have to change a few lines that depend on the results of my
    application. I don't need to create a tree or a parser: I want to
    store this HTML text in my Java source code. The problem is that the
    text is very long and contains many " characters and many line feeds.
    Of course, I could create, for example, an array containing each line of
    the HTML page, but that would take too long to write.
    Is there a way of storing a giant String with " and line feeds?
     
    JCD, May 29, 2008
    #7
  8. JCD

    Philipp Guest

    JCD wrote:
    > Actually, I don't want to get HTML data from an internet source : I
    > already have the source code and the modifications are very simple : I
    > only have to change a few lines that depend on the results of my
    > application. I don't need to create a tree or a parser: I want to
    > store in my java code source this HTML text. The problem is that the
    > text is very long and it contains many " and many line feeds.
    > Of course, I could create for example an array containing each line of
    > the HTML page but it would be too long to write.
    > Is there a way of storing a giant String with " and line feeds?


    You can store " and line feeds in a String object. No problem there.
    Phil
     
    Philipp, May 30, 2008
    #8
  9. RedGrittyBrick

    Philipp wrote:
    > JCD wrote:
    >> Actually, I don't want to get HTML data from an internet source : I
    >> already have the source code and the modifications are very simple : I
    >> only have to change a few lines that depend on the results of my
    >> application. I don't need to create a tree or a parser: I want to
    >> store in my java code source this HTML text. The problem is that the
    >> text is very long and it contains many " and many line feeds.
    >> Of course, I could create for example an array containing each line of
    >> the HTML page but it would be too long to write.
    >> Is there a way of storing a giant String with " and line feeds?

    >
    > You can store " and line feeds in a String object. No problem there.
    > Phil


    Java string constants cannot span multiple lines. Java has no equivalent
    of the "here document" in Shell or Perl. Sometimes I miss these features.
    Neither of the following is valid Java:

    static final String HTML = "
    <html>
    <head>
    ...
    </body>
    </html>
    ";

    String html = <<END;
    <html>
    <head>
    ...
    </body>
    </html>
    END

    The Java-ish solution seems to be properties files.
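
At the time of this thread (2008) Java had no multi-line string literal. Java 15 and later add text blocks, which come close to the here document wished for above. A minimal sketch, assuming a Java 15+ compiler; the class name Page and the sample markup are made up for illustration.

```java
// Hypothetical illustration; requires Java 15 or later.
final class Page {
    // Line feeds are kept, and double quotes need no escaping inside a text block.
    // The position of the closing delimiter sets how much indentation is stripped.
    static final String HTML = """
            <html>
            <head><title>Report</title></head>
            <body class="main">
            </body>
            </html>
            """;
}
```

On older Java versions this does not compile, so the options discussed in the thread (escaping, concatenation, or an external file) still apply there.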

    --
    RGB
     
    RedGrittyBrick, May 30, 2008
    #9
  10. JCD

    Philipp Guest

    RedGrittyBrick wrote:
    > Philipp wrote:
    >> JCD wrote:
    >>> Actually, I don't want to get HTML data from an internet source : I
    >>> already have the source code and the modifications are very simple : I
    >>> only have to change a few lines that depend on the results of my
    >>> application. I don't need to create a tree or a parser: I want to
    >>> store in my java code source this HTML text. The problem is that the
    >>> text is very long and it contains many " and many line feeds.
    >>> Of course, I could create for example an array containing each line of
    >>> the HTML page but it would be too long to write.
    >>> Is there a way of storing a giant String with " and line feeds?

    >>
    >> You can store " and line feeds in a String object. No problem there.

    >
    > Java string constants cannot span multiple lines.


    Yep. I didn't understand the OP's request correctly.

    At this point I can only recommend using a decent text editor (e.g. try
    textpad.com) to replace all newline characters with \n (or its
    cross-platform equivalent) and every " with \".

    For example,
    String s = "hello \n\"world\"";
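
The replacement described above could also be automated with a small helper instead of a text editor. This is a sketch only: the class LiteralWriter and the method toJavaLiteral are hypothetical, and it handles just backslashes, double quotes, and line breaks (tabs and other control characters are passed through unchanged).

```java
// Hypothetical helper; names are not from the thread.
final class LiteralWriter {
    // Convert raw text into Java source for a concatenated string constant.
    static String toJavaLiteral(String text) {
        String[] lines = text.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            // Drop carriage returns, then escape backslashes before double quotes.
            String escaped = lines[i].replace("\r", "")
                    .replace("\\", "\\\\")
                    .replace("\"", "\\\"");
            boolean last = (i == lines.length - 1);
            out.append(i == 0 ? "  " : "+ ")
               .append('"').append(escaped).append(last ? "" : "\\n").append('"')
               .append(last ? ";" : "\n");
        }
        return out.toString();
    }
}
```

The output is meant to be pasted after `static final String HTML =` in a source file; each input line becomes one quoted fragment joined with the + operator.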

    Phil
     
    Philipp, May 30, 2008
    #10
  11. JCD

    Philipp Guest

    Lew wrote:
    > Philipp wrote:
    >> RedGrittyBrick wrote:
    >>> Philipp wrote:
    >>>> JCD wrote:
    >>>>> Actually, I don't want to get HTML data from an internet source : I
    >>>>> already have the source code and the modifications are very simple : I
    >>>>> only have to change a few lines that depend on the results of my
    >>>>> application. I don't need to create a tree or a parser: I want to
    >>>>> store in my java code source this HTML text. The problem is that the
    >>>>> text is very long and it contains many " and many line feeds.
    >>>>> Of course, I could create for example an array containing each line of
    >>>>> the HTML page but it would be too long to write.
    >>>>> Is there a way of storing a giant String with " and line feeds?
    >>>>
    >>>> You can store " and line feeds in a String object. No problem there.
    >>>
    >>> Java string constants cannot span multiple lines.

    >>
    >> Yep. I didn't understand the OP's request correctly.
    >>
    >> At this point I can only recommend to use a decent text editor (eg.
    >> try textpad.com) and replace all newline characters by \n (or its
    >> crossplatform equivalent) and every " by a \"
    >>
    >> For example,
    >> String s = "hello \n\"world\"";

    >
    > Using the + operator seems simpler somehow.


    Using the + operator does not change the fact that you need to replace
    line breaks with \n... I wasn't implying that you need to write
    everything on one line of code.
     
    Philipp, May 30, 2008
    #11
  12. RedGrittyBrick

    Lew wrote:
    > RedGrittyBrick wrote:
    >> Java string constants cannot span multiple lines. Java has no
    >> equivalent of the "here document" in Shell or Perl. Sometimes I miss
    >> these features.
    >>
    >> static final String HTML = "
    >> <html>
    >> <head>
    >> ...
    >> </body>
    >> </html>
    >> ";
    >>
    >> String html = <<END;
    >> <html>
    >> <head>
    >> ...
    >> </body>
    >> </html>
    >> END
    >>
    >> The Java-ish solution seems to be properties files.

    >
    > You just use the + operator:
    >
    > static final String HTML =
    > "<html>\n"
    > +" <head>\n"
    > ...
    > +" </body>\n"
    > +"</html>";
    >


    What irks me a little is it looks less like HTML, especially as you have
    to escape any special characters. An IDE makes it easier to type in such
    a string - e.g. Eclipse inserts <quote><newline><indent><plus><quote>
    whenever you press the enter key in a string.

    However if you already have a file of HTML (say) which you want to
    include as a string constant, I haven't found a way of inserting it into
    the Java source without also having to add quotes etc to the start and
    end of every line and hunting down and escaping any special characters.
    Maybe Eclipse and other IDEs have some clever way of automating this but
    I haven't found it.

    I often find it easier to add a text file to my final jar and have the
    program read it.

    For large amounts of text this is probably more appropriate than
    including it in a .java source as a long series of concatenated strings.
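
A minimal sketch of reading a text file that has been added to the jar, assuming the file is UTF-8 and Java 9 or later (for readAllBytes). The class TemplateLoader, its method names, and any resource name such as "/page.html" are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not from the thread.
final class TemplateLoader {
    // Read the whole stream as UTF-8 text; line feeds and quotes come through unchanged.
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Load a file packaged on the classpath, e.g. loadResource("/page.html").
    static String loadResource(String name) {
        try (InputStream in = TemplateLoader.class.getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + name);
            }
            return readAll(in);
        } catch (IOException e) {
            // Only reachable if closing the stream fails.
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the text never passes through a Java string literal, nothing in the HTML has to be escaped, and the loaded String can then be modified in memory before it is shown in the browser.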

    --
    RGB
     
    RedGrittyBrick, May 30, 2008
    #12
  13. JCD

    JCD Guest

    On 31 May, 01:07, Lew wrote:
    > Philipp wrote:
    > > Using the + operator does not exclude the fact that you need to replace
    > > line breaks by \n... I wasn't implying that you need to write everything
    > > on one code line.

    >
    > I would replace the line breaks with ' ' rather than '\n', for readability..
    >
    > Actually, I would use a templating engine or JSP rather than embed HTML as a
    > String.
    >
    > Which latter, BTW, does not constitute "storing the text".  "Loading" it, maybe.
    >
    > --
    > Lew


    Hello. Thank you for your answers.
    In the end, it seems more difficult to store a huge text in the source
    code than to store it in a file on the hard disc. I will add the file
    to my .jar.
     
    JCD, May 31, 2008
    #13
