How to handle text/html content from Firefox copied to Clipboard under Linux

Discussion in 'Java' started by dimitrypolivaev, Jul 27, 2008.

  1. Hello,

    I am posting because I need help. The problem is that text/html
    content copied to the clipboard by Firefox under Linux (Ubuntu 8.04) is
    corrupted when it is read by Java (e.g. JRE 1.6.0_07) and cannot be
    used. However, text/html content copied by OpenOffice can be obtained
    without any problem. As a consequence, Java-based rich text editors
    cannot paste content from Firefox at all, while other editors like
    OpenOffice manage to handle such content.

    The attached java program gets text/html content from Clipboard and
    writes it to the standard output.
    You can reproduce the problem by copying parts of web pages
    shown in Firefox to the clipboard, running the program and seeing what
    the content turns into.

    Looking forward to any comments about this issue,
    Dimitry

    //-- code example begin

    import java.awt.Toolkit;
    import java.awt.datatransfer.Clipboard;
    import java.awt.datatransfer.DataFlavor;
    import java.awt.datatransfer.Transferable;
    import java.io.Reader;

    class ClipboardPrinter
    {
        public static void main( String args[] ) throws Exception
        {
            Clipboard systemClipboard = Toolkit.getDefaultToolkit()
                    .getSystemClipboard();
            Transferable transferData = systemClipboard.getContents(null);
            if (transferData == null) {
                System.out.println("no content");
                return;
            }

            DataFlavor htmlReaderFlavor = new DataFlavor(
                    "text/html; class=java.io.Reader");
            if (!transferData.isDataFlavorSupported(htmlReaderFlavor)) {
                System.out.println("no text/html reader content");
                return;
            }

            // print raw clipboard data as numbers
            Reader reader = (Reader)
                    transferData.getTransferData(htmlReaderFlavor);
            int r = 0;
            int i = 0;
            while (-1 != (r = reader.read())){
                System.out.print(r);
                System.out.print('\t');
                if (++i % 8 == 0) {
                    System.out.println();
                }
            }
            System.out.println();

            // print encoded clipboard data as string
            DataFlavor htmlStringFlavor = new DataFlavor(
                    "text/html; class=java.lang.String");
            if (!transferData.isDataFlavorSupported(htmlStringFlavor)) {
                System.out.println("no text/html string content");
                return;
            }
            String content = (String) transferData
                    .getTransferData(htmlStringFlavor);
            System.out.println(content);

        }
    }
    // -- code example end
     
    dimitrypolivaev, Jul 27, 2008
    #1

  2. On 27/07/2008 23:09, dimitrypolivaev allegedly wrote:
    > The attached java program gets text/html content from Clipboard and
    > writes it to the standard output.


    I don't run Ubuntu. Could you post a sample input and output of that
    program?


    Meanwhile, a few suggestions. Your reading and outputting process is
    terribly ugly. Two alternatives (I'd favour the first one, personally):

    1) If you use a plain Reader (as opposed to option 2), read using a char
    array, e.g.:

    Reader r = ...;
    char[] buf = new char[1 << 5];

    for(int read; (read = r.read(buf)) >= 0; ){
        // only the first 'read' chars of buf are valid in this pass
        System.out.print(new String(buf, 0, read));
    }

    buf = null;


    2) If you can assume your data doesn't sport endless lines, use the
    BufferedReader's readLine() method, e.g.:

    Reader r = ...;

    BufferedReader bufr = new BufferedReader(r);

    for( String line; (line = bufr.readLine()) != null; ){
    System.out.println( line );
    }

    --
    DF.
     
    Daniele Futtorovic, Jul 28, 2008
    #2

  3. On 28 Jul., 05:20, Daniele Futtorovic wrote:
    > I don't run Ubuntu. Could you post a sample input and output of that
    > program?



    I rewrote the program to make my intention clearer. Here is the
    output; the new code follows. It looks like a java bug, doesn't it?

    -- Correct output under Windows begin --
    (0) < 60
    (1) h 104
    (2) t 116
    (3) m 109
    (4) l 108
    (5) > 62
    (6) < 60
    (7) b 98
    (8) o 111
    (9) d 100
    (10) y 121
    (11) > 62
    (12) 10
    (13) < 60
    (14) ! 33
    (15) - 45
    (16) - 45
    (17) S 83
    (18) t 116
    (19) a 97
    (20) r 114
    (21) t 116
    (22) F 70
    (23) r 114
    (24) a 97
    (25) g 103
    (26) m 109
    (27) e 101
    (28) n 110
    (29) t 116
    (30) - 45
    (31) - 45
    (32) > 62
    (33) o 111
    (34) u 117
    (35) t 116
    (36) p 112
    (37) u 117
    (38) t 116
    (39) < 60
    (40) ! 33
    (41) - 45
    (42) - 45
    (43) E 69
    (44) n 110
    (45) d 100
    (46) F 70
    (47) r 114
    (48) a 97
    (49) g 103
    (50) m 109
    (51) e 101
    (52) n 110
    (53) t 116
    (54) - 45
    (55) - 45
    (56) > 62
    (57) 10
    (58) < 60
    (59) / 47
    (60) b 98
    (61) o 111
    (62) d 100
    (63) y 121
    (64) > 62
    (65) 10
    (66) < 60
    (67) / 47
    (68) h 104
    (69) t 116
    (70) m 109
    (71) l 108
    (72) > 62

    <html><body>
    <!--StartFragment-->output<!--EndFragment-->
    </body>
    </html>
    -- Correct output under Windows end --

    -- Wrong output under Linux begin --
    (0) ? 65533
    (1) ? 65533
    (2) o 111
    (3) 0
    (4) u 117
    (5) 0
    (6) t 116
    (7) 0
    (8) p 112
    (9) 0
    (10) u 117
    (11) 0
    (12) t 116
    (13) 0

    ??o
    -- Wrong output under Linux end --

    //-- code example begin

    import java.awt.Toolkit;
    import java.awt.datatransfer.Clipboard;
    import java.awt.datatransfer.DataFlavor;
    import java.awt.datatransfer.Transferable;

    class ClipboardPrinter
    {
        public static void main( String args[] ) throws Exception
        {
            Clipboard systemClipboard = Toolkit.getDefaultToolkit()
                    .getSystemClipboard();
            Transferable transferData = systemClipboard.getContents(null);
            if (transferData == null) {
                System.out.println("no content");
                return;
            }

            // print encoded clipboard data as string
            DataFlavor htmlStringFlavor = new DataFlavor(
                    "text/html; class=java.lang.String");
            if (!transferData.isDataFlavorSupported(htmlStringFlavor)) {
                System.out.println("no text/html string content");
                return;
            }
            String content = (String) transferData
                    .getTransferData(htmlStringFlavor);

            for (int i = 0; i < content.length(); i ++){
                printlnCharacter(i, content.charAt(i));
            }

            System.out.println();
            System.out.println(content);

        }

        // prints index, the character (blank for NUL or whitespace) and its numeric value
        private static void printlnCharacter(int i, final char c) {
            final int numericValue = c;
            System.out.print("(" + i + ")");
            System.out.print('\t');
            System.out.print(c == 0 || Character.isWhitespace(c) ? ' ' : c);
            System.out.print('\t');
            System.out.print(numericValue);
            System.out.println();
        }
    }
    // -- code example end
     
    dimitrypolivaev, Jul 28, 2008
    #3
  4. Mark Space

    dimitrypolivaev wrote:

    > -- Wrong output under Linux begin --
    > (0) ? 65533
    > (1) ? 65533
    > (2) o 111
    > (3) 0
    > (4) u 117
    > (5) 0


    This looks like UTF-16, little endian, with a byte order mark. Where
    the heck are you getting this file? Regardless, try UTF-16LE and see if
    things improve. Also, please read the input as binary and report the
    exact bytes being sent, that BOM is funky looking.
     
    Mark Space, Jul 28, 2008
    #4
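
Mark Space's reading can be checked without a clipboard. The sketch below is only an illustration, not the code path Java takes internally: it assumes the clipboard bytes are the word "output" in little-endian UTF-16 with a leading byte order mark (FF FE), then decodes those same bytes once as UTF-8 and once as UTF-16. The class name `EncodingMismatchDemo` and its helper methods are made up for this example.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Assumed clipboard payload: BOM bytes FF FE followed by "output" in UTF-16LE.
    static byte[] sampleBytes() {
        byte[] text = "output".getBytes(StandardCharsets.UTF_16LE);
        byte[] bytes = new byte[text.length + 2];
        bytes[0] = (byte) 0xFF;
        bytes[1] = (byte) 0xFE;
        System.arraycopy(text, 0, bytes, 2, text.length);
        return bytes;
    }

    // Wrong charset: FF and FE are never valid UTF-8, so each one
    // becomes U+FFFD, and every high byte 00 becomes a NUL character.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // The UTF-16 charset reads the BOM to pick the byte order and drops it.
    static String decodeAsUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        String wrong = decodeAsUtf8(bytes);
        for (int i = 0; i < wrong.length(); i++) {
            System.out.println("(" + i + ")\t" + (int) wrong.charAt(i));
        }
        System.out.println(decodeAsUtf16(bytes));
    }
}
```

Under that assumption the UTF-8 pass yields 14 characters starting with 65533, 65533, 111, 0, the same pattern as the Linux listing, while the UTF-16 pass yields the plain text. Note that decoding with UTF-16LE instead of UTF-16 would keep the BOM as a leading U+FEFF character.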
  5. Roedy Green

    On Sun, 27 Jul 2008 14:09:51 -0700 (PDT), dimitrypolivaev wrote:

    >The attached java program gets text/html content from Clipboard and
    >writes it to the standard output.


    Have a look at my clipboard-grabbing code in Quoter. See if Quoter
    will display the capture for you. If it does, steal my code.

    See http://mindprod.com/products1.html#QUOTER
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, Jul 29, 2008
    #5
  6. Lew

    dimitrypolivaev wrote:
    > So for me it looks like either the Firefox writes data to clipboard in
    > a wrong way or java [sic] reads and interprets them wrong.


    It is a poor workman who blames his tools for his own mistakes.

    --
    Lew


     
    Lew, Jul 29, 2008
    #6
  7. > >> -- Wrong output under Linux begin --
    > >> (0)    ?    65533
    > >> (1)    ?    65533
    > >> (2)    o    111
    > >> (3)         0
    > >> (4)    u    117
    > >> (5)         0

    >
    > > This looks like UTF-16, little endian, with a byte order mark.  Where
    > > the heck are you getting this file?  Regardless, try UTF-16LE and see if
    > > things improve.  Also, please read the input as binary and report the
    > > exact bytes being sent, that BOM is funky looking.

    >
    > If correct, this analysis proves that it's not "a java [sic] bug" but
    > programmer error, a failure to apply the correct encoding to the file.
    >


    Please consider that I do not decode the data myself. I only take it from
    the clipboard, where it should already have been decoded by Java, and I
    expect Java to handle the encoding correctly.

    > DataFlavor htmlStringFlavor = new DataFlavor("text/html; class=java.lang.String");
    > String content = (String) transferData.getTransferData(htmlStringFlavor);


    I do not use any files; I just get data from the clipboard and write
    it out.

    The Java clipboard works fine with MIME type "text/plain" (beyond the
    example code), but it does not work with "text/html" content created
    by Firefox, which I use for getting formatted text.

    So for me it looks like either the Firefox writes data to clipboard in
    a wrong way or java reads and interprets them wrong.

    Dimitry
     
    dimitrypolivaev, Jul 29, 2008
    #7
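
One way to narrow down "Firefox writes it wrong or Java reads it wrong" is to list every flavor the clipboard content offers, including any charset parameter, and compare what Firefox and OpenOffice produce. The following is a minimal sketch; `FlavorLister` and `describeFlavors` are hypothetical names, and a `StringSelection` stands in for the real clipboard content so the example runs anywhere.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.StringSelection;
import java.awt.datatransfer.Transferable;
import java.util.ArrayList;
import java.util.List;

class FlavorLister {

    // One line per offered flavor: MIME type, representation class, charset parameter.
    static List<String> describeFlavors(Transferable t) {
        List<String> lines = new ArrayList<String>();
        for (DataFlavor flavor : t.getTransferDataFlavors()) {
            String charset = flavor.getParameter("charset");
            lines.add(flavor.getPrimaryType() + "/" + flavor.getSubType()
                    + " class=" + flavor.getRepresentationClass().getName()
                    + " charset=" + (charset == null ? "(none)" : charset));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for Toolkit.getDefaultToolkit().getSystemClipboard().getContents(null)
        Transferable t = new StringSelection("output");
        for (String line : describeFlavors(t)) {
            System.out.println(line);
        }
    }
}
```

Run against the real clipboard after copying from each application, the two listings show whether the text/html flavors differ in their declared charset.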
  8. Lew

    dimitrypolivaev wrote:
    > how I can intentionally request a binary content from the clipboard
    > for my text/html target.


    To get the input in raw "binary" format, read it from an InputStream
    instead of a Reader and keep the input as byte[].

    --
    Lew


     
    Lew, Jul 29, 2008
    #8
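
Lew's suggestion might look like the sketch below: request the text/html flavor with `java.io.InputStream` as the representation class and dump the bytes in hex. `RawClipboardBytes`, `readBytes`, `toHex` and `fakeTransferable` are names invented here, and the fake Transferable only exists so the example runs without a clipboard. Whether the JRE hands over the bytes exactly as Firefox wrote them, or transcodes them first, is an assumption that would need checking on the affected system.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class RawClipboardBytes {

    static DataFlavor htmlStreamFlavor() throws ClassNotFoundException {
        return new DataFlavor("text/html; class=java.io.InputStream");
    }

    // Reads the whole stream behind the given flavor into a byte array.
    static byte[] readBytes(Transferable t, DataFlavor flavor)
            throws UnsupportedFlavorException, IOException {
        InputStream in = (InputStream) t.getTransferData(flavor);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) >= 0; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            in.close();
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical stand-in for clipboard content, offering a single flavor.
    static Transferable fakeTransferable(final byte[] payload, final DataFlavor flavor) {
        return new Transferable() {
            public DataFlavor[] getTransferDataFlavors() {
                return new DataFlavor[] { flavor };
            }
            public boolean isDataFlavorSupported(DataFlavor f) {
                return flavor.equals(f);
            }
            public Object getTransferData(DataFlavor f) throws UnsupportedFlavorException {
                if (!flavor.equals(f)) throw new UnsupportedFlavorException(f);
                return new ByteArrayInputStream(payload);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        DataFlavor flavor = htmlStreamFlavor();
        byte[] payload = { (byte) 0xFF, (byte) 0xFE, 0x6F, 0x00 };
        // With a real clipboard, pass systemClipboard.getContents(null) instead.
        System.out.println(toHex(readBytes(fakeTransferable(payload, flavor), flavor)));
    }
}
```

A hex dump of the real clipboard bytes is what Mark Space asked for: it shows directly whether the data starts with FF FE.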
  9. > > -- Wrong output under Linux begin --
    > > (0)        ?       65533
    > > (1)        ?       65533
    > > (2)        o       111
    > > (3)                0
    > > (4)        u       117
    > > (5)                0

    >
    > This looks like UTF-16, little endian, with a byte order mark.  Where
    > the heck are you getting this file?  Regardless, try UTF-16LE and see if
    > things improve.  Also, please read the input as binary and report the
    > exact bytes being sent, that BOM is funky looking.


    Thank you very much for your post; it brings me a bit further, but
    questions remain.

    A BOM would be 0xFFFE = 65534, but I see 0xFFFD = 65533, which is the
    so-called REPLACEMENT CHARACTER.

    Further I do not understand why I get binary content in this case and
    how I can intentionally request a binary content from the clipboard
    for my text/html target.

    Dimitry
     
    dimitrypolivaev, Jul 29, 2008
    #9
  10. On 29 Jul., 14:06, Lew <com.lewscanon@lew> wrote:
    > dimitrypolivaev wrote:
    > > So for me it looks like either the Firefox writes data to clipboard in
    > > a wrong way or java [sic] reads and interprets them wrong.

    >
    > It is a poor workman who blames his tools for his own mistakes.
    >
    > --
    > Lew


    Well, what is my mistake here? I still do not see it, and I am asking you
    for help. Could anybody tell me how I can get things to work?

    Dimitry
     
    dimitrypolivaev, Jul 29, 2008
    #10
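
Nobody in the thread posts a fix, so the following is only a possible workaround sketch built on the UTF-16 hypothesis above, not a confirmed solution. It assumes the raw clipboard bytes are available as a byte[] (for example through an InputStream flavor), checks for a UTF-16 byte order mark and decodes accordingly, falling back to a caller-supplied charset otherwise. `BomAwareDecoder`, `decode` and `looksMisdecoded` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomAwareDecoder {

    // Decodes bytes as UTF-16 if they start with a BOM, else with the fallback charset.
    static String decode(byte[] bytes, Charset fallback) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) {
                // little-endian BOM: skip it and decode the rest
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16LE);
            }
            if (b0 == 0xFE && b1 == 0xFF) {
                // big-endian BOM
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
            }
        }
        return new String(bytes, fallback);
    }

    // Heuristic for the symptom in this thread: replacement or NUL characters in the text.
    static boolean looksMisdecoded(String s) {
        return s.indexOf(0xFFFD) >= 0 || s.indexOf(0) >= 0;
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xFF, (byte) 0xFE, 'o', 0, 'k', 0 };
        System.out.println(decode(bytes, StandardCharsets.UTF_8));
        System.out.println(looksMisdecoded(new String(bytes, StandardCharsets.UTF_8)));
    }
}
```

A caller could first request the String flavor as in the programs above and switch to the byte-level path only when `looksMisdecoded` returns true. The fallback charset is a guess on the caller's part; the thread does not establish which encoding other applications use.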