How to handle text/html content from Firefox copied to Clipboard under Linux

dimitrypolivaev

Hello,

I am posting this message because I need help. The problem is that text/html
content copied to the clipboard by Firefox under Linux (Ubuntu 8.04) is
corrupted when it is read by Java (e.g. JRE 1.6.0_07) and cannot be
used. However, text/html content copied by OpenOffice can be obtained
without any problem. As a consequence, java based rich text editors
cannot paste content from Firefox at all. Other applications, such as
OpenOffice, do manage to handle such content.

The attached java program gets text/html content from the clipboard and
writes it to standard output.
You can reproduce the problem by copying parts of web pages
shown in Firefox to the clipboard, running the program, and seeing what
the content turns into.

I look forward to any comments on this issue,
Dimitry

//-- code example begin

import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.io.Reader;

class ClipboardPrinter
{
    public static void main( String args[] ) throws Exception
    {
        Clipboard systemClipboard = Toolkit.getDefaultToolkit()
                .getSystemClipboard();
        Transferable transferData = systemClipboard.getContents(null);
        if (transferData == null) {
            System.out.println("no content");
            return;
        }

        DataFlavor htmlReaderFlavor = new DataFlavor(
                "text/html; class=java.io.Reader");
        if (!transferData.isDataFlavorSupported(htmlReaderFlavor)) {
            System.out.println("no text/html reader content");
            return;
        }

        // print raw clipboard data as numbers, eight values per line
        Reader reader = (Reader)
                transferData.getTransferData(htmlReaderFlavor);
        int r = 0;
        int i = 0;
        while (-1 != (r = reader.read())){
            System.out.print(r);
            System.out.print('\t');
            if (++i % 8 == 0) {
                System.out.println();
            }
        }
        System.out.println();

        // print encoded clipboard data as string
        DataFlavor htmlStringFlavor = new DataFlavor(
                "text/html; class=java.lang.String");
        if (!transferData.isDataFlavorSupported(htmlStringFlavor)) {
            System.out.println("no text/html string content");
            return;
        }
        String content = (String) transferData
                .getTransferData(htmlStringFlavor);
        System.out.println(content);

    }
}
// -- code example end
 
Daniele Futtorovic

I don't run Ubuntu. Could you post a sample input and output of that
program?


Meanwhile, a few suggestions. Your reading and outputting process is
terribly ugly. Two alternatives (I'd favour the first one, personally):

1) If you use a plain Reader (as opposed to option 2), read using a char
array, e.g.:

Reader r = ...;
char[] buf = new char[1 << 5];

for(int read = 0; (read = r.read(buf)) >= 0; ){
    // PrintStream has no println(char[], int, int), so build a String first
    System.out.print(new String(buf, 0, read));
}

buf = null;


2) If you can assume your data doesn't sport endless lines, use the
BufferedReader's readLine() method, e.g.:

Reader r = ...;

BufferedReader bufr = new BufferedReader(r);

for( String line; (line = bufr.readLine()) != null; ){
    System.out.println( line );
}
 
dimitrypolivaev

Daniele Futtorovic said:
I don't run Ubuntu. Could you post a sample input and output of that
program?


I rewrote the program to make my intention clearer. Here is the
output; the new code follows. It looks like a java bug, doesn't it?

-- Correct output under Windows begin --
(0) < 60
(1) h 104
(2) t 116
(3) m 109
(4) l 108
(5) > 62
(6) < 60
(7) b 98
(8) o 111
(9) d 100
(10) y 121
(11) > 62
(12) 10
(13) < 60
(14) ! 33
(15) - 45
(16) - 45
(17) S 83
(18) t 116
(19) a 97
(20) r 114
(21) t 116
(22) F 70
(23) r 114
(24) a 97
(25) g 103
(26) m 109
(27) e 101
(28) n 110
(29) t 116
(30) - 45
(31) - 45
(32) > 62
(33) o 111
(34) u 117
(35) t 116
(36) p 112
(37) u 117
(38) t 116
(39) < 60
(40) ! 33
(41) - 45
(42) - 45
(43) E 69
(44) n 110
(45) d 100
(46) F 70
(47) r 114
(48) a 97
(49) g 103
(50) m 109
(51) e 101
(52) n 110
(53) t 116
(54) - 45
(55) - 45
(56) > 62
(57) 10
(58) < 60
(59) / 47
(60) b 98
(61) o 111
(62) d 100
(63) y 121
(64) > 62
(65) 10
(66) < 60
(67) / 47
(68) h 104
(69) t 116
(70) m 109
(71) l 108
(72) > 62

<html><body>
<!--StartFragment-->output<!--EndFragment-->
</body>
</html>
-- Correct output under Windows end --

-- Wrong output under Linux begin --
(0) ? 65533
(1) ? 65533
(2) o 111
(3) 0
(4) u 117
(5) 0
(6) t 116
(7) 0
(8) p 112
(9) 0
(10) u 117
(11) 0
(12) t 116
(13) 0

??o
-- Wrong output under Linux end --

//-- code example begin

import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;

class ClipboardPrinter
{
    public static void main( String args[] ) throws Exception
    {
        Clipboard systemClipboard = Toolkit.getDefaultToolkit()
                .getSystemClipboard();
        Transferable transferData = systemClipboard.getContents(null);
        if (transferData == null) {
            System.out.println("no content");
            return;
        }

        // print encoded clipboard data as string
        DataFlavor htmlStringFlavor = new DataFlavor(
                "text/html; class=java.lang.String");
        if (!transferData.isDataFlavorSupported(htmlStringFlavor)) {
            System.out.println("no text/html string content");
            return;
        }
        String content = (String) transferData
                .getTransferData(htmlStringFlavor);

        // one line per character: index, glyph, numeric value
        for (int i = 0; i < content.length(); i ++){
            printlnCharacter(i, content.charAt(i));
        }

        System.out.println();
        System.out.println(content);

    }

    private static void printlnCharacter(int i, final char c) {
        final int numericValue = c;
        System.out.print("(" + i + ")");
        System.out.print('\t');
        // NUL and whitespace are shown as a blank
        System.out.print(c == 0 || Character.isWhitespace(c) ? ' ' : c);
        System.out.print('\t');
        System.out.print(numericValue);
        System.out.println();
    }
}
// -- code example end
 
Mark Space

dimitrypolivaev said:
-- Wrong output under Linux begin --
(0) ? 65533
(1) ? 65533
(2) o 111
(3) 0
(4) u 117
(5) 0

This looks like UTF-16, little endian, with a byte order mark. Where
the heck are you getting this file? Regardless, try UTF-16LE and see if
things improve. Also, please read the input as binary and report the
exact bytes being sent, that BOM is funky looking.
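
For readers following along, here is a minimal sketch of how such output can arise. It assumes (the thread does not confirm this) that the clipboard bytes are UTF-16LE with a BOM and that they were decoded as UTF-8 somewhere along the way. The class name BomDemo and the sample bytes are made up for illustration. Decoded as UTF-8, the two BOM bytes each become a REPLACEMENT CHARACTER (65533) and every second char is 0, which matches the pattern in the Linux output above; decoded as UTF-16, the same bytes give the expected text.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    // Hypothetical sample: "output" in UTF-16LE, preceded by the BOM bytes FF FE.
    static final byte[] RAW = {
        (byte) 0xFF, (byte) 0xFE,
        'o', 0, 'u', 0, 't', 0, 'p', 0, 'u', 0, 't', 0
    };

    // Wrong charset: FF and FE are never valid in UTF-8, so each one
    // is replaced by U+FFFD, and the zero bytes survive as NUL chars.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // "UTF-16" uses the BOM to pick the byte order and then drops it.
    static String decodeAsUtf16(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String wrong = decodeAsUtf8(RAW);
        for (int i = 0; i < wrong.length(); i++) {
            System.out.println("(" + i + ")\t" + (int) wrong.charAt(i));
        }
        System.out.println(decodeAsUtf16(RAW));
    }
}
```

This only demonstrates the decoding mismatch on hand-written bytes; it does not show where in the Firefox-to-Java path the wrong charset is applied.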
 
Lew

dimitrypolivaev said:
So for me it looks like either the Firefox writes data to clipboard in
a wrong way or java [sic] reads and interprets them wrong.

It is a poor workman who blames his tools for his own mistakes.

--
Lew


 
dimitrypolivaev

Mark Space said:
This looks like UTF-16, little endian, with a byte order mark.  Where
the heck are you getting this file?  Regardless, try UTF-16LE and see if
things improve.  Also, please read the input as binary and report the
exact bytes being sent, that BOM is funky looking.

If correct, this analysis proves that it's not "a java [sic] bug" but
programmer error, a failure to apply the correct encoding to the file.

Please consider that I do not decode the data myself. I only take it from
the clipboard, where it should already have been decoded by java, and I
expect java to handle the encoding correctly.
DataFlavor htmlStringFlavor = new DataFlavor("text/html; class=java.lang.String");
String content = (String) transferData.getTransferData(htmlStringFlavor);

I do not use any files; I just get data from the clipboard and write
it out.

The java clipboard works fine with the MIME type "text/plain" (not shown in
the example code), but it does not work with "text/html" content created
by Firefox, which I use for getting formatted text.

So for me it looks like either the Firefox writes data to clipboard in
a wrong way or java reads and interprets them wrong.

Dimitry
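
One way to narrow this down is to list what the clipboard actually offers. The following is a minimal sketch, not part of the original program: the class name FlavorLister and the describe helper are made up for illustration. It prints each flavor's MIME type, representation class and charset parameter (null if the flavor carries none), which shows whether text/html is offered with a charset at all and which one.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;

class FlavorLister {
    // Formats one flavor as: type/subtype, representation class, charset.
    static String describe(DataFlavor flavor) {
        return flavor.getPrimaryType() + "/" + flavor.getSubType()
                + "\t" + flavor.getRepresentationClass().getName()
                + "\tcharset=" + flavor.getParameter("charset");
    }

    public static void main(String[] args) {
        Transferable contents = Toolkit.getDefaultToolkit()
                .getSystemClipboard().getContents(null);
        if (contents == null) {
            System.out.println("no content");
            return;
        }
        // One line per flavor the current clipboard owner advertises.
        for (DataFlavor flavor : contents.getTransferDataFlavors()) {
            System.out.println(describe(flavor));
        }
    }
}
```

Running it once after copying from Firefox and once after copying from OpenOffice should make any difference between the two sets of flavors visible.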
 
Lew

dimitrypolivaev said:
how I can intentionally request a binary content from the clipboard
for my text/html target.

To get the input in raw "binary" form, read it through an InputStream instead
of a Reader and simply keep the input as a byte[].

--
Lew
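
A minimal sketch of that suggestion follows, under stated assumptions. The class name RawClipboardHtml and its helpers are made up for illustration, and the BOM check in decode is a heuristic of this sketch, not something the thread establishes. Whether the "text/html; class=java.io.InputStream" flavor is offered, and whether its bytes are exactly what Firefox wrote, depends on the platform and the JRE.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

class RawClipboardHtml {
    // Reads the stream to its end and returns everything as one byte[].
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) >= 0; ) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Assumption: a leading FF FE or FE FF means UTF-16; otherwise UTF-8.
    static String decode(byte[] raw) {
        boolean utf16 = raw.length >= 2
                && ((raw[0] == (byte) 0xFF && raw[1] == (byte) 0xFE)
                 || (raw[0] == (byte) 0xFE && raw[1] == (byte) 0xFF));
        return new String(raw, Charset.forName(utf16 ? "UTF-16" : "UTF-8"));
    }

    public static void main(String[] args) throws Exception {
        Transferable contents = Toolkit.getDefaultToolkit()
                .getSystemClipboard().getContents(null);
        DataFlavor rawHtml = new DataFlavor("text/html; class=java.io.InputStream");
        if (contents == null || !contents.isDataFlavorSupported(rawHtml)) {
            System.out.println("no text/html stream content");
            return;
        }
        InputStream in = (InputStream) contents.getTransferData(rawHtml);
        try {
            byte[] raw = readAll(in);
            for (byte b : raw) {
                System.out.printf("%02X ", b & 0xFF); // exact bytes, in hex
            }
            System.out.println();
            System.out.println(decode(raw));
        } finally {
            in.close();
        }
    }
}
```

The hex dump answers the request for the exact bytes; the decode step is only a guess at how an application might turn them back into text.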


 
dimitrypolivaev

Mark Space said:
This looks like UTF-16, little endian, with a byte order mark.  Where
the heck are you getting this file?  Regardless, try UTF-16LE and see if
things improve.  Also, please read the input as binary and report the
exact bytes being sent, that BOM is funky looking.

Thank you very much for your post. It brings me a bit further, but
questions remain.

The BOM bytes would be 0xFF 0xFE (0xFFFE = 65534 when read as one value),
but I see 0xFFFD = 65533 twice, which is the so-called REPLACEMENT CHARACTER.

Furthermore, I do not understand why I get binary content in this case and
how I can intentionally request a binary content from the clipboard
for my text/html target.

Dimitry
 
dimitrypolivaev

dimitrypolivaev said:
So for me it looks like either the Firefox writes data to clipboard in
a wrong way or java [sic] reads and interprets them wrong.

It is a poor workman who blames his tools for his own mistakes.

Well, what is my mistake here? I still do not see it, and I am asking you
for help. Could anybody tell me how I can get things to work?

Dimitry
 
