StreamTokenizer and Unicode


ppcguy

I'm trying to parse a file using the StreamTokenizer class.
The file has strings like "Hello\u1234There".

StreamTokenizer returns this as the string "Hellou1234There".

I understand this class is not Unicode-capable, but are there any hints
as to how this could be parsed properly?

I thought about treating any 'u' followed by 4 hex digits as an escape, but
that won't work because I've got some strings where 'u' + hex digits is
part of the string.
 

Patricia Shanahan

ppcguy said:
I'm trying to parse a file using the StreamTokenizer class.
The file has strings like "Hello\u1234There".

StreamTokenizer returns this as the string "Hellou1234There".

I understand this class is not Unicode-capable, but are there any hints
as to how this could be parsed properly?

I thought about treating any 'u' followed by 4 hex digits as an escape, but
that won't work because I've got some strings where 'u' + hex digits is
part of the string.

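The \uXXXX escape (a backslash, 'u', and four hex digits) is handled as part
of Java source code processing, including the processing of string literals.
StreamTokenizer does not interpret it when reading a file.

Since you are reading from a stream, how about applying a filter to do
whatever preprocessing you want before the tokenizer sees the data?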

Patricia
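
A minimal sketch of what such a filter might look like, assuming the file contains a literal backslash before the 'u' and that a doubled backslash should be left alone. The class names UnicodeEscapeReader and UnicodeEscapeDemo and the helper firstQuoted are hypothetical, made up for illustration. The reader decodes the escape before StreamTokenizer sees it, so a bare 'u' followed by hex digits (as in "menu12") is never touched.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

/** Hypothetical filter: decodes backslash-u-plus-4-hex-digit escapes. */
class UnicodeEscapeReader extends Reader {
    // Room to push back 'u' plus up to four characters after it.
    private final PushbackReader src;
    // True when the next character follows a backslash and must pass through as-is.
    private boolean literalNext = false;

    UnicodeEscapeReader(Reader in) {
        src = new PushbackReader(in, 5);
    }

    @Override
    public int read() throws IOException {
        int c = src.read();
        if (literalNext) {
            literalNext = false;
            return c;
        }
        if (c != '\\') {
            return c;
        }
        int next = src.read();
        if (next != 'u') {
            // Some other escape (or a doubled backslash): leave it for the tokenizer.
            if (next >= 0) {
                src.unread(next);
                literalNext = true;
            }
            return '\\';
        }
        char[] hex = new char[4];
        int n = 0;
        int value = 0;
        boolean ok = true;
        while (n < 4) {
            int h = src.read();
            if (h < 0) { ok = false; break; }
            hex[n++] = (char) h;
            int d = Character.digit((char) h, 16);
            if (d < 0) { ok = false; break; }
            value = value * 16 + d;
        }
        if (ok) {
            return value; // the decoded UTF-16 code unit
        }
        // Not a complete escape: restore what was consumed and emit the backslash.
        src.unread(hex, 0, n);
        src.unread('u');
        return '\\';
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int i = 0;
        for (; i < len; i++) {
            int c = read();
            if (c < 0) break;
            cbuf[off + i] = (char) c;
        }
        return (i == 0 && len > 0) ? -1 : i;
    }

    @Override
    public void close() throws IOException {
        src.close();
    }
}

class UnicodeEscapeDemo {
    /** Returns the first double-quoted token in text, or null if the first token is not quoted. */
    static String firstQuoted(String text) throws IOException {
        StreamTokenizer st =
            new StreamTokenizer(new UnicodeEscapeReader(new StringReader(text)));
        return st.nextToken() == '"' ? st.sval : null;
    }

    public static void main(String[] args) throws IOException {
        // The input holds a real backslash, as it would in the file.
        String token = firstQuoted("\"Hello\\u1234There\"");
        System.out.println(Integer.toHexString(token.charAt(5))); // code unit after "Hello"
    }
}
```

Note that decoding happens before tokenizing, so an escape that decodes to a double quote would end the string early, much as it would in Java source. Repeated 'u' characters, which Java source allows, are not handled in this sketch.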
 
