parsing away a character

kvram

Hello everybody & thanks for reading this.

I am in the process of decoding an XML dump of a big database with a
lot of characters and symbols. I have written a lot of rules to parse away
all the noise, and what keeps annoying me at the end is this character: ^Q.

If I open the result file in Notepad, I see it is represented by white
space. Fine, I thought. But it really is not white space. When I look
at it at the console (I work with Cygwin), I see it is ^Q.
I tried to print out the ASCII table to get its number, but it is not
included there. So the question is how to get rid of this (control?)
character/String. While writing this, it crossed my mind that I can do a
readLine and String.contains("^Q"), which I'll do after typing this
message. But the question is really: what is this ^Q anyway?

thanks for your time/thoughts/suggestions,
kave
 
Lew

> what keeps annoying me at the end is this character: ^Q.

Control-Q.

> If I open the result file in Notepad, I see it is represented by white
> space. Fine, I thought. But it really is not white space.

Just because it's a control character doesn't mean it's whitespace.
> When I look at it at the console (I work with Cygwin), I see it is ^Q.
> I tried to print out the ASCII table to get its number, but it is not

Of course it is. Control-A is 1, Control-B is 2, Control-C is 3, ... so
Control-Q is 17 (0x11), the character ASCII calls DC1.
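Lew's counting rule is easy to check in code. A minimal Java sketch; the class and method names are illustrative, not from any library:

```java
// Sketch: Control-<letter> codes follow the alphabet, so the code for
// Control-Q can be computed instead of looked up in a table.
class ControlCodes {

    // Control-A is 1, Control-B is 2, ... for the uppercase letters A-Z.
    static int controlCode(char letter) {
        if (letter < 'A' || letter > 'Z') {
            throw new IllegalArgumentException("expected A-Z, got " + letter);
        }
        return letter - 'A' + 1;
    }

    public static void main(String[] args) {
        int q = controlCode('Q');
        // Prints the decimal and hex forms: "Control-Q = 17 (0x11)"
        System.out.printf("Control-Q = %d (0x%02X)%n", q, q);
    }
}
```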
> included there. So the question is how to get rid of this (control?)
> character/String. While

That depends on how you're processing the data. For example, with a String
you could replace() characters, or if you're streaming through the characters
you could toss the unwanted ones and copy the rest.
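A minimal Java sketch of both approaches Lew mentions, assuming the unwanted character is exactly Control-Q (code 17, 0x11); the class and method names are made up for illustration:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class StripControlQ {

    // Control-Q (ASCII 17, DC1/XON). Written as a cast rather than a
    // Unicode-escape literal so the intent is obvious when reading the code.
    static final char CONTROL_Q = (char) 0x11;

    // Whole-String approach: replace every occurrence with nothing.
    static String strip(String s) {
        return s.replace(String.valueOf(CONTROL_Q), "");
    }

    // Streaming approach: copy characters one at a time, skipping Control-Q.
    static void copyWithout(Reader in, Writer out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c != CONTROL_Q) {
                out.write(c);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String dirty = "abc" + CONTROL_Q + "def";
        System.out.println(strip(dirty));   // abcdef

        StringWriter out = new StringWriter();
        copyWithout(new StringReader(dirty), out);
        System.out.println(out);            // abcdef
    }
}
```

For a real file you would wrap the Reader in a BufferedReader, since reading one char at a time from an unbuffered stream is slow.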
> writing this, it crossed my mind that I can do a
> readLine and String.contains("^Q"), which I'll do after typing this
> message. But the question is really: what is this ^Q anyway?

Control-Q.

The use of the caret to indicate "Control-whatever" is standard notation,
particularly for consoles.
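One consequence of the caret notation: the text "^Q" in a Java string literal is two characters, a caret and a Q, so String.contains("^Q") will not find the control character that the console displays that way. A small illustrative decoder (the names are hypothetical) that turns caret notation into the character it stands for:

```java
class CaretNotation {

    // Decodes console caret notation such as "^Q" into the single control
    // character it stands for: ^@ is 0, ^A is 1, ..., ^_ is 31, ^? is 127.
    static char decode(String caret) {
        if (caret.length() != 2 || caret.charAt(0) != '^') {
            throw new IllegalArgumentException("not caret notation: " + caret);
        }
        char c = caret.charAt(1);
        if (c == '?') {
            return (char) 0x7F; // DEL
        }
        if (c < '@' || c > '_') {
            throw new IllegalArgumentException("not caret notation: " + caret);
        }
        // Flipping bit 6 (0x40) maps '@'..'_' (64..95) onto 0..31.
        return (char) (c ^ 0x40);
    }

    public static void main(String[] args) {
        String line = "foo" + decode("^Q") + "bar";
        // The two-character text "^Q" is not in the line...
        System.out.println(line.contains("^Q"));                          // false
        // ...but the single control character it stands for is.
        System.out.println(line.contains(String.valueOf(decode("^Q")))); // true
    }
}
```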
 
Oliver Wong

What I'd usually do in a situation like this is dump the data to a
file (which it sounds like you've done already, since you said you opened
something in notepad), and then open it with a hex editor to see the exact
sequence of bytes you're getting.
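If no hex editor is handy, a few lines of Java can list the suspicious bytes. This is a rough sketch, not a hex editor replacement: it assumes the file fits in memory, treats tab, LF and CR as ordinary text, takes the file path from the command line, and uses a made-up class name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ControlByteScan {

    // Lists each control byte (0x00-0x1F except tab, LF and CR, plus 0x7F)
    // with its offset, both in hex, much as a hex editor would show them.
    static List<String> scan(byte[] data) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // bytes are signed in Java; mask to 0..255
            boolean control = (b < 0x20 && b != '\t' && b != '\n' && b != '\r') || b == 0x7F;
            if (control) {
                hits.add(String.format("offset 0x%08X: byte 0x%02X", i, b));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the dumped file to inspect (hypothetical usage).
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String hit : scan(data)) {
            System.out.println(hit);
        }
    }
}
```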

From there, the rest depends a lot on your code. Typically, when
you're doing XML parsing in Java, you're not working with bytes, but with
characters. Since you mention String.contains(), I assume you are working
with characters, and not bytes. There's a "translation" process occurring
at some point from bytes to characters, but exactly which byte-sequence
maps onto which character sequence depends on the encoding you select.
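To make that byte-to-character step visible, here is a short Java sketch (class and method names are illustrative) that decodes the same bytes with an explicitly named charset. In ASCII-compatible encodings such as UTF-8 and ISO-8859-1, the byte 0x11 decodes to the character U+0011, so the value seen in the hex editor is also the one to strip on the character side:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeBytes {

    // Decodes the bytes with the given charset and lists the resulting
    // characters as code points, e.g. "U+0041 U+0011 U+0042".
    static String describe(byte[] raw, Charset charset) {
        String s = new String(raw, charset);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical raw bytes as they might sit in the dump: 'A', 0x11, 'B'.
        byte[] raw = {0x41, 0x11, 0x42};
        // Name the encoding explicitly rather than relying on the platform default.
        System.out.println(describe(raw, StandardCharsets.UTF_8));      // U+0041 U+0011 U+0042
        System.out.println(describe(raw, StandardCharsets.ISO_8859_1)); // U+0041 U+0011 U+0042
    }
}
```

When reading the dump as a stream, the same choice is made by passing the charset to the InputStreamReader constructor instead of using the single-argument form.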

- Oliver
 
Thomas Fritsch

See <http://en.wikipedia.org/wiki/ASCII>
Quoted from there:
"The use of Control-S (XOFF, an abbreviation for "transmit off") as a
handshaking signal warning a sender to stop transmission because of
impending overflow, and Control-Q (XON, "transmit on") to resume
sending, persists to this day in many systems as a manual output control
technique."
 
