Is there a dedicated unicode separator character?

Karla

U+2028 is a line separator
U+2029 is a paragraph separator

Is there a unicode character that means nothing but "field separator"?

I'd like to create a ? separated file that will have the least possible
chance of containing data that accidentally has the field separator in it.

-Karla
 
Oliver Wong

Karla said:
U+2028 is a line separator
U+2029 is a paragraph separator

Is there a unicode character that means nothing but "field separator"?

I'd like to create a ? separated file that will have the least possible
chance of containing data that accidentally has the field separator in it.

Depends on your data. For any character you can select, I can provide
you with an infinite number of strings which will contain that character.
For example, if you select character U+whatever, there's the string of
length 1 which contains only "\uwhatever", there's the string of length 2
which contains "\uwhatever\uwhatever", the string of length 3, and so on.

If it were up to me, I'd probably use a character from one of the Private
Use Areas (for example, U+E000 to U+F8FF). But again, you might eventually
receive data containing that character, so you'll need some sort of
escaping mechanism anyway.
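
A minimal Java sketch of such an escaping mechanism, assuming U+E000 as the separator and a backslash as the escape character. Both choices, and the class and method names, are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedFields {
    // Assumption: U+E000 (a Private Use Area code point) is the separator
    // and a backslash is the escape character. Neither is prescribed anywhere.
    static final char SEP = '\uE000';
    static final char ESC = '\\';

    // Joins fields, prefixing every separator or escape character
    // that occurs inside a field with the escape character.
    static String join(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.append(SEP);
            }
            for (char c : fields.get(i).toCharArray()) {
                if (c == SEP || c == ESC) {
                    out.append(ESC);
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    // Splits on unescaped separators and removes the escape characters.
    // Note: an empty input yields one empty field, not an empty list.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : line.toCharArray()) {
            if (escaped) {
                current.append(c);
                escaped = false;
            } else if (c == ESC) {
                escaped = true;
            } else if (c == SEP) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

With this in place, a field that happens to contain the separator survives a round trip through join and split unchanged.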

- Oliver
 
Steve W. Jackson

Karla said:
U+2028 is a line separator
U+2029 is a paragraph separator

Is there a unicode character that means nothing but "field separator"?

I'd like to create a ? separated file that will have the least possible
chance of containing data that accidentally has the field separator in it.

-Karla

I'm no Unicode expert, so I don't know what those characters are
supposed to mean. But the concept of a "field separator" has to rely on
what constitutes a "field" in some context.

When the context is a "command line", the separator between each "field"
or argument is traditionally white space -- one or more characters. In
the Unix awk environment where I once did tons of scripting, that same
default occurred, but it could be changed readily, which made it easy to
handle delimited data -- such as the so-called CSV or comma-separated
values format.

So...your context determines what a "field separator" will be.

= Steve =
 
Stefan Ram

Karla said:
U+2028 is a line separator
U+2029 is a paragraph separator
Is there a unicode character that means nothing but "field separator"?

While I do not understand why this should be discussed in a Java
newsgroup and not »comp.std.internat«, I'd think of

U+001F
 
JScoobyCed

Karla said:
U+2028 is a line separator
U+2029 is a paragraph separator

Is there a unicode character that means nothing but "field separator"?

I'd like to create a ? separated file that will have the least possible
chance of containing data that accidentally has the field separator in it.

-Karla

Is this a Java question?
I guess there is no such character, or it would be widely known. But you
can change your strategy: not a field separator, but a field separator
AND a field encloser.

Example of a field separator (using the space as separator):
data1 data2 data3
Another example that illustrates your problem:
data 1 data 2 data 3
You would parse the previous line as 6 tokens instead of 3.

Example of a field separator and a field encloser (using the space and the
double-quote symbol):
"data 1" "data 2" "data ""3"""

In the last field ("data ""3"""), the field encloser character appears in
my data (which is: data "3"), so I simply double it. That means the reader
has to collapse each doubled field encloser back into a single one.

Real life example: CSV format (the separator is the comma character, the
field encloser is the double-quote character)

This way you remove the risk of ending up in an ambiguous situation.
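
A minimal Java sketch of the separator-plus-encloser scheme described above, assuming a space as the separator and the double quote as the encloser. The class and method names are hypothetical, and the decoder is deliberately lenient: it ignores anything outside quotes and drops an unterminated field.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedFields {
    // Wraps every field in double quotes and doubles any quote inside it.
    static String encode(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append('"')
               .append(fields.get(i).replace("\"", "\"\""))
               .append('"');
        }
        return out.toString();
    }

    // Reads quoted fields back, turning each doubled quote into one quote.
    static List<String> decode(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled encloser: literal quote
                    i++;
                } else {
                    inQuotes = false;    // single encloser: end of field
                    fields.add(current.toString());
                    current.setLength(0);
                }
            } else if (c == '"') {
                inQuotes = true;         // start of a new field
            }
            // Any other character outside quotes (the separator) is skipped.
        }
        return fields;
    }
}
```

Encoding the three values data 1, data 2 and data "3" with this sketch produces the quoted line shown in the example above, and decoding that line gives the three values back.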
 
Dale King

There's also

U+001C file separator
U+001D group separator
U+001E record separator
U+001F unit separator

I agree with a previous poster that basically there is no single
universal separator. You just have to define your protocol to use one
and probably have an escape mechanism if you need to use that character
in your data.
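
A minimal Java sketch of a protocol built on these control characters, assuming U+001F between fields and U+001E between records. The class and method names are hypothetical, and instead of an escape mechanism this simplified version rejects any field that already contains a separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AsciiSeparators {
    static final String UNIT = "\u001F";   // unit separator, between fields
    static final String RECORD = "\u001E"; // record separator, between records

    // Serializes records; refuses fields that contain either separator.
    static String write(List<List<String>> records) {
        List<String> encoded = new ArrayList<>();
        for (List<String> record : records) {
            for (String field : record) {
                if (field.contains(UNIT) || field.contains(RECORD)) {
                    throw new IllegalArgumentException("field contains a separator");
                }
            }
            encoded.add(String.join(UNIT, record));
        }
        return String.join(RECORD, encoded);
    }

    // Parses the data back; the -1 limit keeps trailing empty fields.
    static List<List<String>> read(String data) {
        List<List<String>> records = new ArrayList<>();
        for (String record : data.split(RECORD, -1)) {
            records.add(Arrays.asList(record.split(UNIT, -1)));
        }
        return records;
    }
}
```

Commas, spaces, quotes and line breaks inside fields need no special treatment here, because none of them is used as a delimiter.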

JScoobyCed said:
Is this a Java question?

It probably isn't, but Java has done more than most other languages for
the adoption of Unicode, so it is not too off-topic.
 
