Daniel
I don't understand it. It seems to be as pointless and outside the spirit of c/c++ io as would be a floatstream or intstream.
What is wstream, and what do you mean by the spirit?
No more than a stream of floats, a stream of ints, etc. A Unicode character is an encoding of bytes, as are floats, ints, etc.

I further assume that you mean just the use of wchar_t for streaming. If I misunderstood, then sorry. Does "stream of text" make any sense to you?
Pointless? Let me try to give two points:
First point: Something like wchar_t is "the character type" in languages
like Java or C#.
C and C++ are often used for writing efficient native
libraries for the likes of Java or C#, so wchar_t can be useful for
interfacing with those languages.
Second point: On modern hardware an octet of bits tends to be too small
a unit of data. Bytes are often emulated by extracting octets from
bigger words. wchar_t can actually be more efficient than char on
such platforms, so using wchar_t can be a platform-specific
optimization.
Not convinced, but pleased that you took the time to reply, thanks.

So ... what is wrong with those points?
"C/C++" seems to be actually rather unfortunate to say in the context. In
C++, wchar_t is a distinct fundamental type. In C, it is a typedef of
an integral type.
Apologies for the imprecision; just colloquial for std::basic_iostream<wchar_t>
etc.
The C language approach has always been to do I/O using library functions
which view data as a stream of bytes.
No more than a stream of floats, a stream of ints, etc. A Unicode character is an
encoding of bytes, as are floats, ints, etc.
Java streams characters as UTF8 octets, which in my understanding is
also emerging as best practice in C++.
The string representation, whether std::basic_string<char>,
std::basic_string<wchar_t>, or std::basic_string<char32_t>, should be
orthogonal to streaming (but in C++ it isn't). The language should (I
hasten to add, in my opinion) support streaming wchar_t and
std::basic_string<wchar_t> to and from byte streams.
I see the representation of characters and strings as completely
orthogonal to how streaming should work, which traditionally for C has
been streams of bytes.
But wouldn't things have been simpler if we could have written [...]

I would turn your question around and ask why 8-bit representations
of text are not equally "pointless". UTF-8 is about as useless and as
useful as UTF-16. In neither case can any one "character" represent an
entire Unicode code point. UTF-32 can, but that does not take you a
great deal further, because of combining characters and because not all
graphemes can be represented by a single Unicode codepoint anyway.
UTF-8 is marginally more space efficient than UTF-16 for European
writing systems. UTF-16 is marginally easier to decompose into
graphemes than UTF-8 for non-European writing systems.
The Java InputStream classes read and write bytes; Java streams everything as bytes. The Java Reader and Writer classes adapt the byte streams to characters according to an encoding.

Java streams characters as UTF16BE. You can also define a byte stream,
in which case you have to specify its encoding (or it is specified [...])
In practice the char16_t and char32_t variants in C++11 will probably
turn out to be more useful in future, because they can represent UTF-16
and UTF-32 in a platform independent way. UTF-16 seems to have become
the de facto unicode standard for the web [...]
[...] and of course for anything
using Microsoft products. UTF-8 has more or less become the de facto
unicode standard for unix-like systems.
Chris
I don't understand this sentence. I'm sure it's correct, but I don't understand it. At some point the internal representation of a character needs to be converted into the format of the data that flows on the stream. But that could happen at different places. For example, C++ could have supported (surely?) [...]

Streams are used for locale-dependent string formatting. Connecting them
directly to output files only works if the internal representation of
characters exactly matches the output format. This is the same for narrow
and wide character streams.
Any serious text processing software needs to support files with
different character encodings.
I expect wide streams to be useful only for non-portable simple Windows
programs where both the internal and external text representations are
UTF-16. E.g. things like:
std::wcout << L"PATH=" << _wgetenv(L"PATH") << std::endl;
Here, using std::wcout makes it possible to output Unicode data. With
narrow std::cout this would not be possible, as Windows does not support
UTF-8 locales.
So if there were no std::wcout (nor any other wide-char
output function), the program would have to prepare the text in an internal
buffer as UTF-16 and then copy it out to the terminal in binary mode,
which would not be so nice for some casual text output.
All of the files that I work with on Windows and other platforms
are stored as UTF-8. All of the software that I've seen that works
on them essentially uses binary streams and libraries to handle
encoding conversions.
Not sure what you mean there. Do you propose the output stream would
contain a mix of wchar_t and char?
Which is supportable for the Unicode encodings.

Or do you mean wchar_t values should be automatically converted into char?
Yes
This is not possible unless the stream knows both the encodings used for
wchar_t and char, and they happen to be convertible into each other.
Right, EBCDIC and many others; the way it works in Java is that there is a [...]

Yes, in an ideal world we would only have the UTF-32, UTF-16 and UTF-8
encodings and such seamless conversion would be possible. However, the
real world is still far away from that goal; there are zillions of
encodings, and UTF-8 is not even a possible option everywhere (the Windows
console!).
It might well be that the C++ stream design is bad and that far
better alternatives are possible. In its current form it seems to
attempt to do many different things at the same time, with the result
that it is not really good at any of them.
Appreciate your comments, thanks, just trying to formulate my own thoughts on [...]

But this is a general streams problem, not directly related to wchar_t.
I see the representation of characters and strings as completely
orthogonal to how streaming should work, which traditionally for C has
been streams of bytes.
Daniel <[email protected]> wrote in
It was possibly introduced for supporting a major OS whose whole SDK was
defined in UCS-2 (later redeclared to be UTF-16), plus maybe Java also had
some indirect influence.
As it happens, today wchar_t is non-portable (size
heavily depending on the platform) so the utility of related concepts like
wstream is also greatly reduced.
James,

If this is wstream, it was *not* introduced to support wchar_t
at the system level. Just the opposite: it was introduced to
support localization internally (e.g. where the characters you
want to output as digits aren't present in the single-byte
encodings of char). It defines all system-level IO in terms of
char, and defines how the wchar_t should be mapped to and from
char (in filebuf, so that you can write char, even though
everything upstream was in wchar_t).
Just for my own understanding, could you clarify where in the wostream
composition that char first appears? In basic_filebuf<wchar_t>, the
signatures of the put methods are still wchar_t, and the buffer variable
appears to be also an array of wchar_t.