Aliasing in C++11

Discussion in 'C++' started by molw5.iwg@gmail.com, Feb 21, 2013.

  1. Guest

    Based on my reading of the standard the compiler is free to assume a pointer to
    a strongly typed enumeration aliases only other pointers to the same type and
    raw character pointers. For example, I would like to define a more restrictive
    byte array for interacting with binary data as follows:

    enum byte : uint8_t {};
    std::vector <byte> buffer;

    Specifically, I believe string-like classes could benefit greatly from this
    sort of implementation by defining their internal state using definitions
    similar to the above:

    class string
    {
    ...
    private:
    enum byte : char {};
    std::unique_ptr <byte[]> buffer;
    };

    The tests I've performed with GCC support the above interpretation – does a
    strict reading of the standard support the above, and/or is there some other
    well-known alternative? Thanks in advance,

    -molw5
    , Feb 21, 2013
    #1

  2. Öö Tiib Guest

    On Thursday, 21 February 2013 23:27:25 UTC+2, wrote:
    > Based on my reading of the standard the compiler is free to assume a pointer
    > to a strongly typed enumeration aliases only other pointers to the same type
    > and raw character pointers. For example, I would like to define a more
    > restrictive byte array for interacting with binary data as follows:
    >
    > enum byte : uint8_t {};


    No, you are wrong. Yours is a traditional enum with an underlying type. This
    is a strongly typed enum:

    enum class byte : uint8_t {};

    > std::vector <byte> buffer;


    Even with your 'byte' it is definitely more restrictive.

    > Specifically, I believe string-like classes could benefit greatly from this
    > sort of implementation by defining their internal state using definitions
    > similar to the above:
    >
    > class string
    > {
    > ...
    > private:
    > enum byte : char {};
    > std::unique_ptr <byte[]> buffer;
    > };


    Perhaps do not write yet another text-containing class. The market is full;
    there are far too many of them already.

    > The tests I've performed with GCC support the above interpretation – does a
    > strict reading of the standard support the above, and/or is there some other
    > well-known alternative? Thanks in advance,


    What is the "this"? It should work. Currently most people use std::string
    (which actually contains UTF-8 encoded text) for storing text. I fully
    agree with you that it is a loose and unsafe thing. However, it is unlikely
    that some revolution is coming. Billions of lines of code and millions of
    interfaces all over the world use std::string, and the problems are
    consistently elsewhere.
    Öö Tiib, Feb 21, 2013
    #2

  3. Guest

    On Thursday, February 21, 2013 2:48:20 PM UTC-7, Öö Tiib wrote:
    > No you are wrong. Your's is traditional enum with underlying type. This is
    > strongly typed enum:
    >
    > enum class byte : uint8_t {};


    Apologies; nevertheless, in this context the distinction is irrelevant (the enumeration has no members).

    > What is the "this"? It should work. Currently most people use std::string
    > (that actually contains UTF-8 encoded text) for storing texts. I fully
    > agree with you that it is loose and unsafe thing. However it is unlikely
    > that some revolution is coming. Billions of lines of code and millions of
    > interfaces all over the world use that std::string and problems are
    > consistently elsewhere.


    It does work – in the context of serialization, however, writes to the buffer
    almost always invalidate all other cached state, as the underlying char*
    pointer may alias everything (including the pointer itself). The compiler is
    almost never able to inline to the point where it can resolve these sorts of
    aliasing problems. I was asking whether or not another solution was commonly
    used to define a raw character array (string, buffer, vector, what have you)
    with stronger aliasing properties similar to the above – clearly this solution
    is C++11 specific and I'd imagine others have attempted to address this
    problem in the past.

    Clearly the above could be used to define other primitive-equivalent types with
    stronger aliasing properties, string is merely the most interesting as the
    external interface need not change (as char* may still alias byte*). I believe
    it would be possible to write a standard conforming string library that uses
    such a byte definition, freeing the compiler to maintain state across writes to
    the string; I'm not, at present, planning to write one myself.
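    A minimal sketch of the string idea, assuming a hypothetical class named
    `byte_string`: storage uses a private byte enum, while the public interface
    still hands out `const char*`, which is permitted because a char lvalue may
    read any object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical string-like class; the name `byte_string` is illustrative.
// Internally it stores a private byte type rather than char.
class byte_string {
    enum byte : char {};              // distinct type with char's representation
    std::size_t size_;                // length excluding the terminating NUL
    std::unique_ptr<byte[]> buffer_;  // NUL-terminated storage

public:
    explicit byte_string(const char* s)
        : size_(std::strlen(s)), buffer_(new byte[size_ + 1]) {
        // memcpy copies object representations, so filling byte storage
        // from char data is well defined.
        std::memcpy(buffer_.get(), s, size_ + 1);
    }

    // The external interface stays char-based: a char lvalue may access
    // any object, so reading the byte storage through const char* is allowed.
    const char* c_str() const {
        return reinterpret_cast<const char*>(buffer_.get());
    }
    std::size_t size() const { return size_; }
};
```

    Writes inside the class go through `byte` lvalues, so they cannot alias
    unrelated members of other types; callers holding `const char*` are
    unaffected.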
    , Feb 21, 2013
    #3
  4. Öö Tiib Guest

    On Friday, 22 February 2013 01:04:15 UTC+2, wrote:
    > On Thursday, February 21, 2013 2:48:20 PM UTC-7, Öö Tiib wrote:
    > > No you are wrong. Your's is traditional enum with underlying type. This is
    > > strongly typed enum:
    > >
    > > enum class byte : uint8_t {};

    >
    > Apologies - never the less in this context the distinction is irrelevant
    > (the enumeration has no members).


    It is somewhat relevant. By the language rules, a value of enum class type
    does not implicitly convert to integral types. A traditional enum does.
    The lack of named enumerators does not actually matter, since an enum with
    a fixed underlying type can hold every value of that type whether or not an
    enumerator exists for a particular value.
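    The conversion difference described above can be checked at compile time.
    The names `byte_plain`, `byte_scoped` and `to_int` are illustrative only:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Unscoped enumeration with a fixed underlying type, as in the first post.
enum byte_plain : std::uint8_t {};

// Scoped ("strongly typed") enumeration with the same underlying type.
enum class byte_scoped : std::uint8_t {};

// The unscoped enum converts implicitly to integral types...
static_assert(std::is_convertible<byte_plain, int>::value,
              "unscoped enum converts implicitly");
// ...while the scoped enum requires an explicit cast.
static_assert(!std::is_convertible<byte_scoped, int>::value,
              "scoped enum does not convert implicitly");

// With a fixed underlying type, every value of that type is a valid value
// of the enum, even though no enumerators are declared.
inline int to_int(byte_scoped b) { return static_cast<int>(b); }
```

    Neither property changes the aliasing rules: both enums are types distinct
    from their underlying type.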

    > > What is the "this"? It should work. Currently most people use std::string
    > > (that actually contains UTF-8 encoded text) for storing texts. I fully
    > > agree with you that it is loose and unsafe thing. However it is unlikely
    > > that some revolution is coming. Billions of lines of code and millions of
    > > interfaces all over the world use that std::string and problems are
    > > consistently elsewhere.

    >
    > It does work – in the context of serialization, however, writes to the buffer
    > almost always invalidate every other state as the underlying char* pointer
    > may alias everything (including the pointer itself). The compiler is almost
    > never able to inline to the point where it can resolve these sort of aliasing
    > problems. I was asking whether or not another solution was commonly used to
    > define a raw character array (string, buffer, vector, what have you) with
    > stronger aliasing properties similar to the above – clearly this solution is
    > C++11 specific and I'd imagine others have attempted to address this problem
    > in the past.


    Lots of people certainly have. It is very likely that you can find something
    already implemented. I in fact haven't. I use std::string for text and
    std::vector<char> for byte buffers. I know it is unsafe, so I am more
    careful. The reason I do it is that the majority of libraries and tools
    support types like that. I would waste performance on conversions when
    using something else.

    > Clearly the above could be used to define other primitive-equivalent types
    > with stronger aliasing properties, string is merely the most interesting
    > as the external interface need not change (as char* may still alias byte*).
    > I believe it would be possible to write a standard conforming string
    > library that uses such a byte definition, freeing the compiler to maintain
    > state across writes to the string; I'm not, at present, planning to write
    > one myself.


    It feels like you are correct that it is possible. However ... writing a
    standard-conforming string library does not seem to have any point
    whatsoever. The standard currently requires std::string to be externally as
    loose and unsafe as it is. So the only thing possible is to make it
    internally more efficient for a particular purpose, not safer. It is
    unlikely to make a major difference in efficiency either, since there are
    lots of different implementations of std::string already floating around,
    as well as lots of other text-containing and text-managing libraries and
    classes for any purpose imaginable.
    Öö Tiib, Feb 21, 2013
    #4
  5. Guest

    On Thursday, February 21, 2013 4:36:09 PM UTC-7, Öö Tiib wrote:
    > It is somewhat relevant. By language rules a value of enum class type does
    > not implicitly convert to values of integral types. Traditional enum does.
    > Lack of named enumerators actually does not matter since enum may have all
    > the values of underlying type regardless if enumerator for particular value
    > exists or not.


    Agreed – I'm still not seeing the relevance to this topic.

    > Lot of people certainly have. It is very likely that you can find something
    > already implemented. I in fact haven't. I use std::string for text and
    > std::vector<char> for byte buffer. I know it is unsafe so I am more
    > careful. The benefit why I do it is that majority of libraries and tools
    > support types like that. I would have to waste performance into conversions
    > when using something else.


    Like I said – still looking for additional information. Thank you for the
    response.

    > It feels that you are correct that it is possible. However ... writing
    > standard conforming string library does not feel to have point whatsoever.
    > Standard currently requires the std::string to be externally as loose and
    > unsafe as it is. So only thing possible is to make it internally more
    > efficient for particular purpose, not safer. It is unlikely to make
    > some major difference in efficiency either since there are lot of different
    > implementations of std::string already floating around as there are lot
    > of other text-containing and managing libraries and classes for any
    > purpose imaginable.


    The advantage is the compiler is able to maintain state across string writes,
    as I mentioned above; that alters the performance of user code. Obviously the
    impact is domain specific – what isn't?
    , Feb 22, 2013
    #5
  6. Öö Tiib Guest

    On Friday, 22 February 2013 04:13:07 UTC+2, wrote:
    > The advantage is the compiler is able to maintain state across string writes,
    > as I mentioned above; that alters the performance of user code. Obviously the
    > impact is domain specific – what isn't?


    I am still unsure why the compiler cannot already optimize away any aliasing
    checks by simply assuming that you do not use the underlying buffer of the
    std::string or std::vector<char> in question as storage for some other
    objects possibly involved in your domain-specific solution?
    Öö Tiib, Feb 22, 2013
    #6
  7. Guest

    On Thursday, February 21, 2013 8:57:57 PM UTC-7, Öö Tiib wrote:
    > I am still unsure why compiler can not optimize away any aliasing
    > checks already by simply assuming that you do not somehow use underlying buffer
    > of std::string or std::vector<char> under question as storage for some other
    > objects possibly involved in your domain-specific solution?


    I honestly don't know how to respond to that. Review the strict aliasing rules?
    , Feb 22, 2013
    #7
  8. Öö Tiib Guest

    Re: String is not UTF (was Re: Aliasing in C++11)

    On Friday, 22 February 2013 17:13:52 UTC+2, Andy Champ wrote:
    > On 21/02/2013 21:48, Öö Tiib wrote:
    > > What is the "this"? It should work. Currently most people use std::string
    > > (that actually contains UTF-8 encoded text) for storing texts. I fully
    > > agree with you that it is loose and unsafe thing. However it is unlikely
    > > that some revolution is coming. Billions of lines of code and millions of
    > > interfaces all over the world use that std::string and problems are
    > > consistently elsewhere.

    >
    > std::string does not contain UTF-8 encoded text. It contains chars. If
    > your implementation treats those chars as UTF-8 encoded characters, then
    > fine - but that is NOT part of the standard, it's just something that
    > *nix operating systems tend to do.


    I did in fact describe the most widespread practice. By the C++ standard a
    char is a byte; keep whatever encoding you like there, the standard is
    silent about it. The other possibility is to use std::wstring for texts if
    wchar_t can contain UTF-16LE. That might help on Windows or with Qt as the
    GUI. That is a minority anyway, maybe 20% of the C++ code written.

    > You might like to consider what happens when you resize a string to
    > remove part of a multibyte character. There's nothing there to make it
    > UTF safe...


    There are no alternatives. Such difficulties, and all the others, are normal
    work. That is what developers are for.
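    Andy's resize point can be demonstrated with a short sketch. The helper
    `ends_mid_sequence` is hypothetical, assumes the string holds UTF-8, and
    only inspects the tail of the string; it is not a full UTF-8 validator:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `s` ends in the middle of a UTF-8 multibyte sequence.
// Assumes `s` holds UTF-8; std::string itself neither knows nor checks that.
bool ends_mid_sequence(const std::string& s) {
    // Walk back over continuation bytes (10xxxxxx) to find the lead byte.
    std::size_t i = s.size();
    std::size_t cont = 0;
    while (i > 0 && (static_cast<unsigned char>(s[i - 1]) & 0xC0) == 0x80) {
        --i;
        ++cont;
    }
    if (i == 0) return cont > 0;  // only continuation bytes: malformed
    unsigned char lead = static_cast<unsigned char>(s[i - 1]);
    // Number of continuation bytes the lead byte announces.
    std::size_t expected = (lead < 0x80)         ? 0
                         : (lead >> 5) == 0x06   ? 1
                         : (lead >> 4) == 0x0E   ? 2
                         : (lead >> 3) == 0x1E   ? 3
                                                 : 0;
    return cont < expected;
}
```

    For example, `std::string s = "caf\xC3\xA9";` holds "café" in five bytes;
    after `s.resize(4)` only the lead byte `0xC3` of "é" remains, and nothing in
    `std::string` prevents it.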

    > I suspect this is why fstream::open takes a char* - someone assumed that
    > a char* was utf-8, and for those operating systems where a filename is
    > unicode it's broken.


    I repeat ... there is no serious support for Unicode in C++. fstream was
    likely designed when no one thought that file names could be anything but
    ASCII. UTF-8 is the most popular encoding. The majority of HTML or other XML
    you see on the internet is in it. So it makes sense to use something you
    do not have to convert.
    Öö Tiib, Feb 22, 2013
    #8
  9. Öö Tiib Guest

    On Friday, 22 February 2013 06:12:59 UTC+2, wrote:
    > On Thursday, February 21, 2013 8:57:57 PM UTC-7, Öö Tiib wrote:
    > > I am still unsure why compiler can not optimize away any aliasing
    > > checks already by simply assuming that you do not somehow use underlying buffer
    > > of std::string or std::vector<char> under question as storage for some other
    > > objects possibly involved in your domain-specific solution?

    >
    > I honestly don't know how to respond to that. Review the strict aliasing rules?


    It all seems to be about storage obtained with malloc(). It feels like if
    you use the underlying buffer of std::string or std::vector<char> for odd
    purposes, then you are on your own anyway. I can't find where a
    standard-compliant compiler is required to expect that a
    std::string::iterator and a double* may point to the same thing.

    So ... what you do seems more and more domain-specific.
    Öö Tiib, Feb 22, 2013
    #9
  10. Guest

    On Friday, February 22, 2013 9:19:28 AM UTC-7, Öö Tiib wrote:
    > It all seems to be about storage taken with malloc(). It feels that if you
    > use underlying buffer of std::string or std::vector<char> for odd purposes
    > then you are on your own anyway. I can't find that standard compliant
    > compiler is required to expect that std::string::iterator and double* may
    > point to same thing.
    >
    > So ... what you do seems more and more domain-specific.


    I don't know why I'm still replying to this – std::string::iterator contains a
    raw character pointer or an offset into its buffer. The compiler is forced to
    assume the write itself may alias double*.
    , Feb 22, 2013
    #10
  11. Öö Tiib Guest

    On Friday, 22 February 2013 18:28:00 UTC+2, wrote:
    > I don't know why I'm still replying to this – std::string::iterator contains a
    > raw character pointer or offset into it's buffer. The compiler is forced to
    > assume the write itself may alias double*.


    std::string::iterator is nowhere required to contain ordinary raw character
    pointers. Its members are not specified by standard.
    Öö Tiib, Feb 22, 2013
    #11
  12. Guest

    On Friday, February 22, 2013 9:54:02 AM UTC-7, Öö Tiib wrote:
    > std::string::iterator is nowhere required to contain ordinary raw character
    > pointers. Its members are not specified by standard.


    No kidding? So I suppose the above byte definition could be used instead?
    I'm sorry Tiib, I'm done – perhaps someone with more patience will be willing
    to pick this up with you.
    , Feb 22, 2013
    #12
  13. Öö Tiib Guest

    On Friday, 22 February 2013 18:58:58 UTC+2, wrote:
    > On Friday, February 22, 2013 9:54:02 AM UTC-7, Öö Tiib wrote:
    > > std::string::iterator is nowhere required to contain ordinary raw character
    > > pointers. Its members are not specified by standard.

    >
    > No kidding? So I suppose the above byte definition could be used instead?
    > I'm sorry Tiib, I'm done – perhaps someone with more patience will be willing
    > to pick this up with you.


    That 'byte' of yours, used internally in std::string::iterator? If whoever
    implements the C++ compiler finds it beneficial, then easily. There is no
    requirement that there are pointers inside whatsoever. An implementation
    may use pointers, yes, but it may contain whatever internal members with
    whatever implementation-specific attributes it likes. The standard only
    specifies interface requirements for the standard library.
    Öö Tiib, Feb 22, 2013
    #13
  14. Nobody Guest

    Re: String is not UTF (was Re: Aliasing in C++11)

    On Fri, 22 Feb 2013 15:13:52 +0000, Andy Champ wrote:

    > I suspect this is why fstream::open takes a char* - someone assumed that a
    > char* was utf-8, and for those operating systems where a filename is
    > unicode it's broken.


    I assume that it's because fopen() takes a char*.

    All widely-used OSes can reference (some) files using char*, even if it's
    suboptimal (e.g. on Windows, only files whose names are valid in the
    current codepage can be opened that way).

    Making fstream::open() take e.g. a wchar_t* or std::wstring would be even
    more broken on Unix than using char* is on Windows. Unix filenames are
    just NUL-terminated sequences of bytes with no defined encoding.
    Nobody, Feb 23, 2013
    #14
  15. Öö Tiib Guest

    Re: String is not UTF (was Re: Aliasing in C++11)

    On Saturday, 23 February 2013 23:32:47 UTC+2, Andy Champ wrote:
    >
    > C++ on Windows is only 20% of all C++? I'm astonished. Do you have a
    > source for that?


    Yup. Trends change. Sadly, I can't share the source.

    Most of the commercial C++ code is written to work on several platforms
    (Mac, Linux, Tablets, Consoles, Windows) and so it is not Windows
    specific and companies do not care. Microsoft has achieved that with
    their C++ unfriendliness and bad compilers.

    Hobbyist developers use g++ or Clang far more than MSVC, and those are
    better on other platforms like Linux.

    Most Windows-only software is currently written in C# or other .NET
    languages, and so it is not C++.
    Öö Tiib, Feb 24, 2013
    #15
  16. Bo Persson Guest

    Re: String is not UTF (was Re: Aliasing in C++11)

    Andy Champ wrote on 2013-02-22 16:13:
    > On 21/02/2013 21:48, Öö Tiib wrote:
    >> What is the "this"? It should work. Currently most people use std::string
    >> (that actually contains UTF-8 encoded text) for storing texts. I fully
    >> agree with you that it is loose and unsafe thing. However it is unlikely
    >> that some revolution is coming. Billions of lines of code and millions of
    >> interfaces all over the world use that std::string and problems are
    >> consistently elsewhere.

    >
    > std::string does not contain UTF-8 encoded text. It contains chars. If
    > your implementation treats those chars as UTF-8 encoded characters, then
    > fine - but that is NOT part of the standard, it's just something that
    > *nix operating systems tend to do.
    >
    > You might like to consider what happens when you resize a string to
    > remove part of a multibyte character. There's nothing there to make it
    > UTF safe...
    >
    > I suspect this is why fstream::open takes a char* - someone assumed that
    > a char* was utf-8, and for those operating systems where a filename is
    > unicode it's broken.
    >


    Actually, it's not. The historical reason is that fstream::open was
    designed at a time when std::string did not yet exist.

    Note that in C++11 we do have an fstream::open(std::string), and without
    any required UTF-8 support.
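    The C++11 `std::string` overload of `open()` mentioned above can be used as
    below. `round_trip` and the file name are illustrative assumptions; the name
    is handed to the implementation as a byte string, and the standard does not
    require any UTF-8 interpretation of it:

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Writes `text` to the file `name` and reads the first line back, using the
// std::string overload of open() added in C++11 alongside const char*.
std::string round_trip(const std::string& name, const std::string& text) {
    std::ofstream out;
    out.open(name);                 // C++11: open(const std::string&)
    out << text << '\n';
    out.close();

    std::ifstream in;
    in.open(name.c_str());          // pre-C++11 form, still available
    std::string line;
    std::getline(in, line);
    in.close();

    std::remove(name.c_str());      // clean up the temporary file
    return line;
}
```

    Both overloads accept the same implementation-defined set of names; the
    `std::string` one only saves the `c_str()` call.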


    Bo Persson
    Bo Persson, Feb 24, 2013
    #16
  17. Öö Tiib Guest

    Re: String is not UTF (was Re: Aliasing in C++11)

    On Sunday, 24 February 2013 10:26:33 UTC+2, Paavo Helde wrote:
    > Öö Tiib <> wrote in news:751648e2-4344-4338-96ef-
    > :
    >
    > > On Saturday, 23 February 2013 23:32:47 UTC+2, Andy Champ wrote:
    > >>
    > >> C++ on Windows is only 20% of all C++? I'm astonished. Do you have a
    > >> source for that?

    > >
    > > Yup. Trends change. Sad I can't share the source.
    > >
    > > Most of the commercial C++ code is written to work on several platforms
    > > (Mac, Linux, Tablets, Consoles, Windows) and so it is not Windows
    > > specific and companies do not care.

    >
    > So you don't count portable programs running on Windows as Windows
    > programs? With such definitions the 20% number makes more sense indeed...


    Sure, they are Windows programs when built for Windows. However, most
    Windows-specific code is likely removed, what remains is likely isolated
    in small modules, and the texts are likely kept as UTF-8, not UTF-16LE.

    > > Microsoft has achieved that with
    > > their C++ unfriendliness and bad compilers.

    >
    > IMO, they have a quite decent compiler and an excellent debugger (with
    > some braindead quirks of course, but who doesn't have them). The compiler
    > is a bit lagging when adapting to standards compliance, but this doesn't
    > make it unusable.


    For the last 15 years the trend has been to lag behind the others. At the
    moment free IDEs and free compilers are better than MS commercial tools in
    several respects. WinDbg is fine; the debugger integrated into the IDE
    apparently does not understand what is going on. A good engineer can work
    well even with bad tools.

    > It is probably true that writing strictly Windows-specific stuff in a
    > Windows-specific language like C# is easier than in C++. That's the whole
    > point and not-so-secret agenda of creating the Windows-specific languages
    > in the first place. So actually the percent of Windows-specific programs
    > written in C++ should be zero; if it is 20% this probably means somebody
    > has chosen a wrong tool for the job.


    Somebody always does something with the wrong tool; no statistics are
    needed, that is human nature. :D C++ can be the best tool to solve a problem
    on Windows as well. When efficiency is needed, C++ is unrivaled. C++ also
    has unrivaled power for integrating different things together. When those
    powers are not needed, C++ is perhaps too complicated a tool for many.
    Öö Tiib, Feb 25, 2013
    #17
  18. Nobody Guest

    Re: String is not UTF (was Re: Aliasing in C++11)

    On Sat, 23 Feb 2013 21:37:17 +0000, Andy Champ wrote:

    > On 23/02/2013 16:54, Nobody wrote:
    >> Making fstream::open() take e.g. a wchar_t* or std::wstring would be even
    >> more broken on Unix than using char* is on Windows. Unix filenames are
    >> just NUL-terminated sequences of bytes with no defined encoding.

    >
    > Ah - I hadn't realised that. So what does ls display if you have
    > backspaces or newlines in the filename? Something stupid I take it? It
    > does rather explain the decision.


    Originally, ls just copied the byte sequence to stdout. Modern versions
    (at least the GNU version) will decode the string according to the current
    locale then re-encode it, with question marks (or escape sequences with
    -Q) for non-printable characters or sequences which cannot be decoded
    according to the current locale.

    Taking the encoding into account means that multi-column output is aligned
    correctly when dealing with multi-byte characters (i.e. columns are based
    upon characters rather than bytes).
    Nobody, Feb 25, 2013
    #18
  19. James Kanze Guest

    Re: String is not UTF (was Re: Aliasing in C++11)

    On Saturday, 23 February 2013 16:54:06 UTC, Nobody wrote:
    > On Fri, 22 Feb 2013 15:13:52 +0000, Andy Champ wrote:
    > > I suspect this is why fstream::open takes a char* - someone assumed that a
    > > char* was utf-8, and for those operating systems where a filename is
    > > unicode it's broken.


    > I assume that it's because fopen() takes a char*.


    And because fstream preceded std::string by a number of years.

    > All widely-used OSes can reference (some) files using char*, even if it's
    > suboptimal (e.g. on Windows, only files whose names are valid in the
    > current codepage can be opened that way).


    > Making fstream::open() take e.g. a wchar_t* or std::wstring would be even
    > more broken on Unix than using char* is on Windows. Unix filenames are
    > just NUL-terminated sequences of bytes with no defined encoding.


    I suspect that more likely, any suggestion of having `fstream`
    take `wchar_t` (or even `std::string`) simply came up too late.
    What you can pass to fstream::open is implementation defined
    anyway, so there's no problem with other OSs; the implementation
    defined legal set of wchar_t filenames under Unix is empty; you
    get the same sort of error that you get when you try to open
    a file named ":::" under Windows.

    --
    James
    James Kanze, Feb 25, 2013
    #19
  20. Rui Maciel Guest

    Re: String is not UTF (was Re: Aliasing in C++11)

    Andy Champ wrote:

    > I didn't know about that. But then...
    >
    > <http://www.cplusplus.com/reference/fstream/fstream/open/>
    >
    > doesn't have it, nor do MS
    >
    > <http://msdn.microsoft.com/en-us/library/4dx08bh4.aspx>
    >
    > although
    >
    > <http://en.cppreference.com/w/cpp/io/basic_fstream/open>
    >
    > does. Do you know which compilers support it?


    Those compilers that comply with C++11 support it.


    Rui Maciel
    Rui Maciel, Feb 25, 2013
    #20

Similar Threads
  1. Tim Tyler

    LCD anti-aliasing in Java

    Tim Tyler, Sep 4, 2003, in forum: Java
    Replies:
    2
    Views:
    1,283
    Tim Tyler
    Sep 5, 2003
  2. Kevin Bertman

    Anti-aliasing GIF Images

    Kevin Bertman, Nov 26, 2004, in forum: Java
    Replies:
    4
    Views:
    745
    marcus
    Nov 29, 2004
  3. Wesley T Perkins

    Aliasing a class name?

    Wesley T Perkins, Jun 28, 2005, in forum: Java
    Replies:
    8
    Views:
    1,954
    John Currier
    Jul 1, 2005
  4. Roedy Green

    More anti-aliasing puzzles

    Roedy Green, Aug 10, 2005, in forum: Java
    Replies:
    25
    Views:
    2,187
    Roedy Green
    Aug 16, 2005
  5. palmis

    aliasing

    palmis, Feb 2, 2006, in forum: Java
    Replies:
    0
    Views:
    723
    palmis
    Feb 2, 2006
