this cast to const char*

Discussion in 'C++' started by Gernot Frisch, May 5, 2011.

  1. Hi,

    with "MFC" I can do:
    CString str(_T("test")); printf("%s", str); // prints "test"

    With my own string class, however, there seems to be a 4 byte "header"
    before the string data.

    I have the member >const TCHAR* m_data;< as the first member of my string
    class.

    How does MS do this?

    Thank you,
    -Gernot
    Gernot Frisch, May 5, 2011
    #1

  2. Gernot Frisch

    Kai-Uwe Bux Guest

    Gernot Frisch wrote:

    > with "MFC" I can do:
    > CString str(_T("test"); printf("%s", str); // prints "test"
    >
    > With my own string class, however, there seems to be a 4 byte "header"
    > before the string data.
    >
    > I have the member >const TCHAR* m_data;< as the first member of my string
    > class.
    >
    > How does MS doe this?


    First, MS could use compiler magic to define the UB of the first line. With
    your own (homegrown) string class, any use of printf would be UB (the specs
    of printf simply won't know about your string class).

    Second, I doubt that MS actually does use compiler magic and the first line
    could just "work" by accident. Question: does your string class have a
    virtual method (e.g., the destructor)? does the CString class?


    Best,

    Kai-Uwe Bux
    Kai-Uwe Bux, May 5, 2011
    #2

  3. Gernot Frisch

    Felix Bytow Guest

    hi,

    > with "MFC" I can do:
    > CString str(_T("test"); printf("%s", str); // prints "test"
    >
    > With my own string class, however, there seems to be a 4 byte "header"
    > before the string data.
    >
    > I have the member >const TCHAR* m_data;< as the first member of my
    > string class.
    >
    > How does MS doe this?
    >


    MS provides a casting operator for that.
    When writing a class you can define your own casting operators.

    e.g.:
    class str
    {
    char *buff;
    // some stuff

    public:
    // here comes the casting operator (public, so callers can use it)
    operator const char * () const
    {
    return buff;
    }
    };

    this way an object of "str" will be implicitly converted to const char * if
    needed.
    you could also declare it "explicit" (C++0x), so that a cast is required

    within your own casting operators you can also do some more complex
    stuff than simply returning a member of your class.

    I hope I helped you :)

    Felix
    Felix Bytow, May 5, 2011
    #3
  4. Gernot Frisch

    Öö Tiib Guest

    On May 5, 8:27 pm, "Gernot Frisch" <> wrote:
    > Hi,
    >
    > with "MFC" I can do:
    > CString str(_T("test"); printf("%s", str); // prints "test"


    Note that all good compilers and MS's own static code analyser "prefast"
    complain about using a class type object as a printf argument.

    > With my own string class, however, there seems to be a 4 byte "header"
    > before the string data.
    >
    > I have the member >const TCHAR* m_data;< as the first member of my string
    > class.
    >
    > How does MS doe this?


    Likely by not having any virtual functions in their CString so it has
    no vtable. Who cares? It is wrong anyway.
    Öö Tiib, May 5, 2011
    #4
  5. Gernot Frisch

    gwowen Guest

    On May 5, 6:27 pm, "Gernot Frisch" <> wrote:
    > Hi,
    >
    > with "MFC" I can do:
    > CString str(_T("test"); printf("%s", str); // prints "test"
    >
    > With my own string class, however, there seems to be a 4 byte "header"
    > before the string data.
    >
    > I have the member >const TCHAR* m_data;< as the first member of my string
    > class.


    Passing C++ classes to variable argument functions like printf() is
    not advisable ... but if you *MUST* you can have your class inherit
    from a POD type with m_data as its only (or first) member. This
    will force your non-POD class to be layout compatible, and so the
    "header" (which will probably be the table of function pointers for
    virtual dispatch) will no longer sit in front of m_data. This will almost certainly work, as long as the
    size of m_data matches the size of argument the compiler expects for
    vararg functions...
    gwowen, May 5, 2011
    #5
  6. Gernot Frisch

    Öö Tiib Guest

    On May 5, 8:40 pm, Felix Bytow <-chemnitz.de>
    wrote:
    > hi,
    >
    > > with "MFC" I can do:
    > > CString str(_T("test"); printf("%s", str); // prints "test"

    >
    > > With my own string class, however, there seems to be a 4 byte "header"
    > > before the string data.

    >
    > > I have the member >const TCHAR* m_data;< as the first member of my
    > > string class.

    >
    > > How does MS doe this?

    >
    > MS provides a casting operator for that.
    > When writing a class you can define your own casting operators.
    >
    > e.g.:
    > class str
    > {
    >         char *buff;
    >         // some stuff
    >
    >         // here comes the casting operator
    >         operator const char * (void) const
    >         {
    >                 return buff;
    >         }
    >
    > };
    >
    > this way an object of "str" will be implicitly casted to const char * if
    > needed.
    > you could also declare it "explicit" though
    >
    > within your own casting operators you can also do some more complex
    > stuff than simply returning a member of your class.


    Nope, you are wrong. To use the casting operator you need to write:

    CString str(_T("test")); printf("%s", (LPCTSTR)str);

    It works just by accident, like Kai-Uwe said.

    Such implicit casting operators actually make CString dangerous to
    use, so if you use MFC for GUI then keep the CString strictly inside
    of your GUI classes.
    Öö Tiib, May 5, 2011
    #6
  7. Gernot Frisch

    Öö Tiib Guest

    On May 5, 8:50 pm, gwowen <> wrote:
    > On May 5, 6:27 pm, "Gernot Frisch" <> wrote:
    >
    > > Hi,

    >
    > > with "MFC" I can do:
    > > CString str(_T("test"); printf("%s", str); // prints "test"

    >
    > > With my own string class, however, there seems to be a 4 byte "header"
    > > before the string data.

    >
    > > I have the member >const TCHAR* m_data;< as the first member of my string
    > > class.

    >
    > Passing C++ classes to variable argument functions like printf() is
    > not advisable ... but if you *MUST* you can have your class inherit
    > from a POD type with the m_data with its only (or first) member.  This
    > will force your non-POD class to be layout compatible, and so the
    > "header" (which will probably be the table of function pointers for
    > virtual dispatch). This will almost certainly work, as long as the
    > size of m_data matches the size of argument the compiler expects for
    > vararg functions...


    It will force only POD base subobject to be layout compatible (and
    printf does not cast argument to that base) so if it works then again
    by accident.
    Öö Tiib, May 5, 2011
    #7
  8. Gernot Frisch

    Bo Persson Guest

    Öö Tiib wrote:
    > On May 5, 8:50 pm, gwowen <> wrote:
    >> On May 5, 6:27 pm, "Gernot Frisch" <> wrote:
    >>
    >>> Hi,

    >>
    >>> with "MFC" I can do:
    >>> CString str(_T("test"); printf("%s", str); // prints "test"

    >>
    >>> With my own string class, however, there seems to be a 4 byte
    >>> "header" before the string data.

    >>
    >>> I have the member >const TCHAR* m_data;< as the first member of
    >>> my string class.

    >>
    >> Passing C++ classes to variable argument functions like printf() is
    >> not advisable ... but if you *MUST* you can have your class inherit
    >> from a POD type with the m_data with its only (or first) member.
    >> This will force your non-POD class to be layout compatible, and so
    >> the "header" (which will probably be the table of function
    >> pointers for virtual dispatch). This will almost certainly work,
    >> as long as the size of m_data matches the size of argument the
    >> compiler expects for vararg functions...

    >
    > It will force only POD base subobject to be layout compatible (and
    > printf does not cast argument to that base) so if it works then
    > again by accident.


    It actually isn't by accident (not anymore, at least), MS has
    documented that passing a CString by value gets you the pointer to
    your string. They don't dare to break all the printfs already
    existing!


    Bo Persson
    Bo Persson, May 5, 2011
    #8
  9. Gernot Frisch

    gwowen Guest

    On May 5, 6:58 pm, Öö Tiib <> wrote:

    > It will force only POD base subobject to be layout compatible


    For single inheritance, that's a distinction without a difference.
    gwowen, May 5, 2011
    #9
  10. Gernot Frisch

    Öö Tiib Guest

    On May 5, 9:27 pm, gwowen <> wrote:
    > On May 5, 6:58 pm, Öö Tiib <> wrote:
    >
    > > It will force only POD base subobject to be layout compatible

    >
    > For single inheritance, thats a distinction without a difference.


    Hmm ... really? Where does standard say that POD base sub-object when
    used with single inheritance should be located at very start of object
    of derived class?
    Öö Tiib, May 5, 2011
    #10
  11. Gernot Frisch

    Balog Pal Guest

    "Kai-Uwe Bux" <>
    > Gernot Frisch wrote:
    >
    >> with "MFC" I can do:
    >> CString str(_T("test"); printf("%s", str); // prints "test"


    Passing non-pod for a function taking (...) is undefined behavior. If you
    try to compile something like this in gcc, you get a warning telling you that
    you're on the wrong track, and at runtime the call will abort.

    >> With my own string class, however, there seems to be a 4 byte "header"
    >> before the string data.
    >>
    >> I have the member >const TCHAR* m_data;< as the first member of my string
    >> class.
    >>
    >> How does MS doe this?

    >
    > First, MS could use compiler magic to define the UB of the first line.
    > With
    > your own (homegrown) string class, any use of printf would be UB (the
    > specs
    > of printf simply won't know about your string class).
    >
    > Second, I doubt that MS actually does use compiler magic and the first
    > line
    > could just "work" by accident. Question: does your string class have a
    > virtual method (e.g., the destructor)? does the CString class?


    It is more than accident. Not compiler but library magic. At least in the
    CString implementations I looked at (up to VS6, MFC4.2), the implementation
    used a string-holder struct with some header (having refcount among others)
    and the string data itself at the tail (variable length). And CString itself
    had no data elements but a sole pointer. That was set to point to the start of
    the string data inside the mentioned holder. Member functions used
    pointer math to subtract the offset and get to the full structure.

    And compiler implementation for ... works just passing the structure content
    as a whole, as if it were POD, no ctor or dtor calls. All that together
    leading to "works".

    I did not see it specifically documented as a feature of CString, so it is
    unofficial heuristic at best, but the intention to make and keep it work is
    IMO clear.

    The visual compiler painfully lacks an analyser similar to gcc's
    __attribute__((format)) that checks the types and format string components; my
    usual technique is to use format helpers consistently, i.e.:

    printf("int:%d, long: %ld, str: %s", f_d(i), f_ld(lo), f_s(str));

    where f_* are inline functions taking argument of the type proper for the
    format and just returning it. So I can pass either regular string or
    CString, it will be safe and fine. Creating compile time errors for serious
    mistakes too.
    Balog Pal, May 6, 2011
    #11
  12. On 06.05.11 01.11, Balog Pal wrote:
    [CString binary compatible to const char*]
    > It is more than accident. Not compiler but library magic. At least the
    > CString implementations I looked at (up to VS6, MFC4.2) the
    > implementation used a string-holder struct with some header (having
    > refcount among others) and the string data itself at the tail (variable
    > lendth). And CString itself had no data elements but a sole pointer.
    > That was set to point to start of the sting data inside the mentioned
    > holder. Member functions used pointer-math to substract offset and get
    > to the full structure.

    [...]
    > I did not see it specifically documented as a feature of CString, so it
    > is unofficial heuristic at best, but the intention to make and keep it
    > work is IMO clear.


    I made my own string class implementation to work in the same way
    (without knowing the CString implementation at this time). But I did
    this for a completely different reason. It makes the conversion from the
    string class to const char* very cheap. Otherwise the stored pointer
    must always be compared against NULL before adjusting it to the real
    string content. On the other side accessing the string length and the
    ref count at negative offsets does not cause any significant overhead on
    most platforms.

    Furthermore, using this memory layout enables array classes in the same
    library to simply cast from CString* to const char** by a reinterpret
    cast, which would necessarily cause an allocation otherwise.

    So I would not bet that the implementation is done due to the printf
    compatibility. This is most likely a spin off.


    > The visual compiler painfully lacks similar analiser as gcc's
    > __attribute__(format) that checks the types and format string
    > components, my usual tech s to use format helpers consistently. I.e:
    >
    > printf("int:%d, long: %ld, str: %s", f_d(i), f_ld(lo), f_s(str));


    Well, C++ and printf...
    There is still no reasonable replacement in the standard. One must be
    stoned to use the iostream output operators, because besides being type
    safe they create completely unreadable code, at least if you use
    different formatting (like hex and decimal) concurrently.


    Marcel
    Marcel Müller, May 6, 2011
    #12
  13. Gernot Frisch

    m0shbear Guest

    On May 5, 7:54 pm, Marcel Müller <> wrote:
    > On 06.05.11 01.11, Balog Pal wrote:
    > [CString binary compatible to const char*]
    >
    > > It is more than accident. Not compiler but library magic. At least the
    > > CString implementations I looked at (up to VS6, MFC4.2) the
    > > implementation used a string-holder struct with some header (having
    > > refcount among others) and the string data itself at the tail (variable
    > > lendth). And CString itself had no data elements but a sole pointer.
    > > That was set to point to start of the sting data inside the mentioned
    > > holder. Member functions used pointer-math to substract offset and get
    > > to the full structure.

    > [...]
    > > I did not see it specifically documented as a feature of CString, so it
    > > is unofficial heuristic at best, but the intention to make and keep it
    > > work is IMO clear.

    >
    > I made my own string class implementation to work in the same way
    > (without knowing the CString implementation at this time). But I did
    > this for a completely different reason. It makes the conversion from the
    > string class to const char* very cheap. Otherwise the stored pointer
    > must always be compared against NULL before adjusting it to the real
    > string content. On the other side accessing the string length and the
    > ref count at negative offsets does not cause any significant overhead on
    > most platforms.
    >
    > Furthermore, using this memory layout enables array classes in the same
    > library to simply cast from CString* to const char** by a reinterpret
    > cast, which would necessarily cause an allocation otherwise.
    >
    > So I would not bet that the implementation is done due to the printf
    > compatibility. This is most likely a spin off.
    >
    > > The visual compiler painfully lacks similar analiser as gcc's
    > > __attribute__(format) that checks the types and format string
    > > components, my usual tech s to use format helpers consistently. I.e:

    >
    > > printf("int:%d, long: %ld, str: %s", f_d(i), f_ld(lo), f_s(str));

    >
    > Well, C++ and printf...
    > There is still no reasonable replacement in the standard. One must be
    > stoned to use the iostream output operators, because besides being type
    > safe they create completely unreadable code, at least if you use
    > different formatting (like hex and decimal) concurrently.
    >
    > Marcel


    Boost has a nice replacement for std::printf, using overloaded '%'
    instead of ',' for varargs.
    m0shbear, May 6, 2011
    #13
  14. Gernot Frisch

    gwowen Guest

    On May 5, 7:37 pm, Öö Tiib <> wrote:

    > Hmm ... really? Where does standard say that POD base sub-object when
    > used with single inheritance should be located at very start of object
    > of derived class?


    The standard doesn't. Every single implementation that is or ever
    will be in existence does.
    gwowen, May 6, 2011
    #14
  15. On May 5, 10:27 am, "Gernot Frisch" <> wrote:
    > Hi,
    >
    > with "MFC" I can do:
    > CString str(_T("test"); printf("%s", str); // prints "test"
    >
    > With my own string class, however, there seems to be a 4 byte "header"
    > before the string data.
    >
    > I have the member >const TCHAR* m_data;< as the first member of my string
    > class.
    >
    > How does MS doe this?


    I just thought I'd pipe in that my own company has done this as well
    with its own custom string class, hacked it just like CString so it
    works in printf. I hate it. The code will test and work on windows,
    but blow up as soon as it's tried on a non-windows platform, and as
    most developers are windows based, including myself, this is quite
    annoying. I wish they never had a cast operator in their string class,
    and I wish they didn't use C-vararg printf style functions. But they
    do, and I suffer. Heed the warnings from this thread.
    Joshua Maurice, May 6, 2011
    #15
  16. Gernot Frisch

    Öö Tiib Guest

    On May 6, 8:28 am, gwowen <> wrote:
    > On May 5, 7:37 pm, Öö Tiib <> wrote:
    >
    > > Hmm ... really? Where does standard say that POD base sub-object when
    > > used with single inheritance should be located at very start of object
    > > of derived class?

    >
    > The standard doesn't.  Every single implementation that is or ever
    > will be in existence does.


    Uhh. I never dare to be so absolute about C++ compilers. Here I can
    even provide evidence of the opposite, with a compiler manufactured by
    the CString creators themselves.

    <code>
    // WARNING: this is meant as example
    // of really awful coding practices
    #include <iostream>
    #include <cstdio>

    struct Pod { int p; };

    class DerivedFromPod
        : public Pod // single derived
    {
    public:
        DerivedFromPod() { p = 42; d = 0; }
        virtual ~DerivedFromPod() {}
    private:
        int d;
    };

    int main()
    {
        DerivedFromPod* der = new DerivedFromPod();
        Pod* pod = der; // implicit cast here

        std::cout << "der is at: " << der
                  << " pod is at: " << pod << std::endl;
        // Both calls below are undefined behavior; that is the point.
        printf( "ints from der %d, %d \n", *der );
        printf( "ints from pod %d, %d \n", *pod );
        delete der;
    }
    </code>

    Compiled for Win32 with the MS compiler Visual C++ 9.0 (bundled in VS
    2008 Professional).
    Running it produces something like:

    der is at: 00356940 pod is at: 00356944
    ints from der 4290588, 42
    ints from pod 42, -242263521

    So there we are with your "Every single implementation that is or ever
    will be in existence does".
    Öö Tiib, May 6, 2011
    #16
  17. * Öö Tiib, on 06.05.2011 12:55:
    > On May 6, 8:28 am, gwowen<> wrote:
    >> On May 5, 7:37 pm, Öö Tiib<> wrote:
    >>
    >>> Hmm ... really? Where does standard say that POD base sub-object when
    >>> used with single inheritance should be located at very start of object
    >>> of derived class?

    >>
    >> The standard doesn't. Every single implementation that is or ever
    >> will be in existence does.

    >
    > Uhh. I never dare to be so absolute about C++ compilers. Here i can
    > even provide evidence of opposite with a compiler manufactured by
    > CString creators themselves.
    >
    > <code>
    > // WARNING: this is meant as example
    > // of really awful coding practices
    > #include<iostream>
    > #include<cstdio>
    >
    > struct Pod { int p; };
    >
    > class DerivedFromPod
    > : public Pod // single derived
    > {
    > public:
    > DerivedFromPod() { p=42; d=0; };
    > virtual ~DerivedFromPod() {};
    > private:
    > int d;
    > };
    >
    > int main()
    > {
    > DerivedFromPod* der = new DerivedFromPod();
    > Pod* pod = der; // implicit cast here
    >
    > std::cout<< "der is at: "<< der
    > << " pod is at: "<< pod<< std::endl;
    > printf( "ints from der %d, %d \n", *der );
    > printf( "ints from pod %d, %d \n", *pod );
    > delete der;
    > }
    > </code>
    >
    > Compiling it for Win32 MS compiler Visual C++ 0.9 (bundled in VS
    > "CString" 2008 Professional)
    > Running it produces something like:
    >
    > der is at: 00356940 pod is at: 00356944
    > ints from der 4290588, 42
    > ints from pod 42, -242263521
    >
    > So there we are with your "Every single implementation that is or ever
    > will be in existence does".


    One of the reasons why one should not reinterpret_cast up or down a class
    hierarchy, but use static_cast which adjusts pointer values appropriately.

    That said, I think the original discussion had as an implicit assumption that
    one would not introduce virtual methods in derived class.

    And in that case the compiler would have to be perverse to start changing the
    layout. I'm not sure but I think that for C++0x the compiler would have to stop
    such practice, if it ever did. I.e., considerations of layout are not inherently
    inappropriate, but one needs to be very careful (like, no virtuals).


    Cheers,

    - Alf

    --
    blog at <url: http://alfps.wordpress.com>
    Alf P. Steinbach /Usenet, May 6, 2011
    #17
  18. Gernot Frisch

    Öö Tiib Guest

    On May 6, 2:25 pm, "Alf P. Steinbach /Usenet" <alf.p.steinbach
    > wrote:
    >
    > That said, I think the original discussion had as an implicit assumption that
    > one would not introduce virtual methods in derived class.


    One should not assume such things implicitly. Exceptional design
    constraints should be always documented or if possible enforced with
    static asserts. Otherwise someone maintains the code and breaks it. As
    a result some printf (possibly called in rare conditions) starts to
    write utter crap or to crash.

    > And in that case the compiler would have to be perverse to start changing the
    > layout. I'm not sure but I think that for C++0x the compiler would have to stop
    > such practice, if it ever did. I.e., considerations of layout are not inherently
    > inappropriate, but one needs to be very careful (like, no virtuals).


    I think that such "undefined behavior" that works predictably, and
    whose predictability is portable, can be documented by the standard. I have no
    real need for it since I avoid using the UB anyway, but someone may
    benefit from it.

    For example if the location of some hidden and undocumented 'vtable
    const&' members is anyway similar on most major implementations then
    why not to document some std::vtable<T> const& as a real implicit
    member of every class that has virtual functions?

    If then also to specify the memory layout of a T& (that is internally
    a T* const anyway on most implementations) then every class would be
    layout compatible as a result.

    Like I said, I don't need to dig in there and would object to someone
    really using such deep internals, but some people (who want to
    implement reflection, for example) might benefit. It does not anyway
    justify the usage of structured objects as arguments for a printf, which
    is a quite awful (confusing and error-prone) practice.
    Öö Tiib, May 6, 2011
    #18
  19. Gernot Frisch

    Goran Guest

    On May 5, 7:27 pm, "Gernot Frisch" <> wrote:
    > Hi,
    >
    > with "MFC" I can do:
    > CString str(_T("test"); printf("%s", str); // prints "test"
    >
    > With my own string class, however, there seems to be a 4 byte "header"
    > before the string data.
    >
    > I have the member >const TCHAR* m_data;< as the first member of my string
    > class.
    >
    > How does MS doe this?


    Luck (well, cheating). First of all, as said, passing non-pod is UB.
    What actually happens is that your string is passed to printf as a
    POD. But CString class is made in such a way that "this" is also a
    pointer to the first character (there's more to CString than a
    pointer, and you don't want to know that ;-).

    So... Your code has an error, but you don't see it. To make code error-
    free, do:

    printf("%s", static_cast<LPCTSTR>(yourstring));

    Or, to avoid casting-induced eyesore, apply some simple DRY:

    inline LPCTSTR chars(const CString& s) { return s; }

    and then

    printf("%s", chars(yourstring));

    Had you compiled for Unicode, your code would not have worked. With
    Unicode on Windows, use _tprintf.

    Finally, the only way to trick printf with your type is the trick MS
    used. DONT DO THAT! ;-)

    Goran.
    Goran, May 6, 2011
    #19
  20. Gernot Frisch

    Balog Pal Guest

    "Marcel Müller" <>
    [...]
    > Furthermore, using this memory layout enables array classes in the same
    > library to simply cast from CString* to const char** by a reinterpret
    > cast, which would necessarily cause an allocation otherwise.
    >
    > So I would not bet that the implementation is done due to the printf
    > compatibility. This is most likely a spin off.


    Makes sense.

    >> The visual compiler painfully lacks similar analiser as gcc's
    >> __attribute__(format) that checks the types and format string
    >> components, my usual tech s to use format helpers consistently. I.e:
    >>
    >> printf("int:%d, long: %ld, str: %s", f_d(i), f_ld(lo), f_s(str));

    >
    > Well, C++ and printf...


    It was just example. With MFC you certainly use CString::Format. That is IME
    superior to most alternatives in most cases I worked with. And the above
    trick covers the practical problems. (Certainly I'd wish to have gcc-like
    compiler support...)

    > There is still no reasonable replacement in the standard. One must be
    > stoned to use the iostream output operators, because besides being type
    > safe they create completely unreadable code, at least if you use different
    > formatting (like hex and decimal) concurrently.


    Yeah. :(
    Balog Pal, May 6, 2011
    #20
