String in programming languages that are based off C

Discussion in 'C Programming' started by janus, Feb 17, 2014.

  1. Right. I meant hard to see them doing the same thing in the given
    context. But then, freed from the context in question, the third
    could also do the same thing.

    Ben Bacarisse, Feb 17, 2014

  2. janus

    Kaz Kylheku Guest

    You mean ts->string or (*ts).string. (We cannot apply the -> operator to an
    identifier, and then in the same scope apply the . operator; it's
    These two could be similar.

    The luaC_newobj function could initialize the "string" pointer inside "ts" to
    be to an area past the "ts" structure, sacrificing a storage location for the
    sake of terser code.

    Furthermore, sacrificing space is not necessary because "ts" could be using the
    famous "C struct hack", such that string is actually an array at the end:

    struct ts_struct {
        /* ... */
        char string[1];
    };

    If N bytes of memory are allocated, where N >= sizeof (struct ts_struct), then
    there are N - offsetof(struct ts_struct, string) bytes available in string[],
    Kaz Kylheku, Feb 18, 2014
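The allocation arithmetic described in this post can be sketched as follows. This is only an illustration under stated assumptions: the `len` member and the `ts_new` helper are hypothetical and are not taken from the Lua source. Writing past the declared bound of `string[1]` is widely supported in practice, but the C standard does not strictly guarantee it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct for illustration; not the Lua definition. */
struct ts_struct {
    size_t len;
    char string[1];   /* "struct hack": really the start of a longer array */
};

/* Hypothetical helper: allocate a ts_struct holding len characters
   plus a terminating null. Returns NULL if allocation fails. */
struct ts_struct *ts_new(const char *src, size_t len)
{
    struct ts_struct *ts;
    size_t total = offsetof(struct ts_struct, string) + len + 1;

    assert(src != NULL);
    if (total < sizeof(struct ts_struct))
        total = sizeof(struct ts_struct);   /* keep N >= sizeof (struct ts_struct) */

    ts = malloc(total);
    if (ts == NULL)
        return NULL;

    ts->len = len;
    memcpy(ts->string, src, len);   /* uses the bytes available in string[] */
    ts->string[len] = '\0';
    return ts;
}
```

A caller would use `ts_new("hello", 5)` and release the whole object, header and characters together, with a single `free`.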

  3. It's not designed for stand alone programs, but for adding scripting to video
    games implemented in C/C++.
    Malcolm McLean, Feb 18, 2014
  4. Memory games.
    ts is a pointer to some sort of internal structure used by luaC_newobj.
    The space behind this structure is being used to hold a string. It's rather
    a dangerous thing to do, and usually indicates poor design, which is why C
    makes the syntax a bit tricky. However it probably isn't a bug - the Lua
    system likely knows that that space is not used for anything else and is big
    enough to hold the biggest legal string.
    Malcolm McLean, Feb 18, 2014
  5. What's dangerous about it?
    So why did C99 provide a specific syntax to simplify doing this?

    Ben Bacarisse, Feb 18, 2014
  6. janus

    Kaz Kylheku Guest

    I wouldn't assume any such thing from the above statement, but rather interpret
    the statement as being about the language dialect only, not about the use of

    Of course Lua programs can have bindings to APIs that are not in the
    Library section of 1989 ANSI C.

    Presumably, it has a core that doesn't fail to build if some of these are
    Kaz Kylheku, Feb 18, 2014
  7. janus

    janus Guest

    Check out this link for the code,
    janus, Feb 18, 2014
  8. janus

    janus Guest

    My bad, was thinking of ts->string and not ts.string
    janus, Feb 18, 2014
  9. [131 double-spaced lines deleted]
    Please use a real newsreader to post here rather than the horribly
    broken Google Groups web interface. GG, for some reason, likes to
    double-space, and sometimes quadruple-space, quoted text. Articles
    should have actual line breaks to keep them below 80 columns, preferably
    about 72 columns. I use as my news server
    (it's free) and Gnus, which runs under Emacs, as my newsreader;
    Thunderbird also includes a decent newsreader.

    If you must use GG, please copy-and-paste your article into a decent text
    editor, edit out the added blank lines, and trim the quoted text down to just
    what's necessary for your followup to make sense; you don't need to quote all
    of a 100+-line article to add a one-line reply. (But do keep some context.)
    Keith Thompson, Feb 18, 2014
  10. janus

    James Kuyper Guest

    On 02/18/2014 01:12 PM, janus wrote:
    That's a great improvement. First of all, it includes the code that you
    gave in your first message, code which was entirely missing from more
    complete code that you showed in your second message. That code now
    appears in context, and that context confirms Ben Bacarisse's
    explanation of the use of ts+1, which I misunderstood. That code
    over-allocates space for the struct, and ts+1 points to the first byte
    of excess space, which is where the actual contents of the string are stored.

    Now, I'm finally prepared to answer your original question. The approach
    used in the actual Lua code has one key advantage: it has defined
    behavior even when using C90.
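The over-allocation technique described above can be sketched roughly as follows. The `Header` type, the `header_str` macro, and the `header_new` function are hypothetical names chosen for illustration, and the dummy union members are only an assumption about how a type like L_Umaxalign might be built; none of this is the actual Lua code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header type standing in for Lua's TString. */
typedef union Header {
    double dummy_d;   /* assumed alignment members, in the spirit of L_Umaxalign */
    void *dummy_p;
    long dummy_l;
    struct {
        unsigned int hash;
        size_t len;   /* number of characters in string */
    } tsv;
} Header;

/* The characters live in the excess space immediately after the header. */
#define header_str(h) ((char *)((h) + 1))

/* Hypothetical helper: over-allocate so that h + 1 points at len + 1
   usable bytes. Returns NULL if allocation fails. */
Header *header_new(const char *src, size_t len)
{
    Header *h;

    assert(src != NULL);
    h = malloc(sizeof(Header) + len + 1);
    if (h == NULL)
        return NULL;

    h->tsv.hash = 0;
    h->tsv.len = len;
    memcpy(header_str(h), src, len);
    header_str(h)[len] = '\0';   /* ending 0 */
    return h;
}
```

Because the string is reached through pointer arithmetic on the allocated block rather than through an undersized array member, no declared array bound is ever exceeded.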

    However, since I don't believe in catering to old standards (C99 is
    already 14 years old), I would favor taking advantage of C99's concept
    of flexible array members. Your suggested alternative code:

    ts = &luaC_newobj(L, LUA_TSTRING, totalsize, list, 0)->ts;
    ts->tsv.len = l;
    ts->tsv.hash = h;
    ts->tsv.reserved = 0;
    memcpy(ts.string, str, l*sizeof(char));
    ts.string[l] = '\0'; /* ending 0 */

    is inherently wrong because of the use of ts.string instead of ts->string.
    Even with that correction, it's still wrong, given that "ts" has the
    type TString*, and TString is a union type, not a struct type. However,
    if TString were modified as follows:

    typedef union TString {
        L_Umaxalign dummy; /* ensures maximum alignment for strings */
        struct {
            lu_byte reserved;
            unsigned int hash;
            size_t len; /* number of characters in string */
            char string[];
        } tsv;
    } TString;

    then your code would be correct if you replaced ts.string with
    ts->tsv.string, and I would strongly favor using that approach instead
    of the one used in the actual Lua implementation.
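For comparison, here is a minimal sketch of allocating and filling an object with a C99 flexible array member. It uses a plain struct rather than the union discussed above, and the names `FlexString` and `flex_new` are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration (requires C99 or later). */
typedef struct FlexString {
    unsigned int hash;
    size_t len;        /* number of characters in string */
    char string[];     /* flexible array member: adds no elements to sizeof */
} FlexString;

/* Hypothetical helper: allocate room for the struct plus len characters
   and a terminating null. Returns NULL if allocation fails. */
FlexString *flex_new(const char *src, size_t len)
{
    FlexString *fs;

    assert(src != NULL);
    fs = malloc(sizeof *fs + len + 1);
    if (fs == NULL)
        return NULL;

    fs->hash = 0;
    fs->len = len;
    memcpy(fs->string, src, len);   /* named member, no cast needed */
    fs->string[len] = '\0';         /* ending 0 */
    return fs;
}
```

The string is accessed through a named member of type `char[]`, so the compiler checks the type of every access instead of relying on a cast.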

    However, with the original code, both ts and ts+1 are guaranteed (by
    whoever typedefs L_Umaxalign - the C standard guarantees no such thing)
    to be maximally aligned. With the above modification, ts is guaranteed
    to be maximally aligned, but ts->tsv.string is not. The phrase "ensures
    maximum alignment for strings" is ambiguous - I'm not sure if it would
    be considered to apply to ts->tsv.string, or only to ts.

    If use of C2011 were permitted, there would be no need for "dummy", and
    therefore no need for a union (making things a bit simpler), in order to
    ensure that ts->string was maximally aligned:

    #include <stddef.h> // for max_align_t

    typedef struct TString {
        lu_byte reserved;
        unsigned int hash;
        size_t len; /* number of characters in string */
        _Alignas(max_align_t) char string[];
    } TString;
    James Kuyper, Feb 18, 2014
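Whether an alignment specifier on a flexible array member has the intended effect can be checked at compile time. The following is a sketch assuming a C11 compiler; the names `AlignedString` and `string_offset_is_max_aligned` are hypothetical, and `unsigned char` stands in for `lu_byte`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* max_align_t, offsetof, size_t */

/* Hypothetical type for illustration (requires C11 or later). */
typedef struct AlignedString {
    unsigned char reserved;   /* stand-in for lu_byte */
    unsigned int hash;
    size_t len;               /* number of characters in string */
    _Alignas(max_align_t) char string[];
} AlignedString;

/* Compile-time check: the string starts at a maximally aligned offset. */
static_assert(offsetof(AlignedString, string) % _Alignof(max_align_t) == 0,
              "string is not maximally aligned");

/* The same check as a run-time predicate; returns nonzero on success. */
int string_offset_is_max_aligned(void)
{
    return offsetof(AlignedString, string) % _Alignof(max_align_t) == 0;
}
```

Since `malloc` returns storage suitably aligned for any object type, a maximally aligned offset within the struct means the string itself is maximally aligned in any object allocated that way.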
  11. janus

    Kaz Kylheku Guest

    So you think that the ages old, reliable array-[1]-at-the-end-of-a-struct hack
    suddenly does not work in C99 compilers when they are operated in C90 mode?
    Kaz Kylheku, Feb 18, 2014
  12. [...]

    I don't believe that's an option in this case. The Lua
    implementation apparently is very carefully written to conform to
    the C89/C90 standard (and also to compile as C++) to maximize the
    number of compilers that can be used to compile it. It's optimized
    for portability over modernity.

    Keith Thompson, Feb 18, 2014
  13. You're writing to a block of memory in an uncontrolled way.
    Buffers (reserved areas of memory) are a valid concept in C, but they should
    normally be of only one type of object. Otherwise it's tempting to say
    "this buffer can hold a hundred size_ts or two hundred sint16s".
    Let's say that the struct has a member added or subtracted. Will this break the
    code? How would you find out? Let's say we move to wchar_t for our strings.
    Will adding a byte member to the struct break the code now? How would you find out?

    There are answers, of course. Sometimes you have to do these things. But
    often it's a sign of bad programming, micro-optimisation which impacts the
    maintainability of the code. Most IT projects don't fail because the
    program fragments memory too much. They fail because the interactions between
    the various components get too complicated for the programmers to understand,
    additional development causes unexpected bugs elsewhere, and becomes too
    expensive and error-prone to be viable.
    Malcolm McLean, Feb 18, 2014
  14. (snip, someone wrote)
    If you write and read back in the same program, then there should
    be no problem. But yes, if you want to read on a different system,
    where there might be different size or byte order, then it is
    a problem.

    In the days of smaller computers, it used to be much more common
    to write out temporary files and read them back again.

    -- glen
    glen herrmannsfeldt, Feb 18, 2014
  15. Maybe this is a difference in the use of the term dangerous. Some data
    structures need more care than others, but I don't call it dangerous.
    Sure, but (a) I think this is a case where you want to do this sort of
    thing, and (b) I don't think using the space beyond the declared members
    is, itself, a source of complexity. The way it is accessed might
    be (and there I think a flexible array member is a great help) but the
    alternative -- usually to allocate a separate area -- also adds some
    complexity to the code.
    Ben Bacarisse, Feb 19, 2014
  16. janus

    James Kuyper Guest

    It might not be an option for Lua code; but janus was asking about the
    reasons for their approach. If he ever needs to do something like this
    in a context where compatibility with C90 is not an issue, he should
    consider the benefits of using the flexible array member approach.
    Instead of accessing the array through a cast that is not type-safe, he
    can access it through a named struct member of a specific declared type,
    which seems much safer to me.
    James Kuyper, Feb 19, 2014
