Text processing

Discussion in 'C Programming' started by jacob navia, Sep 26, 2011.

  1. jacob navia

    jacob navia Guest

    Following the series about commenting code, here is an installment about
    text processing.

    What do you think?

    Thanks for your attention.

    -----------------------------------------------------------------------
    Text manipulation

    Text files are a widely used format for storing data. They are usually
    quite compact (they carry no formatting instructions such as bold,
    italics, or font changes) and they are highly portable if restricted to
    the ASCII character set.

    A common use of text files is storing programs. Most programming
    languages (and C is no exception here) store the program in text
    format.

    So let's look at a simple text-manipulation program. The task at hand
    is to prepare a C program so that its messages can be translated into
    several languages. Obviously, the character string:

    "Please enter the file name"

    will not be readily comprehensible to a Spanish user. It would be
    better if, in Spain, the program displayed the character string:

    "Entre el nombre del fichero por favor"

    To prepare this translation, we need to extract all character
    strings from the program text and store them in a table.
    Instead of referring to a character string directly, the program
    will use an index into that table. In the above example the
    character string would be replaced by

    StringTable[6]

    To perform this transformation, we write the following declaration
    on the first line of the program:

    static char *StringTable[];

    Then, in each line where a character string appears we will
    replace it with an index into the string table.

    printf("Please enter the file name");

    will become

    printf(StringTable[x]);

    where "x" will be the index for that string in our table.

    At the end of the file we will append the definition of our
    string table with:

    static char *StringTable[] = {
    ...,
    ...,
    "Please enter the file name",
    ...,

    NULL
    };
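For concreteness, here is one way a transformed file could look. It is a sketch under my own assumptions, not the output of the program below: as the discussion later in this thread points out, the unsized static char *StringTable[]; on the first line is a tentative definition with an incomplete type, which the standard disallows. This sketch therefore declares a hypothetical accessor, GetString, at the top of the file and puts the table inside its definition at the end. FileNamePrompt is likewise a hypothetical stand-in for code that used the literal.

```c
#include <stddef.h>

/* Declared at the top of the transformed file. A function prototype,
   unlike an unsized static array, is complete as written. */
static const char *GetString(size_t index);

/* Hypothetical original code:  return "Please enter the file name";
   After the transformation the literal became index 0 of the table. */
const char *FileNamePrompt(void)
{
    return GetString(0);
}

/* Appended at the end of the file by the extraction tool.
   The NULL terminator mirrors the table layout described above. */
static const char *GetString(size_t index)
{
    static const char *const StringTable[] = {
        "Please enter the file name",
        NULL
    };
    const size_t count = sizeof StringTable / sizeof StringTable[0] - 1;

    return index < count ? StringTable[index] : NULL;
}
```

A caller that prints the result is better off writing printf("%s", FileNamePrompt()) than passing the table entry as the format, so that a translation containing a % sign is printed literally.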

    After some hours of work, we come up with the following solution. We
    test it a bit, and it seems to work.

    #include <stdio.h>
    #include <stdlib.h>
    #include <strings.h>

    // Reads a single character constant
    static int ReadCharConstant(FILE *infile)
    {
        int c;
        c = fgetc(infile);
        putchar('\'');
        while (c != EOF && c != '\'') {
            putchar(c);
            if (c == '\\') {
                c = fgetc(infile);
                if (c == EOF)
                    return EOF;
                putchar(c);
            }
            c = fgetc(infile);
        }
        if (c != EOF) {
            putchar(c);
            c = fgetc(infile);
        }
        return c;
    }

    // Copies a /* ... */ comment to the output unchanged
    static int ReadLongComment(FILE *infile)
    {
        int c;
        putchar('/');
        putchar('*');
        c = fgetc(infile);

        do {

            while (c != '*' && c != EOF) {
                putchar(c);
                c = fgetc(infile);
            }
            if (c == '*') {
                putchar(c);
                c = fgetc(infile);
            }
        } while (c != '/' && c != EOF); /* Problem 2 */
        if (c == '/') {
            putchar(c);
            c = fgetc(infile); // consume the closing '/' so it is not read again
        }
        return c;
    }

    // Copies a // comment to the output, stopping before the newline
    static int ReadLineComment(FILE *infile)
    {
        int c = fgetc(infile);

        putchar('/'); putchar('/');
        while (c != EOF && c != '\n') {
            putchar(c);
            c = fgetc(infile);
        }
        return c;
    }
    static char *stringBuffer;
    static char *stringBufferPointer;
    static char *stringBufferEnd;
    static size_t stringBufferSize;
    static unsigned stringCount;

    #define BUFFER_SIZE 1024

    // Writes the collected strings as the definition of the table
    static void OutputStrings(void)
    {
        char *p = stringBuffer;
        printf("\nstatic char *StringTable[]={\n");
        while (*p) {
            printf("\t\"%s\",\n",p);
            p += strlen(p)+1;
        }
        printf("\tNULL\n};\n");
        free(stringBuffer);
        stringBuffer = NULL;
    }
    // Appends one character to the string buffer, growing it if needed
    static void PutCharInBuffer(int c)
    {
        if (stringBufferPointer == stringBufferEnd) {
            size_t newSize = stringBufferSize + BUFFER_SIZE;
            char *tmp = realloc(stringBuffer,newSize);
            if (tmp == NULL) {
                fprintf(stderr,"Memory exhausted\n");
                exit(EXIT_FAILURE);
            }
            stringBuffer = tmp;
            stringBufferPointer = tmp+stringBufferSize;
            stringBufferSize += BUFFER_SIZE;
            stringBufferEnd = tmp + stringBufferSize;
        }
        *stringBufferPointer++ = c;
    }

    // Stores the contents of a string literal and prints its index instead
    static int ReadString(FILE *infile)
    {
        int c;
        if (stringBuffer == NULL) {
            stringBuffer = malloc(BUFFER_SIZE);
            if (stringBuffer == NULL)
                return EOF;
            stringBufferPointer = stringBuffer;
            stringBufferEnd = stringBufferPointer+BUFFER_SIZE;
            stringBufferSize = BUFFER_SIZE;
        }
        c = fgetc(infile);
        while (c != EOF && c != '"') {
            PutCharInBuffer(c);
            if (c == '\\') {
                c = fgetc(infile);
                if (c != '\n')
                    PutCharInBuffer(c);
            }
            c = fgetc(infile);
        }
        if (c == EOF)
            return EOF;
        PutCharInBuffer(0);
        printf("StringTable[%u]",stringCount);
        stringCount++;
        return fgetc(infile);
    }

    // Handles one character, dispatching on the tokens we care about
    static int ProcessChar(int c,FILE *infile)
    {
        switch (c) {
        case '\'':
            c = ReadCharConstant(infile);
            break;
        case '"':
            c = ReadString(infile);
            break;
        case '/':
            c = fgetc(infile);
            if (c == '*')
                c = ReadLongComment(infile);
            else if (c == '/')
                c = ReadLineComment(infile);
            else
                putchar('/'); // a plain '/': c is returned unconsumed
            break;
        case '#':
            // copy the directive verbatim, including its newline
            while (c != EOF && c != '\n') {
                putchar(c);
                c = fgetc(infile);
            }
            if (c == '\n') {
                putchar(c);
                c = fgetc(infile);
            }
            break;
        default:
            putchar(c);
            c = fgetc(infile);
            break;
        }
        return c;
    }
    int main(int argc,char *argv[])
    {
        FILE *infile;

        if (argc < 2) {
            fprintf(stderr,"Usage: strings <file name>\n");
            return EXIT_FAILURE;
        }
        if (!strcmp(argv[1],"-")) {
            infile = stdin;
        } else {
            infile = fopen(argv[1],"r");
            if (infile == NULL) {
                fprintf(stderr,"Can't open %s\n",argv[1]);
                return EXIT_FAILURE;
            }
        }
        int c = fgetc(infile);
        printf("static char *StringTable[];\n");
        while (c != EOF) {
            c = ProcessChar(c,infile);
        }
        PutCharInBuffer(0);
        PutCharInBuffer(0);
        OutputStrings();
        if (infile != stdin)
            fclose(infile);
        return EXIT_SUCCESS;
    }


    The general structure of this program is simple. We:
    o open the given file to process;
    o process each character;
    o pay attention only to the following tokens:
        Char constants
        Comments
        Character strings
        Preprocessor directives

    Why those?
    Char constants can contain double quotes, which would lead the rest of
    our program to see strings where there are none. For instance:

    case'"':

    would be misunderstood as the start of a never-ending string.

    Comments must be recognized because strings inside comments should not
    be processed. Preprocessor directives should be ignored because we do
    NOT want to translate

    #include "myfile.h"
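To see why those four tokens matter, here is a minimal sketch (an illustration of my own, not part of the program above) that counts the string literals in C source held in memory. CountStringLiterals is a hypothetical name. Like the program, it skips character constants, comments and preprocessor lines; it ignores wide strings, line splicing and trigraphs.

```c
#include <stddef.h>

/* Counts string literals in C source text held in memory.
   Char constants, comments and preprocessor lines are skipped so that
   quotes inside them are not mistaken for the start of a string. */
size_t CountStringLiterals(const char *p)
{
    size_t count = 0;
    int atLineStart = 1;          /* only white space seen on this line */

    while (*p) {
        if (atLineStart && *p == '#') {          /* preprocessor line */
            while (*p && *p != '\n')
                p++;
            continue;
        }
        if (*p == '\'' || *p == '"') {           /* char constant or string */
            char quote = *p++;
            while (*p && *p != quote) {
                if (*p == '\\' && p[1])
                    p++;                         /* skip the escaped char */
                p++;
            }
            if (*p)
                p++;                             /* closing quote */
            if (quote == '"')
                count++;
            atLineStart = 0;
            continue;
        }
        if (p[0] == '/' && p[1] == '*') {        /* block comment */
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p)
                p += 2;
            continue;
        }
        if (p[0] == '/' && p[1] == '/') {        /* line comment */
            while (*p && *p != '\n')
                p++;
            continue;
        }
        if (*p == '\n')
            atLineStart = 1;
        else if (*p != ' ' && *p != '\t')
            atLineStart = 0;
        p++;
    }
    return count;
}
```

Without the char-constant rule, the double quote in case '"': would be taken as the start of a string; without the preprocessor rule, "myfile.h" would be counted as a message to translate.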

    Our string parsing routine stores the contents of each string in a buffer
    that is grown as needed, printing to standard output only the

    StringTable[x]

    instead of the stored string. Each string is terminated by a zero, and
    after the last string we store additional zeros to mark the end of
    the buffer.
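Ben Bacarisse points out later in the thread that empty strings break this layout: an empty literal stores its terminating zero right after the previous one, and OutputStrings, which stops at the first empty string, ends the table early. A minimal sketch of one fix, assuming the caller passes the number of strings stored (the program already counts them in stringCount), walks the buffer by count instead of looking for a sentinel. WriteStringTable is a hypothetical name, and it writes to a FILE * rather than straight to stdout.

```c
#include <stdio.h>
#include <string.h>

/* Writes the string table from a buffer of consecutive NUL-terminated
   strings. Walking by count, rather than stopping at the first empty
   string, keeps "" entries in the table. Like the original, the text
   is printed as stored: escape sequences were copied verbatim. */
void WriteStringTable(FILE *out, const char *buffer, unsigned count)
{
    const char *p = buffer;

    fprintf(out, "\nstatic char *StringTable[]={\n");
    for (unsigned i = 0; i < count; i++) {
        fprintf(out, "\t\"%s\",\n", p);
        p += strlen(p) + 1;      /* step over the string and its zero */
    }
    fprintf(out, "\tNULL\n};\n");
}
```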

    After the whole file is processed, we write the contents of the buffer
    to the output (standard output), and that is it: we have extracted
    the strings into a table.

    Analysis
    -------
    Our program seems to work, but there are several corner cases that
    it doesn't handle at all.

    For instance it is legal in C to write:

    "String1" "String2"

    and this will be understood as

    "String1String2"

    by the compiler. Our translation turns this into:

    StringTable[0] StringTable[1]

    which is a syntax error.

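The rule the converter would have to reproduce is easy to check directly: adjacent string literals are joined in translation phase 6, before the compiler proper sees them. A small sketch; the array names and LiteralsAreConcatenated are hypothetical.

```c
#include <string.h>

/* Adjacent string literals are concatenated during translation,
   so these two definitions produce identical arrays. */
static const char joined[]   = "String1" "String2";
static const char combined[] = "String1String2";

/* Returns nonzero if both arrays have the same size and contents. */
int LiteralsAreConcatenated(void)
{
    return sizeof joined == sizeof combined
        && strcmp(joined, combined) == 0;
}
```

A converter that handles this case therefore has to treat a run of literals separated only by white space or comments as a single table entry.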
    Another weak point is that a string can appear several times in our
    table, since we do not check whether it is already present before
    storing it.

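A minimal sketch of duplicate suppression, assuming a simple linear search over the strings seen so far. StringIndex and its variables are hypothetical and not part of the program above. A hash table would scale better, and, as BartC notes further down, identical strings may still need different translations depending on context.

```c
#include <stdlib.h>
#include <string.h>

/* Strings stored so far (hypothetical globals, in the program's style). */
static char **seen;
static size_t seenCount, seenCapacity;

/* Returns the existing index of s if it was stored before; otherwise
   stores a copy and returns its new index, or -1 if memory runs out. */
long StringIndex(const char *s)
{
    for (size_t i = 0; i < seenCount; i++)
        if (strcmp(seen[i], s) == 0)
            return (long)i;

    if (seenCount == seenCapacity) {
        size_t newCap = seenCapacity ? 2 * seenCapacity : 16;
        char **tmp = realloc(seen, newCap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        seen = tmp;
        seenCapacity = newCap;
    }
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, s);
    seen[seenCount] = copy;
    return (long)seenCount++;
}
```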
    And there are many corner cases that are simply ignored. For instance,
    you can continue a single-line comment with a backslash: a very bad idea,
    of course, but a legal one. We do not follow comments like these:

    // This is a comment \
    and this line is a comment too
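This is legal because line splicing (translation phase 2) happens before comments are removed (phase 3), so the backslash joins the next line to the comment. A small sketch shows the effect; SplicedCommentResult is a hypothetical name, and gcc, for one, warns about such a multi-line // comment.

```c
/* The backslash at the end of the // comment below splices the next
   line onto the comment, so the assignment x = 2 is never compiled. */
int SplicedCommentResult(void)
{
    int x = 1;
    // This is a comment \
    x = 2;
    return x;
}
```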

    And (given how little testing it has had) there could be many hidden bugs in it.

    But this is meant to be a simple utility that quickly extracts the strings
    from a file without too much manual work. We know we do not use the
    features it doesn't support, and it will serve our purposes well.

    What is important to know is that there always comes a point where we stop
    developing and decide to move on to something else, either because we get
    fed up or because our boss tells us to do xxx instead of continuing the
    development of an internal utility.

    In this case we stop the first round of development here. See the
    exercises for the many ways in which we could improve this simple program.

    Exercises:

    1: This filter can read from stdin and writes to stdout. Add a command-line
    option to specify the name of an output file. How many changes would you
    need to make in the code to implement that?

    2: The program can store a string several times. What would be needed to
    avoid that? What data structure would you recommend?

    3: Implement the concatenation of strings, i.e.

    "String1" "String2" --> "String1String2"

    4: Code such as

    printf(StringTable[21]);

    is not very easy to follow. Change the program so that the output
    instead contains:

    // StringTable[21]--> "Please enter the file name"
    printf(StringTable[21]);

    i.e. each line would be preceded by one or more comment lines that
    describe the strings being used.

    5: Add an option so that the name of the string table can be changed from
    "StringTable" to some other name. The reason is that a user complained
    that the "new" string table broke her program: she had a "StringTable"
    variable in her program!
    How could you make this change automatically?

    6: The program needs to be part of an IDE where the IDE will need to
    call the program as a routine (not as an independent program).
    What would be needed to do that? What do you think about the global
    variables used in the original program?
     
    jacob navia, Sep 26, 2011
    #1

  2. In article <j5pk64$vsu$>,
    jacob navia <> wrote:

    > A widely used application of text files are program files. Most
    > programming languages (and here C is not an exception) store the
    > program in text format.


    I'm not sure who your target audience is, but I find this to be less
    precise than I believe it should be. I think most people, when they
    think of programs, are thinking of the executable binaries, not the
    source code. While some languages, like BASIC, and other interpreted
    languages do indeed save their programs as text files, C and most others
    do not.
     
    Mark Storkamp, Sep 26, 2011
    #2

  3. On Mon, 26 Sep 2011 12:29:26 +0200, jacob navia wrote:

    > Following the series about commenting code, here is an installment about
    > text processing.
    >
    > What do you think?
    >
    > Thanks for your attention.

    <snip>

    Without actually delving into the code, i notice the attempt is pretty much
    useless since many, nay, most languages do limit themselves to the ascii-
    standard. Even for an example program, this is a serious omission if you're
    claiming to do machine translations.

    -------------------------------------------------------------------------------
    _______________________________________
    / Yow! Did something bad happen or am I \
    \ in a drive-in movie?? /
    ---------------------------------------
    \
    \
    ___
    {~._.~}
    ( Y )
    ()~*~()
    (_)-(_)
    -------------------------------------------------------------------------------
     
    Kleuskes & Moos, Sep 26, 2011
    #3
  4. jacob navia

    BartC Guest

    "Mark Storkamp" <> wrote in message
    news:-september.org...
    > In article <j5pk64$vsu$>,
    > jacob navia <> wrote:
    >
    >> A widely used application of text files are program files. Most
    >> programming languages (and here C is not an exception) store the
    >> program in text format.

    >
    > I'm not sure who your target audience is, but I find this to be less
    > precise than I believe it should be. I think most people, when they
    > think of programs, are thinking of the executable binaries, not the
    > source code. While some languages, like BASIC, and other interpreted
    > languages do indeed save their programs as text files, C and most others
    > do not.


    I think he means program source code.

    --
    Bartc
     
    BartC, Sep 26, 2011
    #4
  5. jacob navia

    jacob navia Guest

    Le 26/09/11 16:47, BartC a écrit :
    >
    >
    > "Mark Storkamp" <> wrote in message
    > news:-september.org...
    >> In article <j5pk64$vsu$>,
    >> jacob navia <> wrote:
    >>
    >>> A widely used application of text files are program files. Most
    >>> programming languages (and here C is not an exception) store the
    >>> program in text format.

    >>
    >> I'm not sure who your target audience is, but I find this to be less
    >> precise than I believe it should be. I think most people, when they
    >> think of programs, are thinking of the executable binaries, not the
    >> source code. While some languages, like BASIC, and other interpreted
    >> languages do indeed save their programs as text files, C and most others
    >> do not.

    >
    > I think he means program source code.
    >


    Well, just a cursory reading of the article would tell you
    that I am aiming at the C source code text...
     
    jacob navia, Sep 26, 2011
    #5
  6. jacob navia

    jacob navia Guest

    Le 26/09/11 15:34, Mark Storkamp a écrit :
    > In article<j5pk64$vsu$>,
    > jacob navia<> wrote:
    >
    >> A widely used application of text files are program files. Most
    >> programming languages (and here C is not an exception) store the
    >> program in text format.

    >
    > I'm not sure who your target audience is, but I find this to be less
    > precise than I believe it should be. I think most people, when they
    > think of programs, are thinking of the executable binaries, not the
    > source code. While some languages, like BASIC, and other interpreted
    > languages do indeed save their programs as text files, C and most others
    > do not.


    Yes, I will add "source code" explicitly.

    Thanks
     
    jacob navia, Sep 26, 2011
    #6
  7. jacob navia

    jacob navia Guest

    Le 26/09/11 14:18, Ben Bacarisse a écrit :
    > jacob navia<> writes:
    >
    > <snip>
    >> What do you think?

    > <snip>
    >> 3 #include<strings.h>

    >
    > Typo: string.h
    >
    > <snip>
    >> Analysis
    >> -------
    >> Our program seems to work, but there are several corner cases that
    >> it doesn't handle at all.

    >
    > There are several other than just adjacent string literals:
    >
    > 1. wide strings


    True, they are not supported. I will mention that.

    > 2. strings used in initialisers (two cases: char s[] = "a"; and
    > char *s = "a"; at file scope)


    The second case would work

    char *a = StringTable[12];

    The first not. I will mention that. The solution would be to do this
    only in functions.

    > 3. escaped newlines


    They are supported (modulo bugs...)

    > 4. _Pragma("abc")


    Completely forgot. Thanks


    >
    > It's probably reasonable to ignore trigraphs and digraphs (maybe even
    > _Pragma) but the others are significant, I think.
    >


    You are right

    > You also have two bugs. Empty strings don't work (do you test? that was
    > my second test case) to the extent that they can cause the resulting
    > program to access beyond the end of the string table.


    I will look at it. I missed that case. I tested with all
    the source of the container library...


    > And, second, the
    > output does not compile because the initial
    >
    > static char *StringTable[];
    >
    > is a tentative definition of an object with incomplete type (that's a
    > constraint violation).
    >


    The output WILL compile under gcc/Macintosh OSX.

    What system are you using?

    > From a high-level design point of view, printf strings present a
    > problem. The order in which components are printed may have to change
    > from language to language, but printf's arguments are in a fixed order.
    >
    > <snip>



    Thanks Ben
     
    jacob navia, Sep 26, 2011
    #7
  8. jacob navia

    BartC Guest

    "Ben Bacarisse" <> wrote in message
    news:...
    > jacob navia <> writes:


    >> Our program seems to work, but there are several corner cases that
    >> it doesn't handle at all.

    >
    > There are several other than just adjacent string literals:


    And fopen("file") perhaps. Maybe other cases where you don't want to
    translate.

    > From a high-level design point of view, printf strings present a
    > problem. The order in which components are printed may have to change
    > from language to language, but printf's arguments are in a fixed order.


    That's one problem. My experience is that translation of messages in a
    program is always a lot trickier than it seems at first.
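On the argument-order problem Ben raises: POSIX (though not ISO C) lets a format string refer to its arguments by position with %n$, so a translated format can reorder them without changing the call. A minimal sketch; FormatPair and the two format strings are hypothetical, and a C library that does not implement this POSIX extension will not accept them.

```c
#include <stdio.h>

/* Formats two arguments in an order chosen by the format string.
   Relies on POSIX positional conversions (%1$s, %2$s), which ISO C
   does not define. Both formats consume the same arguments; only the
   order in which they appear in the output differs. */
int FormatPair(char *buf, size_t size, int reversed,
               const char *first, const char *second)
{
    const char *fmt = reversed ? "%2$s, %1$s" : "%1$s, %2$s";

    return snprintf(buf, size, fmt, first, second);
}
```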

    And the approach is simplistic. For example, when the same string occurs in
    several places, is it repeated in the string table? That will be more work
    to translate. But then identical strings can also have different
    translations depending on context. Or abbreviations such as "R", (meaning
    Red), or "s" to be added to make a plural, which in isolation in a table
    will be quite puzzling!

    But I got the impression this is just a programming tutorial. Coders who are
    at this level probably will not have internationalisation of their programs
    as a priority!

    --
    Bartc
     
    BartC, Sep 26, 2011
    #8
  9. jacob navia

    Nobody Guest

    On Mon, 26 Sep 2011 08:34:03 -0500, Mark Storkamp wrote:

    >> A widely used application of text files are program files. Most
    >> programming languages (and here C is not an exception) store the
    >> program in text format.

    >
    > I'm not sure who your target audience is, but I find this to be less
    > precise than I believe it should be. I think most people, when they
    > think of programs, are thinking of the executable binaries, not the
    > source code. While some languages, like BASIC, and other interpreted
    > languages do indeed save their programs as text files, C and most others
    > do not.


    Another counter-example is that many older dialects of BASIC *didn't*
    store their programs as text files, but used a tokenised format for the
    sake of space and performance.
     
    Nobody, Sep 26, 2011
    #9
  10. jacob navia <> writes:

    > Le 26/09/11 14:18, Ben Bacarisse a écrit :
    >> jacob navia<> writes:
    >>
    >> <snip>
    >>> What do you think?

    >> <snip>
    >>> 3 #include<strings.h>

    >>
    >> Typo: string.h
    >>
    >> <snip>
    >>> Analysis
    >>> -------
    >>> Our program seems to work, but there are several corner cases that
    >>> it doesn't handle at all.

    >>
    >> There are several other than just adjacent string literals:
    >>
    >> 1. wide strings

    >
    > True, they are not supported. I will mention that.
    >
    >> 2. strings used in initialisers (two cases: char s[] = "a"; and
    >> char *s = "a"; at file scope)

    >
    > The second case would work
    >
    > char *a = StringTable[12];


    I don't think so. Does it not contravene 6.6 p9?

    > The first not. I will mention that. The solution would be to do this
    > only in functions.


    That's a solution for the second case not the first -- and you don't
    think there's a problem with it.
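To illustrate the 6.6p9 point: an initializer for an object with static storage duration must be a constant expression (C99 6.7.8p4), and reading a table element is not one, although taking the address of an element is allowed (an address constant). A minimal sketch with hypothetical names; code that used a would then have to read through aSlot.

```c
#include <stddef.h>

static const char *const StringTable[] = { "first", "second", NULL };

/* Not allowed at file scope: reading StringTable[1] is not a constant
   expression, and static initializers must be constant.
   static const char *a = StringTable[1];                              */

/* Allowed: &StringTable[1] is an address constant, since [] and & may
   be used as long as no object's value is accessed. */
static const char *const *const aSlot = &StringTable[1];

/* Hypothetical replacement for uses of a. */
const char *ValueOfA(void)
{
    return *aSlot;
}
```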

    >> 3. escaped newlines

    >
    > They are supported (modulo bugs...)


    OK, then I found bugs in every case I tried. Maybe we are talking at
    cross-purposes. I mean a \ at the end of a source line.

    >> 4. _Pragma("abc")

    >
    > Completely forgot. Thanks
    >
    >>
    >> It's probably reasonable to ignore trigraphs and digraphs (maybe even
    >> _Pragma) but the others are significant, I think.
    >>

    >
    > You are right
    >
    >> You also have two bugs. Empty strings don't work (do you test? that was
    >> my second test case) to the extent that they can cause the resulting
    >> program to access beyond the end of the string table.

    >
    > I will look at it. I missed that case. I tested with all
    > the source of the container library...
    >
    >
    >> And, second, the
    >> output does not compile because the initial
    >>
    >> static char *StringTable[];
    >>
    >> is a tentative definition of an object with incomplete type (that's a
    >> constraint violation).
    >>

    >
    > The output WILL compile under gcc/Macintosh OSX.
    >
    > What system are you using?


    gcc 4.5.2 but I am not sure that really matters. Is what I said not
    right about it being a tentative definition of an object with incomplete
    type?

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Sep 26, 2011
    #10
  11. jacob navia

    Ben Pfaff Guest

    jacob navia <> writes:

    > Following the series about commenting code, here is an installment about
    > text processing.
    >
    > What do you think?


    There is existing software that does a pretty good job with
    string translations (e.g. GNU gettext). I don't know whether you
    are actually writing new software that also handles this, or
    whether you are just pointing out a way that it can be done to
    people new to the topic. If it's the former, it seems somewhat
    wasteful (what's inadequate about current attempts?); if it's the
    latter, that makes some sense to me.
    --
    Ben Pfaff
    http://benpfaff.org
     
    Ben Pfaff, Sep 26, 2011
    #11
  12. jacob navia

    jacob navia Guest

    Le 26/09/11 20:11, Ben Pfaff a écrit :
    > jacob navia<> writes:
    >
    >> Following the series about commenting code, here is an installment about
    >> text processing.
    >>
    >> What do you think?

    >
    > There is existing software that does a pretty good job with
    > string translations (e.g. GNU gettext). I don't know whether you
    > are actually writing new software that also handles this, or
    > whether you are just pointing out a way that it can be done to
    > people new to the topic. If it's the former, it seems somewhat
    > wasteful (what's inadequate about current attempts?); if it's the
    > latter, that makes some sense to me.


    In the context of my IDE, I have written better software than that.
    The objective here is to present simple code that does some
    filtering, and to discuss its problems and ways to improve it on the
    basis of actual code.

    This improves the S/N ratio of this group.
     
    jacob navia, Sep 26, 2011
    #12
  13. On Sep 26, 2:18 pm, Ben Bacarisse <> wrote:
    > And, second, the
    > output does not compile because the initial
    >
    >   static char *StringTable[];
    >
    > is a tentative definition of an object with incomplete type (that's a
    > constraint violation).


    6.9.2p3 says the declared type of a tentative definition with internal
    linkage must not be an incomplete type, but it isn't a constraint,
    which matters because it means compilers are not required to issue any
    diagnostics. And I wonder if that is really meant to apply to the
    declared type, rather than the composite type for the final implicit
    definition mentioned in p2. Compilers are already required to accept
    "int array[]; int array[20] = {1};" -- without the static keyword --
    and they would surely need to treat the static keyword specially to
    reject it if present.
     
    Harald van Dijk, Sep 26, 2011
    #13
  14. Kleuskes & Moos <> writes:
    [...]
    > Without actually delving into the code, i notice the attempt is pretty much
    > useless since many, nay, most languages do limit themselves to the ascii-
    > standard.

    [...]

    I think you accidentally a word there.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Sep 26, 2011
    #14
  15. On Sep 26, 9:36 pm, Harald van Dijk <> wrote:
    > On Sep 26, 2:18 pm, Ben Bacarisse <> wrote:
    >
    > > And, second, the
    > > output does not compile because the initial

    >
    > >   static char *StringTable[];

    >
    > > is a tentative definition of an object with incomplete type (that's a
    > > constraint violation).

    >
    > 6.9.2p3 says the declared type of a tentative definition with internal
    > linkage must not be an incomplete type, but it isn't a constraint,
    > which matters because it means compilers are not required to issue any
    > diagnostics. And I wonder if that is really meant to apply to the
    > declared type, rather than the composite type for the final implicit
    > definition mentioned in p2. Compilers are already required to accept
    > "int array[]; int array[20] = {1};" -- without the static keyword --
    > and they would surely need to treat the static keyword specially to
    > reject it if present.


    I feel I should add that the standard does clearly and unambiguously
    disallow this, so regardless of the intent, programs should avoid this
    construct.
     
    Harald van Dijk, Sep 26, 2011
    #15
  16. jacob navia

    jacob navia Guest

    Le 26/09/11 22:33, Harald van Dijk a écrit :
    > On Sep 26, 9:36 pm, Harald van Dijk<> wrote:
    >> On Sep 26, 2:18 pm, Ben Bacarisse<> wrote:
    >>
    >>> And, second, the
    >>> output does not compile because the initial

    >>
    >>> static char *StringTable[];

    >>
    >>> is a tentative definition of an object with incomplete type (that's a
    >>> constraint violation).

    >>
    >> 6.9.2p3 says the declared type of a tentative definition with internal
    >> linkage must not be an incomplete type, but it isn't a constraint,
    >> which matters because it means compilers are not required to issue any
    >> diagnostics. And I wonder if that is really meant to apply to the
    >> declared type, rather than the composite type for the final implicit
    >> definition mentioned in p2. Compilers are already required to accept
    >> "int array[]; int array[20] = {1};" -- without the static keyword --
    >> and they would surely need to treat the static keyword specially to
    >> reject it if present.

    >
    > I feel I should add that the standard does clearly and unambiguously
    > disallow this, so regardless of the intent, programs should avoid this
    > construct.


    You are right.

    One way of getting rid of the problem is to drop the static keyword,
    but that would surely mean a link error when the table is used in
    several files.

    Since I write to stdout, I cannot rewind the output to add the size of
    the table once I know how many strings there are.

    Mmm, there must be a solution, but it doesn't come to me immediately. I
    think I have to sleep; I had a hell of a day at work today.

    jacob
     
    jacob navia, Sep 26, 2011
    #16
  17. Harald van Dijk <> writes:

    > On Sep 26, 2:18 pm, Ben Bacarisse <> wrote:
    >> And, second, the
    >> output does not compile because the initial
    >>
    >>   static char *StringTable[];
    >>
    >> is a tentative definition of an object with incomplete type (that's a
    >> constraint violation).

    >
    > 6.9.2p3 says the declared type of a tentative definition with internal
    > linkage must not be an incomplete type, but it isn't a constraint,
    > which matters because it means compilers are not required to issue any
    > diagnostics. And I wonder if that is really meant to apply to the
    > declared type, rather than the composite type for the final implicit
    > definition mentioned in p2. Compilers are already required to accept
    > "int array[]; int array[20] = {1};" -- without the static keyword --
    > and they would surely need to treat the static keyword specially to
    > reject it if present.


    Yes, it's very odd. I assume there is some advantage in knowing the
    complete type of objects with internal linkage at the get go. I can't
    think of one, though.

    However, I don't think 6.9.2p3 makes much sense if the final composite
    type is assumed to be the intended meaning, because I don't think the
    final composite type *can* be incomplete? For example, a translation
    unit with nothing other than

    int array[];

    is fine and causes the type of array to be int [1] by the end.

    --
    Ben.
     
    Ben Bacarisse, Sep 26, 2011
    #17
  18. On Sep 26, 11:58 pm, Ben Bacarisse <> wrote:
    > However, I don't think 6.9.2p3 makes much sense if the final composite
    > type is assumed to be the intended meaning, because I don't think the
    > final composite type *can* be incomplete?  For example, a translation
    > unit with nothing other than
    >
    >   int array[];
    >
    > is fine and causes the type of array to be int [1] by the end.


    I was thinking of incomplete structure and union types.
     
    Harald van Dijk, Sep 26, 2011
    #18
  19. Harald van Dijk <> writes:

    > On Sep 26, 11:58 pm, Ben Bacarisse <> wrote:
    >> However, I don't think 6.9.2p3 makes much sense if the final composite
    >> type is assumed to be the intended meaning, because I don't think the
    >> final composite type *can* be incomplete?  For example, a translation
    >> unit with nothing other than
    >>
    >>   int array[];
    >>
    >> is fine and causes the type of array to be int [1] by the end.

    >
    > I was thinking of incomplete structure and union types.


    I'd considered that and ruled them out. Composite types must be
    compatible, and an incomplete struct type is not compatible with a
    complete one declared in the same translation unit (I think!).

    --
    Ben.
     
    Ben Bacarisse, Sep 27, 2011
    #19
  20. On Sep 27, 2:41 am, Ben Bacarisse <> wrote:
    > Harald van Dijk <> writes:
    > > On Sep 26, 11:58 pm, Ben Bacarisse <> wrote:
    > >> However, I don't think 6.9.2p3 makes much sense if the final composite
    > >> type is assumed to be the intended meaning, because I don't think the
    > >> final composite type *can* be incomplete?  For example, a translation
    > >> unit with nothing other than

    >
    > >>   int array[];

    >
    > >> is fine and causes the type of array to be int [1] by the end.

    >
    > > I was thinking of incomplete structure and union types.

    >
    > I'd considered that and ruled them out.  Composite types must be
    > compatible, and an incomplete struct type is not compatible with a
    > complete one declared in the same translation unit (I think!).


    An incomplete struct type is completed by a struct definition with the
    same tag in the same scope, see 6.7.2.3p4 for the official wording and
    p12 for an example.

    struct X x; /* tentative definition with incomplete type */
    struct X { int a; }; /* completed here */
     
    Harald van Dijk, Sep 27, 2011
    #20