Character Array vs String

Discussion in 'C Programming' started by Sheky, Nov 9, 2011.

  1. Sheky

    Sheky Guest

    Could anybody please mention difference between character array and
    string in C?
     
    Sheky, Nov 9, 2011
    #1

  2. On Nov 9, 1:09 pm, Sheky <> wrote:
    > Could anybody please mention difference between character array and
    > string in C?
    >

    There's not much difference. In C, a string is an array of characters
    with a terminating 0, or null character.

    There's special syntax for specifying a string, which is the
    conventional quotes. "Fred" creates a constant character array, 5
    characters long, with a terminal null. We call that a "string
    literal". char name[5] = {'F', 'r', 'e', 'd', '\0'}; does exactly the
    same thing, but the array is writeable.
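
    A minimal sketch of the distinction above. The function names are
    made up for illustration; the point is only that the array copy can
    be modified, while the literal itself must be treated as read-only.

```c
#include <stddef.h>
#include <string.h>

/* Size in bytes of the literal "Fred", including the terminating null. */
size_t literal_size(void)
{
    return sizeof "Fred";
}

/* Builds the writable array from the post, changes its first letter,
   and copies the result to out (which must hold at least 5 chars). */
void make_writable_copy(char *out)
{
    char name[5] = {'F', 'r', 'e', 'd', '\0'};  /* same contents as "Fred" */
    name[0] = 'D';                              /* fine: the array is writable */
    strcpy(out, name);
}
```

    literal_size() returns 5, and make_writable_copy() leaves "Dred" in
    the caller's buffer.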

    Occasionally you might need a character array without a terminating
    null. This wouldn't be a string in C terms, and will probably cause a
    crash if you feed it to a string function like strcpy(). Usually it's
    better to put on the null even if you don't use it, because it's only
    one byte, and it means you can print the string out with no trouble if
    you need to do so for some reason. The other situation is where you
    have several strings in the same array, with nulls between them,
    e.g. char *name = "Fred\0Bloggs\0\0". A few Microsoft Windows
    functions like to receive lists of strings like this. They use a
    sequence of two nulls to indicate the end of the list.
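
    A rough sketch of how such a list can be walked, assuming the
    layout described above (each string ends with a null, and an empty
    string marks the end of the list). count_strings is a hypothetical
    name, not a Windows function.

```c
#include <stddef.h>
#include <string.h>

/* Counts the strings in a list where each string ends with a null and
   an empty string (two nulls in a row) marks the end of the list. */
size_t count_strings(const char *list)
{
    size_t count = 0;

    while (*list != '\0') {          /* empty string: end of list */
        count++;
        list += strlen(list) + 1;    /* skip this string and its null */
    }
    return count;
}
```

    For the list "Fred\0Bloggs\0" (the compiler adds the final null)
    this returns 2.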

    --
    Visit my website. Nice program to demonstrate entropy and the
    hydrophobic effect.
    http://www.malcolmmclean.site11.com/www
     
    Malcolm McLean, Nov 9, 2011
    #2

  3. James Kuyper

    James Kuyper Guest

    On 11/09/2011 06:09 AM, Sheky wrote:
    > Could anybody please mention difference between character array and
    > string in C?


    A character array is an array of objects of character type, either char,
    signed char, unsigned char, or wchar_t. It remains an array, regardless
    of what's stored in those objects.

    A string is a data structure that could, among other things, be stored
    in a character array. "A string is a contiguous sequence of characters
    terminated by and including the first null character." (7.1.1p1).

    Example:

    char array[] = "One\0Two";

    Every single character in that array can be treated as the first
    character of a different string; most of those strings overlap each
    other. For instance, array+3 points at the string "", which is empty
    except for the terminating null character. array+5 points at the string
    "wo".

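    The overlapping strings in that array can be checked with a short
    sketch like the following; length_from is an illustrative helper,
    not a standard function.

```c
#include <string.h>

static char array[] = "One\0Two";   /* 8 chars: O n e \0 T w o \0 */

/* Returns the length of the string that starts at array[offset].
   offset must be in the range 0..7. */
size_t length_from(size_t offset)
{
    return strlen(array + offset);
}
```

    length_from(0) is 3, length_from(3) is 0 (the empty string), and
    length_from(5) is 2.
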
    --
    James Kuyper
     
    James Kuyper, Nov 9, 2011
    #3
  4. On 2011-11-09, Sheky <> wrote:
    > Could anybody please mention difference between character array and
    > string in C?


    To my knowledge, C does not have strings.

    --
    qcar
     
    Quentin Carbonneaux, Nov 9, 2011
    #4
  5. Eric Sosman

    Eric Sosman Guest

    On 11/9/2011 6:09 AM, Sheky wrote:
    > Could anybody please mention difference between character array and
    > string in C?


    A character array is an array whose individual elements are
    characters: `char this[42]', for example. (The term "character"
    is ambiguous, since C has four types that could claim the name:
    `char', `signed char', `unsigned char', and `wchar_t'. In informal
    use "character" usually means `char', but be aware that the term
    is not quite so specific and misunderstanding may occur.) Anyhow:
    A character array is an array of "character" elements, just as an
    `int' array is an array of `int' elements.

    A string is a particular data structure that can be stored in
    a character array. It consists of some sequence of "payload"
    characters plus a special "sentinel" character to mark the end of
    the string. The sentinel character has the numeric value zero, and
    no payload character has that value. It is possible to have no payload
    characters at all (the sentinel will be in the array's first position),
    and this represents the empty string. (Note: The terms "payload" and
    "sentinel" are not formally defined by C; they're just my attempt to
    name and explain the parts of a string.)

    Example:

    char message[10];
    strcpy(message, "Hello");

    `message' is a character array: A ten-element array where each element
    is a character. After strcpy(), a string of six characters (five
    payload and one sentinel) inhabits the first six positions of the
    array. The remaining four characters are still part of the array,
    but are not part of the string.


             <-------------------------- array -------------------------->

               [0]   [1]   [2]   [3]   [4]   [5]   [6]   [7]   [8]   [9]
             +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    message: | 'H' | 'e' | 'l' | 'l' | 'o' |  0  |  ?  |  ?  |  ?  |  ?  |
             +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
                                             sen-
             <---------- payload ----------> ti-
                                             nel

             <------------- string -------------->
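
    The same example as a small sketch that reports both sizes. The
    function name measure is made up for illustration.

```c
#include <string.h>

/* Fills a 10-element array as in the example above and reports the
   size of the array and the length of the string stored in it. */
void measure(size_t *array_size, size_t *string_length)
{
    char message[10];

    strcpy(message, "Hello");
    *array_size = sizeof message;       /* 10: the whole array */
    *string_length = strlen(message);   /* 5: payload only, sentinel not counted */
}
```

    The array size stays 10 whatever is stored; strlen() counts only
    the payload, so the string occupies strlen() + 1 = 6 elements.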

    --
    Eric Sosman
     
    Eric Sosman, Nov 9, 2011
    #5
  6. Unicode support in C is documented in C99. In C89 the 8-bit ANSI char string was assumed, and the termination of a string is a 0 stored at the end of the string.

    In some encoding C-code compiled to deal with ANSI C string might fail.
    This was what I tested long time ago.

    In programming languages length of a string might be stored explicitly in 2 to 4 bytes. Be careful of the length of two bytes stored that will cause trouble.

    I think that the wchar that could support unicode can enhance the PC platform over the unix-lynux platform that might not support unicode in non-English users.
     
    88888 Dihedral, Nov 9, 2011
    #6
  7. James Kuyper

    James Kuyper Guest

    On 11/09/2011 07:01 AM, Quentin Carbonneaux wrote:
    > On 2011-11-09, Sheky <> wrote:
    >> Could anybody please mention difference between character array and
    >> string in C?

    >
    > To my knowledge, C does not have strings.


    See 7.1.1p1, which I cited in my own response to Quentin.

    C doesn't have a string type, but that's a very different question.
    --
    James Kuyper
     
    James Kuyper, Nov 9, 2011
    #7
  8. On Nov 9, 2:01 pm, Quentin Carbonneaux <> wrote:
    > On 2011-11-09, Sheky <> wrote:
    >
    > > Could anybody please mention difference between character array and
    > > string in C?

    >
    > To my knowledge, C does not have strings.
    >

    It has syntactic support for strings, as string literals. It also has
    functions in the standard library that operate on strings. What it
    doesn't have is a string type variable. Strings are passed about as
    character pointers.
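
    A minimal sketch of what passing a string as a character pointer
    looks like in practice; count_char is an illustrative name.

```c
#include <stddef.h>

/* Counts occurrences of c in the string s. The function receives only
   a pointer to the first character and relies on the terminating null
   to find the end of the string. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (*s == c)
            n++;
    }
    return n;
}
```

    The caller can pass a string literal or a character array holding
    a string; either way the function sees only a pointer.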
    --
    MiniBasic - a simple Basic interpreter with good string support
    http://www.malcolmmclean.site11.com/www
     
    Malcolm McLean, Nov 9, 2011
    #8
  9. BartC

    BartC Guest

    "Sheky" <> wrote in message
    news:...
    > Could anybody please mention difference between character array and
    > string in C?


    As I understand it, a character array can contain a string, amongst other
    things.

    So in a ten-character array, if the first three characters are 'A', 'B' and
    'C', and the fourth character is zero, then it contains the 3-character
    string "ABC" at the beginning (and the other 6 characters can contain
    anything).

    But the more you look into this, the more complicated it gets (so that same
    array also contains the string "BC", and the last 5 characters form a
    5-character array, which may or may not contain a string of its own, and so
    on). So don't worry about it too deeply.

    --
    Bartc
     
    BartC, Nov 9, 2011
    #9
  10. CONTEXT!!
    Leave some in so we know what you are responding to.

    On Nov 9, 12:48 pm, 88888 Dihedral <>
    wrote:

    > For unicode  support in C is documented in C99. In C89 the 8-bit ANSI char


    the character code you are trying to refer to is "ASCII" not "ANSI"

    > string was assumed


    no it wasn't. K&R (pre-standard C) may have done this (but I'm not
    convinced even K&R was locked to ASCII). C89 went to a little trouble
    to make it char set independent. No reason why a conforming C89
    implementation could not use EBCDIC (a rather nasty IBM character
    code). And such things exist.

    > and the termination of a string  is a 0 stored [at] the [end] of a string.


    yes, the standard insists on this

    > In some encoding C-code compiled to deal with ANSI C string might fail.


    no. All conforming implementations must use zero to terminate a
    string.
    Badly written C programs might assume a particular character encoding.
    But it isn't hard to write programs that are character encoding
    neutral.
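
    Two small examples of encoding-neutral code, as a sketch. The
    function names are illustrative. The first relies on the guarantee
    that the digits '0' to '9' are contiguous in every C character set;
    the second avoids assuming the same about letters.

```c
#include <ctype.h>

/* Value of a decimal digit character, or -1 if c is not a digit.
   C guarantees that '0'..'9' are contiguous, so c - '0' is portable. */
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Upper-case test that uses isupper() instead of assuming that
   'A'..'Z' are contiguous, which is not true in EBCDIC. */
int is_upper_letter(char c)
{
    return isupper((unsigned char)c) != 0;
}
```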

    > This was what I tested long time ago.


    anything that failed was either not a C compiler or your test program
    was broken.

    > In


    --some--

    > programming languages length of a string might be stored explicitly in 2 to 4 bytes. Be careful of the length of two bytes stored that will cause trouble.


    Pascal did this typically. C++ std::string probably does it. There are
    a zillion string libraries out there that do it. They will only cause
    problems if you pass a non-C string to code that is expecting a
    C string.
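
    A sketch of the mismatch, assuming a made-up length-prefixed
    layout (struct counted_string and to_c_string are hypothetical,
    not taken from Pascal or any particular library). The bytes carry
    no terminating null, so they have to be converted before a C
    string function may see them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string: the length is stored explicitly
   and the bytes are NOT null-terminated. */
struct counted_string {
    unsigned short length;
    char bytes[32];
};

/* Converts a counted string to a C string by copying the bytes and
   appending the terminating null. Returns 0 on success, or -1 if the
   length is invalid or out is too small. */
int to_c_string(const struct counted_string *in, char *out, size_t out_size)
{
    if (in->length > sizeof in->bytes || (size_t)in->length + 1 > out_size)
        return -1;
    memcpy(out, in->bytes, in->length);
    out[in->length] = '\0';
    return 0;
}
```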

    > I think that the wchar that could support unicode can enhance the PC platform over the unix-lynux platform that might  not support unicode in non-English users.


    I'm pretty sure all main-stream OS's (Win, Linux, MacOS) support
    unicode already. Hence no enhancement necessary.
     
    Nick Keighley, Nov 9, 2011
    #10
  11. On 2011-11-09, James Kuyper <> wrote:
    > On 11/09/2011 07:01 AM, Quentin Carbonneaux wrote:
    >> On 2011-11-09, Sheky <> wrote:
    >>> Could anybody please mention difference between character array and
    >>> string in C?

    >>
    >> To my knowledge, C does not have strings.

    >
    > See 7.1.1p1, which I cited in my own response to Quentin.


    I saw it.

    > C doesn't have a string type, but that's a very different question.


    My answer was a bit misleading... But, as you guessed it I tried to state that
    C does not have a string type (I did not think of a string as a data structure,
    which it is).

    Thanks for making it clear.

    --
    qcar
     
    Quentin Carbonneaux, Nov 9, 2011
    #11
  12. osmium

    osmium Guest

    "Nick Keighley" wrote:

    On Nov 9, 12:48 pm, 88888 Dihedral <>
    wrote:

    > For unicode support in C is documented in C99. In C89 the 8-bit ANSI char


    >the character code you are trying to refer to is "ASCII" not "ANSI"


    Furthermore, ASCII is a 7-bit code, not 8. It is usually extended in some
    fashion to become eight bits in actual use as opposed to an abstraction.
    ANSI is the organization that "blesses" some stuff for the USA.

    ANSI - American National Standards Institute.

    ASCII - American Standard Code for Information Interchange.
     
    osmium, Nov 9, 2011
    #12
  13. On Nov 9, 3:53 pm, Nick Keighley <>
    wrote:
    >
    > no. All conforming implementations must use zero to terminate a
    > string.
    > Badly written C programs might assume a particular character encoding.
    > But it isn't hard to write programs that are character encoding
    > neutral.
    >

    Sometimes it's harder than it looks.

    For instance IFF files have 4-letter ASCII tags which indicate what
    sort of "chunk" you are reading. So the obvious thing to write is

    fread(chunk, 1, 4, fp);
    if(!strncmp(chunk, "DATA", 4))
    /* we've got a data chunk */

    That will break on a non-ascii system. The solution is to hardcode the
    values. But then you can no longer read the word "DATA" and it becomes
    a lot harder to see that the chunk identifier is correct.
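
    One way the hardcoded version might look, as a sketch. TAG_DATA
    and is_data_chunk are illustrative names, and memcmp() is used
    here in place of strncmp() because the tag is raw bytes rather
    than a string.

```c
#include <string.h>

/* The letters D, A, T, A as ASCII codes, independent of the character
   set the compiler uses for character literals. */
static const unsigned char TAG_DATA[4] = {0x44, 0x41, 0x54, 0x41};

/* Returns 1 if the 4-byte chunk identifier read from the file is the
   ASCII tag "DATA", otherwise 0. */
int is_data_chunk(const unsigned char chunk[4])
{
    return memcmp(chunk, TAG_DATA, 4) == 0;
}
```

    The comment and the constant's name now carry the meaning that the
    literal "DATA" used to carry in the source.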

    --
    MiniBasic - a simple script interpreter
    http://www.malcolmmclean.site11.com/www
     
    Malcolm McLean, Nov 9, 2011
    #13
  14. James Kuyper

    James Kuyper Guest

    On 11/09/2011 10:48 AM, Malcolm McLean wrote:
    ....
    > For instance IFF files have 4-letter ASCII tags which indicate what
    > sort of "chunk" you are reading. So the obvious thing to write is
    >
    > fread(chunk, 1, 4, fp);
    > if(!strncmp(chunk, "DATA", 4))
    > /* we've got a data chunk */
    >
    > That will break on a non-ascii system. The solution is to hardcode the
    > values. But then you can no longer read the word "DATA" and it becomes
    > a lot harder to see that the chunk identifier is correct.


    You can make it a macro, whose name is more informative than the
    hardcoded values. However, the better solution (though not always
    feasible) is to convert those files from ASCII to the native encoding on
    that platform, as part of the process of porting them to that platform.
    If a C implementation uses a non-ascii encoding when targeting that
    platform, then it's likely to be the case that the local text oriented
    utilities (such as file editors or browsers) will do so, as well.
     
    James Kuyper, Nov 9, 2011
    #14
  15. Willem

    Willem Guest

    James Kuyper wrote:
    ) On 11/09/2011 10:48 AM, Malcolm McLean wrote:
    ) ...
    )> For instance IFF files have 4-letter ASCII tags which indicate what
    )> sort of "chunk" you are reading. So the obvious thing to write is
    )>
    )> fread(chunk, 1, 4, fp);
    )> if(!strncmp(chunk, "DATA", 4))
    )> /* we've got a data chunk */
    )>
    )> That will break on a non-ascii system. The solution is to hardcode the
    )> values. But then you can no longer read the word "DATA" and it becomes
    )> a lot harder to see that the chunk identifier is correct.
    )
    ) You can make it a macro, whose name is more informative than the
    ) hardcoded values. However, the better solution (though not always
    ) feasible) is to convert those files from ASCII to the native encoding on
    ) that platform,

    They are not ASCII files. They are binary files with chunks that are
    identified by a 4-byte header which has meaning when read as ASCII.


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
     
    Willem, Nov 9, 2011
    #15
  16. James Kuyper

    James Kuyper Guest

    On 11/09/2011 11:24 AM, Willem wrote:
    > James Kuyper wrote:
    > ) On 11/09/2011 10:48 AM, Malcolm McLean wrote:
    > ) ...
    > )> For instance IFF files have 4-letter ASCII tags which indicate what
    > )> sort of "chunk" you are reading. So the obvious thing to write is
    > )>
    > )> fread(chunk, 1, 4, fp);
    > )> if(!strncmp(chunk, "DATA", 4))
    > )> /* we've got a data chunk */
    > )>
    > )> That will break on a non-ascii system. The solution is to hardcode the
    > )> values. But then you can no longer read the word "DATA" and it becomes
    > )> a lot harder to see that the chunk identifier is correct.
    > )
    > ) You can make it a macro, whose name is more informative than the
    > ) hardcoded values. However, the better solution (though not always
    > ) feasible) is to convert those files from ASCII to the native encoding on
    > ) that platform,
    >
    > They are not ASCII files. They are binary files with chunks that are
    > identified by a 4-byte header which has meaning when read as ASCII.


    That makes it harder; the conversion utility would have to know about
    the file format. It's still not impossible, but obviously far less
    convenient.
     
    James Kuyper, Nov 9, 2011
    #16
  17. BartC

    BartC Guest

    "James Kuyper" <> wrote in message
    news:...
    > On 11/09/2011 11:24 AM, Willem wrote:
    >> James Kuyper wrote:
    >> ) On 11/09/2011 10:48 AM, Malcolm McLean wrote:
    >> ) ...
    >> )> For instance IFF files have 4-letter ASCII tags which indicate what
    >> )> sort of "chunk" you are reading. So the obvious thing to write is
    >> )>
    >> )> fread(chunk, 1, 4, fp);
    >> )> if(!strncmp(chunk, "DATA", 4))
    >> )> /* we've got a data chunk */
    >> )>
    >> )> That will break on a non-ascii system. The solution is to hardcode the
    >> )> values. But then you can no longer read the word "DATA" and it becomes
    >> )> a lot harder to see that the chunk identifier is correct.


    >> They are not ASCII files. They are binary files with chunks that are
    >> identified by a 4-byte header which has meaning when read as ASCII.

    >
    > That makes it harder; the conversion utility would have to know about
    > the file format. It's still not impossible, but obviously far less
    > convenient.


    You just use a macro or function such as:

    if(!strncmp(chunk,ASCII("DATA"),4))

    That's if you're worried that your program might not work on a non-ASCII C
    system. On an ASCII one, then the function or macro will do nothing.
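
    A possible shape for such a helper, as a sketch only. ascii_tag is
    a hypothetical name; unlike the ASCII() call above it writes into a
    caller-supplied buffer instead of returning a pointer, and it
    handles only upper-case letters, digits and space.

```c
#include <stddef.h>
#include <string.h>

/* Writes the ASCII codes for the first n characters of tag into out.
   Returns 0 on success, or -1 if tag is shorter than n or contains a
   character this sketch does not handle. The strchr() lookups work
   whatever the execution character set is. */
int ascii_tag(const char *tag, unsigned char *out, size_t n)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char digits[] = "0123456789";
    size_t i;

    for (i = 0; i < n; i++) {
        const char *p;

        if (tag[i] == '\0')
            return -1;                                  /* tag too short */
        if (tag[i] == ' ')
            out[i] = 0x20;
        else if ((p = strchr(upper, tag[i])) != NULL)
            out[i] = (unsigned char)(0x41 + (p - upper));
        else if ((p = strchr(digits, tag[i])) != NULL)
            out[i] = (unsigned char)(0x30 + (p - digits));
        else
            return -1;                                  /* unsupported character */
    }
    return 0;
}
```

    The resulting bytes can then be compared with the bytes read from
    the file using memcmp().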

    --
    Bartc
     
    BartC, Nov 9, 2011
    #17
  18. On Nov 9, 6:30 pm, James Kuyper <> wrote:
    > On 11/09/2011 11:24 AM, Willem wrote:
    >
    > > They are not ASCII files.  They are binary files with chunks that are
    > > identified by a 4-byte header which has meaning when read as ASCII.

    >
    > That makes it harder; the conversion utility would have to know about
    > the file format. It's still not impossible, but obviously far less
    > convenient.
    >

    It would be easy enough to write such a utility for IFF files, because
    they have a structure whereby you have a "chunk" length, and
    identifier telling you what sort of chunk it is. So you can just skip
    through all the chunks, changing the identifier tags from ASCII to
    EBCDIC.
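
    A rough sketch of the skipping logic. It assumes a simplified
    layout: a flat sequence of chunks, each with a 4-byte identifier,
    a 4-byte big-endian length, and data padded to an even number of
    bytes. Real IFF files nest chunks inside a container chunk, which
    this sketch ignores, and count_chunks is a made-up name.

```c
#include <stddef.h>

/* Walks a buffer holding a flat sequence of chunks and returns how
   many complete chunks it contains, or -1 if the data is truncated. */
long count_chunks(const unsigned char *buf, size_t size)
{
    size_t pos = 0;
    long count = 0;

    while (pos < size) {
        unsigned long len;

        if (size - pos < 8)
            return -1;                      /* header incomplete */
        len = ((unsigned long)buf[pos + 4] << 24) |
              ((unsigned long)buf[pos + 5] << 16) |
              ((unsigned long)buf[pos + 6] << 8) |
              (unsigned long)buf[pos + 7];
        if (len > size - pos - 8)
            return -1;                      /* data incomplete */
        /* buf[pos] .. buf[pos + 3] hold the identifier tag that a
           converter would rewrite at this point */
        pos += 8 + (size_t)len;
        if ((len & 1) != 0 && pos < size)
            pos++;                          /* skip the pad byte */
        count++;
    }
    return count;
}
```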

    But then you'd have two file formats, identical except for the tags,
    and the potential for extra costs and incompatibilities would be
    large. A bit like the decision to encode newline/carriage return as
    just a newline. It saved a byte, but to this day text files won't
    display properly on Windows as a result.
    --
    MiniBasic - a fully functional Basic interpreter, written in ANSI C.
    http://www.malcolmmclean.site11.com/www
     
    Malcolm McLean, Nov 9, 2011
    #18
  19. Willem

    Willem Guest

    James Kuyper wrote:
    ) That makes it harder; the conversion utility would have to know about
    ) the file format. It's still not impossible, but obviously far less
    ) convenient.

    And it would likely go against the spec of the file format.


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
     
    Willem, Nov 9, 2011
    #19
  20. Tobias Blass

    Tobias Blass Guest

    On 2011-11-09, Malcolm McLean <> wrote:
    > On Nov 9, 6:30 pm, James Kuyper <> wrote:
    >> On 11/09/2011 11:24 AM, Willem wrote:
    >>
    >> > They are not ASCII files.  They are binary files with chunks that are
    >> > identified by a 4-byte header which has meaning when read as ASCII.

    >>
    >> That makes it harder; the conversion utility would have to know about
    >> the file format. It's still not impossible, but obviously far less
    >> convenient.
    >>

    > It would be easy enough to write such a utility for IFF files, because
    > they have a structure whereby you have a "chunk" length, and
    > identifier telling you what sort of chunk it is. So you can just skip
    > through all the chunks, changing the identifier tags from ASCII to
    > EBCDIC.
    >


    Wouldn't it be easier to use a text file, so the program checks for
    "DATA" and you encode your file in EBCDIC for EBCDIC systems and in ASCII
    for ASCII systems... (if you can't change the file format, well I liked
    the function-like macro idea elsewhere in this thread)
    > But then you'd have two file formats, identical except for the tags,
    > and the potential for extra costs and incompatibilities would be
    > large. A bit like the decision to encode newline/carriage return as
    > just a newline. It saved a byte, but to this day text files won't
    > display properly on Windows as a result.


    Well Windows developed after UNIX, so they could have adopted the \n
    encoding if they wanted to. You could as well reverse your argument and
    say "but to this day text files won't display properly on *NIX as a
    result" (most *NIX utilities can handle \r\n encodings, though). I also
    don't think \n was used to save a byte (CMIIW). \n is more "natural" (you
    want a newline, so you add a newline character), but \r\n is more natural
    if you are used to typewriters. Since typewriters are quite rare these
    days I think the *NIX way makes more sense, but YMMV.
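
    Code that has to cope with both conventions can strip either line
    ending after reading a line. A minimal sketch, with chomp as an
    illustrative name:

```c
#include <string.h>

/* Removes a trailing "\n" or "\r\n" from line in place and returns
   the new length. A '\r' that is not followed by '\n' is left alone. */
size_t chomp(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';
        if (len > 0 && line[len - 1] == '\r')
            line[--len] = '\0';
    }
    return len;
}
```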
     
    Tobias Blass, Nov 9, 2011
    #20
