Stroustrup section 1.5.4, word counting

Discussion in 'C Programming' started by arnuld, Mar 9, 2007.

  1. arnuld

    arnuld Guest

    this is an example programme that counts lines, words and characters.
    i have noticed one thing that this programme counts space, a newline
    and a tab as a character.

    i know:

    1. a newline is represented as '\n'
    2. a tab as '\t'
    3. a space as ' '

    what i want to know is whether a newline, a space and a tab are
    represented internally as characters ?

    i know everything is represented as machine's character set, most
    probably ASCII where 'A' is 65 but i am actually confused on this
    '\t', '\n' , ' ', and character issue.

    any help

    here is the code that counts characters, words and newlines:

    // word counting

    #include <stdio.h>

    #define IN 0    /* inside a word */
    #define OUT 1   /* outside a word */

    int main(void)
    {
        int c, nl, nw, nc, state;

        state = OUT;
        nl = nc = nw = 0;

        while ((c = getchar()) != EOF)
        {
            ++nc;   /* every character read counts, whitespace included */

            if (c == '\n')
                ++nl;

            if (c == ' ' || c == '\n' || c == '\t')
                state = OUT;
            else if (state == OUT)
            {
                state = IN;   /* first character of a new word */
                ++nw;
            }
        }

        printf("%d NEWLINES \t %d WORDS \t %d CHARs \n", nl, nw, nc);

        return 0;
    }
    arnuld, Mar 9, 2007
    #1

  2. santosh

    santosh Guest

    arnuld wrote:
    > this is an example programme that counts lines, words and characters.
    > i have noticed one thing that this programme counts space, a newline
    > and a tab as a character.
    >
    > i know:
    >
    > 1. a newline is represented as '\n'
    > 2. a tab as '\t'
    > 3. a space as ' '
    >
    > what i want to know is whether a newline, a space and a tab are
    > represented internally as characters ?


    It depends on the machine and its character set.

    > i know everything is represented as machine's character set, most
    > probably ASCII where 'A' is 65 but i am actually confused on this
    > '\t', '\n' , ' ', and character issue.
    >
    > any help


    Generally, an end-of-line sequence is represented by one or two
    characters. Under UNIX it's a single linefeed character, while under
    DOS-like systems it's a carriage-return followed by a linefeed. MacOS
    used to use a single carriage-return. Doubtless other systems may use
    more variations.

    Spaces and tabs are usually represented by one character.
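
To make this concrete, here is a small sketch that is not part of the original thread. It assumes nothing beyond standard C; the function names show_code and is_separator are made up for illustration. It shows that '\n', '\t' and ' ' are ordinary character constants with integer values, just like 'A', which is why the word counter's ++nc counts them.

```c
#include <stdio.h>

/* Print the numeric value a character constant has in the
   execution character set (the values differ between ASCII
   and, say, EBCDIC machines). */
void show_code(const char *name, int c)
{
    printf("%-5s has the value %d\n", name, c);
}

/* Return 1 if c is one of the three whitespace characters the
   word counter treats as a word separator, 0 otherwise. */
int is_separator(int c)
{
    return c == ' ' || c == '\n' || c == '\t';
}
```

Calling show_code("'\\n'", '\n'), show_code("'\\t'", '\t') and show_code("' '", ' ') on an ASCII system should report 10, 9 and 32; on a machine with a different character set the numbers would differ, but each is still a single character value.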

    > here is the code that counts characters, words and newlines:
    >
    > // word counting


    It's better to use /* ... */ style comments, especially when you're
    posting code onto Usenet.
    santosh, Mar 9, 2007
    #2

  3. "arnuld" <> writes:
    [snip]

    You mean K&R, not Stroustrup.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 10, 2007
    #3
  4. "santosh" <> writes:
    > arnuld wrote:
    >> this is an example programme that counts lines, words and characters.
    >> i have noticed one thing that this programme counts space, a newline
    >> and a tab as a character.
    >>
    >> i know:
    >>
    >> 1. a newline is represented as '\n'
    >> 2. a tab as '\t'
    >> 3. a space as ' '
    >>
    >> what i want to know is whether a newline, a space and a tab are
    >> represented internally as characters ?

    >
    > It depends on the machine and its character set.
    >
    >> i know everything is represented as machine's character set, most
    >> probably ASCII where 'A' is 65 but i am actually confused on this
    >> '\t', '\n' , ' ', and character issue.
    >>
    >> any help

    >
    > Generally, an end-of-line sequence is represented by one or two
    > characters. Under UNIX it's a single linefeed character, while under
    > DOS-like systems it's a carriage-return followed by a linefeed. MacOS
    > used to use a single carriage-return. Doubtless other systems may use
    > more variations.

    [...]

    But C's I/O routines, when operating on files opened in text mode,
    hide those details from you. Regardless of how an end-of-line is
    represented in an external file (and there are a *lot* of ways to do
    this, including fixed-length records with no specific marker), it's
    mapped to a single '\n' character.
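
A minimal sketch of that behaviour, added here for illustration and not taken from the thread: the function name roundtrip_newlines and the file name passed to it are hypothetical. It writes two lines to a file opened in text mode ("w"), reads them back in text mode ("r"), and counts how many '\n' characters getc() delivers. Whatever the platform stores on disk for an end-of-line, the program should see one '\n' per line.

```c
#include <stdio.h>

/* Write two lines to `path` in text mode, read them back in text
   mode, and return the number of '\n' characters seen by getc().
   Returns -1 if the file cannot be opened or written.
   The file is removed again before returning. */
int roundtrip_newlines(const char *path)
{
    FILE *fp;
    int c;
    int nl = 0;

    fp = fopen(path, "w");            /* "w" opens in text mode */
    if (fp == NULL)
        return -1;
    fputs("first line\nsecond line\n", fp);
    if (fclose(fp) == EOF)
        return -1;

    fp = fopen(path, "r");            /* "r" opens in text mode */
    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')                /* end-of-line, however it is stored */
            ++nl;
    fclose(fp);

    remove(path);                     /* clean up the scratch file */
    return nl;
}
```

Opening the same file with "rb" instead would expose the stored bytes, so on a DOS-like system the carriage-return would show up as a separate character in front of each linefeed.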

    Keith Thompson, Mar 10, 2007
    #4