Problems with CR (carriage return) and LF (line feed)

Discussion in 'C Programming' started by Andrew, Dec 8, 2003.

  1. Andrew

    Andrew Guest

    I have created a program that downloads a web page and then performs
    some text processing on it. The problem is in the text processing:
    every line in the downloaded text file ends with strange symbols,
    which are the carriage return and the line feed (hex values 0D and
    0A). How are these values represented in C? I want the function to
    ignore them whenever it reads one of them from the file, for
    example:



    ....................................................

    while((c=fgetc(fp) ) != EOF )
    {

        switch(c)
        {
        case '<' :
        {
            tagFlag=true;
            cont=true;
            i=0;
            if(getvalue==1)
            {
                getvalue=0;
                string_found=false ;
            }
            break;
        }
        case '>' :
        {
            tagFlag=false;
            break;
        }
        case <<<<<< What should i put here ??????
        {
            break;
        }
        default :
        {
            if( (string_found == true) )
            {
                if(tagFlag == false )
                {

                    getvalue=1;
                    printf("%c \n",c);
                }


            }
            else if( (string_found==false))
            {
                if( (tagFlag==false) && (cont==true))
                {
                    if(c==target)
                    {

                        if(i == (target.GetLen()-1) )
                        {


                            times_found++;

                            string_found=true;
                        }
                        else
                        {
                            i++;

                            cont=true;
                        }
                    }
                }
            }
            break;

        }
        }
    }


    ..................................................



    The file is stored like this :


    ......................................

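    if(ret == SOCKET_ERROR)
    {

        exit(EXIT_FAILURE);
    }

    _setmode(_fileno(fp), _O_TEXT);
    /* fp is the file pointer */
    do
    {
        bytesRead = recv(itsSocket, Buffer, sizeof(Buffer), 0);

        /* recv returns SOCKET_ERROR (a negative value) on failure,
           so only write when data actually arrived */
        if(bytesRead > 0)
            fwrite(Buffer,sizeof(char),bytesRead,fp);
    } while(bytesRead > 0);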



    (OK, I know socket programming is off-topic, but my question isn't.)
     
    Andrew, Dec 8, 2003
    #1

  2. Andrew wrote:

    > I have created a program that downloads a web page and then performs
    > some text processing on it. The problem is in the text processing:
    > every line in the downloaded text file ends with strange symbols,
    > which are the carriage return and the line feed (hex values 0D and
    > 0A). How are these values represented in C?


    0x0D and 0x0A. Alternatively, '\x0D' and '\x0A'.

    It may be that these values happen to correspond to characters in the
    execution character set, and can be represented some other way (such as
    '\r' or '\n', for example), but this is implementation-dependent.
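
A minimal sketch of the numeric approach, assuming the downloaded bytes are already in a buffer. The names WIRE_CR, WIRE_LF, is_wire_eol and strip_wire_eol are made up for illustration and do not come from the original program.

```c
#include <stddef.h>

/* Byte values of carriage return and line feed as they arrive from the
   network, written numerically so nothing depends on how the
   implementation maps '\r' and '\n'. */
enum { WIRE_CR = 0x0D, WIRE_LF = 0x0A };

/* Hypothetical helper: nonzero if c is one of the two line-ending bytes. */
static int is_wire_eol(int c)
{
    return c == WIRE_CR || c == WIRE_LF;
}

/* Remove every CR and LF byte from buf[0..len) in place.
   Returns the new length. */
static size_t strip_wire_eol(unsigned char *buf, size_t len)
{
    size_t in, out = 0;

    for (in = 0; in < len; in++) {
        if (!is_wire_eol(buf[in]))
            buf[out++] = buf[in];
    }
    return out;
}
```

In a switch over characters read with fgetc, the same constants can be used directly as case labels (case WIRE_CR: case WIRE_LF: break;).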

    -Kevin
    --
    My email address is valid, but changes periodically.
    To contact me please use the address from a recent posting.
     
    Kevin Goodsell, Dec 8, 2003
    #2

  3. in comp.lang.c i read:

    >I have created a program that downloads a web page and then performs
    >some text processing on it. The problem is in the text processing:
    >every line in the downloaded text file ends with strange symbols,
    >which are the carriage return and the line feed.


    naturally they do, that's what the http specification requires -- i.e.,
    http `headers' must all end with crlf. generally files are transported
    verbatim, so those bytes are likely present in the file, on the server.

    >(Hex values 0D and 0A.) How are these values represented in C?


    umm, 0x0d and 0x0a.

    --
    a signature
     
    those who know me have no need of my name, Dec 8, 2003
    #3
  4. CBFalconer

    CBFalconer Guest

    Andrew wrote:
    >
    > I have created a program that downloads a web page and then performs
    > some text processing on it. The problem is in the text processing:
    > every line in the downloaded text file ends with strange symbols,
    > which are the carriage return and the line feed (hex values 0D and
    > 0A). How are these values represented in C? I want the function to
    > ignore them whenever it reads one of them from the file, for
    > example:


    I have taken the liberty of reformatting your code so I can clearly
    indicate suggested changes (which are no longer quoted lines).
    >
    > ...................................................
    >
    > while ((c = fgetc(fp) ) != EOF ) {
    >    switch(c) {
    >    case '<' : tagFlag = true;
    >               cont = true;
    >               i = 0;
    >               if (getvalue == 1) {
    >                  getvalue = 0;
    >                  string_found = false ;
    >               }
    >               break;
    >
    >    case '>' : tagFlag=false;
    >               break;
    >
    >    /* case <<<<<< What should i put here ?????? */

         case '\n':
         case '\r': break;
    >
    >    default :  if ( (string_found == true) ) {
    >                  if (tagFlag == false ) {
    >                     getvalue = 1;
    >                     printf("%c \n",c);
    >                  }
    >               }
    >               else if ( (string_found == false)) {
    >                  if ( (tagFlag == false) && (cont == true)) {
    >                     if (c == target) {
    >                        if (i == (target.GetLen()-1) ) {
    >                           times_found++;
    >                           string_found = true;
    >                        }
    >                        else {
    >                           i++;
    >                           cont = true;
    >                        }
    >                     }
    >                  }
    >               }
    >               break;
    >    } /* switch */
    > } /* while */


    Excessive vertical spacing is just as harmful to comprehensibility
    as the lack of breaks. Note that braces around the individual
    cases are useless and confusing, as code normally simply executes
    in order in the absence of a break.

    I believe that the standards for HTML specify that those lines end
    in \r\n, so the solution should be portable. However I am not
    sure of this. You may want to inject a blank, which you can
    probably do by replacing the "break" with "c = ' '" and falling
    through. Other than this I am making no allegations about the
    accuracy of the code.
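
The "inject a blank and fall through" variant can be sketched as follows. This is a simplified illustration under stated assumptions, not the original program: it works on a string rather than a file, it leaves out the target-string matching, and the function name text_outside_tags is made up.

```c
#include <stddef.h>

/* Copy the text that lies outside <...> tags from src to dst, turning
   each CR or LF into a single blank. dst must have room for at least
   strlen(src) + 1 bytes. Returns the number of characters written,
   not counting the terminating '\0'. */
static size_t text_outside_tags(const char *src, char *dst)
{
    int in_tag = 0;
    size_t n = 0;

    for (; *src != '\0'; src++) {
        int c = (unsigned char)*src;

        switch (c) {
        case '<':
            in_tag = 1;
            break;
        case '>':
            in_tag = 0;
            break;
        case '\n':
        case '\r':
            c = ' ';
            /* fall through: treat the blank like any other character */
        default:
            if (!in_tag)
                dst[n++] = (char)c;
            break;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Because a CR LF pair becomes two blanks, words on adjacent lines stay separated instead of running together; collapsing the pair into one blank would need one more flag.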

    --
    Chuck F
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
     
    CBFalconer, Dec 8, 2003
    #4
  5. in comp.lang.c i read:

    >I believe that the standards for HTML specify that those lines end
    >in \r\n, so the solution should be portable.


    they specify 0x0d 0x0a. whether those correspond to \r and \n depends on
    the implementation. most likely they will, but the key to writing portable
    code is in not making assumptions you can avoid.
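
if the '\r' and '\n' spellings are kept anyway, one way to make the assumption visible is a compile-time check. a minimal sketch, assuming a C11 compiler (`_Static_assert` does not exist in C89 or C99):

```c
/* Refuse to compile on an implementation where the escape sequences
   do not have the byte values that arrive from the network. */
_Static_assert('\r' == 0x0D, "'\\r' is not byte 0x0D on this implementation");
_Static_assert('\n' == 0x0A, "'\\n' is not byte 0x0A on this implementation");
```

the check costs nothing at run time, and a port to an unusual character set fails loudly at build time instead of misreading the file.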

    --
    a signature
     
    those who know me have no need of my name, Dec 8, 2003
    #5
  6. CBFalconer

    CBFalconer Guest

    those who know me have no need of my name wrote:
    >
    > > I believe that the standards for HTML specify that those lines
    > > end in \r\n, so the solution should be portable.

    >
    > they specify 0x0d 0x0a. whether those correspond to \r and \n
    > depends on the implementation. most likely they will, but the
    > key to writing portable code is in not making assumptions you
    > can avoid.


    Of course. But the I/O system would presumably make those
    translations if the internal system is not ASCII-based. At any
    rate, the point is that it is a vulnerability to be watched when
    porting.

    --
    Chuck F
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
     
    CBFalconer, Dec 8, 2003
    #6
  7. Andrew

    Andrew Guest

    Thank you very, very much, people. It worked fine!
     
    Andrew, Dec 9, 2003
    #7