Problems with CR (carriage return) and LF (line feed)

Andrew

I have created a program that downloads a web page and then performs
some text processing on it. The problem is in the text processing:
every line in the downloaded text file ends with a strange symbol,
which is the carriage return and the line feed (hex values 0D and
0A). How are these values represented in C? For every such
character I read from the file, I want the function to ignore it.
For example:



....................................................

while((c=fgetc(fp)) != EOF)
{
    switch(c)
    {
        case '<' :
        {
            tagFlag=true;
            cont=true;
            i=0;
            if(getvalue==1)
            {
                getvalue=0;
                string_found=false;
            }
            break;
        }
        case '>' :
        {
            tagFlag=false;
            break;
        }
        case <<<<<< What should i put here ??????
        {
            break;
        }
        default :
        {
            if( (string_found == true) )
            {
                if(tagFlag == false)
                {
                    getvalue=1;
                    printf("%c \n",c);
                }
            }
            else if( (string_found==false) )
            {
                if( (tagFlag==false) && (cont==true) )
                {
                    if(c==target)
                    {
                        if(i == (target.GetLen()-1))
                        {
                            times_found++;
                            string_found=true;
                        }
                        else
                        {
                            i++;
                            cont=true;
                        }
                    }
                }
            }
            break;
        }
    }
}


..................................................



The file is stored like this :


......................................

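if(ret == SOCKET_ERROR)
{
    exit(EXIT_FAILURE);
}

_setmode(_fileno(fp), _O_TEXT);
/* fp is the file pointer */
do
{
    bytesRead = recv(itsSocket, Buffer,
                     sizeof(Buffer), 0);
    /* recv returns 0 when the peer closes the connection and
       SOCKET_ERROR on failure; write only when data arrived */
    if(bytesRead > 0)
        fwrite(Buffer,sizeof(char),bytesRead,fp);
} while(bytesRead > 0);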



(OK, I know socket programming is off-topic, but my question isn't.)
 
Kevin Goodsell

Andrew said:
I have created a program that downloads a web page and then performs
some text processing on it. The problem is in the text processing:
every line in the downloaded text file ends with a strange symbol,
which is the carriage return and the line feed (hex values 0D and
0A). How are these values represented in C?

0x0D and 0x0A. Alternatively, '\x0D' and '\x0A'.

It may be that these values happen to correspond to characters in the
execution character set, and can be represented some other way (such as
'\r' or '\n', for example), but this is implementation-dependent.

-Kevin
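
A minimal sketch of the point above, assuming the bytes reach the program untranslated: test against the numeric values instead of relying on '\r' and '\n'. The names WIRE_CR, WIRE_LF and is_wire_eol are hypothetical and do not come from the thread.

```c
/* Byte values of CR and LF as they appear in the downloaded data.
   These are fixed numbers, independent of the execution character set. */
enum { WIRE_CR = 0x0D, WIRE_LF = 0x0A };

/* Returns 1 if c (a value as returned by fgetc) is one of the two
   line-ending bytes, 0 otherwise. */
int is_wire_eol(int c)
{
    return c == WIRE_CR || c == WIRE_LF;
}
```

In the original switch the same idea could be written as two labels, `case 0x0D:` and `case 0x0A:`, followed by a single `break;`.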
 
those who know me have no need of my name

in comp.lang.c i read:
I have created a program that downloads a web page and then performs
some text processing on it. The problem is in the text processing:
every line in the downloaded text file ends with a strange symbol,
which is the carriage return and the line feed.

naturally they do, that's what the http specification requires -- i.e.,
http `headers' must all end with crlf. generally files are transported
verbatim, so those bytes are likely present in the file, on the server.

(Hex values 0D and 0A). How are these values represented in C?

umm, 0x0d and 0x0a.
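
Since the headers end with CRLF, the header block of an HTTP/1.x response is terminated by an empty line, that is, the four bytes 0D 0A 0D 0A. A rough sketch of locating that boundary, assuming the whole response has already been collected into one buffer (the thread's code writes it to a file instead); http_body_offset is a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first byte after the header terminator
   (0D 0A 0D 0A), or (size_t)-1 if the terminator is not in buf. */
size_t http_body_offset(const char *buf, size_t len)
{
    static const char term[4] = { 0x0D, 0x0A, 0x0D, 0x0A };
    size_t i;

    for (i = 0; i + sizeof term <= len; i++) {
        if (memcmp(buf + i, term, sizeof term) == 0)
            return i + sizeof term;
    }
    return (size_t)-1;
}
```

Everything before the returned offset is header text; whatever follows is the page as the server sent it, with whatever line endings the file has there.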
 
CBFalconer

Andrew said:
I have created a program that downloads a web page and then performs
some text processing on it. The problem is in the text processing:
every line in the downloaded text file ends with a strange symbol,
which is the carriage return and the line feed (hex values 0D and
0A). How are these values represented in C? For every such
character I read from the file, I want the function to ignore it.
For example:

I have taken the liberty of reformatting your code so I can clearly
indicate suggested changes (which are no longer quoted lines).
...................................................

while ((c = fgetc(fp)) != EOF) {
   switch (c) {
      case '<' : tagFlag = true;
                 cont = true;
                 i = 0;
                 if (getvalue == 1) {
                    getvalue = 0;
                    string_found = false;
                 }
                 break;

      case '>' : tagFlag = false;
                 break;

      /* case <<<<<< What should i put here ?????? */
      case '\n':
      case '\r': break;
      default :  if ((string_found == true)) {
                    if (tagFlag == false) {
                       getvalue = 1;
                       printf("%c \n", c);
                    }
                 }
                 else if ((string_found == false)) {
                    if ((tagFlag == false) && (cont == true)) {
                       if (c == target) {
                          if (i == (target.GetLen()-1)) {
                             times_found++;
                             string_found = true;
                          }
                          else {
                             i++;
                             cont = true;
                          }
                       }
                    }
                 }
                 break;

   } /* switch */
} /* while */

Excessive vertical spacing is just as harmful to comprehensibility
as the lack of breaks. Note that braces around the individual
cases are useless and confusing, as code normally simply executes
in order in the absence of a break.

I believe that the standards for HTML specify that those lines end
in \r\n, so the solution should be portable. However I am not
sure of this. You may want to inject a blank, which you can
probably do by replacing the "break" with "c = ' '" and falling
through. Other than this I am making no allegations about the
accuracy of the code.
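
The "inject a blank" variant might look roughly like this. It is only a sketch of the fall-through idea, applied to a string in memory rather than to the fgetc loop above, and blank_eols is a hypothetical name:

```c
#include <stddef.h>

/* Replaces every '\r' and '\n' in the NUL-terminated string s with a
   blank, in place. Returns the number of characters replaced. */
size_t blank_eols(char *s)
{
    size_t replaced = 0;

    for (; *s != '\0'; s++) {
        switch (*s) {
        case '\n':
        case '\r':
            *s = ' ';       /* inject a blank instead of dropping it */
            replaced++;
            /* fall through: the blank is now ordinary text */
        default:
            /* normal per-character processing would go here */
            break;
        }
    }
    return replaced;
}
```

Replacing rather than dropping keeps words on adjacent lines from running together, at the cost of two blanks for each CR LF pair.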
 
those who know me have no need of my name

in comp.lang.c i read:
I believe that the standards for HTML specify that those lines end
in \r\n, so the solution should be portable.

they specify 0x0d 0x0a. whether those correspond to \r and \n depends on
the implementation. most likely they will, but the key to writing portable
code is in not making assumptions you can avoid.
 
CBFalconer

those who know me have no need of my name said:
they specify 0x0d 0x0a. whether those correspond to \r and \n
depends on the implementation. most likely they will, but the
key to writing portable code is in not making assumptions you
can avoid.

Of course. But the i/o system would presumably make those
translations if the internal system is not ascii based. At any
rate, the point is that it is a vulnerability to be watched when
porting.
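
One way to sidestep the question of what the i/o system translates is to read the data as a binary stream (for example a file opened with fopen mode "rb" and read with fread), where the C library performs no line-ending translation, and to remove the two byte values by number. A sketch under that assumption; strip_wire_eol is a hypothetical name:

```c
#include <stddef.h>

/* Removes every 0x0D and 0x0A byte from buf[0..len-1] in place,
   keeping the remaining bytes in their original order.
   Returns the new length. */
size_t strip_wire_eol(unsigned char *buf, size_t len)
{
    size_t in, out = 0;

    for (in = 0; in < len; in++) {
        if (buf[in] != 0x0D && buf[in] != 0x0A)
            buf[out++] = buf[in];
    }
    return out;
}
```

This makes no assumption about the values of '\r' and '\n' on the host; it only assumes the bytes in the buffer are exactly those that were sent.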
 
