Spaces in C

Joriveek

Hi,

I have a small piece of a program here.

It copies strings of variable width, reading each one up to the next
comma ",". The input is a CSV (comma-separated) file.

The problem is that it does not handle spaces. For example, the program
below reads the following lines fine:

123, Hello, 3422C
3994,Hii,39948D

Result: fine, works.

But if I have a line like this:

123, Hi How are you,99399 C

Result: it fails to read anything after "Hi" because of the space. Can you
suggest any code changes to the stub below?

Thanks
J
--------------------------
int j = 0;
char c;
c = ptr[0];
ibic[0] = c;
while(c != ',')
{
    ++j;
    c = ptr[j];
    ibic[j] = c;
}
ibic[j] = '\0';
return 0;
-------------------------
 
Eric Sosman

Joriveek wrote on 02/13/06 11:00:
Hi,

I have a little piece of program here

Please post the entire thing -- reduce it to its
essentials if it's long, but post a complete compilable
program. When you are sick, do you take your entire
body to the doctor or just send a lock of your hair?
 
Rod Pemberton

Are you sure the problem is in that stub and not in the routine that reads
and fills 'ptr'?

Rod Pemberton
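
One common way for the buffer to end at the first space is a "%s" conversion in the reading routine, since "%s" stops at whitespace. Whether that is what happens here is only a guess, because the reading code was not posted. A minimal sketch of the difference, using sscanf on a string instead of the real file; first_word and whole_line are made-up names and both assume an output buffer of at least 64 bytes:

```c
#include <stdio.h>

/* Hypothetical helpers, not taken from the original program. */

/* Copies the first whitespace-delimited word of line into out.
   This mimics what a "%s" conversion hands back.
   out must have room for at least 64 bytes. */
int first_word(const char *line, char *out)
{
    return sscanf(line, "%63s", out);
}

/* Copies everything up to the newline, spaces included.
   out must have room for at least 64 bytes. */
int whole_line(const char *line, char *out)
{
    return sscanf(line, "%63[^\n]", out);
}
```

Given "Hi How are you,99399 C", first_word stores only "Hi", while whole_line keeps the full text. When reading from the file itself, fgets(buf, sizeof buf, fp) also returns the whole line with its spaces.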
 
Joriveek

Sorry, it is for reading a CSV file.
If a field contains spaces it does not work; it only reads correctly when
the field is one continuous string.


 
stathis gotsis

Try displaying the contents of ptr[]; maybe there are no spaces in there
either.
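
A minimal sketch of such a check, assuming ptr is a '\0'-terminated string. show_string is a made-up helper name; it prints the text between brackets so blanks are visible and returns how many spaces it saw:

```c
#include <stdio.h>

/* Hypothetical debugging helper: prints s between brackets so that
   leading, trailing and embedded blanks are visible, and returns
   the number of space characters found. */
size_t show_string(const char *s)
{
    size_t spaces = 0;
    const char *p;

    for (p = s; *p != '\0'; p++) {
        if (*p == ' ')
            spaces++;
    }
    printf("[%s] (%lu spaces)\n", s, (unsigned long)spaces);
    return spaces;
}
```

Calling show_string(ptr) just before the copy loop shows whether the spaces ever reached the buffer.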
 
Mark McIntyre

Joriveek said:
I have a little piece of program here

You didn't post enough of your code. The sample you show doesn't make
any sense.

Joriveek said:
Basically what it does is, it copies the strings of variable widths. The
basis is until it finds a comma ",". The input is a CSV/Comma Separated
file.

You could try using strchr or strtok.

Mark McIntyre
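
The strchr suggestion could look roughly like this. It is a sketch only: next_field is a made-up name, it treats every comma as a separator (no quoted fields), and it truncates fields that do not fit:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the field that starts at src into dst
   (capacity dstsize, including the '\0') and returns a pointer to the
   start of the next field, or NULL when this was the last field.
   Over-long fields are truncated. */
const char *next_field(const char *src, char *dst, size_t dstsize)
{
    const char *comma = strchr(src, ',');
    size_t len = comma ? (size_t)(comma - src) : strlen(src);

    if (dstsize == 0)
        return comma ? comma + 1 : NULL;
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return comma ? comma + 1 : NULL;
}
```

Because only the comma ends a field, embedded spaces such as those in " Hi How are you" are copied as they are, including the leading blank.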
 
CBFalconer

Mark said:
you didn't post enough of your code. The sample you show doesn't
make any sense.


You could try using strchr or strtok

Here is a routine I just wrote down, totally untested, and not even
compiled yet. After this bunch gets through criticizing it it
should be bullet proof. Until then beware slippery slopes.

#include <stddef.h>

/* copy over the next token from an input string, after
skipping leading blanks (or other whitespace???). The
token is terminated by the first appearance of tokchar,
or by the end of the source string.
The caller must supply sufficient space in token to
receive any token, Otherwise tokens will be truncated.

Returns: a pointer past the terminating tokchar.

This will happily return an infinity of empty tokens if
called with src pointing to the end of a string. Tokens
will never include a copy of tokchar.
*/


const char *toksplit(const char *src, /* Source of tokens */
                     char tokchar,    /* token delimiting char */
                     char *token,     /* receiver of parsed token */
                     size_t lgh)      /* length token can receive */
                                      /* not including final '\0' */
{
    while (' ' == *src) *src++;

    while (*src && (tokchar != *src)) {
        if (lgh) {
            *token++ = *src;
            --lgh;
        }
        src++;
    }
    if (*src && (tokchar == *src)) src++;
    *token = '\0';
    return src;
} /* toksplit */

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
Michael Mair

CBFalconer said:
Here is a routine I just wrote down, totally untested, and not even
compiled yet. After this bunch gets through criticizing it it
should be bullet proof. Until then beware slippery slopes.

I did not test it either...

const char *toksplit(const char *src, /* Source of tokens */
                     char tokchar,    /* token delimiting char */
                     char *token,     /* receiver of parsed token */
                     size_t lgh)      /* length token can receive */
                                      /* not including final '\0' */
{
    while (' ' == *src) *src++;

ITYM
    while (*src && ' ' == *src) src++;

    while (*src && (tokchar != *src)) {
        if (lgh) {
            *token++ = *src;
            --lgh;
        }

I'd break in an else. Why go through 100000 characters if
five suffice? This may imply a change of the loop structure.

        src++;
    }
    if (*src && (tokchar == *src)) src++;
    *token = '\0';
    return src;
} /* toksplit */

Cheers
Michael
 
CBFalconer

Michael said:
I did not test it either...


ITYM
while (*src && ' ' == *src) src++;

If *src == ' ' then *src is already nonzero, so the extra test is
redundant. It could only matter if ' ' == 0, which conflicts with the
idea that strings are terminated with '\0'.

Michael said:
I'd break in an else. Why go through 100000 characters if
five suffice? This may imply a change of the loop structure.

My attitude is that if a token is over-long and needs to be
truncated, do it and get the pointers set up for the next token.
That way a sequence of calls can always find, say, the third
token. I envision something like:

const char source[] = "Suitable stuff, , make, tokens";
char token[5];
const char *src = source;
int i;
....
for (i = 0; i < 3; i++) {
    src = toksplit(src, ',', token, sizeof(token) - 1);
    process(token);
}

finding the third token, "make". I don't think a source string of
length 10000 is especially likely to occur, so I am prepared for
inefficiencies in dealing with it. This would allow tokens to be
abbreviated to their first four chars.

 
Michael Mair

CBFalconer said:
if *src == ' ' then *src is true, unless ' ' == 0, which conflicts
with the idea that strings are terminated with '\0'.

Argh. Did not think enough about it.
I would have checked all the input parameters and
have prematurely terminated and was somehow still caught
on that track.

Michael said:
I'd break in an else. Why go through 100000 characters if
five suffice? This may imply a change of the loop structure.

CBFalconer said:
My attitude is that if a token is over-long and needs to be
truncated, do it and get the pointers set up for the next token.
That way a sequence of calls can always find, say, the third
token. I envision something like:

const char source[] = "Suitable stuff, , make, tokens";
char token[5];
const char *src = source;
int i;
...
for (i = 0; i < 3; i++) {
    src = toksplit(src, ',', token, sizeof(token) - 1);
    process(token);
}

finding the third token, "make". I don't think a source string of
length 10000 is especially likely to occur, so I am prepared for
inefficiencies in dealing with it. This would allow tokens to be
abbreviated to their first four chars.

I see; I did not follow the discussion but I still would have
gone for a final call to strchr() after the loop rather than
test against lgh all the time.

Cheers
Michael
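
Michael's alternative, copying at most lgh characters and then letting strchr() find the delimiter, might be sketched as follows. This is not CBFalconer's code: toksplit_strchr is a made-up name, and the sketch assumes src is a valid non-NULL string.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the strchr() variant; same interface as toksplit.
   Assumes src is not NULL. */
const char *toksplit_strchr(const char *src, char tokchar,
                            char *token, size_t lgh)
{
    const char *end;

    while (' ' == *src) src++;          /* skip leading blanks */

    /* copy at most lgh characters of the token */
    while (lgh && *src && (tokchar != *src)) {
        *token++ = *src++;
        --lgh;
    }
    *token = '\0';

    /* skip any truncated remainder with one library call */
    end = strchr(src, tokchar);
    if (end == NULL)
        return src + strlen(src);       /* no delimiter: point at the '\0' */
    return *end ? end + 1 : end;        /* past the delimiter, never past '\0' */
}
```

On "Suitable stuff, , make, tokens" with lgh set to 4, three calls yield "Suit", an empty token, and "make", the same sequence CBFalconer describes for his version.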
 
CBFalconer

Michael said:
CBFalconer wrote:
.... snip ...

I did not test it either...

I got around to testing it. Use -DTESTING to compile a test
program with gcc. Without that define you get a linkable module.
The result follows:

/* ------- file toksplit.h ----------*/
#ifndef H_toksplit_h
# define H_toksplit_h

# ifdef __cplusplus
extern "C" {
# endif

#include <stddef.h>

/* copy over the next token from an input string, after
skipping leading blanks (or other whitespace?). The
token is terminated by the first appearance of tokchar,
or by the end of the source string.

The caller must supply sufficient space in token to
receive any token, Otherwise tokens will be truncated.

Returns: a pointer past the terminating tokchar.

This will happily return an infinity of empty tokens if
called with src pointing to the end of a string. Tokens
will never include a copy of tokchar.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
*/

const char *toksplit(const char *src, /* Source of tokens */
                     char tokchar,    /* token delimiting char */
                     char *token,     /* receiver of parsed token */
                     size_t lgh);     /* length token can receive */
                                      /* not including final '\0' */

# ifdef __cplusplus
}
# endif
#endif
/* ------- end file toksplit.h ----------*/


/* ------- file toksplit.c ----------*/
#include "toksplit.h"

/* copy over the next token from an input string, after
skipping leading blanks (or other whitespace?). The
token is terminated by the first appearance of tokchar,
or by the end of the source string.

The caller must supply sufficient space in token to
receive any token, Otherwise tokens will be truncated.

Returns: a pointer past the terminating tokchar.

This will happily return an infinity of empty tokens if
called with src pointing to the end of a string. Tokens
will never include a copy of tokchar.

A better name would be "strtkn", except that is reserved
for the system namespace. Change to that at your risk.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
*/

const char *toksplit(const char *src, /* Source of tokens */
                     char tokchar,    /* token delimiting char */
                     char *token,     /* receiver of parsed token */
                     size_t lgh)      /* length token can receive */
                                      /* not including final '\0' */
{
    if (src) {
        while (' ' == *src) src++;   /* skip leading blanks */

        while (*src && (tokchar != *src)) {
            if (lgh) {               /* copy only while room remains */
                *token++ = *src;
                --lgh;
            }
            src++;
        }
        if (*src && (tokchar == *src)) src++;
    }
    *token = '\0';
    return src;
} /* toksplit */

#ifdef TESTING
#include <stdio.h>

#define ABRsize 6 /* length of acceptable token abbreviations */

int main(void)
{
    char teststring[] = "This is a test, ,, abbrev, more";

    const char *t, *s = teststring;
    int i;
    char token[ABRsize + 1];

    puts(teststring);
    t = s;
    for (i = 0; i < 4; i++) {
        t = toksplit(t, ',', token, ABRsize);
        putchar(i + '1'); putchar(':');
        puts(token);
    }

    puts("\nHow to detect 'no more tokens'");
    t = s; i = 0;
    while (*t) {
        t = toksplit(t, ',', token, 3);
        putchar(i + '1'); putchar(':');
        puts(token);
        i++;
    }

    puts("\nUsing blanks as token delimiters");
    t = s; i = 0;
    while (*t) {
        t = toksplit(t, ' ', token, ABRsize);
        putchar(i + '1'); putchar(':');
        puts(token);
        i++;
    }
    return 0;
} /* main */

#endif
/* ------- end file toksplit.c ----------*/


 
websnarf

Joriveek said:
I have a little piece of program here

Basically what it does is, it copies the strings of variable widths. The
basis is until it finds a comma ",". The input is a CSV/Comma Separated
file.

CSV parsing is a little bit convoluted. CSV files are lines of fields,
which is the classic nested-tokens problem that strtok is so useless at
dealing with. You can find a parser here:

http://www.pobox.com/~qed/bcsv.zip
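
Part of why strtok fits CSV badly: it treats a run of delimiters as a single separator, so empty fields vanish, and it keeps hidden state between calls, so splitting lines and then fields cannot be interleaved. A small sketch of the first point; count_strtok_fields is a made-up helper:

```c
#include <string.h>

/* Hypothetical helper: counts the fields strtok() reports for a
   comma-separated line. strtok() writes into its argument, so the
   line must be a modifiable array, not a string literal. */
int count_strtok_fields(char *line)
{
    int n = 0;
    char *tok;

    for (tok = strtok(line, ","); tok != NULL; tok = strtok(NULL, ","))
        n++;
    return n;
}
```

A line such as "a,,b" has three CSV fields, the middle one empty, but this function returns 2 for it.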
 
