Problem using sscanf...

Alex Mathieu

Hi,

Using sscanf, I'm trying to retrieve something, but nothing seems to
work.

Here's the pattern: SS%*sþ0þ%6s
Here's the data: SS000000395000000000DC-þ0þ799829þ1174503725þ

Actually, I would like to retrieve the "799829" from the data, but it
always fails. I thought that "%*sþ0þ" would work as if I were using
"%*21cþ0þ", but it doesn't.

Can someone tell me why?

Regards,

AM.
 
mlimber

Nope, but comp.lang.c might help, or you might consider using strtok
or Boost.Tokenizer or even std::tr1::regex (aka Boost.Regex).

Cheers! --M
 
ablock84

I think your problem is that "%*s" reads the entire rest of the string,
because there is no separator (whitespace, for example) to stop it.
This means that after "%*s" there's nothing left for the literal "þ0þ"
to match, so "%6s" is never reached.

You really should consider using a regular expressions library, as
mlimber already said.

Alex
 
Adrian Hawryluk

Because scanf is greedy: it keeps matching until it cannot match any
more. Your %*s reads in the entire 'word', which means that it reads
until it hits whitespace, and your data contains none. Try /something
like/ this:

"SS%*[^-]-þ0þ%[^þ]%*s"

What this says is: skip (the * means nothing is stored) a string of
every character that is not a '-', match the literal -þ0þ, then read in
a string of everything but 'þ', then skip the rest of the 'word' (this
last step is not necessary if you are using sscanf, since any unread
input in the string is simply ignored).

Note: scanf can be difficult to get right. I also find its return value
of limited use: it tells you how many items were assigned, but not where
the parse stopped in case I want to continue from there.

When I use scanf, I usually put a %n in the format and limit the length
of the strings read in to prevent buffer overflows, like this:

#include <stdio.h>

int main(void) {
    const char *buffer = "SS000000395000000000DC-þ0þ799829þ1174503725þ";

    int byteOffset = 0; // must init, as sscanf will not change it if
                        // matching stops before the %n.
    char stuff[7];
    stuff[sizeof(stuff)-1] = '\0'; // guarantees a terminator even if the
                                   // conversion fails (on success, %[
                                   // writes its own '\0').

    // Note the "%6[^þ]": the width keeps the stuff buffer from overflowing.
    sscanf(buffer, "SS%*[^-]-þ0þ%6[^þ]þ%n", stuff, &byteOffset);
    if (byteOffset != 0) {
        printf("You have read in the string %s.\n", stuff);
    }
    return 0;
}

However, if you shrink stuff, you must change this number too, or
sscanf can write past the end of the array; that is a potential
maintenance problem. To get around it, I would do this:

#include <stdio.h>

// Standard C has no header with these two macros (Boost.Preprocessor
// offers BOOST_PP_STRINGIZE). Two levels are needed so that the
// argument is macro-expanded before # turns it into a string.
#define STRINGIZE2(x) #x
#define STRINGIZE(x) STRINGIZE2(x)

#define LEN 6 // maximum number of characters to read into stuff

int main(void) {
    const char *buffer = "SS000000395000000000DC-þ0þ799829þ1174503725þ";

    int byteOffset = 0; // must init, as sscanf will not change it if
                        // matching stops before the %n.
    char stuff[LEN + 1];
    stuff[LEN] = '\0';  // ensures termination without initialising
                        // the rest of it.

    // "%" STRINGIZE(LEN) "[^þ]" becomes "%6[^þ]". The width counts
    // characters only, so the array needs one more byte for the '\0';
    // to resize the buffer, only LEN has to change.
    sscanf(buffer, "SS%*[^-]-þ0þ%" STRINGIZE(LEN) "[^þ]þ%n",
           stuff, &byteOffset);
    if (byteOffset != 0) {
        printf("You have read in the string %s.\n", stuff);
    }
    return 0;
}

#undef LEN // remove extraneous macros from the global macro namespace
#undef STRINGIZE
#undef STRINGIZE2

Because it is difficult to use, many people choose not to use it.
However, used correctly, it can parse very quickly.

FYI, I wrote this without testing it. There may be errors in the code
posted.

Hope this helps.


Adrian
--
_____________________________________________________________________
\/Adrian_Hawryluk BSc. - Specialties: UML, OOPD, Real-Time Systems\/
\ My newsgroup writings are licensed under a Creative Commons /
\ Attribution-Share Alike 3.0 License /
\_______[http://creativecommons.org/licenses/by-sa/3.0/]______/
\/_______[blog:_http://adrians-musings.blogspot.com/]______\/
 
Alex Mathieu

Yeah, seen this way, this could be part of a solution, and I REALLY
thank you.

The thing is that sscanf is used in a log injector in our systems, so
I only get to specify the pattern and the data to deal with... not very
much latitude. However, your "SS%*[^-]-þ0þ%[^þ]%*s" pattern could help
me for a while.

Actually, my problem is that I want to retrieve info from a data chunk
where the fields are enclosed between "þ" characters, "þ" being used as
a delimiter. With a regex it would be easy to retrieve the data with
something like þ*þ..., but with sscanf... this doesn't seem very
possible...

I'll try to think of a way to retrieve info easily from this
message...

However, thanks a lot for your very complete answer and the time you
took to write it down. It wasn't wasted time; I'll try implementing
that solution on my own for testing purposes :)

Regards,

Alexandre M.
Montréal, Québec
 
Adrian Hawryluk

No problem, Alex, but please don't top-post or quote unnecessarily; it
is considered rude on the newsgroups.

Regarding your more general problem of getting the data between the
delimiters, use something like this:

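// Two-level stringizing, so that a macro argument such as LEN is
// expanded to its value before being turned into a string.
#define STRINGIZE2(x) #x
#define STRINGIZE(x) STRINGIZE2(x)

// x is the max number of chars to read in.
#define RECORD(x) "%" STRINGIZE(x) "[^þ]þ"
#define RECORD_IGNORE "%*[^þ]þ"

char str[8] = {0}; // init entire array to '\0'. Slightly more
                   // overhead than just init the last one to '\0', but
                   // for small non-time-critical applications, it
                   // should be fine.
// Skip the first field, read at most 7 chars of the second, skip the third.
sscanf(buffer, RECORD_IGNORE RECORD(7) RECORD_IGNORE, str);

- or -

#define LEN 7
char str[LEN+1] = {0};
sscanf(buffer, RECORD_IGNORE RECORD(LEN) RECORD_IGNORE, str);

- *don't do* -

#define DIM 8
char str[DIM] = {0};
sscanf(buffer, RECORD_IGNORE RECORD(DIM-1) RECORD_IGNORE, str);

as your format string will become "%*[^þ]þ%8-1[^þ]þ%*[^þ]þ": the
preprocessor does not evaluate DIM-1, and "%8-1[^þ]" is not a valid
conversion specification.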

Good luck.


Adrian