the usage of sscanf

Da Wang

Hi, all

I am trying to use sscanf to parse the headers for a web server.
According to the requirement, it needs to ignore all the blanks in the
header. For example, all of the following should be equivalent, and the
name and value should be read correctly (giving "Host" and "localhost"):
" Host: localhost "
" Host : localhost "
" Host :localhost "
"Host:localhost"
etc.

I have tried various ways and wrote the following code:
--------
st=sscanf(header, " %[a-zA-Z0-9_-] : %[^ ]" ,name, value);
---------
So far it seems to work. However, it only supports a limited set of
characters, and if I want more I need to add all of them inside the
brackets, which looks awkward. I am wondering if anyone has a better
solution to my problem, and hope you could kindly help me out.

Many thanks.
--
Life is an opportunity to do something.
.-._
o_oo'_)
`._ `._
`, \
//_(_)_/
~~
 
dot

Use a #define with your character set in it...
Use the resulting constant in your code...

#define MY_CS a-zA-Z0-9_-

st = sscanf(header, " %[MY_CS] : %[^ ]" ,name, value)
 
Keith Thompson

Macros aren't expanded in string literals.

I suppose you could do:

#define MY_CS "a-zA-Z0-9_-"
st = sscanf(header, " %[" MY_CS "] : %[^ ]" ,name, value);

but that's just equivalent to:

st = sscanf(header, " %[a-zA-Z0-9_-] : %[^ ]" ,name, value);
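
As a hedged illustration of the literal concatenation described above, the sketch below builds the format string from a macro that is itself a string literal. The names parse_header and NAME_CHARS, the 64-byte buffers, and the %63 field widths are assumptions added here (the widths guard against buffer overruns); they are not part of the original code.

```c
#include <stdio.h>

/* Hypothetical macro name. It must expand to a string literal so that the
   compiler can concatenate it with the adjacent literals below. */
#define NAME_CHARS "a-zA-Z0-9_-"

/* Parses "name : value" into caller-supplied 64-byte buffers.
   Returns the number of fields converted (2 when both were read). */
int parse_header(const char *header, char name[64], char value[64])
{
    /* After concatenation the format is " %63[a-zA-Z0-9_-] : %63[^ ]". */
    return sscanf(header, " %63[" NAME_CHARS "] : %63[^ ]", name, value);
}
```

Calling parse_header(" Host : localhost ", name, value) on a typical ASCII implementation stores "Host" and "localhost" and returns 2. The macro only gives the scanset a name; the format seen by sscanf is unchanged.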
 
Da Wang

Many thanks.

Another question, is there any way to use another form of regular
expression without using the charset?

Thanks in advance again.
 
Chris Torek

Keith Thompson wrote:
[slight editing]
> #define MY_CS "a-zA-Z0-9_-"
> st = sscanf(header, " %[" MY_CS "] : %[^ ]" ,name, value);
> but that's just equivalent to:
> st = sscanf(header, " %[a-zA-Z0-9_-] : %[^ ]" ,name, value);

Da Wang wrote:
> Another question, is there any way to use another form of regular
> expression without using the charset?

No. In fact, scanf does not really do regular expressions at
all -- the character-class %[ conversion is the equivalent of
[class]+ (i.e., one or more characters from the scanset), but no
other regular-expression features are available. (As a result,
the scanf engine does not need the amount of code found in most
RE matchers. The obvious trivial algorithm has linear behavior
and never needs to back up.)
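
To make the "[class]+" point concrete, here is a small sketch. The function name scan_digits and the 32-byte buffer are made up for illustration. It shows that a %[ conversion needs at least one matching character and stops at the first character outside the scanset.

```c
#include <stdio.h>

/* Applies a digits-only scanset to input and returns sscanf's result:
   1 if at least one digit was read, 0 if the first character is not a
   digit (matching failure), EOF if the input is empty. */
int scan_digits(const char *input, char out[32])
{
    out[0] = '\0';
    return sscanf(input, "%31[0123456789]", out);
}
```

With "123abc" the call returns 1 and stores "123"; with "abc" it returns 0 and stores nothing; with "" it returns EOF. There is no way to ask for "zero or more", alternation, or grouping, which is the limitation described above.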
 
Dave Thompson

Da Wang wrote:
> Hi, all
>
> I am trying to use sscanf to parse the headers for a web server.
> According to the requirement, it needs to ignore all the blanks in the
> header. For example, all of the following should be equivalent, and the
> name and value should be read correctly (giving "Host" and "localhost"):
> " Host: localhost "
> " Host : localhost "
> " Host :localhost "
> "Host:localhost"
> etc.

Your requirement is wrong. Treating a header line beginning with
whitespace as a new item violates RFC 2068 syntax, inherited via
RFC 1945 from RFC 822, which makes it a continuation of the preceding
"folded" header. Space after the header name before the colon is also
explicitly forbidden, and I've never seen it used, although it can be
parsed unambiguously under the "liberal receive" principle.
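
A minimal sketch of how a parser might honor the folding rule described above, assuming each line is handed over one at a time. The helper names is_continuation and append_continuation are hypothetical, and joining the pieces with a single space is one reasonable choice rather than something the thread prescribes.

```c
#include <string.h>

/* Returns 1 if the line starts with a space or tab, i.e. it continues
   the previous (folded) header instead of starting a new one. */
int is_continuation(const char *line)
{
    return line[0] == ' ' || line[0] == '\t';
}

/* Appends a continuation line to an existing value, separated by one
   space. cap is the total size of the value buffer.
   Returns 0 on success, -1 if the result would not fit. */
int append_continuation(char *value, size_t cap, const char *line)
{
    size_t used = strlen(value);
    size_t len;

    line += strspn(line, " \t");      /* drop the leading fold whitespace */
    len = strcspn(line, "\r\n");      /* stop before the line terminator */
    if (used + 1 + len + 1 > cap)     /* space + text + terminating NUL */
        return -1;
    value[used++] = ' ';
    memcpy(value + used, line, len);
    value[used + len] = '\0';
    return 0;
}
```

A caller would check is_continuation on each incoming line and, when it returns 1, extend the previous header's value instead of trying to parse a new name.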

> I have tried various ways and wrote the following code:
> st=sscanf(header, " %[a-zA-Z0-9_-] : %[^ ]" ,name, value);

The range syntax a-z etc. is not standard C and thus not guaranteed
portable, but in practice it probably works on all but EBCDIC systems.

This isn't _ignoring_ spaces in the value part; it is terminating the
value at a space. For Host in particular this is OK because a
domain name (or IP address) can't contain whitespace, but this may be
wrong for other header fields.
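
For header fields whose values may contain spaces, one possible alternative (a sketch, not anything proposed in the thread) is to read up to the end of the line and trim trailing blanks by hand. The function name read_value and the 256-byte buffer are assumptions, and the input is assumed to be the text following the colon on a single header line.

```c
#include <stdio.h>
#include <string.h>

/* Reads a header value from the text that follows the colon: skips
   leading whitespace, takes everything up to CR or LF, then removes
   trailing spaces and tabs.
   Returns 1 if a non-empty value was stored, 0 otherwise. */
int read_value(const char *after_colon, char value[256])
{
    size_t len;

    value[0] = '\0';
    if (sscanf(after_colon, " %255[^\r\n]", value) != 1)
        return 0;
    len = strlen(value);
    while (len > 0 && (value[len - 1] == ' ' || value[len - 1] == '\t'))
        value[--len] = '\0';
    return 1;
}
```

Given " Mozilla/5.0 (X11; Linux)  " this stores "Mozilla/5.0 (X11; Linux)", whereas the %[^ ] conversion in the original format would stop after "Mozilla/5.0".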

> So far it seems to work. However, it only supports a limited set of
> characters, and if I want more I need to add all of them inside the
> brackets, which looks awkward. I am wondering if anyone has a better
> solution to my problem, and hope you could kindly help me out.

If you want to accept anything in the header label, except colon and
maybe space (or HWS?) just use %[^:] or %[^ :] etc. If you want to
restrict it to given characters, you have to state those characters
somehow. You might find some systems that allow POSIX-style classes in
a *scanf scanset (as well as a regex) like %[[:alpha:][:digit:]-_] ,
but this isn't required and isn't that much better anyway.
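
The negated-scanset suggestion above could look roughly like the following sketch. The function name split_header, the buffer sizes, and the use of %n to confirm that the colon was actually matched are additions for illustration; tab is excluded alongside space on the assumption that both count as blanks.

```c
#include <stdio.h>

/* Splits "name : value" using negated scansets, so any character other
   than space, tab, or colon is accepted in the name.
   Returns 1 if a name followed by a colon was found, 0 otherwise.
   The value is left empty if nothing follows the colon. */
int split_header(const char *line, char name[64], char value[64])
{
    int after_colon = -1;   /* set by %n only if the ':' was matched */
    int n;

    value[0] = '\0';
    n = sscanf(line, " %63[^ \t:] :%n %63[^ \t\r\n]",
               name, &after_colon, value);
    return n >= 1 && after_colon >= 0;
}
```

The %n check matters because sscanf's return value alone cannot distinguish "Host:" (name, colon, empty value) from "Host" (name with no colon): both yield a count of 1.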

- David.Thompson1 at worldnet.att.net
 
