Matching a nonempty sequence of '^' in scanf with the "%[" specifier

regis

Greetings,

about scanf matching nonempty sequences using the "%[" specifier...

"%[^-]" matches a nonempty sequence of anything except '-'
"%[^[]" matches a nonempty sequence of anything except '['
"%[^]]" matches a nonempty sequence of anything except ']'
"%[^^]" matches a nonempty sequence of anything except '^'

"%[-]" matches a nonempty sequence of '-'
"%[[]" matches a nonempty sequence of '['
"%[]]" matches a nonempty sequence of ']'

...but how to match a nonempty sequence of '^'?

"%[^]" is not possible because here ']' is not the closing bracket
but a character in the inverted scanset.

Assuming that '^' is 0136 in octal, then "%[\136" still has the
meaning "%[^" with '^' interpreted as a special character,
so this is not possible either.

"%[^-^]" is not interpreted as matching a nonempty sequence in the
degenerated range {'^', ..., '^'} but as matching anything
except '^' and '-'.

"\^" is not a valid escape sequence...

Is there a solution?
 
P.J. Plauger

about scanf matching nonempty sequences using the "%[" specifier...

"%[^-]" matches a nonempty sequence of anything except '-'
"%[^[]" matches a nonempty sequence of anything except '['
"%[^]]" matches a nonempty sequence of anything except ']'
"%[^^]" matches a nonempty sequence of anything except '^'

"%[-]" matches a nonempty sequence of '-'
"%[[]" matches a nonempty sequence of '['
"%[]]" matches a nonempty sequence of ']'

...but how to match a nonempty sequence of '^' ?
^^*

"%[^]" is not possible because here ']' is not the closing bracket
but a character in the inverted scanset.

Assuming that '^' is 0136 in octal, then "%[\136" still has the
meaning "%[^" with '^' interpreted as a special character,
so this is not possible either.

"%[^-^]" is not interpreted as matching a nonempty sequence in the
degenerated range {'^', ..., '^'} but as matching anything
except '^' and '-'.

"\^" is not a valid escape sequence...

is there a solution ?

^^*

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
Robert Gamble

P.J. Plauger said:
about scanf matching nonempty sequences using the "%[" specifier...

"%[^-]" matches a nonempty sequence of anything except '-'
"%[^[]" matches a nonempty sequence of anything except '['
"%[^]]" matches a nonempty sequence of anything except ']'
"%[^^]" matches a nonempty sequence of anything except '^'

"%[-]" matches a nonempty sequence of '-'
"%[[]" matches a nonempty sequence of '['
"%[]]" matches a nonempty sequence of ']'

...but how to match a nonempty sequence of '^' ?

^^*

I am obviously missing something here, could you elaborate or provide a
complete example that demonstrates this?

Robert Gamble
 
P.J. Plauger

P.J. Plauger said:
about scanf matching nonempty sequences using the "%[" specifier...

"%[^-]" matches a nonempty sequence of anything except '-'
"%[^[]" matches a nonempty sequence of anything except '['
"%[^]]" matches a nonempty sequence of anything except ']'
"%[^^]" matches a nonempty sequence of anything except '^'

"%[-]" matches a nonempty sequence of '-'
"%[[]" matches a nonempty sequence of '['
"%[]]" matches a nonempty sequence of ']'

...but how to match a nonempty sequence of '^' ?

^^*

I am obviously missing something here, could you elaborate or provide a
complete example that demonstrates this?

I was being glib. You talked only about matching the sequence,
not storing it. In that case, "^^*" matches exactly the sequence
you want, and discards it. When I want to match just a sequence of
carets, and store it in a string, I do something dirty like "[\377^]"
or something besides \377 I don't expect to be in the input.
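
A minimal sketch of that sentinel trick, assuming the byte '\377' never occurs in the input. The helper name scan_carets and the 32-byte buffer are made up for illustration; the width 31 is there only so the conversion cannot overrun the buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: stores a leading run of carets from input in out
   (capacity 32 bytes). Returns 1 if at least one caret was matched,
   0 otherwise. */
int scan_carets(const char *input, char out[32])
{
    assert(input != NULL && out != NULL);
    out[0] = '\0';
    /* '\377' is listed first so that '^' is not the first character of
       the scanlist and therefore is not read as the negation marker.
       Assumption: '\377' itself never appears in the input. */
    return sscanf(input, "%31[\377^]", out) == 1;
}
```

Called on "^^^abc" this should leave "^^^" in out; on input that does not start with a caret the conversion fails and out stays empty.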

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
Ben Pfaff

P.J. Plauger said:
P.J. Plauger said:
about scanf matching nonempty sequences using the "%[" specifier...
[...]
I am obviously missing something here, could you elaborate or provide a
complete example that demonstrates this?

I was being glib. You talked only about matching the sequence,
not storing it. In that case, "^^*" matches exactly the sequence
you want, and discards it. [...]

It does? As far as I can tell it only matches those three
characters literally, not a sequence of carets. I don't know
about any special handling of * outside a conversion
specification. Perhaps you can educate me.
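
A small check of that reading, as a sketch: outside a conversion specification the characters '^' and '*' are ordinary and must match the input one for one. The helper name consumed_by_literal is made up for illustration; %n is used only to see whether all three literal characters matched.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the number of characters consumed if the
   whole literal sequence "^^*" matched, or -1 if matching stopped early. */
int consumed_by_literal(const char *input)
{
    int n = -1;
    assert(input != NULL);
    /* '^', '^' and '*' are plain characters here. %n is reached, and n
       is set, only when all three of them matched the input exactly. */
    (void)sscanf(input, "^^*%n", &n);
    return n;
}
```

With the input "^^*rest" this returns 3, while "^^^^" gives -1 because the third input character is '^' rather than '*'.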
 
Robert Gamble

P.J. Plauger said:
P.J. Plauger said:
about scanf matching nonempty sequences using the "%[" specifier...

"%[^-]" matches a nonempty sequence of anything except '-'
"%[^[]" matches a nonempty sequence of anything except '['
"%[^]]" matches a nonempty sequence of anything except ']'
"%[^^]" matches a nonempty sequence of anything except '^'

"%[-]" matches a nonempty sequence of '-'
"%[[]" matches a nonempty sequence of '['
"%[]]" matches a nonempty sequence of ']'

...but how to match a nonempty sequence of '^' ?

^^*

I am obviously missing something here, could you elaborate or provide a
complete example that demonstrates this?

I was being glib. You talked only about matching the sequence,
not storing it. In that case, "^^*" matches exactly the sequence
you want, and discards it.

I am not the OP but what I don't understand is the implied significance
of the asterisk character in your example, could you expand on this?

When I want to match just a sequence of
carets, and store it in a string, I do something dirty like "[\377^]"
or something besides \377 I don't expect to be in the input.

As far as I can tell, it is not possible to match/store a sequence of
only carets with the %[] conversion specifier, do you agree that this
is not possible? If it is possible to match (but not store) a sequence
of one or more carets without using %[] (as you indicate is possible
above) then it would be possible to cleanly obtain the number of
characters matched using a couple of well-placed %n specifiers but so
far I haven't seen any evidence that this is the case.
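
For what it is worth, here is a sketch of the %n idea. It does not avoid %[ but combines the sentinel scanset quoted above with assignment suppression, so it still assumes '\377' never appears in the input. The helper name count_carets is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the number of leading carets in input,
   or 0 if the input does not start with a caret. */
int count_carets(const char *input)
{
    int end = 0;
    assert(input != NULL);
    /* "%*[...]" matches the run without storing it; "%n" then records
       how many characters have been consumed so far. If no caret is
       present, matching fails before %n and end keeps its initial 0. */
    (void)sscanf(input, "%*[\377^]%n", &end);
    return end;
}
```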

Robert Gamble
 
ais523

P.J. Plauger said:
about scanf matching nonempty sequences using the "%[" specifier...

"%[^-]" matches a nonempty sequence of anything except '-'
"%[^[]" matches a nonempty sequence of anything except '['
"%[^]]" matches a nonempty sequence of anything except ']'
"%[^^]" matches a nonempty sequence of anything except '^'

"%[-]" matches a nonempty sequence of '-'
"%[[]" matches a nonempty sequence of '['
"%[]]" matches a nonempty sequence of ']'

...but how to match a nonempty sequence of '^' ?
^^*

"%[^]" is not possible because here ']' is not the closing bracket
but a character in the inverted scanset.

Assuming that '^' is 0136 in octal, then "%[\136" still has the
meaning "%[^" with '^' interpreted as a special character,
so this is not possible either.

"%[^-^]" is not interpreted as matching a nonempty sequence in the
degenerated range {'^', ..., '^'} but as matching anything
except '^' and '-'.

"\^" is not a valid escape sequence...

is there a solution ?

^^*

I think ^^* is an attempt to create a regexp that matches any number of
carets (in which case \^\^* is what is needed), but the %[ specifier
doesn't match regexps (not a standard C concept), only scansets (which
appear similar to regexps). %[^^*] matches anything but carets and
asterisks when in a scanf format string.

To the OP: One slightly extreme solution is to write %[^] followed by
every character in the character set apart from '^' and ']', then a
']'. The main problem with this is the inefficiency, and the handling
of '\0' (which can't be written in the scanset, as it would terminate
the string). However, this is not recommended; I would use strspn to
input the carets followed by a sscanf on the rest of the string to
accomplish a similar effect.
Note also that %[ without a width specifier has the same problem as
gets if used with scanf; it can only be used safely on sscanf (where
you know the length of the input string) or possibly fscanf (if you're
sure you know the contents of the file and nothing but your program can
have modified it).
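
A rough sketch of the strspn approach, with a width on the %[ conversion so the destination cannot overflow. The helper name split_carets, the 16-byte buffer and the choice of "a word ending at a space or newline" for the rest of the line are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts the leading carets in line, then reads the
   word that follows (at most 15 characters) into word. Returns the caret
   count, or 0 if there were no carets or no word after them. */
size_t split_carets(const char *line, char word[16])
{
    size_t carets;
    assert(line != NULL && word != NULL);
    word[0] = '\0';
    /* strspn gives the length of the initial run made only of '^'. */
    carets = strspn(line, "^");
    if (carets == 0)
        return 0;
    /* Width 15 leaves room for the terminating '\0' in a 16-byte buffer. */
    if (sscanf(line + carets, "%15[^ \n]", word) != 1)
        return 0;
    return carets;
}
```

Because the line is already in memory (read with fgets, say), both the caret count and the bounded sscanf work on a string of known length.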
 
regis

ais523 said:
about scanf matching nonempty sequences using the "%[" specifier...

"%[^-]" matches a nonempty sequence of anything except '-'
"%[^[]" matches a nonempty sequence of anything except '['
"%[^]]" matches a nonempty sequence of anything except ']'
"%[^^]" matches a nonempty sequence of anything except '^'

"%[-]" matches a nonempty sequence of '-'
"%[[]" matches a nonempty sequence of '['
"%[]]" matches a nonempty sequence of ']'

...but how to match a nonempty sequence of '^' ?
To the OP: One slightly extreme solution is to write %[^] followed by
every character in the character set apart from '^' and ']', then a
']'. The main problem with this is the inefficiency, and the handling
of '\0' (which can't be written in the scanset, as it would terminate
the string). However, this is not recommended; I would use strspn to
input the carets followed by a sscanf on the rest of the string to
accomplish a similar effect.
Note also that %[ without a width specifier has the same problem as
gets if used with scanf; it can only be used safely on sscanf (where
you know the length of the input string) or possibly fscanf (if you're
sure you know the contents of the file and nothing but your program can
have modified it).

The point of my question is that, in general, when the designers of
some syntax introduce a special character, they also provide a
simple lexical way to get back the literal meaning of that character,
e.g. by backslashing it, by doubling it, or, as is the case for the
examples above, by analysing its position in the scanset.

The designers of scanf seem to have taken care of this for the
special characters '-', '[' and ']' in both scansets and inverted
scansets, but seem to have done only half the work for '^'.
 
P.J. Plauger

P.J. Plauger said:
P.J. Plauger wrote:

about scanf matching nonempty sequences using the "%[" specifier...
[...]
...but how to match a nonempty sequence of '^' ?

^^*

I am obviously missing something here, could you elaborate or provide a
complete example that demonstrates this?

I was being glib. You talked only about matching the sequence,
not storing it. In that case, "^^*" matches exactly the sequence
you want, and discards it. [...]

It does? As far as I can tell it only matches those three
characters literally, not a sequence of carets. I don't know
about any special handling of * outside a conversion
specification. Perhaps you can educate me.

And I promised I wouldn't shoot from the hip for a whole month.
Never mind.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
