Standard function to convert "\t" to '\t' (etc.)?

David Mathog

When a C program is run like this:

prog "1\t2"

The corresponding argv entry will be a null-terminated string holding 4
characters before the terminator: '1' '\' 't' '2'.

Is there a standard function which will convert that to the
null-terminated string holding instead: '1' '\t' '2'?

It should also handle the other C defined characters, \n etc.

Does such a function exist? I searched but the characters and words
were all so common that I could not find the answer. Previously I have
used code which scanned along the string, looking for the backslash, and
then handled the subsequent escaped character. But that was my own
code. Is there a standard function to do this?

Thanks,

David Mathog
 
Phil Carmody

David Mathog said:
When a C program is run like this:

prog "1\t2"

The corresponding argv entry will be a null-terminated string holding 4
characters before the terminator: '1' '\' 't' '2'.

That's a property of the environment in which you are running the
C program. I can think of at least one environment in which your
claim is false.
Is there a standard function which will convert that to the
null-terminated string holding instead: '1' '\t' '2'?

There is not. Why should a function only of use to the early
processing of one particular computer language be of particular
(or any) importance in the C standard library?

Phil
 
David Mathog

Phil said:
That's a property of the environment in which you are running the
C program. I can think of at least one environment in which your
claim is false.

I think you misread what I wrote. In the command line the operator
literally entered characters '1' '\' 't' '2', not '1' '<tab>' '2'. The
first form will work as I described on every OS I have ever used.

I tried coercing sprintf into doing this, but while this converts "\t"
to '\t':

sprintf(string,"1\t2");

this does not, where argv[2] contains the string passed in from the
command line.

sprintf(string, argv[2]);

Apparently the compiler is handling the 2nd argument differently than
the running program does.

Regards,

David Mathog
 
Beej Jorgensen

David Mathog said:
I tried coercing sprintf into doing this, but while this converts "\t"
to '\t':

sprintf(string,"1\t2");
                 ^^
The coercion happens here in the string literal, not in sprintf(). The
"\t" is converted to a tab before sprintf() even sees it. Try:

sprintf(string, "1\\t2");

to get the same effect as this on your system:
sprintf(string, argv[2]);

(which I acknowledge is not what you want.)
Apparently the compiler is handling the 2nd argument differently than
the running program does.

In a way, yes, because the compiler processes string literals in a way
that it does not process arrays of chars.
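To make the distinction concrete, here is a small sketch. The helper names has_tab and has_backslash_t are made up for illustration. It reports whether a string holds a real TAB character or the two characters backslash and 't'. The string literals used with it are translated by the compiler, whereas a string that arrives at run time, such as argv[1], is inspected exactly as delivered.

```c
#include <string.h>

/* Nonzero if s contains a real TAB character. */
static int has_tab(const char *s)
{
    return strchr(s, '\t') != NULL;
}

/* Nonzero if s contains a backslash immediately followed by 't'.
   The literal "\\t" below is itself translated by the compiler into
   those two characters. */
static int has_backslash_t(const char *s)
{
    return strstr(s, "\\t") != NULL;
}
```

For example, a nonzero result from has_backslash_t(argv[1]) would confirm that the shell passed the backslash through untouched. Whether it does depends on the shell and its quoting rules.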

-Beej
 
Flash Gordon

David said:
I think you misread what I wrote. In the command line the operator
literally entered characters '1' '\' 't' '2', not '1' '<tab>' '2'. The
first form will work as I described on every OS I have ever used.

I can certainly coerce my shell into doing it.
I tried coercing sprintf into doing this, but while this converts "\t"
to '\t':

sprintf(string,"1\t2");

The translation of \t here has nothing to do with sprintf; it is the way
string literals are processed.
this does not, where argv[2] contains the string passed in from the
command line.

sprintf(string, argv[2]);

Here there is no string literal.
Apparently the compiler is handling the 2nd argument differently than
the running program does.

See above.

You would also get the expansion with
char *zzz="1\t2";


Normally the correct way to get a TAB as part of a parameter to a
program is whatever the shell you are using gives you as a method of
doing it.

There is no standard function to do what you want, but writing such a
function is not hard, especially as you know it will only shrink the
string or leave it the same length.
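Here is a minimal in-place sketch along those lines. The names simple_escape and unescape_in_place are hypothetical, and only the single-character C escapes are handled. Octal and hex escapes are deliberately left out.

```c
/* Map the character that follows a backslash to the character it
   stands for, or return -1 if this sketch does not recognise it. */
static int simple_escape(char c)
{
    switch (c) {
    case 'a':  return '\a';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    case 'v':  return '\v';
    case '\\': return '\\';
    case '\'': return '\'';
    case '"':  return '"';
    case '?':  return '?';
    default:   return -1;
    }
}

/* Decode simple escapes in place. Each escape is two characters and
   decodes to one, so the write pointer never overtakes the read
   pointer. Returns 0 on success, -1 on an unknown escape or a
   trailing backslash (the string is then left partially decoded). */
static int unescape_in_place(char *s)
{
    char *out = s;
    const char *in;

    for (in = s; *in != '\0'; in++) {
        if (*in != '\\') {
            *out++ = *in;               /* ordinary character: copy */
            continue;
        }
        in++;                           /* step onto escaped character */
        int c = simple_escape(*in);     /* '\0' here also yields -1 */
        if (c < 0)
            return -1;
        *out++ = (char)c;
    }
    *out = '\0';
    return 0;
}
```

The function writes over its source because the output can only shrink. Rejecting unknown escapes is a design choice. Copying the escaped character through unchanged would be an equally reasonable choice.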
 
Nick Keighley

That's a property of the environment in which you are running the
C program. I can think of at least one environment in which your
claim is false.


There is not. Why should a function only of use to the early
processing of one particular computer language

and that "one particular computer language" would be C?

be of particular
(or any) importance in the C standard library?

why would C be of importance to the C library...
 
Phil Carmody

David Mathog said:
I think you misread what I wrote.

Your thinking is flawed.
In the command line the operator
literally entered characters '1' '\' 't' '2',

So he didn't start and finish the argument with a double-quote
character? Strange, as that appears to be what you wrote above.
not '1' '<tab>' '2'.
The first form will work as I described on every OS I have ever used.

That line of thinking is sometimes called "All the world's a VAX".
I tried coercing sprintf into doing this, but while this converts "\t"
to '\t':

We need to get one thing straight: In C, "\t" is a two-character
null-terminated string, and '\t' is a single character. You cannot
convert one to the other in the way that you think. (The conversion
``strip off the final '\0'.'' would fit the bill, but that's not
what you're after.) If you wish to specify a sequence which contains
a literal backslash character followed by a literal t character, then
please do not use either of the two quoting styles whose behaviour
is unambiguously defined by the C standard; use a style which contrasts
with them.
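The string/character distinction can be checked directly. The following is a sketch, and the names tab_string, tab_char and first_char are made up for illustration.

```c
/* "\t" is an array of two chars: the TAB itself and the terminating '\0'. */
static const char tab_string[] = "\t";

/* '\t' is a single character constant; in C it has type int. */
static const int tab_char = '\t';

/* For a one-character string, "stripping off the final '\0'" amounts
   to taking element 0. */
static char first_char(const char *s)
{
    return s[0];
}
```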
sprintf(string,"1\t2");

Here you use "1\t2" in a C-context. Therefore you mean a 4 character
null-terminated string with a tab as the 2nd character. Therefore
you have _not_ converted anything, in particular anything beginning
``1\t2'', to (contain a) tab using sprintf. The C compiler has converted
it during compilation.
this does not, where argv[2] contains the string passed in from the
command line.

sprintf(string, argv[2]);

Apparently the compiler is handling the 2nd argument differently than
the running program does.

There's no 'apparently' about it - their behaviours are *explicitly*
described. The C compiler interprets certain escape sequences embedded
in literal strings. The sprintf function, if processing a %s, will
simply copy characters from a string up to but not including its final
null character.
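Here is a small sketch of the %s behaviour just described. It uses snprintf so that the output buffer cannot overflow, and copy_verbatim is a hypothetical name. Passing the run-time string as the format itself, as in sprintf(string, argv[2]), would instead treat any '%' in it as a conversion specification. That is a separate hazard.

```c
#include <stdio.h>

/* Copy src into dst (dstsz bytes) character for character. With "%s"
   as the format, a backslash arriving at run time is copied like any
   other character; nothing interprets it. Returns snprintf's result,
   i.e. the length of src. */
static int copy_verbatim(char *dst, size_t dstsz, const char *src)
{
    return snprintf(dst, dstsz, "%s", src);
}
```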

Phil
 
Phil Carmody

Nick Keighley said:
and that "one particular computer language" would be C?



why would C be of importance to the C library...

No reason at all. You're apparently confusing writing with running.

If knowledge about the syntax and parsing of C source was important,
there would be an eval() function.

Phil
 
Richard Bos

Phil Carmody said:
No reason at all. You're apparently confusing writing with running.

If knowledge about the syntax and parsing of C source was important,
there would be an eval() function.

Non sequitur.

A function to translate C string literals to internal strings is useful
not only for C interpreters or *shudder* running constructed lines of C
code on-the-fly, but also for compilers of C[1], source code analysers,
and many related programs.
What's more, it would even be useful for programs which have nothing to
do with the parsing of C code itself, but which would like to use C's
escape characters to allow users to input enriched strings. For example,
a pair of escape and unescape functions would be useful for writing
multi-line text fields (think database memo fields, e.g.) to text dumps.

If the _execution_ of C source code were important, there would be an
eval() function. But that is a rather more specific, and very much more
hairy, problem.

The main reasons, I suspect, why these functions don't exist are:
- There is no commonly supported prior art.
- It is relatively simple to write these functions; at least the
unescaping function is simple enough for a beginner's exercise -
the escaping one is harder, because the result may be longer than
the original.
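To show why escaping is the harder direction, here is a sketch of an escaping function. The names escape_for and escape_string are hypothetical, and only newline, TAB, carriage return, backslash and double quote are escaped. The output can be up to twice as long as the input, so the sketch uses a common C idiom: call the function once with a null destination to learn the required length, then call it again to fill a buffer.

```c
#include <stddef.h>

/* Two-character escape for c, or NULL if c is copied as-is. Other
   control characters pass through unchanged in this sketch. */
static const char *escape_for(char c)
{
    switch (c) {
    case '\n': return "\\n";
    case '\t': return "\\t";
    case '\r': return "\\r";
    case '\\': return "\\\\";
    case '"':  return "\\\"";
    default:   return NULL;
    }
}

/* Escape src into dst and return the output length (excluding '\0').
   If dst is NULL, only the length is computed; otherwise dst must have
   room for that many characters plus the terminator. */
static size_t escape_string(char *dst, const char *src)
{
    size_t n = 0;

    for (; *src != '\0'; src++) {
        const char *e = escape_for(*src);
        if (e != NULL) {
            if (dst) { dst[n] = e[0]; dst[n + 1] = e[1]; }
            n += 2;
        } else {
            if (dst) dst[n] = *src;
            n += 1;
        }
    }
    if (dst) dst[n] = '\0';
    return n;
}
```

A caller would do size_t n = escape_string(NULL, s); then allocate n + 1 bytes and call escape_string(buf, s).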

Richard

[1] And these days, many other languages as well, which, granted, was
not much of an argument when the first Standard was written.
 
Phil Carmody

A function to translate C string literals to internal strings is useful
not only for C interpreters or *shudder* running constructed lines of C
code on-the-fly, but also for compilers of C[1], source code analysers,
and many related programs.

Very few programs, then. About 2 of the binaries on my machine here,
out of the few thousand I have installed. Thank you for emphasising
my point.
What's more, it would even be useful for programs which have nothing to
do with the parsing of C code itself, but which would like to use C's
escape characters to allow users to input enriched strings. For example,
a pair of escape and unescape functions would be useful for writing
multi-line text fields (think database memo fields, e.g.) to text dumps.

Why C's mechanism? Why not XML's mechanism? There's almost certainly
more XML-alike data out there being read and parsed, and thus unescaped,
than there is C-string-like data. There's nothing special about C,
stop being so parochial.
The main reasons, I suspect, why these functions don't exist are:
- There is no commonly supported prior art.

This ain't a patent - precisely what do you mean by that?
- It is relatively simple to write these functions;

One could say the same about most mem* and str* functions, yet they
are included in the standard library.

Don't get me wrong, I think the C standard library is far from perfect,
but I think something with as narrow a use as unescaping C strings
would never have stayed for very long in anyone's mind as something which
should be included in it.

Phil
 
Nobody

No reason at all. You're apparently confusing writing with running.

If knowledge about the syntax and parsing of C source was important,
there would be an eval() function.

Non sequitur.

A function to translate C string literals to internal strings is useful
not only for C interpreters or *shudder* running constructed lines of C
code on-the-fly, but also for compilers of C[1], source code analysers,
and many related programs.

Functions to translate Python/Lisp/Bourne-shell/HTML/... literals would
be equally (or even more) useful. The standard library doesn't include
those functions either.

There is no reason why any part of the C compiler belongs in the standard
library.
 
jameskuyper

Phil said:
A function to translate C string literals to internal strings is useful
not only for C interpreters or *shudder* running constructed lines of C
code on-the-fly, but also for compilers of C[1], source code analysers,
and many related programs.

Very few programs, then. About 2 of the binaries on my machine here,
out of the few thousand I have installed. Thank you for emphasising
my point.

There are a lot of other comparably obscure features that did make it
into the standard: large portions of the math library, for instance,
and _Complex.

....
This ain't a patent - precisely what do you mean by that?

Innovation is supposed to be done by implementors; innovations that
become sufficiently popular should then be incorporated into the C
standard - but the standard itself is supposed to standardize, not
innovate. This concept has been violated, sometimes with good reason,
but in general the committee prefers to avoid innovation.
One could say the same about most mem* and str* functions, yet they
are included in the standard library.

Those are in the standard library because on many platforms there are
optimizations that can be made to those functions at the assembly
language level that were unavailable in ordinary C code, at least with
the relatively primitive optimization available in the C compilers in
common use at the time those functions were standardized.
This isn't the case with the string_unescape(), or whatever you'd want
to call it.
 
David Mathog

Gordon said:
In BSD there are non-standard functions strunvis(), strunvisx(),
strvis(), and strvisx() which translate strings from and to a visual
representation of the string. There are several visual representations,
one of them being C-style, and another being octal escapes. strunvis()
decodes all these escapes back into a string. The strvis*() forms
have many options as to what to encode.

It isn't that I didn't already have my own code for this (see below my
signature, includes not shown, it needs limits.h). I just wanted to
know if there might not be a function always present in the C library
that would do this.

Thanks,

David Mathog

#include <limits.h>   /* UCHAR_MAX */

/* Convert text form for special characters in a string to the
   corresponding (unsigned) character, in place. Handles:
     Some C escape sequences: \\, \a, \b, \f, \t, \r, and \n.
     Any other escaped character (e.g. \" or \^) is copied unchanged.
     ASCII control characters like ^J (masks the 2nd character,
       retaining only the lowest 5 bits).
     For a lone "^" use "\^".
     Numerically specified character values as \###, \d###, \o###,
       and \x## (3, 3, 3, and 2 digits, as shown, ONLY). Note that
       \### and \d### are decimal, not octal as in C source.
     Range is 0-255.
   Returns 1 on success, 0 on error (the string may then be left
   partially converted).
*/
int convert_escape(char *string){
    unsigned char *parsed;   /* where the next output character goes */
    unsigned char *scan;     /* the input character being examined   */
#define NORMAL   0
#define ESCAPE   1
#define CONTROL  2
#define DNUMERIC 3
#define ONUMERIC 4
#define XNUMERIC 5
    int state = NORMAL;
    int sum = 0;
    int count = 0;
    int status = 1;
    int ok = 1;
    /* Output is never longer than input, so it overwrites the input. */
    for(scan = parsed = (unsigned char *) string; ok; scan++){
        switch(state){
        case NORMAL:
            switch(*scan){
            case '\\':
                state=ESCAPE;
                break;
            case '^':
                state=CONTROL;
                break;
            case '\0':
                *parsed = *scan; ok = 0;
                break;
            default:
                *parsed=*scan; parsed++;
            }
            break;
        case ESCAPE:
            switch(*scan){
            case '\\':
                state=NORMAL; *parsed=*scan; parsed++;
                break;
            case 'a':
                state=NORMAL; *parsed='\a'; parsed++;
                break;
            case 'b':
                state=NORMAL; *parsed='\b'; parsed++;
                break;
            case 'f':
                state=NORMAL; *parsed='\f'; parsed++;
                break;
            case 't':
                state=NORMAL; *parsed='\t'; parsed++;
                break;
            case 'r':
                state=NORMAL; *parsed='\r'; parsed++;
                break;
            case 'n':
                state=NORMAL; *parsed='\n'; parsed++;
                break;
            case 'd':
                state=DNUMERIC; sum=0; count=0;
                break;
            case 'o':
                state=ONUMERIC; sum=0; count=0;
                break;
            case 'x':
                state=XNUMERIC; sum=0; count=0;
                break;
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                state=DNUMERIC; sum = *scan - '0'; count=1;
                break;
            case '\0':                /* string ends with a lone backslash */
                ok = status = 0;
                break;
            default:                  /* any other escape: copy the character */
                state=NORMAL; *parsed=*scan; parsed++;
            }
            break;
        case CONTROL:
            if(*scan=='\0'){          /* string ends with a lone "^" */
                ok = status = 0;
            }
            else {
                state=NORMAL; *parsed = *scan & 31; parsed++;
            }
            break;
        case DNUMERIC:
            switch(*scan){
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                sum=(10*sum) + (*scan - '0');
                if(++count == 3){ /* There must be exactly 3 digits */
                    if(sum > UCHAR_MAX){
                        ok = status = 0;
                    }
                    else {
                        state = NORMAL; *parsed=sum; parsed++;
                    }
                }
                break;
            default:
                ok = status = 0;
            }
            break;
        case ONUMERIC:
            switch(*scan){
            case '0': case '1': case '2': case '3':
            case '4': case '5': case '6': case '7':
                sum= (8*sum) + (*scan - '0');
                if(++count == 3){ /* There must be exactly 3 digits */
                    if(sum > UCHAR_MAX){
                        ok = status = 0;
                    }
                    else {
                        state = NORMAL; *parsed=sum; parsed++;
                    }
                }
                break;
            default:
                ok = status = 0;
            }
            break;
        case XNUMERIC:
            switch(*scan){
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                sum=(16*sum) + (*scan - '0');
                if(++count == 2){ /* There must be exactly 2 digits */
                    state = NORMAL; *parsed=sum; parsed++;
                }
                break;
            case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
                sum=(16*sum) + (10 + *scan - 'A');
                if(++count == 2){ /* There must be exactly 2 digits */
                    state = NORMAL; *parsed=sum; parsed++;
                }
                break;
            case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
                sum=(16*sum) + (10 + *scan - 'a');
                if(++count == 2){ /* There must be exactly 2 digits */
                    state = NORMAL; *parsed=sum; parsed++;
                }
                break;
            default:
                ok = status = 0;
            }
            break;
        default: /* catch what should be an impossible state */
            ok = status = 0;
        }
    }
    return(status);
}
#undef NORMAL
#undef ESCAPE
#undef CONTROL
#undef DNUMERIC
#undef ONUMERIC
#undef XNUMERIC
 
Richard Bos

Phil Carmody said:
A function to translate C string literals to internal strings is useful
not only for C interpreters or *shudder* running constructed lines of C
code on-the-fly, but also for compilers of C[1], source code analysers,
and many related programs.

Very few programs, then. About 2 of the binaries on my machine here,
out of the few thousand I have installed. Thank you for emphasising
my point.

Very many, compared to the ones in which, e.g., gets() or strncpy() are
the right function to use. If you take into account the fact that C was
originally intended as a systems programming language, not as an
applications programming language, all the more reason why escaping
functions could have been useful in the original C library.
Why C's mechanism? Why not XML's mechanism?
*Brrrrrrr*

There's almost certainly more XML-alike data out there being read and
parsed, and thus unescaped, than there is C-string-like data.

This may be true, although I'm not all that sure; but was it true when
K&R 1 was written, or even when the C89 Standard was?
There's nothing special about C, stop being so parochial.

Nothing special, except that we're talking about the library for C
itself. That sounds like a pretty special exception to me.
This ain't a patent - precisely what do you mean by that?
RTFRationale.


One could say the same about most mem* and str* functions, yet they
are included in the standard library.

Yes, but again, RTFRationale: there _was_ prior art for them, and they
can potentially be written much more efficiently in implementation-
specific code, possibly even built-in machine code. The former is,
AFAIAA, not true for escape/unescape, and the latter is not likely to be
true to any reasonable extent.
Don't get me wrong, I think the C standard library is far from perfect,
but I think something with as narrow a use as unescaping C strings
would never have stayed for very long in anyone's mind as something which
should be included in it.

Oh, I'm not saying that they _should_ have been included. Just that they
_might_, had they been common at the time.

Richard
 
Phil Carmody

Phil Carmody said:
A function to translate C string literals to internal strings is useful
not only for C interpreters or *shudder* running constructed lines of C
code on-the-fly, but also for compilers of C[1], source code analysers,
and many related programs.

Very few programs, then. About 2 of the binaries on my machine here,
out of the few thousand I have installed. Thank you for emphasising
my point.

Very many, compared to the ones in which, e.g., gets() or strncpy() are
the right function to use. If you take into account the fact that C was
originally intended as a systems programming language, not as an
applications programming language, all the more reason why escaping
functions could have been useful in the original C library.

But, as history has _proved_, weren't.
*Brrrrrrr*

See .sig
This may be true, although I'm not all that sure; but was it true when
K&R 1 was written, or even when the C89 Standard was?


Nothing special, except that we're talking about the library for C
itself. That sounds like a pretty special exception to me.

No, we are not. The environment in which a program runs does not
necessarily have _anything_ at all to do with the environment in
which a program is coded.

Maybe I'm biased as I have worked more in embedded systems than
any other environment, but I have _never_ confused the two. If
you just click the "build a bloaty app" button in your compiler,
and then click on the newly created icon that appears on your
desktop, then you might have a different view of the world, I
don't know.
RTFRationale.

No - express yourself clearly and unambiguously.

We're getting somewhere, then.
Yes, but again, RTFRationale: there _was_ prior art for them, and they
can potentially be written much more efficiently in implementation-
specific code, possibly even built-in machine code.

A predicate true of almost any standalone function. Care to present
such an argument in an assembly-language related forum?
The former is,
AFAIAA, not true for escape/unescape, and the latter is not likely to be
true to any reasonable extent.

I detect someone who's never read any Abrash.
Oh, I'm not saying that they _should_ have been included. Just that they
_might_, had they been common at the time.

Holy moley! Stop the presses! There are some functions that aren't
in the standard library that might have been if they had been more
in demand! This is so groundbreaking I can hardly think of what to
type now.
 
Richard Bos

Phil Carmody said:
No - express yourself clearly and unambiguously.

I did. "Prior art" is a specific reason that is given in the Rationale,
in those very words, for why something was or was not included in the
C89 Standard. If you want to discuss this subject, you need to know
those words.
I detect someone who's never read any Abrash.

I don't see how you can draw that conclusion. Literally speaking it's
true, but I've read plenty of discussions about Abrash's measurements,
and AFAICT they more or less agree with the above: memcpy() is simple
enough for machine code optimisations to work (think, e.g., a CPIR
instruction which a CPU might provide), while escape() would be complex
enough that algorithm improvements would overwhelm machine code hackery.
Holy moley! Stop the presses!

Your sarcasm fails to have a point that I can see. The above has been my
point all along; if you only understand that now, that is not because I
have changed my position.

Richard
 
Phil Carmody

Nick Keighley said:

"""
Google ["prior art" site:http://www.cs.man.ac.uk/~pjj/cs211/c_rationale/ ]
[Search]


Search: (*) the web ( ) pages from the UK


Your search - "prior art" site:http://www.cs.man.ac.uk/~pjj/cs211/c_rationale/
- did not match any documents.
"""

But they did let the spider crawl them:
"""
Results 1 - 2 of 2 from www.cs.man.ac.uk/~pjj/cs211/c_rationale for rationale.
(0.24 seconds)

Search Results

1. ANSI C Rationale

Rationale for. American National Standard for Information Systems --
Programming Language -- C. UNIX is a registered trademark of AT&T. ...
www.cs.man.ac.uk/~pjj/cs211/c_rationale/rat.html - Cached - Similar
"""
 
