literal escape sequence conversion to raw

  • Thread starter Walter L. Preuninger II

Walter L. Preuninger II

I need to convert escape sequences entered into my program to the actual
code.

For example, \r becomes 0x0d
I have looked over the FAQ, and searched the web, with no results.
Is there a function that can do this, or do I need to use predefined
constants or a table of the values?

Below is a sample program.
When run with the input w\rx, I want to see the output:

input 77 5c 72 78 a

output 77 d 78

Thanks,



Walter



#include <stdio.h>

#include <stdarg.h>

int main (void)

{

char input[80];

char output[80];

int i;

fgets (input, 79, stdin);

printf ("\ninput ");

for (i = 0; i < strlen (input); i++) {

printf ("%x ", input[i]); }

strcpy (output, "");

sprintf (output, "%s", input);

printf ("\noutput ");

for (i = 0; i < strlen (output); i++) {

printf ("%x ", output[i]); }

exit (0);

}



When run, I want to get this output:

w\rx

input 77 5c 72 78 a

output 77 d 78
 
Ben Pfaff

Walter L. Preuninger II said:
I need to convert escape sequences entered into my program to the actual
code.

For example, \r becomes 0x0d
I have looked over the FAQ, and searched the web, with no results.
Is there a function that can do this, or do I need to use predefined
constants or a table of the values?

Just write a function to do it. You don't have to know the
numeric values that escape sequences map to, because you can use
the escape sequences themselves to do the mapping. e.g.
if (in[0] == '\\') {
    switch (in[1]) {
    case 'n': *out++ = '\n'; break;
    case 'r': *out++ = '\r'; break;
    case 't': *out++ = '\t'; break;
    ...
    }
    ...
}
 
Malcolm

Walter L. Preuninger II said:
I need to convert escape sequences entered into my program to the
actual code.

For example, \r becomes 0x0d
So you want to allow the user to enter a string containing C escape
sequences?
I have looked over the FAQ, and searched the web, with no results.
Is there a function that can do this, or do I need to use predefined
constants or a table of the values?
There is no ANSI function that converts a string containing '\' characters
to the C escaped equivalent. Obviously every compiler contains such a
function, but it isn't publicly available.
Below is a sample program.
When run with the input w\rx, I want to see the output:

input 77 5c 72 78 a

output 77 d 78
Not sure what you are looking for here. Are you saying you want the space
character to escape to a hexadecimal ASCII code? This is possible, but not
very sensible.
#include <stdio.h>
#include <stdarg.h>

int main (void)

{

char input[80];

char output[80];

int i;

fgets (input, 79, stdin);
This is actually not much improvement on gets()? What do you propose to do
on over-long input?
printf ("\ninput ");

for (i = 0; i < strlen (input); i++) {
This is an O(n*n) algorithm. OK, it is only a demonstration program on short
input, but strlen() will be called on every iteration, and each call steps
through the whole string.
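One way to avoid the repeated strlen() calls is to compute the length once before the loop. A minimal sketch, where hex_dump is a hypothetical helper name and the cast through unsigned char is an added precaution rather than something the posted program does:

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes every byte of s to fp as hex, separated by spaces, and
 * returns the number of bytes written out.
 * hex_dump is a hypothetical helper name, not part of the posted program.
 */
size_t hex_dump(FILE *fp, const char *s)
{
    size_t i;
    size_t len = strlen(s);   /* computed once, not on every iteration */

    for (i = 0; i < len; i++) {
        /* convert via unsigned char so the value is never negative */
        fprintf(fp, "%x ", (unsigned int)(unsigned char)s[i]);
    }
    return len;
}
```

An equivalent loop could drop strlen() entirely and run until s[i] is '\0'.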
printf ("%x ", input[i]); }

strcpy (output, "");

sprintf (output, "%s", input);

printf ("\noutput ");

for (i = 0; i < strlen (output); i++) {

printf ("%x ", output[i]); }

exit (0);

}

Try writing a function

void escapestring(char *out, const char *in)
{
}

Which detects C-style escapes and writes the corrected string to out.
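One possible body for such a function, using a switch on the character after the backslash as suggested earlier in the thread. This is only a sketch: the int return value (0 on success, -1 on an unrecognized escape) is an assumption added here, since the stub above returns void, and only a handful of escapes are handled.

```c
/*
 * Copies `in` to `out`, replacing the two-character sequences
 * \n, \r, \t, \a and \\ with the single characters they denote.
 * `out` must have room for strlen(in) + 1 bytes; the result is never
 * longer than the input.
 * Returns 0 on success, -1 on an unrecognized escape (an assumption
 * of this sketch; the original stub returns void).
 */
int escapestring(char *out, const char *in)
{
    while (*in != '\0') {
        if (*in != '\\') {
            *out++ = *in++;          /* ordinary character: copy as-is */
            continue;
        }
        switch (in[1]) {             /* character following the backslash */
        case 'n':  *out++ = '\n'; break;
        case 'r':  *out++ = '\r'; break;
        case 't':  *out++ = '\t'; break;
        case 'a':  *out++ = '\a'; break;
        case '\\': *out++ = '\\'; break;
        default:                     /* also catches a trailing lone backslash */
            *out = '\0';
            return -1;
        }
        in += 2;                     /* skip the backslash and its letter */
    }
    *out = '\0';
    return 0;
}
```

Because the right-hand sides are the compiler's own '\r', '\n' and so on, the function never needs to know the numeric values in the execution character set.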
 
Eric Sosman

Walter L. Preuninger II said:
I need to convert escape sequences entered into my program to the actual
code.

For example, \r becomes 0x0d
I have looked over the FAQ, and searched the web, with no results.
Is there a function that can do this, or do I need to use predefined
constants or a table of the values?

There is no Standard library function to accomplish
this. That's not too surprising, really: the backslash
method of denoting special characters is a convention of
the way C source code is written, not anything intrinsic
in the nature of the special characters themselves. The
translation is provided by the compiler, and is just one
of many compile-time activities that lack run-time analogs.

(Of course, the fact that some operation occurs at
compile time is not a compelling reason not to support it
at run time. For example, the character sequence 314e-2
in C source is compiled into a poor approximation to pi,
and this same transformation is also accomplished by the
run-time strtod() function, among others. 314e-2 is
understandable outside a C context, which may be why its
run-time translation is provided for while the conversion
of \r\n to CR-LF is not -- but even that rationale breaks
down a bit in light of the library's support for blatant
C-isms like 0xA and 012. "No prior art" may be the only
definitive reason -- and "no prior art" may also indicate
that the transformation isn't of wide interest.)

That said, it's pretty easy to perform the translation
yourself if you really need it. Pseudocode:

char *p;
char ch;

for (p = input_string; *p != '\0'; ++p) {
    if (*p != '\\') {
        /* ordinary character represents itself */
        emit_as_output (*p);
    }
    else {
        /* backslash modifies next character */
        ch = translate[ (unsigned char)(*++p) ];
        if (ch != '\0') {
            /* recognized an escape sequence */
            emit_as_output (ch);
        }
        else {
            /* garbage after the backslash */
            complain_bitterly();
            --p; /* restart the scan */
        }
    }
}

The magic is in the translate[] array, which could be
initialized once at the start of the program:

#include <limits.h>
char translate[1+UCHAR_MAX];
...
translate['r'] = '\r';
translate['n'] = '\n';
...
translate['\\'] = '\\';

... thus avoiding any hard-wired assumptions about the numeric
values of these special characters (ASCII is not the world's
only character encoding, you know).
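The pseudocode and table above could be turned into compilable C roughly as follows. The names init_translate and table_unescape are hypothetical, the output goes to a caller-supplied buffer instead of emit_as_output(), and the sketch stops with -1 on a bad escape instead of complaining and rescanning; all of these are assumptions of the sketch, not part of the pseudocode.

```c
#include <limits.h>

/* File-scope array, so every entry starts as 0, meaning "not an escape". */
static char translate[1 + UCHAR_MAX];

/* Fills in the table; call once before table_unescape(). */
void init_translate(void)
{
    translate['a']  = '\a';
    translate['n']  = '\n';
    translate['r']  = '\r';
    translate['t']  = '\t';
    translate['\\'] = '\\';
}

/*
 * Copies `in` to `out`, translating backslash sequences via the table.
 * `out` needs room for strlen(in) + 1 bytes.
 * Returns the number of characters stored (not counting the
 * terminator), or -1 if the character after a backslash is unknown.
 */
long table_unescape(char *out, const char *in)
{
    const char *p;
    char *start = out;

    for (p = in; *p != '\0'; ++p) {
        if (*p != '\\') {
            *out++ = *p;                 /* ordinary character */
        } else {
            char ch = translate[(unsigned char)*++p];
            if (ch == '\0') {            /* unknown escape or trailing backslash */
                *out = '\0';
                return -1;
            }
            *out++ = ch;
        }
    }
    *out = '\0';
    return (long)(out - start);
}
```

Adding another escape then means adding one line to init_translate() and leaving the scanning loop untouched.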
 
Walter L. Preuninger II

Thanks to Michael B Allen, Ben Pfaff and yourself for such a quick response.

My intended program is to scan the OFAC SDN (Specially Designated
Nationals, the "terrorist list") file. But I want the program to work on
differently formatted files. So this question was for the soup bowl option
of allowing the user to specify what terminates a line/record. Some files
are CR, some LF, some are CRLF, and I even know of one text file that is
terminated with 0xFF

So my thought is to provide a command line switch that accepts \r, \a, \r\n,
and hex or octal codes (--delim "\r0xff" etc)
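A rough sketch of how such a delimiter argument might be parsed. It assumes C-style numeric escapes (\xHH with at most two hex digits, and \ooo with at most three octal digits) rather than the 0xff spelling above, and parse_delim and digit_value are hypothetical names:

```c
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Value of digit character c in the given base (8 or 16), or -1. */
static int digit_value(int c, int base)
{
    static const char digits[] = "0123456789abcdef";
    const char *p;
    int v;

    if (c == '\0')
        return -1;                       /* strchr would match the terminator */
    p = strchr(digits, tolower((unsigned char)c));
    if (p == NULL)
        return -1;
    v = (int)(p - digits);
    return v < base ? v : -1;
}

/*
 * Decodes spec into out and returns the number of bytes stored, or -1
 * on a malformed escape. out needs room for strlen(spec) bytes.
 * A length is returned because the delimiter may contain a zero byte.
 */
long parse_delim(unsigned char *out, const char *spec)
{
    long n = 0;

    while (*spec != '\0') {
        int value = 0, count = 0, d;

        if (*spec != '\\') {
            out[n++] = (unsigned char)*spec++;
            continue;
        }
        ++spec;                          /* skip the backslash */
        if (*spec == 'x') {              /* \xHH: one or two hex digits */
            ++spec;
            while (count < 2 && (d = digit_value(*spec, 16)) >= 0) {
                value = value * 16 + d;
                ++spec;
                ++count;
            }
            if (count == 0)
                return -1;
            out[n++] = (unsigned char)value;
        } else if (digit_value(*spec, 8) >= 0) {   /* \ooo: octal */
            while (count < 3 && (d = digit_value(*spec, 8)) >= 0) {
                value = value * 8 + d;
                ++spec;
                ++count;
            }
            if (value > UCHAR_MAX)
                return -1;
            out[n++] = (unsigned char)value;
        } else {
            switch (*spec) {             /* simple one-letter escapes */
            case 'a':  out[n++] = '\a'; break;
            case 'n':  out[n++] = '\n'; break;
            case 'r':  out[n++] = '\r'; break;
            case 't':  out[n++] = '\t'; break;
            case '\\': out[n++] = '\\'; break;
            default:   return -1;        /* includes a trailing backslash */
            }
            ++spec;
        }
    }
    return n;
}
```

With this sketch, a CR followed by 0xFF would be written on the command line as --delim "\r\xff", and CRLF as either "\r\n" or "\015\012".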

The sample code I posted prints the input line correctly. The output line I
showed is what I wanted to see, not what the program actually prints: as
written, its output line simply mirrors the input line.

In theory, the input never goes over 64 characters per line, and I will
detect longer length lines in my code. The program listed was just a test
program, and I never optimize test or proof of concept code.

Thanks for the valuable input!


Walter
 
Kevin Goodsell

Walter said:
I need to convert escape sequences entered into my program to the actual
code.

For example, \r becomes 0x0d

I don't understand. I can think of at least 2 sensible meanings for '\r'
in this context - do you mean that your program actually reads the two
character sequence '\' 'r'? Or that your program reads the character
that C represents by '\r'? Also, when you say it "becomes 0x0d" do you
mean that you want to convert it to the sequence of characters '0', 'x',
'0', 'd' or that you want to map it to the integer value 0x0d (13 in
decimal)? Also, where does the resulting sequence or value come from? As
far as we can tell, 0x0d is completely arbitrary. Is it supposed to be
the value that represents '\r' in some character set? If so, is it a
specific character set (and which one?) or will you use whatever the
execution character set of your implementation is? Note that different
implementations may use different execution character sets (all the
world is not ASCII).

When asking questions here, try to be as precise as possible.
I have looked over the FAQ, and searched the web, with no results.
Is there a function that can do this, or do I need to use predefined
constants or a table of the values?

There is no standard function (or there doesn't seem to be, based on the
possibilities I can think of for what you may be trying to do). If you
are mapping characters to the values that represent them in some
specific character set, you can do so portably only by using a table or
mapping of some sort. If you are using the execution character set then
you can simply use the value of the character that was read (interpreted
as an integer instead of a character).
Below is a sample program.
When run with the input w\rx, I want to see the output:

input 77 5c 72 78 a

output 77 d 78

Thanks,



Walter



#include <stdio.h>

#include <stdarg.h>

All these extra blank lines are annoying. Please don't do that in the
future. I've removed some of them.

You don't seem to be using anything from <stdarg.h>.
int main (void)
{

char input[80];

Please use sane indenting. Bad indenting (or lack of indenting) makes
the code much more difficult to read.
char output[80];
int i;

Judging from how you use 'i', it should probably be a size_t rather than
an int.
fgets (input, 79, stdin);

fgets() expects the entire buffer size as the 2nd argument, and reads no
more than one fewer than the number specified. In other words, unless
you are saving the last character for some reason, you should use 80 as
the second argument. Better yet, use sizeof(input) for easier maintenance.
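A sketch of how the fgets() call might be wrapped so that the full buffer size is passed and over-long lines are detected. read_line is a hypothetical helper, and its return-value convention is an assumption of this sketch:

```c
#include <stdio.h>
#include <string.h>

/*
 * Reads one line from fp into buf (size bytes in total) and strips the
 * trailing newline. Returns 1 for a complete line, 0 if no newline was
 * seen before the buffer filled up (the line was truncated), and -1 on
 * end of file or read error.
 * read_line is a hypothetical helper name.
 */
int read_line(char *buf, size_t size, FILE *fp)
{
    size_t len;

    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';             /* drop the newline */
        return 1;
    }
    /* No newline: the last line of a file may simply lack one. */
    return feof(fp) ? 1 : 0;
}
```

For the posted program the call would be read_line(input, sizeof input, stdin). The sketch does not discard the rest of an over-long line, so the next call would continue reading from the middle of it.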
printf ("\ninput ");

for (i = 0; i < strlen (input); i++) {

You need to #include <string.h> for strlen().
printf ("%x ", input[i]); }


The %x format specifier tells printf to expect an argument of type
unsigned int. Are you absolutely positive that char promotes to unsigned
int on your implementation (and, if so, are you positive that you don't
care about portability)? Note that this would imply that CHAR_MAX >
INT_MAX, making it impossible to implement several standard functions
correctly, and thus is not possible on a hosted implementation.

So in short, you are almost certainly passing the wrong type here. You
can fix it by casting to unsigned int:

printf("%x ", (unsigned int)input[i]);
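One refinement worth considering: where plain char is signed, a byte such as 0xFF holds a negative value, and converting it straight to unsigned int gives a very large number. Going through unsigned char first keeps the result in the range 0 to UCHAR_MAX. A small sketch, with format_hex_byte as a hypothetical helper name and snprintf() assumed to be available (it is C99):

```c
#include <stdio.h>

/*
 * Formats one char as lowercase hex into buf and returns the number of
 * characters written (as snprintf does).
 * format_hex_byte is a hypothetical helper name.
 */
int format_hex_byte(char *buf, size_t size, char c)
{
    /*
     * Convert to unsigned char first: a negative char converted
     * directly to unsigned int would print as a large value
     * (ffffffff where unsigned int is 32 bits).
     */
    return snprintf(buf, size, "%x", (unsigned int)(unsigned char)c);
}
```

The same double cast can be used directly in the printf() call inside the loop.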
strcpy (output, "");

You could say

output[0] = '\0';

instead. You need <string.h> for strcpy(), too.
sprintf (output, "%s", input);
printf ("\noutput ");

for (i = 0; i < strlen (output); i++) {

printf ("%x ", output[i]); }


Same error as with the other printf().
exit (0);

You need to #include <stdlib.h> for exit().

Also, a portable program must terminate (non-empty) text streams with a
newline character. In other words, you should do

printf("\n");

or something equivalent before terminating your program if you aren't
sure that the most recent output to stdout ended with a newline. (An
exception being if you've used freopen() to change stdout to binary
mode. I believe this is only possible in C99, and even then it is
implementation-defined what changes in mode are permitted, and under
what circumstances.)

-Kevin
 
