oddness with strcpy

jackassplus

I'm sure that strcpy has been beaten to death, but:

I have a program:
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]){

    if (argc > 1){
        if (strlen(argv[1])!=3){
            printf("Argument must be 3 characters long");
            return 1; //arbitrary value
        }
        char temp[40] = {}; //this should be large enough
        strcat(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));
        printf("%s\n",temp);
        strcat(temp,"[0][0-9]{4}");
        printf("%s\n",temp);
    }
    return 0;
}

output :

Q:\DJGPP\test>test WEM
[A-Z]{2}[0-9]{3}WEM
[A-Z]{2}[0-9]{3}WEMEM

The first line is how I expect it, the second is not. Could anyone
shed light on why this is not working as I expect it to?
 
Dominik Friedrichs

> I'm sure that strcpy has been beaten to death, but:
> if (argc > 1){
> if (strlen(argv[1])!=3){
> printf("Argument must be 3 characters long");
> return 1; //arbitrary value
> }
> char temp[40] = {}; //this should be large enough
> strcat(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));
> printf("%s\n",temp);
> strcat(temp,"[0][0-9]{4}");
> printf("%s\n",temp);
> }
> return 0;
> }

I don't see any strcpy in your code...
Change
char temp[40] = {};
to
char temp[40] = {0};
or
char temp[40] = "";

or even better, replace your first strcat with strcpy.
 
Nelu

> I'm sure that strcpy has been beaten to death, but:
>
> I have a program:
> #include <stdio.h>
> #include <string.h>
>
> int main(int argc, char *argv[]){
>
> if (argc > 1){
> if (strlen(argv[1])!=3){
> printf("Argument must be 3 characters long");
> return 1; //arbitrary value
> }
> char temp[40] = {}; //this should be large enough

I assume C99, since C90 supports neither // comments nor declarations in
the middle of a block. (Empty initializer braces are not standard C99
either; compilers such as GCC accept them as an extension.)

> strcat(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));
> printf("%s\n",temp);

You have a problem here: you are trying to concatenate a string onto a
string literal. You can't do that. strcat will not allocate memory to fit
the two strings. It will try to append the second string to the first,
which means that the first parameter needs to have enough memory to hold
both strings, and it cannot be either a constant or a string literal.

> strcat(temp,"[0][0-9]{4}");
> printf("%s\n",temp);
> }
> return 0;
> }
>
> output :
>
> Q:\DJGPP\test>test WEM
> [A-Z]{2}[0-9]{3}WEM
> [A-Z]{2}[0-9]{3}WEMEM

Crashes on mine.

> The first line is how I expect it, the second is not. Could anyone shed
> light on why this is not working as I expect it to?

You shouldn't expect the first one either.
 
jackassplus

> I don't see any strcpy in your code...

Sorry, it's still early...I meant strcat().

> Change
> char temp[40] = "";

did this...

> or even better, replace your first strcat with strcpy.

and this...
but get the same result.

>   char temp[40] = "[A-Z]{2}[0-9]{3}";
>   strcat(temp, argv[1]);
>   printf("%s\n",temp);
>
> gives [A-Z]{2}[0-9]{3}WEM
>
> If that's what you want...

That's fine, but why does

strcat(temp,"[0][0-9]{4}");
give me [A-Z]{2}[0-9]{3}WEMEM
instead of [A-Z]{2}[0-9]{3}WEM[0][0-9]{4}?
 
Dominik Friedrichs

> I don't see any strcpy in your code...
> Sorry, it's still early...I meant strcat().
> Change
> char temp[40] = ""; did this...
> or even better, replace your first strcat with strcpy.
> and this...
> but get the same result.
>
> char temp[40] = "[A-Z]{2}[0-9]{3}";
> strcat(temp, argv[1]);
> printf("%s\n",temp);
>
> gives [A-Z]{2}[0-9]{3}WEM

This works fine for me:

char arg[] = "WEM";

char temp[40] = "[A-Z]{2}[0-9]{3}";
strcat(temp, arg);
strcat(temp,"[0][0-9]{4}");
printf("%s\n",temp);

// [A-Z]{2}[0-9]{3}WEM[0][0-9]{4}
 
jameskuyper

> I'm sure that strcpy has been beaten to death, but:
>
> I have a program:
> #include <stdio.h>
> #include <string.h>
>
> int main(int argc, char *argv[]){ ....
> char temp[40] = {}; //this should be large enough
> strcat(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));

The call to strcat("[A-Z]{2}[0-9]{3}",argv[1]) has undefined behavior.
The standard explicitly tells you that you can't count on being able
to write to a string literal. However, you didn't just write to the
string literal; you're writing past the end of the string literal,
which is worse. This might or might not be the cause of your problem,
but I'd strongly recommend fixing this before worrying about any other
issue.

Here's a much simpler approach, which avoids the undefined behavior:

char temp[40] = "[A-Z]{2}[0-9]{3}";
strcat(temp, argv[1]);
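
For readers who want the whole pattern built in one step, here is a minimal sketch that uses snprintf so the destination size is always checked. The helper name build_pattern and its return convention are assumptions made for illustration; they do not come from the thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "[A-Z]{2}[0-9]{3}" + code + "[0][0-9]{4}"
 * into out, which has room for cap bytes.
 * Returns 0 on success, -1 if code is not exactly 3 characters long
 * or if the result (including the terminating '\0') would not fit. */
int build_pattern(char *out, size_t cap, const char *code)
{
    if (code == NULL || strlen(code) != 3)
        return -1;

    /* snprintf never writes more than cap bytes and reports the length
     * the full result would have had. */
    int n = snprintf(out, cap, "[A-Z]{2}[0-9]{3}%s[0][0-9]{4}", code);
    if (n < 0 || (size_t)n >= cap)
        return -1;

    return 0;
}
```

Called as build_pattern(temp, sizeof temp, argv[1]) with the 40-byte temp from the original program, the 30-character result fits with room to spare, and no string literal is ever written to.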
 
jackassplus

> You have a problem where you are trying to concatenate a string to a
> string literal. You can't do that. strcat will not allocate memory to fit
> the two strings. It will try to append the second string to the first
> which means that the first parameter needs to have enough memory to hold
> both strings and it cannot be either a constant or a string literal.

So would this be why
char temp[40] = "[A-Z]{2}[0-9]{3}";
strcat(temp,argv[1]);
strcat(temp,"[0][0-9]{4}");
works and
char temp[40] = "";
strcpy(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));
strcat(temp,"[0][0-9]{4}");
does not?

To be honest, I don't see the difference.
 
jackassplus

> I'm sure that strcpy has been beaten to death, but:
> I have a program:
> #include <stdio.h>
> #include <string.h>
> int main(int argc, char *argv[]){ ...
>            char temp[40] = {}; //this should be large enough
>            strcat(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));
>
> The call to strcat("[A-Z]{2}[0-9]{3}",argv[1]) has undefined behavior.
> The standard explicitly tells you that you can't count on being able
> to write to a string literal. However, you didn't just write to the
> string literal; you're writing past the end of the string literal,
> which is worse. This might or might not be the cause of your problem,
> but I'd strongly recommend fixing this before worrying about any other
> issue.
>
> Here's a much simpler approach, which avoids the undefined behavior:
>
>     char temp[40] = "[A-Z]{2}[0-9]{3}";
>     strcat(temp, argv[1]);

I think I see now,
a string "cat" has length 4
If I tried to append an 's', it would try to shove 5 chars into a
memory allocation of 4 bytes
whereas
char cat[5] = "cat"
is big enough to hold "cats"

Am I correct?
 
Nelu

> You have a problem where you are trying to concatenate a string to a
> string literal. You can't do that. strcat will not allocate memory to
> fit the two strings. It will try to append the second string to the
> first which means that the first parameter needs to have enough memory
> to hold both strings and it cannot be either a constant or a string
> literal.
> So would this be why
> char temp[40] = "[A-Z]{2}[0-9]{3}";

temp can hold 40 characters per your declaration, and the content only
occupies 16 + 1 ('\0'). It has space for 23 more characters.

> strcat(temp,argv[1]);
> strcat(temp,"[0][0-9]{4}");
> works and
> char temp[40] = "";
> strcpy(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));

"[A-Z]{2}[0-9]{3}" can hold 16 + 1. That's it. There's no more space for
anything else. Also, you are not allowed to modify string literals. The
standard says that it results in undefined behavior. That's why on your
machine it gave you bad output and it crashed on mine. It is conceivable
that it may output a correct result on a different machine, but nobody
can tell.

> strcat(temp,"[0][0-9]{4}");
> does not?

This works. The problem is in the previous line, in the strcat call
nested inside the strcpy.
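
The space arithmetic above (40 bytes total, 17 used, 23 free) can be checked mechanically. The following is only a sketch; room_left is a made-up helper name, and it assumes the buffer already holds a properly terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of additional characters that can still be
 * appended to the string stored in buf, a buffer of cap bytes, while
 * keeping room for the terminating '\0'.
 * Assumes buf holds a terminated string. */
size_t room_left(const char *buf, size_t cap)
{
    size_t used = strlen(buf) + 1;   /* characters plus the terminator */
    return used <= cap ? cap - used : 0;
}
```

For char temp[40] = "[A-Z]{2}[0-9]{3}"; the call room_left(temp, sizeof temp) yields 23. The literal itself is an array of exactly sizeof "[A-Z]{2}[0-9]{3}" == 17 bytes, so it has no spare room at all.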
 
jameskuyper

> You have a problem where you are trying to concatenate a string to a
> string literal. You can't do that. strcat will not allocate memory to fit
> the two strings. It will try to append the second string to the first
> which means that the first parameter needs to have enough memory to hold
> both strings and it cannot be either a constant or a string literal.
>
> So would this be why
> char temp[40] = "[A-Z]{2}[0-9]{3}";
> strcat(temp,argv[1]);
> strcat(temp,"[0][0-9]{4}");
> works and
> char temp[40] = "";
> strcpy(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));
> strcat(temp,"[0][0-9]{4}");
> does not?
>
> To be honest, I don't see the difference.

The string literal "[A-Z]{2}[0-9]{3}" causes an unnamed array of 17
characters to be created. When used as an initializer for temp, that
array is copied over to the 'temp' array (arguably, the unnamed array
doesn't even have to exist in this case, since there's no way for a
strictly conforming program to determine whether or not it exists).

However, when that string literal is used in the context of your strcat()
call, it has a value which is a pointer to the first byte of that
unnamed array. According to 6.4.5p6 "If the program attempts to modify
such an array, the behavior is undefined." This allows, for instance,
for the unnamed array to be stored in read-only memory.

However, your program does not merely attempt to overwrite the
terminating '\0' character of the unnamed array with 'W', it then goes
on to try to write the characters 'E', 'M', and '\0' to the next three
bytes in memory. It is undefined behavior to attempt to write to those
memory locations. In principle, this means that just about anything
could happen.

In practice, one of the more likely possibilities is that those three
bytes of memory have already been reserved for some other purpose, and
when they get overwritten they cause other parts of the program to
malfunction. Those bytes are not part of the unnamed array, but that's
just about the only thing you know for sure about them. The single
most likely possibility is that they overlap the memory used for the
unnamed array corresponding to the "[0][0-9]{4}" string literal. As a
result, the pointer whose value is passed to your final strcat() call
would point at memory which now contains the string "EM". That would
explain your results perfectly.
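
The overlap scenario described above can be imitated with fully defined behavior by placing both strings back to back in one writable array. This is only a sketch: the function name simulate and the adjacent layout are assumptions for illustration, and nothing guarantees that any particular compiler lays out its string literals this way.

```c
#include <string.h>

/* Illustration only: two strings stored back to back in one writable array,
 * standing in for two string literals that happen to be adjacent in memory.
 * Writes the two lines the original program would print into line1 and line2.
 * Returns 0 on success, -1 if arg is not exactly 3 characters long. */
int simulate(const char *arg, char line1[40], char line2[40])
{
    char pool[] = "[A-Z]{2}[0-9]{3}\0[0][0-9]{4}";
    char *first  = pool;                      /* 16 characters + '\0' */
    char *second = pool + strlen(first) + 1;  /* begins at offset 17  */
    char temp[40] = "";

    if (strlen(arg) != 3)
        return -1;

    strcat(temp, strcat(first, arg));  /* the append spills into second */
    strcpy(line1, temp);
    strcat(temp, second);              /* second no longer holds its original text */
    strcpy(line2, temp);
    return 0;
}
```

With arg set to "WEM", the inner strcat overwrites the first terminator with 'W' and then stores 'E', 'M' and '\0' in the first three bytes of the second string, so second now reads "EM" and line2 ends in "WEMEM", matching the reported output.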
 
Spiros Bousbouras

> I'm sure that strcpy has been beaten to death, but:
> I have a program:
> #include <stdio.h>
> #include <string.h>
> int main(int argc, char *argv[]){ ...
> char temp[40] = {}; //this should be large enough
> strcat(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));
> The call to strcat("[A-Z]{2}[0-9]{3}",argv[1]) has undefined behavior.
> The standard explicitly tells you that you can't count on being able
> to write to a string literal. However, you didn't just write to the
> string literal; you're writing past the end of the string literal,
> which is worse. This might or might not be the cause of your problem,
> but I'd strongly recommend fixing this before worrying about any other
> issue.
> Here's a much simpler approach, which avoids the undefined behavior:
> char temp[40] = "[A-Z]{2}[0-9]{3}";
> strcat(temp, argv[1]);
>
> I think I see now,
> a string "cat" has length 4
> If I tried to append an 's', it would try to shove 5 chars into a
> memory allocation of 4 bytes
> whereas
> char cat[5] = "cat"
> is big enough to hold "cats"
>
> Am I correct?

Yes, but a compiler is also allowed to place a string
literal in read-only memory, so writing to it may cause
your programme to crash. When you do
char cat[5] = "cat" it's ok because an array is supposed
to be writable, but when you do
strcat("[A-Z]{2}[0-9]{3}",argv[1]) the compiler might
put "[A-Z]{2}[0-9]{3}" in read-only memory.
In other words

char *p = "abc" ;
p[0] = 0 ;

is undefined behavior but

char p[] = "abc" ;
p[0] = 0 ;

is ok.
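
The "cat" versus "cats" case from the question can be written out as a small sketch. The function name make_cats is made up for illustration.

```c
#include <string.h>

/* Writes "cats" into out, which must have room for at least 5 bytes. */
void make_cats(char *out)
{
    /* A writable array initialized from the literal: 3 characters, the
     * terminating '\0', and one spare byte. */
    char word[5] = "cat";

    strcat(word, "s");   /* fits exactly: 4 characters + '\0' == 5 bytes */
    strcpy(out, word);
}
```

The literal "cat" on its own is an array of sizeof "cat" == 4 bytes, which is why appending to it directly has nowhere to put the extra character. Declaring pointers to literals as const char * lets the compiler diagnose accidental writes such as p[0] = 0.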
 
Willem

(e-mail address removed) wrote:
) I'm sure that strcpy has been beaten to death, but:
)
) I have a program:
) #include <stdio.h>
) #include <string.h>
)
) int main(int argc, char *argv[]){
)
) if (argc > 1){
) if (strlen(argv[1])!=3){
) printf("Argument must be 3 characters long");
) return 1; //arbitrary value
) }
) char temp[40] = {}; //this should be large enough
) strcat(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));

You want to be using: strcat(strcat(temp, "[A-Z]{2}[0-9]{3}"),argv[1]);

The way it is now, it's UB.

(I've got a pretty good idea why exactly you're getting the output
you are getting, but that is very very compiler-specific.)

) printf("%s\n",temp);
) strcat(temp,"[0][0-9]{4}");
) printf("%s\n",temp);
) }
) return 0;
) }
)
) output :
)
) Q:\DJGPP\test>test WEM
) [A-Z]{2}[0-9]{3}WEM
) [A-Z]{2}[0-9]{3}WEMEM
)
) The first line is how I expect it, the second is not. Could anyone
) shed light on why this is not working as I expect it to?


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Paul Hsieh

> I'm sure that strcpy has been beaten to death, but:
>
> I have a program:
> #include <stdio.h>
> #include <string.h>
>
> int main(int argc, char *argv[]){
>
>         if (argc > 1){
>                 if (strlen(argv[1])!=3){
>                         printf("Argument must be 3 characters long");
>                         return 1; //arbitrary value
>                 }
>                 char temp[40] = {}; //this should be large enough

This is a C++/C99-ism. Many compilers will barf on this.

>                 strcat(temp,strcat("[A-Z]{2}[0-9]{3}",argv[1]));

strcat(<string constant>,string) is not well defined. The reason you
are having a problem is that you put a string between two quotes
and are trying to concatenate to it. C just doesn't have a simple way of
doing this. One way to deal with this is to combine this statement
with the one above it:

char temp[40] = "[A-Z]{2}[0-9]{3}";
strcat (temp, argv[1]);

But then you have an implicit safety constraint of requiring some
length limits on argv[1] which is not obvious from just looking at
that code. There are these new _s functions like strcat_s() but those
exist in even fewer compilers and are time consuming to use for
insufficient benefit.

>                 printf("%s\n",temp);
>                 strcat(temp,"[0][0-9]{4}");

It's even harder to see if this guy is well defined.

>                 printf("%s\n",temp);
>         }
>         return 0;
>
> }
>
> output :
>
> Q:\DJGPP\test>test WEM
> [A-Z]{2}[0-9]{3}WEM
> [A-Z]{2}[0-9]{3}WEMEM
>
> The first line is how I expect it, the second is not. Could anyone
> shed light on why this is not working as I expect it to?

The behavior is ill-defined and that's all.

If you use the Better String Library ( http://bstring.sf.net ) you
would write this program as:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* for strlen */
#include "bstrlib.h"

int main (int argc, char * argv[]) {
    if (1 < argc) {
        if (3 != strlen (argv[1])) {
            printf ("Argument must be 3 characters long\n");
            return EXIT_FAILURE;
        }
        /* a bstring grows as needed, so no fixed-size buffer is declared */
        bstring b = bfromcstr ("[A-Z]{2}[0-9]{3}");
        if (BSTR_OK == bcatcstr (b, argv[1]))
            printf ("%s\n", bdata (b));
        if (BSTR_OK == bcatcstr (b, "[0][0-9]{4}"))
            printf ("%s\n", bdata (b));
        bdestroy (b);
    }
    return EXIT_SUCCESS;
}

Notice how there is no issue with overwriting a string literal, no
hint of magic length constants (like [40]). The syntactic and
semantic limitations on bstrings are in synch, so you cannot run into
the same sorts of problems that you run into when trying to use raw C
strings.
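
For anyone staying with raw C strings, the implicit length constraint mentioned above can be made explicit with a small checked append written in standard C only. This is a sketch; append_checked is a hypothetical helper, not a standard or library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a standard function: appends src to the string
 * in dst only if the result, including the terminating '\0', fits in the
 * cap bytes available at dst.
 * Returns 0 on success, -1 (leaving dst unchanged) otherwise. */
int append_checked(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst);
    size_t add  = strlen(src);

    if (used >= cap || add > cap - used - 1)   /* no room for src and '\0' */
        return -1;

    memcpy(dst + used, src, add + 1);          /* copies the terminator too */
    return 0;
}
```

Used in place of the two strcat calls, append_checked(temp, sizeof temp, argv[1]) followed by append_checked(temp, sizeof temp, "[0][0-9]{4}") would refuse an oversized argument instead of overrunning temp.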
 
Old Wolf

> The call to strcat("[A-Z]{2}[0-9]{3}",argv[1]) has undefined behavior.
> The standard explicitly tells you that you can't count on being able
> to write to a string literal. However, you didn't just write to the
> string literal; you're writing past the end of the string literal,
> which is worse.

Behaviour worse than undefined behaviour?
That's pretty bad..:)
 
James Kuyper

Old Wolf said:
> The call to strcat("[A-Z]{2}[0-9]{3}",argv[1]) has undefined behavior.
> The standard explicitly tells you that you can't count on being able
> to write to a string literal. However, you didn't just write to the
> string literal; you're writing past the end of the string literal,
> which is worse.
>
> Behaviour worse than undefined behaviour?
> That's pretty bad..:)

Theoretically, all undefined behavior is equivalent. In practical terms,
some ways of making the behavior undefined are more likely than others
to cause actual problems.
 
