Please explain the output

Jaspreet

I was recently asked this question in an interview. Unfortunately I was
not able to answer it and the interviewer made a decision on my C
strengths (or weakness) based on this single question and that was a
sad end to my interview. Here is the program:

#include <stdio.h>

int main()
{
    char *c ="abc";
    int *i=(int*)c;
    printf("%x", *i);
    return 0;
}

The output is:
636261.

I know that hex 61 is decimal 97 which is the ASCII code for a. hex 62
is code for b and so on. My query is why is it printing the ascii codes
in the reverse order.

I apologise if this question is a very basic one.
 
Walter Roberson

#include <stdio.h>
int main()
{
char *c ="abc";
int *i=(int*)c;
printf("%x", *i);
return 0;
}
The output is:
636261.

Not in general, it isn't. For example, on the system I'm using,
the output is 61626300.
I know that hex 61 is decimal 97 which is the ASCII code for a. hex 62
is code for b and so on. My query is why is it printing the ascii codes
in the reverse order.

On systems in which the representation of integers is little-endian,
the byte order is sometimes 4321 (and sometimes 2143). The order
that an integer (or float or double) is stored into RAM is often
not the same as the internal processor in-register order.
 
Antonio Contreras

Jaspreet said:
I was recently asked this question in an interview. Unfortunately I was
not able to answer it and the interviewer made a decision on my C
strengths (or weakness) based on this single question and that was a
sad end to my interview. Here is the program:

#include <stdio.h>

int main()
{
char *c ="abc";
int *i=(int*)c;
printf("%x", *i);
return 0;
}

The output is:
636261.

I know that hex 61 is decimal 97 which is the ASCII code for a. hex 62
is code for b and so on. My query is why is it printing the ascii codes
in the reverse order.

I apologise if this question is a very basic one.

In the first place, IIRC, this program invokes undefined behaviour because
it dereferences a pointer that has been converted to another type.
Since behaviour is undefined, the output could be anything. For all you
know there could be no output, or your computer may burn.

Besides that, you're probably on a platform with 32-bit integers that
are stored in memory in big endian format. The string "abc" is stored
in memory:

|'a'|'b'|'c'| 0 |
^
|
|
char *c

Now when you cast c to an int* and dereference it, these four bytes are
interpreted as an integer in big endian format, which means that 0 is
the MSB and 'a' the LSB.
 
Eric Laberge

Jaspreet said:
I was recently asked this question in an interview. Unfortunately I was
not able to answer it and the interviewer made a decision on my C
strengths (or weakness) based on this single question and that was a
sad end to my interview. Here is the program:

#include <stdio.h>

int main()
{
char *c ="abc";
int *i=(int*)c;
printf("%x", *i);
return 0;
}

The output is:
636261.

I know that hex 61 is decimal 97 which is the ASCII code for a. hex 62
is code for b and so on. My query is why is it printing the ascii codes
in the reverse order.

I apologise if this question is a very basic one.

Here's a hint: Big vs little endian.
 
Simon Biber

Jaspreet said:
I was recently asked this question in an interview. Unfortunately I was
not able to answer it and the interviewer made a decision on my C
strengths (or weakness) based on this single question and that was a
sad end to my interview. Here is the program:

#include <stdio.h>

int main()
{
char *c ="abc";

It's best to store pointers to string literals in a const pointer to char.
int *i=(int*)c;

This pointer conversion has undefined behaviour; for one thing, the
pointer to char may not be aligned correctly for an int!
printf("%x", *i);

1. the pointer may be mis-aligned
2. if sizeof(int)>4 then part of the value is uninitialised
3. the value may be a trap representation for int
4. the %x specifies an unsigned int, not a signed int
5. the stdout stream should end with a newline character
return 0;
}

The output is:
636261.

That's just one possible result of the undefined behaviour.
 
Waxhead

Jaspreet said:
I was recently asked this question in an interview. Unfortunately I was
not able to answer it and the interviewer made a decision on my C
strengths (or weakness) based on this single question and that was a
sad end to my interview. Here is the program:

#include <stdio.h>

int main()
{
char *c ="abc";
int *i=(int*)c;
printf("%x", *i);
return 0;
}

The output is:
636261.

The system that produces this output is little-endian (x86). If you
run the same code on a big-endian system (for example, an MC68000
compatible), you will get the output you might expect.
 
Kenneth Brody

Jaspreet said:
I was recently asked this question in an interview. Unfortunately I was
not able to answer it and the interviewer made a decision on my C
strengths (or weakness) based on this single question and that was a
sad end to my interview. Here is the program:
[... program to output *(int *)"abc" in hex ...]
The output is:
636261.

I know that hex 61 is decimal 97 which is the ASCII code for a. hex 62
is code for b and so on. My query is why is it printing the ascii codes
in the reverse order.

I apologise if this question is a very basic one.

Google for "little endian" and "big endian".

(We're ignoring all of the non-portability issues in the program, of
course.)

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
 
Emmanuel Delahaye

Jaspreet wrote on 16/09/05 :
I was recently asked this question in an interview. Unfortunately I was
not able to answer it and the interviewer made a decision on my C
strengths (or weakness) based on this single question and that was a
sad end to my interview. Here is the program:

#include <stdio.h>

int main()
{
char *c ="abc";
int *i=(int*)c;

Undefined behaviour (UB). There is no guarantee that the address of a
string literal is correctly aligned for an int.
printf("%x", *i);

Undefined behaviour. "%x" expects an unsigned int.

A '\n' is missing (unterminated line). The result may appear or not.
return 0;
}

The output is:
636261.

Once the UB's fixed, the behaviour is implementation-dependent (size of
an int, endianness, charset). The result is target-dependent. It can't
be explained without knowing the details of the implementation.
I know that hex 61 is decimal 97 which is the ASCII code for a. hex 62
is code for b and so on. My query is why is it printing the ascii codes
in the reverse order.

Try on a Mac...
I apologise if this question is a very basic one.

This question is idiotic. The interviewer is an idiot. The company that
hires such an idiot is simply going to die. Try another place.

--
Emmanuel
The C-FAQ: http://www.eskimo.com/~scs/C-faq/faq.html
The C-library: http://www.dinkumware.com/refxc.html

"Clearly your code does not meet the original spec."
"You are sentenced to 30 lashes with a wet noodle."
-- Jerry Coffin in a.l.c.c++
 
Walter Roberson

In the first place, IIRC, this program invokes undefined behaviour because
it dereferences a pointer that has been converted to another type.
Since behaviour is undefined, the output could be anything.

-As written- your statement ignores (char *) access, which -is-
legal (but makes no promises about what you get when you read
out internal padding bytes.)

Besides that, you're probably on a platform with 32-bit integers that
are stored in memory in big endian format. The string "abc" is stored
in memory:
|'a'|'b'|'c'| 0 |
^
|
|
char *c
Now when you cast c to an int* and dereference it, these four bytes are
interpreted as an integer in big endian format, which means that 0 is
the MSB and 'a' the LSB.

Just the opposite: "big endian" means that the address is of the
"big end", which is to say the MSB.

The claimed output will occur only for a subset of "little endian" systems.
 
aegis

Simon said:
It's best to store pointers to string literals in a const pointer to char.


This pointer conversion has undefined behaviour; for one thing, the
pointer to char may not be aligned correctly for an int!

In what regard is it undefined behavior? The conversion will produce an
implementation defined value. Subsequent use by dereferencing it may
invoke undefined behavior though.
 
pete

aegis said:
In what regard is it undefined behavior?
The conversion will produce an
implementation defined value.

No.
If the pointer to char is not aligned correctly for an int,
then you have no assignment.
On systems that do have a requirement for alignment,
you can't just write an int value anywhere that it will fit.
It has to be written on an int alignment boundary.
 
aegis

pete said:
No.
If the pointer to char is not aligned correctly for an int,
then you have no assignment.
On systems that do have a requirement for alignment,
you can't just write an int value anywhere that it will fit.
It has to be written on an int alignment boundary.

I think you are confusing initialization with 'assignment'.
Also, I don't see how what you are saying follows.
Do you have a chapter and verse?
 
Keith Thompson

aegis said:
In what regard is it undefined behavior? The conversion will produce an
implementation defined value. Subsequent use by dereferencing it may
invoke undefined behavior though.

C99 6.3.2.3p7:

A pointer to an object or incomplete type may be converted to a
pointer to a different object or incomplete type. If the
resulting pointer is not correctly aligned for the pointed-to
type, the behavior is undefined.

It's likely, but not required, that the string literal "abc" will
happen to be aligned sufficiently for an int. Even if it isn't, it's
likely that no visible error will occur until and unless you attempt
to dereference the resulting pointer-to-int. But it's still UB.
 
Jack Klein

-As written- your statement ignores (char *) access, which -is-
legal (but makes no promises about what you get when you read
out internal padding bytes.)

You are quite wrong, and you snipped too much context to show it. Here
is the OP's code, replaced:
#include <stdio.h>

int main()
{
char *c ="abc";
int *i=(int*)c;
printf("%x", *i);
return 0;
}

The code is defining a string literal, then assigning the address of
that string literal via a cast to a pointer to int, then dereferencing
the int pointer.

That is undefined behavior pure and simple. Memory that is not an int
is being accessed as an int.
 
Kenneth Brody

Keith Thompson wrote:
[...]
char *c ="abc"; [...]
int *i=(int*)c;

This pointer conversion has undefined behaviour; for one thing, the
pointer to char may not be aligned correctly for an int!

In what regard is it undefined behavior? The conversion will produce an
implementation defined value. Subsequent use by dereferencing it may
invoke undefined behavior though.

C99 6.3.2.3p7:

A pointer to an object or incomplete type may be converted to a
pointer to a different object or incomplete type. If the
resulting pointer is not correctly aligned for the pointed-to
type, the behavior is undefined.

It's likely, but not required, that the string literal "abc" will
happen to be aligned sufficiently for an int. Even if it isn't, it's
likely that no visible error will occur until and unless you attempt
to dereference the resulting pointer-to-int. But it's still UB.

It only says that "if the resulting pointer is not correctly aligned"
is the behavior undefined. However, if the pointer is aligned, is
the behavior still "undefined", or is it "implementation defined"?

Suppose you had this:

struct foo
{
char str[4];
int i;
}
c = { "abc" };

...

int *i = (int *)c.str;

Doesn't the standard guarantee that the struct be aligned in such a
way that c.str (the first struct element) must be properly aligned for
an int?

Does this still invoke "undefined behavior"?

 
Kenneth Brody

Emmanuel said:
Jaspreet wrote on 16/09/05 :
I was recently asked this question in an interview. Unfortunately I was [...]
char *c ="abc";
int *i=(int*)c; [...]
The output is:
636261.

Once the UB's fixed, the behaviour is implementation-dependent (size of
an int, endianness, charset). The result is target-dependent. It can't
be explained without knowing the details of the implementation.
I know that hex 61 is decimal 97 which is the ASCII code for a. hex 62
is code for b and so on. My query is why is it printing the ascii codes
in the reverse order.

Try on a Mac...
I apologise if this question is a very basic one.

This question is idiotic. The interviewer is an idiot. The company that
hires such an idiot is simply going to die. Try another place.

Perhaps the interviewer was hoping for an answer along the lines of
"because the printer doesn't have a nasal-daemon font"? (Okay, that's
probably not the answer he was looking for.)

I don't know if I'd call the interviewer an "idiot". It may be that
this company writes device drivers for Windows boxes, and isn't really
interested in portability.

I'd have to see the rest of the questions before I'd pass judgement.

 
Keith Thompson

Emmanuel Delahaye said:
Jaspreet wrote on 16/09/05 :
I was recently asked this question in an interview. Unfortunately I was
not able to answer it and the interviewer made a decision on my C
strengths (or weakness) based on this single question and that was a
sad end to my interview. Here is the program:

#include <stdio.h>

int main()
{
char *c ="abc";
int *i=(int*)c;
printf("%x", *i);
return 0;
}

The output is:
636261.
[...]

This question is idiotic. The interviewer is an idiot. The company that
hires such an idiot is simply going to die. Try another place.

Not necessarily. Even though the behavior is undefined (nasal demons
and all that), it's not unreasonable to ask for an explanation of why
the program might happen to behave in some particular manner. This
kind of analysis can be important in debugging existing code.

I might even ask that kind of question in an interview myself. The
interviewee most likely to get the job would be the one who gave an
answer something like this:

This is horrible code that should never have been written this
way in the first place. The conversion of c to int* potentially
invokes undefined behavior if the string literal isn't properly
aligned for an int. As far as the C language standard is
concerned, there's nothing more to be said. But ...

If the conversion doesn't blow up, the dereference on the following
line may still do so. Assuming that "works", it assumes that
int is 4 bytes (the size of the string literal including the
trailing '\0'); if int is smaller than 4 bytes it won't grab
the whole string literal, and if int is bigger than 4 bytes it
will access memory beyond the string -- undefined behavior again.
The value of *i depends on the character encoding used and on the
byte ordering used for type int. It's possible, but unlikely,
that the value of *i will be a trap representation; if so, this
is undefined behavior again.

"int main()" is acceptable, but "int main(void)" is better.

"i" is a lousy name for a pointer variable. "c" isn't great
either.

The program should write a newline at the end of its output;
if it doesn't, the output may not appear on some systems.

If the output of the program is "636261", you can *probably*
reach the following conclusions:

The system has 8-bit chars and 32-bit ints.

The string literal happens to be aligned properly to be accessed
as an int. This could mean either that it's 4-byte aligned,
or that the system doesn't require 4-byte alignment for ints.
We could find out which by changing the initialization of i to
"(int*)(c+1)" and seeing whether it then blows up.

The system uses an ASCII encoding for characters ('a'==0x61, 'b'==0x62,
'c'==0x63).

The system uses a little-endian representation for integers
(it's likely to be an x86).

If you really want a program that does this (for some reason),
you should shift each byte value into the result. Using memcpy()
would avoid the alignment issue, but not some of the other
portability issues.

If this code occurs in production software, it should be fixed as
soon as possible. Since any bug fix risks introducing new bugs,
the fix will have to go through the full testing cycle before
being released. (You do have a formal testing cycle, right?)


I wouldn't expect an interviewee, especially for an entry-level job,
to pick up on *all* these points, but anyone who can explain
reasonably well both why the code is bad and why it might produce the
observed output is a promising candidate.

Looking at it from the other side, if I were given this question in an
interview, I'd try to cover most of these points (at least until the
interviewer tells me to stop). If this is actually what the
interviewer is looking for, that's a good sign. If he expected me
just to explain why the code produces "636261", assuming that's the
only correct answer, I'm more likely to look elsewhere. On the other
other hand, if he didn't know about the problems with the code but
seems willing to learn, that may be an even better sign.
 
Keith Thompson

Kenneth Brody said:
Keith Thompson wrote:
[...]
char *c ="abc"; [...]
int *i=(int*)c;

This pointer conversion has undefined behaviour; for one thing, the
pointer to char may not be aligned correctly for an int!

In what regard is it undefined behavior? The conversion will produce an
implementation defined value. Subsequent use by dereferencing it may
invoke undefined behavior though.

C99 6.3.2.3p7:

A pointer to an object or incomplete type may be converted to a
pointer to a different object or incomplete type. If the
resulting pointer is not correctly aligned for the pointed-to
type, the behavior is undefined.

It's likely, but not required, that the string literal "abc" will
happen to be aligned sufficiently for an int. Even if it isn't, it's
likely that no visible error will occur until and unless you attempt
to dereference the resulting pointer-to-int. But it's still UB.

It only says that "if the resulting pointer is not correctly aligned"
is the behavior undefined. However, if the pointer is aligned, is
the behavior still "undefined", or is it "implementation defined"?

Suppose you had this:

struct foo
{
char str[4];
int i;
}
c = { "abc" };

...

int *i = (int *)c.str;

Doesn't the standard guarantee that the struct be aligned in such a
way that c.str (the first struct element) must be properly aligned for
an int?

Does this still invoke "undefined behavior"?

The standard doesn't *directly* guarantee that c.str is correctly
aligned for an int, but it might be indirectly guaranteed. I can't
think of a way for a conforming implementation to avoid aligning c.str
properly for an int. This is only because str happens to be the first
member of the struct, and it needs to allow for arrays of struct foo.
It's certainly not something I'd want to depend on. If I really wanted
to force alignment, I'd use a union -- or I'd use memcpy() so I didn't
have to worry about alignment at all.

So yes, if c.str is properly aligned for an int, the conversion
(int*)c.str does *not* invoke undefined behavior. But any attempt to
dereference the resulting int* value could invoke UB (if the
reinterpreted value happens to be a trap representation).
 
Walter Roberson

The standard doesn't *directly* guarantee that c.str is correctly
aligned for an int, but it might be indirectly guaranteed. I can't
think of a way for a conforming implementation to avoid aligning c.str
properly for an int. This is only because str happens to be the first
member of the struct, and it needs to allow for arrays of struct foo.
It's certainly not something I'd want to depend on. If I really wanted
to force alignment, I'd use a union -- or I'd use memcpy() so I didn't
have to worry about alignment at all.

Or, to force alignment, malloc() the space, since malloc() is certain
to use the most restrictive alignment.
 
Tim Rentsch

[snip]
Suppose you had this:

struct foo
{
char str[4];
int i;
}
c = { "abc" };

...

int *i = (int *)c.str;

Doesn't the standard guarantee that the struct be aligned in such a
way that c.str (the first struct element) must be properly aligned for
an int?

Yes, the alignment requirements for a struct are at least as
restrictive as the alignment requirements for each of the
members of the struct; this implication must hold because
of the guarantees made by malloc(). The alignment of the
first struct element matches the alignment of the struct,
as you point out.
 
