
I was experimenting with pointer manipulation and decided to try converting an array of numbers into an integer by directly copying from memory using memcpy.

char aux[4] = {1,2,3,4}; 
int aux2 = 0;
memcpy((char*) &aux2, &aux[0], 4);
printf("%X", aux2);

I expected the result to be 0x1020304 since I'm copying the exact bytes from one to another, but printf gives me the result 0x4030201, which is almost my desired output, only backwards. Why does this happen and is there a way to get the result in the "correct" order?

  • Endianness Commented Feb 9, 2021 at 18:55
  • You expected wrong. Your CPU (ISA) uses a different order. Commented Feb 9, 2021 at 18:57
  • You're on a little-endian architecture, where the least significant bytes come first in memory (at lower addresses). Commented Feb 9, 2021 at 18:57
  • %X is only for printing unsigned int -- you should make aux2 unsigned Commented Feb 9, 2021 at 20:52
  • I wrote an FAQ answer about this the other week here: What is CPU endianness? Commented Feb 10, 2021 at 9:29

1 Answer


Your code has, at best, implementation-defined behavior, and in some cases undefined behavior.

Type int may have a size different from 4: on 16-bit systems, int typically has a size of only 2 bytes. On such systems, copying 4 bytes into aux2 writes past the end of the object, which has undefined behavior.

On most current 32- and 64-bit systems, int has 4 bytes, but the order in which those 4 bytes are stored in memory is implementation-defined, an issue known as endianness:

  • some systems use big-endian representation, where the first byte is the most significant part of the integer. Bytes 01 02 03 04 represent the value 0x01020304 on big-endian systems, such as older Macs, some mobile phones and embedded systems.

  • conversely, most personal computers today use little-endian representation, where the first byte contains the least significant part of the integer. Bytes 01 02 03 04 represent the value 0x04030201 on little-endian systems, such as yours.

  • The C Standard does not exclude other representations, where bytes would be in some other order. This was the case on some ancient DEC systems, such as the PDP-11 on which the C language was originally developed: it stored 32-bit values in middle-endian (or mixed-endian) order.

Albeit surprising, the little-endian order is very logical: the byte at offset n contains bits 8n through 8n+7, which represent the values 2^(8n) through 2^(8n+7). Endianness is a cultural issue; both choices seem natural to long-time users.

The same variations are found in other contexts, such as the ordering of date components:

  • Japan uses big-endian representation: February 17 2021 is written 2021.02.17,

  • Europe uses little-endian representation: February 17 2021 is written 17/02/2021,

  • The USA uses a middle-endian representation: February 17 2021 is written 02/17/2021.

  • 21 is pronounced twenty-one in English (big-endian), whereas Germans say einundzwanzig (one-and-twenty: little-endian, and actually middle-endian for 3-digit numbers). But then 17 is seventeen (little-endian), and in French it is dix-sept (big-endian).

  • Western languages write numbers in big-endian format (I am 42 years old), but Semitic scripts use little-endian order: Hebrew (אני בת 42) and Arabic (أنا ٤٢ سنة) both use little-endian as they are read from right to left.

Here is a more portable version to test memory representation:

#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned int aux2 = 0x01020304;          /* assumes unsigned int has at least 32 bits */
    unsigned char aux[sizeof(unsigned int)];
    memcpy(aux, &aux2, sizeof(aux));         /* copy the bytes of aux2, not its value */
    printf("%X is represented in memory as", aux2);
    for (size_t i = 0; i < sizeof(aux); i++)
        printf(" %02X", aux[i]);             /* from lowest to highest address */
    printf("\n");
    return 0;
}

7 Comments

Nice answer. Detail: "English (big-endian)" --> English numbers have inconsistent endianness, as in 17 "seven-ten".
OK, so what endianness is the French pronunciation of "80"? ;-)
@AndrewHenle Or: 97: "quatre-vingt-dix-sept" --> 4*20 10 7.
"both use little-endian as they are read from right to left." --> Hmmm, I do not see endian as a right-left vs. right-left issue, but a "what is read/spoken/encoded first issue.
@chux-ReinstateMonica: 97 is a good one :) still big-endian, but using base 20, a system with many examples in ancient and current history known as vigesimal