Saturday, August 2, 2008

sizeof - the handy operator!

Yes, you read that right! sizeof is an operator, not a function as most newcomers would imagine. It is evaluated at compile time, not at runtime (the one exception is a C99 variable-length array operand, whose size can only be computed at runtime).

Consider the following:
size_t i = sizeof (int);

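By the time your executable is prepared (the object file, to be specific), sizeof (int) has already been replaced by a constant, and i is simply initialized with that constant. Note that the operand can be a type (int, char, long, a struct or union type, etc.) or an expression such as a variable, an array, or a dereferenced pointer. The parentheses are required only when the operand is a type.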
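To make the compile-time point concrete, here is a minimal sketch. The names scratch, operand_is_not_evaluated and scratch_size are made up for illustration. It relies on two facts: a sizeof result can be used where the language demands a constant (such as the size of a file-scope array), and the operand of sizeof is not evaluated (variable-length arrays aside).

```c
#include <stddef.h>

/* sizeof(int) is a constant expression, so it can size a file-scope array. */
char scratch[sizeof(int)];

/* Hypothetical helper: shows that the operand of sizeof is never executed. */
int operand_is_not_evaluated(void)
{
    int counter = 0;
    size_t n = sizeof(counter++);   /* only the type of counter++ is inspected */
    (void)n;                        /* silence the unused-variable warning */
    return counter;                 /* counter was never incremented */
}

/* Hypothetical helper: reports the size of the array declared above. */
size_t scratch_size(void)
{
    return sizeof scratch;          /* no parentheses needed for an expression */
}
```

Calling operand_is_not_evaluated() from your own main() should give 0, because counter++ never runs. Some compilers warn about a side effect inside sizeof, which is exactly the point being made.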

Let's look at some code

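#include <stdio.h>

int main(void)
{
       char *a = "ABCDE";
       char b[] = "00000";
       struct {
              char c;
              short int d;
              char e;
              int f;
       } g;

       /* sizeof yields a size_t, so print it with %zu (C99) rather than %d */
       printf("%zu %zu %zu %zu\n", sizeof(*a), sizeof(a), sizeof(b), sizeof(g));
       return 0;
}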

The above code on my machine outputs:
1 4 6 12

Let's analyze the output

sizeof(*a)
a is a pointer to a char. It points to the first char of the string "ABCDE" stored somewhere in memory (we actually know from an older post where string literals go!). *a is the char it points to, A in our case. So sizeof(*a) is really sizeof(char), which is 1 by definition. Since sizeof does not evaluate its operand, the pointer is never actually dereferenced here.

sizeof(a)
a is a pointer. It happens to point to a char, but what it points to doesn't really matter as far as its size is concerned. Did you know that on mainstream platforms all data pointers in a process (a pointer is just the address of something) are the same size, no matter what they point to? The C standard does not strictly guarantee this, but it holds on the common desktop and server platforms.

A pointer is 4 bytes (32 bits) in a 32-bit process and 8 bytes (64 bits) in a 64-bit process. My machine happens to run 32-bit Windows XP, hence the 4.

What is a 32-bit machine, by the way? Simply put, it is a machine whose general-purpose CPU registers are 32 bits wide, so the addresses held in those registers are 32 bits as well. Hence, if you compile and link (build) a program for x86 (the common name for the 32-bit Intel/AMD architecture), the addresses in the process will be 32 bits. Similarly, if you build it for x64 (the 64-bit version of that architecture), the addresses in the process will be 64 bits.
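If you want to check the pointer sizes on your own build, something along these lines should do. It is only a sketch: the helper names object_pointers_share_size and pointer_bits are invented for this post, and the "all the same size" result is what mainstream platforms give, not something the C standard promises.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: 1 when these data pointer types all have one size. */
int object_pointers_share_size(void)
{
    return sizeof(char *) == sizeof(int *)
        && sizeof(int *) == sizeof(double *)
        && sizeof(double *) == sizeof(void *);
}

/* Hypothetical helper: pointer width in bits for the current build target. */
size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;   /* CHAR_BIT is the number of bits per byte */
}
```

On a typical 32-bit build pointer_bits() would come out as 32, and on a typical 64-bit build as 64. Rebuilding the same source for the other target is enough to see the number change.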

sizeof(b)
b is an array, and the compiler knows its size at compile time. sizeof(b) is 6. Wondering why not 5? Because the compiler reserves one extra char for the null terminator ('\0')!

sizeof(g)
This one depends on the padding the compiler inserts to keep each member properly aligned, something that I don't intend to cover in detail in this post.
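Without going into the details, here is a rough way to see how much padding your compiler added. The struct is the same as g above, but given a name (struct sample, my own choice) so that it can be reused. member_bytes and padding_bytes are made-up helper names too.

```c
#include <stddef.h>

/* Same members as g in the post; the tag "sample" is only for illustration. */
struct sample {
    char c;
    short int d;
    char e;
    int f;
};

/* Hypothetical helper: bytes the members would need with no padding at all. */
size_t member_bytes(void)
{
    return sizeof(char) + sizeof(short int) + sizeof(char) + sizeof(int);
}

/* Hypothetical helper: bytes the compiler added for alignment. */
size_t padding_bytes(void)
{
    return sizeof(struct sample) - member_bytes();
}
```

With a 2-byte short and a 4-byte int the members add up to 8 bytes, so a sizeof of 12 like mine means 4 bytes of padding. The offsetof macro from stddef.h will tell you where each member actually starts if you are curious.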

sizeof vs strlen()

So, when sizeof(a) only yields the size of a pointer and not the size of the string (sequence of chars) it points to, how is one supposed to get the length of that string? The strlen(const char *) function from the C runtime library is meant for this purpose. All it does is start at the given address and count chars until it encounters a null character ('\0'); the terminator itself is not counted. What if you do not null-terminate a string? strlen keeps reading past the end of your buffer, which is undefined behavior. Have you ever seen garbage being printed or your program crashing?
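The difference is easy to see side by side. A small sketch using the same two strings as before (array_bytes and string_length are just names I picked):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: size of the array, terminator included. */
size_t array_bytes(void)
{
    char b[] = "00000";
    return sizeof(b);       /* five characters plus the '\0' */
}

/* Hypothetical helper: length of the string, terminator excluded. */
size_t string_length(void)
{
    const char *a = "ABCDE";
    return strlen(a);       /* counted at runtime, stops at the '\0' */
}
```

array_bytes() gives 6 and string_length() gives 5. The first number is fixed at compile time; the second is counted every time you call strlen.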

Interesting use of sizeof

Now, let's see where one can put the sizeof operator to smart use. Consider the following code:

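#include <stdio.h>

int myArray[] = {1000, 1001, 1002};

int main(void)
{
       /* size_t matches the type sizeof yields, avoiding a signed/unsigned comparison */
       for (size_t i = 0; i < sizeof(myArray) / sizeof(myArray[0]); i++)
       {
              printf("%d\n", myArray[i]);
       }
       return 0;
}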

sizeof(myArray) gives you the total bytes required for all the elements in myArray = 12 (with a 4-byte int).
sizeof(myArray[0]) gives you the size of the first (and therefore every) element in myArray = 4.
The division gives you 3 which is the number of elements in myArray!

If you have to add elements to or delete elements from myArray, you can do so without changing anything inside main()! If you had used a macro or a magic number for the size instead of sizeof, you would have had to update it every time you changed myArray! This use of sizeof is a wonderful trick for avoiding a whole class of programming errors!

Further, myArray could be an array of anything (not just int). You still can get the number of elements in it using the above logic.
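One way to package the trick is a small macro. ARRAY_COUNT is a name I made up (many codebases have their own version), and count_at_definition and sum are illustrative helpers. The catch is that the trick only works where the real array is visible: once an array is passed to a function it decays to a pointer, and sizeof would give the pointer size again, so the count has to be passed along.

```c
#include <stddef.h>

/* Hypothetical macro: element count of a real array (not of a pointer). */
#define ARRAY_COUNT(arr) (sizeof(arr) / sizeof((arr)[0]))

int myArray[] = {1000, 1001, 1002};

/* Works here because myArray is visible as an array. */
size_t count_at_definition(void)
{
    return ARRAY_COUNT(myArray);
}

/* Inside this function values is only a pointer, so the caller supplies count. */
long sum(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A call such as sum(myArray, ARRAY_COUNT(myArray)) adds up all three elements and gives 3003, and it keeps working unchanged when you add or remove elements.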

Summary

Using the sizeof operator keeps you from assuming and hardcoding the sizes of elements and datatypes. You never know which platforms your code may end up being compiled for, and there is no guarantee that sizeof(g) is always going to be 12!