Saturday, August 2, 2008

sizeof - the handy operator!

Yes, you read that right! sizeof is an operator, not a function as most newcomers would imagine. It is evaluated at compile time rather than at runtime (the one exception is a C99 variable-length array, whose size is only known at runtime).

Consider the following:
size_t i = sizeof (int);

By the time the object file is generated, the compiler has already replaced sizeof (int) with a constant, so i is simply initialized with that constant. Note that the operand can be a type (int, char, long, a struct, a union, etc.) or an expression such as a variable, an array, a pointer or a dereferenced pointer.

Let's look at some code

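#include <stdio.h>

int main(void)
{
       const char *a = "ABCDE";
       char b[] = "00000";
       struct {
              char c;
              short int d;
              char e;
              int f;
       }g;

       /* sizeof yields a size_t, so print it with %zu (or cast to int on older compilers) */
       printf("%zu %zu %zu %zu\n", sizeof(*a), sizeof(a), sizeof(b), sizeof(g));
       return 0;
}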

The above code on my machine outputs:
1 4 6 12

Let's analyze the output

sizeof(*a)
a is a pointer to a char. It points to the first char of the string "ABCDE", which is stored somewhere in memory (we know from an older post where string literals go!). *a is the char it points to, 'A' in our case. So sizeof(*a) is the size of a char, which is 1 by definition.

sizeof(a)
a is a pointer. It happens to point to a char, but what it points to doesn't matter as far as its size is concerned. Did you know that on mainstream platforms all data pointers in a process (a pointer is just the address of something) are the same size no matter what they point to?

A pointer is 4 bytes (32 bits) in a 32-bit process and 8 bytes (64 bits) in a 64-bit process. My machine runs 32-bit Windows XP, hence the 4.

What is a 32-bit machine, by the way? Simply put, it is a machine whose general-purpose CPU registers are 32 bits wide, so the addresses held in those registers are 32 bits as well. Hence, if you build (compile and link) a program for x86 (the usual name for the 32-bit PC architecture), the addresses in the process will be 32 bits. Similarly, if you build it for x64 (the 64-bit PC architecture), the addresses in the process will be 64 bits.

sizeof(b)
b is an array, and the compiler knows its size at compile time. sizeof(b) is 6. Wondering why not 5? Because the compiler reserves one extra char for the null terminator ('\0')!

sizeof(g)
This one depends on the padding the compiler inserts to keep the members aligned, something that I don't intend to cover in detail in this post.

sizeof vs strlen()

So, when sizeof(a) only gives the size of the pointer and not the length of the string (sequence of chars) it points to, how is one supposed to get the length of that string? The strlen(const char *) function from the C runtime library is meant for this purpose. All it does is start at the given address and scan forward until it encounters a null character ('\0'), returning the number of chars before it. What if you do not null-terminate a string? strlen keeps reading past the end of your buffer. Have you ever seen garbage being printed or your program crashing?

Interesting use of sizeof

Now, let's see where one can smartly use the sizeof operator. Consider the following code:

#include <stdio.h>

int myArray [] = {1000, 1001, 1002};
int main(void)
{
       /* size_t matches the unsigned type that sizeof yields */
       for( size_t i=0; i<sizeof(myArray)/sizeof(myArray[0]); i++)
       {
              printf("%d\n", myArray[i]);
       }
       return 0;
}

sizeof(myArray) gives you the total bytes required for all the elements in myArray = 12 (three 4-byte ints on my machine).
sizeof(myArray[0]) gives you the size of the first (or each) element in myArray = 4.
The division gives you 3, which is the number of elements in myArray!

If you have to add elements to or delete elements from myArray, you can do so without changing anything inside main()! If you had used a macro or magic number for the size instead of sizeof, you would have had to update it every time you changed myArray! This use of sizeof is a wonderful trick to avoid several programming errors! Just remember that it only works where myArray is visible as an array; once the array is passed to a function it decays to a pointer, and sizeof gives the size of the pointer instead.

Further, myArray could be an array of anything (not just int). You still can get the number of elements in it using the above logic.

Summary

Using the sizeof operator keeps you from assuming and hardcoding the sizes of elements and datatypes. You never know which platforms your code may get compiled for, and there is no guarantee that sizeof(g) is always going to be 12!

Sunday, July 27, 2008

volatile - tell the compiler not to optimize!

Let's re-visit the code used in my previous post:

void foo()
{
    const int i = 0;
    int *j = (int *) &i;
    *j = 1;
    printf("%d\n", i); // prints 0
    printf("%d\n", *j); // prints 1
}

Remember that the compiler doesn't expect the code to modify a const, which is why i still prints 0 above. However, we can give the compiler a "hint" that the const can still change (through your own code if the const is local, or through an external source) and ask it not to optimize the usage of the const (by replacing it with its value wherever it is used).

volatile is the keyword that tells the compiler not to optimize accesses to the const. Now, note the output of the following code:

void foo()
{
    volatile const int i = 0;
    int *j = (int *) &i;
    *j = 1;
    printf("%d\n", i); // prints 1
    printf("%d\n", *j); // prints 1
}

Also, note that using volatile alongside const doesn't mean that the compiler will let an int pointer point to a const int without a typecast. In other words, the typecast in the following statement is still required, and leaving it out will cause a compiler error (in C++; a C compiler typically issues a warning about discarded qualifiers instead):
int *j = (int *) &i;

To summarize, the volatile keyword makes the compiler drop any optimizations around the usage of the variable and generate code that fetches the value from its memory location every time the code refers to the variable.

Tuesday, July 15, 2008

Heap, Stack, Static, Global Variables & Constants

A typical C or C++ program is made up of global variables, local variables, static variables, constants, string literals, arrays, dynamic allocations, etc. It is extremely important to understand where all that data is actually stored in memory. It will help a great deal in debugging any application!

Here is a sample program that declares a few local, global and static variables, arrays and consts, and also allocates memory dynamically. It prints out the addresses of the variables, the string literal and the dynamically allocated memory. It needs stdio.h and stdlib.h, and it has to be compiled as C++ because it uses new and delete :-)

#include <stdio.h>
#include <stdlib.h>

const int a = 0;
const char *b = "String Literal";

int c;
char d[10];
static int e;

int main()
{
    static int f;

    int g;
    int h[2];
    const int i= 0;

    int *j = (int*) malloc(sizeof(int));
    int *k = new int;

    // %p prints an address; it expects a void *, hence the casts
    printf("// global constant & string literal\n");
    printf(" const int a \t%p\n char *b \t%p\n\n", (void *) &a, (void *) b);

    printf("// global variable, global array, static variables\n");
    printf(" int c \t\t%p\n char d[] \t%p\n static int e \t%p\n static int f \t%p\n\n", (void *) &c, (void *) d, (void *) &e, (void *) &f);

    printf("// local variable, local array, local const\n");
    printf(" int g \t\t%p\n int h[] \t%p\n const int i \t%p\n\n", (void *) &g, (void *) h, (void *) &i);

    printf("// malloc, new\n");
    printf(" int *j \t%p\n int *k \t%p\n\n", (void *) j, (void *) k);

    free(j);
    delete k;
    return 0;
}

The output of the above program on a Windows machine would look like this (the addresses would be different on a different machine):



Looking at the addresses printed, it can be said that:
* Global constants and string literals are stored together. Further, the memory they are stored in is typically read-only, and a write to it will cause a runtime error (an access violation on Windows)!
* Global variables and static variables (even if they are declared as static inside a function) are stored together. They are placed in a memory region that is initialized at the beginning of the process and is maintained till the end of the process.
* The malloc function and the new operator dynamically allocate memory from another region called the Heap.
* Local variables are stored in yet another region called the Stack.

Interestingly, even the constants defined within a function go onto the Stack. However, there is no read-only section on the Stack, which means a local variable defined as "const" inside a function is not really protected against write operations the way the global consts are. So, you could still modify a "const int" defined inside a function by casting its address to an int * like below (the C and C++ standards leave the result of such a write undefined, so don't rely on it in real code):

void foo()
{
    const int i = 0;

    // Typecast to avoid the compiler error.
    int *j = (int *) &i;

    // You can actually modify the const! Now try making "i" global.
    *j = 1;
    printf("%d\n", *j);
}

Wondering why I chose to print *j instead of i in the printf()? That is because printf("%d\n", i) would still print 0! Can you guess why? Because i is declared as const, the compiler replaces i with 0 (its initial value) when it generates the assembly. After all, it doesn't expect the code to modify a const!

Saturday, July 12, 2008

Who can benefit from WinCoding?

Are you a Computer Science student?
Do you program in C, C++?
Preparing for that FIRST job in Software Industry?
Want to explore coding on Microsoft Windows platform?
Wish to know which tools can help debug the problems in your application?
...
To those and many others, this blog will help! Quite often, I notice that students are in a hurry to finish the course content and move on to the next semester. In the process, they fail to grasp the fundamentals that turn out to be the building blocks of the Industry, Software in our case. Little do they realize that the same fundamentals will haunt them a few months down the line when they face a competitive examination for higher education or appear for an interview!

Further, software companies spend a great deal on training the half-baked freshers before deploying them on live projects. This blog will help plug the gap in the learning, thereby benefiting both the readers and the employers by keeping the readers "Industry Ready".

I would appreciate your comments and feedback very much. Please be patient while I pick relevant topics. Keep visiting.

Best of luck!
WinCoding