Pointers and Strings – Array of Characters

In our previous tutorials on pointers and arrays, we primarily looked at how to handle integer arrays with pointers. In this guide, we move on to character arrays and pointers, which revolve around the concept of strings.

Strings Introduction

A string is a sequence of characters stored in contiguous memory locations. In other words, a string is an array of characters. It can contain letters, digits, and any other characters. For example:

  • “Hello”
  • “This is a string”
  • “1234”

Character arrays are very important because C has no built-in string data type, so we use character arrays as strings. We can perform operations on strings such as modifying, copying, and concatenating. To work with strings efficiently in C, there are a few things that you need to understand.

Storing strings in character arrays

To be able to store a string in a character array, the first requirement is that the array should be large enough to accommodate the whole string. A large enough character array is such that it has a size greater than or equal to the number of characters in the string plus 1.

Size of array ≥ Number of characters in the string + 1

For example, if our string is “HELLO” consisting of five alphabetical characters then the size of the array will be ≥ (5+1)=6.

Although the string consists of 5 characters, we need space for at least 6. This is because we also need to store a marker that specifies where the string ends. To understand this better, let us take an example of storing the string “HELLO” in a character array:

If we declare a character array of size 5, it can hold all the characters of the string “HELLO”. ‘H’ will go to the zeroth index, ‘E’ will go to the first index, ‘L’ will go to the second index, the next ‘L’ will go to the third index and ‘O’ will go to the fourth index.

char X[5];
X[0]='H';
X[1]='E';
X[2]='L';
X[3]='L';
X[4]='O';

Let us now assume that we had the same character array but of size 8. The figure below shows the logical view of our array X.

Pointers and Strings pic1

We will store the string “HELLO” in this particular array:

Pointers and Strings pic2

As you may notice, we have stored all the characters of the string “HELLO” in this array. The three elements at indices 5, 6 and 7 have not been assigned, so they hold garbage values. However, one vital piece of information is missing: nothing marks the ‘O’ at index 4 as the last character of our string. To mark the end of a string we use the null character.

We store a null character immediately after the last character of the string. The null character has the value 0 and is written as a backslash followed by 0: '\0'. Hence X[5] = '\0';

Pointers and Strings pic3

All the string manipulation functions in C expect strings to be terminated with a null character.

Strings as an Array of Characters Example in C

Let us look at an example code to demonstrate this concept.

#include <stdio.h>
#include <stdlib.h>

int main(){
char X[5];
X[0]='H';
X[1]='E';
X[2]='L';
X[3]='L';
X[4]='O';
printf("%s",X);
}

Here we have taken a character array of size 5 and filled in all the characters. No space is used to null terminate it. Then we are printing this array as an output.

Now let’s see the code output. After the compilation of the above code, you will get the following output.

Pointers and Strings pic4

As you may notice, the “HELLO” string is printed, but some garbage values may also appear alongside it. This happens because we did not null terminate our string, so printf() keeps reading past the end of the array. Formally this is undefined behavior, so the exact output can differ from one run or compiler to another.

If we change the size of the character array to 6 and add a null character at the 5th index, then we get the correct output.

#include <stdio.h>
#include <stdlib.h>

int main(){
char X[6];
X[0]='H';
X[1]='E';
X[2]='L';
X[3]='L';
X[4]='O';
X[5]='\0';
printf("%s",X);
}

Now let’s see the code output. After the compilation of the above code, you will get the following output.

Pointers and Strings pic5

If we change the size of the array to a number greater than 6, we still get the same correct output. This is because printing stops at the null character, no matter how large the array is.

Further Examples

The string.h header declares a handful of functions for string manipulation. Now let's find out the length of the string using the strlen() function from string.h. Use the following statements to find the length of the string stored in the character array ‘X’.

int length = strlen(X);
printf("Length of the string is: %d\n",length);

The complete code is given below:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
char X[10];
X[0]='H';
X[1]='E';
X[2]='L';
X[3]='L';
X[4]='O';
X[5]='\0';
int length = strlen(X);
printf("Length of the string is: %d\n",length);
}

Now let’s see the code output. After the compilation of the above code, you will get the following output.

Pointers and Strings pic6

Even though the size of the array is 10, the length of the string is 5. Thus, strlen() also counts characters only until it finds the null character.

In our program, instead of writing the characters individually at their appropriate positions, we can also initialize the array by using string literals as shown below:

char X[10] = "HELLO";

String literals are a group of characters within double quotation marks. The null termination for the string literal is implicit. So it will always be stored with a NULL termination in the memory.

Additionally, we can also initialize the character array as follows:

char X[] = "HELLO";

In this case, the size of the character array ‘X’ will be set to 6 bytes, where 1 byte stores one character. This is the minimum size required to hold our 5 characters plus the null terminator.

Now let us try to print the size in bytes of this character array using the sizeof operator.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
char X[] = "HELLO";
printf("Size in bytes: %zu\n",sizeof(X));
int length = strlen(X);
printf("Length of the string is: %d\n",length);
}

After the compilation of the above code, you will get the following output.

Pointers and Strings pic7

As you will see the size of the array in bytes is 6 as space has been allocated for 6 characters but the length is 5. This is because the NULL character is not included in the length.

However, we must be careful if we specify a size smaller than 6. With a size of exactly 5, C accepts the initialization but stores no null terminator, so the array no longer holds a valid string. With a size less than 5, the initializer does not fit and the compiler reports it.

Additionally, we can also initialize the character array by using curly brackets and putting all the characters inside them, separated by commas. However, in this case the null termination will not be implicit. You will have to include the null character inside the braces as well. This can be seen below:

char X[6] = {'H','E','L','L','O','\0'};

Strings and Pointers in C

Let us declare a character array Y of size 4. We will initialize it with the string literal “BYE”.

char Y[4] = "BYE";

The figure below shows how it is stored in the memory:

Pointers and Strings pic8

Arrays are stored in one contiguous block of memory, so let us say, for example, that the first character is stored at address 100. One character is one byte in size, so the following characters will be at addresses 101, 102 and so on. Y is the variable name for this whole array.

Let’s declare a variable ‘ptr’ which is a pointer to a character.

char* ptr;

A pointer variable is stored in 4 bytes on a typical 32-bit architecture (8 bytes on most 64-bit systems). For example, suppose this variable is stored at address 200.

Pointers and Strings pic9

With the following statement we can assign the character array to the character pointer. This statement is valid.

ptr = Y;

Using just the name of the array gives the address of the first element in the array. Thus, this statement assigns the address 100 to ptr. Therefore ptr will now point to the first element in the array.

Pointers and Strings pic10

We can use this variable ‘ptr’ which is a character pointer just like Y to read and write into the array.

If we print ptr[1], then the output will be the character at the first index i.e. ‘Y.’

We can even modify the elements of the array using this ptr variable. The following line: ptr[0] = 'D'; will modify the character at the 0th index. This way the whole string will be changed to “DYE”.

When we write ptr[i] for any position i, it is the same as *(ptr+i). As ptr holds the base address, (ptr+i) takes us to the address of the i-th element. So in this case, (ptr+2) will be 102, and if we put the * operator in front of it, we dereference that address and obtain the value stored there.

Hence: ptr[i] = *(ptr+i)

The same equivalence holds when we use the array name:

Y[i] = *(Y+i)

This is how we use arrays and pointers to read and write.

As we have seen above, ptr = Y is a valid statement. However, we cannot write Y = ptr. An array name is not a variable that can be assigned to, so this statement gives a compilation error. For the same reason, we cannot increment or decrement Y.

Traversing a string using pointers

We can increment or decrement ‘ptr’ because it is a pointer variable. ptr = ptr+1 is a valid statement in this case. It causes ptr to point to the next element in the array, so ptr now becomes 101 and points to the second element. To traverse an array by index, we run a loop and increment a local variable such as ‘i’. With a pointer variable, we can instead keep incrementing the pointer itself to traverse the array.

Pointers and Strings pic11

Arrays are always passed to a Function by Reference

When we pass an array to a function, we only pass the base address of the array, which the function receives in a pointer variable. We do not pass a copy of the whole array. Let us go through some sample code to understand this better.

#include <stdio.h>

void print(char* array)
{
    int i=0;
    while(array[i]!='\0')
    {
        printf("%c",array[i]);
        i++;
    }
    printf("\n");
}
int main(){
char array[10] = "Welcome";
print(array);
}

We have declared a character array of size 10. It has the string literal “Welcome” of length 7 stored in it.

char array[10] = "Welcome";

As we are using a string literal here, the null termination is implicit.

Instead of printing this array directly in main() with printf() and %s, we will create our own print function and pass the array to it as an argument.

print(array);

This function will print the string stored in the character array. As far as the compiler is concerned, the argument to the function is the address of the character array. Arrays can be large, so it would be inefficient to create a copy of the array for each function call. The print function that we are creating does not know the size of the array; it only knows the base address. So we declare a variable ‘i’, initialize it to zero, and use it in a while statement. While array[i] is not equal to the null character, we print the character array[i] and increment i. Once we reach the null character, we come out of the loop.

void print(char* array)
{
    int i=0;
    while(array[i]!='\0')
    {
        printf("%c",array[i]);
        i++;
    }
    printf("\n");
}

After the compilation of the above code, you will get the following output.

Pointers and Strings pic12

Additionally, we can also replace the variable ‘i’ we created in the print() function and use only the name of the character array with dereferencing (*array) to access the elements. This can be seen below:

void print(char* array)
{
    while(*array!='\0')
    {
        printf("%c",*array);
        array++;
    }
    printf("\n");
}

This function gives the same result as the one that we previously defined with the integer variable ‘i’.

What happens in the system memory?

Now, let’s look into what happens in the system’s memory when this code runs. The memory that is allocated for the execution of a program is typically divided into these four sections shown below.

Pointers to function arguments Application memory

One part of the memory stores the instructions of the program and is known as the code segment. The next segment stores the global variables. The stack segment holds the information about function calls and all the local variables while the code runs. The fourth section, the heap, is used for dynamically allocated memory, which this example does not use.

For example purposes, we will use the same code used in the previous example where we created our own print function to print the character array as an output.

#include <stdio.h>

void print(char* array)
{
    while(*array!='\0')
    {
        printf("%c",*array);
        array++;
    }
    printf("\n");
}
int main(){
char array[10] = "WELCOME";
print(array);
}

When this program starts executing, main() is invoked first. Whenever a function is called, some amount of memory from the stack is allocated for the execution of that function. This is known as the stack frame of that function. For example, suppose the stack frame from address 300 to 350 is allocated for the main() function as one contiguous block of memory. In this illustration, the stack grows from the bottom to the top. All the local variables of the function are found in its stack frame. So, when we declare the character array of size 10, 10 bytes of the stack frame are allocated for it. Let's suppose they run from address 300 to 309. Each character is stored in 1 byte, so we need 10 bytes for this character array of size 10.

Pointers and Strings picture 15

Apart from the local variables, the stack frame may hold more information, which is why some space is still left in the main() function’s stack frame. After this, control goes to the following call to print().

print(array);

As soon as one function calls another, the execution of the calling function is paused at that line and the system goes on to execute the called function. The called function is allocated a stack frame on top of that of the calling function. Whatever function is at the top of the stack at any point is the one executing. main() waits for the called function to finish and then resumes. While print() is executing, it has a local variable ‘array’ in its stack frame. However, this is a pointer variable. A pointer variable takes 4 bytes of memory on a typical 32-bit architecture, so let's say it is stored starting at address 354, with 4 bytes allocated in the stack frame.

Pointers and Strings picture 14

Notice that this ‘array’ in the print function is not the same as the ‘array’ in the main() function. They have different scopes. When main() calls print() and passes array as the argument, what is passed is the address 300, the base address of the array. The print() function stores this address in its pointer variable ‘array’.

Pointers and Strings picture 13

Sometimes it may confuse us if we are using the same local variable name in the calling function and the same argument name in the called function. You must understand that both of them are different.

The figure below shows the character array ‘array’ of size 10 with its elements stored at their respective indices. The addresses increase towards the right. The first seven characters are the letters of the word ‘WELCOME’ and the eighth is the null character. Because the array was initialized with a string literal, the remaining two elements are also set to zero rather than left as garbage.

Pointers and Strings picture 16

Now we have the ‘array’ variable from the print() function. It is a character pointer at address 354 that stores the address 300. Thus, it points to the first element of the array.

Pointers and Strings picture 17

The array in green is local to the main and the array in blue is a character pointer local to the print() function.

Now let us look at what happens in the system’s memory when the while loop starts in the print() function.

void print(char* array)
{
    while(*array!='\0')
    {
        printf("%c",*array);
        array++;
    }
    printf("\n");
}

Here, we are saying: continue the while loop as long as *array is not equal to the null character. When we put the * operator in front of a pointer variable, we get the value stored at the address it holds. At this stage ‘array’ holds the base address, so *array is ‘W’. This is not the null character, so the loop condition is true. Thus, we move ahead to the next line, where we print this element (*array) using printf(). The output will be ‘W’.

Then we increment array by one. This is pointer arithmetic. When we increment a pointer by one unit, the address increases by the size of the data type that the pointer points to. Here, array is a pointer to char, and a char is 1 byte, so array++ is the same as array = array + 1 and moves the address forward by one byte. So array now becomes 301 and points to the second element in the array, which is ‘E’.

Pointers and Strings picture 18

Once again we come to the verifying condition in the while loop. Now *array is equal to ‘E’ which is not the NULL character so we will go inside the loop and print ‘E.’

We will keep on going like this until the address in the pointer variable reaches 307.

Pointers and Strings picture 19

Here the value at this address is the null character, so the loop body does not execute. We leave the loop and execute the following statement, which prints a newline:

printf("\n");

Thus, the execution of the print() function will finish. So the particular stack frame for print() will be cleared from the stack.

Pointers and Strings picture 15

Now the main() function will resume and finish its execution.

Further Modification in Code

Let us now modify this particular code and learn a few more concepts from it.

#include <stdio.h>

void print(char* array)
{
    while(*array!='\0')
    {
        printf("%c",*array);
        array++;
    }
    printf("\n");
}
int main(){
char *array="Welcome";
print(array);
}

Instead of creating a character array of size 10, we create a character pointer named ‘array’ and initialize it with a string literal in a statement like this:

char *array="Welcome";

After the compilation of the above code, you will get the same output as before.

Pointers and Strings pic12

In the previous examples, when we used a string literal to initialize an array, the string was stored in the space allocated to that array, that is, on the stack in the character array of size 10. But if we use the string literal elsewhere, in a statement like char *array = "Welcome";, the string is stored as a constant at compile time. It is typically placed in a read-only part of the program's memory, such as the code segment. As a result, you must not modify the string through this pointer as we could with the array: attempting to do so is undefined behavior and often crashes the program.

  • If we want to modify the elements of the string, we can do so by using a character array initialized with a string literal. When we pass the array to a function, the function receives its address in a character pointer, and through this pointer it can modify the data in the array. Suppose we want to change the first character to ‘A’; we can do so in the following way:
#include <stdio.h>

void print(char *array)
{
    array[0] = 'A';
    while(*array!='\0')
    {
        printf("%c",*array);
        array++;
    }
    printf("\n");
}
int main(){
char array[10]="Welcome";
print(array);
}    

After the compilation of the above code, you will get the following output.

Pointers and Strings picture 20

Notice that the first element has been changed from ‘W’ to ‘A’.

  • If we want a function only to read a string and never write to it, we can declare the parameter of print() as a pointer to const char.
#include <stdio.h>

void print(const char *array)
{
    while(*array!='\0')
    {
        printf("%c",*array);
        array++;
    }
    printf("\n");
}
int main(){
char array[10]="Welcome";
print(array);
}

Inside print() we can still read the elements of the array, but any attempt to modify them through this pointer is now rejected by the compiler.
