Of Pointers and Men (5)

Posted Aug 7, 2022

By Rancune

This post is an automatic translation from French. You can read the original version here.

I’m sure you’re starting to understand arrays better by now. Today, I’d like to introduce you to dynamic arrays, and through them, two functions that will accompany you throughout your entire C learning journey: malloc and free.

Are you ready? Grab a coffee, settle in comfortably, and let’s dive into a new chapter!

Static arrays and dynamic arrays

Up until now, we’ve used the following syntax to declare an array:

<type> tableau[taille] ;

For example:

int tab[42] ;

This array is what we call a static array, because its size must be known at compile time. In the earliest versions of C, it was therefore impossible to use a variable to specify the size of an array.

This changed with the C99 standard, which introduced this possibility with “variable length arrays”, or VLAs. This standard aims to make array usage more flexible by allowing code like this:

#include <stdio.h>

int
main() {
	int A ;
	scanf("%d", &A) ;
	int tab[A] ;
	return 0 ;
}

As you can see, we’re using a variable here to specify the array size!

However, while this is sometimes convenient, there are some constraints when using a VLA. For example, it’s not possible to declare such an array as a global variable, nor with the “static” keyword, because the standard is clear on this point:

6.7.6.2 Array declarators

	[...]

    2 If an identifier is declared as having a variably modified type,
	it shall be an ordinary identifier (as defined in 6.2.3), have no linkage,
	and have either block scope or function prototype scope. If an identifier
	is declared to be an object with static or thread storage duration, it
	shall not have a variable length array type.

This isn’t very surprising: it simply comes from the fact that the sizes of the data and bss sections, where global variables are stored, must be known as soon as the executable is loaded.

Furthermore, using a VLA causes a significant increase in the complexity of the assembly generated by our compiler:

$ gcc -g -o test main.c

$ objdump -S test
[...]
int main() {
    1145:       55                      push   %rbp
    1146:       48 89 e5                mov    %rsp,%rbp
    1149:       41 57                   push   %r15
    114b:       41 56                   push   %r14
    114d:       41 55                   push   %r13
    114f:       41 54                   push   %r12
    1151:       53                      push   %rbx
    1152:       48 83 ec 28             sub    $0x28,%rsp
    1156:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
    115d:       00 00
    115f:       48 89 45 c8             mov    %rax,-0x38(%rbp)
    1163:       31 c0                   xor    %eax,%eax
    1165:       48 89 e0                mov    %rsp,%rax
    1168:       48 89 c3                mov    %rax,%rbx
    int A ;
        scanf("%d", &A) ;
    116b:       48 8d 45 b4             lea    -0x4c(%rbp),%rax
    116f:       48 89 c6                mov    %rax,%rsi
    1172:       48 8d 3d 8b 0e 00 00    lea    0xe8b(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
    1179:       b8 00 00 00 00          mov    $0x0,%eax
    117e:       e8 bd fe ff ff          call   1040 <__isoc99_scanf@plt>
        int tab[A] ;
    1183:       8b 45 b4                mov    -0x4c(%rbp),%eax
    1186:       48 63 d0                movslq %eax,%rdx
    1189:       48 83 ea 01             sub    $0x1,%rdx
    118d:       48 89 55 b8             mov    %rdx,-0x48(%rbp)
    1191:       48 63 d0                movslq %eax,%rdx
    1194:       49 89 d6                mov    %rdx,%r14
    1197:       41 bf 00 00 00 00       mov    $0x0,%r15d
    119d:       48 63 d0                movslq %eax,%rdx
    11a0:       49 89 d4                mov    %rdx,%r12
    11a3:       41 bd 00 00 00 00       mov    $0x0,%r13d
    11a9:       48 98                   cltq
    11ab:       48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
    11b2:       00
    11b3:       b8 10 00 00 00          mov    $0x10,%eax
    11b8:       48 83 e8 01             sub    $0x1,%rax
    11bc:       48 01 d0                add    %rdx,%rax
    11bf:       b9 10 00 00 00          mov    $0x10,%ecx
    11c4:       ba 00 00 00 00          mov    $0x0,%edx
    11c9:       48 f7 f1                div    %rcx
    11cc:       48 6b c0 10             imul   $0x10,%rax,%rax
    11d0:       48 29 c4                sub    %rax,%rsp
    11d3:       48 89 e0                mov    %rsp,%rax
    11d6:       48 83 c0 03             add    $0x3,%rax
    11da:       48 c1 e8 02             shr    $0x2,%rax
    11de:       48 c1 e0 02             shl    $0x2,%rax
    11e2:       48 89 45 c0             mov    %rax,-0x40(%rbp)

    return 0 ;
    11e6:       b8 00 00 00 00          mov    $0x0,%eax
    11eb:       48 89 dc                mov    %rbx,%rsp

}
[...]

So what now? Should we avoid them? Are we condemned to use fixed-size arrays?

Some people (my professors among them) will tell you yes. But if you’ve been reading me for a while, you probably know that I don’t like overly categorical opinions.

VLAs have their uses, but you simply need to be aware of this complexity and ask yourself before using them whether it’s a problem or not in your code. And the question often boils down to “Do I really need this here?”. It’s up to you to answer that question on a case-by-case basis … :)

Another way for the programmer to create variable-size arrays is to use the “malloc” function.

Dynamic arrays: malloc and free

Using malloc requires including stdlib.h in your code: this header is part of the C standard library, so you’ll find it on practically every OS, even the most obscure ones. malloc lets you ask the OS to reserve a certain amount of memory for you.

This is perfect timing because we saw, in the previous chapter of this series, that an array is precisely that: a memory space in which values are stored one after another. Here is a simple example of creating and using such an array:

#include <stdio.h>
#include <stdlib.h>

int
main()
{
	int* tab ;
	int i ;

	/* Allocate the array */

	tab = malloc( 10*sizeof(int) ) ;

	/* Use it */

	for (i=0; i<10; i++) {
		tab[i] = 0 ;
	}

	/* Free the memory */

	free(tab) ;

	return 0;
}

The malloc function reserves memory during program execution to store our values. This is what explains the term “dynamic arrays” that is generally used to describe this type of array.

First thing to observe here: what we request, we must return. That’s why any memory space reserved using “malloc” must be freed when we no longer need it using the “free” function.

Let’s now look at the use of malloc in detail:

tab = malloc( 10*sizeof(int) ) ;

malloc takes a single argument: the amount of memory we want, in bytes. Since we want to create an array of 10 integers, we used the expression “10*sizeof(int)” or, if you prefer, “10 times the size in bytes of an integer”.

Why not simply write 40? True, since an int is 32 bits on our PC, we could have requested 40 bytes directly. It would have worked perfectly and the compiler wouldn’t even have complained. But we’re among civilized folk here, and we respect portability! Nothing in the C standard guarantees that an int is 4 bytes … And who knows? Maybe I’m recompiling your code to run it on my waffle iron, right? :)

Let’s now look at the value returned by malloc. Here’s what the man page says:

void *malloc(size_t size);

As you can see, malloc returns an untyped pointer, or void*. This makes sense: we asked for a certain amount of memory and the OS gave it to us, but we never said what we intended to do with it. We store this address in a variable, tab, of type int* so that we can access the integers one after another.

Strictly speaking, we could add a cast to tell the compiler that yes, we do want to convert the result of malloc to int* before storing it in our variable. In practice, this isn’t necessary: in C, a void* is implicitly converted to any other object pointer type on assignment. Worse, the cast could mask errors: if you forget to include stdlib.h, an older compiler assumes that malloc returns an int, and the cast silences the warning that would have alerted you. So we generally don’t cast malloc.

Like all requests, our memory request can be denied. It’s rare, but sometimes malloc can simply fail. Again, the man page shows us the way:

RETURN VALUE
       The malloc() and calloc() functions return a pointer to the allocated memory, which
       is suitably aligned for any built-in type. On error, these functions
       return NULL.

So we usually add a small test to check malloc’s return value, and act accordingly:

#include <stdio.h>
#include <stdlib.h>

int
main()
{
	int* tab ;
	int i ;

	tab = malloc( 10*sizeof(int) ) ;
	if ( tab == NULL ) {
		fprintf(stderr, "Malloc failed\n") ;
		return 1 ;
	}

	for (i=0; i<10; i++) {
		tab[i] = 0 ;
	}

	free(tab) ;

	return 0;
}

Live free (or die hard …)

To free the memory reserved by your program using malloc, you use the “free” function. Its usage is relatively simple: just pass it the address you got from malloc, and it will take care of freeing the memory.

It’s worth noting that the free function doesn’t return a value or an error: the C standard (C99 version here) is very clear on this:

The free function causes the space pointed to by ptr to be deallocated, that is,
made available for further allocation. If ptr is a null pointer, no action
occurs. Otherwise, if the argument does not match a pointer earlier returned by
the calloc, malloc, or realloc function, or if the space has been deallocated by
a call to free or realloc, the behavior is undefined.

Be careful, though: free does not modify the value stored in the variable tab. After the call, tab still holds the same address, but we are no longer allowed to use it, since that memory has been released. Using such a dangling pointer is a very common mistake; to guard against it, it’s recommended to set the pointer to NULL right after freeing it.

What if I don’t free the memory? Is it serious?

Well … Not that much in practice! Because fortunately, the OS generally watches over you! It will take care, upon the death of your process, of ensuring that the memory allocated to you is properly freed. This is at least the case for all modern OSes I know of!

That said, just because forgetting a free won’t trigger a bloody massacre of kittens doesn’t mean you shouldn’t pay attention to it in your code. First, freeing what you allocate remains a good indicator of code quality. Second, and this is probably more important, you might call malloc inside a loop: without a matching free, the program reserves more and more memory on every iteration, until it eventually exhausts the resources available on your machine.

In short, not freeing memory is bad practice!

Conclusion

The heat being what it is – such are the joys of blogging in August – I’ll stop here for today. We’ll see in the next article how dynamic arrays differ from static arrays.

See you soon!

Rancune.

This post is licensed under CC BY 4.0 by the author.