Thursday, July 27, 2006
By: Jason Doucette
Probably every C / C++ programmer has used a variable argument list function, also known as a variadic function. The printf() family of functions is the best-known example. These functions accept a variable (non-constant) number of arguments, and even the data types of those arguments are not known in advance. The variable arguments always follow one or more known parameters, and they can differ on each call.
Technical Details
How do these types of functions work? How do they accept an unknown number of parameters whose data types are also not known?
With a traditional stack-based calling convention, the function knows where the last regular parameter is on the stack. It also knows its size. With this information, we can determine the beginning of the unknown parameters that follow it. We can store a pointer that points to this location. All we know at this time is that the pointer is pointing to the first unknown parameter. We have no idea what data it is pointing to, or how large it is.
This implies that all of these functions must have at least one regular parameter before the variable arguments. You need this parameter, even if it is not used for anything else (a dummy parameter), to allow the function to know where the variable arguments begin.
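As a minimal sketch of this idea, the function below uses its one regular parameter both to locate the variable arguments and to say how many of them there are. The name sum_ints and the convention of passing a count first are assumptions made for illustration; the va_start(), va_arg(), and va_end() macros come from the standard <stdarg.h> header.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: sums `count` int arguments.
   `count` is the required regular parameter. It marks where the
   variable arguments begin and tells us how many to read. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);

    va_start(args, count);            /* start just past `count` */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);   /* read one int, then advance */
    }
    va_end(args);

    return total;
}
```

A call such as sum_ints(3, 10, 20, 30) returns 60. Nothing stops the caller from passing a count that does not match the arguments, which is exactly the weakness discussed later in this article.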
So, we have a pointer to the beginning of the variable argument data. How do we know what exists there? What did the caller pass in? We cannot possibly know this unless something tells us. In the case of printf(), a format string is passed as one of the regular parameters. By parsing this string, printf() can determine what you have passed for the variable arguments. During the parsing, if we run into "%i" or "%d", we know that a signed int (32 bits on most platforms) awaits us at the variable argument pointer. We can grab this data and move the pointer ahead 32 bits (4 bytes). The pointer now points to the next variable argument. If we run into "%s" next, we know the next variable argument is a string, which is passed as a pointer to its first character rather than as the characters themselves. After reading that pointer, we move the variable argument pointer past it. It now, again, points to the next unknown variable argument.
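The parsing loop described above can be sketched as a tiny formatter that writes into a buffer. This is only an illustration, not how any real printf() is implemented: the names mini_format and put_char are made up, and the sketch understands only "%d", "%s", and "%%".

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: appends one character if room remains.
   One byte is always kept free for the terminating '\0'. */
static void put_char(char *out, size_t size, size_t *len, char c)
{
    if (*len + 1 < size) {
        out[*len] = c;
        (*len)++;
    }
}

/* Hypothetical mini formatter: understands only %d, %s and %%.
   Returns the number of characters stored, excluding the '\0'. */
size_t mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list args;
    size_t len = 0;

    assert(out != NULL && fmt != NULL);
    if (size == 0) {
        return 0;
    }

    va_start(args, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {
            put_char(out, size, &len, *p);   /* ordinary character */
            continue;
        }
        p++;                                 /* character after '%' */
        if (*p == 'd') {
            /* The format string says an int comes next. */
            char digits[32];
            snprintf(digits, sizeof digits, "%d", va_arg(args, int));
            for (const char *d = digits; *d != '\0'; d++) {
                put_char(out, size, &len, *d);
            }
        } else if (*p == 's') {
            /* A string arrives as a pointer to its first character. */
            const char *s = va_arg(args, const char *);
            for (; *s != '\0'; s++) {
                put_char(out, size, &len, *s);
            }
        } else if (*p == '%') {
            put_char(out, size, &len, '%');
        } else if (*p == '\0') {
            break;                           /* lone '%' at the end */
        }
        /* Any other specifier is silently skipped in this sketch. */
    }
    va_end(args);

    out[len] = '\0';
    return len;
}
```

With a 64-byte buffer, mini_format(buf, sizeof buf, "%s has %d items", "cart", 3) stores "cart has 3 items" in buf. Each va_arg() call names the type that the format string promised, which is the only information the function has about its arguments.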
Implications
It is up to the caller to ensure the format string matches the data passed in. If you tell the function that the first variable argument is an integer, and you give it a string, the function will treat the bits passed in as an integer. Thus, the output will be incorrect (formally, the behavior is undefined). There is no way printf() can know you passed in the wrong information.
Why is there no type checking? Because there is nothing to check the types against. You can pass anything you desire to a variadic function (for the variable arguments portion of it) and it can be different for each call. The language itself gives the compiler no way to parse your format string to see if your parameters match it. Even if a compiler were specifically coded to type check printf() functions (by parsing their format strings), this functionality would break if it encountered a different function also named printf(), and the compiler could not possibly know in advance about proprietary functions you have coded yourself.
How Does printf() Deal With %f?
If printf() has to know the size of the variable arguments to be able to 'find' the remainder, how does it know the size of the floating point type passed in when you use the "%f" format specifier? It works for both float (32-bit) and double (64-bit) floating point data types. One is 4 bytes large, and one is 8 bytes large. How much do we increment the pointer to find the next variable argument? How do we deal with the 4 or 8 bytes worth of data if we don't know which it is?
Maybe it can tell by just looking at the first 4 bytes and, knowing intimate details of floating point storage, discern between float and double? No. Floating point values are stored efficiently: unlike BCD (binary coded decimal) values, they use just about every possible combination of bits. It is impossible to distinguish between float and double by looking at the first 4 bytes of each.
So, how does it know?
Magic?
No. The answer comes from ‘C++ Language Reference - Variable Argument Lists’:
“When arguments of type char are passed as variable arguments, they are converted to type int. Similarly, when arguments of type float are passed as variable arguments, they are converted to type double. Arguments of other types are subject to the usual integral and floating-point promotions. See Integral Promotions for more information.”
If you take a look at the printf() Type Field Characters, you will see that all of the floating point types are listed as doubles. "%f" tells printf() that you are passing in a double. The compiler ensures that you actually do so: even if you try to pass in a float, it is converted to a double first.
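The promotion can be seen in a small sketch. The function name average and its count parameter are assumptions made for illustration. The point is that the function always reads a double, whether the caller wrote a float or a double at the call site.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: averages `count` floating point arguments.
   Every value is read as a double, because a float argument has
   already been promoted to double before the function sees it. */
double average(int count, ...)
{
    va_list args;
    double total = 0.0;

    assert(count > 0);

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, double);   /* never va_arg(args, float) */
    }
    va_end(args);

    return total / count;
}
```

Given float f = 1.5f and double d = 2.5, the call average(2, f, d) returns 2.0. Both arguments occupy the size of a double in the variable argument list, so the function never has to guess between 4 and 8 bytes.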
Additional Information
- The va_arg(), va_end(), and va_start() macros, declared in <stdarg.h>, allow you to access variable argument lists.
- There exist compilers that check the format string for printf(), scanf() and variants. GCC's -Wformat warning is one example.
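Both points above can be combined in one sketch: a wrapper that forwards its variable arguments to the standard vsnprintf() function, and that asks GCC or Clang to check its format string. The names format_message and PRINTF_LIKE are made up for this example, and the format attribute is a compiler extension rather than part of the C language, so it is guarded by a preprocessor test.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* GCC and Clang check calls against the format string when a function
   carries this attribute. Other compilers see an empty macro. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Hypothetical wrapper: parameter 3 is the format string and the
   variable arguments start at parameter 4. */
int format_message(char *out, size_t size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

int format_message(char *out, size_t size, const char *fmt, ...)
{
    va_list args;
    int written;

    assert(out != NULL && fmt != NULL);

    va_start(args, fmt);
    /* vsnprintf() takes a va_list instead of "...", so the variable
       arguments can be handed on without knowing their types here. */
    written = vsnprintf(out, size, fmt, args);
    va_end(args);

    return written;
}
```

With a 32-byte buffer, format_message(buf, sizeof buf, "%d-%s", 7, "ok") stores "7-ok" and returns 4. On compilers that honor the attribute, and with format warnings enabled, passing a string where "%d" expects an int should produce a warning at compile time.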
About the Author: I am Jason Doucette of Xona Games, an award-winning indie game studio that I founded with my twin brother. We make intensified arcade-style retro games. Our business, our games, our technology, and we as competitive gamers have won prestigious awards and received worldwide press. Our business has won $190,000 in contests. Our games have ranked from #1 in Canada to #1 in Japan, have become #1 best sellers in multiple countries, have won game contests, and have held 3 of the top 5 rated spots in Japan of all Xbox LIVE indie games. Our game engines have been awarded for technical excellence. And we, the developers, have placed #1 in competitive gaming competitions -- relating to the games we make. Read about our story, our awards, our games, and view our blog.