Down the Rabbit Hole: Dangerous logging

Today I will try to scare you :)
The most common implementation of logging looks something like this (simplified):

#include <stdio.h>

#ifdef _MSC_VER // workaround for MS VC
#define snprintf(b, bsz, f, ...)                \
        _snprintf_s(b, bsz, _TRUNCATE, f, __VA_ARGS__)
#endif
#define LOG_LEVEL 2
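// The closing condition 0 == __LINE__ is always false; it is presumably written
// this way to avoid "conditional expression is constant" warnings on while(0).
#define LOG(lvl, ...) do {                       \
    if (LOG_LEVEL < (lvl)) break;                \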
    char b[1024];                                \
    snprintf(b, sizeof(b) - 1, __VA_ARGS__);     \
    b[sizeof(b) - 1] = 0;                        \
    printf("%i! %s\n", (int)(lvl), b);           \
} while(0 == __LINE__)

It can be used in the following way:

unsigned fp;
char *fname;
// ...
if (seekFailed)
{
    LOG(1, "Failed to seek to offset %u in file \"%s\"", fp, fname);
}

Looks safe at first glance, and you can probably find something similar in your own project. But let's look deeper. One of the drawbacks of logging is that it does no useful work and that it usually sits on the rarest execution paths (because you normally log only errors or other uncommon events). Therefore log messages are often untested and forgotten (in a sense they are like code comments: nobody likes irrelevant comments, yet it is not hard to find one). As a result, you can find:

LOG(1, "Failed to open file \"%s\", error %i", fname);

In this example no error code is passed (for %i). Bad, but relatively harmless: typically some junk value from the stack or a register will be logged as the error code. Or something worse:

LOG(2, "Failed to rename \"%s\" to \"%s\"", oldName);

This example is very similar to the previous one, except that the junk from the stack (or a register, depending on the calling convention) will be used as a pointer value (for the second %s). In most cases that crashes the application with an access violation. We get a more interesting example by adding large file support to the first example (the one with seekFailed). The variable fp will probably be declared as:

long long fp;

Oops, and I forget to replace %u in:

LOG(1, "Failed to seek to offset %u in file \"%s\"", fp, fname);

with something more appropriate (%lld). To understand why that is a bad idea, let's look at how parameters are passed to a function with a variable argument list. Depending on the calling convention, parameters are placed one after another into registers and onto the stack. To avoid the peculiarities of each calling convention, we'll assume that all parameters are placed in abstract cells of the same size, arranged side by side.

When the compiler generates a call to a function with a variable argument list, it doesn't know how the passed parameters will be used. All it knows is the type (and therefore the size) of each parameter. It can do nothing but place the parameters into those cells one after another. Every parameter gets its own cell (or several cells if it is too large to fit in one). Each cell holds only one parameter (or a part of one); a cell never contains two parameters or parts of different parameters.

The called function has access to those cells. The curious part is that it knows nothing about the passed parameters (even their total number is a mystery to it). It just gets an array of raw cells, without knowing how many there are. Therefore, a function with a variable argument list needs some extra information to be able to handle its parameters. The most popular example of such information is the format string of the printf family, which describes the passed parameters (their types, total count and order are what matter here). If that description is wrong, the function cannot interpret the contents of the cells properly (and therefore the entire program will not work properly; formally, the behavior is undefined).

The string "Failed to seek to offset %u in file \"%s\"" says that the function will get two parameters: the first of type unsigned int and the second of type char *. On IA-32 (i386) each of them needs one 32-bit cell.
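
To make the "raw cells plus a description" idea concrete, here is a minimal sketch of a variadic function that trusts its format string completely. The function sum_args and its one-letter format ('i' for int, 'l' for long long) are made up for this illustration; they are not part of any real library.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: sums its variadic arguments.
 * The format string is the only description of what was passed:
 * 'i' means "the next argument is an int",
 * 'l' means "the next argument is a long long". */
long long sum_args(const char *fmt, ...)
{
    assert(fmt != NULL);

    va_list ap;
    long long total = 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; ++p) {
        if (*p == 'i')
            total += va_arg(ap, int);        /* read one int */
        else if (*p == 'l')
            total += va_arg(ap, long long);  /* read one long long */
        /* any other character is ignored */
    }
    va_end(ap);
    return total;
}
```

A call such as sum_args("il", 1, 5000000000LL) describes its arguments correctly. A call such as sum_args("ii", 1, 5000000000LL) compiles just as well, but the description no longer matches what was passed, which is exactly the mistake made in the LOG call above (and undefined behavior in C).
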
What goes wrong if we pass a long long instead of the expected unsigned int? On IA-32 a long long is 64 bits. One cell is not enough for such a large parameter, so it is placed in two adjacent cells, and the second parameter (char *) ends up in the third cell. When the function interprets the cells, it uses the first cell (the low 32 bits of fp, since IA-32 is little-endian) as the first argument and the second cell (the high 32 bits of fp) as the second argument (the one of type char *). The consequences of this mistake depend on the value of fp:

if 0 <= fp <= UINT_MAX, the high part of fp is zero (32 bits are enough to represent numbers from 0 to UINT_MAX), so the function receives a null pointer for %s; many C libraries then log the string "(null)" instead of the file name
if fp > UINT_MAX, the high part of fp is non-zero (and most likely not a valid pointer to allocated memory), so the application will probably crash with an access violation
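
The two cases can be checked with plain arithmetic. The sketch below only models the two 32-bit cells of a little-endian IA-32 call; the names cells32 and split_into_cells are invented for this illustration, and nothing here actually passes mismatched arguments to printf.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two 32-bit cells occupied by a 64-bit argument. */
typedef struct {
    uint32_t first;   /* low half: what the function reads for %u */
    uint32_t second;  /* high half: what the function reads as the char * for %s */
} cells32;

cells32 split_into_cells(long long fp)
{
    /* Assumption of this sketch: long long is exactly 64 bits wide. */
    assert(sizeof(long long) == 8);

    cells32 c;
    c.first  = (uint32_t)((uint64_t)fp & 0xFFFFFFFFu);
    c.second = (uint32_t)((uint64_t)fp >> 32);
    return c;
}
```

For fp = 100 the second cell is 0, which corresponds to the "(null)" case. For fp = 5000000000 the first cell is 705032704 and the second cell is 1, so the function would try to read a string at address 1.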

Under AMD64 (x86-64) a cell is 64 bits. Therefore every parameter gets exactly one cell (the first cell for the first parameter and the second cell for the second), as intended. But only the low 32 bits of fp will be logged, since the function still thinks that the first cell contains a 32-bit unsigned int.
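
For comparison, here is a small sketch of what gets lost on AMD64 and of the message with a matching conversion. The helper names low_32_bits and format_seek_error are made up for this illustration, and a C99 snprintf is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The part of a 64-bit offset that a 32-bit %u conversion would show. */
uint32_t low_32_bits(long long fp)
{
    return (uint32_t)fp;  /* well-defined: keeps the value modulo 2^32 */
}

/* The same message with a conversion that matches long long. */
int format_seek_error(char *buf, size_t size, long long fp, const char *fname)
{
    assert(buf != NULL && fname != NULL);
    return snprintf(buf, size,
                    "Failed to seek to offset %lld in file \"%s\"",
                    fp, fname);
}
```

With fp = 4294967301 (that is, 2^32 + 5) the mismatched %u would show just 5, while format_seek_error writes the full offset into the message.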

Yeah, such a horror story.

Oct 21, 2009
