Nov 8, 2009

Structure and class methods have external linkage

Let's look at a small example:
File f01.cpp:
#include <stdio.h>
struct TEST
{
    TEST() { printf("TEST from f01: created\n"); }
    unsigned m;
};
void f01()
{
    TEST t;
    printf("TEST from f01: sz=%u\n", (unsigned)sizeof(t));
}
File f02.cpp:
#include <stdio.h>
struct TEST
{
    TEST() { printf("TEST from f02: created\n"); }
    unsigned m1;
    unsigned m2;
};
void f02()
{
    TEST t;
    printf("TEST from f02: sz=%u\n", (unsigned)sizeof(t));
}
File main.cpp:
#include <stdio.h>
void f01(); // prototype
void f02(); // prototype
int main(int argc, char *argv[])
{
    (void)argc; (void)argv;
    f01();
    f02();
    return 0;
}
Program output:
TEST from f01: created
TEST from f01: sz=4
TEST from f01: created
TEST from f02: sz=8

As you can see, the constructor of the TEST structure from f01.cpp runs in both f01() and f02(). In f02(), 8 bytes are allocated for the TEST structure (since it has two unsigned members in f02.cpp), yet the constructor of TEST from f01.cpp is called. Calling the wrong constructor is clearly a bad thing, and it is better to avoid it.

The reason for this behavior is that structure and class methods have external linkage. This makes it possible to use a structure or class from another translation unit (*.cpp file). The compiler therefore generates two external TEST::TEST() symbols, one for each TEST structure. When the linker combines all three files, it cannot tell which TEST::TEST() belongs to which caller, so it simply keeps the first one it sees. Neither gcc nor MSVC emits a link warning (or error) for this situation, even though it violates the One Definition Rule.
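One way to see why the linker stays silent: a member function defined inside the class body is implicitly inline, and linkers are expected to merge duplicate inline definitions without complaint. The sketch below is a variation on f01.cpp (not the original listing) that defines the constructor outside the class body, so it is an ordinary non-inline function. The helper name f01_size() is made up for illustration.

```cpp
#include <assert.h>
#include <stdio.h>

struct TEST
{
    TEST();        // only declared here, so it is NOT implicitly inline
    unsigned m;
};

// Non-inline definition: this produces an ordinary external symbol.
// If another .cpp file also contained a non-inline TEST::TEST(),
// the link step would typically fail with a duplicate-symbol error
// instead of silently keeping one of the two.
TEST::TEST() : m(0) { printf("TEST from f01: created\n"); }

// Hypothetical helper: constructs a TEST and reports its size.
unsigned f01_size()
{
    TEST t;
    assert(t.m == 0);   // our own constructor ran
    return (unsigned)sizeof(t);
}
```

This only makes the clash between the constructors visible at link time. The two TEST layouts would still disagree, so keeping file-local types in an anonymous namespace remains the cleaner fix.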

The solution is to put the definition of the TEST structure in an anonymous namespace:
File f01.cpp:
#include <stdio.h>
namespace {
    struct TEST
    {
        TEST() { printf("TEST from f01: created\n"); }
        unsigned m;
    };
}
void f01()
{
    TEST t;
    printf("TEST from f01: sz=%u\n", (unsigned)sizeof(t));
}
File f02.cpp:
#include <stdio.h>
namespace {
    struct TEST
    {
        TEST() { printf("TEST from f02: created\n"); }
        unsigned m1;
        unsigned m2;
    };
}
void f02()
{
    TEST t;
    printf("TEST from f02: sz=%u\n", (unsigned)sizeof(t));
}
Program output:
TEST from f01: created
TEST from f01: sz=4
TEST from f02: created
TEST from f02: sz=8
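By putting the definitions in an anonymous namespace we force them to have internal linkage, so their symbols no longer confuse the linker. (Formally this is internal linkage since C++11; under older standards the names are merely unique to each translation unit, which has the same practical effect.)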
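A related option, not from the listings above: if the structures must stay visible outside their file, giving each one its own named namespace also produces distinct constructor symbols. The sketch below puts both structures in one file purely for demonstration. The namespace names f01_detail and f02_detail, the helpers f01_sum() and f02_sum(), and the member initial values are all made up.

```cpp
#include <assert.h>
#include <stdio.h>

namespace f01_detail {   // hypothetical name
    struct TEST
    {
        TEST() : m(1) { printf("TEST from f01: created\n"); }
        unsigned m;
    };
}

namespace f02_detail {   // hypothetical name
    struct TEST
    {
        TEST() : m1(2), m2(3) { printf("TEST from f02: created\n"); }
        unsigned m1;
        unsigned m2;
    };
}

// f01_detail::TEST::TEST() and f02_detail::TEST::TEST() are different
// symbols, so each function gets the constructor it was compiled against.
unsigned f01_sum()
{
    f01_detail::TEST t;
    return t.m;
}

unsigned f02_sum()
{
    f02_detail::TEST t;
    assert(sizeof(t) == 2 * sizeof(unsigned));
    return t.m1 + t.m2;
}
```

Unlike an anonymous namespace, this still relies on nobody else reusing the same namespace name, so it is better suited to types that really are shared through a header.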

Bonus example:
File f01.cpp:
#include <stdio.h>
struct TEST { unsigned m; };
void f01() { TEST t; }
File f02.cpp:
#include <string>
#include <stdio.h>
struct TEST { std::string s; };
void f02()
{
    TEST t;
    printf("t.s=\"%s\"\n", t.s.c_str());
}
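This example can crash: if the constructor of the TEST structure from f01.cpp is the one that ends up being used, the constructor of the TEST::s member from f02.cpp is never called, so f02() reads a std::string that was never initialized.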
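The same fix applies to the bonus example. A minimal sketch of f02.cpp with the structure moved into an anonymous namespace follows. The function name f02_value() and its return value are assumptions added so the result can be checked.

```cpp
#include <assert.h>
#include <stdio.h>
#include <string>

namespace {
    // File-local type: its implicit constructor cannot be replaced
    // by a constructor of some other TEST from another .cpp file.
    struct TEST { std::string s; };
}

// Hypothetical variant of f02() that also returns the string.
std::string f02_value()
{
    TEST t;                 // std::string's constructor runs for t.s
    assert(t.s.empty());    // a default-constructed string is empty
    printf("t.s=\"%s\"\n", t.s.c_str());
    return t.s;
}
```

f01.cpp should get the same treatment, so that neither file exports a TEST of its own.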
