"The (B)Leading Edge"

Moving to Standard C++

Jack Reeves

©The C++ Report

By the time you read this, the first public comment period for the latest committee draft of the C++ Standard will have passed. I have no doubt that there are still some problems in the (draft) Standard that need to be resolved, but I firmly believe that the time is rapidly approaching when I can write "Standard C++" instead of "(draft) Standard C++". More to the point, I expect that by the time you read this, C++ compiler vendors will be shipping the first releases of what they will refer to as "ANSI/ISO C++" compilers. These compilers will probably not be "full ANSI/ISO C++" right out of the box (there are some dark corners of the (draft) Standard that are going to take some time to implement and test thoroughly), but they will be a big step in that direction. Therefore, I thought it appropriate to devote a column to what you can expect when you get one of these new ANSI/ISO C++ compilers and start to migrate your existing C++ code base.

I have said before that (d)Standard[1] C++ is really a different programming language from ARM[2] C++. While (d)Standard C++ is based upon ARM C++, just as C++ is based upon C, the committee was not compelled to keep ARM C++ as a strict subset of (d)Standard C++. This is not to say that the committee had a cavalier attitude about breaking existing code. On the contrary, they often went out of their way to maintain backward compatibility. Nevertheless, you might find it much easier to compile ANSI/ISO C code using your new ANSI/ISO C++ compiler than to compile some of your existing C++ code -- at least at first. This column will examine some of the problems you can expect to encounter as you try to move your existing code to (d)Standard C++.

Expect to recompile everything

The very first thing you will likely discover about your new compiler is that you cannot mix new object code with the old -- i.e. the compilers are not "binary compatible". There are several reasons for this. Some of the more persuasive are:

a. Different object layout formats required to accommodate RTTI (run time type identification).

b. Different name mangling techniques required to support namespaces.

c. Different code formats required to support exception handling.

Whatever the actual reason, you will have to recompile all code in an application before you can re-link it.

Compiler options may change

Before you recompile everything, you may want to make changes to your "make" system. The options for the compiler will probably have changed. Many compilers have been supporting aspects of (d)Standard C++ for some time, but only on an optional basis. Some examples of the features which might fall into this category include:

exception handling

RTTI

new cast notation

bool as a built-in type

operator new[] and operator delete[]

namespaces

If you are using any of these features with your old compiler, you probably had to explicitly enable them. Under (d)Standard C++ all of these features are now enabled by default. Your new compiler may still accept the old switches for compatibility, but it may also have several new options that you might want to consider enabling. Some of these may be switches to disable certain (d)Standard features (like exceptions) to make it easier to migrate ARM code. If you are not quite ready for everything in (d)Standard C++, you may want to look into this possibility.

Expect some warnings, maybe a lot of warnings

When you recompile existing ARM C++ code with a (d)Standard C++ compiler, you will probably get some warnings. There are at least two sources of such warnings: deprecated language features, and possibly dangerous interactions with new language features. I'll explain both.

Deprecated features --

Officially, the term "deprecated" means that the language feature is legal in the current version of the language, but it may not be legal in a future version. In more pragmatic terms, the use of the feature is discouraged. Some people may well wonder if "deprecated" really means anything at all given that the current version of Standard C++ is not yet approved and nobody knows if there will even be another version, let alone when it will appear. Nevertheless, the purpose of marking some feature as deprecated is to facilitate migration of code -- a "deprecated feature" should generate a warning in "this" version, because it will become an error in the "next" version. Good compilers will generate warnings for deprecated language features.

An example of something that is deprecated is the following:

char* p = "a string";

In ARM C++ this is correct code. Currently, a string literal has type "array of char", because that is the type of a string literal in C. The committee decided to fix this loophole in the type system. In (d)Standard C++ a string literal has type "array of const char". The implicit array-to-pointer conversion of a string literal in (d)Standard C++ yields a "const char*". Technically, this makes the initialization above illegal (it removes the const qualifier). Nevertheless, there is a lot of existing C++ code that contains statements like the above. This is especially true in C++ code that has to interface with legacy C code where the concept of "const correctness" is virtually unknown.

Rather than break all that existing code, the committee added a standard conversion of "string literal to char*". They marked the conversion as deprecated, however. Thus, the line of code above is still legal in (d)Standard C++, but you may get a warning about it.
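A minimal sketch of how such code might be brought up to date without relying on the deprecated conversion. The function legacy_length is a hypothetical stand-in for a C-style interface that takes a non-const char*; greeting and call_legacy are likewise names invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical legacy C-style function: it takes a non-const char*
// even though it never modifies the string.
std::size_t legacy_length(char* s) {
    assert(s != 0);
    return std::strlen(s);
}

// Preferred form: the pointer type matches the literal's const type.
const char* greeting() {
    const char* p = "a string";
    return p;
}

// When a writable char* really is required, copy the literal into an array.
std::size_t call_legacy() {
    char buffer[] = "a string";      // array initialized from the literal
    return legacy_length(buffer);    // no deprecated conversion involved
}
```

Copying into a local array costs a few bytes of stack, but it also protects against a legacy function that does, in fact, write through its argument.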

Some other items that are now considered deprecated:

use of postfix ++ operator on a bool operand

static keyword used to declare objects at namespace scope

access declarations

In addition, the entire strstream library is considered deprecated (it is replaced by the stringstream class). Likewise, all of the Standard C Library headers are deprecated (use the new Standard C++ Library versions listed in table 4). Finally, certain functions of the iostreams library have newer, preferred forms. I doubt that the use of any of these library features will generate a warning, but you should make a note of them anyway. Unlike features of the language itself, deprecated library features are much more likely to disappear in a future release.

New language features --

A number of new language features have caused changes in the way certain existing features work. Existing code that uses these changed features may be perfectly valid, but it may work somewhat differently than expected. I hope and expect that the better compilers will have a number of warnings available to help you check your code to make sure that it still works the way it was intended to work. As one simple example, consider the following:

int done = (i < j); // 1

...

if (done) { // 2

This used to be perfectly valid code. It still is, but in (d)Standard C++ it works conceptually differently. Now the logical comparison i < j on line 1 yields a bool instead of an int. There exists a standard conversion from bool to int, however, which is used to do the initialization. The if statement on line 2 expects to test a bool. Here a standard conversion from int to bool is used to create the correct value for the condition. In reality, there may not be a single instruction different between the code generated by an ARM compiler versus that generated by a (d)Standard compiler. Still, it would be better to write

bool done = (i < j);

...

if (done) {

I hope that compilers will offer at least the option of generating a warning message for one or both of these new conversions.

Obviously, I cannot identify all the possible warnings that a new (d)Standard C++ compiler might generate. As always, you should carefully examine the cause of any warnings and determine whether or not the code is correct before you choose to ignore them.

Expect some errors

In some places, the committee changed the way certain existing C++ language features work. Sometimes this was necessary in order to cleanly integrate some new language feature. Sometimes it was just because the committee felt that the old feature needed "fixing". We have already seen an example of this in the change of the type of a string literal. Unless you are using a very limited subset of C++, you will probably find that there are some things in your code that just will not compile with your new (d)Standard C++ compiler. What follows is a partial list of some of the things you may run into.

Formerly deprecated features --

If your code still uses such things as the overload keyword then you are going to have a problem. There are a number of older language features that are still supported by certain versions of cfront-based C++ compilers. These features were already considered deprecated. They will be flagged as errors by a (d)Standard C++ compiler. Other examples:

Omission of the definition of a static class data member which is implicitly zero initialized (this will cause a link error)

Assignment of an int to an enumeration type (a cast is now required)

Explicitly specifying the number of elements in the delete[] expression

Implicit int declarations

Assignment to this in constructors or destructors
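A short sketch of what the (d)Standard forms of three of these items might look like. Counter, Color, color_from_int, and sum_and_release are hypothetical names used only for illustration.

```cpp
#include <cassert>

enum Color { red, green, blue };

class Counter {
public:
    static int count;   // declaration only
};

// The definition is required, even though the member is zero-initialized.
int Counter::count;

// An int no longer converts implicitly to an enumeration; a cast is needed.
Color color_from_int(int n) {
    assert(n >= 0 && n <= 2);
    return static_cast<Color>(n);
}

// Array delete takes no element count: "delete [n] p" becomes "delete [] p".
int sum_and_release(int n) {
    assert(n >= 0);
    int* p = new int[n];
    int total = 0;
    for (int i = 0; i < n; ++i) {
        p[i] = i;
        total += p[i];
    }
    delete [] p;
    return total;
}
```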

Errors from new language features --

It would be nice if ANSI/ISO C++ compiler vendors provided an ARM-compatible switch, but sooner or later you are probably going to have to deal with the fact that certain parts of your code have to be changed to be compatible with (d)Standard C++. Some of these changes may be limited to certain specific areas, like templates, but others may be much more widely pervasive. I will elaborate upon each of the following:

• New keywords

• New headers

• Namespace std

• New names for old types

• For statement semantics

• Overloading ambiguities

• Template changes

• Internal library changes

New keywords -- table 1 lists the keywords that are now part of (d)Standard C++ (74 in all). Table 2 lists the new keywords defined in (d)Standard C++ that were not defined in ARM C++ (26 total). As a raw number, this is an increase of over 50%. Obviously, (d)Standard C++ is not that much larger than ARM C++ -- at least 15 of the new keywords have existed for some time as macros in various Standard C or common C++ headers. Nevertheless, this drives home the fact that (d)Standard C++ is not just ARM C++ with a bigger library. If you are currently using any of these keywords as identifiers in your code, you will get compile-time errors.

New headers -- table 3 lists the (d)Standard C++ Library headers. Table 4 lists the (d)Standard C++ headers for the Standard C library. For compatibility, vendors are expected to provide "dot-h" versions of the Standard C library headers (string.h, stdio.h, etc.). I would also expect vendors to provide versions of such common C++ headers as new.h and iostream.h. Nevertheless, it is quite possible that you may find that certain headers used in your code are missing. You may also find that certain declarations have moved to a different header.

If your code uses the STL this may be a major impact for you. Existing STL implementations may not compile under an (d)Standard C++ compiler for reasons I discuss below under templates. This may force you to switch to the compiler vendor's Standard Library. This will mean changing to the Standard headers as well as using the namespace std qualification discussed below.

Namespace std -- almost all of the (d)Standard C++ Library is encapsulated within the std namespace. This means functions, classes, and objects in the (d)Standard C++ Library, including functions from the Standard C library, must now be either explicitly qualified, e.g. std::strlen(s), or be inserted into the current scope by the appropriate using declaration/directive and then used without qualification. This may give some projects fits. For example, I once worked on a project that had a coding guideline which required all global functions to be explicitly qualified (e.g. ::strlen(s)) whether or not it was needed. Under (d)Standard C++, such a call will fail to compile (strlen is no longer declared in the un-named global namespace). As noted above, vendors are expected to supply headers that provide using declarations for the Standard C library. In addition, many will likely provide dot-h versions for many common headers that have using declarations for common functions and objects (e.g. cout, cerr, and cin), but I have no idea how extensive this will be.
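The two ways of reaching a library name can be sketched as follows. The function names length_qualified and length_unqualified are hypothetical; only std::strlen and the header <cstring> come from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Explicit qualification of a Standard C library function.
std::size_t length_qualified(const char* s) {
    assert(s != 0);
    return std::strlen(s);
}

// A using declaration brings one name into the current scope.
std::size_t length_unqualified(const char* s) {
    using std::strlen;
    assert(s != 0);
    return strlen(s);
}
```

Placing the using declaration inside the function keeps the name out of every other scope in the translation unit, which is usually the less intrusive choice in a header.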

New names for old types -- in the (d)Standard C++ Library there is no class string or class ifstream or class iostream. The names "string", "ifstream", and "iostream" continue to exist, but rather than being type names, they are synonyms for much more complex types:

string ::= basic_string<char, char_traits<char>, allocator<char> >

ifstream ::= basic_ifstream<char, char_traits<char> >

iostream ::= basic_iostream<char, char_traits<char> >

You can continue to use the names string or iostream as if they were type names, with one exception. Many of us are in the habit of sticking in a forward declaration for common types such as string or ostream when all we want is to declare a reference. For example:

class ostream;

ostream& operator<<(ostream&, const X&);

class string;

class X {

X(const string&);

...

};

This reduces the number of physical dependencies in the header, and is usually a good idea. Unfortunately, it isn't going to work anymore. Because string and ostream are only synonyms for template instantiations, such forward declarations are guaranteed to clash with the real declarations at some point.
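One way to keep most of the benefit is the standard header <iosfwd>, which supplies forward declarations for the iostream types; for string there is no such header, so <string> itself has to be included. The sketch below shows a hypothetical header and implementation file for class X in a single listing; the file names and the member name_ are assumptions made for illustration.

```cpp
// ---- X.h (hypothetical header) ----
#include <iosfwd>   // declares std::ostream without defining it
#include <string>   // std::string has no forward-declaration header

class X {
public:
    explicit X(const std::string& name);
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Only a reference to std::ostream is needed, so <iosfwd> is sufficient.
std::ostream& operator<<(std::ostream& os, const X& x);

// ---- X.cpp (hypothetical implementation file) ----
#include <cassert>
#include <ostream>  // the full definition is needed only here

X::X(const std::string& name) : name_(name) {
    assert(!name_.empty());   // this sketch assumes non-empty names
}

std::ostream& operator<<(std::ostream& os, const X& x) {
    return os << x.name();
}
```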

This is also true of stand-alone declarations of standard functions such as

size_t strlen(const char*);

This is not the same function as the one declared in <cstring> (or string.h). The strlen function in those headers is in namespace std. You might think to try

namespace std {

size_t strlen(const char*);

}

This might work -- but it might not. The (d)Standard does not require that standard headers be actual files, or that including one of them results in textual inclusion (i.e. pre-compiled headers are allowed). The (d)Standard also makes it clear that attempting to add to namespace std is undefined. In other words, one (d)Standard compiler might accept the above, another might reject it with an error message, and both would be correct.

In (d)Standard C++, the only correct way to introduce a name into a translation unit is to #include its header. This was true in ARM C++ also, but (d)Standard C++ makes it much harder to sidestep this requirement.

Besides all the new templates, there are a few types in the (d)Standard Library that went through a couple of iterations before finally arriving at their final names. These include the types of the standard exception hierarchy. In the early days, there was xalloc (replaced by bad_alloc) and its siblings. Some early libraries provided xalloc et al. If you are using one of these compilers/libraries you will have to make changes when you switch to a (d)Standard compiler and library.

For statement semantics -- (d)Standard C++ does not introduce any new statements into the language, but the semantics of some of the existing statements have changed. For the most part, these are extensions, and so will not matter when compiling ARM compliant code. There is one exception, however. Under ARM C++, the scope of an object defined in the initialization portion of a for statement extended to the end of the enclosing block. Thus

for (int i = 0; i < size(); ++i) { ... }

was equivalent to

int i = 0;

for (; i < size(); ++i) { ... }

Under (d)Standard C++, 'i' will go out of scope at the end of the for statement. Code such as

for (int i = 0; i < size(); ++i) {

bool found = // test something

if (found) break;

}

if (i >= size()) {

// handle not found case

}

will no longer compile. At least you hope it will generate a compiler error: if there is some other 'i' in scope after the for loop, you will not get a compile-time error; instead you will have a runtime error to track down.
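A sketch of one fix that behaves the same under both sets of rules: declare the loop variable before the for statement. The function find_index and its parameters are hypothetical.

```cpp
#include <cassert>

// Returns the index of the first occurrence of target, or n if it is absent.
int find_index(const int* data, int n, int target) {
    assert(n == 0 || data != 0);
    int i = 0;                       // declared before the loop so it outlives it
    for (; i < n; ++i) {
        if (data[i] == target) break;
    }
    return i;                        // i == n means "not found"
}
```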

Overloading ambiguities -- you may discover that certain function calls which used to compile cleanly are now flagged as ambiguous by a (d)Standard C++ compiler. There are at least two possible reasons why this might occur. The first is that the committee cleaned up the overload resolution rules and certain situations that various compilers used to consider unambiguous are no longer so. If you are just recompiling working code, hopefully you will not see any of these. If you do, it should cause you to look carefully at what your old compiler was actually choosing. In my experience, the compiler chose wrong more often than not. You may find a latent bug or two this way.

The second possible reason for ambiguity errors is templates. The (d)Standard Library contains a lot of templates. You may discover that what used to be a simple (not overloaded) function call (or operator expression -- which translates into a function call), now has some templates in the mix and the result is ambiguous. It is even more likely that you may discover that a template version of the function was actually chosen and then it wouldn't compile when instantiated. Note that the overload resolution algorithm does not take into consideration whether a template function will actually compile if bound to a certain set of arguments. I have run into this problem with the comparison operator templates that were part of the STL (and are now in the header <utility>).

Another possible source of overloading templates is class member templates. A lot of the classes in the (d)Standard Library now define member templates. Unfortunately, I get the distinct impression that this may not be of any real concern for a while. I have yet to encounter a compiler that supports member templates.

Template changes -- (d)Standard C++ makes a number of changes to the template mechanism versus ARM C++. As near as I can tell, the users of templates will not have to change code that instantiates a template (there is a caveat to this, as usual). The templates themselves will have to be upgraded in order to compile under (d)Standard C++. This has to do with the name binding rules the committee established. The rules themselves and their ramifications are beyond the scope of this article. Suffice it to say that certain declarations within templates now have to be qualified with the typename keyword. A few other changes may also be required.
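As a small illustration of the typename requirement, here is a sketch of a function template that names types nested inside its template parameter. The function largest is hypothetical; value_type and const_iterator are the usual nested names of the library containers.

```cpp
#include <cassert>
#include <vector>

// Returns the largest element of a non-empty standard-style container.
template <class Container>
typename Container::value_type largest(const Container& c) {
    assert(!c.empty());
    // const_iterator depends on the template parameter, so typename is required.
    typename Container::const_iterator it = c.begin();
    typename Container::value_type best = *it;
    for (++it; it != c.end(); ++it) {
        if (best < *it) best = *it;
    }
    return best;
}
```

Without the typename keyword the compiler is entitled to assume that Container::const_iterator names an object or function rather than a type, and the declarations above fail to parse.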

If you are using the public domain STL (or a third party version), this may force you to switch to the version supplied with the compiler (or to upgrade your third party version). That, in turn, will force you to use the new standard headers, and introduce you to the problems associated with namespace std (which I discussed above).

In the previous column[3], I remarked that since all the (d)Standard versions of the STL were in namespace std, and since all the headers for the (d)Standard library have different names than the typical headers, you might be able to continue to use an existing STL library instead of switching to the compiler vendor's (d)Standard version. That assumes that the old version continues to compile when run through the ANSI/ISO compiler. For myself, all my compilers that support namespaces do not support the typename keyword, and the one that requires typename does not support namespaces, so I haven't had a chance to actually try out this idea.

If you have written templates yourself, you may have to upgrade them. The caveat I mentioned above is that if your templates depend upon any objects other than those defined within the template itself, and this includes members of base classes, then you should pay careful attention to the name binding rules. You may discover occasions where the old templates were being bound to objects at the point of instantiation but the new templates bind to objects at the point of definition. If this happens, you may be forced to change the template, which in turn will force changes in the client code.
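One common case can be sketched as follows: a class template that uses a member of a base class which itself depends on the template parameter. An unqualified use of the member is looked up where the template is defined, where the base is not yet known, so the name has to be made dependent, here by writing this->. The classes Tally_base and Tally are hypothetical.

```cpp
#include <cassert>

template <class T>
class Tally_base {
protected:
    Tally_base() : count_(0) {}
    int count_;
};

template <class T>
class Tally : public Tally_base<T> {
public:
    // A bare "count_" would be looked up at the point of definition and
    // not found in the dependent base; "this->count_" defers the lookup
    // to the point of instantiation.
    void add(const T&) { ++this->count_; }
    void remove() {
        assert(this->count_ > 0);
        --this->count_;
    }
    int size() const { return this->count_; }
};
```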

Internal library changes -- finally, there have been quite a few changes to various parts of the (d)Standard Library in various versions of the Draft Working Paper. If you have been using certain early versions of such things as the STL and/or string class, you may find that the (d)Standard versions are somewhat different. Consider the string library. It was one of the first new libraries adopted for the (d)Standard Library. As a result, there are a number of implementations of string available, all claiming to be "ANSI" string classes. Most of these implementations do not conform to the latest committee draft, however. When we upgraded our string library a while back we discovered a number of problems in our code. For example, we had a number of calls such as

s.insert(p, ' ');

s.append('.');

These functions had a single character as an argument. In the original string class definition, the above corresponded to the following signatures:

insert(size_t pos, char c, size_t n = 1);

append(char c, size_t n = 1);

About the time of the first committee draft, these signatures got changed around (as well as becoming templatized):

insert(size_type pos, size_type n, charT c);

append(size_type n, charT c);

Everywhere a string function once took a character and a repeat count (which defaulted to 1), the signature now requires an explicit repeat count and then the character argument. As a result, the old calls no longer compile.
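Under the new signatures the two calls above might be rewritten as in this sketch. The function punctuate is hypothetical; the insert and append overloads used are the ones quoted above.

```cpp
#include <cassert>
#include <string>

// Inserts one blank at position pos and appends one period.
std::string punctuate(const std::string& word, std::string::size_type pos) {
    assert(pos <= word.size());
    std::string s = word;
    s.insert(pos, 1, ' ');   // formerly s.insert(pos, ' ')
    s.append(1, '.');        // formerly s.append('.')
    return s;
}
```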

A more recent change renamed the string "remove" functions to "erase".

The changes to the STL containers are less significant, though there are a few. The algorithm count, for example, now returns its result as the function value instead of via an argument.

Debugging the result

After you get everything to compile, there is still the possibility that it will not execute correctly. So far, all the problems I have encountered have been related to the evolution of the library, and cannot be directly blamed on language changes. On the other hand, I do not yet have a compiler that fully supports the entire (d)Standard. Some runtime problems should show up rather quickly; others are likely to take a while to manifest themselves. Since the former are much easier to find, I will start with an example of the latter.

operator new and bad_alloc -- the (d)Standard now requires that the default memory allocation function (operator new) throw an exception (bad_alloc) if an attempt to allocate memory from the free store fails. If your project is like mine, you probably have a lot of code that looks (conceptually) like:

X* p = new X;

if (p == NULL) {

// memory allocation failure

}

Under (d)Standard C++ this code is still perfectly valid; it is just that the if statement is now superfluous. An exception from new will bypass it completely. In order to get the old behavior (and keep the if-statement in the loop), you must use the placement form of new:

#include <new>

...

X* p = new(std::nothrow) X;

Object nothrow is of type nothrow_t which is defined in header <new> along with a placement form of operator new that takes such an object as its argument. This form will return a null pointer if the allocation fails. Alternatively, if you would rather deal with exceptions, you can change the code to the following:

#include <new> // for bad_alloc

...

try {

X* p = new X;

} catch (std::bad_alloc&) {

// former if-statement body goes here

}

Naturally, none of these forms will cope with a different exception thrown by the X constructor.
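Both approaches can be wrapped in small functions so that the pointer remains usable after the check, as in this sketch. The function names make_buffer_nothrow and make_buffer_checked are hypothetical; std::nothrow and std::bad_alloc come from <new>.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Returns a null pointer on failure, as pre-standard operator new did.
int* make_buffer_nothrow(std::size_t n) {
    assert(n > 0);
    return new(std::nothrow) int[n];
}

// Lets new throw, then translates bad_alloc into a null pointer
// at one well-defined boundary.
int* make_buffer_checked(std::size_t n) {
    assert(n > 0);
    try {
        return new int[n];
    } catch (const std::bad_alloc&) {
        return 0;
    }
}
```

Callers of either function can keep their existing null-pointer test; which one is preferable depends on whether the rest of the code base is prepared for exceptions.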

I fear this may turn out to be one of the larger problems in migrating production code to (d)Standard C++. Memory allocation errors are typically very rare, yet good code must be written with their possibility in mind. It does no good to start writing code that copes with bad_alloc exceptions until you have a library version of operator new that throws such an exception. On the other hand, it does little good to switch your own code to the nothrow form when libraries such as string and the STL containers are going to be propagating bad_alloc upon an allocation failure.

Library signature changes -- you might discover that some of the changes to library function signatures allow existing code to compile without error, but cause runtime errors instead. For example:

s.append(' ', 13);

Under our first version of the string library, this added 13 blanks to the end of string 's'. With the new signature (see above), the compiler silently promotes the character argument to a size_type, while the repeat count converts silently to a character. The result is to append 32 carriage returns to the string -- not exactly the same thing.
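The effect is easy to reproduce. The sketch below makes the same call in the old and the new argument order; the function names old_order and new_order are hypothetical, and the count of 32 assumes an ASCII-compatible character set.

```cpp
#include <cassert>
#include <string>

// Calls append with the arguments in the old (character, count) order.
std::string old_order() {
    assert(' ' == 32);    // this sketch assumes an ASCII-compatible character set
    std::string s;
    s.append(' ', 13);    // now means: 32 copies of character 13 ('\r')
    return s;
}

// The intended call under the new signature.
std::string new_order() {
    std::string s;
    s.append(13, ' ');    // 13 blanks
    return s;
}
```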

Another example:

char* p = 0;

...

string(p);

The specifications for the string class state that the const char* parameter of the string constructor cannot be null. In this case, the code was incorrect, but the earlier version of the string class accepted this as a request to build an empty string. The newer version of string generated a segmentation fault.

Obviously, this type of problem is not limited to (d)Standard C++, but can occur anytime code which depends upon undocumented or undefined behavior is migrated. Nevertheless, as this example shows, there is quite a lot of behavior that is undefined in the (d)Standard Library that you might ordinarily assume should have been defined. Some vendors may simply ignore the areas of undefined behavior; others may try to protect clients from such mistakes; still others may extend the specification. You should beware of such undefined behavior. Just because your code used to work and it no longer does is not automatically an indication that you have found a bug in your new library; you may have found a bug in your code.

Lifetime of temporaries -- under the ARM, the lifetime of temporaries was undefined. In (d)Standard C++ this has been addressed. Temporaries under ARM C++ often hung around until the end of the scope in which they were created. Under (d)Standard C++, they are required to go away at the end of the complete statement containing the expression that created them (with a couple of exceptions which are not pertinent to this discussion). Most C++ programmers do not pay any particular attention to what temporaries they are creating, and even less attention to the lifetimes of such temporaries. In the vast majority of cases, this is fine. It is possible, however, that somewhere there is some code that depends upon a temporary being valid for a while longer than it should.

Consider a reference counting scheme for memory management that uses a smart pointer class to hide the details. The lifetime of a temporary of such a pointer class could determine whether some other section of code was dealing with a valid object or just deallocated memory.
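The new rule can be observed directly with a class that counts its live objects. This is only a sketch; Tracker, make, and observe are hypothetical names, and a real smart pointer class would of course do more than count.

```cpp
#include <cassert>

// Counts how many Tracker objects currently exist.
class Tracker {
public:
    Tracker() { ++alive; }
    Tracker(const Tracker&) { ++alive; }
    ~Tracker() { --alive; }
    static int alive;
};
int Tracker::alive = 0;

// Returns a Tracker by value, which creates a temporary at the call site.
Tracker make() { return Tracker(); }

// Records the count while a temporary exists and again after the
// statement that created it has finished.
void observe(int& during, int& after) {
    assert(Tracker::alive == 0);
    during = (make(), Tracker::alive);   // the temporary still exists here
    after = Tracker::alive;              // it was destroyed at the ';' above
}
```

Code that relied on the temporary surviving until the end of the enclosing block would, under the new rule, be using an object that has already been destroyed by the time the second line runs.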

These are just a few of the problems that I am already seeing or am worrying about. In order to build up a truly comprehensive list I need experience with a much larger code base. This means I need input from you, the reader. If you encounter gotchas in migrating code from ARM C++ to a version of (d)Standard C++, please send them to me, either to jack@fx.com or the editor. I will post them on my web site and write up the more interesting ones in future columns.

References

1. "Working Paper for Draft Proposed International Standard for Information Systems -- Programming Language C++", December 1996.

2. Ellis, M., and B. Stroustrup, The Annotated C++ Reference Manual, Addison-Wesley, 1990.

3. Reeves, J. The (B)Leading Edge: More STL Gotcha's, The C++ Report, Vol. 9, No. 5, May 1997.

Table 1.

Keywords in (d)Standard C++

asm do if return try

auto double inline short typedef

bool dynamic_cast int signed typeid

break else long sizeof typename

case enum mutable static union

catch explicit namespace static_cast unsigned

char export new struct using

class extern operator switch virtual

const false private template void

const_cast float protected this volatile

continue for public throw wchar_t

default friend register true while

delete goto reinterpret_cast

Alternate representations

and or not xor

and_eq or_eq not_eq xor_eq

bitand bitor compl

Table 2.

New Keywords in (d)Standard C++

bool false true

const_cast mutable typeid

dynamic_cast namespace typename

explicit reinterpret_cast using

export static_cast wchar_t

and or not xor

and_eq or_eq not_eq xor_eq

bitand bitor compl

Table 3.

C++ Library Headers

Table 4.

C++ Headers for C Library Facilities