Easily Avoid Large C++ Class Copies

Wednesday, June 14, 2006
By: Jason Doucette

Large Classes means No Copies Wanted

I am involved in a rather large project written mostly in C.  I use C++ where it helps, so I have a few classes.  They exist mainly to define a proper hierarchy and to encapsulate data.  These classes hold a large amount of data, which is dynamically allocated, since large fixed-size arrays would eat up stack space whenever an object is created as a local variable.  Because they use so much memory, I do not want copies of them being made.  Making copies would be a performance issue and a memory issue, and there is simply no reason to have duplicate copies to begin with.  If such a thing occurred, it would definitely be a bug in my code, and I would like to know about it so I could fix it.  It would be great if the compiler simply disallowed this and refused to compile, since that’s the best time to find bugs.  But such is not the case…

 

C++'s Forced Favours are Unhelpful

The C++ standard states that the compiler will generate a default copy constructor and a default assignment operator if you do not declare them yourself.  This happens whether you want them or not.  It seems an awfully nice thing to do, except that you cannot decline the help.  Favours you did not ask for are rarely welcome, and this case is no different.  The C++ standard gives you the most ridiculous versions of these member functions you can possibly imagine.

The default versions of the copy constructor and the assignment operator merely copy all of your data members over as-is, with no further thought.  To be fair, they do invoke the copy constructor or assignment operator of each data member, so members that know how to copy themselves are copied properly.  They do not just perform a raw byte copy via memcpy(), which would skip those copy constructors and assignment operators, so it is not that bad.  But realize that if those members' copy constructors and assignment operators were also generated by the compiler, they merely do the same thing, and built-in members such as pointers are simply copied by value.  What’s bad about this?
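The memberwise behaviour is easy to observe.  The sketch below is an illustration, not code from my project: Tracker and Widget are hypothetical names, and Tracker simply counts how often its copy operations run.  Widget declares no copy operations of its own, so the compiler generates them; they call Tracker's and std::string's copy operations, while the pointer member is copied by value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type that counts how often its copy operations run.
struct Tracker {
    static int copies;      // copy constructions so far
    static int assignments; // copy assignments so far
    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker& operator=(const Tracker&) { ++assignments; return *this; }
};
int Tracker::copies = 0;
int Tracker::assignments = 0;

// Widget declares no copy constructor or assignment operator,
// so the compiler generates both.
struct Widget {
    Tracker tracker;  // copied by Tracker's own copy operations
    std::string name; // copied by std::string's copy operations
    int* data;        // built-in pointer: only the address is copied
};

void demonstrate_memberwise_copy() {
    int value = 7;
    Widget a;
    a.name = "first";
    a.data = &value;

    Widget b(a);                 // compiler-generated copy constructor
    assert(Tracker::copies == 1);
    assert(b.name == "first");
    assert(b.data == a.data);    // both objects point at the same int

    Widget c;
    c = a;                       // compiler-generated assignment operator
    assert(Tracker::assignments == 1);
    assert(c.data == &value);
}
```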

If you have pointers, you will wind up with two pointers pointing to the same piece of data.  If your destructor deletes that data, it will be deleted twice.  This is not only a concern for dynamically allocated memory.  It is equally bad for any pointer that points to anything other than NULL, such as another data member of the same object: the copy’s pointer still points into the original object.  Most C++ mentors fail to point this out, likely excusing the omission as “left as an exercise for the student”.  Let’s hope the exercise is not production code.
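Here is a minimal sketch of the pointer problem, assuming a hypothetical Holder class that relies entirely on the compiler-generated copy constructor.  One member points to heap memory, and another points at the object's own count member; after the copy, both pointers in b still refer to a’s data.

```cpp
#include <cassert>

// Hypothetical class relying on the compiler-generated copy operations.
struct Holder {
    int* buffer;  // points to dynamically allocated memory
    int count;    // plain data member
    int* current; // points at one of this object's own members
};

// Returns true if the copy shares the original's buffer and points at
// the original's member rather than its own.
bool copy_is_shallow() {
    Holder a;
    a.buffer = new int[4]();   // four zero-initialized ints
    a.count = 4;
    a.current = &a.count;

    Holder b = a;              // copy constructor (not assignment)

    bool shares_buffer = (b.buffer == a.buffer);   // same heap block
    bool points_into_a = (b.current == &a.count);  // not &b.count
    a.buffer[0] = 42;                              // visible through b
    bool sees_change = (b.buffer[0] == 42);

    // Free the block once; also deleting b.buffer would be a double delete.
    delete[] a.buffer;
    return shares_buffer && points_into_a && sees_change;
}
```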

So, for the purposes of anything larger than a trivially contrived C++ tutorial example, these member functions may as well just call exit().

 

Stop Complaining.  It is Easily Resolved.

Hey, after all, you just need to program your own copy constructor and assignment operator, right?  C’mon!  It’s easy.

For the copy constructor, if you have pointers, just allocate some more memory, store the address it returns in the new object’s pointer, and copy the data over.  Voilà!

The assignment operator is not much harder.  You can do the same thing as above, but first you should check that you are not assigning an object to itself.  If you are, return immediately.  If not, deallocate the memory the target object currently owns (had you done this during self-assignment, you would have destroyed the very data you were about to copy!).  In fact, if the dynamic memory is known to be the same size, you do not even have to deallocate it, since you can just reuse it!
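As a rough sketch of those two steps, assuming a hypothetical CBuffer class that owns a single dynamically allocated int array (std::copy does the element copying): the copy constructor allocates and copies, and the assignment operator checks for self-assignment, reuses the memory when the sizes match, and reallocates otherwise.  It allocates the new block before freeing the old one, so a failed allocation leaves the target untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical class owning a dynamically allocated array of ints.
class CBuffer {
public:
    explicit CBuffer(std::size_t size)
        : m_size(size), m_data(new int[size]()) {}

    ~CBuffer() { delete[] m_data; }

    // Copy constructor: allocate new memory, then copy the data over.
    CBuffer(const CBuffer& original)
        : m_size(original.m_size), m_data(new int[original.m_size]) {
        std::copy(original.m_data, original.m_data + m_size, m_data);
    }

    // Assignment operator.
    CBuffer& operator=(const CBuffer& original) {
        if (this == &original)            // self-assignment: nothing to do
            return *this;
        if (m_size != original.m_size) {  // sizes differ: reallocate
            int* fresh = new int[original.m_size];
            delete[] m_data;
            m_data = fresh;
            m_size = original.m_size;
        }                                 // same size: reuse existing memory
        std::copy(original.m_data, original.m_data + m_size, m_data);
        return *this;
    }

    int& operator[](std::size_t i) { return m_data[i]; }
    std::size_t size() const { return m_size; }
    const int* data() const { return m_data; }

private:
    std::size_t m_size; // declared before m_data, so initialized first
    int* m_data;
};
```

Note that every data member, not just the pointer, has to be handled explicitly in both functions.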

 

So, How is this Bad?

I will tell you:

  1. While you know you need to handle pointers manually, you may not realize that the rest of the data members do not handle themselves either.  If you declare your own versions of these member functions, the compiler-generated versions no longer exist, and all of their functionality goes with them.  While that functionality was ridiculously minimal, you must now recreate it.  Thus, you are thrown into a predicament where you must handle every single data member.  If you miss one, tough luck: your copy constructor silently default-constructs it (a built-in type is left uninitialized), and your assignment operator silently leaves it unchanged.

    This is what we call a dependency.  It requires manual labour to ensure you have covered everything.  Manual labour is error prone.  This is why dependencies are bad.

    You may try a neat trick using memcpy() and sizeof(*this).  But this works only for POD (Plain Old Data) classes, whose data members are all raw data.  And for POD classes, the default copy constructor and assignment operator already do exactly what memcpy() would, so the trick gains nothing for them.

    What about using memcpy() to handle only the Plain Old Data portion of your classes?  Not so fast.  This will overwrite anything initialized in the initialization list, which forces you not to use one (and you should use initialization lists, as I will explain later).  If you use this trick anyway, you still need to implement proper copying of all the non-POD data members, so this does not remove the dependencies completely.

  2. The moment you add a new data member to your class, you need to remember to modify your copy constructor and your assignment operator, even if it is not a pointer.  This is the same dependency as mentioned above.  Bugs lurk in dependencies; avoid them if you want solid code.

  3. Most of the code in the copy constructor exists identically in the assignment operator, as they both make copies of objects.  So, you have duplicate code.  You are supposed to avoid duplicate code like the plague, since it is a dependency.  If you change one, you have to remember to change the other in the same manner.  More manual labour and more room for mistakes.  If you duplicate code, you are putting out a welcome mat for bugs.

    You can try to avoid duplicate code by calling the assignment operator from within the copy constructor.  How neat!  But the assignment operator is an assignment; it replaces data members that already exist.  So what is it replacing when the copy constructor, whose purpose is to construct an object from scratch, calls it?  Before the copy constructor’s body runs, every data member has already been constructed with its default constructor.  These default-constructed members are what the assignment operator replaces.  This means each member is initialized twice!  While this does get the job done, it is inefficient.

    For those of you initializing your data members individually with assignments in the constructor body, note that you are doing the same thing as above… just think about what you are assigning the data to.  Something has to exist to accept the replacement.  (Beware of an ambiguity in C++ syntax when you declare and initialize a variable in one statement.  CObject b = a; is not an assignment.  It is equivalent to CObject b(a); which invokes the copy constructor.)  You should use initialization lists instead, as they call the copy constructor for each data member rather than the default constructor followed by the assignment operator.  Thus, each member is initialized only once.

    Another possibility is to make them both call another member function that does all the copying work.  Aha, modularization.  Now, how can this be wrong?  Unfortunately, this results in the same problem of double initialization.  By the time the copy constructor’s body can call this member function, the object’s data members have already been default-constructed, and the helper then overwrites them.

    So, you are ultimately stuck with writing duplicate code, or submitting to the inefficiency of double initialization.  Duplicate code is horrendous, so it looks like the inefficient double initialization method is better.  But, of course, this depends on how this inefficiency affects your project.

  4. What happens if either member function fails?  (Ugh, more drudgery!  Programming is supposed to be fun.)  You need extra code to ensure your dynamic memory allocations are cleaned up properly.  And you have to report errors when they occur, either by throwing an exception or, if that isn't your style, by letting the operation proceed and reporting the error through a data member.  Either way, the code invoking these member functions needs to handle the error.  What a mess!
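Points 1 and 3 can be measured with a small sketch.  Counted, ViaAssignment, and ViaInitList are hypothetical names; Counted tallies how it gets initialized.  ViaAssignment’s copy constructor calls the assignment operator, so its member is default-constructed and then assigned.  ViaInitList’s initialization list copy-constructs the member once, but it also deliberately forgets label, which the copy then silently leaves empty.

```cpp
#include <cassert>
#include <string>

// Hypothetical member type that counts how it gets initialized.
struct Counted {
    static int defaults, copies, assignments;
    Counted() { ++defaults; }
    Counted(const Counted&) { ++copies; }
    Counted& operator=(const Counted&) { ++assignments; return *this; }
    static void reset() { defaults = copies = assignments = 0; }
};
int Counted::defaults = 0;
int Counted::copies = 0;
int Counted::assignments = 0;

// Copy constructor reuses the assignment operator: the member is
// default-constructed first, then overwritten (initialized twice).
struct ViaAssignment {
    Counted member;
    std::string label;
    ViaAssignment() {}
    ViaAssignment(const ViaAssignment& original) { *this = original; }
    ViaAssignment& operator=(const ViaAssignment& original) {
        member = original.member;
        label = original.label;
        return *this;
    }
};

// Copy constructor with an initialization list: the member is
// copy-constructed directly (initialized once).
// 'label' is deliberately left out to show a forgotten member.
struct ViaInitList {
    Counted member;
    std::string label;
    ViaInitList() {}
    ViaInitList(const ViaInitList& original) : member(original.member) {}
};
```

Copying a ViaAssignment (even when written as ViaAssignment b = a;) counts one default construction plus one assignment, whereas copying a ViaInitList counts a single copy construction.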

So, you need to worry about ALL this even when you never want a copy of your classes to be made!  This is the result of an early decision that C++ should generate its own versions of these member functions...  versions that do not work for anything but the most elementary classes.  I hope someone can explain what purpose they serve, because they have had little use in my experience.  Their detriments far outweigh any benefit I can imagine at this time.

It is unfortunate that this mistake cannot simply be fixed by turning off this ‘feature’ without breaking millions of lines of code that rely on the original behaviour.  Perhaps compilers could have a switch that toggles the feature, allowing both worlds to co-exist.

 

Is There Any Hope?

Must we submit to this atrocity?  If you do not want these member functions, and C++ gives them to you, perhaps you can prevent them from being invoked.

A first stab may be writing our own versions that do nothing but assert() that they are never run.  If a copy is ever made, the assertion points you to the code that made it.  But this is a run-time catch, not a compile-time catch.  It is fine in the lab, but it is not OK when one of these is invoked in a release build (where assert() is normally compiled out by defining NDEBUG), in some code path that was never tested during beta testing, on a customer’s machine.  Ouch.
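A sketch of this run-time-only guard, using a hypothetical CRuntimeGuarded class.  The static_assert checks (which need C++11’s type traits, well after this article was written) confirm the weakness: as far as the compiler is concerned, the class is still perfectly copyable.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class using the run-time-only approach: the copy
// operations exist but assert if they are ever executed.
class CRuntimeGuarded {
public:
    CRuntimeGuarded() : m_value(0) {}

    CRuntimeGuarded(const CRuntimeGuarded&) : m_value(0) {
        assert(!"CRuntimeGuarded must not be copied");
    }
    CRuntimeGuarded& operator=(const CRuntimeGuarded&) {
        assert(!"CRuntimeGuarded must not be assigned");
        return *this;
    }

    int value() const { return m_value; }

private:
    int m_value;
};

// Code that copies the class still compiles; the mistake only surfaces
// if that path actually runs in a build where assert() is active.
static_assert(std::is_copy_constructible<CRuntimeGuarded>::value,
              "copies are not rejected at compile time");
static_assert(std::is_copy_assignable<CRuntimeGuarded>::value,
              "assignments are not rejected at compile time");
```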

 

The Real Solution

The only decent way to prevent them from being invoked is to declare them private.  This way, no users of your classes can invoke them, and the compiler will complain if they try.  Code cannot leave the lab without being compiled, so no such functions will ever be invoked on a customer’s machine.  Catching bugs at compile time is orders of magnitude better than at run time: they are much less costly to fix, and you will not have angry customers.

Not so fast.  The class itself (along with any of its friends) can access private members.  Is there a way to avoid this?  Yes!  Declare the member functions, but do not define them.  That is, provide the function prototypes, so the compiler does not generate its own versions and knows you want them private, but do not write the actual code.  If you attempt to use them from within the class, compilation succeeds, but the linker will fail to find the code it needs and will issue an error (Microsoft’s linker, for example, reports “LNK2019: unresolved external symbol”).

This is the solution in code:

```cpp
class CExample
{
public:

    // constructor
    explicit CExample();

    // destructor
    ~CExample();

private:

    // copy constructor
    explicit CExample(const CExample& original);
        // Declared private so interface cannot use it.
        // Not defined so implementation cannot use it.

    // assignment operator
    CExample& operator=(const CExample& original);
        // Declared private so interface cannot use it.
        // Not defined so implementation cannot use it.
};
```
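To show the technique compiling, here is a completed, hypothetical variant (CNoCopy, with a made-up m_data member and at() accessor) that defines the members meant for use and leaves the copy operations declared but undefined.  Passing by reference still works, while copying is rejected.  C++11, which arrived years after this article, added a standard spelling for the same intent, = delete, shown as CNoCopy11; the static_assert checks (also C++11) confirm that neither class is copyable from outside.

```cpp
#include <cassert>
#include <type_traits>

// C++98-style version: copy operations declared private, never defined.
class CNoCopy {
public:
    CNoCopy() : m_data(new int[1024]()) {}
    ~CNoCopy() { delete[] m_data; }

    int& at(int i) { return m_data[i]; }

private:
    CNoCopy(const CNoCopy& original);            // declared, never defined
    CNoCopy& operator=(const CNoCopy& original); // declared, never defined

    int* m_data;
};

// Passing by reference never copies, so this is allowed.
int first_element(CNoCopy& object) { return object.at(0); }

// Each of these would be a compile-time error (private member):
//   CNoCopy b = a;
//   b = a;

// C++11 style: a deleted function is rejected at compile time
// everywhere, including inside the class itself.
class CNoCopy11 {
public:
    CNoCopy11() = default;
    CNoCopy11(const CNoCopy11&) = delete;
    CNoCopy11& operator=(const CNoCopy11&) = delete;
};

static_assert(!std::is_copy_constructible<CNoCopy>::value, "private copy ctor");
static_assert(!std::is_copy_assignable<CNoCopy>::value, "private assignment");
static_assert(!std::is_copy_constructible<CNoCopy11>::value, "deleted copy ctor");
static_assert(!std::is_copy_assignable<CNoCopy11>::value, "deleted assignment");
```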

 

Final Thoughts

Your C++ mentor may have led you to believe these member functions are special (the C++ Standard does call them “special member functions”), but they are just member functions.  The copy constructor is just a constructor: one whose parameter is a reference to an object of the same class.  That parameter does not even need to be const.  It does need to be a reference, though.  A constructor that takes its own class by value is ill-formed, and compilers reject it, because passing the argument by value would require making a copy.  How does the compiler make that copy?  By calling the copy constructor, of course.  Say “Hello, infinite recursion”.  The assignment operator is just an overloaded operator.  If the compiler did not automatically make its own useless versions, you would not even think about them in a special way.  You would just write them when they needed to be written.  How wonderful would that be?
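A small sketch of the non-const case, using a hypothetical CTally class: its copy constructor takes CTally& rather than const CTally&, yet the compiler still uses it for both direct initialization and copy-initialization.

```cpp
#include <cassert>

// Hypothetical class whose copy constructor takes a non-const reference.
// This is still a copy constructor, and the compiler uses it.
class CTally {
public:
    CTally() : m_copiesMade(0) {}

    CTally(CTally& original) : m_copiesMade(0) {
        ++original.m_copiesMade; // allowed: the parameter is not const
    }

    int copies_made() const { return m_copiesMade; }

private:
    int m_copiesMade;

    // A by-value "copy constructor" would not compile:
    //   CTally(CTally original);  // ill-formed: passing by value would copy
};
```

Both CTally b(a); and CTally c = a; call this constructor, so a records two copies.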

 


 

 

About the Author: I am Jason Doucette of Xona Games, an award-winning indie game studio that I founded with my twin brother. We make intensified arcade-style retro games. Our business, our games, our technology, and we as competitive gamers have won prestigious awards and received worldwide press. Our business has won $190,000 in contests. Our games have ranked from #1 in Canada to #1 in Japan, have become #1 best sellers in multiple countries, have won game contests, and have held 3 of the top 5 rated spots in Japan of all Xbox LIVE indie games. Our game engines have been awarded for technical excellence. And we, the developers, have placed #1 in competitive gaming competitions -- relating to the games we make. Read about our story, our awards, our games, and view our blog.