Advanced Concepts in C++

Garbage Collection Techniques in C++
Smart Pointers for automated memory management

Simon Fuhrmann
Darmstadt University of Technology, Germany

September 13, 2005

Abstract: This paper deals with library-implemented automated dynamic memory management, also known as automated garbage collection (AGC, or GC for short), for the C++ programming language.

Contents

1 Introduction
2 Motivation
  2.1 Objects as return value
  2.2 Objects in GUI environments
3 Understanding smart pointers
  3.1 Wrapping pointers
  3.2 Implemented idioms
  3.3 Resource management
  3.4 Strategies
4 Reference counting smart pointers
  4.1 Reference counting implementation
  4.2 Reference counting issues
    4.2.1 Cyclic references
    4.2.2 Multi-Threading
    4.2.3 Performance analysis
5 Smart pointer libraries
  5.1 The C++ Standard Library
  5.2 The Boost Library
  5.3 The Loki Library
A ref_ptr source code
List of Figures

1 Introduction

Managing dynamic memory is a complex and highly error-prone task and one of the most common causes of runtime errors. Such errors can cause intermittent program failures that are difficult to find and to fix. Several kinds of errors occur when managing dynamic memory:

Memory leaks are among the most common mistakes. They occur if a program fails to free objects that are no longer in use. Typically, such a program keeps leaking memory; over time its performance degrades, and it can eventually run out of memory.

Premature frees are another problem. They are caused by freeing objects that are still in use, which can result in corrupted data and may lead to sudden failures.

Memory fragmentation is caused by frequent allocation and deallocation of memory, such that larger chunks of memory are divided into much smaller, scattered pieces, and the free memory between these pieces is too small to be reused for new allocations.

Memory management and C++

The most critical issues are memory leaks and premature frees. Every instantiated primitive type (int, char, etc.) and complex type (objects) needs memory, but not all of this memory has to be managed manually. Where the data resides is decisive: nearly every primitive instance is created on the stack, and so are most complex instances. Since the stack has automatic extent, data on it is reclaimed automatically when it goes out of scope. Data that is created on the heap stays until it is freed, so the heap has dynamic extent.

In the Java programming language every object needs to be created on the heap, and typical Java code looks like this:

    public void myMethod() {
        MyClass obj = new MyClass();
        // Do something with obj
    }

Sure, the reference obj will go out of scope at the end of myMethod, but the object it refers to will not be freed at that point since it was created on the heap. The stack can only be used for primitive types and references. This is one of the main reasons for Java's completely different garbage collection model: Java relies on a tracing garbage collector (mark-sweep in its simplest form) to detect and reclaim unused data.

In the C++ programming language there is no such garbage collector. Objects can be created on the heap, like in Java, but also on the stack, like any primitive type:

    void my_function() {
        MyClass obj;
        // Do something with obj
    }

Cleaning up obj is not necessary because the stack reclaims all data when the scope is exited. This simplifies memory management a lot, and no garbage collector is needed in most cases. It is advisable to use the stack as often as possible, not only because there is no need to care about memory management but also because stack allocation is faster than heap allocation. On most operating systems the stack has a fixed size, so no memory allocations need to be done; creating and destroying stack data is just a matter of decrementing and incrementing the stack pointer.

It is even possible to use STL containers with non-pointer types, which eliminates some more need for manual memory management. When the scope of an STL container is exited, all objects within that container are reclaimed automatically. But only non-pointer elements are reclaimed; the data that stored pointers refer to is left untouched!
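The difference between containers of values and containers of pointers can be made visible with a small instance counter. The following is a minimal sketch, not code from this paper: the type Tracked and the helper functions are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// A container of values destroys its elements when it goes out of scope.
int alive_after_value_container() {
    {
        std::vector<Tracked> values(3);
        assert(Tracked::alive == 3);
    }   // the vector's destructor destroys all three elements here
    return Tracked::alive;
}

// A container of raw pointers only releases the pointers themselves.
// The addresses are copied to `survivors` so that the caller can still
// delete the objects after the container is gone.
int alive_after_pointer_container(std::vector<Tracked*>& survivors) {
    {
        std::vector<Tracked*> pointers;
        for (int i = 0; i < 3; ++i)
            pointers.push_back(new Tracked);
        survivors = pointers;
    }   // the vector is gone, but the three Tracked objects are not
    return Tracked::alive;
}

// The manual cleanup that a container of raw pointers requires.
void delete_all(std::vector<Tracked*>& objects) {
    for (std::size_t i = 0; i < objects.size(); ++i)
        delete objects[i];
    objects.clear();
}
```

Starting from zero live instances, alive_after_value_container() reports no survivors, while alive_after_pointer_container() reports three objects that stay alive until delete_all() is called on the copied pointers.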

2 Motivation

In real-life applications manual memory management remains an issue and cannot be bypassed completely, because some situations require it. There are several reasons for that, but the most important ones are:

The stack is limited, on most operating systems to a few megabytes by default (the exact limit depends on the platform and its configuration), and some big objects will not fit. An object obj requires sizeof(obj) bytes of the stack (if it was created on the stack). But in most cases big data has dynamic size and needs to be created on the heap anyway, so the corresponding class has to care about memory management, not the creator of the instance.

More important are objects that need to outlast the current function scope. There are many scenarios for this. The most common are objects in GUI environments and objects that should be used as return values.
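The remark about sizeof(obj) can be illustrated with std::vector, a class that manages its dynamically sized data on the heap by itself. This is only a sketch; the function names payload_bytes and big_vector_has_small_stack_footprint are hypothetical and chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes occupied by the elements themselves. A standard vector keeps them
// in dynamically allocated storage (allocator bookkeeping and spare
// capacity are ignored here).
std::size_t payload_bytes(const std::vector<double>& v) {
    return v.size() * sizeof(double);
}

// Hypothetical demo: a local vector holding one million elements.
bool big_vector_has_small_stack_footprint() {
    std::vector<double> big(1000000, 0.0);   // `big` itself is a local object
    assert(big.size() == 1000000);
    // sizeof(big) is a compile-time constant, independent of the element count.
    return sizeof(big) == sizeof(std::vector<double>)
        && sizeof(big) < payload_bytes(big);
}
```

The local object only holds a few words of bookkeeping, while the megabytes of element data live on the heap and are released by the vector's destructor.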

2.1 Objects as return value

Using objects as return values is relatively common. In most cases the object can simply be returned by value if there are no performance considerations.

Approach 1: Copy the return value from a local variable:

    std::string repeat_string(const std::string& str, int times)
    {
        std::string result;
        for (int i = 0; i < times; ++i)
            result += str;
        return result;
    }

The function repeat_string() takes a string and a number of repetitions and returns times concatenated copies of str. On the one hand, the code is simple and easy to understand, and if str and times are small, copying the result should not matter. On the other hand, if str is long, times has a big value, and performance is critical, this code is very expensive.

Note: This very simple example may get optimized by the compiler with a technique called return value optimization. This optimization constructs the result directly in the place where the caller expects it and thereby skips copying the result (it skips calling the copy constructor). But more complex functions may not get optimized.

Approach 2: Supply result space for the new data:

    void repeat_string(const std::string& str, int times, std::string& result)
    {
        for (int i = 0; i < times; ++i)
            result += str;
    }

This approach uses a trick to avoid copying the result from a local variable. There are no more performance considerations, and memory management is not an issue here. But other reasons make this solution far from good. Using parameters for values that should be return values is unintuitive. Unfortunately, this is common practice in functions from libc, the C library.
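A further consequence of supplying the result space can be sketched in a few lines: the function appends to whatever the caller's result object already contains. The variant repeat_string_cleared() and the function demo_out_parameter() are hypothetical additions for illustration and not part of the approaches discussed here.

```cpp
#include <cassert>
#include <string>

// Approach 2 from the text: the caller supplies the result object.
void repeat_string(const std::string& str, int times, std::string& result) {
    for (int i = 0; i < times; ++i)
        result += str;
}

// Hypothetical variant (an assumption of this sketch): discard any old
// contents of the result before appending.
void repeat_string_cleared(const std::string& str, int times, std::string& result) {
    result.clear();
    for (int i = 0; i < times; ++i)
        result += str;
}

// Shows how the outcome depends on the state the caller passes in.
void demo_out_parameter() {
    std::string result = "old";
    repeat_string("ab", 2, result);
    assert(result == "oldabab");          // previous contents are still there

    repeat_string_cleared("ab", 2, result);
    assert(result == "abab");             // old contents were discarded
}
```

Whether appending or overwriting is intended cannot be seen from the function signature alone, which is part of what makes out-parameters hard to use correctly.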

Even worse, there is no control over the creation of the result object, and there is no guarantee that result was created with appropriate constructor parameters. Quickly forgetting this approach, it is time to forge ahead with some better ideas.

Approach 3: Return a pointer to the data:

    std::string* repeat_string(const std::string& s, int times)
    {
        std::string* result = new std::string;
        for (int i = 0; i < times; ++i)
            *result += s;
        return result;
    }

Returning a pointer to the data is the most convenient way to perform a cheap copy operation. Only a few bytes (the actual size of a pointer depends on the machine) need to be copied. But there is no way to reclaim the requested memory within the repeat_string() function. The deletion of the data needs to be done outside this function, after the result has been used. Another idea would be to use a static buffer that gets overwritten on subsequent use to avoid deletion. But neither remembering to delete the data nor supplying a static buffer is a good solution for such a simple problem.

Approach 4: Return memory-managed data:

This approach is a little lookahead to smart pointers and possibly the only efficient way with automated memory management:

    // needs <memory>; std::auto_ptr is only available before C++17
    std::auto_ptr<std::string> repeat_string(const std::string& s, int times)
    {
        std::auto_ptr<std::string> result(new std::string);
        for (int i = 0; i < times; ++i)
            *result += s;
        return result;
    }

The result is created on the heap, of course, because the data needs to outlive the current scope. But instead of leaving the cleanup to the caller, a smart pointer called auto_ptr takes care of deleting the data. The specific semantics of auto_ptr will be explained later on. The slightly complicated code (which uses C++ templates) can be cleaned up with typedefs and results in code like this:

    typedef std::auto_ptr<std::string> StrPtr;

    StrPtr repeat_string(const std::string& s, int times)
    {
        StrPtr result(new std::string);
        for (int i = 0; i < times; ++i)
            *result += s;
        return result;
    }

As a result, the code looks almost identical to the first approach, but some things are different: the data needs to be created with new, and the result needs to be dereferenced to access the data. That is because a smart pointer behaves