Tracking Memory Leaks in C++

This may be common knowledge for a lot of you, but Valgrind is a dynamic code analysis tool that discovers memory leaks for you. It can tell you exactly where leaked memory was allocated; from there you can use your intuition to decide where the memory should be freed.

The greatest thing is how easy it is to use! I just removed half a dozen memory leaks from my undergraduate thesis project in about 35 minutes with no prior experience. The only “trick” to using Valgrind is ensuring that you compile your project with debugging flags turned on (“-g” for gcc and g++). That embeds line number information in your executable so that Valgrind can generate useful output.

The Valgrind Quick Start page is by far the best introduction to Valgrind.


Getting Started with tinySTM (Ubuntu 9.04)

This post is a quick guide to go from nothing to writing small tinySTM-based applications. For those who don’t know, tinySTM is a library for writing applications that use transactional memory for synchronization in lieu of traditional locks and semaphores. This raises two questions: what is synchronization, and what is transactional memory?

Loosely speaking, synchronization refers to any method that prevents processes or threads from trampling on one another. What do I mean by trampling? One example is a memory consistency error, which is when threads have an inconsistent view of the same data: two threads check the value of an integer and see different values. This typically happens when one thread works with a copy of the integer held in a register or CPU cache while another thread (running on a different core) has already written a newer value. And so different values are seen! Synchronization prevents problems like these.

Transactional Memory (TM) is a style of synchronization that was heavily inspired by databases. In a database, requests are encapsulated as transactions, and the database ensures integrity by rolling back any changes made by a partially completed transaction. This means that failed transactions won’t break your database. TM applies the same idea to memory: a block of reads and writes either takes effect as a whole or not at all.

With a little backstory out of the way, we’re ready to start. Download tinySTM 0.9.9, then unpack it and install its dependency:

tar -xvf tinySTM-0.9.9.tgz
cd tinySTM-0.9.9
sudo apt-get install libatomic-ops-dev
export LIBAO_HOME=/usr/include/atomic_ops

Running make in the tinySTM-0.9.9 directory compiles tinySTM and puts a static library file at ~/tinySTM-0.9.9/lib/libstm.a. Anything we write to use tinySTM will need to link to this lib file.

Let’s make sure that everything is working by compiling and running the example code that came with tinySTM.

cd test
cd bank
# To run these demos with multiple threads we use the "-n" option
./bank -n 3

If everything is working correctly you should get some pretty lengthy output that looks similar to this:

kris@cosmos:~/tinySTM-0.9.9/test/bank$ ./bank -n 3
Nb accounts : 1024
Duration : 10000
Nb threads : 3
Read-all rate : 20
Read threads : 0
Seed : 0
Write-all rate : 0
Write threads : 0
Type sizes : int=4/long=8/ptr=8/word=8
Initializing STM
Creating thread 0
Creating thread 1
Creating thread 2
Thread 0
#transfer : 1969727
#read-all : 492137
#write-all : 0
#aborts : 522377
#lock-r : 167012
#lock-w : 387
#val-r : 354978
#val-w : 0
#val-c : 0
#inv-mem : 0
#realloc : 0
#r-over : 0
#lr-ok : 0
#lr-failed : 0
Max retries : 35784
Thread 1
#transfer : 3517300
#read-all : 879229
#write-all : 0
#aborts : 986623
#lock-r : 288231
#lock-w : 691
#val-r : 697695
#val-w : 6
#val-c : 0
#inv-mem : 0
#realloc : 0
#r-over : 0
#lr-ok : 0
#lr-failed : 0
Max retries : 45082
Thread 2
#transfer : 1947009
#read-all : 486864
#write-all : 0
#aborts : 580381
#lock-r : 228081
#lock-w : 328
#val-r : 351970
#val-w : 2
#val-c : 0
#inv-mem : 0
#realloc : 0
#r-over : 0
#lr-ok : 0
#lr-failed : 0
Max retries : 57503
Bank total : 0 (expected: 0)
Duration : 10000 (ms)
#txs : 9292266 (929226.600000 / s)
#read txs : 1858230 (185823.000000 / s)
#write txs : 0 (0.000000 / s)
#update txs : 7434036 (743403.600000 / s)
#aborts : 2089381 (208938.100000 / s)
#lock-r : 683324 (68332.400000 / s)
#lock-w : 1406 (140.600000 / s)
#val-r : 1404643 (140464.300000 / s)
#val-w : 8 (0.800000 / s)
#val-c : 0 (0.000000 / s)
#inv-mem : 0 (0.000000 / s)
#realloc : 0 (0.000000 / s)
#r-over : 0 (0.000000 / s)
#lr-ok : 0 (0.000000 / s)
#lr-failed : 0 (0.000000 / s)
Max retries : 57503

It’s really no fun to run someone else’s code, so let’s build something simple from the ground up. I’ll be using the Boost Thread library for threading instead of pthreads (which is what the tinySTM examples use).

I’m going to write a very contrived example with a Counter class and a MyRunnable class. The Counter class will be extremely simple. In fact, it will basically just be a wrapper around an integer. The only method of interest it provides is increment(), which increments the integer by a fixed amount each time it is called. The other class, MyRunnable, is basically just an encapsulation of a Boost thread. You can think of it as a class that implements Runnable in Java.

The program will start a bunch of threads via Boost, which results in the run() method of each MyRunnable object getting executed from a different thread of execution. The MyRunnables will try to call increment() on the same Counter object. If everything is done right, each call should be accounted for in the end.

I will synchronize the increment() method by enclosing its body in a transaction. That means that if another thread modifies any of the memory touched in the body of increment(), the transaction will be canceled, rolled back to the original state, and then retried.

Don’t forget to copy all of the tinySTM .h files (stm.h, mod_mem.h, etc) and the library file (libstm.a) into your current working directory. With all of that in mind, here’s the example:

//File: samplestm.cpp
//Author: Kristopher Kalish
#include <cassert>
#include <iostream>
#include <boost/thread.hpp>
#include <atomic_ops.h>
#include "stm.h"

// These following macros are from the tinySTM examples, and they truly 
// are useful.
/*
 * Useful macros to work with transactions. Note that, to use nested
 * transactions, one should check the environment returned by
 * stm_get_env() and only call sigsetjmp() if it is not null.
 */
#define RO                              1
#define RW                              0
#define START(id, ro)                   { sigjmp_buf *_e = stm_get_env(); stm_tx_attr_t _a = {id, ro}; sigsetjmp(*_e, 0); stm_start(_e, &_a)
#define LOAD(addr)                      stm_load((stm_word_t *)addr)
#define STORE(addr, value)              stm_store((stm_word_t *)addr, (stm_word_t)value)
#define COMMIT                          stm_commit(); }

using namespace std;

static const int INCREMENT = 5;
static const int NUM_RUNS  = 100000;

class Counter
{
public:
	Counter()
	{
		value = 0;
	}

	/*
	 * Increment the counter by five by looping. A loop was picked to
	 * make calls to increment() take more cpu time.
	 */
	void increment()
	{
		START(0, RW);

		for(int i = 0; i < INCREMENT; i++)
		{
			int tmp = (int) LOAD(&this->value);
			tmp = tmp + 1;

			STORE(&this->value, tmp);
		}

		COMMIT;
	}

	int getValue()
	{
		return (int) value;
	}

private:
	// tinySTM loads and stores whole machine words, so the counter is
	// kept in a word-sized variable rather than a (smaller) int.
	stm_word_t value;
};


class MyRunnable
{
public:
	MyRunnable(int id, boost::barrier* bar, Counter* count)
	{
		this->id    = id;
		this->bar   = bar;
		this->count = count;
	}

	void run()
	{
		for(int i = 0; i < NUM_RUNS; i++)
			count->increment();

		// all done, wait at the barrier
		bar->wait();
	}

	// The entry point for a thread
	void operator()()
	{
		// We must call stm_init_thread() at the beginning of each
		// thread's line of execution before using the tinySTM library
		stm_init_thread();

		run();

		// Call this at the end of each thread's execution to have
		// tinySTM clean up.
		stm_exit_thread();
	}

private:
	int             id;
	boost::barrier* bar;
	Counter*        count;
};

int main()
{
	int            numThreads = 4;
	boost::barrier my_barrier(numThreads);
	Counter        count;

	cout << "Intializing tinySTM." << endl;
	stm_init();

	cout << "Counter is starting with value: " << count.getValue() << endl;
	cout << "Starting " << numThreads << " counting threads..." << endl;

	// Need to make at least one thread
	assert(numThreads >= 1);

	// Make the first thread
	boost::thread thread1(MyRunnable(0, &my_barrier, &count));

	// Then make the remaining threads
	for(int i = 1; i < numThreads; i++)
	{
		boost::thread thread(MyRunnable(i, &my_barrier, &count));
	}

	// thread1 will terminate when all threads have reached the barrier
	thread1.join(); // Wait for thread1 to terminate 

	cout << "Counter is ended with value: " << count.getValue() << endl;
	cout << "Counter should be: " << NUM_RUNS * numThreads * INCREMENT << endl;

	// Let tinySTM clean up after itself
	stm_exit();

	return 0;
}

Then to compile and run, we will need to link against the tinySTM library and Boost library:

g++ samplestm.cpp -lboost_thread-mt libstm.a -o sample

Example output:

Intializing tinySTM.
Counter is starting with value: 0
Starting 4 counting threads...
Counter is ended with value: 2000000
Counter should be: 2000000



I fixed some problems with the post about Boost threads. The example program worked correctly for the most part, but only by accident. I had incorrectly assumed that the join() method started the thread. This is not the case. It’s a method that blocks until the thread is done.

The correct solution is to make a “barrier”, which has one method, wait(). When a barrier is constructed, it is initialized with a counter. Each thread calls wait() when it’s done, decrementing the counter. When the counter reaches zero, all calls to wait() return. The example in the previous post about Boost threads now uses a barrier.


Getting Started with Boost Threads

Boost is a collection of open source C++ libraries. They are released under the “Boost License” so they can be incorporated into open-source and closed-source projects. Anyway, one of the libraries in the collection that is of particular interest to me is the threading library. It’s cross-platform, so I should be able to run my code on any platform. It also uses proper C++ templating, so it’s clean as well.

This post is targeted at readers who already have some experience writing multi-threaded applications (in Java, for example). It tells you only what you need to go from nothing to compiling a simple Boost-based program that uses locks.

The first thing we have to do is get Boost. The threading library was last changed in version 1.36, so anything 1.36 and later will do. You can do a manual install by following the instructions in the Getting Started Guide. However, I use Ubuntu 9.04 which packages the Boost library, and I’m a huge advocate of using your distro’s package management system so I’ll be using that.

To get Boost in Ubuntu, run the following:

sudo apt-get install libboost1.37-dev
echo "That was easy!"

So now all that’s left is to make a simple, multi-threaded application.

// File: sample.cpp
#include <iostream>
#include <boost/thread.hpp>

using namespace std;

class MyRunnable
{
public:
	MyRunnable(int id, boost::mutex* mutex, boost::barrier* bar)
	{
		this->id    = id;
		this->mutex = mutex;
		this->bar   = bar;
	}

	// The entry point for a thread
	void operator()()
	{
		for(int i = 0; i < 10; ++i)
		{
			// Hold the mutex while printing so lines are not mixed
			boost::mutex::scoped_lock lock(*mutex);
			cout << "id: " << this->id << ", " << i << endl;
		}

		// all done, wait at the barrier.
		// wait() returns when everyone has met at the barrier
		bar->wait();
	}

private:
	int id;
	boost::mutex* mutex;
	boost::barrier* bar;
};

int main()
{
	boost::mutex io_mutex;
	// this barrier will wait for two invocations of wait()
	boost::barrier my_barrier(2);

	cout << "Starting two counting threads..." << endl;
	// the boost::mutex cannot be copied (for obvious reasons)
	// so we must pass the pointer to the mutex.
	boost::thread thread1(MyRunnable(1, &io_mutex, &my_barrier));
	boost::thread thread2(MyRunnable(2, &io_mutex, &my_barrier));

	thread1.join(); // wait for thread1 to finish

	// Thanks to the barrier, thread1 cannot finish until thread2 has
	// also finished counting
	return 0;
}

Then compile and run:

g++ sample.cpp -lboost_thread-mt

The output will of course vary a lot each time you run it, but it should look something like this:

id: Starting two counting threads...
1, 0
id: 1, 1
id: 1, 2
id: 1, 3
id: 1, 4
id: 1, 5
id: 1, 6
id: 1, 7
id: 1, 8
id: 1, 9
id: 2, 0
id: 2, 1
id: 2, 2
id: 2, 3
id: 2, 4
id: 2, 5
id: 2, 6
id: 2, 7
id: 2, 8
id: 2, 9

Notice how the output of the two threads we created is interleaved with the output of the main thread of execution. This is one of the dangers of threading!


Limiting Bandwidth in Linux

Ever wanted to limit the bandwidth of a single command in Linux? It’s easy with trickle. You use it to launch the program that you want to restrict and it will provide a modified (restricted) version of sockets. No configuration… nothing.

Here’s how to install and use it to limit the bandwidth given to Firefox to 300 KB/sec on Ubuntu:

sudo apt-get install trickle
trickle -d 300 firefox

Ridiculously easy, right?
