Lessons I Learned About C

I recently worked on my first non-trivial C project during my student job in bioinformatics research. I already knew enough C to hack things together, like my LoL Shield daemon, which still suffers from some bugs but works fine, but this time I tried to do everything right™.

Now, in other languages you can get along fine with a REPL and some printf-debugging, but as I learned, when programming in C, life is a lot easier if you know how to use your tools. As it turns out, you do not need to know much to reap great benefits. So what follows are some of the lessons I’ve learned and my overall impressions of working with C.

Use valgrind

Valgrind is wonderful – just put valgrind in front of your program and its arguments and wait for it to finish. Valgrind slows down execution considerably, so it is a good idea not to use the biggest input available for such a test run. At the end, valgrind prints statistics about memory allocation and, in case of (possible) memory leaks, gives a backtrace to the line where the corresponding memory block was allocated. It helped me find not only leaks, but also a lot of stupid double frees and index over-/underruns that I would probably never have found otherwise, because valgrind tells you how many bytes away from a valid memory block your invalid read or write landed. And it is a great feeling to see the “no leaks are possible” message! That is something you often take for granted in all those fancy garbage-collected languages we have at our disposal today.
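To make this concrete, here is a minimal sketch of the kind of code where valgrind earns its keep. The function name sum_of_squares is made up for illustration and is not from my project. With the off-by-one bound mentioned in the comment, valgrind reports an invalid write just past the allocated block; without the free, the block shows up in the leak summary.

```c
#include <stdlib.h>

/* Hypothetical example: sums 0^2 .. (n-1)^2 via a temporary heap
 * buffer. Returns -1 if the allocation fails. */
long sum_of_squares(size_t n)
{
    if (n == 0)
        return 0;

    long *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;

    /* Writing `i <= n` here would store one element past the end of
     * the block; valgrind reports that as an invalid write. */
    for (size_t i = 0; i < n; i++)
        buf[i] = (long)(i * i);

    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];

    free(buf); /* without this, valgrind lists the block as lost */
    return sum;
}
```

Running a program that calls this as valgrind --leak-check=full ./your_program prints a backtrace for each problem; compiling with -g makes those backtraces show file names and line numbers.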

Use gdb

Well, this point is probably obvious to most. The thing is, I do not like debuggers. At least I thought so. This may be because of the rather negative usability experiences I had with them in IDEs. Now I see that the problem is probably bloated IDEs and not debuggers. I learned my basic gdb-ABC from this tutorial and some googling, which is enough to start being productive. Just don’t forget to disable optimization and add debugging information (-O0 -g). Overall, the experience was quite pleasant and the gdb CLI is very polished. At the very least, you can locate the place where your program crashes, and often inspecting variables just before the crash is enough to see the mistake. But in some cases I still find printf-debugging to be the less tedious solution, especially if I want a trace of multiple variables in e.g. each iteration of some loop. I don’t know of a fast way to do the same with a debugger, but maybe I just have to use them more.
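For the per-iteration trace, here is a small sketch of both routes. The function collatz_steps, the file name collatz.c and the TRACE macro are assumptions for illustration only. The comment also mentions gdb's dprintf command, which as far as I can tell is one way to get the same trace from the debugger without touching the source.

```c
#include <stdio.h>

/* Hypothetical example: counts Collatz steps until n reaches 1.
 *
 * Build the program containing this file with -O0 -g.
 *
 * printf route: additionally compile with -DTRACE to print n and
 * steps in every loop iteration.
 *
 * gdb route (no recompilation), assuming the file is collatz.c and
 * LINE is the line number of the loop body:
 *     dprintf collatz.c:LINE,"n=%lu steps=%u\n", n, steps
 *     run
 */
unsigned collatz_steps(unsigned long n)
{
    unsigned steps = 0;
    while (n > 1) {
#ifdef TRACE
        fprintf(stderr, "n=%lu steps=%u\n", n, steps);
#endif
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}
```

The usual commands (break, run, print, backtrace) cover the crash-hunting case; the trace is only worth setting up when a single snapshot of the variables is not enough.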

Profiling

Add -pg to both compiler and linker options, and your program will be built so that each run writes a gmon.out file containing profiling information. Then run gprof your_program gmon.out > result.txt to extract a human-readable summary of the run. I used this tutorial about basic gprof usage, but I must admit, I didn’t use gprof much. Still, it is useful to know how to find bottlenecks easily.

Unit-testing in C

Setting up unit-testing in C is surprisingly simple. I settled on a variant of minunit – all it takes is one additional header file with some macros and adapting your Makefile. There are several variants of this micro-test-framework (if you can even call it that) floating around the web, e.g. this one. If you set it up in a similar way, each test suite will be compiled into a separate executable and a shell script will run them all and log the results. All the tests can be run through valgrind, so that correct memory usage is tested for free as well. I modified the macros so that success and failure of tests are shown in green and red, respectively. It is much more fun and cheerful with colors! ;)
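To give an idea of how little is needed, here is a sketch in the spirit of minunit with colored output added. It is not my exact header: the clamp function and the test names are made up for illustration, and the color codes are plain ANSI escape sequences, which assumes a terminal that understands them.

```c
#include <stdio.h>

/* minunit-style macros: a test is a function that returns NULL on
 * success or an error message on failure. */
#define mu_assert(message, test) \
    do { if (!(test)) return message; } while (0)

#define mu_run_test(test)                                           \
    do {                                                            \
        const char *message = test();                               \
        tests_run++;                                                \
        if (message) {                                              \
            printf("\x1b[31mFAIL\x1b[0m %s: %s\n", #test, message); \
            return message;                                         \
        }                                                           \
        printf("\x1b[32mPASS\x1b[0m %s\n", #test);                  \
    } while (0)

int tests_run = 0;

/* Hypothetical function under test. */
static int clamp(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

static const char *test_clamp_inside(void)
{
    mu_assert("value inside range must be unchanged", clamp(5, 0, 10) == 5);
    return NULL;
}

static const char *test_clamp_outside(void)
{
    mu_assert("value below range must become lo", clamp(-3, 0, 10) == 0);
    mu_assert("value above range must become hi", clamp(42, 0, 10) == 10);
    return NULL;
}

/* Runs the suite and stops at the first failure;
 * returns NULL if every test passed. */
const char *all_tests(void)
{
    mu_run_test(test_clamp_inside);
    mu_run_test(test_clamp_outside);
    return NULL;
}
```

The main function of each suite executable then only has to call all_tests() and return a non-zero exit code if the result is not NULL, so that the shell script can tell passing suites from failing ones.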

I do not have enough discipline (yet?) to do test-driven development, but what I found to be very useful was what I’d call test-driven debugging. I had a few situations where I simply could not find the bug and could have hit my head against the wall. Then I had an epiphany – when reasoning does not lead me anywhere, my assumptions are probably wrong! So I added tests for all functions involved. And of course (as usual) the problem was in a completely different place. For me, finding a nasty bug is the biggest motivation to write tests, and sometimes it is the only solution. Hence, test-driven debugging.

My testing recipe

The problem with testing is that it only protects from mistakes you have thought of or already made (and added a test against afterwards). Coming from Haskell, I am a fan of randomized tests in the style of QuickCheck. So for the more complex algorithms I added generators for big random inputs and sampled them for test cases, where the complex and efficient implementation is compared against a simple naive solution. The randomized tests complement some regular tests (especially edge cases) with known expected values and some exhaustive tests for rather small inputs. The combination of these kinds of tests:

  • regular tests (hard-coded results)
  • small exhaustive tests (against naive implementation)
  • bigger randomized tests (against naive implementation)
  • all above through valgrind

gives me a warm fuzzy feeling about the correctness of my code, when all the test suites pass and everything is green.
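The randomized part of this recipe can be sketched as follows. The maximum-subarray problem is only a stand-in chosen for illustration, not one of the algorithms from my project, and all names are made up. The pattern is what matters: a slow but obviously correct reference, the fast version under test, and a loop that compares them on seeded random inputs.

```c
#include <stdlib.h>

/* Naive reference: try every non-empty subarray, O(n^2). */
static long max_subarray_naive(const int *a, size_t n)
{
    long best = a[0];
    for (size_t i = 0; i < n; i++) {
        long sum = 0;
        for (size_t j = i; j < n; j++) {
            sum += a[j];
            if (sum > best)
                best = sum;
        }
    }
    return best;
}

/* Efficient version under test: Kadane's algorithm, O(n). */
static long max_subarray_fast(const int *a, size_t n)
{
    long best = a[0], cur = a[0];
    for (size_t i = 1; i < n; i++) {
        cur = (cur > 0 ? cur : 0) + a[i];
        if (cur > best)
            best = cur;
    }
    return best;
}

/* Compares both versions on `trials` random arrays of length
 * 1..max_len with values in [-50, 50]. Returns the number of
 * mismatches, or -1 on bad arguments or allocation failure. */
int count_mismatches(unsigned seed, int trials, size_t max_len)
{
    if (max_len == 0)
        return -1;
    int *a = malloc(max_len * sizeof *a);
    if (a == NULL)
        return -1;

    int mismatches = 0;
    srand(seed); /* fixed seed keeps any failure reproducible */
    for (int t = 0; t < trials; t++) {
        size_t n = 1 + (size_t)rand() % max_len;
        for (size_t i = 0; i < n; i++)
            a[i] = rand() % 101 - 50;
        if (max_subarray_naive(a, n) != max_subarray_fast(a, n))
            mismatches++;
    }
    free(a);
    return mismatches;
}
```

A test case then simply checks that count_mismatches returns 0 for a few seeds. Unlike QuickCheck, this sketch does no shrinking, so on a failure it helps to print the offending input.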

Clang

Clang is awesome and probably all the good things you heard about it are true. The error and warning messages are among the best I’ve ever seen from a compiler (and even colorful!). So there is no excuse for not using clang at least for development. I would even use it just for the helpful warning that suggests the correct format specifier when you get a printf call wrong! As a bonus, clang comes with a code-formatting tool, clang-format, which is also easy to use. Unfortunately it lacks some customization options I would like, but those are just minor things.

Minimize global state

Well, to be honest, this is a lesson I already knew before, but I applied it in C as well – I would rather have a few more parameters in a function than introduce a global variable. In the end, the only globally accessible data I introduced were the program args, because they are coupled with a lot of stuff and I really don’t want to pass them to every function explicitly. This makes testing easier and the code is less spaghettified.
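As a small illustration of the idea (the Stats struct and count_text are hypothetical names, not code from my project), state that could have been a pair of global counters is passed in explicitly instead:

```c
#include <stddef.h>

/* Hypothetical state that might otherwise live in global variables. */
typedef struct {
    size_t lines;
    size_t chars;
} Stats;

/* All state comes in through parameters, so every test can start
 * from a fresh Stats and nothing has to be reset between tests. */
void count_text(Stats *st, const char *text)
{
    for (const char *p = text; *p != '\0'; p++) {
        st->chars++;
        if (*p == '\n')
            st->lines++;
    }
}
```

The price is one extra parameter per function, which I find easy to pay compared to tests that influence each other through hidden state.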

Use C99 to the full potential

Previously, I basically just used C99 as the standard to be able to declare variables anywhere (again, a feature normally taken for granted!). Now I know that the C standard library (this small and laughable collection of functions) does include some useful things – you get a bool type (stdbool.h), a set of sane fixed-size integer types (stdint.h) and some useful constants and macros associated with them. In fact, I made a master-header that includes about two-thirds of the C standard library, as you need it all the time anyway.
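A short sketch of these C99 goodies in action; the two helper functions are made up for illustration, while the headers, types and macros are the standard ones:

```c
#include <inttypes.h> /* PRIu64 and friends for printf */
#include <stdbool.h>  /* bool, true, false */
#include <stdint.h>   /* uint32_t, uint64_t, UINT32_MAX, ... */
#include <stdio.h>

/* Adds two 32-bit values; returns false instead of silently
 * wrapping around when the sum does not fit. */
bool add_u32_checked(uint32_t a, uint32_t b, uint32_t *out)
{
    if (a > UINT32_MAX - b)
        return false;
    *out = a + b;
    return true;
}

/* Prints a 64-bit value portably, without having to guess
 * between %lu and %llu. */
void print_u64(uint64_t v)
{
    printf("%" PRIu64 "\n", v);
}
```

The fixed-size types matter most when data is written to files or its size affects memory usage, because int and long differ between platforms.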

Impressions

Well, in the end I may not have a unique perspective to offer, but I can confirm many of the good and bad aspects people often mention.

What I like about C

Speed

Damn, it is fast. It’s a nice feeling to run a blazing-fast binary that does in seconds what would take minutes in your fancy modern language of choice.

Control

No garbage collector kicking in, no magic performed by the runtime. Even though it may not really be true, it does feel like you have full control over the machine.

Portability

Well, not exactly a feature of C, but simply a matter of fact. This point is probably moot if your targets are just regular recent x86-family processors. But it is good to know that you can compile for, say, your MIPS wireless router.

What I hate about C

Makefiles

Writing the Makefile for the project was, as always, a major PITA and consisted more or less of copy-pasting snippets from various tutorials and Stack Overflow questions, then poking around until it did what I wanted. I probably really need to RTFM some day, because writing Makefiles seems not to be a skill you just pick up on the fly. For me, it is just a necessary evil. Maybe I am just spoiled by other languages. Of course, this point does not apply to C itself, but Makefiles are usually used with C or C++ projects and are the de-facto standard build system, so I think this is fair.

Historical madness

Separation of .c and .h files, always needing to change function signatures in two different places, needing include guards to prevent multiple inclusion of the same file, needing a prototype if you use a function before its definition. The resulting amount of boilerplate code. Working with strings. These are just examples from a huge list of small annoyances, and I am not even asking for the accomplishments of PL theory and engineering of the last decades. I know that all of this is historical baggage we now can’t change. I just hope that someday Rust or something similar will succeed and replace C. Probably won’t ever happen, but one can always dream!

My Conclusion

While you can get along without fancy tools in e.g. Haskell or Ruby most of the time, in C appropriate tooling is indispensable. It also makes programming in C a lot more bearable and fun. You still probably need almost twice as much time to accomplish the same thing and have a lot more ways to make mistakes, but in the end it pays off. I am not afraid of SEGFAULTs and memory corruption anymore, but I still would rather not use C without a very compelling reason to do so.