Lessons from C

One of the first things all new CGAT fellows have to do is learn how to program.  We do this in Python, firstly because it’s lovely and intuitive, secondly because it has an amazing community using, supporting and developing it, and thirdly because it teaches some of the basics of programming, like data structures and flow control, without requiring a background in computing or computer science.  Oh, and our code base is mostly in Python, and it helps if everyone is singing from the same hymn sheet.  In essence it’s a great beginner language (certainly more so than the cryptic Perl).  This blog isn’t going to be all about the technical aspects of what we do, but a few of us have been delving into C recently, so I thought I’d cover a few of the cool things I’ve learnt from C which, I hope, will improve my Python programming and my overall understanding of computing.

Lesson 1: The difference between static and dynamic typing

Python variable assignment is really easy:

var = something

It doesn’t matter per se what the “something” is, be it a string, an integer, a float, or a Python list or dictionary.  It’s all the same.  Nice.  Simple.  In C you can assign a variable a value in much the same way, but you have to declare the type of the variable at the start of the program and/or function where it is used, and you can’t change it.  In Python, var can start as an integer and later be re-assigned as a string or tuple.  Sometimes in Python you can accidentally switch variable type without realising, which can cause unexpected code behaviour.  As Python does not know the type of a variable before it is actually used, any errors resulting from the wrong variable type only show up when your program is executing.  And it might have been running for a while.  Not so in C.  Because the variable type is declared, the compiler can check at the compilation stage for any instance where you, for example, attempt to add a number to a string.  I now realise that with freedom comes responsibility, and why best practice in coding is so important – even if you can reuse variable names in Python, it is usually not a good idea.
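
To make that concrete, here is a minimal C sketch of the same idea (the variable names and values are just for illustration): the type is part of the declaration, and the compiler will reject any attempt to change it.

#include <stdio.h>

int main(void)
{
    int count = 5;        /* the type is declared once and fixed for the variable's lifetime */
    double ratio = 0.5;   /* a different type needs a different variable */

    /* count = "five";       the compiler rejects this: you cannot assign a string to an int */

    printf("%d %f\n", count, ratio);
    return 0;
}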

Lesson 2: computers work in bits

Perhaps it’s a little obvious that computers only understand 0s and 1s at the end of the day, but understanding how data are represented in bits is also really important.  It’s rare that we would have to think in this way in our day-to-day Python programming.  Enter C and the bitwise operators (NB: if, like us, you are torturing yourself by learning C from “The C Programming Language” by Kernighan and Ritchie you may have also experienced this lesson).  Bitwise operators are pretty neat: you can do some funky manipulations just by changing individual bits or groups of bits.  This was also a great lesson in the two’s complement number system and bit arithmetic.  Even if you aren’t learning C and you don’t know about this, it’s worth checking out.  It helped me work out how the SAM bitwise flags work!
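
As a small illustration, here is a C sketch of how bitwise AND, OR and NOT can be used to query and modify a SAM flag.  The bit meanings (0x1 paired, 0x4 unmapped, 0x8 mate unmapped, 0x10 reverse strand) come from the SAM specification; the flag value itself is just an example.

#include <stdio.h>

int main(void)
{
    unsigned int flag = 83;        /* example SAM flag: paired, proper pair, reverse strand, first in pair */

    if (flag & 0x10)               /* test a bit with AND: is the read on the reverse strand? */
        printf("reverse strand\n");
    if (!(flag & 0x4))             /* 0x4 means unmapped, so a zero result means the read is mapped */
        printf("mapped\n");

    printf("%u\n", flag | 0x8);    /* set a bit with OR: mark the mate as unmapped */
    printf("%u\n", flag & ~0x1);   /* clear a bit with AND NOT: drop the paired bit */
    return 0;
}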

Lesson 3: characters are just numbers

Each ASCII character is represented by a number, which means you can do arithmetic on characters. The following is valid syntax in both Python and C:

print(1 + 'a')

In Python this will throw a TypeError because adding an integer to a string is not a defined operation.  In C it is perfectly valid, because ‘a’ is recognised as a constant of type char (for character), which is simply shorthand for the byte value 97, so it will print 98!  This example helped me to understand how different languages map supposedly similar symbolic representations to different CPU instructions.  In Python, it is necessary to convert the character to its ASCII code explicitly.  The equivalent operation is

print(1 + ord('a'))
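
For completeness, here is a minimal C version of the same arithmetic (only a sketch: the printf format string decides whether the result is shown as a number or as a character):

#include <stdio.h>

int main(void)
{
    printf("%d\n", 1 + 'a');          /* 'a' is the byte value 97, so this prints 98 */
    printf("%c\n", 'a' + 1);          /* printed as a character, 98 is 'b' */
    printf("%c\n", 'g' - 'a' + 'A');  /* the classic trick for converting lower case to upper case */
    return 0;
}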

There is a lot of legacy from C, evident in some of the syntax of bash commands and of other languages, such as R, which we use for statistical analysis.  I’m a big fan of working from first principles, and learning C is about as close as I can get without having to dive into assembly language.