Learning the C Programming Language as a Classical Musician

Episode 5

Welcome back!

In the last episode we looked at some nice ways to comment out your code, and ended up playing with ASCII and the fact that, for a computer, everything is just numbers.

Today we will look at what are known as the phases of translation, that is, the steps that transform the human-readable code we write into something the machine can run. It is quite a complex and deep subject, so I will do my best to make it easier to understand.

Let’s get started.

Phases of translation

Once you are happy with your main.c file, it is time to ask the computer to compile your code, that is, translate it into machine-readable code. In Xcode, we do this by pressing Cmd-B to build and Cmd-R to run, but, normally, pressing Cmd-R will build and run together. You can also press the Play button in the Toolbar.

At this point, the compiler processes the C source file through a list of operations, conceptually in the order below. A compiler is free to merge some of these phases or run them together, as long as the final result is the same. In music, this is the same magic that occurs when a player translates the notational signs on the score into played notes.

Let’s look at these phases.

Phase 1

Since we can compile C code on different operating systems (such as Windows, macOS, Linux, etc.), the code must first be adapted to a model that is common to all of them. This model is called the source character set. The code, usually a text file encoded, for example, in UTF-8, is mapped to the source character set; how exactly this happens is up to each compiler. As part of this step, end-of-line indicators are replaced with newline characters. Newline characters are those \n we saw in some code samples, and that languages such as Swift add automatically when printing a string. Different operating systems may use different line terminators, so they are all replaced with \n.

This set includes many characters, but the most important part of it is the basic source character set, consisting of 96 characters:

5 whitespace characters (space, horizontal tab, vertical tab, form feed (akin to a page break), new-line)
10 digit characters from '0' to '9'
52 letters from 'a' to 'z' and from 'A' to 'Z'
29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

To summarise: every character in our code is mapped to a character from a common set, just as each note corresponds to an exact frequency, regardless of the notational system used to write it down on paper.
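To make the count concrete, here is a minimal sketch that rebuilds the set from the four groups listed above and adds them up. The names basic_set_size and is_basic_source_char are my own, invented for this illustration; they are not part of C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The four groups from the list above, written as C strings. */
static const char WHITESPACE[]  = " \t\v\f\n";
static const char DIGITS[]      = "0123456789";
static const char LETTERS[]     = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
static const char PUNCTUATION[] = "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

/* Hypothetical helper: how many characters the four groups hold in total. */
size_t basic_set_size(void) {
    return strlen(WHITESPACE) + strlen(DIGITS)
         + strlen(LETTERS) + strlen(PUNCTUATION);
}

/* Hypothetical helper: true if c appears in one of the four groups. */
bool is_basic_source_char(char c) {
    if (c == '\0') {
        return false;  /* strchr would otherwise match the string terminator */
    }
    return strchr(WHITESPACE, c) != NULL || strchr(DIGITS, c) != NULL
        || strchr(LETTERS, c) != NULL || strchr(PUNCTUATION, c) != NULL;
}
```

Calling basic_set_size() from main should give back the 96 of the list, and is_basic_source_char('@') should be false, since @ is not in it.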

Phase 2

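This phase is crucial but simple: when a line ends with a backslash (\) immediately followed by the end of the line, both the backslash and the newline are removed, so that two physical lines are joined into a single logical line. Several separate pieces end up being read as one unit. Imagine the repeat symbol in music: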

This is made of: two vertically aligned dots, a thin vertical line and a thick vertical line. That’s four elements, yet they are always combined into a single symbol that means “repeat from previous mirrored repeat symbol or, missing that, from the beginning”.

At this point comes a safety check: if a non-empty file does not end with a newline character (or ends with a backslash right before that final newline), the behaviour is undefined, meaning the C standard makes no promise about what the compiler will do. This is as if the score didn’t end with a final barline, but with a simple one: we could not know whether the piece were truly over or if a page were missing.
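One place where you are likely to meet this phase is a long macro definition. A backslash placed right before the end of a line makes the line continue on the next one. The sketch below is only an illustration: the macro SUM_OF_THREE and the function beats_in_three_bars are names I made up.

```c
/* A macro definition must fit on one logical line. In phase 2 each
   backslash at the end of a physical line is removed together with the
   newline, so the preprocessor later sees one single line. */
#define SUM_OF_THREE(a, b, c) \
    ((a) + \
     (b) + \
     (c))

/* Hypothetical helper, only here to use the macro. */
int beats_in_three_bars(void) {
    return SUM_OF_THREE(4, 3, 2);  /* expands to ((4) + (3) + (2)) */
}
```

Without the backslashes, the definition would stop at the end of its first line and the compiler would complain about the lines that follow.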

Phase 3

Now the code is decomposed into three categories: comments, sequences of whitespace characters and preprocessing tokens. Preprocessing tokens are, for example, header names, such as the <stdio.h> we find at the top of each C file; identifiers, which we looked at in Episode 3; preprocessing numbers, which cover integer and floating constants¹; character constants and string literals²; and operators and punctuators. Any other non-whitespace character that fits none of these gets a category of its own.

In a musical score, you have several categories of notational objects that are all mixed and then translated to create music: notes, dynamics, articulations, slurs, lyrics, chord symbols, rehearsal marks, staves, barlines, etc. …

Once this classification is over, the following happens:

Each comment is replaced by one single space character
Newlines are kept as they are
The rest is ready to be processed
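The rule about comments can be seen with a slightly odd, but valid, piece of code. This is a small sketch of my own (the function name comment_is_a_space is hypothetical), and not something you would write in real life:

```c
/* Hypothetical helper showing that a comment separates two tokens. */
int comment_is_a_space(void) {
    /* After phase 3 the next line reads: int beats = 4; */
    int/* replaced by one space in phase 3 */beats = 4;
    return beats;
}
```

If the comment were simply deleted instead of being replaced by a space, the compiler would see intbeats = 4; and stop with an error.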

Phase 4

This phase is divided into three sub-phases. The first one consists in the execution of the preprocessor. To put it simply, imagine you are reading a text in which not just some words, but entire sentences, or even pages, are summarised with a single word or with a collection of symbols. Imagine now that your brain has the power of transforming those symbols (called preprocessor directives in C) into the full text with the blink of an eye. This is what is happening during the preprocessing. Do you remember the <stdio.h> header? That file is made of thousands of lines of code, functionalities, and more. When you press Run the compiler transforms that line #include <stdio.h> into those thousands of lines of code!
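The same replacement mechanism works for symbols you define yourself with #define. Here is a minimal sketch, with names (BEATS_PER_BAR, TOTAL_BEATS, beats_in_phrase) that I chose just for this example:

```c
/* Object-like macro: every later use of the name is replaced by 4. */
#define BEATS_PER_BAR 4

/* Function-like macro: the argument is substituted into the replacement text. */
#define TOTAL_BEATS(bars) ((bars) * BEATS_PER_BAR)

/* Hypothetical helper, only here to use the macros. */
int beats_in_phrase(void) {
    return TOTAL_BEATS(8);  /* after preprocessing: ((8) * 4) */
}
```

By the time the actual compilation starts, the names BEATS_PER_BAR and TOTAL_BEATS are gone: only the replacement text is left.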

Now, in the second sub-phase, each file brought in by an #include directive goes through phases 1 to 4 itself, recursively (that is, again and again, until there are no more included files left to process).

The last sub-phase consists in removing those preprocessor directives from the source code.

This phase is crucial because it allows us to write code that targets several machines, and then conditionally include or exclude parts of it according to the machine we are building for.
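As a sketch of what this conditional selection can look like, the function below keeps only one of its return lines, depending on where it is compiled. The macros _WIN32, __APPLE__ and __linux__ are not part of the C language itself: they are conventions that compilers on those systems usually follow, so take them as an assumption. The name platform_name is mine.

```c
/* Hypothetical helper: only one branch survives preprocessing. */
const char *platform_name(void) {
#if defined(_WIN32)
    return "Windows";
#elif defined(__APPLE__)
    return "an Apple system such as macOS";
#elif defined(__linux__)
    return "Linux";
#else
    return "unknown";
#endif
}
```

The branches that do not match are removed before the compiler proper ever sees them, so they cost nothing in the final program.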

In music this could bring a funny comparison: when you see a sequence of notes, all your years of training are summarised in a nanosecond, and you are magically able to play those notes, you do not need to study them again every time.

Phase 5

We are almost there. Now we need to take care of the characters and escape sequences (those character pairs beginning with \) found in character constants and in string literals. These are converted from the source character set into the execution character set, which is defined by the compiler and the platform the program will run on. If an escape sequence names a character that is not part of the execution character set, the result depends on the compiler, but it is guaranteed not to be a null character. In short, when you go to your string quartet rehearsal, do not forget to bring your instrument with you!
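A quick way to convince yourself that an escape sequence ends up as one single character is to measure a string that contains a couple of them. This is just a sketch, and escaped_length is a name I invented:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the characters of a string with two escapes. */
size_t escaped_length(void) {
    /* d, o, tab, r, e, newline: six characters, even though we typed eight */
    return strlen("do\tre\n");
}
```

In the source file, \t and \n are two keystrokes each; in the running program, each of them is one character.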

Phase 6

If two or more string literals are adjacent, they are joined (concatenated) into a single string literal. In music, this is known as “Attacca” and is written at the end of a movement to signify that you need to immediately start the next one.
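Here is a small sketch of this joining at work. The names TITLE and is_single_string are hypothetical, chosen only for the example:

```c
#include <string.h>

/* Two adjacent literals, split over two lines for readability. */
static const char *TITLE = "Symphony No. 5 "
                           "in C minor";

/* Hypothetical helper: returns 1 if the two pieces became one string. */
int is_single_string(void) {
    return strcmp(TITLE, "Symphony No. 5 in C minor") == 0;
}
```

This is handy when a piece of text is too long to fit comfortably on one line of code.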

Phase 7

This is where the actual compilation takes place, the performance, the concert! The tokens are analysed for both syntax and semantics, and translated together as a single unit, called a translation unit. In short: play all notes in tune, and follow the phrasing instructions of both the composer and the conductor.

Phase 8

This is the final phase, known as linking, where every external component necessary for the program to run is collected and joined with our own code, so that our final “app” or “program” can be run by the operating system (technically called the execution environment).
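To give an idea of what an external component is, the sketch below uses abs, a function from the C standard library. We only declare it by hand (normally the header <stdlib.h> does this for us); its actual code lives elsewhere and has to be found in this last phase. The function interval_in_semitones is my own invention, and note that some compilers may handle a function as simple as abs internally, so this is an illustration of the idea rather than a guarantee of what your compiler does.

```c
/* Declaration only: the body of abs is not in this file. */
int abs(int value);

/* Hypothetical helper: distance between two notes given as numbers. */
int interval_in_semitones(int from_note, int to_note) {
    return abs(to_note - from_note);
}
```

If a declared function cannot be found anywhere when the program is put together, the build stops at this phase with an error about an undefined symbol, even though the compilation itself went fine.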

What’s next?

And that’s it! I hope you are enjoying this series.

I could have started another subject, but I feel this has been a tough one and that you may want to read it a few times before moving on. Furthermore, I am aware that my approach is more theoretical than practical, but I have taken several courses in the last three years that just make you feel good because you are building something that runs, that is shiny and flashy. When you then go out alone building things by yourself, you get stuck at every other step and need to follow my approach: study, research, understand, then try and fail again. That is why I prefer to give you a solid basis to start with, assuring you that practical examples will indeed come.

I also do not expect you to know all this terminology by heart, and neither should you. Just be aware of it, and try to remember it the next time you see some code. If you don’t, come back here and review.

In the next episode we will quickly review identifiers and scope, which we introduced in Episode 3, and then look at the concept of lifetime.

Bottom Line

Thank you for reading today’s article.

If you have any questions or suggestions, please leave a comment below or contact me using the dedicated contact form. If you have not done so already, please subscribe to my newsletter on Gumroad to receive exclusive discounts and free products.

I hope you found this article helpful. If you did, please like it and share it with your friends and peers. Don’t forget to follow me on this blog and to let me know what you think.

If you are interested in my music engraving services and publications don’t forget to visit my Facebook page and the pages where I publish my scores (Gumroad, SheetMusicPlus, ScoreExchange and on Apple Books).

You can also support me by buying Paul Hudson’s Swift programming books from this Affiliate Link or BigMountainStudio’s books from this Affiliate Link.

Thank you so much for reading!

Until the next one, this is Michele, the Music Designer.

¹ A floating-point number is essentially a number with a decimal part, such as 9.764453.
² A string literal is a single block of text enclosed in a pair of double quotes, for example: "This is a string literal".
