Compiler vs Interpreter | Why C Is More Efficient Than Python
You’ll often hear that C/C++ is more efficient than interpreted languages such as Python and Node.js. In this post I’ll attempt to explain one of the many reasons why compiled languages like C/C++ are more performant. Before we can talk about the benefits of compilers, it’s important that you have a basic understanding of modern computer architectures.
The performance of a processor can be measured in instructions per clock cycle, or IPC. There are two main methods of exploiting instruction-level parallelism to achieve a higher IPC.
1. Pipeline Depth
Pipelines are analogous to doing your laundry.
- Place a load of dirty clothes in the washer
- When the washer is done, place the wet clothes in the dryer
- When the dryer is finished, fold the clothes
- When the folding is finished, put the clothes away
If you have a big family, you’ll agree that like showers, washing machines can be a contested resource. It wouldn’t make sense to wait until your sibling has finished the entire sequence of washing, drying, folding and putting their clothes away before attempting to do a wash yourself.
Similarly, rather than waiting until an instruction has completely finished executing before starting the next one, we can feed the next instruction into the pipeline as soon as the first has cleared the initial stage.
For those of you who want to know exactly what is happening at the hardware level: we’re actually placing registers between stages to store intermediate values, and in doing so, shortening the critical path. The shorter your critical path, the higher you can raise the clock frequency.
2. Pipeline Width
Another approach to increasing instruction-level parallelism is to replicate internal components (e.g. the ALU) so that the processor can execute multiple instructions concurrently in every pipeline stage. In keeping with our laundry example, this is analogous to adding a second washer and dryer.
Suppose we had the following assembly program.
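The program isn’t reproduced here, but given the numbers that follow and the shoutout to Patterson and Hennessy, it is almost certainly the classic MIPS loop from their book, which adds a scalar to each element of an array:

```
Loop: lw   $t0, 0($s1)        # load an array element into $t0
      addu $t0, $t0, $s2      # add the scalar held in $s2
      sw   $t0, 0($s1)        # store the result back
      addi $s1, $s1, -4       # move the pointer to the previous element
      bne  $s1, $zero, Loop   # repeat until the start of the array
```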
Due to the dependencies between instructions, the best we can hope for is to schedule the instructions in the following manner.
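That schedule isn’t shown either. Assuming a two-issue processor with one ALU/branch slot and one load/store slot, as in Patterson and Hennessy’s example, it would look something like this:

```
        ALU/branch slot          Load/store slot      Clock
Loop:   nop                      lw   $t0, 0($s1)     1
        addi $s1, $s1, -4        nop                  2
        addu $t0, $t0, $s2       nop                  3
        bne  $s1, $zero, Loop    sw   $t0, 4($s1)     4
```

The store’s offset becomes 4 because the pointer has already been decremented by the time it executes.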
It takes four clock cycles per loop iteration. With five instructions completing every four clocks, that’s an IPC of 5/4 = 1.25, versus the 2.0 we could achieve if we fully took advantage of the additional hardware.
This is where compilers come in. The compiler will unroll the loop: it takes the code that would execute across multiple iterations and replicates it into one larger loop body. During the unrolling process, the compiler introduces additional registers to store intermediate results and renames registers where there wasn’t any real data dependency. In doing so, it can schedule the code much more effectively.
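Assuming the post had the book’s figure in mind, the loop is unrolled four times, with $t0 through $t3 serving as the extra registers:

```
        ALU/branch slot          Load/store slot      Clock
Loop:   addi $s1, $s1, -16       lw   $t0,  0($s1)    1
        nop                      lw   $t1, 12($s1)    2
        addu $t0, $t0, $s2       lw   $t2,  8($s1)    3
        addu $t1, $t1, $s2       lw   $t3,  4($s1)    4
        addu $t2, $t2, $s2       sw   $t0, 16($s1)    5
        addu $t3, $t3, $s2       sw   $t1, 12($s1)    6
        nop                      sw   $t2,  8($s1)    7
        bne  $s1, $zero, Loop    sw   $t3,  4($s1)    8
```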
Notice that now 12 of the 14 instructions in the loop execute as pairs. It takes 8 clocks for 4 loop iterations, or 2 clocks per iteration, which yields an IPC of 14/8 = 1.75. That’s a (1.75 − 1.25) / 1.25 = 40% improvement in IPC over the previous example.
To conclude, compilers do much more than check your syntax. Unlike interpreted programs, which are read and executed line by line, programs compiled ahead of time can take advantage of many compiler optimizations, such as loop unrolling, to increase performance.
I want to give a big shoutout to Mr Patterson and Mr Hennessy for their wonderful book on computer architecture.