C To Assembly Tests & Scripts

This repository stores both C and Assembly code. The Bash scripts in the scripts/ directory automatically convert all C source files in the src/ directory to Assembly code and store the results in the asm/ directory. The scripts build each C source file with both GCC and Clang at optimization levels 0 - 3.

My Motives

I enjoy low-level optimization in C and often refer back to this repository when I want to analyze the performance of my C code. High performance is required for many of the applications I write in C, especially firewalls and packet processing software. While I wouldn’t consider myself an expert at low-level optimization, I want to keep learning in the hope that I’ll eventually become very efficient with it.

C Source Files

The following C source files are included in the src/ directory.

File Name Description
8to64.c Casts from a u8 array to u64.
16to64.c Casts from a u16 array to u64.
32to64.c Casts from a u32 array to u64.
64.c Prints a 64-bit integer.
64_2.c Creates a 64-bit integer along with a pointer referencing it and prints the pointer’s real value.
64to8.c Creates a 64-bit integer and stores it in a u8 array.
64to16.c Creates a 64-bit integer and stores it in a u16 array.
64to32.c Creates a 64-bit integer and stores it in a u32 array.
empty.c A completely empty C program.
forloop.c Creates an 8-byte array and prints each byte through a for loop.
fprintf.c Prints to stdout via fprintf().
if_simple.c Creates an integer and performs a simple if check that prints to stdout.
if.c Creates an integer and checks it against 6 values.
matchrs.c Creates a 32-bit integer and performs bit-wise operations.
matchstruct.c Sets flags within a structure and checks them. If successful, prints to stdout.
memcpy.c Copies an 8-byte array using memcpy().
nullptr.c Initializes a null pointer and prints to stdout based off of the value of the null pointer.
perftest1_one.c Initializes a one-byte integer and performs addition/division. Afterwards, prints to stdout.
perftest1_two.c Initializes a standard integer (likely four bytes) and performs addition/division. Afterwards, prints to stdout.
perftest2_one.c Initializes a large data structure, passes it to a non-inlined function by value, and prints the fields to stdout.
perftest2_two.c Initializes a large data structure, passes it to a non-inlined function by reference, and prints the fields to stdout.
perftest3_one.c Initializes a data structure with all 0’s, sets a few fields, and then passes it to a non-inlined function and prints the values.
perftest3_two.c Initializes a data structure with fields representing typical padding added by the compiler, sets the fields (including padding fields) all at once, and then passes it to a non-inlined function and prints the values.
pointer_reassign.c Initializes an integer and a pointer that points to it, then reassigns it and prints the value.
pointer.c Initializes an integer and a pointer that points to it, then prints the value.
print.c Prints a string constant, creates a new character array, copies the constant to it, and prints the new array.
printf.c Prints to stdout via printf().
switch_simple.c Creates an integer and performs a single switch case. If matched, prints to stdout.
switch.c Creates an integer and performs 6 switch cases. If matched, prints to stdout.
unroll_not_test.c Creates an integer and a loop that executes 256 times. Each loop iteration adds j * 2 to the integer. Afterwards, prints to stdout.
unroll_simple_not.c Creates an integer and a loop that executes 100 times. Each loop iteration adds i * 5 to the integer. Afterwards, prints to stdout.
unroll_simple.c Creates an integer and a loop (unrolled) that executes 100 times. Each loop iteration adds i * 5 to the integer. Afterwards, prints to stdout.
unroll_test.c Creates an integer and a loop (unrolled by 10) that executes 256 times. Each loop iteration adds j * 2 to the integer. Afterwards, prints to stdout.
xdp_adjust_head.c Calls the bpf_xdp_adjust_head() function inside of an XDP program.
xdp_adjust_tail.c Calls the bpf_xdp_adjust_tail() function inside of an XDP program.
xdp_block_port8080.c An XDP program that drops packets arriving on TCP destination port 8080 and blocks their source IPs.
xdp_drop_port8080.c An XDP program that drops packets arriving on TCP destination port 8080.
xdp_redefine.c An XDP program that initializes Ethernet, IP, and TCP headers and then reinitializes and checks again.
xdp_simple_check_unlikely.c An XDP program that initializes Ethernet, IP, and TCP headers and uses unlikely() to check if the header is valid.
xdp_simple_check.c An XDP program that initializes Ethernet, IP, and TCP headers and checks if the header is valid.
xdp_simple_drop.c An XDP program that returns XDP_DROP immediately.
xdp_simple_pass.c An XDP program that returns XDP_PASS immediately.
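
To give a feel for what the perftest2 pair compares, here is a minimal sketch of passing a large structure to a non-inlined function by value and by reference. The structure layout, the function names, and the make_big() helper are all hypothetical and not taken from the files in src/, and the sketch sums the fields instead of printing them so the result is easy to check. The noinline attribute is the GCC/Clang way of keeping the call visible in the generated Assembly.

```c
#include <stdint.h>

/* Hypothetical 128-byte structure; the real layout in perftest2_*.c may differ. */
struct big {
    uint64_t fields[16];
};

/* By value: the caller has to hand the callee its own copy of the structure. */
__attribute__((noinline)) uint64_t sum_by_value(struct big b)
{
    uint64_t total = 0;

    for (int i = 0; i < 16; i++)
        total += b.fields[i];

    return total;
}

/* By reference: only a pointer to the structure is passed. */
__attribute__((noinline)) uint64_t sum_by_ref(const struct big *b)
{
    uint64_t total = 0;

    for (int i = 0; i < 16; i++)
        total += b->fields[i];

    return total;
}

/* Illustrative helper that fills the fields with 1 through 16. */
struct big make_big(void)
{
    struct big b;

    for (int i = 0; i < 16; i++)
        b.fields[i] = (uint64_t)i + 1;

    return b;
}
```

Calling sum_by_value(b) and sum_by_ref(&b) on the same structure returns the same total. The interesting part is how the two call sites differ in the Assembly output, which is likely to vary between compilers and optimization levels.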

More C source files will be added as time goes on and I need to test different things.

NOTE - I want to revamp file names and source files for organization in the future since it’s a bit messy right now. However, I don’t have the time to revamp the entire repository since a lot of these programs date back to years ago when I was new-ish to C.

Generating Assembly Code

I’d recommend using the scripts/gensrcdir.sh Bash script I made to generate Assembly code for all C source files in the src/ directory under different compilers (GCC and Clang) and optimization levels (0 - 4). Both non-Intel and Intel syntax dumps are included.

You may also use the scripts/genassembly.sh Bash script to convert a single source file. It requires one argument: the name of the source file in the src/ directory without the file extension (.c). Make sure to modify the ROOTDIR variable if you place the script outside of this repository’s scripts/ directory. An example may be found below.

./genassembly.sh pointer

Optimization Levels

Here is general information on the different optimization levels for Clang. Please keep in mind optimization levels may be different for GCC.

Code Generation Options
    -O0, -O1, -O2, -O3, -Ofast, -Os, -Oz, -Og, -O, -O4
    Specify which optimization level to use:
        -O0 Means “no optimization”: this level compiles the fastest
        and generates the most debuggable code.

        -O1 Somewhere between -O0 and -O2.

        -O2 Moderate level of optimization which enables most
        optimizations.

        -O3 Like -O2, except that it enables optimizations that take
        longer to perform or that may generate larger code (in an
        attempt to make the program run faster).

        -Ofast Enables all the optimizations from -O3 along with
        other aggressive optimizations that may violate strict
        compliance with language standards.

        -Os Like -O2 with extra optimizations to reduce code size.

        -Oz Like -Os (and thus -O2), but reduces code size further.

        -Og Like -O1. In future versions, this option might disable
        different optimizations in order to improve debuggability.

        -O Equivalent to -O2.

        -O4 and higher
            Currently equivalent to -O3

You’ll notice a lot of optimizations within the Assembly code from -O1 to -O3.
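
A small loop in the spirit of the unroll_*.c tests is a handy way to see this. The sketch below is not copied from src/: the function names, the loop starting at 0, and the unroll factor of 4 are all assumptions made for illustration. At -O0 compilers generally emit both loops more or less as written, while at higher levels they may unroll, vectorize, or even fold the whole computation into a constant, so the two functions can end up looking very similar.

```c
/* Rolled loop: 100 iterations, each adding i * 5 (starting at 0 is an assumption). */
int sum_rolled(void)
{
    int total = 0;

    for (int i = 0; i < 100; i++)
        total += i * 5;

    return total;
}

/* The same work unrolled by hand; the factor of 4 is chosen only for illustration. */
int sum_unrolled(void)
{
    int total = 0;

    for (int i = 0; i < 100; i += 4) {
        total += i * 5;
        total += (i + 1) * 5;
        total += (i + 2) * 5;
        total += (i + 3) * 5;
    }

    return total;
}
```

Both functions return the same value. Compiling the file with the -S flag at -O0 and again at -O2 (with either GCC or Clang) and comparing the two outputs side by side is one way to check what your compiler version actually does with them.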

System

This was all tested on my Linux VM running Debian 12 with virtio_net drivers. The tests in asm/ were built on Linux kernel 6.1.0-13.

GitHub Repository