Performance of HTTP compression in embedded systems

Apr 11, 2020

Embedded systems these days frequently need to serve up HTML pages, but in most cases both the processor and the network interface are very slow. Picking the right gzip compression level is a balancing act: higher levels shrink the page but cost more CPU time. The goal is to minimize the total time from when the request is made to when the response is fully received.
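
Concretely, ignoring fixed overheads like connection setup, the quantity to minimize is compression time plus transfer time. A minimal sketch of that model (the parameter names here are mine):

def total_time_ms(compression_time_ms, compressed_size_kib, network_speed_kbps):
    # Time on the wire: kilobits divided by kilobits-per-millisecond.
    transfer_time_ms = compressed_size_kib * 8 / (network_speed_kbps / 1000.0)
    return compression_time_ms + transfer_time_ms

The rest of this post measures the two terms separately: first gzip speed and size on real hardware, then typical network throughput.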

gzip Performance

The first step is to figure out exactly how slow gzip is on the desired processor.

The following script measures compression time and output size at each gzip level. It uses ./test.html as the test document, and you’ll likely want to adjust exec_count depending on your processor.

from __future__ import print_function
from timeit import timeit
import io, gzip


def compress(data, level):
    output = io.BytesIO()
    # Close the GzipFile so the gzip trailer is flushed before measuring.
    with gzip.GzipFile(mode="wb", fileobj=output, compresslevel=level) as f:
        f.write(data)
    return output.tell()


def load_example():
    # Read raw bytes; gzip wants bytes, and this works on Python 2 and 3.
    with open("./test.html", "rb") as f:
        return f.read()


data = load_example()
exec_count = 20
print("compression level, time (ms), compressed size (kiB)")
for level in range(0, 10):
    test_fun = lambda: compress(data, level=level)
    time = timeit(test_fun, number=exec_count)
    size = test_fun()
    # Trailing comma lets each row be pasted straight into the DataFrames below.
    print(str((level, time / float(exec_count) * 1000, size / 1024.0)) + ",")

import pandas as pd
import matplotlib.pyplot as plt
%config InlineBackend.figure_formats = ['svg']
%matplotlib inline

# Timings pasted from running the benchmark script above on each machine.
orange_pi_zero = pd.DataFrame([
    (0, 11.398696899414062, 403.4951171875),
    (1, 30.363094806671143, 87.796875),
    (2, 31.992197036743164, 83.490234375),
    (3, 35.304808616638184, 79.7978515625),
    (4, 45.820748805999756, 74.3955078125),
    (5, 56.08339309692383, 70.9033203125),
    (6, 70.80044746398926, 68.8701171875),
    (7, 80.49700260162354, 68.52734375),
    (8, 111.0435962677002, 68.2294921875),
    (9, 160.21054983139038, 68.1083984375),
], columns=['level', 'time_ms', 'size_kib']).set_index('level')

ryzen_2700x = pd.DataFrame([
    (0, 0.6599545478820801, 403.4951171875),
    (1, 3.2829999923706055, 87.796875),
    (2, 2.991056442260742, 83.490234375),
    (3, 3.4702062606811523, 79.7978515625),
    (4, 4.487097263336182, 74.3955078125),
    (5, 5.750846862792969, 70.9033203125),
    (6, 7.645905017852783, 68.8701171875),
    (7, 8.826696872711182, 68.52734375),
    (8, 12.513351440429688, 68.2294921875),
    (9, 18.452298641204834, 68.1083984375),
], columns=['level', 'time_ms', 'size_kib']).set_index('level')

def add_stat_cols(df):
    # Ratio and slowdown are relative to level 0 (no compression).
    return df.assign(compression_ratio=lambda df: df.size_kib / df.size_kib[0],
                     slowdown=lambda df: df.time_ms / df.time_ms[0])

orange_pi_zero = add_stat_cols(orange_pi_zero)
ryzen_2700x = add_stat_cols(ryzen_2700x)

orange_pi_zero

level     time_ms    size_kib  compression_ratio   slowdown
    0   11.398697  403.495117           1.000000   1.000000
    1   30.363095   87.796875           0.217591   2.663734
    2   31.992197   83.490234           0.206918   2.806654
    3   35.304809   79.797852           0.197767   3.097267
    4   45.820749   74.395508           0.184378   4.019823
    5   56.083393   70.903320           0.175723   4.920158
    6   70.800447   68.870117           0.170684   6.211276
    7   80.497003   68.527344           0.169834   7.061948
    8  111.043596   68.229492           0.169096   9.741780
    9  160.210550   68.108398           0.168796  14.055164

fig, ax = plt.subplots()

ax.set_ylabel('Compression Ratio')
ax.set_xlabel('Compression Duration (ms)')
ax.set_title('Compression Ratio vs Duration')

for label, df in [('Orange Pi Zero', orange_pi_zero), ('Ryzen 2700X', ryzen_2700x)]:
    # Skip level 0 (no compression); its 1.0 ratio would squash the rest.
    ax.scatter(df.time_ms[1:], df.compression_ratio[1:], label=label)
ax.grid()
ax.legend()

plt.show()

Network Performance

Now that we’ve measured how long gzip takes, we need to understand the network performance of the board. One nice way to do that is to run iperf3 -c <hardwired pc ip> on the client device and iperf3 -s on a wired PC. It’s probably a good idea to try this at various times of the day and with various physical configurations of the board and the access point.
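
If you want to script the measurement, iperf3 can emit a JSON report. Here’s a minimal sketch of pulling the average TCP throughput out of it, assuming iperf3 -s is already running on the wired PC (the address below is a placeholder):

import json, subprocess

# Run a 10-second TCP test against the iperf3 server and parse the JSON report.
output = subprocess.check_output(
    ["iperf3", "-c", "192.168.1.10", "-t", "10", "--json"])
report = json.loads(output)
# Average throughput seen by the receiving side, in kbps.
kbps = report["end"]["sum_received"]["bits_per_second"] / 1000.0
print("measured throughput: %.0f kbps" % kbps)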

Fortunately, if you don’t want to do this, tkaiser has done lots of this work for us at https://forum.armbian.com/topic/3739-wi-fi-performance-and-known-issues-on-sbc/. I’ve used this forum topic as my reference for typical_networks_kbps.

# Use the Ryzen 2700X timings and sizes; the tables below are computed from them.
perf = ryzen_2700x

def calculate_total_time_ms(level, network_speed_kbps):
    # Total time = time on the wire at the given link speed + time to compress.
    compressed_kbits = perf.size_kib[level] * 8
    network_speed_kbpms = network_speed_kbps / 1000.0
    compression_time_ms = perf.time_ms[level]
    return compressed_kbits / network_speed_kbpms + compression_time_ms

typical_networks_kbps = {
    'crappy wifi': 6_000,
    'ok wifi': 24_000,
    'good wifi': 50_000,
    'gigabit ethernet': 1_000_000
}

total_transfer_times = pd.DataFrame({
    name: [calculate_total_time_ms(level, speed_kbps)
           for level in range(0, 10)]
    for name, speed_kbps in typical_networks_kbps.items()
})
total_transfer_times

level  crappy wifi     ok wifi  good wifi  gigabit ethernet
    0   538.653444  135.158327  65.219173          3.887915
    1   120.345500   32.548625  17.330500          3.985375
    2   114.311369   30.821135  16.349494          3.658978
    3   109.867342   30.069490  16.237863          4.108589
    4   103.681108   29.285600  16.390379          5.082261
    5   100.288607   29.385287  17.095378          6.318073
    6    99.472728   30.602611  18.665124          8.196866
    7   100.196489   31.669145  19.791072          9.374916
    8   103.486008   35.256516  23.430070         13.059187
    9   109.263497   41.155098  29.349642         18.997166

# Total time at each level relative to level 0 (no compression); lower is better.
total_transfer_time_improvement = pd.DataFrame({
    network_type:
        total_transfer_times[network_type] / total_transfer_times[network_type][0]
    for network_type in total_transfer_times
})

total_transfer_time_improvement

level  crappy wifi   ok wifi  good wifi  gigabit ethernet
    0     1.000000  1.000000   1.000000          1.000000
    1     0.223419  0.240818   0.265727          1.025067
    2     0.212217  0.228037   0.250685          0.941116
    3     0.203967  0.222476   0.248974          1.056759
    4     0.192482  0.216676   0.251312          1.307194
    5     0.186184  0.217414   0.262122          1.625054
    6     0.184669  0.226420   0.286191          2.108293
    7     0.186013  0.234311   0.303455          2.411296
    8     0.192120  0.260853   0.359251          3.358918
    9     0.202846  0.304495   0.450016          4.886209

fig, ax = plt.subplots()

ax.set_ylabel('Compression + Transfer Time (ms)')
ax.set_xlabel('gzip Level')
ax.set_title('Effect of compression level on request time')

for network_type in total_transfer_times:
    ax.scatter(total_transfer_times.index, total_transfer_times[network_type], label=network_type)

ax.set_yscale('log')
ax.grid()
ax.legend()

plt.show()

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)

ax1.set_ylabel('Time Relative to Level 0')
ax2.set_xlabel('gzip Level')
ax1.set_title('Effect of compression level on request time')

for network_type in total_transfer_time_improvement:
    # Draw each series on both panels; the y-limits below split the view.
    for ax in [ax1, ax2]:
        ax.scatter(total_transfer_time_improvement.index,
                   total_transfer_time_improvement[network_type], label=network_type)

# Equal-height windows on either side of the break so slopes are comparable.
ax1.set_ylim(0.9, 0.9 + .25)
ax2.set_ylim(0.15, 0.15 + .25)
ax1.grid()
ax1.legend()
ax2.grid()
# Hide the facing spines so the two panels read as one broken axis.
ax1.spines['bottom'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax1.xaxis.tick_top()
ax1.tick_params(labeltop=False)  # don't put tick labels at the top
ax2.xaxis.tick_bottom()

plt.show()

Conclusion

It’s fairly clear that no matter what you do, the network won’t be the bottleneck when using gigabit ethernet.

However, the performance sweet spot for the slow wifi links these SBCs typically rely on is around gzip level 4: past that point gzip slows down sharply while the compressed size barely shrinks, a knee that shows up even on a desktop CPU.
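
To make that concrete, here’s a minimal sketch of serving a page gzipped at level 4 using only the Python standard library; the file name and port are placeholders:

import gzip, io
from http.server import BaseHTTPRequestHandler, HTTPServer

class GzipHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with open("./test.html", "rb") as f:
            body = f.read()
        headers = [("Content-Type", "text/html; charset=utf-8")]
        # Only compress if the client advertises gzip support.
        if "gzip" in self.headers.get("Accept-Encoding", ""):
            buf = io.BytesIO()
            with gzip.GzipFile(mode="wb", fileobj=buf, compresslevel=4) as gz:
                gz.write(body)
            body = buf.getvalue()
            headers.append(("Content-Encoding", "gzip"))
        self.send_response(200)
        for name, value in headers:
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8080), GzipHandler).serve_forever()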

Other Algorithms

Brotli is the only other compression algorithm widely supported by browsers, but at the quality levels where it beats gzip its compression speed is much worse, which makes it unsuitable for on-the-fly compression on low-end hardware.
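
If you want to verify that on your own hardware, the third-party brotli package drops into the same timing harness as before (this assumes pip install brotli and reuses load_example from the benchmark script):

import brotli
from timeit import timeit

data = load_example()
exec_count = 20
print("quality, time (ms), compressed size (kiB)")
for quality in range(0, 12):  # brotli quality runs from 0 to 11
    test_fun = lambda: len(brotli.compress(data, quality=quality))
    time = timeit(test_fun, number=exec_count)
    size = test_fun()
    print(str((quality, time / float(exec_count) * 1000, size / 1024.0)) + ",")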

It looks like the people behind zstd are looking to add it to browsers. When it’s widely supported, this will likely beat gzip in this application, or at the very least allow for more fine-grained tuning of the compression ratio.
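
In the meantime, zstd is easy to preview with the third-party zstandard bindings (this assumes pip install zstandard; levels 1–9 are shown to mirror the gzip sweep, though zstd goes up to 22):

import zstandard
from timeit import timeit

data = load_example()
exec_count = 20
print("level, time (ms), compressed size (kiB)")
for level in range(1, 10):
    compressor = zstandard.ZstdCompressor(level=level)
    test_fun = lambda: len(compressor.compress(data))
    time = timeit(test_fun, number=exec_count)
    size = test_fun()
    print(str((level, time / float(exec_count) * 1000, size / 1024.0)) + ",")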

Notebook Download

This document was built with a Jupyter Notebook. Get it here. The code is licensed under Apache-2.0.