Embedded systems these days frequently need to serve up HTML pages, but in most cases both the processor and the network interface are slow. It's important to find the gzip compression level that minimizes the total page load time for these devices: the time from when the request is made to when the response is fully received.
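As a concrete example, on-the-fly compression of a response might look like this minimal sketch. `make_response_body` and its default level are hypothetical, not part of any real framework:

```python
import gzip

# Minimal sketch of on-the-fly response compression. `make_response_body`
# is a hypothetical helper, not a real framework API; level 4 is just a
# placeholder default here.
def make_response_body(html, accept_encoding, level=4):
    """Gzip the body only when the client advertised gzip support."""
    if "gzip" in accept_encoding:
        return gzip.compress(html, compresslevel=level), {"Content-Encoding": "gzip"}
    return html, {}

html = b"<p>hello</p>" * 1000
body, headers = make_response_body(html, "gzip, deflate")
# Highly repetitive input compresses well, so body is much smaller than html.
```

The question the rest of this post answers is what `level` should be.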
## gzip Performance
The first step is to measure exactly how slow gzip is on the target processor. The following script collects timing data at each compression level. It uses `./test.html` as the test document, and you'll likely want to adjust `exec_count` for your processor.
```python
from __future__ import print_function
from timeit import timeit
import io, gzip


def compress(data, level):
    output = io.BytesIO()
    # Close the GzipFile so the final deflate block and footer are flushed
    # before measuring the compressed size.
    with gzip.GzipFile(mode="wb", fileobj=output, compresslevel=level) as f:
        f.write(data)
    return output.tell()


def load_example():
    with open("./test.html", "r") as f:
        content = f.read()
    try:
        # Python 3: str -> bytes
        return content.encode("utf8")
    except UnicodeDecodeError:
        # Python 2: read() already returned bytes
        return content


data = load_example()
exec_count = 20

print("compression level, time (ms), compressed size (kiB)")
for level in range(0, 10):
    test_fun = lambda: compress(data, level=level)
    time = timeit(test_fun, number=exec_count)
    size = test_fun()
    print(str((level, time / float(exec_count) * 1000, size / 1024.0)) + ",")
```
```python
import pandas as pd
import matplotlib.pyplot as plt

%config InlineBackend.figure_formats = ['svg']
%matplotlib inline
```
```python
orange_pi_zero = pd.DataFrame([
    (0, 11.398696899414062, 403.4951171875),
    (1, 30.363094806671143, 87.796875),
    (2, 31.992197036743164, 83.490234375),
    (3, 35.304808616638184, 79.7978515625),
    (4, 45.820748805999756, 74.3955078125),
    (5, 56.08339309692383, 70.9033203125),
    (6, 70.80044746398926, 68.8701171875),
    (7, 80.49700260162354, 68.52734375),
    (8, 111.0435962677002, 68.2294921875),
    (9, 160.21054983139038, 68.1083984375),
], columns=['level', 'time_ms', 'size_kiby']).set_index('level')

ryzen_2700x = pd.DataFrame([
    (0, 0.6599545478820801, 403.4951171875),
    (1, 3.2829999923706055, 87.796875),
    (2, 2.991056442260742, 83.490234375),
    (3, 3.4702062606811523, 79.7978515625),
    (4, 4.487097263336182, 74.3955078125),
    (5, 5.750846862792969, 70.9033203125),
    (6, 7.645905017852783, 68.8701171875),
    (7, 8.826696872711182, 68.52734375),
    (8, 12.513351440429688, 68.2294921875),
    (9, 18.452298641204834, 68.1083984375),
], columns=['level', 'time_ms', 'size_kiby']).set_index('level')
```
```python
def add_stat_cols(df):
    return df.assign(
        compression_ratio=lambda df: df.size_kiby / df.size_kiby[0],
        slowdown=lambda df: df.time_ms / df.time_ms[0])


orange_pi_zero = add_stat_cols(orange_pi_zero)
ryzen_2700x = add_stat_cols(ryzen_2700x)
orange_pi_zero
```
| level | time_ms | size_kiby | compression_ratio | slowdown |
|---|---|---|---|---|
| 0 | 11.398697 | 403.495117 | 1.000000 | 1.000000 |
| 1 | 30.363095 | 87.796875 | 0.217591 | 2.663734 |
| 2 | 31.992197 | 83.490234 | 0.206918 | 2.806654 |
| 3 | 35.304809 | 79.797852 | 0.197767 | 3.097267 |
| 4 | 45.820749 | 74.395508 | 0.184378 | 4.019823 |
| 5 | 56.083393 | 70.903320 | 0.175723 | 4.920158 |
| 6 | 70.800447 | 68.870117 | 0.170684 | 6.211276 |
| 7 | 80.497003 | 68.527344 | 0.169834 | 7.061948 |
| 8 | 111.043596 | 68.229492 | 0.169096 | 9.741780 |
| 9 | 160.210550 | 68.108398 | 0.168796 | 14.055164 |
```python
fig, ax = plt.subplots()
ax.set_ylabel('Compression Ratio')
ax.set_xlabel('Compression Duration (ms)')
ax.set_title('Compression Ratio vs Duration')
# Use `df` as the loop variable so it doesn't shadow the `data` bytes above.
for label, df in [('Orange Pi Zero', orange_pi_zero), ('Ryzen 2700X', ryzen_2700x)]:
    ax.scatter(df.time_ms[1:], df.compression_ratio[1:], label=label)
ax.grid()
ax.legend()
plt.show()
```
## Network Performance
Now that we've figured out how long gzip takes, we need to understand the board's network throughput. One nice way to measure it is to run `iperf3 -c <hardwired pc ip>` on the client device and `iperf3 -s` on a wired PC. It's a good idea to repeat the test at various times of day and with various physical placements of the board and the access point.

Fortunately, if you don't want to do this yourself, tkaiser has done much of this work for us at https://forum.armbian.com/topic/3739-wi-fi-performance-and-known-issues-on-sbc/. I've used that forum topic as the reference for `typical_networks_kbps`.
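If you script these measurements, `iperf3 --json` gives machine-readable output. A small sketch, assuming the iperf 3.x JSON schema, where the receiver-side TCP throughput lives under `end.sum_received.bits_per_second`:

```python
import json

def throughput_kbps(iperf_report: str) -> float:
    """Extract receiver-side throughput (kbit/s) from `iperf3 --json` output."""
    report = json.loads(iperf_report)
    return report["end"]["sum_received"]["bits_per_second"] / 1000.0

# A trimmed-down stand-in for a real iperf3 report:
sample = '{"end": {"sum_received": {"bits_per_second": 24000000.0}}}'
throughput_kbps(sample)  # 24000.0 kbit/s
```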
```python
def calculate_total_time_ms(df, level, network_speed_kbps):
    # Take the benchmark DataFrame explicitly instead of relying on a
    # leftover loop variable from the plotting cell above.
    compressed_kbits = df.size_kiby[level] * 8
    network_speed_kbpms = network_speed_kbps / 1000.0
    compression_time_ms = df.time_ms[level]
    return compressed_kbits / network_speed_kbpms + compression_time_ms


typical_networks_kbps = {
    'crappy wifi': 6_000,
    'ok wifi': 24_000,
    'good wifi': 50_000,
    'gigabit ethernet': 1_000_000,
}

# The tables below use the Ryzen 2700X timings.
total_transfer_times = pd.DataFrame({
    name: [calculate_total_time_ms(ryzen_2700x, level, speed_kbps)
           for level in range(0, 10)]
    for name, speed_kbps in typical_networks_kbps.items()
})
total_transfer_times
```
| level | crappy wifi | ok wifi | good wifi | gigabit ethernet |
|---|---|---|---|---|
| 0 | 538.653444 | 135.158327 | 65.219173 | 3.887915 |
| 1 | 120.345500 | 32.548625 | 17.330500 | 3.985375 |
| 2 | 114.311369 | 30.821135 | 16.349494 | 3.658978 |
| 3 | 109.867342 | 30.069490 | 16.237863 | 4.108589 |
| 4 | 103.681108 | 29.285600 | 16.390379 | 5.082261 |
| 5 | 100.288607 | 29.385287 | 17.095378 | 6.318073 |
| 6 | 99.472728 | 30.602611 | 18.665124 | 8.196866 |
| 7 | 100.196489 | 31.669145 | 19.791072 | 9.374916 |
| 8 | 103.486008 | 35.256516 | 23.430070 | 13.059187 |
| 9 | 109.263497 | 41.155098 | 29.349642 | 18.997166 |
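As a sanity check, the level-4 "ok wifi" cell can be reproduced by hand. Note that the model multiplies KiB by 8 to get kilobits (8.192 kbit per KiB would be exact, so this is a ~2 % simplification):

```python
size_kib = 74.3955078125         # level-4 compressed size (first table)
compress_ms = 4.487097263336182  # level-4 Ryzen 2700X compression time
speed_kbps = 24_000              # 'ok wifi'

transfer_ms = size_kib * 8 / (speed_kbps / 1000.0)  # kbits / (kbit per ms)
total_ms = transfer_ms + compress_ms                # ~29.2856, matching the table
```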
```python
total_transfer_time_improvement = pd.DataFrame({
    network_type:
        total_transfer_times[network_type] / total_transfer_times[network_type][0]
    for network_type in total_transfer_times
})
total_transfer_time_improvement
```
| level | crappy wifi | ok wifi | good wifi | gigabit ethernet |
|---|---|---|---|---|
| 0 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 1 | 0.223419 | 0.240818 | 0.265727 | 1.025067 |
| 2 | 0.212217 | 0.228037 | 0.250685 | 0.941116 |
| 3 | 0.203967 | 0.222476 | 0.248974 | 1.056759 |
| 4 | 0.192482 | 0.216676 | 0.251312 | 1.307194 |
| 5 | 0.186184 | 0.217414 | 0.262122 | 1.625054 |
| 6 | 0.184669 | 0.226420 | 0.286191 | 2.108293 |
| 7 | 0.186013 | 0.234311 | 0.303455 | 2.411296 |
| 8 | 0.192120 | 0.260853 | 0.359251 | 3.358918 |
| 9 | 0.202846 | 0.304495 | 0.450016 | 4.886209 |
```python
fig, ax = plt.subplots()
ax.set_ylabel('Compression + Transfer Time (ms)')
ax.set_xlabel('gzip Level')
ax.set_title('Effect of compression level on request time')
for network_type in total_transfer_times:
    ax.scatter(total_transfer_times.index, total_transfer_times[network_type],
               label=network_type)
ax.set_yscale('log')
ax.grid()
ax.legend()
plt.show()
```
```python
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.set_ylabel('Compression + Transfer Time (ms)')
ax2.set_xlabel('gzip Level')
ax1.set_title('Effect of compression level on request time')
for network_type in total_transfer_time_improvement:
    for ax in [ax1, ax2]:
        ax.scatter(total_transfer_time_improvement.index,
                   total_transfer_time_improvement[network_type], label=network_type)
# Broken y-axis: the top pane holds the gigabit curve, the bottom the wifi curves.
ax1.set_ylim(0.9, 0.9 + .25)
ax2.set_ylim(0.15, 0.15 + .25)
ax1.grid()
ax1.legend()
ax2.grid()
ax1.spines['bottom'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax1.xaxis.tick_top()
ax1.tick_params(labeltop=False)  # don't put tick labels at the top
ax2.xaxis.tick_bottom()
plt.show()
```
## Conclusion
It's fairly clear that no matter what you do, the network won't be the bottleneck over gigabit ethernet: skipping compression entirely wins there.

However, for the slow wifi links frequently found on these SBCs, the performance sweet spot is around gzip level 4. There's a fairly strong knee in gzip's speed at that point, even on a desktop CPU.
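If the link speed can be estimated at runtime, the same total-time model can pick the level automatically. A sketch, using a handful of (rounded) rows from the Orange Pi Zero table above:

```python
# Sketch: pick the gzip level that minimizes compression + transfer time.
def pick_gzip_level(link_kbps, profile):
    """profile: iterable of (level, compress_time_ms, compressed_size_kib)."""
    def total_ms(row):
        level, time_ms, size_kib = row
        return size_kib * 8 / (link_kbps / 1000.0) + time_ms
    return min(profile, key=total_ms)[0]

# Rounded rows from the Orange Pi Zero benchmark above.
orange_pi_profile = [
    (0, 11.40, 403.50),
    (4, 45.82, 74.40),
    (6, 70.80, 68.87),
    (9, 160.21, 68.11),
]

pick_gzip_level(6_000, orange_pi_profile)      # crappy wifi -> 4
pick_gzip_level(1_000_000, orange_pi_profile)  # gigabit -> 0
```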
## Other Algorithms
Brotli is the only other compression algorithm widely supported by browsers, but its compression speed is much worse, which makes it unsuitable for on-the-fly compression on low-end hardware.
It looks like the people behind zstd are working to get it into browsers. Once it's widely supported, it will likely beat gzip for this application, or at the very least allow finer-grained tuning of the speed/ratio trade-off.
## Notebook Download
This document was built with a Jupyter Notebook. Get it here. The code is licensed under Apache-2.0.