Our program processes network packets, in particular, TCP / IP / etc headers. In them, numerical values - offsets, counters, addresses - are presented in network byte order (big-endian); we are working on x86 (little-endian). In standard structures describing headers, these fields are represented by simple integer types (
uint32_t
,
uint16_t
). After several bugs due to the fact that we forgot to convert the byte order, we decided to replace the field types with classes that prohibit implicit conversions and atypical operations. Under the cut is a utilitarian code and specific examples of errors that strict typing revealed.
Byte order
Likbez for those who are not aware of the byte order (endianness, byte order). In more detail
already was on "Habré" .
In the usual notation of numbers, they go from the oldest (left) to the youngest (right) on the left: 432
10 = 4 × 10
2 + 3 × 10
1 + 2 × 10
0 . Integer data types have a fixed size, for example, 16 bits (numbers from 0 to 65535). They are stored in memory as two bytes, for example, 432
10 = 01b0
16 , that is, bytes 01 and b0.
Print the bytes of this number:
#include <cstdio> // printf() #include <cstdint> // uint8_t, uint16_t int main() { uint16_t value = 0x01b0; printf("%04x\n", value); const auto bytes = reinterpret_cast<const uint8_t*>(&value); for (auto i = 0; i < sizeof(value); i++) { printf("%02x ", bytes[i]); } }
On conventional Intel or AMD (x86) processors, we get the following:
01b0 b0 01
Bytes in memory are located from the youngest to the oldest, and not as when writing numbers. This order is called
little-endian (LE). The same is true for 4-byte numbers. The byte order is determined by the processor architecture. The “native” order for the processor is also called the
order of the CPU or host (CPU / host byte order). In our case, host byte order is little-endian.
However, the Internet was not born on x86, and there the order of bytes was different -
from oldest to youngest (big-endian, BE). They began to use it in the headers of network protocols (IP, TCP, UDP), which is why big-endian is also called
network byte order.
Example: port 443 (1bb
16 ), which uses HTTPS, is written in the TCP headers bytes bb 01, which when read will give bb01
16 = 47873.
The byte order of a number can be converted. For example, for
uint16_t
there is a standard function
htons()
(
h ost
to n etwork for
s hort integer - from host order to network order for short integers) and the reverse
ntohs()
. Similarly, for
uint32_t
there is
htonl()
and
ntohl()
(long is a long integer).
Unfortunately, the compiler does not know where the particular value of a variable of type
uint32_t
came from, and does not warn if you mix values with different byte order and get an incorrect result.
Strong typing
The risk of confusing byte order is obvious, how to deal with it?
- Code review. This is a mandatory procedure in our project. Unfortunately, the testers least of all want to delve into the code that manipulates bytes: “I see
htons()
- probably, the author thought about everything”. - Discipline, rules like: BE only in packages, all variables in LE. It is not always reasonable, for example, if you need to check ports against a hash table, it is more efficient to store them in the network byte order and search “as is”.
- Tests. As you know, they do not guarantee the absence of errors. Data can be unsuccessfully matched (1.1.1.1 does not change when converting byte order) or adjusted to the result.
When working with a network, you cannot ignore byte order, so I would like to make it impossible to ignore it when writing code. Moreover, we do not just have a number in BE - it is a port number, IP address, TCP sequence number, checksum. One cannot be assigned to another, even if the number of bits matches.
The solution is known - strict typing, that is, separate types for ports, addresses, numbers. In addition, these types must support BE / LE conversion.
Boost.Endian does not suit us, because there is no Boost in the project.
The project size is about 40 thousand lines in C ++ 17. If you create safe wrapper types and rewrite the header structures on them, all places where there is work with BE will automatically stop compiling. We'll have to go through them all once, but the new code will only be safe.
Big-endian class number #include <cstdint> #include <iosfwd> #define PACKED __attribute__((packed)) constexpr auto bswap(uint16_t value) noexcept { return __builtin_bswap16(value); } constexpr auto bswap(uint32_t value) noexcept { return __builtin_bswap32(value); } template<typename T> struct Raw { T value; }; template<typename T> Raw(T) -> Raw<T>; template<typename T> struct BigEndian { using Underlying = T; using Native = T; constexpr BigEndian() noexcept = default; constexpr explicit BigEndian(Native value) noexcept : _value{bswap(value)} {} constexpr BigEndian(Raw<Underlying> raw) noexcept : _value{raw.value} {} constexpr Underlying raw() const { return _value; } constexpr Native native() const { return bswap(_value); } explicit operator bool() const { return static_cast<bool>(_value); } bool operator==(const BigEndian& other) const { return raw() == other.raw(); } bool operator!=(const BigEndian& other) const { return raw() != other.raw(); } friend std::ostream& operator<<(std::ostream& out, const BigEndian& value) { return out << value.native(); } private: Underlying _value{}; } PACKED;
- A header file with this type will be included everywhere, so instead of a heavy
<iostream>
a lightweight <iosfwd>
. - Instead of
htons()
, etc. - fast compiler intrinsics. In particular, they are constexpr
constant propagation, so constexpr
constructors. - Sometimes there is already a
uint16_t
/ uint32_t
value located in BE. The Raw<T>
structure with deduction guide allows you to conveniently create a BigEndian<T>
from it.
The controversial point here is
PACKED
: it is believed that packaged structures are less optimizable. The only answer is to measure. Our code benchmarks did not reveal any slowdowns. In addition, in the case of network packets, the position of the fields in the header is still fixed.
In most cases, BE does not need any operations other than comparison. Sequence numbers need to be folded correctly with LE:
using BE16 = BigEndian<uint16_t>; using BE32 = BigEndian<uint32_t>; struct Seqnum : BE32 { using BE32::BE32; template<typename Integral> Seqnum operator+(Integral increment) const { static_assert(std::is_integral_v<Integral>); return Seqnum{static_cast<uint32_t>(native() + increment)}; } } PACKED; struct IP : BE32 { using BE32::BE32; } PACKED; struct L4Port : BE16 { using BE16::BE16; } PACKED;
Secure TCP Header Structure enum TCPFlag : uint8_t { TH_FIN = 0x01, TH_SYN = 0x02, TH_RST = 0x04, TH_PUSH = 0x08, TH_ACK = 0x10, TH_URG = 0x20, TH_ECE = 0x40, TH_CWR = 0x80, }; using TCPFlags = std::underlying_type_t<TCPFlag>; struct TCPHeader { L4Port th_sport; L4Port th_dport; Seqnum th_seq; Seqnum th_ack; uint32_t th_flags2 : 4; uint32_t th_off : 4; TCPFlags th_flags; BE16 th_win; uint16_t th_sum; BE16 th_urp; uint16_t header_length() const { return th_off << 2; } void set_header_length(uint16_t len) { th_off = len >> 2; } uint8_t* payload() { return reinterpret_cast<uint8_t*>(this) + header_length(); } const uint8_t* payload() const { return reinterpret_cast<const uint8_t*>(this) + header_length(); } }; static_assert(sizeof(TCPHeader) == 20);
-
TCPFlag
could be made an enum class
, but in practice only two operations are performed on flags: checking the entry ( &
) or replacing the flags with a combination ( |
) - there is no confusion. - Bit fields are left primitive, but safe access methods are made.
- Field names are left classic.
results
Most edits were trivial. The code is cleaner:
auto tcp = packet->tcp_header(); - return make_response(packet, - cookie_make(packet, rte_be_to_cpu_32(tcp->th_seq)), - rte_cpu_to_be_32(rte_be_to_cpu_32(tcp->th_seq) + 1), - TH_SYN | TH_ACK); + return make_response(packet, cookie_make(packet, tcp->th_seq.native()), + tcp->th_seq + 1, TH_SYN | TH_ACK); }
In part, the types documented the code:
- void check_packet(int64_t, int64_t, uint8_t, bool); + void check_packet(std::optional<Seqnum>, std::optional<Seqnum>, TCPFlags, bool);
Suddenly, it turned out that you can incorrectly read the size of the TCP window, while unit tests will pass and even traffic will be chased:
// window size auto wscale_ratio = options().wscale_dst - options().wscale_src; if (wscale_ratio < 0) { - auto window_size = header.window_size() / (1 << (-wscale_ratio)); + auto window_size = header.window_size().native() / (1 << (-wscale_ratio)); if (header.window_size() && window_size < 1) { window_size = WINDOW_SIZE_MIN; } header_out.window_size(window_size); } else { - auto window_size = header.window_size() * (1 << (wscale_ratio)); + auto window_size = header.window_size().native() * (1 << (wscale_ratio)); if (window_size > WINDOW_SIZE_MAX) { window_size = WINDOW_SIZE_MAX; }
Example of a logical error: the developer of the original code thought that the function accepts BE, although in fact it is not. When trying to use
Raw{}
instead of
0
program simply did not compile (fortunately, this is just a unit test). Immediately we see an unsuccessful choice of data: the error would be found sooner if it were not used 0, which is the same in any byte order.
- auto cookie = cookie_make_inner(tuple, rte_be_to_cpu_32(0)); + auto cookie = cookie_make_inner(tuple, 0);
A similar example: first, the compiler pointed out the mismatch between the
def_seq
and
cookie
types, then it became clear why the test passed earlier - such constants.
- const uint32_t def_seq = 0xA7A7A7A7; - const uint32_t def_ack = 0xA8A8A8A8; + const Seqnum def_seq{0x12345678}; + const Seqnum def_ack{0x90abcdef}; ... - auto cookie = rte_be_to_cpu_32(_tcph->th_ack); + auto cookie = _tcph->th_ack; ASSERT_NE(def_seq, cookie);
Summary
The bottom line is:
- Found one bug and several logical errors in unit tests.
- Refactoring made me sort out dubious places, readability increased.
- Performance has been preserved, but could have decreased - benchmarks are needed.
All three points are important to us, so we think refactoring was worth it.
Do you insure yourself against mistakes with strict types?