1 | initial version |

But if it is odd length, it is coming 1 less.

That is because of this line:

```
uint16_t const * const dest_ip_16 = (uint16_t*)dest_ip_8;
```

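You are converting a pointer to `char` into a pointer to `short` (a 16-bit word). On architectures where words must be word-aligned, such a pointer cannot refer to an odd memory address (a `char*` can), so the address is rounded down (lowest bit cleared). In C, dereferencing a misaligned `uint16_t*` is undefined behavior, so other platforms may behave differently.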
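One way to avoid the cast altogether is to copy each word out of the byte buffer with `memcpy`, which works at any address. The following is only a sketch: the function name `sum_words16` and the zero-padding of an odd trailing byte are assumptions for illustration, not code from the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum a byte buffer as native-order 16-bit words without casting the pointer.
   memcpy reads a word from any address, aligned or not.
   The 32-bit accumulator is large enough for buffers up to 64 KiB. */
static uint32_t sum_words16(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len >= 2) {
        uint16_t word;
        memcpy(&word, data, sizeof word);
        sum += word;
        data += 2;
        len -= 2;
    }
    if (len == 1) {
        /* Odd trailing byte: treat it as a word padded with a zero byte. */
        uint8_t pad[2] = { data[0], 0 };
        uint16_t word;
        memcpy(&word, pad, sizeof word);
        sum += word;
    }
    return sum;
}
```

Compilers typically turn such a fixed-size `memcpy` into a plain load where the target allows unaligned access, so this usually costs nothing compared with the cast.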

But why bother converting it in the first place?

Note that the overflow of a single addition can only be one bit (the value 1; that is why CPUs have a carry flag). Instead of checking for overflow after every addition, it is faster to do that once, after all additions. Also, calling a function (`check_sum_step`) for just one calculation makes your code slow. That is fine if you just want to understand how it works, but don't use it in a real application.
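To illustrate that deferring the carry handling gives the same result, here is a sketch with two hypothetical helpers (`sum_carry_each_step` and `sum_fold_at_end` are names made up for this example). It assumes the input is already available as an array of 16-bit words and is short enough that the 32-bit accumulator cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Variant 1: add the carry back in after every single addition. */
static uint16_t sum_carry_each_step(const uint16_t *words, size_t count)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t s = (uint32_t)sum + words[i];
        sum = (uint16_t)((s & 0xFFFF) + (s >> 16));
    }
    return sum;
}

/* Variant 2: let the carries collect in the upper half of a wider
   accumulator and fold them back in once at the end.
   Assumes count <= 65536 so the 32-bit accumulator cannot overflow. */
static uint16_t sum_fold_at_end(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += words[i];
    sum = (sum & 0xFFFF) + (sum >> 16);  /* add collected carries */
    sum = (sum & 0xFFFF) + (sum >> 16);  /* absorb a carry from the line above */
    return (uint16_t)sum;
}
```

For the words `0xFFFF, 0x0001, 0x8000, 0x8000` both variants return `0x0002`.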

The TCP/UDP/IP checksums are done in one's complement arithmetic. This has a useful property: it doesn't matter whether the calculation is done in big-endian or little-endian, as long as the result is stored in the same byte order the data was read in, so there is no need for `htons()` calls.
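The byte-order independence can be checked on a small scale: byte-swapping both operands and then adding gives the byte-swapped sum. The helpers `swap16` and `add1c` below are hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit word. */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* One's complement 16-bit addition: the carry out of bit 15
   is added back into bit 0 (end-around carry). */
static uint16_t add1c(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xFFFF) + (s >> 16));
}

/* Returns 1 if adding in either byte order agrees for this pair. */
static int byte_order_agrees(uint16_t a, uint16_t b)
{
    return swap16(add1c(a, b)) == add1c(swap16(a), swap16(b));
}
```

For example, `add1c(0x1234, 0xF0F0)` is `0x0325`, and `add1c(0x3412, 0xF0F0)` is `0x2503`, which is the same value with its bytes swapped. With ordinary two's complement addition the carry between the two bytes would break this.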

It also means that you can do the additions in bigger chunks (32 or 64 bits).

But all modern processors use two's complement arithmetic, which avoids having both +0 and -0. (Negating `a` in one's complement is, in C, `neg_a = ~a`, but in two's complement it is `neg_a = -a` or `neg_a = ~a + 1`.) So the two's complement sum has to be corrected to one's complement by adding the overflow (the carry) back in.
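A small sketch of the difference on 16-bit values, using hypothetical helper names (`neg_ones`, `neg_twos`, `add_ones`) that do not come from the question:

```c
#include <stdint.h>

/* Negation in the two representations, on 16-bit values. */
static uint16_t neg_ones(uint16_t a) { return (uint16_t)~a; }
static uint16_t neg_twos(uint16_t a) { return (uint16_t)(~a + 1); }

/* One's complement addition built from two's complement hardware:
   add normally in a wider type, then add the carry out of bit 15
   back into the result. */
static uint16_t add_ones(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xFFFF) + (s >> 16));
}
```

With these, `add_ones(a, neg_ones(a))` gives `0xFFFF`, the "negative zero" of one's complement, while the plain 16-bit sum of `a` and `neg_twos(a)` wraps around to `0`.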

So you can do the calculation using a 64-bit variable `unsigned long long sum` and add in chunks of 32 bits, `sum += *buf++;` with `buf` a suitably aligned pointer to 32-bit words (hey, an IPv4 address is also 32-bit!), then add whatever remains if fewer than 32 bits are left.

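And finally 'fold' 64 bits to 32, `sum = (sum >> 32) + (sum & 0xFFFFFFFF); sum = (sum >> 32) + (sum & 0xFFFFFFFF);`, and 32 to 16, `sum = (sum >> 16) + (sum & 0xFFFF); sum = (sum >> 16) + (sum & 0xFFFF);`. The second step of each fold absorbs a carry the first step may have produced and clears the upper bits, so the next fold does not count that carry twice. Finally return the one's complement of the result, truncated to 16 bits: `return (uint16_t)~sum;`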
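Put together, the steps above could look like the sketch below. The function name `inet_checksum` is made up for this example, and `memcpy` is used instead of a pointer cast so that the buffer may start at any address. The returned value is in the byte order of the input data, so it is meant to be copied into the header as-is, without `htons()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint64_t sum = 0;

    /* Add 32 bits at a time; carries collect in the upper 32 bits. */
    while (len >= 4) {
        uint32_t chunk;
        memcpy(&chunk, p, sizeof chunk);
        sum += chunk;
        p += 4;
        len -= 4;
    }
    if (len > 0) {
        /* 1 to 3 leftover bytes: pad with zero bytes to a full chunk. */
        uint8_t tail[4] = { 0, 0, 0, 0 };
        uint32_t chunk;
        memcpy(tail, p, len);
        memcpy(&chunk, tail, sizeof chunk);
        sum += chunk;
    }

    /* Fold 64 -> 32 bits; the second line absorbs a possible carry. */
    sum = (sum & 0xFFFFFFFF) + (sum >> 32);
    sum = (sum & 0xFFFFFFFF) + (sum >> 32);
    /* Fold 32 -> 16 bits in the same way. */
    sum = (sum & 0xFFFF) + (sum >> 16);
    sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

As a check, for the 20-byte IPv4 header `45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7` (checksum field zeroed) the two bytes of the returned value, in memory order, are `b8 61`. Running the function again over the header with those bytes copied into the checksum field returns 0.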

If you want to dig deeper into the how-to and the performance, then read this: https://locklessinc.com/articles/tcp_checksum/
