# Background

Decoding untrusted data, such as images downloaded from across the web, has a
long history of security vulnerabilities. As of 2019, libpng is over 20 years
old, and the [PNG specification is dated 2003](https://www.w3.org/TR/PNG/), but
that well-examined C library is still getting [CVEs published in
2019](https://www.cvedetails.com/vulnerability-list/vendor_id-7294/year-2019/Libpng.html).

Sandboxing and fuzzing can mitigate the danger, but they are reactions to C's
fundamental unsafety. Newer programming languages remove entire classes of
potential security bugs. Buffer overflows and null pointer dereferences are
among the best known.

Less well known are integer overflow bugs. Offset-length pairs, defining a
sub-section of a file, are seen in many file formats, such as OpenType fonts
and PDF documents. A conscientious C programmer might think to check that a
section of a file or a buffer is within bounds by writing `if (offset + length
< end)` before processing that section, but that addition can silently
overflow, and a maliciously crafted file might bypass the check.
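As a minimal sketch of that failure mode, assume 32-bit unsigned offsets and lengths (the function names below are hypothetical, not taken from any real decoder). The naive check performs the addition first, so the sum can wrap around; the safer variant keeps the same strict comparison but rearranges it into a subtraction that cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

// Naive check, as in the text: the 32-bit sum can wrap around, so a huge
// length can make (offset + length) look small. The cast makes the wrap
// explicit even on platforms where int is wider than 32 bits.
bool in_bounds_naive(uint32_t offset, uint32_t length, uint32_t end) {
  return (uint32_t)(offset + length) < end;
}

// Overflow-free variant of the same comparison: first establish that
// offset <= end, so that (end - offset) cannot underflow, then compare
// length against the remaining space.
bool in_bounds_safe(uint32_t offset, uint32_t length, uint32_t end) {
  return (offset <= end) && (length < (end - offset));
}
```

With `offset = 16`, `length = 0xFFFFFFF8` and `end = 1024`, the naive sum wraps to 8, so `in_bounds_naive` accepts the section while `in_bounds_safe` rejects it.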

A variation on this theme is where `offset` is a pointer, exemplified by
[capnproto's
CVE-2017-7892](https://github.com/sandstorm-io/capnproto/blob/master/security-advisories/2017-04-17-0-apple-clang-elides-bounds-check.md)
and [another
example](https://www.blackhat.com/docs/us-14/materials/us-14-Rosenberg-Reflections-on-Trusting-TrustZone.pdf).
For a pointer-typed offset, witnessing such a vulnerability can depend on both
the malicious input itself and the addresses of the memory the software used to
process that input. Those addresses can vary from run to run and from system to
system, e.g. 32-bit versus 64-bit systems and whether dynamically allocated
memory can have sufficiently high address values, and that variability makes it
harder to reproduce and to catch such subtle bugs from fuzzing.
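For the pointer-typed case, one common way to sidestep the problem is to never form `p + length` at all. The sketch below is illustrative only: `fits_before_end` is a hypothetical helper, and it assumes that `p` and `end` point into (or one past the end of) the same buffer with `p <= end`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: reports whether `length` bytes starting at `p` fit
// before `end`. Assumes p <= end and that both point into the same buffer.
//
// Computing (p + length) for an untrusted length may produce a pointer far
// outside the buffer, which is undefined behavior in C, and whether it
// appears to wrap depends on where the buffer happens to live in memory.
// Comparing the length against the remaining byte count avoids that.
bool fits_before_end(const uint8_t* p, const uint8_t* end, size_t length) {
  return length <= (size_t)(end - p);
}
```

Because the result depends only on the distance `end - p`, not on the absolute address values, the check behaves the same from run to run and across 32-bit and 64-bit systems.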

In C, signed integer overflow is *undefined behavior*, as per [section 3.4.3 of
the C11 draft standard (N1570)](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf),
while unsigned arithmetic wraps around. In
Go, integer overflow is [silently
ignored](https://golang.org/ref/spec#Integer_overflow). In Rust, integer
overflow is [checked at run time in debug mode and silently ignored in release
mode](http://huonw.github.io/blog/2016/04/myths-and-legends-about-integer-overflow-in-rust/)
by default, as the run time performance penalty was deemed too great. In Swift,
it's a [run time
error](https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/AdvancedOperators.html#//apple_ref/doc/uid/TP40014097-CH27-ID37).
In D, it's [configurable](http://dconf.org/2017/talks/alexandrescu.pdf). Other
languages like Python and Haskell can automatically spill into 'big integers'
larger than 64 bits, but this can have a performance impact when such integers
are used in inner loops.
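To make the "silently ignored" versus "run time error" distinction concrete, here is a rough sketch in C of the two behaviors. Both function names are hypothetical. The wrapping version relies on C's unsigned arithmetic being defined to wrap; the checked version tests the operands before adding, so the signed addition itself never overflows.

```c
#include <stdbool.h>
#include <stdint.h>

// Wrapping addition: the result is reduced modulo 2^32 and the overflow
// goes unreported.
uint32_t add_wrapping(uint32_t a, uint32_t b) {
  return (uint32_t)(a + b);
}

// Checked addition: returns false (leaving *out untouched) if a + b would
// not fit in an int32_t. This is the run time test that a checking
// language inserts on the programmer's behalf.
bool add_checked(int32_t a, int32_t b, int32_t* out) {
  if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b)) {
    return false;
  }
  *out = a + b;
  return true;
}
```

The checked version costs a couple of extra comparisons and a branch on every addition, which is the kind of overhead the trade-offs above are weighing.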

Even if overflow is checked, it is usually checked at run time. Similarly,
modern languages do their bounds checking at run time. An expression like
`a[i]` is really `if ((0 <= i) && (i < a.length)) { use a[i] } else { throw }`,
in mangled pseudo-code. Compilers for these languages can often eliminate many
of these bounds checks, e.g. if `i` is an iterator index, but not always all of
them.
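Spelled out in C, the implicit check might look like the sketch below. The `byte_slice` type and both functions are hypothetical, and a failed check is reported by returning `false` rather than by throwing. In `sum_checked`, the loop condition already guarantees `i < a.len`, which is the sort of redundancy an optimizing compiler can often remove.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical slice type: a pointer that carries its length with it.
typedef struct {
  const uint8_t* ptr;
  size_t len;
} byte_slice;

// Bounds-checked read, i.e. what a[i] implies in a checked language.
// size_t is unsigned, so the (0 <= i) half of the check is implicit.
bool byte_slice_get(byte_slice a, size_t i, uint8_t* out) {
  if (i >= a.len) {
    return false;
  }
  *out = a.ptr[i];
  return true;
}

// Sums every element through the checked accessor. The check inside
// byte_slice_get can never fail here, because the loop condition has
// already established that i < a.len.
uint32_t sum_checked(byte_slice a) {
  uint32_t sum = 0;
  for (size_t i = 0; i < a.len; i++) {
    uint8_t v;
    if (!byte_slice_get(a, i, &v)) {
      return 0;  // Unreachable in practice.
    }
    sum += v;
  }
  return sum;
}
```

When the index instead comes from the input data, for example a palette index read from an image file, no such proof is available from the loop structure and the check has to stay.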

The run time cost is small, measured in nanoseconds. But if an image decoding
library has to eat this cost per pixel, and you have a megapixel image, then
nanoseconds become milliseconds, and milliseconds can matter.

In comparison, in Wuffs, all bounds checks and arithmetic overflow checks
happen at compile time, with zero run time overhead.