mirror of
https://github.com/google/wuffs.git
synced 2026-01-18 17:11:32 +01:00
And some minor clean-ups. There should be no semantic changes to the documentation in this patch.
60 lines
3.5 KiB
Markdown
# Background

Decoding untrusted data, such as images downloaded from across the web, has a
long history of security vulnerabilities. As of 2019, libpng is over 20 years
old, and the [PNG specification is dated 2003](https://www.w3.org/TR/PNG/), but
that well-examined C library is still getting [CVEs published in
2019](https://www.cvedetails.com/vulnerability-list/vendor_id-7294/year-2019/Libpng.html).

Sandboxing and fuzzing can mitigate the danger, but they are reactions to C's
fundamental unsafety. Newer programming languages remove entire classes of
potential security bugs. Buffer overflows and null pointer dereferences are
amongst the most well known.

Less well known are integer overflow bugs. Offset-length pairs, defining a
sub-section of a file, are seen in many file formats, such as OpenType fonts
and PDF documents. A conscientious C programmer might think to check that a
section of a file or a buffer is within bounds by writing
`if (offset + length < end)` before processing that section, but that addition
can silently overflow, and a maliciously crafted file might bypass the check.
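As a minimal sketch of that failure mode, assuming 32-bit unsigned offsets and lengths (the function names `in_bounds_naive` and `in_bounds_safe` are hypothetical, chosen only for illustration), the naive check can be compared with a form that never computes `offset + length`:

```c
#include <stdbool.h>
#include <stdint.h>

// Naive check: the sum is reduced modulo 2^32, so a huge offset plus a small
// length can wrap around to a small value that passes the comparison.
bool in_bounds_naive(uint32_t offset, uint32_t length, uint32_t end) {
  return (uint32_t)(offset + length) < end;
}

// Overflow-free check: compare the length against the space remaining after
// the offset. The subtraction cannot wrap because offset <= end is tested
// first.
bool in_bounds_safe(uint32_t offset, uint32_t length, uint32_t end) {
  return (offset <= end) && (length < (end - offset));
}
```

With `offset = 0xFFFFFFF0`, `length = 0x20` and `end = 0x100`, the naive version accepts the section because the sum wraps to `0x10`, while the rearranged version rejects it.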

A variation on this theme is where `offset` is a pointer, exemplified by
[capnproto's
CVE-2017-7892](https://github.com/sandstorm-io/capnproto/blob/master/security-advisories/2017-04-17-0-apple-clang-elides-bounds-check.md)
and [another
example](https://www.blackhat.com/docs/us-14/materials/us-14-Rosenberg-Reflections-on-Trusting-TrustZone.pdf).
For a pointer-typed offset, witnessing such a vulnerability can depend on both
the malicious input itself and the addresses of the memory the software used to
process that input. Those addresses can vary from run to run and from system to
system, e.g. 32-bit versus 64-bit systems and whether dynamically allocated
memory can have sufficiently high address values, and that variability makes it
harder to reproduce such subtle bugs and to catch them by fuzzing.
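For the pointer-typed case, one common way to sidestep the problem is to never form `ptr + length` at all. The sketch below is an illustration under assumptions, not the fix used by any project mentioned above: `span_fits` is a hypothetical helper, and `ptr` and `end` are assumed to point into (or one past the end of) the same buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Reports whether `length` bytes starting at `ptr` fit before `end`.
//
// A check written as `ptr + length > end` is fragile: computing a pointer
// beyond one-past-the-end of the buffer is undefined behavior in C, so a
// compiler is allowed to assume it never happens. Comparing `length` against
// the remaining distance avoids the pointer addition entirely.
bool span_fits(const uint8_t* ptr, const uint8_t* end, size_t length) {
  if (ptr > end) {
    return false;
  }
  return length <= (size_t)(end - ptr);
}
```

Because the result depends only on `end - ptr` and `length`, it is the same on every run, regardless of where the buffer happens to be allocated.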

In C, signed integer overflow is *undefined behavior*, as per [section 3.4.3 of
the C11 draft standard
(N1570)](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), while
unsigned arithmetic wraps around. In
Go, integer overflow is [silently
ignored](https://golang.org/ref/spec#Integer_overflow). In Rust, integer
overflow is [checked at run time in debug mode and silently ignored in release
mode](http://huonw.github.io/blog/2016/04/myths-and-legends-about-integer-overflow-in-rust/)
by default, as the run time performance penalty was deemed too great. In Swift,
it's a [run time
error](https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/AdvancedOperators.html#//apple_ref/doc/uid/TP40014097-CH27-ID37).
In D, it's [configurable](http://dconf.org/2017/talks/alexandrescu.pdf). Other
languages like Python and Haskell can automatically spill into 'big integers'
larger than 64 bits, but this can have a performance impact when such integers
are used in inner loops.
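To make the C case concrete, here is a small sketch of the two behaviors, with hypothetical helper names (`wrapping_add_u32`, `checked_add_i32`). Unsigned addition wraps silently, whereas a signed addition has to be checked before it is performed, because the overflowing addition itself would be undefined behavior.

```c
#include <stdbool.h>
#include <stdint.h>

// Unsigned arithmetic wraps modulo 2^32: no error is reported.
uint32_t wrapping_add_u32(uint32_t a, uint32_t b) {
  return (uint32_t)(a + b);
}

// Signed addition with an explicit run time check. The test is done before
// adding, using only subtractions that cannot themselves overflow.
// Returns false, leaving *out untouched, if a + b is not representable.
bool checked_add_i32(int32_t a, int32_t b, int32_t* out) {
  if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b)) {
    return false;
  }
  *out = a + b;
  return true;
}
```

A caller of `checked_add_i32` pays for the extra comparisons on every call, which is the kind of run time cost the surrounding text describes.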

Even if overflow is checked, it is usually checked at run time. Similarly,
modern languages do their bounds checking at run time. An expression like
`a[i]` is really `if ((0 <= i) && (i < a.length)) { use a[i] } else { throw }`,
in mangled pseudo-code. Compilers for these languages can often eliminate many
of these bounds checks, e.g. if `i` is an iterator index, but not always all of
them.
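The pseudo-code above can be spelled out in C as a rough sketch. The `byte_slice` type and both functions are hypothetical, and since C has no exceptions, the "throw" branch is modeled as returning `false`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical slice type: a pointer paired with its length.
typedef struct {
  const uint8_t* ptr;
  size_t len;
} byte_slice;

// Run time bounds-checked read, mirroring
// `if ((0 <= i) && (i < a.length)) { use a[i] } else { throw }`.
bool slice_get(byte_slice s, ptrdiff_t i, uint8_t* out) {
  if ((0 <= i) && ((size_t)i < s.len)) {
    *out = s.ptr[i];
    return true;
  }
  return false;
}

// When the index is a loop counter bounded by the length, every access is in
// range by construction, so a per-element check would be redundant. This is
// the kind of check a compiler can often eliminate.
uint32_t slice_sum(byte_slice s) {
  uint32_t sum = 0;
  for (size_t i = 0; i < s.len; i++) {
    sum += s.ptr[i];
  }
  return sum;
}
```

In `slice_get` the index comes from the caller, so the check cannot be dropped without further knowledge about `i`.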

The run time cost is small, measured in nanoseconds. But if an image decoding
library has to eat this cost per pixel, and you have a megapixel image, then
nanoseconds become milliseconds, and milliseconds can matter.

In comparison, in Wuffs, all bounds checks and arithmetic overflow checks
happen at compile time, with zero run time overhead.