mirror of
https://github.com/google/wuffs.git
synced 2026-01-18 17:11:32 +01:00
And some minor clean-ups. There should be no semantic changes to the documentation in this patch.
233 lines
13 KiB
Markdown
233 lines
13 KiB
Markdown
# Related Work
|
|
|
|
Wuffs is an industrial project, not an academic project, and this Related Work
|
|
section is not as extensive or as rigorous as would be found in an academic
|
|
thesis or journal article. The underlying goal is usefulness, not novelty. For
|
|
Wuffs' users, it'd be perfectly fine to have multiple implementations of a good
|
|
idea, even if the later ones wouldn't earn a Ph. D.
|
|
|
|
A number of other projects are listed below. Wuffs is not exactly like any of
|
|
them, for one reason or another. We're not claiming that "use a state of the
|
|
art theorem prover" or "write it in Rust" are unworkable approaches. They're
|
|
just different approaches, with different trade-offs.
|
|
|
|
For example, Wuffs is primarily concerned about *safety* and *performance*.
|
|
Formal *correctness*, such as being able to mathematically express that "when
|
|
this function returns, that array's elements will be sorted in increasing
|
|
order", is less of a concern for Wuffs the Language than for other projects.
|
|
This is especially as higher level concerns like "correctly implements the JPEG
|
|
specification" are less amenable to formalization than foundational concepts
|
|
like searching and sorting. Even with something as mathematical and
|
|
conceptually simple as Quicksort, compare a two-line [Haskell
|
|
implementation](https://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Haskell)
|
|
with the essay that is an [Idris
|
|
implementation](https://github.com/bmsherman/blog/wiki/Quicksort-in-Idris).
|
|
Proving *correctness* is an admirable goal, but it is not Wuffs' goal.
|
|
|
|
For Wuffs the Library, confidence about correctness is instead based on
|
|
testing, both unit testing and for e.g. media codecs, corpora of test files. A
|
|
small corpus is included in the Wuffs source code repository. Other, much
|
|
larger corpora (e.g. based on a sample of a web crawl), are not included for
|
|
download size and copyright reasons.
|
|
|
|
|
|
## Static Checkers
|
|
|
|
Wuffs is a language by itself, integrated with a compiler, not something
|
|
embedded in the comments of another language's program, supported by a separate
|
|
program checker, as the bounds check elimination affects the generated code.
|
|
|
|
[Checked C](https://www.microsoft.com/en-us/research/project/checked-c/) has a
|
|
similar goal of safety, but aims to be backwards compatible with C, including
|
|
the idiomatic ways in which C code plays fast and loose with pointers and
|
|
arrays, and implicitly inserting run time checks while maintaining existing
|
|
data layout (e.g. the sizeof a struct type) for interoperability with other
|
|
systems. This is a much harder problem than Wuffs' approach of a new, simple,
|
|
domain-specific programming language.
|
|
|
|
Extended Static Checker for Java (ESC/Java) and its successor
|
|
[OpenJML](http://www.openjml.org/), which obviously target the Java programming
|
|
language, similarly has to analyze a language that is more complicated than
|
|
Wuffs. Java is also not commonly used for writing e.g. low level image codecs,
|
|
as the language lacks unsigned integer types, it is garbage collected and
|
|
idiomatic code often allocates.
|
|
|
|
Similarly, [Whiley](http://whiley.org/) is a programming language with extended
|
|
static checking. In contrast to Wuffs, its `int` type uses unbounded
|
|
arithmetic, instead of e.g. 32-bit twos-complement representation. That can be
|
|
easier for theorem provers to reason about, but this can have a performance
|
|
impact when such integers are used in inner loops. In theory, a compiler could
|
|
recognize "an `int` restricted to the range [0, 99]" as a `uint8` instead of a
|
|
potentially 'big integer', but even so, it may improve performance to
|
|
explicitly use a `uint32` here even when a `uint8` would do.
|
|
|
|
[Spark ADA](http://libre.adacore.com/tools/spark-gpl-edition/) targets a subset
|
|
of the Ada programming language, which again is more complicated than Wuffs. It
|
|
also allows for pluggable theorem provers such as Z3, similar to
|
|
[Dafny](http://research.microsoft.com/dafny). This can increase programmer
|
|
productivity in that it can take less effort to prove more programs safe, but
|
|
it also affects reproducibility (when some programs that are provable in one
|
|
programmer's configuration are unprovable in another) and the size of the
|
|
trusted computing base. It's also not obvious up front what effect different
|
|
theorem provers, and their different heuristics, will have on compile times or
|
|
proof times.
|
|
|
|
In constrast, Wuffs' proof system is fast (and dumb) instead of smart (and
|
|
slow). For example, [Vale (Verified Assembly Language for
|
|
Everest)](https://github.com/project-everest/vale) is a promising approach that
|
|
uses Dafny to verify implementations of cryptographic algorithms. However,
|
|
["Vale: Verifying High-Performance Cryptographic Assembly
|
|
Code"](http://www.andrew.cmu.edu/user/bparno/papers/vale.pdf) reports in Table
|
|
1 that verification times are measured in *minutes*. Wuffs aims for
|
|
*sub-second* compilation times, including proofs of (memory) safety. Figure 15
|
|
in that paper suggests that Vale's verification times can grow exponentially in
|
|
some cases, and also goes on to say, "Dafny/Z3 fails to verify the AES key
|
|
inversion for 6 unrolled iterations and 9 unrolled iterations, indicating that
|
|
SMT solvers like Z3 are still occasionally unpredictable". Of course, Vale is
|
|
also tackling a much harder problem than Wuffs: proving correctness, not just
|
|
safety.
|
|
|
|
Once again, we're not claiming that these other approaches are unworkable, or
|
|
not useful, just different, with different trade-offs.
|
|
|
|
|
|
## Why a New Language?
|
|
|
|
Simpler languages are easier to prove things about. Macros, inheritance,
|
|
closures, generics, operator overloading, goto's and built-in container types
|
|
are all useful features, but as mentioned in the top-level
|
|
[README](/README.md), Wuffs is not a general purpose programming language.
|
|
Instead, its focus is on compile-time provable memory safety, with C-like
|
|
performance, for CPU-intensive file format decoders.
|
|
|
|
Nonetheless, Wuffs is an imperative language, not a functional language,
|
|
despite the long history of functional languages and provability. Inner loop
|
|
performance usage matters, and it is easier to match the optimization
|
|
techniques of incumbent C libraries with a C-like language than with a
|
|
functional language. Compared to something like
|
|
[F\*](https://www.fstar-lang.org/), [Idris](https://www.idris-lang.org/) or
|
|
[Liquid Haskell](https://ucsd-progsys.github.io/liquidhaskell-blog/), Wuffs has
|
|
no garbage collector overhead, as it has no garbage collector.
|
|
|
|
Memory usage also matters, again considering per-pixel costs can be multiplied
|
|
by millions of pixels. It is important to represent e.g. an RGBA pixel as four
|
|
u8 values (or one u32), not as four naturally-sized-for-the-CPU integers or
|
|
four 'big integers'.
|
|
|
|
[F\* / KreMLin](https://github.com/FStarLang/kremlin) is a subset of F\* that
|
|
is similar in some sense to Wuffs, in that it transpiles to C and cares about
|
|
both safety and performance. Unlike Wuffs, it is a functional language (with
|
|
dependent types), not an imperative one (with a simpler type system), and uses
|
|
a sophisticated theorem prover like Z3, with the same trade-offs as discussed
|
|
above for Spark ADA and Dafny. It also tackles a more complicated problem, in
|
|
attempting to prove correctness properties, not just safety properties. Further
|
|
reading is in ["Everest: Towards a Verified, Drop-in Replacement of
|
|
HTTPS"](https://project-everest.github.io/assets/snapl2017.pdf) and ["Verified
|
|
Low-Level Programming Embedded in F\*"](https://arxiv.org/pdf/1703.00053.pdf).
|
|
|
|
[Cryptol](https://github.com/GaloisInc/cryptol) is another project that, like
|
|
Everest (including F\* / KreMLin and its HACL\* sub-project, and Dafny / Vale),
|
|
focuses on cryptographic algorithms, rather than Wuffs' focus on file formats,
|
|
and also relies on a sophisticated theorem prover like Z3.
|
|
|
|
Once again, we're not claiming that these other approaches are unworkable, or
|
|
not useful, just different, with different trade-offs.
|
|
|
|
|
|
## Why Not ATS?
|
|
|
|
ATS (Applied Type System) is a statically typed programming language that
|
|
unifies implementation with formal specification. Its goals sound similar to
|
|
Wuffs in the abstract, but they differ in their approach. That's not to say
|
|
that one is better or worse than the other, they're just different.
|
|
|
|
Graydon Hoare, on the topic of ["Extended static checking (ESC), refinement
|
|
types, general dependent-typed
|
|
languages"](https://graydon2.dreamwidth.org/253769.html) says of existing
|
|
approaches, including ATS, "So far, most exercises in this space have ended in
|
|
frustration... the complexity wall here can make other areas of computering
|
|
[sic] look... a bit like child's play. It gets very dense, very fast."
|
|
|
|
In contrast, Wuffs aims to be significantly simpler. The trade-off is, of
|
|
course, that Wuffs is not and does not aim to be a general purpose language.
|
|
ATS might be more powerful and therefore placed higher in the [Blub
|
|
Paradox](http://www.paulgraham.com/avg.html) analogy, but the flip side of that
|
|
is that, unlike Lisp, ATS is much more complicated to learn and to understand,
|
|
perhaps dauntingly so. For example, [its
|
|
manual](http://ats-lang.sourceforge.net/DOCUMENT/INT2PROGINATS/HTML/book1.html)
|
|
mentions that types, sorts, views, viewtypes, dataviews, datatypes and
|
|
dataviewtypes are all similar-sounding but distinct ATS concepts. As of
|
|
February 2018, the [ATS home page](http://www.ats-lang.org/) says that "the
|
|
current implementation... [consists] of more than 180K lines of code" and that
|
|
"in general, one should expect to encounter many unfamiliar programming
|
|
concepts and features in ATS and be prepared to spend a substantial amount of
|
|
time on learning them." In the [words of one
|
|
commentator](https://www.reddit.com/r/rust/comments/7d85sj/puffs_a_new_language_for_parsing_untrusted_files/dpx0hp7/),
|
|
"the cognitive overhead of managing your proof-threading in ATS is much higher
|
|
than managing your lifetimes/borrows in Rust, which is by far the biggest cliff
|
|
of Rust's learning curve. Not to mention ATS is syntactically... challenging".
|
|
|
|
On verification, [ATS' Examples](http://www.ats-lang.org/Examples.html) says
|
|
that "A fully verified implementaion of the [Fibonacci] function in ATS can now
|
|
be given... Please click
|
|
[here](http://www.ats-lang.org/SERVER/MYCODE/Patsoptaas_serve.php?mycode_fil=fibats)
|
|
if you are interested in compiling and running this example on-line." Modifying
|
|
that example's "N" value from 10 to 10000 yields "fibats(10000) = Infinity",
|
|
which is clearly incorrect. Somewhere along the way, there's been a breakdown
|
|
between numbers in the ideal mathematical sense (often used by proof systems)
|
|
and numbers in practice (whether fixed width integers or floating point) as
|
|
actually computed on by CPUs. Regardless of where that breakdown is in this
|
|
case, it doesn't build confidence that, in general, ATS programs are guaranteed
|
|
free of arithmetic overflow despite compiler-verified proofs.
|
|
|
|
Once again, we're not claiming that these other approaches are unworkable, or
|
|
not useful, just different, with different trade-offs.
|
|
|
|
|
|
## Why Not Rust?
|
|
|
|
Rust is a general purpose programming language, which aims for C-like
|
|
performance in general. Wuffs aims for C-like performance specifically for the
|
|
limited problem space of bits-and-bytes level CPU-intensive computation (while
|
|
doing that safely). As one data point, a [GIF decoding
|
|
benchmark](https://github.com/google/wuffs/commit/9a463d46) measures Wuffs as
|
|
4x faster than two separate Rust implementations, including the one that
|
|
[crates.io calls "gif"](https://crates.io/crates/gif), and one based on Nom
|
|
(discussed below).
|
|
|
|
Wuffs also transpiles to standard C99 code with no dependencies, and should run
|
|
anywhere that has a C99 compliant C compiler. An existing C/C++ project, such
|
|
as the Chromium web browser, can choose to replace e.g. libpng with Wuffs PNG,
|
|
without needing any additional toolchains. Sure, languages like D and Rust have
|
|
great binary-level interop with C code, and Mozilla are reporting progress with
|
|
parsing media formats in Rust, but it's still an operational hurdle to grow a
|
|
project's build process and to assess build times and binary sizes.
|
|
|
|
[Nom](https://github.com/Geal/nom) is a parser combinator library in Rust,
|
|
described in ["Writing parsers like it is
|
|
2017"](http://spw17.langsec.org/papers/chifflier-parsing-in-2017.pdf). Wuffs
|
|
differs from nom by itself, in that Wuffs is an end to end implementation, not
|
|
limited to that part of a file format that is easily expressible as a formal
|
|
grammar. In particular, it also handles entropy encodings such as LZW (for
|
|
GIF), ZLIB (for PNG) and Huffman (for JPEG, TODO). Wuffs differs from nom
|
|
combined with other Rust code (e.g. a Rust LZW implementation) in that bounds
|
|
and overflow checks are not just ubiquitous but also completely eliminated at
|
|
compile time.
|
|
|
|
[Kaitai Struct](http://kaitai.io/) is in a similar space, generating safe
|
|
parsers for multiple target programming languages from one declarative
|
|
specification. Again, Wuffs differs in that it is a complete (and performant)
|
|
end to end implementation, not just for the structured parts of a file format.
|
|
Repeating a point in the previous paragraph, the difficulty in decoding the GIF
|
|
format isn't in the regularly-expressible part of the format, it's in the LZW
|
|
compression. [Kaitai's GIF parser](http://formats.kaitai.io/gif/index.html)
|
|
returns the compressed LZW data as an opaque blob.
|
|
|
|
Once again, we're not claiming that these other approaches are unworkable, or
|
|
not useful, just different, with different trade-offs.
|
|
|
|
|
|
---
|
|
|
|
Updated on February 2018.
|