The SeL4 Foundation: What and Why (microkerneldude.wordpress.com)
104 points by ingve on April 7, 2020 | hide | past | favorite | 64 comments


I'd be really happy to see seL4 thrive, and I like that this blog post shows that the people behind it understand what could potentially hold it back - the infrastructure around it. It needs what GNU and systemd are for Linux. And it needs an Arduino/Raspberry Pi - a halo platform that I can buy and play around with.

What I wonder is how this formal verification works. seL4 is written in C and apparently the formal verification is done in a different language. Is it possible to have a programming language where the compiler does the formal verification?


> What I wonder is how this formal verification works

Curiously it involves Haskell for its specification. [0]

They've also done some very impressive work verifying that the ARM machine code corresponds to the verified C code. They did this not by using a formally verified compiler, but by using GCC and then verifying the correspondence afterwards.

> Is it possible to have a programming language where the compiler does the formal verification?

Yes, but these languages are not all that approachable.

There's SPARK, which is Ada-based and has seen real-world use in various life-or-death applications like military aviation. [1]

There's also ZZ, which is in its early days and aims to be closer to C. [2] [3]

There's also Perfect Developer, a proprietary payware product, which does impressive things adding formal verification machinery to C and C++. [4]

[0] https://en.wikipedia.org/wiki/L4_microkernel_family#High_ass...

[1] https://en.wikipedia.org/wiki/SPARK_(programming_language)

[2] https://github.com/aep/zz

[3] https://news.ycombinator.com/item?id=22245409

[4] https://www.eschertech.com/articles/index.php


How does Perfect Developer compare to Frama-C? Is it more capable or maybe simply more convenient for MISRA verification?

For the unfamiliar, see http://frama-c.com/ and https://en.wikipedia.org/wiki/Frama-C


> Is it possible to have a programming language where the compiler does the formal verification?

It is possible, but using separate languages (as seL4 does) has certain benefits. For example:

- Kernel developers tend to have C experience. If the OS is written in C, they have a much larger pool to hire from.

- Any such language would have to be acceptable both for the people who develop the OS and for the people who do the verification: I suspect it's not easy to find something that pleases both crowds.

- C compilers get better all the time. Every time someone improves the gcc code generator, seL4 automatically gets a bit faster. Less mainstream programming languages get better only if you maintain an expensive in-house team working on the compiler.


> C compilers get better all the time. Every time someone improves the gcc code generator, seL4 automatically gets a bit faster. Less mainstream programming languages get better only if you maintain an expensive in-house team working on the compiler.

Doesn't seL4 need -O1 codegen to make their extraction from the generated code (and thus the rest of the formal proof) work out?


The 2017 paper [1] says that they "can handle most of seL4 even at optimisation level 2", presumably it got better in the meantime.

Also, it's not like code generation is the only advantage of using a well-supported, externally developed compiler. It's just the one I picked as an illustrative example. GCC releases improve code generation, general usability, diagnostic format support, profilers, etc., at a rate that would be hard to match for a niche language, especially without hiring a dedicated compiler development team.

[1] https://ts.data61.csiro.au/publications/nicta_full_text/6449...

[2] https://gcc.gnu.org/gcc-9/changes.html


Probably an Unsolved Problem in computer programming:

If you are trying to solve an old problem a new way, is it necessarily true that having a large pool of applicants is an advantage? It could be a liability.

On the one hand, there could be more variance in a large pool, so you might find a lot of unconventional thinkers who think like you. On the other hand, large groups have social mores, and so the dogma might be difficult to overcome, like the old "people can write Fortran in any language" problem.


Formal methods aren't just an academic curiosity, they have been used for real-world OS work. https://en.wikipedia.org/wiki/SLAM_project

> Less mainstream programming languages get better only if you have an expensive in-house team working on the compiler.

Not if you defer optimisation to an external (and actively maintained) backend like LLVM. Rust and Zig do this.


"Better" is not only a compiler backend thing. Things like better static analysis could also matter. I believe some optimisations are also done by the front end but I don't know enough about compiler design to expand on that.


How will it prove that this new optimization does not introduce bugs? It's not like GCC is verified compiler.


Great question. They don't have to trust the compiler: after compilation, they use an SMT-based proof process to establish that the resulting binary matches the semantics of the C source directly. See https://ts.data61.csiro.au/publications/nicta_full_text/6449....


Ah, it appears that is only done for ARM, but not x86. It would be great if x86 were more amenable to FM, or if we had faster formally-specified processors. Can't wait for a RISC-V workstation!


So, I think what you're missing here is that the 'verification' is a formal specification about the behavior of the kernel, along with a set of proofs about that specification. This is vastly simplified (there are multiple levels of specification, last I checked), but that's the general idea. It's possible to have your compiler automatically derive the specification from your code (and, indeed, the seL4 people have written a Rust-like language called Cogent that does this), but formal verification inherently involves an understanding of what your code is /intended/ to do, and proving that it /does/. The compiler only has information about what it /does/, so it can only generate a specification that describes this. Information about your high-level intentions for the code has to be concretized somewhere, while ensuring that the spec conforms to it-- that's the job of the proof engineers.


What does this formal specification look like?

From my naive standpoint: input, output, side-effect, time

  [no_sideffect, 1ms]
  fn increment(int i) -> int:
      return i + 1
What this is missing is a formal statement that says "increases the value of the input by one", unless you accept the function body itself as the formal statement.

Thanks for all the answers here, I guess I'm going to look for some more reading material.


To add to logicchains' answer:

Normally, you wouldn't take the function body as a formal statement, and would have a separate formal specification instead. In the example of increment, the formal specification is no simpler and no more useful than the implementation itself, but this is usually not the case.

Imagine e.g. that the following specification describes the desired behavior of a function f:

    f(x) <= √x  and  (f(x) + 1) > √x
The OS developer would write something like the code below to implement this specification.

  int f(int x) {
    int r1 = 0;
    int r2 = 1;
    while (x >= r2) {
      r1++;
      r2 = r2 + r1 + r1 + 1;
    }
    return r1;
  }
Does this C implementation conform to the specification? It's not all that easy to tell. But if the program is actually correct, then Floyd-Hoare logic [1] allows you to go through the program line-by-line and produce a proof that for any integer i and variable x', the following holds:

  lemma f_implements_spec: "
    { valueOf(x') = i \<and> i > 0 }
    x' = f(x')
    { valueOf(x') <= \<sqrt>(i) \<and> valueOf(x') + 1 > \<sqrt>(i) }"

meaning that if the statement x'=f(x') is ever executed in an environment where x' has an integer value i > 0, then after this line is executed, the result stored in x' will satisfy the specification above.

[1] https://en.wikipedia.org/wiki/Hoare_logic


Here's a specification of an increment function in Isabelle/HOL, the theorem prover used for seL4, plus some other random lemmas about it:

    theory Scratch
      imports Main
    begin

    fun increment :: "int ⇒ int" where 
    "increment x = x + 1"

    lemma increment_increments:
      shows "increment x = x + 1"
      by simp

    lemma double_increment_adds_two:
      shows "increment (increment x) = x + 2"
      by simp

    lemma increment_sub_1_is_x:
      shows "increment x - 1 = x"
      by simp


If you take the original code as the formal statement, how could it ever be wrong?


Practically every compiler does formal verification; the type checking of C, for example. The question is what properties you want to verify. It isn't a binary question but rather a continuum from C up to Isabelle/Coq.


Is it really formal verification if the type system is unsound? Due to undefined behavior, a "proven pointer-to-int" may not end up being a valid pointer-to-int at runtime. Even in Haskell, the type system is actually logically inconsistent (see http://okmij.org/ftp/Haskell/impredicativity-bites.html).


I see Genode Operating System Framework as the infrastructure it needs.


> Is it possible to have a programming language where the compiler does the formal verification?

Such a compiler would require solving the halting problem. Most problems related to the search for formal proofs are undecidable.


This is true, but SPARK and ZZ do it anyway. Automatic deductive program verification can work, but the halting problem means there will always be truths that cannot be automatically proved. This can still be good enough to get the job done.

In the case of SPARK, it's necessary to use timeouts on the prover. Ugly, sure enough, but real work can still be done with it.

https://docs.adacore.com/spark2014-docs/html/ug/en/tutorial....


If we want to go to extremes, it's even worse than the halting problem. Gödel's incompleteness theorem shows that, assuming your system is consistent, there are truths that don't have a proof at all, which is even worse than not being able to compute that proof.

Still, the only impact this has in practice is that it tells you languages or programming styles amenable to automatic verification must be very limited in their expressiveness/flexibility.


> the only impact this has in practice is that it tells you languages or programming styles amenable to automatic verification must be very limited in their expressiveness/flexibility.

This is a consequence of the state of current tooling, rather than an inescapable consequence of the relevant math.

A concrete example: the SPARK language is a subset of Ada, but with each release it gets less claustrophobic.

You can even go to the other end of the programming language spectrum, and use formal methods with functional programming languages. I don't think anyone develops real critical systems this way, but I believe it's an active area of research.


> This is a consequence of the state of current tooling, rather than an inescapable consequence of the relevant math.

It actually is an inescapable consequence, except that it's far worse than limited languages: limiting languages doesn't help. In fact, language is kind of a red herring here. If a language has nothing more than boolean variables and non-recursive, non-first-class subroutines, or, alternatively, loops nested up to depth 2 -- it is already beyond our general verification power, as the cost of proving properties grows exponentially in the size of the program, and that kind of language is well below anything usable; usable languages require exponential space and super-exponential time even if they're completely decidable. It's important to remember that the problem isn't a binary distinction between decidable and undecidable. Almost all decidable problems are also impossible. They can take any length of time except infinite. In fact, undecidable problems are a very extreme and exceptional kind of impossible; most impossible computational problems are decidable.

See my post here for some results: https://pron.github.io/posts/correctness-and-complexity

Rather, what formal methods do is try to get a certain mix of confidence, difficulty and scale, so that they might be able to check some properties with some confidence in some cases. That is the best we can hope for, and it is what formal methods in industrial use actually do. This is still a lot, but people should understand what the rules of the game are.


> as the cost of proving properties grows exponentially in the size of the program

Exactly. And the solution is architectural modularization--implementing a solution as multiple components that can each be independently verified and then integrated in a verifiable manner. Microkernels are uniquely amenable to such an approach, regardless of language.

Which I think is the major reason why memory allocation in seL4, including memory for in-kernel runtime objects, is handled in user space, otherwise its tentacles would make kernel verification intractable. And it's not just that allocation is handled by a single user space daemon; it bleeds into the entire architecture. AFAIU, any request you make of the kernel must be accompanied by the necessary memory needed to satisfy the request. This also fits nicely into the capability model.


> Exactly. And the solution is architectural modularization--implementing a solution as multiple components that can each be independently verified and then integrated in a verifiable manner.

That doesn't matter. I mean, you might be able to prove some properties in this way just as you could in others, but it is not a "solution" by any means -- it might work in some cases for some properties with some barely acceptable scalability, and that's about it. The reason is that the cost of reasoning about n components is not even a computable function of n (the cost grows in the same way if you were trying to prove the property on a monolith). See the blog post I linked to for a result that proved this, and an example of a case where two components are easily analyzed yet their composition isn't.

The main direction in formal methods research to increase scalability in a viable way is to reduce soundness.


You can't refactor any implementation (i.e. shuffle code) into components and magically make verification tractable. But there's more than one possible implementation to any particular functional problem, and some of those implementations might be tractable.

AFAIU Component-Based Software Engineering has been a bedrock principle of applied formal verification for decades. It's a process approach. If a functional problem is sufficiently complex it's necessary, if not always sufficient, as a practical matter to use a component architecture. And even if it's not sufficient, a solution where some components are formally verified is better than where none are. Circumscribing soundness isn't an alternative, it's complementary.

You're right, it's not by itself a "solution" in the sense of completely resolving the problem; it's a "solution" in the other meaning of that word--a means of dealing with a problem.


> AFAIU Component-Based Software Engineering has been a bedrock principle of applied formal verification for decades.

Yes, and still the only software verified using such techniques has been tiny (< 10KLOC) and required extraordinary simplification and tremendous effort by specialists. It's like saying that particle accelerators have been a bedrock principle of turning lead into gold. It "works" but not in any sense that makes it a repeatable, affordable practice. Moreover, the gap between the size of programs verifiable in this way and the size of a "normal" piece of business software has stayed pretty much constant over the past four decades -- 3 orders of magnitude.

> And even if it's not sufficient, a solution where some components are formally verified is better than where none are.

Better in what sense? Is it better to soundly verify one component or to use unsound methods on an entire system (even assuming similar costs, but often the latter is cheaper). Soundness is costly and is never a real requirement as a software system is a physical system, anyway; machines only perform machine instructions according to spec with some probability, and probability is all engineers can ever hope for. But sure, sometimes there are small components that are easy enough to verify with deductive proofs, making it the most affordable path to high confidence, although I'd say that's the rare exception rather than the rule.

> Circumscribing soundness isn't an alternative, it's complementary.

It allows you to use automated methods that don't require careful modularity.


It depends on what kind of guarantees you want, and how much you rely on the language versus programming style. But I think there are fundamental reasons that would prevent you from doing development in full Common Lisp while expecting a formal verifier to statically check your code for complex properties (such as halting or bounded memory) and never reject a correct program.

Even with functional languages, the stronger properties you want to check, the more restricted the program must be. For example, Idris can ensure your program halts, but it requires you to write your program in a specific way that can be checked for halting, you can't just take any halting program and expect it to pass the halting checks.

Similarly, Rust's borrow checker can verify if your program is free of data races and memory leaks, but you must write your program following specific patterns, even if you can produce data-race and memory leak-free programs in other ways as well (of course, there is the `unsafe` block in Rust exactly for this reason).

Of course, there will likely be more improvements as time goes on, but the halting problem puts a fundamental limit on how expressive a language can be while also being amenable to (computable) proof of certain complex properties.


(Rust does not prove the absence of memory leaks)


Strictly speaking, real computers are finite state machines and are trivially decidable from the halting-problem point of view. The problem is that it just takes a very, very long time.


This argument has already been raised and dismissed in this thread; no need to repeat it.


> Such compiler would require solving halting problem.

I don't think so, as long as false-negative results are acceptable (i.e. it is OK if for some programs no proof is found even though they fulfill the specification).

Occasional false-negatives seem perfectly fine for real-world use.


In fact all major type systems work this way.


Not if the human provides the proofs and the compiler merely checks them (like in Coq, for example).


I specifically mentioned in my post that problems related to _proof search_ are undecidable. Coq's tactics for proof search are unreliable. They rely on timeouts and tend to fail ("omega can't solve this system").

So you're not arguing with my post.


The question you were answering was not talking about proof search. It was merely asking whether it was possible to do verification in the same language that the program is written in. And, contrary to your response, it is possible to do verification in the same language as the program is written in. Coq is an example of that.

Yes, proof search is undecidable. But that's not what we're talking about in this thread. seL4 isn't verified by proof search. The proofs were provided by humans (with limited automation).


> Such compiler would require solving halting problem.

Wrong: because of finite memory instead of an unbounded tape (among other reasons), real computers are strictly less expressive than Turing machines.


This technically correct comment would be actually useful if it linked to a fully automated prover that is capable of proving any property of a program that fits in N bytes.

If you remove infinite tape, you go from undecidable to "just" exponentially complex.


> If you remove infinite tape, you go from undecidable to "just" exponentially complex.

Which is a huge leap - we got from "impossible" to "possible in principle" and we now can do research how to make it even "sufficiently fast" for problems of practical relevance.


No. In CIC I can prove that, say, some property is valid for _any_ natural number. Your "memory is bounded" argument is invalid, as formal proofs can go beyond that.


...Also I think you don't realize that formal inductive proofs can easily reason about programs operating with unbounded memory.


With formal inductive proofs, you can reason, but not decide.


Does the proof that the halting problem is not solvable actually make use of the unboundedness of the tape?


Standard proof is a diagonalization, which depends on infinite sequences.

That said, the Halting problem on physical computers is still of extremely high computational complexity, which can be just as high a barrier as incomputability.


Of course it does (implicitly!).

Otherwise the number of possible states of the Turing machine T that we want to check for halting is finite, and we just have to "simulate" the program and see whether it terminates or reaches a state it was already in.


How would the "if this program halts then run forever, else halt" algorithm work on a real computer? I imagine it would fail with stack overflow.


The SPARK subset of Ada is such a language, but it uses design-by-contract assertions only.


> systemd

nobody needs systemd. nobody but RedHat. it is a step backwards, net net.


That "Verification of the multicore kernel" thing seems like it could use some bold, underline, and italics. I doubt it is as simple as adding an AArch64 model.


Definitely not simple, but simpler than you might imagine.

Specifically, I'm referring to the fact that the microkernel can't be run from more than one CPU at once and can't be preempted, which is an important design decision.

Its execution time is, however, bounded by the WCET, complete with proof. And that is a very small amount of time, allowing for hard real-time applications to be built on top of seL4.


Every time I learn more about seL4, I fall a bit more in love with it. Can't wait to see what people do in terms of integration over time.

I think it would be super cool to see L4Linux updated again, and on seL4 this time; maybe start shifting device drivers into tasks behind a low-overhead abstraction. We're going to have Linux and POSIX applications for a loooong time, but maybe we can make incremental progress toward something a bit less braindead.


Simple question: what are some seL4 "distros" one can install today? Which types of hardware are supported? And how usable is the userland?


If you're looking for a desktop/end-user operating system (even for a server of some sort), you're not really looking for L4. You might as well just use Qubes; the security model will be similar (in that you'll be "using" hosted Linux, not the underlying OS directly).

L4 gets interesting if you're building another platform.


Well, I think it's an interesting idea in itself to build a full operating system based on a secure kernel, even if it's server/CLI only, runs in a virtual machine as a prototype (think GNU Hurd). Isn't it?


If you're going to build a new OS, sure. That's how you should think of L4: as a toolkit for building new operating systems.



Looking at all the comments here, one question seems missing yet relevant: why is CompCert not used?


Hm, as to userland, I wonder why they don't seem to mention GenodeOS in any way?


Genode is Affero GPL licensed. Very troublesome.

That's probably why.


Ohhh, now that is something I totally missed, thanks!


It's not directly related to them. It's just one of many projects using the kernel.

Also, this is not an official post but a semi-official one by one of its members on his personal blog!


> seL4 is a game-changer

probably overstating the case.



