The company behind Astrée, AbsInt, was founded by members of Professor Reinhard Wilhelm’s compiler research group at Saarland University in Saarbrücken, Germany. They are well known for their WCET (worst-case execution time) analyser aiT, and they are one of the very successful examples in Germany of transferring first-class academic research results into practical real-world applications. Happy to see their work mentioned here on HN!
Note that Astrée was developed in France, by researchers from Inria. AbsInt sells it packaged as a product (with support, etc.), but the bulk of the research was done in the 2000s by researchers such as Patrick Cousot.
This is also the case with CompCert, which is still developed mostly by French researchers (with an open-source, non-commercial license), and packaged in commercial form by AbsInt.
Ah, good to know (and I guess this explains the accent aigu :)). Well, there's a lot of excellent CS research in France too, and it's nice to see there's a collaboration (the French border is just around the corner from Saarbrücken).
> The company name is an acronym for “abstract interpretation”, a sophisticated approach to static program analysis formalized by Patrick and Radhia Cousot at the Laboratoire d’Informatique, Grenoble in 1977. It is by implementing this approach that we were able to develop our unique, highly successful products.
Nice feature set. I was hoping it was a free tool that I could use on some personal projects, but after reading some of the features it was clear that it is definitely a paid product.
For C, Frama-C with the Eva plug-in does essentially the same job as Astrée (with small differences in support for specific features), and it is open-source and freely available.
(There is an experimental C++ front-end, but it is not ready for industrial-size C++ that makes extensive use of the standard library.)
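If you want a quick taste, here is a minimal sketch of an Eva run (the file and function names are made up for illustration; -eva and -main are real Frama-C options, but the exact output varies by version):

    /* div.c -- toy input for a Frama-C/Eva run */
    int div10(int x) {
        return 10 / x;   /* safe only if x is provably nonzero */
    }

    int main(void) {
        return div10(3); /* here the divisor is the constant 3 */
    }

    /* Analyzing from main, Eva can prove the division safe:
     *     frama-c -eva div.c
     * Analyzing div10 as the entry point, x is unknown and
     * Eva emits a division-by-zero alarm:
     *     frama-c -eva -main div10 div.c
     */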
Not only paid, but presumably expensive enough that if you have to ask how much it costs, you probably can't afford it. I can't find a price for a license on their site. :-/
Abstract interpreters and static analyzers are really powerful and not that hard to implement, considering how little competition we have in this space.
I am surprised by the lack of startups in this area. It's just a bit of abstract algebra. With Datalog, the barrier to entry is pretty low: https://arxiv.org/abs/2012.10086
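To give a sense of how small the core can be, here is a toy interval-domain abstract interpreter in C (all names are mine, not from any tool discussed here): values are abstracted by [lo, hi] ranges, which is already enough to prove some divisions safe.

    #include <stdio.h>

    /* A concrete int is abstracted by the interval [lo, hi]. */
    typedef struct { int lo, hi; } Interval;

    /* Abstract addition: add bounds pointwise (overflow ignored for brevity). */
    Interval add(Interval a, Interval b) {
        return (Interval){ a.lo + b.lo, a.hi + b.hi };
    }

    /* Join: least upper bound, used where control-flow paths merge. */
    Interval join(Interval a, Interval b) {
        return (Interval){ a.lo < b.lo ? a.lo : b.lo,
                           a.hi > b.hi ? a.hi : b.hi };
    }

    /* The analysis question for division safety: can the value be zero? */
    int may_be_zero(Interval a) {
        return a.lo <= 0 && 0 <= a.hi;
    }

    int main(void) {
        Interval x = {1, 10};                   /* x in [1, 10] */
        Interval y = add(x, (Interval){2, 2});  /* y in [3, 12] */
        printf("y may be zero: %s\n", may_be_zero(y) ? "yes" : "no");
        /* 0 is not in [3, 12], so dividing by y is provably safe. */
        return 0;
    }

The gap between this toy and a product like Astrée (widening for loops, relational domains, a full C front-end, alarm triage) is exactly where the engineering cost lives.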
I thought abstract interpretation at the level that Astrée claims to implement is supposed to be extremely hard, which is why almost nobody else tries to do it. I only know of two that try:
I believe the "level" referred to is soundness. CodeSonar is self-admittedly not sound [1], and instead focuses on other valuable aspects of static analysis. It's a different type of tool.
It’s difficult to fund enough skilled programmers to work on stuff like this full time and to productionize the results into a coherent product. A good example is a sibling comment that says "it would mostly be easy except for the GUI", and that's assuming "no allocations." The things customers pay you for are often hard parts like that, not the theoretical parts you can enjoy from a PDF. It sucks, but it's true: you're selling a production-quality tool to picky-ass programmers. I say this as someone who has a copy of that book you mentioned and has implemented type systems and domain-specific analyzers, used Datalog (Souffle) a lot, etc.
You also need to know your competition, and the story is even worse there. As I've watched security engineering advance over the years -- and I know you aren't specifically talking about security, but go with me here -- one of the only methods that has actually gone from "nascent infosec curiosity" into full-blown large-scale usage by major and minor players has been fuzzing, which most people know as just "throw random shit at it until it breaks". That's because it's often easy to integrate, has few visible thorns in practice, has a surprisingly high signal-to-noise ratio, and can be performed even with duct tape and glue on an existing codebase. It can be done in any language. It often delivers positive results immediately in the short term and continuously over time.

That's actually what your userbase considers "easy to implement", not "learn Datalog and abstract algebra to understand and integrate domain-specific static analyzers into your build system." Is it powerful? Yes, and I like solutions like CodeQL or MIRAI (for Rust). Do they have long-term benefits? I think so. Is it easy, or at least "not that hard"? I'm not immediately convinced.
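To make the "easy to integrate" point concrete, this is roughly what a complete fuzzing setup looks like with libFuzzer (parse_record is a made-up function with a planted bug; LLVMFuzzerTestOneInput is the real libFuzzer entry point):

    #include <stdint.h>
    #include <stddef.h>

    /* Toy function under test, with a planted bug: it reads data[3]
     * without checking that the input is at least 4 bytes long. */
    static int parse_record(const uint8_t *data, size_t size) {
        if (size > 0 && data[0] == 'R')
            return data[3];   /* out-of-bounds read on short inputs */
        return 0;
    }

    /* libFuzzer repeatedly calls this entry point with generated inputs. */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        parse_record(data, size);
        return 0;
    }

    /* Build and run (clang provides main):
     *     clang -g -fsanitize=fuzzer,address harness.c -o fuzz
     *     ./fuzz    # ASan reports the bad read almost immediately
     */

No build-system surgery, no annotations, no model of the language semantics: that is the bar any static-analysis product is competing against.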
It is not that difficult to implement *some* basic features of static analysis or abstract interpretation. But the problem complexity in this area is enormous, and it quickly becomes difficult to manage path exploration and satisfiability effectively. This is why fuzzing has been the more popular method: it is not as difficult to implement and produces reasonable results.
Outside of regulated fields like healthcare and travel, is the market even significant for such tooling? I can't say I know any shop that prioritizes code correctness outside of press releases.
I am interested in using it on OpenZFS. I will probably try the trial in a couple of months. Assuming that it does well, it is unfortunate that I will not be able to use it beyond the trial period, since I expect the price to be too high.
And regulated fields are a big market: not just the huge companies that work in those fields (you can see a short list on the linked page), but also their suppliers.
If there's a high enough risk, maybe. I've worked in gambling (regulated, but not to the same degree as something that's life and death), and the security team there has pushed for the use of tools like this.
When you have to ask, it doesn't necessarily mean you can't afford it; it often means they have quite a lot of wiggle room in pricing and will slide up or down the scale depending on how 'big' a client you are.
From their claims, either they solved the halting problem, or the tool severely limits which correct C or C++ programs it accepts, essentially making the language not Turing-complete. I suppose it's the latter.
The goal of the user is to prove some properties of an input program. Usual properties are: all array accesses are within bounds, no division by zero, all used values are initialized, all dereferenced pointers are non-null, etc. The kind of properties that make a program safe to run and crash-free.
The tool will either prove that all the properties hold (user is happy), or prove that some property does not hold at some point in the program (the user usually knows what to fix), or bail out with no proof at all for a property at some point (the user has to modify the program to make it more understandable to the tool).
The accepted language is still Turing-complete (C or C++), and the tool is not solving the halting problem in the general case, since it can bail out with no proof.
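A toy illustration of the three outcomes, using the array-bounds property (this is generic to sound analyzers, not specific to Astrée):

    /* Three outcomes of a sound analysis, on array-bounds checks. */
    int f(int i) {
        int a[10] = {0};
        a[5] = 1;     /* (1) provably in bounds: proved safe, no alarm      */
        a[12] = 1;    /* (2) provably out of bounds: reported as a real bug */
        return a[i];  /* (3) depends on i, about which the tool may know
                         nothing: an alarm with no verdict; the usual fix
                         is an explicit guard before the access, e.g.
                         if (i < 0 || i >= 10) return 0;                    */
    }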
Detecting division by zero with 100% accuracy in a general C or C++ program, without assistance, is equivalent to solving the halting problem.
Now I'm not saying that they solved the halting problem, as that is not possible. Rather that there is missing information on the linked page. Either:
1. The tool requires an external machine-checkable proof that no division by zero happens.
2. Certain language constructs are disallowed. They probably just ban goto (this is fine), possibly also recursion, multiple return statements, loops without a hard iteration bound, etc.
Usually these tools just bail out, which is fine for the domain. Any division where the divisor is not known to be nonzero is a bug, even if it can never actually happen, because you probably aren't sure it can never happen either.
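In practice the bail-out case is resolved by making the invariant explicit in the code, which helps the tool and human readers alike. A sketch (function names are mine):

    /* Before: neither the analyzer nor the reader can rule out d == 0. */
    int avg_bad(int sum, int d) {
        return sum / d;      /* alarm: d is not provably nonzero */
    }

    /* After: the guard makes the division provably safe on every path. */
    int avg_good(int sum, int d) {
        if (d == 0)
            return 0;        /* explicit policy for the degenerate case */
        return sum / d;      /* here d != 0 holds */
    }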
> In contrast, thanks to its modularity and domain-awareness, Astrée can be made exceptionally precise, often to the point of producing exactly zero false alarms.
I will guess that when analyzing avionics code with no memory allocation it is perfect, and when analyzing other code it is merely great. I wonder what the recruiting pipeline from INRIA into AbsInt is like.
With this kind of analysis, it strongly depends on the features used in the analyzed code and on its degree of complexity. An analysis that is entirely automatic, detects every mistake, and reports only real mistakes is not possible in the general case (that is Rice’s theorem).
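A classic source of false alarms is lost correlation between variables. Here is a sketch of safe code that a purely non-relational analysis (say, plain intervals) typically cannot prove safe; relational domains and partitioning, of the kind Astrée advertises, exist precisely to recover this precision:

    /* Safe, but the guard on ok and the value of x are correlated,
     * and a non-relational domain tracks each variable separately. */
    int scaled(int x) {
        int ok = (x != 0);
        if (ok)
            return 100 / x;  /* never divides by zero, yet x's interval
                                still contains 0: a false alarm fires  */
        return 0;
    }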