Astrée Static Analyzer for C and C++ (absint.com)
91 points by transpute on Feb 3, 2023 | 41 comments


The company behind Astrée, AbsInt, was founded by members of Professor Reinhard Wilhelm's compiler research group at Saarland University in Saarbrücken, Germany. They are well known for their WCET (worst-case execution time) analyzer aiT, and they are one of the most successful examples in Germany of transferring first-class academic research into practical real-world applications. Happy to see their work mentioned here on HN!


Note that Astrée was developed in France, by researchers from Inria. AbsInt sells it packaged as a product (with support, etc.), but the bulk of the research was done in the 2000s by researchers such as Cousot.

This is also the case with CompCert, which is still developed mostly by French researchers (with an open-source, non-commercial license), and packaged in commercial form by AbsInt.


Ah, good to know (and I guess this explains the accent aigu :)). Well, there's a lot of excellent CS research in France, too - and nice to see there's a collaboration (the French border is just around the corner from Saarbrücken).


If AbsInt is packaging Cousot-adjacent work, does it stand for "ABStract INTerpretation"? Because that'd be cool.


From their website:

> The company name is an acronym for “abstract interpretation”, a sophisticated approach to static program analysis formalized by Patrick and Radhia Cousot at the Laboratoire d’Informatique, Grenoble in 1977. It is by implementing this approach that we were able to develop our unique, highly successful products.



Nice feature set. I was hoping it was a free tool I could use on some personal projects, but after reading some of the features I was confident it was definitely paid.


For C, Frama-C + Eva plug-in does essentially the same job as Astrée (with small differences of support for specific features) and is open-source and freely available.

(There is an experimental C++ front-end, but it is not ready for industrial-size C++ that makes extensive use of the standard library.)


Frama-C is a big reason why I think learning and writing C still makes sense in 2023.


Not only paid, but presumably expensive enough that if you have to ask how much it costs, you probably can't afford it. I can't find a price for a license on their site. :-/


Abstract interpreters and static analyzers are really powerful and not that hard to implement, especially considering how little competition there is in this space.

I am surprised by the lack of startups in the area. It's just a bit of abstract algebra. With Datalog, the barrier to entry is pretty low: https://arxiv.org/abs/2012.10086
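
To give a feel for how low that barrier is, here is a toy sketch of mine (not code from the paper above): the core of a Datalog engine is just a fixpoint over Horn rules, shown here in Python for the classic transitive-closure rule reach(X, Z) :- edge(X, Y), reach(Y, Z).

    # Naive Datalog-style fixpoint; relation names are illustrative.
    # Base facts: reach(X, Y) :- edge(X, Y).
    # Rule:       reach(X, Z) :- edge(X, Y), reach(Y, Z).
    edge = {("a", "b"), ("b", "c"), ("c", "d")}

    def fixpoint(edge):
        reach = set(edge)                          # base facts
        while True:
            new = {(x, z)
                   for (x, y) in edge
                   for (y2, z) in reach
                   if y == y2} - reach             # apply the rule once
            if not new:                            # nothing new: fixpoint
                return reach
            reach |= new

    print(sorted(fixpoint(edge)))                  # six reachable pairs

Real engines add semi-naive evaluation and indexing, and swap edge/reach for relations like call-graph edges and points-to facts, but the evaluation loop is the same idea.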

I have personally implemented a few of the features provided by Astrée, e.g. array bounds checking, using the ideas described in: https://link.springer.com/book/10.1007/978-3-662-03811-6
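
For the curious, an array bounds check in the interval domain fits in a few lines. This is a toy sketch in the spirit of that book, not code from it or from Astrée:

    # Toy interval domain for array bounds checking (illustrative only).
    class Interval:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi

        def shift(self, k):                        # [a, b] + k = [a+k, b+k]
            return Interval(self.lo + k, self.hi + k)

    def check_index(idx, length):
        """Classify the access a[idx] for an array of `length` elements."""
        if idx.lo >= 0 and idx.hi < length:
            return "safe"                          # proven in bounds
        if idx.hi < 0 or idx.lo >= length:
            return "bug"                           # proven out of bounds
        return "alarm"                             # may be out of bounds

    i = Interval(0, 9)                             # e.g. a loop counter 0..9
    print(check_index(i, 10))                      # safe
    print(check_index(i.shift(1), 10))             # alarm: i+1 may be 10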


I thought abstract interpretation at the level that Astrée claims to implement is supposed to be extremely hard, which is why almost nobody else tries to do it. I only know of two that try:

http://kestreltechnology.com/technology.html

https://github.com/NASA-SW-VnV/ikos



It took an eagle eye to spot the link, but they have the source in a self-hosted GitLab instance: https://git.frama-c.com/pub/frama-c and a sibling repo is also interesting, where they apply the analysis to some open-source code as examples: https://git.frama-c.com/pub/open-source-case-studies


Also Mathworks Polyspace Code Prover: https://mathworks.com/products/polyspace-code-prover.html


I _think_ Perforce's QA-C uses abstract interpretation, but I'm not 100% sure.


You can add GrammaTech's CodeSonar.


I believe the "level" referred to is soundness. CodeSonar is self-admittedly not sound [1] and instead focuses on other valuable aspects of static analysis. It's a different type of tool.

[1] https://blogs.grammatech.com/how-sound-static-analysis-compl...


It's difficult to fund enough skilled programmers to work on stuff like this full time, and productionize the results in a coherent product. A good example is a sibling comment that says "it would mostly be easy except for the GUI" and that's assuming "no allocations." The things customers pay you for are often the hard parts like that stuff, not the theoretical parts you can enjoy from a PDF. It sucks, but it's true: you're selling a production quality tool to picky-ass programmers. I say this as someone who has a copy of that book you mentioned and has implemented type systems, domain specific analyzers, used datalog (Souffle) a lot, etc.

You also need to know your competition, and the story is even worse there. As I've watched security engineering advance over the years -- and I know you aren't specifically talking about security, but go with me here -- one of the only methods that has actually gone from "nascent infosec curiosity" into full-blown large scale usage by major and minor players has been fuzzing, which most people know as just "throw random shit at it until it breaks". That's because it's often easy to integrate, has few visible thorns in practice, has a surprisingly high signal-to-noise ratio, and can be performed even with duct tape and glue on an existing codebase. It can be done in any language. It often delivers positive results immediately in the short term and continuously over time. That's actually what your userbase considers "easy to implement", not "learn datalog and abstract algebra to understand and integrate domain specific static analyzers into your build system." Is it powerful? Yes, and I like solutions like CodeQL or MIRAI (for Rust). Do they have long term benefits? I think so. Is it easy or at least "not that hard"? I'm not immediately convinced.


There are probably others, but this is one such startup: https://trust-in-soft.com/


It is not that difficult to implement *some* basic features of static analysis or abstract interpretation. But the problem complexity in this area is enormous, and it quickly becomes difficult to manage path exploration and satisfiability effectively. This is why fuzzing has been the more popular method: it is not as difficult to implement and produces reasonable results.
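
For comparison, this is roughly the entire core of a naive fuzzer (a toy sketch of mine; `parse` is a stand-in target, and real fuzzers like AFL or libFuzzer add coverage feedback, corpus management, and crash minimization):

    # Bare-bones random fuzzing loop (toy sketch).
    import random

    def parse(data):                               # stand-in target with a planted bug
        if data and data[0] == 0xFF:
            raise ValueError("unhandled record type")

    for trial in range(100_000):
        size = random.randrange(0, 8)
        data = bytes(random.randrange(256) for _ in range(size))
        try:
            parse(data)
        except Exception as exc:
            print(f"crash after {trial} trials: {data!r} -> {exc}")
            break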


Outside of regulated fields like healthcare and travel, is the market even significant for such tooling? I can't say I know any shop that prioritizes code correctness outside of press releases.


Geophysical data acquisition, processing, interpretation.

Fire simulation modelling.

Computational GIS tooling (Hexagon et al.).

Computational algebra (e.g. Cayley/Magma).

Fireball monitoring, Square Kilometre Array processing, ...

I'm struggling to recall anything I've worked on that didn't emphasize code correctness.


I am interested in using it on OpenZFS. I will probably try the trial in a couple of months. Assuming it does well, it is unfortunate that I will not be able to use it beyond the trial period, since I expect the price to be too high.


That's my point: IMHO this is much more useful than LLM autocompletion right now.

But there are so few options available, at such a high cost, that nobody uses them.

Abstract interpretation / static analysis can detect lots of defects automatically.


Attackers can use trial versions to narrow their search for 0-day defects.


In practice, it's not that easy; fuzzers tend to be more effective for that.

Usually, the time required to properly set up, understand, and refine the analysis will take up most of the trial period.

These tools are better thought of as a long-term investment; they may require some changes to coding style, and also some knowledge of the code itself.

While they are good for building defenses, they are not so effective for helping attackers. But in the end, yes, they _could_ be used for that.


You can add energy and defense to the list.

And regulated fields are a big market: not just the huge companies that work in these fields (you can see a short list on the linked page), but also the suppliers of those companies.


If there's a high enough risk, maybe. I've worked in gambling (regulated, but not to the same degree as something that's life and death) and the security team has pushed for the use of tools like this.


Networking? Defense? Operating systems? Etc.


Most operating system developers are unlikely to be able to afford it, given that its pricing is "if you have to ask, it costs too much". :/


When you have to ask, it doesn't necessarily mean you can't afford it; it often means they have quite a lot of wiggle room in pricing and will slide up or down the scale depending on how 'big' of a client you are.


Assuming no allocations, it wouldn't be terribly hard to implement most of the features listed on the landing page. Still, nice work!

The roughest bit is probably having to write the portable, cross-platform GUI, imo.


From their claims, they either solved the halting problem, or the tool severely limits which correct C or C++ programs it accepts, essentially making the language not Turing-complete. I suppose it's the latter.

I wonder if it's not too limiting.


The goal of the user is to prove some properties of an input program. Usual properties are: all array accesses are within bounds, no division by zero, all used values are initialized, all dereferenced pointers are non-null, etc. The kind of properties that make a program safe to run and crash-free.

The tool will either prove that all the properties hold (user is happy), or prove that some property does not hold at some point in the program (user usually knows what to fix), or bail out with no proof at all for a property at some point (user has to modify the program to make it more understandable by the tool).

The accepted language is still Turing-complete (C or C++), and the tool is not solving the halting problem in the general case, since it can bail out with no proof.
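
As a toy illustration of those three outcomes (the names and the domain here are mine, not Astrée's), consider checking a division x / d when an interval analysis knows d lies in [lo, hi]:

    # The three possible verdicts, sketched with a toy interval check.
    def check_division(lo, hi):
        """Verdict for x / d when the analyzer knows d is in [lo, hi]."""
        if lo > 0 or hi < 0:
            return "proven safe"                   # 0 excluded: property holds
        if lo == 0 and hi == 0:
            return "proven bug"                    # d is exactly 0 here
        return "alarm (no proof)"                  # 0 possible: user must act

    print(check_division(1, 10))                   # proven safe
    print(check_division(0, 0))                    # proven bug
    print(check_division(-5, 5))                   # alarm (no proof)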


Detecting division by zero with 100% accuracy in a general C or C++ program without assistance is equivalent to the halting problem.

Now, I'm not saying that they solved the halting problem, as that is not possible. Rather, there is information missing from the linked page. Either:

1. The tool requires an external machine-checkable proof that no division by zero happens.

2. Certain language constructs are disallowed. They probably just ban goto (this is fine), possibly also recursion, multiple return statements, loops without a hard iteration bound, etc.

Or both. Where can I find information about this?


Usually these tools just bail out. This is fine for the domain. Any division where the divisor is not known to be nonzero is treated as a bug, even if it can never actually happen, because you probably aren't sure it can never happen either.


> Or both.

Neither: the tool has false positives, i.e. it sometimes reports a division as a potential division by zero even though it isn't one.
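
A classic way such a false positive arises (a toy illustration of mine, not Astrée's actual behavior): a plain interval domain cannot represent "anything in [-10, 10] except 0", so after a test like if (n != 0) it still has to assume 0 is possible.

    # Why a sound interval analysis can raise a spurious alarm.
    #
    # Concrete program (always safe):
    #     if (n != 0) { r = 100 / n; }
    #
    # With n tracked as the interval [-10, 10], the condition n != 0
    # cannot refine the interval (that would need the union
    # [-10, -1] U [1, 10]), so 0 stays inside the abstraction and the
    # division is flagged as a potential division by zero.
    def interval_may_be_zero(lo, hi):
        return lo <= 0 <= hi

    print(interval_may_be_zero(-10, 10))           # True -> alarm (spurious here)

More precise (e.g. relational or disjunctive) domains reduce such alarms, at a cost in analysis complexity.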


Is it really that good? 100% verification, 0 false positives?


> In contrast, thanks to its modularity and domain-awareness, Astrée can be made exceptionally precise, often to the point of producing exactly zero false alarms.

I will guess that when analyzing avionics code with no memory allocation it is perfect, and when analyzing other code it is merely great. I wonder what the recruiting pipeline from INRIA into AbsInt is like.


For this kind of analysis, it strongly depends on the features used in the analyzed code and on its degree of complexity. An analysis that is entirely automatic, detects every mistake, and reports only mistakes is not possible in the general case.



