Rust RFC 2094: non-lexical lifetimes (github.com/rust-lang)
156 points by JoshTriplett on Aug 2, 2017 | 66 comments



Is there an ELI5 for lifetimes in general? There are so many features of Rust that I have no context for and no intuition about. I also do not have a CS degree.


No CS degree here either.

Variables exist in memory, right? But we only need them for some amount of time - then we want our memory back. There are many strategies for this - Java uses a garbage collector to search through memory and reclaim parts no longer in use, C programmers manually call 'free'. Java suffers from its strategy because of the performance impact of that scan/free cycle. C suffers from the complexity of manually managing memory, which leads to many security vulnerabilities.

Rust takes this concept of how long memory should live - when it should be freed - and encodes it into its type system. Similar to how you can say 'this thing is an int and it can be added to other ints', Rust says 'this thing lives this long and it can reference other things that live this long, or shorter'.

Then the compiler can tell you "hey, you're trying to access something that might have been freed" the same way it would say "you're trying to add an integer to a string".

This means you don't have the performance impact of the GC but you also have a compiler watching your back, telling you when you may be doing something unsafe.

The algorithm used to determine how long variables live and when they can / can't be accessed is the subject of the RFC.
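To make that concrete, here's a small sketch (my own example, not from the RFC) of the kind of code the RFC is about. It compiles under non-lexical lifetimes because the borrow of `scores` ends at its last use, not at the end of the block; the old lexical checker rejected it:

```rust
fn main() {
    let mut scores = vec![1, 2, 3];
    let first = &scores[0];        // immutable borrow of `scores` begins
    println!("first = {}", first); // ...and ends here, at its last use (NLL)
    scores.push(4);                // OK: no live borrow remains
    assert_eq!(scores.len(), 4);
}
```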


Great explanation. The system also takes care of ownership moving around. In C you have functions that take a pointer to something, but you don't know who owns it (and is responsible for cleaning it up) after the function call, unless that is made explicit in the description. In Rust if you've moved ownership you can't use it in the caller afterwards, and if you haven't moved ownership you can't use it in the callee after the function has finished (this system is called borrowing, and is why the system that deals with lifetimes is called the 'borrow checker').
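A minimal sketch of that move-vs-borrow distinction (`consume` and `borrow` are made-up names for illustration):

```rust
// `consume` takes ownership; `borrow` only lends.
fn consume(s: String) -> usize {
    s.len() // `s` is dropped when this function returns
}

fn borrow(s: &str) -> usize {
    s.len() // the caller still owns the string afterwards
}

fn main() {
    let a = String::from("hello");
    let n1 = borrow(&a); // fine: `a` is still usable after the call
    let n2 = consume(a); // moves `a`; using it afterwards won't compile
    assert_eq!(n1, n2);
    // println!("{}", a); // error[E0382]: borrow of moved value: `a`
}
```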

Actually there is another way which you'll find commonly in well written robust libraries - make the argument a const pointer, so that the callee is obviously not responsible for cleaning it up, and then the callee takes a copy to keep. This also allows passing both const and non-const data to the same function, and means that an ownership screw-up across a library boundary is much less likely. As a pattern though it can result in a lot of unnecessary copies, which Rust's system allows you to avoid.


Interesting. I have only worked in C a little bit, but I have seen the `const int foo(const int * x)` idiom a few times now.

So there is a formal sense of "ownership" of memory in rust? And right now it's based on the lexical scope in which the memory was allocated?


The checker in the compiler was being too conservative, because it was easier to write a conservative one. That means that sometimes it would complain about cases that were actually just fine. The language doesn't specify that lifetimes are tied to lexical scope, that's just what the borrow checker was using as a shortcut. The new one will be more specific.


This is sort of true, but also not: it wasn't that it was easier to write a conservative one, it's that analysis is inherently conservative, and so we chose lexical scope because, well, that's what most languages do.

The language generally does specify that they are, and that's exactly how we teach borrowing today. This new one is not more specific, but more general: it allows all programs today to stay the same, but enables more programs that don't compile today to compile in the future.


Nit > 'this thing lives this long and it can reference other things that live this long, or shorter'.

should be '... or longer '.

The reference only remains valid if the referent hasn't been dropped in the meantime, so it can only reference things that outlive it.


Thanks, unfortunately I can't edit the post. :(


Excellent description.

I really like the section of "Thinking in Scopes" in the Book that describes this:

https://doc.rust-lang.org/book/first-edition/lifetimes.html#...

You can think of lifetimes as hidden variables defined by the compiler that are in a sense bound to some block of code. I find this to be the best way to think of them personally.


Possibly a stupid question, but is this a bit like reference counting, like you see in python, only done at compile time?


In Python you can't really statically determine a reference count:

   a = {}
   b = {}
   l = []
   for _ in range(100):
       if input('thingy'):
           l.append(a)
       else:
           l.append(b)
In that snippet it's impossible to know how many references there are to a or b.

In Rust, lifetime management is set up so that the compiler can statically determine when something can no longer be referred to (hence "GC-free").

For example:

   def f():
      a = 3 
      g(a)
      # a is no longer used from this point
      return 5
you can clearly see when things are no longer usable, hence freeable. Python in this context will also free at the end of the function, but only because the reference count is decremented to zero at that point. Rust doesn't need a reference counter, since it knows it can free the object at any point after the call to g.

(This isn't a great example for a lot of reasons, Rust allows you to go much further than this)

This ends up being somewhat similar to writing code in more statically typed languages/languages with dependent types. You might have to rework your code so that the machine can properly identify when it can free memory.


How then does Rust avoid the if('thingy') issue? Or, for that matter, a language like C, where presumably a and b would be allocated on the stack. In C would they both hang around until the function returns? Or would I have to copy them into the array anyway, which would be heap-allocated?


They're stack allocated in Rust too, so they're both going to stick around. Arrays in Rust are stack allocated, but this code would probably use a vector, which is heap allocated, and so yeah, you'd end up copying them from the stack to the heap.
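A rough sketch of that stack-to-heap copy (my own example; fixed-size byte arrays stand in for the dicts in the Python snippet):

```rust
fn main() {
    let a = [0u8; 4]; // fixed-size array: lives on the stack
    let b = [1u8; 4];
    let mut l: Vec<[u8; 4]> = Vec::new(); // Vec's storage lives on the heap
    l.push(a); // arrays of Copy types are Copy: the bytes are copied to the heap
    l.push(b); // `a` and `b` are still usable here
    assert_eq!(l, vec![[0, 0, 0, 0], [1, 1, 1, 1]]);
}
```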


Not really. Rust has an idea of "borrowing" references to things, which could be seen as increasing the reference count, but it's not a particularly good analogy because borrows actually have no effect on the original lifetime.

So, if you borrow something and try to store it somewhere with a lifetime the compiler thinks is going to outlive the lifetime of the thing you borrowed, that's a compiler error, whereas in a reference counted language or a GCed language the original entity just keeps existing for as long as required.

This is usually because said entity is on the heap - Rust puts everything on the stack by default.

You can of course put things on the heap, and there is a selection of types available in the standard library which are I guess a bit like C++ smart pointers that let you have heap allocation, reference-counted semantics etc. if you want it.
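For instance, `Rc` from the standard library gives you shared_ptr-style reference counting when you opt into it (a small sketch, not tied to anything in the RFC):

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(vec![1, 2, 3]); // heap allocation plus a reference count
    let b = Rc::clone(&a);          // bumps the count, like C++ shared_ptr
    assert_eq!(Rc::strong_count(&a), 2);
    drop(b);                        // decrements; memory is freed at count 0
    assert_eq!(Rc::strong_count(&a), 1);
}
```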


Not a stupid question, as it's a very common way for some people to think about it. I don't think it's very accurate though, and leads to incorrect intuitions.

This is because reference counting is prescriptive, that is, the count determines how long a value stays alive. Lifetimes are descriptive, that is, you cannot magically make something live longer by taking an additional reference to it, instead, you'll get an error.


Not stupid at all, and something that a lot of people ask.

It's not really like that - with reference counting you sort of have multiple 'owners' of a variable, and they all get to 'free' that memory except that 'free' just decrements a counter.

With rust there is a very clear owner at all points - you always know that your code in one spot owns the memory, and no other code does. There may be references handed out but those are very clearly distinct - that is, by looking at the code you can see "yes, I own this" or "yes, I'm borrowing this".

Now, with borrowing it's a little bit more like reference counting, but it's really just scope based. We know that the references are valid for some scope, and after that scope they are not. There's no internal counting or anything like that in the compiler as far as I am aware.


It's more like compile-time weakrefs, where the compiler ensures that you can't access a weakref after the source strong reference is gone (not just that deref'ing the weakref returns None)
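For comparison, here is what the runtime version of that check looks like with `Rc`/`Weak` from the standard library (borrow checking does the equivalent statically, so the dangling case is a compile error instead of a `None`):

```rust
use std::rc::{Rc, Weak};

fn main() {
    let strong = Rc::new(42);
    let weak: Weak<i32> = Rc::downgrade(&strong);
    assert_eq!(weak.upgrade().map(|r| *r), Some(42)); // source still alive
    drop(strong);
    assert!(weak.upgrade().is_none()); // runtime check, not a compile error
}
```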


A bit. Not completely, but a bit. That analogy may well help you understand how lifetimes work, but it will break down well before you have a workable understanding of lifetimes.


Ha, don't worry about the CS degree - lifetimes are plenty confusing to people with a CS degree too! I think it'd take a combination of reading the theory and actually using Rust for most people to get a good grasp on them.



This is more or less the feasibility test for lifetimes/regions as a general purpose programming feature. It will probably be many years from now that another language tries to do anything like this with the same pervasiveness and breadth.


I kind of agree.

In a way the work being done by the Rust team is great, as many companies wouldn't bother with it.

It is already influencing the design decisions of other languages.

However it really needs to be more ergonomic. I, for example, have been bitten trying to call a method on self from inside a closure.

From logical point of view it was clear, the closure wouldn't outlive the object that owned it, but the borrow checker thought otherwise.

This is one of the use cases being solved by non-lexical lifetimes.


Actually, as the "What this proposal will not fix" section of the RFC makes clear, this does not fix the "self in closure" issue - it's explicitly listed there as an example.


Oh, I misread it, as I quickly browsed through it. :|


If you reread the text of the RFC, you'll note that this specific issue is intended to be solved by potentially tweaking the algorithm for closure desugaring. It's a solvable problem, though independent of NLL.


I don't think the current algorithm is actually a barrier to practical use. The worst I've had to suffer for it is a few extra lines of code here and there. This RFC might make things easier to use, but if it fails in that, it doesn't really prove the concept is infeasible.


Why would that be? If rust proves the concept works, I'd expect more languages to try it.


It's very hard to retrofit the rules that the borrow check enforces onto an existing language. That's because the borrow check enforces "aliasing xor mutability", and basically all languages in wide use don't enforce that. Adding that to an existing language generally requires breaking backwards compatibility in fundamental ways.

For example, all mutable shared objects in garbage-collected languages (or similar features such as shared_ptr in C++) are incompatible with these restrictions. As another example, mutable global variables are generally incompatible with the semantics that the borrow check enforces.

That said, I could see analogous dynamic systems (rather than static ones as in the case of Rust) becoming more widespread. In fact, Transferables [1] in JavaScript are an example of one such system.

[1]: https://developer.mozilla.org/en-US/docs/Web/API/Transferabl...


I read the parent comment's "I'd expect more languages to try it" as meaning more new languages.

Really interesting point about Transferables, I hadn't seen that.


Hmm, what about wrapping everything in an Arc<RwLock<...>> and gradually removing the wrappers (and the extra method calls) from each variable, backtracking on compiler errors?
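A sketch of what that starting point would look like before any wrappers are removed (my own example, using threads to justify the `Arc`):

```rust
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    let shared = Arc::new(RwLock::new(vec![1, 2, 3]));
    let worker = Arc::clone(&shared); // bump the atomic refcount for the thread
    let handle = thread::spawn(move || {
        worker.write().unwrap().push(4); // exclusive access enforced at runtime
    });
    handle.join().unwrap();
    assert_eq!(shared.read().unwrap().len(), 4);
}
```

Removing a wrapper pair then trades these runtime checks for compile-time borrow checking on that variable.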


Swift is likely to adopt it but in a much more measured way, since everything is `shared_ptr` by default in that language. Relatively few languages have the combination of requirements and goals that makes Rust's approach necessary.

So if it's successful we should see more of it but...we won't necessarily see many languages where it is as all-pervading as Rust.


Swift's tentative plan seems to be to have a level between regular swift and super unsafe swift with UnsafePointer and stuff where you can write fast, low level, but safe code like Rust.

In swift's case I'd think they would avoid too much complexity since the lifetime stuff won't be as pervasive as it is in Rust (so it can have more rough corners)


Given the current state of the borrow checker and how lifetimes are handled inside closures, essential for callbacks, I think in its current state anything close to Cocoa or Cocoa Touch would be quite hard to implement and use in Rust, unless it is full of Rc and RefCell everywhere.


Lifetime handling inside closures is planned to be fixed. (It is a different fix from this one.)


The big question with Rust lifetimes was "can you tighten the screws down that tight and still do anything"? Rust managed to get it to work reasonably well. This lightens it up a bit.

It's not clear to me how far this goes. One annoying Rust idiom is having to allocate something, then pass it into a function so the function can return it. There's no way to give a return value the lifetime of the caller, not the callee. Will this change allow that? It doesn't look like it.

This is more in the other direction. "In the new proposal, the lifetime of a reference lasts only for those portions of the function in which the reference may later be used (where the reference is live, in compiler speak)." In modern compilers, when a local variable has been used for the last time in its scope, it's dead, and its storage can be reused. This is mostly done to free up registers. It seems to be exposing liveness analysis at the source level.

The use cases aren't that convincing. This may be more of a "because we can" feature.


> There's no way to give a return value the lifetime of the caller, not the callee.

That is, in the general case, not physically possible, and this sort of thing only works in GC languages because they return GC pointers to dynamically allocated data. It certainly doesn't exist in C or C++ (unless I completely misunderstood what semantics you wanted).

AFAIK there is no implementation of anything (not even JITs) which can allocate on the caller's stack. The closest anything gets is Forth where the call stack and data stack are separate and returning just leaves data on the stack for caller to read.

Without a split stack, returning something larger than a register (or two) is done by passing a pointer to space on the caller's stack, to the callee.

There have been very specific schemes proposed for `-> [T]` in Rust, e.g. the slice is left on the callee's stack and a pointer to it returned, so the caller can allocate that much stack space and memmove it up.

But that wouldn't help with "allocating" on the caller's caller's stack because moving the data would invalidate references to it (and if you can have a reference to something, you can use it in ways the compiler can't trace it, unlike a GC).


You can surely allocate on the caller's stack in C++; that is how return value optimization works, and it was made explicit in the ISO C++17 standard.


Well, in that sense, Rust does return value optimization too, and it has been doing so basically since forever. Because Rust has no copy constructor, RVO in Rust does not need any explicit support from the language specification.


That's not allocating on the caller's stack though, that's just initialising there. The caller does the allocation.

And Rust does RVO, in fact it's important for "placement new" structures until the dedicated syntax lands.


Right. Just think of the return value as an additional mutable formal parameter, and function return values as syntactic sugar for that additional parameter. The caller allocates space for the actual parameter.


To be more explicit

   y = f(g(h(x)));
is really

   var hreturn;
   h(x, hreturn);
   var greturn;
   g(hreturn, greturn);
   // hreturn is no longer live
   var y;
   f(greturn, y);
   // greturn is no longer live
Looked at this way, a function can return any fixed-size type without a copy. Some languages, mostly those in the Pascal/Modula family, did this explicitly. Instead of having a return statement, within a function, the name of the function was the return value, and code could assign to it, or pass it by reference to another function.
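The desugaring above can be sketched in Rust with explicit out-parameters (`f`, `g`, `h` and their bodies are invented for illustration; real compilers do this below the source level):

```rust
// Hypothetical out-parameter desugaring of `y = f(g(h(x)))`:
// each "return value" is a caller-allocated slot passed by mutable reference.
fn h(x: i32, out: &mut i32) { *out = x + 1; }
fn g(x: i32, out: &mut i32) { *out = x * 2; }
fn f(x: i32, out: &mut i32) { *out = x - 3; }

fn main() {
    let x = 5;
    let mut hret = 0;
    h(x, &mut hret);    // caller allocated `hret`
    let mut gret = 0;
    g(hret, &mut gret); // `hret` is no longer live
    let mut y = 0;
    f(gret, &mut y);    // `gret` is no longer live
    assert_eq!(y, 9);   // ((5 + 1) * 2) - 3
}
```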


IIRC ada can return variable sized objects on the stack.


> This may be more of a "because we can" feature.

What I see is actually "it's wanted/needed enough that it's worth putting in the huge effort to make such a complex (and novel?) system reality".

EDIT: to be more clear: what's new here isn't "exposing liveness at the source level" but "model liveness in region typing". If we were changing when the destructor runs (which we can't because backwards compatibility - also, doubtful we'd want that at all), that would be more "source level".


Believe me, all the use cases there are based on actual user complaints, and this is in no way a "because we can" feature.


It's also a solid stepping stone to the next level of unergonomic problems with lifetimes.


Can you provide specific examples?


Liveness analysis is honestly pretty simple, so I'm glad it's being introduced as a rule for determining lifetimes. I would be afraid of introducing rules that make for complicated specs, and this is almost one of those areas. However, it's fairly easy to describe liveness analysis with set theory, so I think it works.


Well, the analysis is simple, but it does make the inference algorithm considerably more complex. (Analysis generates constraints, inference solves them.)


This sounds suspiciously similar to what JavaScript does.


JavaScript doesn't even have a static type system, let alone one that tracks lifetimes, those are left to the garbage collector.

How is this similar to anything JavaScript does?


Good. Adding the `let` and `const` scoping rules to JavaScript was a damn good idea. Although non-lexical lifetimes have nothing to do with this - Rust has had block-based scoping since before 1.0. Non-lexical lifetimes are a refinement that allows certain cases where a value is borrowed outside a block but can safely be borrowed again inside that block, among other cases.


In which way(s)?


Scope? ES6+? It's pretty obvious both languages originated from Mozilla.


Basically every language since Ada has scope - this is in no way similar to Javascript.


Programming languages, higher level than assembler or machine language, have existed for about 60 years: FORTRAN (1957), LISP (1958), Algol 60 (1959-1960). Modern era programming languages like Ada (1983) and Rust naturally make use of the lessons learned from the early languages. The idea of scope of variables is found in LISP and Algol from well over half a century ago.


Really? Is that why Mozilla is trying to launch a browser written entirely in Rust? Or was.


Just because Mozilla worked on both languages doesn't mean they're related in any way. The goals, execution contexts, and safety guarantees of the two languages are completely different.


But unnecessary syntactic sugar seems to be a common trait.

I give up on this thread. :) I can code in both languages, by the way, and I could swear that Rust is just a systems-level version of JavaScript, or JS before the hipsters got their hands on it and made it insufferable.


Rust is unrelated to JS.

Source: I was involved in Rust's design from the very beginning.


@pcwalton

I'll take that as canonical, then. I'm probably showing my naivety, but I don't grasp the reaction to the comparison. Both are basically C-family languages, there is a distinct Mozilla connection, and recent Rust development does seem to be web-focused. Rust has a lot of features that one would like JS to have, or get right, if it does have something comparable.

I think I'm missing some details. Is there a back story for why Rust developers wouldn't care for that comparison?


Those are all surface-level comparisons, completely outside the actual languages themselves.

And in particular, this thread is about static analysis of pointers to improve ergonomics while retaining memory safety. Javascript's approach to memory safety is just to use a garbage collector, which is essentially the opposite of the approach taken here.


That's an interesting point of view, but with some knowledge of different languages, Rust doesn't seem all that similar to me. Can you reference what languages you are thinking of that aren't like JS or Rust that makes you think JS and Rust are similar by comparison?


Haskell and Erlang come to mind. COBOL, if you really want a distinct comparison.

It might just be me; I wouldn't take my off-the-cuff commentary too seriously. I was kind of surprised when I started to learn Rust that the Internet makes it sound like this incredibly difficult and arcane language, and I thought, after getting a grasp on the basics, that it was much like a fine-grained Node. It's a nice language, I like it, personally.


Well, those are significantly different languages, one in paradigm and the other in implementation. I think most people are just used to comparing it to more common contemporary languages. If you have experience with other scripting/dynamic languages (Python, Ruby, Perl), JavaScript often feels much closer to them. If you have some experience with C or C++, portions of Rust seem really similar there as well. JS and Rust don't really seem any more similar to me than Python and C++. There are similarities in both, but likely just because there are current trends in language design that get followed, and languages steal good ideas liberally from each other.


I think you're articulating what I more or less intended to say far better than I actually did, thanks. :)



