From scratch: reverse-mode automatic differentiation (in Python) (sidsite.com)
114 points by montebicyclelo on June 14, 2020 | hide | past | favorite | 17 comments


I'm just an enthusiast. With that said, I think the author's first image really captures the core of forward propagation and back propagation. For people who want to get more of a feel for autodiff, I find these two sources amazing [1, 2]. They don't use the word autodiff, but looking at them, I can see it's the same thing.

I once made an attempt at animating feed-forward propagation [3]. If I were to animate back propagation, I'd probably make an animation of autodiff as well, since that's the most intuitive way I know of to explain back propagation right now. It's still on my big to-do list ;-)

[1] http://karpathy.github.io/neuralnets/

[2] 3Blue1Brown covers it in episode 4 of his Deep Learning series: https://www.youtube.com/watch?v=tIeHLnjs5U8

[3] https://melvinroest.github.io/Interactive-Neural-Networks/


Nice introductory resource.

On the same topic, see also:

- Andrej Karpathy's elegant micrograd library: https://github.com/karpathy/micrograd

- This tiny neural networks API inspired by it: https://github.com/bpesquet/pyfit


This is a fantastic hands-on introduction to practical autodiff. (And the blog's other articles are worth reading too, if you find their subjects interesting.)


I wonder if automatic integration is possible. And yes I'm aware LS.


There's a straightforward algorithm for automatic differentiation. You can think of it like long division: tedious to do by hand, but easy to program a computer to do.

Integration is a whole different ball game. It's essentially a puzzle: what function, when differentiated, produces this result? There are various heuristics you can apply, e.g. integration by parts, and there are databases of common integrals. But in the end there's no general algorithm; integration requires ingenuity to solve.


It’s slightly better. https://en.wikipedia.org/wiki/Risch_algorithm: “The Risch algorithm is used to integrate elementary functions. These are functions obtained by composing exponentials, logarithms, radicals, trigonometric functions, and the four arithmetic operations (+ − × ÷)”

That algorithm isn’t easily programmed, though, and will fail in the many cases where the integral cannot be expressed in elementary functions.

Worse, according to that page, it needs the ability to check whether expressions are zero (https://en.wikipedia.org/wiki/Constant_problem), and that, in general, is undecidable if I understand that page correctly.

https://mathworld.wolfram.com/RischAlgorithm.html seems a tiny bit more optimistic about the limitations of this approach, but https://link.springer.com/chapter/10.1007/978-3-319-00894-3_... seems, to me, in line with what Wikipedia says.


There is also an open source rule based integrator that I came across.

https://rulebasedintegration.org/


No, with automatic integration you presumably would not get the integrated function as an expression, the same way dual-number autodiff gives you derivative values rather than the derivative function.


Exactly.

Automatic differentiation works because we have the chain rule (f . g)'(x) = f'(g(x)) g'(x), telling us how to differentiate functions which are composed from simpler building blocks. But there is no analogous rule for integration.
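The chain rule is exactly what a reverse-mode implementation mechanizes: each operation records its local derivative, and the backward pass multiplies them together. A minimal toy sketch (not the article's code; `Var`, `mul`, and `exp` are made-up names for illustration):

```python
import math

class Var:
    """Toy scalar for reverse-mode autodiff: a value plus links to parents."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs of (parent, local_derivative)

    def backward(self, seed=1.0):
        self.grad += seed
        for parent, local in self.parents:
            # chain rule: incoming gradient times the local derivative
            parent.backward(seed * local)

def mul(a, b):
    return Var(a.value * b.value, [(a, b.value), (b, a.value)])

def exp(a):
    v = math.exp(a.value)
    return Var(v, [(a, v)])

x = Var(3.0)
y = exp(mul(x, x))  # y = exp(x^2)
y.backward()
print(x.grad)       # dy/dx = 2x * exp(x^2) evaluated at x = 3
```

Each node only knows its own local rule; the full derivative falls out of composing them, which is why differentiation mechanizes so cleanly while integration doesn't.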


Technically there is:

  D f = f' <=> S f' = f  # inverse
  
  # chain rule
  D (f . g) x = D f (g x) * D g x
  # divide by D g x
  D (f . g) x / D g x = (D f) (g x)
  # S vs D inverse
  D (S f' . g) x / D g x = (f' . g) x
  # commute = and integrate both sides
  S (f' . g) x = S \x( D (S f' . g) x / D g x ) x
  # rename and drop argument
  S (f . g) = S \x( D (S f . g) x / D g x )
The problem is that the RHS is, in the general case, not any less complex (and often much more) than the LHS.

Edit: Note that (S f . g) is integrating only f, then composing with g, which is what you want; it's the S \x(...) that causes problems.
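The identity can be sanity-checked numerically with plain finite differences (a quick sketch; `f = exp` and `g(x) = x^2` are arbitrary example choices, and `deriv` is an illustrative stand-in for D):

```python
import math

def deriv(h, x, eps=1e-6):
    """Central-difference derivative, a numeric stand-in for D."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

f = math.exp                # example f; conveniently, S f = exp as well
F = math.exp                # S f, the antiderivative of f
g = lambda t: t * t

x0 = 1.3
lhs = f(g(x0))                                      # (f . g)(x), the integrand
rhs = deriv(lambda t: F(g(t)), x0) / deriv(g, x0)   # D (S f . g) x / D g x
print(lhs, rhs)
```

The two integrands agree, which confirms the identity but also illustrates the point: the RHS still contains the original composition, so nothing got simpler.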


Nice observation, thank you for sharing!


I don't think it makes sense to the same degree. Simple compositions of elementary functions, e.g. exp(x^2), do not have indefinite integrals.

Autodiff provides benefit since differentiation can be expensive numerically while easier symbolically. Integration is, in some ways, the opposite. Integrals with no symbolic representation can be estimated with quadrature methods.
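For instance, even though exp(x^2) has no elementary antiderivative, a definite integral of it is easy to estimate numerically. A stdlib-only sketch using composite Simpson's rule (the `simpson` helper is illustrative, not from any particular library):

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# The antiderivative of exp(x^2) isn't elementary, but the
# definite integral over [0, 1] poses no numerical difficulty:
approx = simpson(lambda x: math.exp(x * x), 0.0, 1.0)
print(approx)  # ~1.46265
```

So the asymmetry holds: numeric differentiation is ill-conditioned while symbolic differentiation is mechanical, and for integration it's the reverse.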


> I don't think it makes sense to the same degree. Simple compositions of elementary functions, e.g. exp(x^2), do not have indefinite integrals.

While it's clear what you meant, it's mathematically important to note that functions like yours, and, indeed, all continuous functions, definitely do have indefinite integrals—that of `exp(x^2)` even has a name (https://mathworld.wolfram.com/Erfi.html); those integrals just aren't elementary, in the same technical sense in which you are using the word (https://en.wikipedia.org/wiki/Elementary_function).


Your first link is not working for me on mobile. Thanks for the clarification though.

Exp(x^2) is a finite composition of a polynomial and an exponential function, so from what you posted it does fall under the Wikipedia definition of elementary function.


The first link of your parent should've been https://mathworld.wolfram.com/Erfi.html

They were arguing that the antiderivative F of f: x ↦ exp(x²) is not elementary, whereas f itself is elementary. But both F and f are well-defined functions, which means that f has an indefinite integral; it just happens not to be elementary.


> The first link of your parent should've been https://mathworld.wolfram.com/Erfi.html

Sometimes Markdown eats following parentheses and sometimes it doesn't, and I've never figured out why. When I first viewed the post it showed me the parenthesis not being eaten, but then it did it anyway when I viewed it from the main page (too late for me to edit). Thanks for fixing it up!


What do you mean by LS? Label Smoothing?



