Using Monads in C++ to Solve Constraints: 1. The List Monad (bartoszmilewski.com)
61 points by andrzejsz on May 11, 2015 | hide | past | favorite | 70 comments


This method is really inefficient. There are b! / (b - n)! ordered ways to pick n of b unique numbers, whereas this solution is O(b^n).

That being said, it's still fast enough for this example.

Also, Python solution along the same lines:

    import itertools
    def valueOf(*values, base=10):
        v = 0
        for v in itertools.accumulate(values, lambda a,b: a*base+b):
            pass
        return v
    def solve():
        for s,e,n,d,m,o,r,y in itertools.permutations(range(10), r=8):
            if valueOf(s,e,n,d) + valueOf(m,o,r,e) == valueOf(m,o,n,e,y):
                yield (s,e,n,d),(m,o,r,e),(m,o,n,e,y)
(If anyone knows of a better way to do valueOf, let me know)


> This method is really inefficient.

That's what I've noticed with Haskellers. They have a tendency to write really slow code and don't have a clear idea of the execution model behind what they write down.

A while ago I asked in #haskell for translations into Haskell of the following C++ code:

http://codepad.org/sQTXhqC2

This should be easy. The algorithm is right in front of you. There should not be a lot of thinking in transcribing an algorithm into Haskell. But most of them wrote down something with a fold which turned what is an O(m*n) algorithm in time and O(n) in space into something like O(m^n) in space and time. The only one who got close to the C++ implementation was one who used a lot of State monads.

Then, of course, there was someone who pointed me to a radically different algorithm written in Haskell that I think is only O(n+m) in both space and time. I haven't been able to understand this other algorithm yet.
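For reference, the standard O(m*n) dynamic-programming edit distance (presumably what the codepad link implements; the link itself isn't reproduced here) takes only a few lines of Python, keeping just one row of the matrix at a time:

```python
def edit_distance(a, b):
    # Standard Levenshtein DP: O(len(a)*len(b)) time, O(len(b)) space.
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` is 3.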


The responses you got on IRC don't necessarily represent Haskell best practice (or even good practice). Most people probably have better things to do than translate random toy programs from C++.

Certain highly stateful algorithms may be difficult to translate into Haskell, in the same way that certain highly structural or lazy algorithms may be difficult to translate to C++.

I'm curious if your time complexity analyses are accurate, given that you (admittedly) did not understand the faster algorithm. You could be getting tripped up by laziness.

In general, I've found that most production Haskell code is quite fast, in both big-O and absolute terms.


Spoken like a true Haskell fanboy. Why don't you give it a shot, OP has posted the code.


Shriram Krishnamurti to the rescue! Here's a memoized version in scheme: http://blog.racket-lang.org/2012/08/dynamic-programming-vers...


> They have a tendency to write really slow code and don't have a clear idea of the execution model behind what they write down.

Of course, this isn’t unique to Haskellers. ;)

Granted, reasoning about performance in Haskell is still a relatively young art, and there aren’t many of us who are experienced with it.


If edit distance is bounded there appear to be algorithms that are linear in the input size:

http://en.wikipedia.org/wiki/Edit_distance and http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levensht...


There is a set of simple patterns/recipes most people use in Haskell that corresponds to the kind of math intuition Haskell seems to support. It doesn't have anything to do with efficiency though; for that you need to be an expert, and even then you won't be 100% sure the compiler really optimizes the execution the way you intended.


> you still won't be 100% sure the compiler really optimizes the execution the way you intended.

You can be sure if you read the generated Core and/or assembly code.


And if you're willing to do that on every upgrade of the compiler or anything that may affect the compiler's exact execution path.


Performance regressions in GHC are taken seriously as far as I've seen. It's not that big of a deal since you should test/profile your program on any compiler upgrade.

If you see any regressions, just post it to the ghc mailing list and it will probably be solved for you since it's much needed regression feedback/data.


That's good to know. Though I will note that I don't use Haskell. (Haven't had the time to properly learn it, and keep getting frustrated by minor things when I try to dabble with it.)

That being said, I used to think that about Java as well - and then the whole substring thing happened.


The Haskell Wiki has what seems like a clean and efficient implementation:

https://wiki.haskell.org/Edit_distance


Yeah, it's got a completely different algorithm along the diagonal of the matrix. That's the O(m+n) algorithm. I still haven't been able to understand it.

But that's not the point. The point is that Haskellers have a hard time translating an algorithm that's right in front of them and instead of writing down an O(m*n) algorithm they instinctively do O(m^n). This whole "many-worlds" approach of the original article is also leading to a very inefficient algorithm.

The abstractions Haskellers prefer lead them towards slow code.


    Haskellers have a hard time translating an algorithm that's right in front of them 
That's no surprise: it is an open problem whether this is possible in general. Some time ago Pippenger [1] showed that some programs cannot be transliterated into pure strict functional languages without suffering a slowdown. Later, Bird et al. [2] showed that the example given by Pippenger to exhibit the slowdown really depends on strictness, and can be transliterated into a pure lazy functional language without asymptotic slowdown.

As of May 2015, nobody knows whether a lazy functional language can solve all problems with the same asymptotic time complexity as stateful languages.

[1] N. Pippenger, Pure Versus Impure Lisp.

[2] R. Bird, G. Jones, O. De Moor, More haste, less speed: lazy versus eager evaluation.


It's an open problem whether it can be translated into a pure, lazy language without an asymptotic slowdown. It's not at all an open problem as to whether it can be implemented in Haskell. The algorithm is mutating vectors - the direct translation is to use ST, and will have the same asymptotic behavior as the C++ version.


So you are saying Haskell is not a pure, lazy language?


Why is that surprising? Haskell (as typically implemented; I'm slightly less confident about the details of the report) is lazy and pure by default, but that can be violated locally or more globally if you ask for it loud enough. So the language, as a whole, is not pure in the sense required for those theoretical questions to really apply.


Sure, if you use some of the unsafe stuff ... but what about the safe core of Haskell, do you consider that pure?


What's needed here is ST. It permits local O(1) mutable vectors, while enforcing referential transparency when viewed from outside the runST.

ST is clearly not pure in the sense that theory requires to make "can we implement this algorithm with the same complexity" an interesting question. It seems wrong to consider it part of "the unsafe stuff", however - it doesn't have the same kind of unenforced caveats as unsafePerformIO and unsafeCoerce and similar.

Separately, "the unsafe stuff" is still part of the language as used, and while it's certainly better practice to avoid it when you can, it's silly to exclude it when asking whether it's possible to do something - they are sometimes appropriate.


By ST you mean Haskell's StateT monad transformer?

Are you suggesting to define purity by the inability to define O(1) mutable vector-like operations? That would be an interesting proposal. But I wonder if it captures the conceptual content of the informal concept of purity. Maybe a time complexity restriction should be provable from purity?


"By ST you mean Haskell's StateT monad transfomer?"

No, I mean ST. An entirely different monad: http://hackage.haskell.org/package/base-4.7.0.0/docs/Control...

"Are you suggesting to define purity by the inability to define O(1) mutable vector-like operations?"

"Define", no, as that certainly doesn't encompass the full scope of purity. I think mutability runs counter to most people's notion of purity, particularly when they say "Haskell isn't pure". Of course, we can fake mutability at a cost of O(log n), which is what makes the O(1) notable.

If you do think ST is "pure" then it's not an open question whether you can implement update-heavy array based algorithms in a pure language without algorithmic penalty - you clearly can.


SPJ and J Launchbury seem to think ST is pure.

Looking over the PLDI paper, I get the impression that the key difference between ST and the usual State monad is (1) the ability to name state, and (2) the use of parametric polymorphism to prevent scope-extrusion of state.

Of course ML has scope-extrusion of state, and one of my questions/examples upthread was about the ability of Haskell's monadically controlled effects to implement such scope extrusion without 'tricks'.

All in all I feel that the notion of purity is controversial, insufficiently theorised, and that the community would benefit from a clearer understanding of the trade-offs involved in the different notions of purity.


You are infuriating.

If you object to notions of purity, why did you bring it up in the first place?

The first place it was used in this thread was your citing a paper entitled "Pure versus Impure Lisp".

In that paper, it was used to mean prohibiting "set-car!" and "set-cdr!". ST provides the equivalent of "set-car!" and "set-cdr!", so in the presence of ST the result you were citing in the paper doesn't even apply to a strict version of Haskell.

Please just stop making claims about Haskell until you have some sense of what you're talking about (or at least add a caveat warning others that you might not).


I don't object to purity, I want to know what it means. I didn't realise that this is not a well-understood concept. (I mentioned Sabry's definition of purity which I found surprising.)

The original discussion started because somebody complained that Haskell programmers often appear unable to transliterate imperative algorithms. I assume that this complaint was about transliteration into pure Haskell (whatever that may mean exactly). Clearly transliteration into full Haskell with actual mutability is possible. I mentioned Pippenger's article to bring to the debate the idea that the question whether such transliteration is always possible is an open research problem.


"I assume that this complaint was about transliteration into pure Haskell (whatever that may mean exactly). Clearly transliteration into full Haskell with actual mutability is possible."

Ah, if you'd made that assumption explicit I would not have objected.

I don't see anywhere the querent suggesting that his translation demand invoked any notion of purity. If it did, then certainly he was asking a harder question than he presented, although even in that case a naive translation could have trivially produced an O(m n lg(n)) solution and O(m^n) should quite reasonably be considered failure.

I think the issue in the main is that relatively novice Haskell developers are likely to be the most eager to try and translate arbitrary code on demand in #haskell.


I don't think anyone knows a solid definition of purity. It certainly refers to at least two things (1. independence of order of evaluation, 2. no side effects) and informally speaking we often "know" what it is in a particular context, but writing down a formal definition is very hard. I'm certainly not satisfied with Sabry's definition. I don't like the idea that you have to have both a strict and a non-strict evaluation for your language.


If nobody knows what purity means, then what are all those tortured discussions on whether Haskell is pure or not about? ...

Anyway, I'm a bit unhappy about the requirement of full evaluation order independence, because even the purest of pure languages (e.g. the untyped lambda-calculus or PCF) don't have full evaluation order independence. I guess you mean evaluation order independence modulo termination?

My intuition is that pure means that all language expressions M can always be evaluated locally by just looking at the meaning of all subterms of M. In terms of Hoare logic (for total correctness) that would mean that the pre-condition speaks only about the termination of the free variables of M (sorry for being so terse, I think this can be made formal).


Usually this is the way words work - we notice a regularity in the environment we wish to refer to; we build a sense of it, usually motivated by some examples and counterexamples; we give it a handle; and we then try to find a definition that matches our intuition. There certainly seem to be regularities that "pure" is intended to point at in this context, but it definitely seems that we've not settled on a definition, and we may not even agree on examples.

The notions of purity I was pointing at was any that entailed the restrictions in the paper you'd linked. There are restrictions of Haskell that are pure in those senses, but Haskell taken as a whole is not, even without invoking anything unsafe.


> If nobody knows what purity means, then what are all those tortured discussions on whether Haskell is pure or not about? ...

They're about what purity means!


"I get the impression that the key difference between ST and the usual State monad is (1) the ability to name state, and (2) the use of parametric polymorphism to prevent scope-extrusion of state."

The key difference between ST and the State monad is that the State monad is simply sugar for a function call of the form (s -> (a, s)), whereas ST allows for actual mutability. This is what changes the asymptotic complexity.

ST also does things to prevent references from leaking, and that's valuable and more useful in the context of ST, but that's irrelevant to our questions up-thread.
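(Not anyone's code from the thread, but the "sugar for a function" point can be sketched in Python by modeling a State action as a plain function from a state to a (value, new_state) pair; the names `unit`, `bind`, `tick` are hypothetical:)

```python
# A State "action" is just a function s -> (value, new_state).
# No mutation happens anywhere; state is threaded explicitly.

def unit(a):                  # 'return': yield a, leave state untouched
    return lambda s: (a, s)

def bind(m, k):               # '>>=': run m, feed its value to k, thread state
    def action(s):
        a, s1 = m(s)
        return k(a)(s1)
    return action

get = lambda s: (s, s)        # read the state

def put(s_new):               # replace the state
    return lambda s: (None, s_new)

# tick: return the current counter value and increment the state
tick = bind(get, lambda n: bind(put(n + 1), lambda _: unit(n)))

value, final_state = bind(tick, lambda _: tick)(0)  # -> (1, 2)
```

Running two `tick`s from state 0 returns the second counter value 1 and leaves the state at 2, all via ordinary function composition.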

"Of course ML has scope-extrusion of state, and one of my questions/examples upthread was about the ability of Haskell's monadically controlled effects to implement such scope extrusion without 'tricks'."

That wasn't this thread.

I still think scope extrusion - without asking for it - would be a bug not a feature; particularly so in the case of a multithreaded, lazily evaluated language with inlining and CSE - where you really have little control over what gets executed when if you're ducking the features that guide what gets executed when.

This doesn't limit expressibility, because you do have the ability to ask for it, using those "tricks" that you keep excluding for no good reason. "How expressive would Haskell be if it lacked those" is an interesting question, but not an immediately practical one, and you should be clear that that is the question you're asking.


What do "pure" and "lazy" mean to you?


Lazy is easy: it's call-by-name evaluation, i.e. (lambda x.M)N -> M[N/x], with caching, i.e. each argument is evaluated at most once (in other words, call-by-need).
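For illustration (a hypothetical Python sketch, not anything from the thread), that caching is exactly a thunk that runs its body at most once:

```python
# A memoized thunk: the suspended computation runs at most once,
# on first demand (call-by-need).
class Thunk:
    def __init__(self, compute):
        self._compute = compute
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._compute()
            self._forced = True
            self._compute = None  # drop the closure so it can be collected
        return self._value

calls = []
t = Thunk(lambda: calls.append(1) or 42)
t.force()
t.force()
# the lambda ran exactly once; both forces return 42
```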

Purity is more subtle. I don't think there's an agreed upon definition. [1] essentially ties purity to the equivalency of call-by-name, call-by-need and call-by-value evaluation orders (ignoring divergence and errors). I'm not totally sure that is the right definition.

[1] A. Sabry, What is a Purely Functional Language?


The most proximate question here, with respect to purity, is "can you do O(1) mutation (outside of the narrow case of replacing a thunk with its result)?" The theory is far more interesting when we say "no", but the answer in the case of Haskell is "yes".


With those definitions pure and lazy languages can certainly do mutation and have effects, viz. the IO and ST monads.


It seems a bit of a broad generalisation to say "Haskellers have a hard time translating an algorithm" based on who happened to be on the IRC channel at the time and decided to answer a challenge from a random C++ programmer who turned up.

Anyway, Haskell is perfectly decent at implementing your algorithm, and imperative algorithms in general. Here's a more-or-less direct translation:

    http://lpaste.net/132519
There's extra noise related to wrapping and unwrapping mutable cells which I could tidy up if I had the time, otherwise it's essentially identical.

In any case the author of the post in question, Bartosz Milewski, is an expert in C++ and not particularly an expert in Haskell, so your point seems misdirected.


And of course the root of this conversation was mistaken in the first place. The algorithm Bartosz is presenting is the b!/(b-n)! algorithm that just examines each permutation once.


> It seems a bit of a broad generalisation to say "Haskellers have a hard time translating an algorithm" based on who happened to be on the IRC channel at the time and decided to answer a challenge from a random C++ programmer who turned up.

I'm not a "C++ programmer". I'm just a programmer. I happened to have the algorithm written in C++, but I could quite easily have written it in pseudocode.

I got about 6 responses in 20 minutes. Four of those were O(m^n). Yes, I think this is quite indicative that Haskellers have a hard time writing performant code.


If, as per your profile, you are someone "who really just wants us all to get along", I would recommend against picking stupid fights like this.


The blog post explicitly says that the C++ is a translation of Haskell code, and links to the (efficient) Haskell original.


Note that the later parts of this series have been published, and in fact the definition of StateL does enforce uniqueness as expected, in a way that makes the proposed solution O(n!).

For details, look at the first code block in the "The Client Side" section, here: http://bartoszmilewski.com/2015/05/18/using-monads-in-c-to-s...


Satisfying arbitrary constraints is often exponential (http://en.wikipedia.org/wiki/Boolean_satisfiability_problem). I'd certainly be interested in seeing more efficient implementations, but the approach taken by the example Haskell code (and duplicated in your Python) is substantially more efficient than the example non-Haskell code provided in the article.


I know about BSAT / BIP / ILP / MLP and the problems thereof in the general case.

However, this doesn't mean we have to punt the easy cases just because there are hard cases.


It certainly doesn't mean we have to punt the easy cases, but it seems inappropriate to lambaste the current solution without establishing that we are in fact looking at an easy case.

Given that no one seems to have produced (or pointed at) an asymptotically better solution in any language, I think that's some small evidence that there isn't such a solution, and substantially more evidence that it's not easy to find.


There's an easy better method: assign variables starting from the lower order digits, greedily check correctness, and skip over all permutations with that prefix if it fails.

Although I don't know how one would go about proving or disproving if this is asymptotically better or just a (substantial) constant-factor speedup.
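A hypothetical Python sketch of that pruning strategy for SEND + MORE = MONEY (not OP's code): digits are assigned column by column from the low-order end, and a whole prefix is abandoned as soon as one column's sum fails:

```python
def solve():
    # Columns of SEND + MORE = MONEY, least significant first:
    # (addend digit, addend digit, result digit)
    cols = [('D', 'E', 'Y'), ('N', 'R', 'E'), ('E', 'O', 'N'), ('S', 'M', 'O')]
    assign, used, solutions = {}, set(), []

    def bind(letters, i, carry, col):
        # Bind any still-free letters of this column, then check its sum.
        if i == len(letters):
            a, b, r = cols[col]
            total = assign[a] + assign[b] + carry
            if total % 10 != assign[r]:
                return                      # prune: this prefix can never work
            step(col + 1, total // 10)
            return
        L = letters[i]
        if L in assign:
            bind(letters, i + 1, carry, col)
            return
        for d in range(10):
            if d not in used:
                assign[L] = d
                used.add(d)
                bind(letters, i + 1, carry, col)
                del assign[L]
                used.remove(d)

    def step(col, carry):
        if col == len(cols):
            # The leftover carry is MONEY's leading digit M; no leading zeros.
            if carry == assign['M'] and assign['S'] != 0 and assign['M'] != 0:
                solutions.append(dict(assign))
            return
        bind(cols[col], 0, carry, col)

    step(0, 0)
    return solutions
```

This finds the classic unique solution 9567 + 1085 = 10652 while exploring only a tiny fraction of the permutation space.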


Oh! I had misread your original comment to be, "There are b! / (b - n)! ways to choose n of b unique numbers, [so] this solution is O(b^n)." You were contrasting! I believe that b!/(b-n)! is asymptotically the same class as b^n (whether we're varying b or n), hence my confusion.

I had initially skimmed the article, and assumed he was already handling uniqueness. With the correct definition of StateL, the solution presented is equivalent to what you describe here, but apparently that definition is deferred to a future article.

The idea would be that, for StateL, for_each would pick one value and pass that value to the lambda, interpreting the result in a context where the sel only sees the remaining values in the list.


> I believe that b!/(b-n)! is asymptotically the same class as b^n

Nope. It looks like it at first glance, but it isn't.

Look at what happens when n == b/C:

    b^(b/C) / (b!/(b - b/C)!) ==
    b^(b/C) / (b!/(b*(C-1)/C)!) ==
    e^(log(b^(b/C)) + log((b/C)!) - log((b*(C-1)/C)!)) ~=
    e^(b/C log(b) + [b*(C-1)/C] log (b*(C-1)/C) - [b*(C-1)/C] - b log b + b) ==
    e^((b+b (-1+C) log((-1+C)/C))/C) == 
    e^(b/C* (1+(C-1)log((C-1)/C)))
(I used Stirling's Approximation, namely the form ln(x!) ~= x ln x - x for large x)

However, note that (1+(C-1)log((C-1)/C)) / C is greater than zero for all C > 1, as 1+(C-1)log((C-1)/C) > 0 for all C > 1. As such, this diverges for all C > 1.
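A quick numeric sanity check of the C = 2 case (my own sketch, not the parent's code):

```python
from math import factorial

# Ratio of b^n to b!/(b-n)! at n = b/2. If b!/(b-n)! were in the same
# asymptotic class as b^n, this ratio would stay bounded; it doesn't.
def ratio(b):
    n = b // 2
    return b**n / (factorial(b) // factorial(b - n))

# ratio(10) ~ 3.3, ratio(20) ~ 15.3, ratio(40) ~ 330: growing without bound
```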


b!/(b-n)! only = b!/(b/C)! when C = 2, but interesting regardless. I'll revisit when I have the chance.

In any case, I make the assertion that the algorithm proposed in the article is in fact O(b!/(b-n)!), with the intended definition of StateL.


Whoops, started with C = 2 and generalized, incorrectly it would seem. I'll edit it.

Edit: edited.


Thanks :) C = 2 was enough, of course. So O(n!) is between O(k^n) and O(n^n)?


Yep. Though we often weaken a big Theta of n! to a big O of n^n, as it's easier to deal with.


That makes sense, and probably contributed to my confusion.


Here's one possible implementation of valueOf:

  import numpy as np

  def valueOf(*v):
    return sum( v * np.power(10,range(len(v)))[::-1] )

  > valueOf( 2,3,4 )  # => 234

Also, in Python 2.7, I don't think you can put a named keyword argument after a wildcard? At least, that doesn't work for me, so I left it out.


That is a) hauling in an awfully big external library for something that should be relatively trivial, b) inefficient (you're replacing a single multiplication by 10 and addition per digit with O(log x) multiplications and an addition per digit, assuming np.power uses a decent exponentiation algorithm), and c) if anything, less readable than my version.

To put it simply, I don't see an improvement with that version.

And Python 3 has some improvements w.r.t. wildcard arguments.


If you don't need other bases you can just string join them. : P


That's actually how I started, and `int` has a base parameter.

But it's still relatively verbose, due to Python's whole "join-only-takes-strings" thing:

    def valueOf(*values, base=10):
        return int("".join(map(str, values)), base=10)
Not to mention it's doing a whole lot of unnecessary work.


Whoops, miscopied. The second `=10` shouldn't be there; it should be `base=base`.


This post should be read in the context of the related posts in Milewski's highly informative blog. The goal is not to show a solution, it is to demonstrate some aspects of monads, and in this particular case, that the concept is not confined to Haskell or functional languages.


This was mine, brute-force-y:

    import itertools as it
    a =  'dog'
    b =  'food'
    c = 'tasty'
    letters = sorted(set(a+b+c))
    for combo in it.permutations('0123456789', len(letters)):
        mapping = dict(zip(letters, combo))
        if len(c) > max(len(a), len(b)) and mapping[c[0]] == '0':
            continue
        f = lambda string: int(''.join(map(mapping.get, string)))
        n1, n2, n3 = map(f, [a, b, c])
        if n1 + n2 == n3:
            print( mapping )
            print( n1, n2, n3 )
            break
Seems Haskell-friendly given all the mappings, though not in the way the blog post did it, and not with that big nested loop either.


I assume/hope that the big idea for part 2 is that this setup allows for efficiently pruning impossible sub solutions and will result in an optimal algorithm.

Why would you write an algorithm in C++ if the runtime will be bad?


He even says explicitly "(never mind how we deal with uniqueness)" in the solution section and has already acknowledged that there are only around 2 million combinations to check.

ETA since he mentions the implementation is a translation of this Haskell implementation: http://blog.jle.im/entry/unique-sample-drawing-searches-with... presumably the uniqueness is done by updating the list of remaining possibilities for selection each time a new value is selected.
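That drawing-without-replacement structure can be sketched in Python (names hypothetical, not from the linked post): each choice removes the chosen value from the pool passed to the rest of the search, so only b!/(b-n)! states are ever explored.

```python
# Enumerate all ordered draws of n items from pool, without replacement.
def draws(pool, n):
    if n == 0:
        yield []
        return
    for i, d in enumerate(pool):
        # Recurse on the pool minus the chosen element.
        for rest in draws(pool[:i] + pool[i + 1:], n - 1):
            yield [d] + rest

# list(draws([0, 1, 2], 2)) enumerates all 3!/(3-2)! = 6 ordered pairs
```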


    from functools import reduce
    def valueOf(*values, base=10):
        return reduce(lambda x,y: x*base+y, values)


Doesn't work with a zero-length input (although that's easily fixed).

That being said, that's much cleaner. Thanks!


Sure. But it isn't clear to me that it should work with zero-length input. Not working is consistent with the builtin int: int("") throws an exception.


It's straight from the definition of an integer: a number with digits d_i is sum(i=0 to n) d_i * base^i. A number with zero digits is just the empty sum, which is canonically defined to be zero.

I'll also note that int() returns 0.


Do you get bonus points for knowing that these are Alphametics? And covered in ridiculous detail in volume 4A of The Art of Computer Programming? I'm curious how the solutions there stack up to what is in mind here.


I would just use std::next_permutation (or equivalent partial permutation from a library). This produces pretty clean code.


"Once you internalize the many-worlds approach to programming, the implementation is pretty straightforward." are you serious? this is why people don't take Haskell seriously.


It is disingenuous in the extreme to ask, "So what’s all this talk about mutation and state? Well, who said you can’t have state in functional programming?" when the very first line of the very first hit on "functional programming mutation state" is the Wikipedia article on functional programming, which says:

"In computer science, functional programming is a programming paradigm—a style of building the structure and elements of computer programs—that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data." -- https://en.wikipedia.org/wiki/Functional_programming

So the answer to the question "Who said you can't have state in functional programming?" is: "Every introduction to functional programming ever."

This is why people hate Haskell and make fun of Haskellers. In their quest to appear more clever than the rest of us, they engage in utterly disingenuous rhetoric. Which is too bad, because I'm pretty sure they are actually more clever than the rest of us--certainly more clever than me--and don't need this nonsense.

Haskell is an amazingly cool language, and I've talked to people at meetups who are using it for interesting things, and learning a bit of it myself has been an interesting and educational challenge. But getting around to learning it took five years longer than it should have because I was so put off by the rhetorical nonsense the language community engages in.


"Having state" != "Changing-state and mutable data".

5 years?



