The skyline problem

bnegreve · on Sept 2, 2014

This is an instance of finding the Pareto front [1] of a set of points in a two dimensional space.

This problem is also known as "skyline queries"!

According to [2], author's bound O(n log(n)) is the best bound for d=2.

[1] http://en.wikipedia.org/wiki/Pareto_efficiency [2] http://pdf.aminer.org/000/211/201/on_the_computational_compl...

infinotize · on Sept 2, 2014

To interviewers who have given this problem: how do candidates typically perform? I would expect a lot of people to come up with some variant of an O(n^2) solution on the spot, but probably not get much further without some help, in an interview where you realistically have ~30 min to solve a problem.

Of course, this being HN, there are already several comments stating how the optimal solution is obvious.

drostie · on Sept 2, 2014

I'm also interested in how this varies by different languages.

For example, a programmer decently knowledgeable in JS will probably start with the "pre-data-structure-change" solution presented in the article, because if you're pretty good with JS then you know how events work and you probably start with, "okay, I'm going to scan left to right and emit an event on each new edge; that will look like this."

    var events = [];
    buildings.forEach(function (x) {
        events.push([x.l, 'add', x.h]);
        events.push([x.r, 'rem', x.h]);
    });
    events.sort(function (x, y) { 
        return x[0] - y[0];
    });

If you're lucky, they'll then say, "okay, I now need a height-stack where I can query the largest height, so..."

    function max(arr){
        return Math.max.apply(null, arr);
    }
    var height_stack = [], out;
    out = events.map(function (e) {
        var x = e[0], y = e[2], cmd = e[1];
        switch (cmd) {
        case "add":
            height_stack.push(y);
            return [x, max(height_stack);
        case "rem": 
            height_stack.splice(height_stack.indexOf(y), 1);
            return [x, max(height_stack)];
        default:
            throw new TypeError("No such command: " + cmd);
        }
    });

(There are a couple of other ways to implement max() above, depending on style; you could use .reduce(max_fn) for example, where max_fn is Math.max truncated to exactly two arguments.)

But I wouldn't expect them to come up with binary heaps a la http://eloquentjavascript.net/1st_edition/appendix2.html . I guess the best you could expect the candidate to do is to say, "then I Google someone who has done public-domain binary heaps and steal their code."

mpalme · on Sept 2, 2014

The problem es essentially to map from an indiscrete space to a discrete one. The solution that comes into my mind is a simple sweep-line algorithm:

1. Take all the start x-coordinates of a buildings (=start events).

2. Take all the end x-coodinates of the buildings (=end events).

3. Sort all the events the by the value of the coordinate.

4. Now walk over all the events and keep count of how many there currently are (or in this case: how high the building is) by adding one for a start event and substracting one for a end event.

http://en.wikipedia.org/wiki/Sweep_line_algorithm

prawks · on Sept 2, 2014

A closely analogous problem which I have had to implement is the problem of drawing n overlapping events with varying duration on a timeline, coloring the event based on how many are underneath it.

Personally I find that a simpler problem to visualize, however it deals with almost the same information. Rather than what height to draw, you're keeping a count of how many events are underneath. You increment at each leading "critical point" and decrement at each ending one.

The problem presented here is nice because it requires thought on both the algorithm as well as the data structure. Very nice article.

signa11 · on Sept 2, 2014

if we think about it somewhat, it becomes apparent, that it would take linear time to merge one building to the skyline, and also two skylines together.

merging skylines together gets us more bang-for-the-buck. and is trivially done with scanning them from left->right, match x-coordinates, adjust heights...

all of this is just divide-and-conquer, and usual recurrence rules apply i.e. T(n) = 2T(n/2)+O(n), which means T(n) = O(n lg n)

srean · on Sept 2, 2014

I am very bad at these things when put in a spot. OTOH if you would let me pace around for a bit and stay invisible for a few minutes then its not too bad.

Havent checked for correctness, but my off the cuff attempt would be to sort the left and right edges, then use a variable for the height and a stack. Proceed left to right. If I encounter a left edge and current height is lower than stack, push height on stack and update current height. If left edge lower than current, just push its value on the stack. If I encounter a right edge that matches current height pop height from stack in to current height. If right edge lower than current height, just pop the stack. The awkward case is when the left and right edges coincide.

prawks · on Sept 2, 2014

This is pretty much the solution given. I've encountered a similar problem before (draw a bunch of overlapping squares with differing z-values) and done it similarly. The difficulty you end up having is that a stack doesn't give you a fast way to find the maximum height (what building/square) is on top at any given point. Hence, the article suggests a max heap.

brianpgordon · on Sept 2, 2014

This is targeted toward students who could benefit from someone hand-holding them through the design of an algorithm, but anyone can learn something. It's always good to have another potential interview question in your repertoire.

optimusclimb · on Sept 2, 2014

Yup. I sure solve THAT problem every day. My boss comes in and is like, "well, we have the bugs from that security audit, we need to come up with a design for the feature we want to try and roll out in two quarters so we can agree on it and cut tasks/divy up work, and then we're using two many boxes for the Foo processor. I think you've never optmized that right? Why don't you have a look at that. Also, don't use any standard library functions or look anything up. When you're done with that, I have a question for you about manhole covers."

Yup, another great interview question for your repertoire to weed people out and talk about the kinds of things that come up in the day to day of being a modern software engineer.

legutierr · on Sept 2, 2014

Does a modern software engineer not need to demonstrate an ability to design efficient algorithms that solve problems that haven't been encountered before?

Or, in your mind, is the solution to every problem that a modern software engineer might encounter already found in some stackoverflow post?

btilly · on Sept 2, 2014

It all depends on the job.

Most people with the title "software engineer" are day in, day out, faced with random debugging, tinkering with code and keeping the code-base organized as they go. Clever algorithms are not needed. Solid development skills are. Furthermore when you do run into occasional algorithm problems, it is enough for there to be people in your environment that you can turn to for that.

This is an accurate description of most of the productive colleagues that I've known over the years. I've known a lot of people who do not do well on algorithm challenges, but who I'd like to work with again because I know that they can consistently pull their weight.

But doesn't someone need to know algorithms? Well, yes. I certainly do. But it doesn't need to be everyone. In a good team, people naturally turn to the most appropriate teammate for help when they need it. So they might turn to me for algorithm questions, someone else for weird git problems, and yet another person for networking expertise. Algorithms are just one component.

But, I can see you say, all else being equal, wouldn't you prefer someone who knows algorithms? Obviously so. But algorithms are only one component of what we do. And is not always strongly correlated with everything else that we need to know, such as unit testing and writing modular code.

And here we get to the real problem. When our interview tests test something different than what we actually need, then we don't know whether we're getting the skills we need. Furthermore when that different thing is something that every other organization is also testing for (like algorithms), then we wind up both not being interested in people who have the skills we need, but we wind up in direct competition with everyone and his dog.

Therefore interviews should try to test the skills that we actually need people to use day to day. If your job does not require people to regularly write clever data structures, don't test for that. If your job requires people to lay out and modify class hierarchies for new business requirements, test for that instead.

peeters · on Sept 2, 2014

> Does a modern software engineer not need to demonstrate an ability to design efficient algorithms that solve problems that haven't been encountered before?

On the job? Sure they do. In an interview room? That's not an adequate replication of the job environment. I am not asked to design an algorithm for a problem introduced to me 17 seconds ago and have 5 minutes to do it.

Plus, you're committing the #1 sin of interviews; that many of the people that can answer the question in an interview setting have encountered the problem before, and that's why they can answer it.

JoeAltmaier · on Sept 2, 2014

That answers something that's been bothering me about interview coding. The ones who pass, are the ones who've seen the problem before. So its not a coding test nor a creativity test; its an experience test only.

cousin_it · on Sept 2, 2014

I've passed many algorithm interviews without knowing the answer beforehand. It's very possible, people just don't bother to learn. Any good textbook on algorithms will give you the tools to create new ones. Mostly it's application of a few basic concepts, like big-O complexity, divide and conquer, or dynamic programming. Go through 100 problems on Project Euler and this stuff will become second nature.

cousin_it · on Sept 2, 2014

Sure, many programming jobs don't involve any algorithm design. But the most interesting programming jobs tend to involve algorithm design.

VLM · on Sept 2, 2014

Luckily thats only like 1% of jobs, even if all interviews are oriented toward assuming all jobs are like that. This is like the problem where distinct classes of applicants "A level" "B level" etc can be designated and have peculiar hiring techniques for their level, so logically ALL employers see themselves as an "A level employer" even if they're probably B or C or worse, and get all confused at the outcome when a "C level" employer tries to implement "A level" employee hiring practices.

Another unrelated problem is there is a legit observation that the most interesting jobs involve very interesting business logic and/or unusual situations. I'd much rather be programming a bubble sort on a mars rover at NASA than writing (from scratch of course?) quicksort at a corporate crud app house, despite in theory the crud app being more "elite" of an algo.

You'd be surprised how much embedded / life critical stuff is brutally brute force, and is also pretty fascinating work. Implementing a beautiful recursive algo for robot arm movement is cool, until a stack overflow kills an employee. Then its not so cool compared to brute forcing it where the employee would have lived. I heard about that one although I wasn't involved directly. (and this is bordering on hearsay so I have nothing further to say wrt that specific death)

sdrothrock · on Sept 2, 2014

> This is targeted toward students who could benefit from someone hand-holding them through the design of an algorithm

I enjoyed the article and then came back to read the comments and found this sort of insulting.

Just because something is detailed doesn't mean that it's hand-holding. It's interesting to see how other people think.

brianpgordon · on Sept 2, 2014

I'm sorry, I didn't mean to be insulting. I thought this was a little basic for the HN crowd so I wanted to make sure I left a disclaimer to that effect so people didn't think I was being condescending by explaining every little detail. I guess it backfired.

starmole · on Sept 2, 2014

In case you did not know: This is closely related to an s-buffer (s for span) in graphics. And the first approach (height map) is pretty much a z-buffer. Your silhouette there is eye distance (z) across a single scan line. One difference of how it is approached in graphics is that usually not all spans are an input array but inserted one at a time.

praptak · on Sept 2, 2014

> It's always good to have another potential interview question in your repertoire.

It is a widely known "good interview question" which means it is a bad interview question.

curiousDog · on Sept 2, 2014

A regular google interview question as well!

tomjen3 · on Sept 2, 2014

Why not just sort the rectangles by the left side position, then walk through the list, drawing a line from the current position to either the right of the next rectangle (in case that one is taller) or to the right of the current rectangle, in case it is not?

This even makes it easy to parallize it because you can use the quick sort algorithm to split the rectangles in two non-overlapping sets and once your two cores have made the two parts of the skyline you only need to update the line between them.

Retric · on Sept 2, 2014

Widths are not fixed. A tall skinny building might be drawn over a short wide one.

Also, you end up as N^2 degenerate case if the buildings form a pyramid.

sgeisenh · on Sept 2, 2014

If you think recursively, the n log n divide and conquer solution is pretty straightforward.

The problem becomes much more interesting when you consider optimal parallel performance. The best span you can do is log^2 n. I believe the problem can be reduced to sorting, though it is not required to explicitly sort the sequence of buildings in order to achieve the desired asymptotic performance.

cperciva · on Sept 2, 2014

I believe the problem can be reduced to sorting

Speaking of which: Given values 0 < a_i < A for some A, apply the skyline algorithm to the rectangles a_i < x < A, 0 < y < a_i. The resulting sequence of heights gives you the values a_i sorted in increasing order; thus the skyline problem cannot be solved in less than O(n log n) time.

CharlesC · on Sept 2, 2014

Got this as an interview question at Blizzard. Failed pretty spectacularly. I kept getting close but then the interviewer would jump in with "Well think about it this way...what if blah blah blah?"

michaelmior · on Sept 2, 2014

My first thought was to dump the rectangles into an interval tree. Also O(n log n), but less space efficient since it always requires O(n) space regardless of the structure of the problem.

brianpgordon · on Sept 3, 2014

Are you sure that it's n log n? I think that the pyramid shaped configuration defeats this strategy by forcing the interval tree to report something on the order of n intervals with every query. So there are n queries on the interval tree, and each query takes log(n) + n time.

Also, something interesting to note that I was thinking about yesterday is that you don't actually need to use an interval tree for this. You can use a range tree instead, which is simpler (to my mind). You only need an interval tree to do a certain kind of query which we never need to do:

https://briangordon.github.io/images/range-interval.png

michaelmior · on Sept 3, 2014

I'm not sure what you mean by the pyramid-shaped configuration. In any case, n queries taking log(n) + n time each is O(n log n). I think you're right that a range tree should work fine.

brianpgordon · on Sept 3, 2014

I mean this thing from the article:

https://briangordon.github.io/images/worstcase.svg

> n queries taking log(n) + n time each is O(n log n)

No, it's O(n^2).

michaelmior · on Sept 3, 2014

Wow, clearly commenting too late. Of course you're right :)

thewarrior · on Sept 2, 2014

How would you efficiently find the critical points ?

nitrogen · on Sept 2, 2014

The critical points are given as inputs to the algorithm in the form of the X coordinates of the rectangles' edges.

thewarrior · on Sept 2, 2014

D'oh .. Missed that . Thanks.

idunning · on Sept 2, 2014

The critical points are the left and right sides of the rectangles.

praptak · on Sept 2, 2014

Fun fact: this is a specific case of finding the union (usually: area of) of a set of axes-oriented possibly overlapping rectangles.

frownie · on Sept 2, 2014

is it me or the algorithm suggested at the end degenerates if the rectangles are organized as the steps of a stair and each of them extends to the rightmost position ? In that case it degenerates because the heap manipulation degenerates...

brianpgordon · on Sept 3, 2014

I don't think so. No matter what shape the problem is in, there are always exactly n inserts and n deletes to the heap. Those are O(log n) operations, and there are 2n of them, so it's O(n log n).

wiz21 · on Sept 4, 2014

Ahhh read too fast :-( Crucial sentence is "Fortunately, we know about a data structure which can keep track of an active set of integer-keyed objects and return the highest one in O(logn) time: a heap."

But then again. If I read well, he wants to 1st sort the points, 2 iterate over them and using the heap. So we have n log n for the sort and n * n log n for the heap use.. Basically (1+n) n log n (or i'm still tired ? :-))

brianpgordon · on Sept 5, 2014

The uses of the heap are:

1. Peek the root of the heap to find the tallest rectangle in the active set. This is a constant time operation.

2. Add a new rectangle to the active set. This is a O(log(n)) operation.

3. Remove a rectangle from the active set. This is a O(log(n)) operation if you use something like a hash table to locate the element in the heap.

Each rectangle is added once (at its left edge) and removed once (at its right edge). The root of the heap is peeked on every critical point. All together, that makes the algorithm O(n log n).

Or am I missing something?

arikrak · on Sept 2, 2014

The animations are cool and helpful. How were they created?

brianpgordon · on Sept 2, 2014

I used the trial of Edge Animate:

https://creative.adobe.com/products/animate

It's a really interesting tool. Here are the animation source files if you want to play around with them:

https://dl.dropboxusercontent.com/u/14282009/skyline.zip

arikrak · on Sept 2, 2014

Thanks, I'll check it out!