
The main way the AI bots have problems is with timing. The neural networks used have no way of encoding time-dependent actions in a reasonable way. (As opposed to, say, a fuzzy decision tree with explicit time input.) And if you try to explicitly include it, the curse of dimensionality strikes back hard.

Both absolute and relative timing have to be handled. And relative, since specific salient action...

Plus the real reward is very sparse. Say, crippling mineral production early may or may not snowball. Likewise being a unit or two up...



What that tells me is that they haven't yet come up with the right featurization - that is, the function that maps input data into the actual neural network node values. The appropriate featurization would include the time information but reduce its dimensionality by hard-coding some basic assumptions, of the kind that humans presumably make when processing the same data.
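To make that concrete, here is one hypothetical sketch of such a featurization: instead of feeding the network raw game time, bucket it into a handful of coarse phases and add a couple of smooth components. This hard-codes the assumption (which humans plausibly make) that only rough game phase matters, not the exact second. All names and constants here are illustrative, not anything from the article.

```python
import numpy as np

def time_features(game_seconds, num_buckets=8, max_time=1800.0):
    """Map raw game time to a low-dimensional feature vector.

    Hard-codes two assumptions a human player plausibly makes:
    - only the coarse game phase (early/mid/late) really matters, and
    - sensitivity to elapsed time decreases as the game drags on.
    """
    t = min(game_seconds, max_time) / max_time  # normalize to [0, 1]
    one_hot = np.zeros(num_buckets)
    # coarse "game phase" bucket instead of raw seconds
    one_hot[min(int(t * num_buckets), num_buckets - 1)] = 1.0
    # two smooth components: normalized time and a compressive log scale
    return np.concatenate([[t, np.log1p(game_seconds)], one_hot])

feats = time_features(600.0)  # 10 minutes into the game
print(feats.shape)            # (10,)
```

So the network sees 10 numbers instead of an unbounded raw timestamp, which is exactly the kind of dimensionality reduction via hard-coded assumptions being described.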


I think these guys (and most people using deep models) try to avoid hand-crafted features as much as possible.


Gabriel (as well as the others on the team) has definitely looked at these areas - if things were left out / not "featurized", it was likely done via an ablation test, or showed improvement over benchmarks, or maybe just to set a baseline, as he is quoted in the main article. I don't know what techniques they used here, but I am excited to find out!

On the specific issue of encoding time-dependent behaviors in models, I think it is related to a broader issue that shows up in many application areas. To me the critical factor is that these models are ruthlessly good at exploiting local dependencies and totally forgetting long-term global dependencies or respecting required structure in control/generation.

This basically means it is very difficult to train long-term, time-dependent behavior without tricks (early/mid/late game models, extensive handcrafting of the inputs, or using high-level "macro actions"). Indeed, FAIR's recent mini-RTS engine ELF directly gives macro actions, in part to look more closely at how well global strategies are really handled and to remove one factor of complexity [0].
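The macro-action trick is easy to illustrate. A minimal sketch (all names hypothetical, not ELF's actual API): the policy picks among a few high-level intents, and scripted code expands each intent into a sequence of low-level game commands, so the learned part never has to plan at the frame level.

```python
# Minimal illustration of the macro-action idea (all names hypothetical):
# the policy chooses among a handful of high-level intents, and scripted
# code expands each intent into many low-level game commands.

MACROS = {
    "expand":     ["select_worker", "move_to_expansion", "build_base"],
    "build_army": ["select_barracks", "train_marine", "train_marine"],
    "attack":     ["select_army", "move_to_enemy_base", "attack_move"],
    "defend":     ["select_army", "move_to_own_base", "hold_position"],
}

def expand_macro(macro_name):
    """Expand a high-level intent into a fixed low-level command sequence."""
    return list(MACROS[macro_name])

print(expand_macro("expand"))
# ['select_worker', 'move_to_expansion', 'build_base']
```

The credit-assignment horizon shrinks from thousands of micro-actions to a handful of macro decisions, which is why it helps with exactly the long-term dependency problem above.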

Gabriel's PhD thesis was entirely on Bayesian models for RTS AI, applied to SC:BW [1], so I am sure he is well aware of the "classic/rules based" approaches for this.

[0] https://code.facebook.com/posts/132985767285406/introducing-...

[1] http://emotion.inrialpes.fr/people/synnaeve/phdthesis/phdthe...


AlphaGo used several hand-crafted features as of the Nature paper, so DeepMind at least is not above a little feature engineering.


I suspect you might be able to do surprisingly well with just a few simple features, e.g. what did I last see at each position and how long ago was that, how many of each enemy unit have I seen simultaneously and at what time, etc.
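To sketch what those features might look like (shapes and encodings entirely hypothetical), the "what did I last see at each position and how long ago" part is just a per-tile memory kept up to date with a couple of array updates:

```python
import numpy as np

def update_memory(last_seen, last_seen_time, visible, units, t):
    """Track 'what did I last see at each tile, and how long ago'.

    last_seen:      (H, W) last observed unit-type id per tile (-1 = never seen)
    last_seen_time: (H, W) time of that last observation
    visible:        (H, W) boolean fog-of-war mask for the current frame
    units:          (H, W) current unit-type ids (only valid where visible)
    t:              current game time
    """
    last_seen = np.where(visible, units, last_seen)          # refresh visible tiles
    last_seen_time = np.where(visible, t, last_seen_time)    # stamp observation time
    staleness = t - last_seen_time                           # "how long ago was that"
    return last_seen, last_seen_time, staleness
```

Feeding `last_seen` and `staleness` to the model gives it a crude but cheap memory of the fog of war without any recurrent machinery.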

As to the sparsity of reward, I'm not sure this is such a big problem. Once the AI learns that e.g. 'resources are good', it can then learn how to optimize resource production. You could even give the process a head start by learning a function from time + various resources + assorted features to win rate from human games, and use that as the reward function.
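One standard way to plug such a learned win-rate predictor into the sparse win/loss signal without changing which policy is optimal is potential-based reward shaping (Ng et al., 1999). A minimal sketch, where `phi` stands for the hypothetical win-rate predictor fit to human games:

```python
def shaped_reward(win_reward, phi_prev, phi_curr, gamma=0.99):
    """Potential-based reward shaping: add gamma * phi(s') - phi(s)
    to the sparse terminal win/loss reward.

    phi is assumed to be a win-rate predictor fit to human games
    (a function of time, resources, army value, etc. -- hypothetical here).
    This form provably preserves the optimal policy.
    """
    return win_reward + gamma * phi_curr - phi_prev
```

Mid-game, `win_reward` is 0 and the agent is steered by changes in predicted win rate; at the end, the true win/loss reward still dominates.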


"Resources are good" doesn't really mean anything.

Yes resources are good, but how do you know when to expand?

Judging from your opponent's movements, you can tell if they're turtling, going for some cheese strat, or doing some build where they may not be able to respond to an aggressive expansion.

Of course, if you choose wrong, you lose the game.



