Ansible is great if you have workflows where sysadmins SSH to servers manually. It can pretty much take that workflow and automate it.
The problem is that it doesn’t go much beyond that, so you’re limited by SSH round-trip latency, and it’s a pain to parallelize (you end up either learning lots of tuning options or reaching for Mitogen). Fundamentally, though, you’re still SSHing to machines, when at scale you really want some kind of agent on each machine (although Ansible is a reasonable way to bootstrap something else).
When I managed a large fleet of EC2 instances running CentOS I had Ansible running locally on each machine via a cron job. I only used remote SSH to orchestrate deployments (stop service, upgrade, test, put back in service).
There is Mitogen [0], which helps a bit. Their website also explains some of the issues:
> Requiring minimal configuration changes, it updates Ansible’s slow and wasteful shell-centric implementation with pure-Python equivalents, invoked via highly efficient remote procedure calls to persistent interpreters tunnelled over SSH. No changes are required to target hosts.
Then of course Python itself is not very performant, and YAML is quite a mess too. With Ansible, you have global variables, group-level variables that can override them, host-level variables that can override those, role-level variables, play/playbook-level variables that can override those, and ad-hoc variables that can override all of the above. I am telling you, it can get incredibly messy and needlessly complicated quickly.
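To make the layering concrete, here is a rough Python sketch of that kind of override chain. It follows the simplified ordering described above, not Ansible's documented precedence list, which is longer and differs in detail. The layer names, `resolve`, `origin`, and the sample variables are all made up for illustration and are not Ansible APIs.

```python
from collections import ChainMap

# Layers from lowest to highest priority, following the simplified ordering
# in the comment above. These names are hypothetical, not Ansible internals.
LAYER_ORDER = ["global", "group", "host", "role", "play", "adhoc"]


def resolve(layers):
    """Merge variable layers so that later entries in LAYER_ORDER win."""
    # ChainMap searches its maps left to right, so the highest-priority
    # layer has to come first.
    ordered = [layers.get(name, {}) for name in reversed(LAYER_ORDER)]
    return dict(ChainMap(*ordered))


def origin(layers, key):
    """Return the name of the layer that supplies the effective value of key."""
    for name in reversed(LAYER_ORDER):
        if key in layers.get(name, {}):
            return name
    return None


# Made-up sample data: the same variable is set at several levels.
layers = {
    "global": {"http_port": 80, "log_level": "info"},
    "group": {"http_port": 8080},
    "host": {"log_level": "debug"},
    "adhoc": {"http_port": 9090},
}

effective = resolve(layers)
print(effective["http_port"], origin(layers, "http_port"))
print(effective["log_level"], origin(layers, "log_level"))
```

Even in this toy version you need a helper like `origin` to find out where a value came from, which is roughly the debugging experience once a real inventory has a few dozen variables spread over every level.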
As I said though, it's still the best we've got even if not optimal. So I think it's a good idea to implement it to at least have something.
We're just starting to implement it and we've only heard good things about it.