Migrating from Puppet / Mcollective to Ansible: Some facts

For the last month and a half I have been working on the Cloudways migration from a Puppet / Mcollective environment to Ansible.

We are far from done, but I wanted to share some facts and data that we have gathered along the way. Once the migration is over, I hope to be able to give a more complete picture.

WHY?

In one word: simplicity. The Puppet / Mcollective combo has worked mostly fine for us since our cloud platform's inception, but as the platform has grown (close to 2k servers now) and gained features, the rising complexity was hampering the business.

We have been using a masterless Puppet approach, orchestrating the whole network via Mcollective applications (to deploy servers, to patch systems, to create console features ...).

With Ansible, we have the two pieces (configuration + orchestration) in one, which in itself simplifies things a lot. Additionally, the very straightforward inventory list + SSH approach to network control offers increased security for our use case.

The linear task execution in Ansible was also a huge improvement for us compared to Puppet's somewhat messy unordered approach (sure, there are require, before ...).

Last but not least, Ansible is much easier to code than the Puppet + Mcollective stuff. Our Ansible code base is much leaner, and we have been able to consolidate what we previously had in Puppet manifests (DSL) and Mcollective applications (Ruby) into a single tool and language.

CAVEATS

Although Ansible looks pretty good for us so far, I don't mean to say it is for everyone.

Here are a couple of things to consider when evaluating which configuration/orchestration solution to use:

  • In our scenario, most of the network interactions we do are one-to-one (from our Command & Control server to one specific server in the network). In this context, Ansible is great. If you need to run lots of operations on large numbers of servers at the same time, Salt (ZeroMQ) or Puppet + Mcollective will be a much better approach.
  • Even with one-to-one operations, Ansible is slower than Puppet + Mcollective in most cases. Specifically, if your tasks deploy a large number of files, things can get very slow (we are still looking for workarounds).
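
A rough way to see why one-to-many runs hurt: Ansible reaches hosts over SSH through a limited pool of parallel workers (the forks setting, 5 by default), so wall time grows with the number of hosts. The sketch below is a back-of-the-envelope model only. It assumes every host takes the same time and is handled in waves, the function name is made up for illustration, and the one second per host used in the example is hypothetical, not something we measured.

```python
import math


def estimated_wall_time(hosts, forks, seconds_per_host):
    """Rough estimate: hosts are handled in waves of at most `forks` at a time.

    Assumes every host takes the same time, which a real network will not.
    """
    if hosts < 0 or forks < 1:
        raise ValueError("hosts must be >= 0 and forks must be >= 1")
    waves = math.ceil(hosts / forks)
    return waves * seconds_per_host


if __name__ == "__main__":
    # Hypothetical figure: 1 second per host, chosen only for illustration.
    for forks in (5, 50, 200):
        print(forks, estimated_wall_time(1000, forks, 1.0))
```

Raising forks shrinks the number of waves, but every host still costs an SSH round trip, which is probably where a message bus like the one behind Mcollective keeps its edge on network-wide operations.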

NUMBERS

Finally, to give some context to the above, here are some performance numbers from running different types of commands with Ansible vs Puppet + Mcollective over a real network of around 1000 servers:

mco ping - 5.952s answer from all servers
ansible -m ping - 58.961s answer from all servers

mco inventory [specific server] - 1.867s
ansible setup [specific server] - 3.415s

mco update [specific server] - 4.438s
ansible -m apt [specific server] - 3.148s

mco package install [package] [specific server] - 3.574s
ansible -m apt [package] [specific server] - 3.680s
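
To make the comparison easier to read, the timings above can be turned into ratios. This is just a small sketch: the labels and the helper name are made up for illustration, and only the numbers come from the measurements listed above.

```python
# Timings in seconds, copied from the measurements above: (mco, ansible).
TIMINGS = {
    "ping, all servers": (5.952, 58.961),
    "inventory / setup, one server": (1.867, 3.415),
    "update, one server": (4.438, 3.148),
    "package install, one server": (3.574, 3.680),
}


def ansible_to_mco_ratio(mco_seconds, ansible_seconds):
    """Return how many times longer the Ansible run took than the mco run."""
    return ansible_seconds / mco_seconds


if __name__ == "__main__":
    for label, (mco, ansible) in TIMINGS.items():
        ratio = ansible_to_mco_ratio(mco, ansible)
        print(f"{label}: mco {mco:.3f}s, ansible {ansible:.3f}s, ratio {ratio:.2f}x")
```

A ratio above 1 means Ansible was slower. The full-network ping comes out at roughly 10x, while the single-server apt update is the one case here where Ansible was faster.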

MORE TO COME

As I said, these are some first thoughts on the migration path from Puppet + Mcollective to Ansible. Hopefully we will complete the process within the next month and be in a better position to do a final, more comprehensive review. Hope this is useful in the meantime!