Optimizing Ansible Playbooks with Asynchronous Tasks and Parallelism

Priyanshu Bhatt
8 min read · Apr 19, 2023


Ansible is a great tool for automating systems, but playbooks can become slow when they contain long-running tasks. By default, the controller waits for each task to complete before starting the next one, so when several time-consuming tasks follow one another, the total run time adds up quickly. If the tasks are independent of one another, as is often the case with batch processing and backup jobs, waiting for each of them only wastes the controller node's time. Ansible does work in parallel by default, meaning it executes each task on multiple hosts concurrently. However, the number of hosts it works on at once is limited by the forks value set in the Ansible configuration. This post goes over what asynchronous and parallel task execution is, and how to write a playbook that saves you time.

To learn Ansible from scratch, check out the Basics of Ansible blog before starting with the hands-on.

Setting Up The Lab

In your lab, you need 1 controller node and 2 managed nodes for this hands-on. You can use AWS EC2 instances, or you can set up virtual machines on your workstation using your favorite VM provider. For this demonstration I am using AWS instances.

After the lab setup is done, check the connection to the hosts by running the ping module against each of them from the controller node:

ansible all -m ping

Figure: Ansible master-slave setup in AWS

Asynchronous Task Execution

Suppose you want to copy a large number of files to multiple host systems. Doing this synchronously costs time, because the controller waits, with its connection held open, until the copy has finished before it does anything else. Or think about backing up systems every 5 hours using Ansible. If a playbook that backs up multiple systems waits for each backup to finish, there will be a long delay, which can impact the underlying applications. Some tasks even take so long that the SSH connection times out. To handle this we can use asynchronous execution in Ansible, which provides a powerful way of managing large-scale infrastructure and performing routine maintenance tasks in a timely and efficient manner.

Alright, now that we’ve covered the theory, let’s roll up our sleeves and dive into the practical side of things! Don’t worry, I’ll guide you through each step and we’ll tackle this together.

Step 1: Add all the IPs to a group, here named instance. (You can do this without putting them in a group, but a group is more readable.)

ansible instance --list-hosts
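For reference, here is a minimal sketch of what such a group could look like in a YAML inventory file. The file name inventory.yml and the second address are my own placeholders (192.0.2.10 is a documentation-only address), so replace them with your real values.

```yaml
# inventory.yml (hypothetical file name)
all:
  children:
    instance:            # group name used by the playbooks in this post
      hosts:
        65.0.81.41:      # host that appears later in this post
        192.0.2.10:      # placeholder address, replace with your second node
```

If you keep the inventory in a separate file like this, pass it with -i, for example: ansible -i inventory.yml instance --list-hosts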

Step 2: netstat is a Linux command used to check network connections and their associated statistics. Whenever a connection is made to a remote host, we can view the type of connection, its state, and the IP of the foreign host using this utility. Here it is used to check when the Ansible connections to the managed nodes are established.

netstat -net -c

Step 3: Write a dummy playbook to simulate the backup environment using the sleep command. This is a simple playbook without the use of async:

- hosts: instance
  tasks:

    - name: "This is sleep Task 1"
      command: sleep 100

    - name: "This is sleep Task 2"
      command: sleep 10

When we run this playbook, execution blocks at task 1: by default Ansible works synchronously, holding the connection open until the command completes on the remote hosts, and only then moves to the second task. Task 1 (sleep 100) is started on the managed nodes, which can be seen in the netstat output, and once it has completed on every host Ansible moves on to task 2. Parallelism means running a task on several managed nodes at the same time. How many hosts Ansible works on at once is controlled by the forks setting in the ansible.cfg file (5 by default), so with two hosts both run task 1 together.

Step 4: When we introduce async in the playbook, we can also use the poll keyword. In simple terms, async sets the maximum time, in seconds, that the task is allowed to run; if the task is not complete within that period, Ansible reports an error. (Without async, Ansible simply waits for as long as the task takes.) poll sets how often, in seconds, Ansible checks the status of the task. Instead of holding the connection open for the whole run, Ansible starts the task and then checks after every poll interval whether it has completed, until either the task finishes or the maximum async time is reached.
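As a baseline, here is a small sketch of the case where everything fits: the task needs 10 seconds, async allows 30, and Ansible checks every 5 seconds. The numbers are illustrative values I picked, not recommendations.

```yaml
- hosts: instance
  tasks:
    - name: "Sleep task that finishes inside its async window"
      command: sleep 10
      async: 30   # allow up to 30 seconds of run time
      poll: 5     # check the job status every 5 seconds
```

With a positive poll value the play still waits for this task before moving on; the difference is that Ansible checks in periodically instead of keeping one connection open for the whole run.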

Let's see this by using the same playbook with the async and poll keywords in different scenarios:

1. If the async time is set to less than the time of the sleep:

  - name: "This is sleep Task 1"
    command: sleep 10
    async: 5
    poll: 2

When this runs, it throws an error, because the execution time (10 seconds) is more than the maximum time allowed by async (5 seconds). Ansible reports the failure purely because the task did not finish within the async limit.

2. To run the task in a truly asynchronous way, we set poll to 0. This tells Ansible not to wait for the task to complete and to move on to the next task immediately. The async value still acts as a time limit for the background job, so it should be larger than the expected run time.

- hosts: 65.0.81.41
  tasks:

    - name: "This is sleep Task 1"
      command: sleep 100
      async: 120   # must exceed the run time, or the job times out
      poll: 0

    - name: "This is sleep Task 2"
      command: sleep 100
      async: 120
      poll: 0

While the playbook runs, the netstat output shows in real time when the connection is established, and the ps -aux command on the remote host shows the tasks running in the background. Ansible reports the task as complete right away because poll is set to 0, but the task is still running in the background on the remote host. One use case for this is backup jobs.

But it can happen that the task never completes even though the controller reported success. For this scenario there is the async_status module: the task still runs asynchronously, and later in the play async_status checks whether the job has completed, retrying a set number of times.

Using async_status

Every task run with the async keyword gets an async job ID, which can be used later in the play to act on the job. async_status is an Ansible module that returns the status of a task started with async.

- hosts: 65.0.81.41
  tasks:

    - name: "This is sleep Task 1"
      command: sleep 10
      async: 30
      poll: 0
      register: x

    - async_status:
        jid: "{{ x.ansible_job_id }}"
      register: status
      until: status.finished
      retries: 20

    - debug:
        var: status

This playbook registers the output of the first task in the variable x, from which the value of ansible_job_id is read and passed to the jid parameter of async_status. This job ID is then used to track the task. Ansible looks up the job and stores its status in the variable status, which has a key called finished that takes two values, 0 (not completed) and 1 (completed).

Then we use until to keep checking that job ID until the finished status is 1 (i.e. completed). As retries is set to 20, polling is attempted up to 20 times, with Ansible's default delay of 5 seconds between attempts. If finished is still 0 (i.e. not completed) after 20 tries, the polling stops and the task fails with an error.
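The same idea extends to several background jobs at once. The sketch below is an assumption-based variation of the playbook above: it starts two sleep jobs with a loop, then waits for each of them with async_status. The variable names jobs, job, and job_status and the durations are placeholders I chose for illustration.

```yaml
- hosts: instance
  tasks:
    - name: "Start two sleep jobs in the background"
      command: "sleep {{ item }}"
      loop: [10, 15]          # illustrative durations in seconds
      async: 60               # time limit for each background job
      poll: 0                 # do not wait, move on immediately
      register: jobs

    - name: "Wait for every background job"
      async_status:
        jid: "{{ job.ansible_job_id }}"
      loop: "{{ jobs.results }}"   # one entry per job started above
      loop_control:
        loop_var: job
      register: job_status
      until: job_status.finished
      retries: 20
      delay: 5
```

Because both jobs are started before any waiting happens, the total wait should be close to the longest job rather than the sum of the two.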

Figure: async_status output for two slave nodes

Here it will keep on polling the status of the task up to the limit given.

If the job finishes within the number of retries, it shows the status of the task.

But if the task takes longer than the retries allow, it throws an error.

So it's important to know what kind of workload we are running and to architect the playbook accordingly. If we have a backup task and we know it takes a fixed amount of time, then set the total retry time (retries multiplied by the delay between them) to always be more than the completion time.
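To make that budgeting concrete, here is a rough sketch for a job that is assumed to take about 60 seconds. The sleep stands in for a real backup command, and the names backup_job and backup_status as well as all the numbers are illustrative assumptions.

```yaml
- hosts: instance
  tasks:
    - name: "Simulated backup, expected to take about 60 seconds"
      command: sleep 60
      async: 120        # hard limit, comfortably above the expected run time
      poll: 0
      register: backup_job

    - name: "Wait for the backup to finish"
      async_status:
        jid: "{{ backup_job.ansible_job_id }}"
      register: backup_status
      until: backup_status.finished
      retries: 18       # 18 checks x 5 seconds = roughly 90 seconds of waiting
      delay: 5          # seconds between checks
```

The point of the design is that both limits sit above the expected run time: async keeps the job itself from being cut off, and retries multiplied by delay keeps the status check from giving up early.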

Batch Processing Using Ansible

Ansible runs in parallel by default: the forks setting controls how many hosts it works on at the same time, and its default value is 5. It is defined with the forks keyword in the ansible.cfg file, and we can change it by overriding the value in that configuration file.
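If you are unsure which forks value a run is actually using, one way to check (a small sketch, assuming the instance group from earlier) is to print the ansible_forks magic variable from inside a play:

```yaml
- hosts: instance
  gather_facts: false
  tasks:
    - name: "Show how many forks this run is allowed to use"
      debug:
        var: ansible_forks   # magic variable holding the forks limit for this run
      run_once: true         # print it once instead of once per host
```

The value reflects whatever was set in the configuration file or passed with the -f option of ansible-playbook.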

But sometimes we need to run the tasks in batches, for example in rolling updates. For this we can use the serial keyword in the playbook, e.g. first run all the tasks on 2 managed nodes and then on the next 3, or first update 20% of the nodes and then the remaining 80%. I don't have that many managed nodes here, but in large infrastructures serial can be used in the playbook for batch processing.

- hosts: instance
  serial:
    - 2
    - 3
  tasks:

    - name: "This is sleep Task 1"
      command: sleep 10
      async: 30
      poll: 0
      register: x

    - name: "This is sleep Task 2"
      command: sleep 10
      async: 30
      poll: 0

Output: the output here is for serial set to 1. In this case, all the tasks run on the first IP and then on the second.
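The percentage form of serial mentioned earlier could look roughly like the sketch below. It assumes a larger inventory than my two-node lab, and the task is only a stand-in for a real update step.

```yaml
- hosts: instance
  serial:
    - "20%"    # first batch: 20% of the hosts in the play
    - "80%"    # second batch: 80% of the hosts in the play
  tasks:
    - name: "Simulated update step"
      command: sleep 10
```

However small the percentage works out to be, each batch always contains at least one host, so on a tiny inventory this behaves much like serial set to 1.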

Conclusion

From all the points we discussed, the most important takeaway is this: async and parallelism in Ansible are not just simple features. They take your IT automation game to a new level, where you get things done while minimizing processing time. Before implementing them in your workflow, it's important to know what your requirements are and which values for forks, async, poll, retries, and the other keywords you need to set. You can even automate the infrastructure required for the slaves using Terraform.

Thank you for Reading!
