Testing Idempotence for Infrastructure as Code


Waldemar Hummer (1), Florian Rosenberg (2), Fábio Oliveira (2), Tamar Eilam (2)


(1) Distributed Systems Group, Vienna University of Technology, Austria
(2) IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

Middleware Conference 2013, Dec. 9-13, Beijing, China


  • Introduction
    • Background: DevOps, Infrastructure as Code, Chef
    • Motivation
  • Testing Approach
    • State Transition Graphs
    • Test Coverage Goals
  • Implementation
  • Evaluation
    • Experiments with ~300 public Chef scripts
  • Conclusion


Problem Context

  • Cloud Computing
    • New perspective on deployment and configuration of infrastructure
      • Previously: infrequent, tailor-made, mostly manual task
      • Today: very frequent, highly automated, repeatable deployments
    • RightScale Report [1]: 54% of surveyed professionals (338/625) adopt DevOps
  • DevOps - Unification of Conflicting Interests (Developers/Operators)
    • "release new features quickly" - vs - "keep production systems stable"
    • DevOps - trend towards continuous delivery
  • Infrastructure as Code (IaC)
    • Apply software engineering practices to create repeatable automations
    • Examples: Chef [2], Puppet [3], CFEngine, ...
  1. RightScale. State of the Cloud Report 2013, 2013. http://www.rightscale.com/lp/state-of-the-cloud-report.php
  2. OpsCode. Chef. http://www.opscode.com/chef/
  3. Puppet Labs. IT Automation Software for System Administrators. http://puppetlabs.com/

Background: Chef

  • Chef Cookbook = Collection of Recipes
    • Recipe defines a set of resources (e.g., files/directories, packages, services, ...)
Declarative Recipe:
directory "/webapps" do
  mode 0755
  action :create
end

package "tomcat6" do
  action :install
end

service "tomcat6" do
  action [ :start, :enable ]
end

Imperative Recipe:
bash "build php" do
  cwd "/tmp"
  code <<-EOF
    tar -zxf php.tar.gz
    cd php
    make && make install
  EOF
  not_if "which php"
end

  • Key Idea: Recipes Should be Repeatable and Idempotent
    • Only the first (successful) execution should have an effect
    • Mathematically: t(x) = t(t( ... t(x)))


Motivation

  • Automations Should be Robust and Repeatable
    • Should make the system converge [1] to a target state
    • Idempotence [2]: central concept in the execution model of IaC
  • Why is Idempotence Crucial?!
    • IaC automations are not transactionally safe
      • If a task is interrupted or fails → restart the entire automation
    • IaC automations are designed to be continuously re-executed
      • overwrite out-of-band changes
  • Simple Example of Non-Idempotence
    • Command "mkdir /webapps"
    • (Idempotent version: "mkdir -p /webapps")
  1. A. Couch and Y. Sun. On the algebraic structure of convergence. 14th International Workshop on Distributed Systems: Operations and Management (DSOM), 2003
  2. P. Helland. Idempotence is not a medical condition. Communications of the ACM 55(5), pp. 56-65, 2012
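
The mkdir example can be reproduced in a few lines of Ruby. This is a minimal sketch: `Dir.mkdir` stands in for the shell command "mkdir", `FileUtils.mkdir_p` for "mkdir -p", and the helper names `second_run_fails?` and `mkdir_demo` are made up for illustration.

```ruby
require "tmpdir"
require "fileutils"

# Runs the block twice; returns true if the second run raises a system error.
def second_run_fails?
  yield
  begin
    yield
    false
  rescue SystemCallError
    true
  end
end

# Compares "mkdir" and "mkdir -p" semantics inside a throwaway temp directory.
def mkdir_demo
  Dir.mktmpdir do |root|
    plain = File.join(root, "plain")
    safe  = File.join(root, "safe")
    {
      mkdir:   second_run_fails? { Dir.mkdir(plain) },       # like "mkdir"
      mkdir_p: second_run_fails? { FileUtils.mkdir_p(safe) } # like "mkdir -p"
    }
  end
end

p mkdir_demo
```

The second `Dir.mkdir` raises `Errno::EEXIST` because the directory already exists, which is exactly the failure a restarted automation would hit.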

Some Related Work

  • Fundamentals of Idempotence
    • Idempotence as the key prerequisite for convergence (Couch [1])
    • Idempotence for correctness and fault-tolerance (Ramalingam [2])
  • System Administration and Testing
    • Turing equivalence: how to reliably configure identical machines (Traugott [3])
    • Testing challenges under non-determinism, "non-ideal" conditions (Burgess [4])
  • Efficient and Reliable Middleware Provisioning
    • Efficient placement of Cloud infrastructure components (Giurgiu [5])
    • Stress testing of multi-tier middleware systems (Casale [6])
  1. A. Couch, Y. Sun. On the algebraic structure of convergence. 14th DSOM Workshop, 2003
  2. G. Ramalingam, K. Vaswani. Fault Tolerance via Idempotence. Princ. of Prog. Languages (POPL), 2013
  3. S. Traugott. Why order matters: Turing equivalence in automated systems administration. 16th LISA, 2002
  4. M. Burgess. Testable system administration. Communications of the ACM 54(3), 2011
  5. I. Giurgiu et al. Enabling efficient placement of virtual infrastructures in the cloud. 13th Middleware, 2012
  6. G. Casale et al. Automatic stress testing of multi-tier systems by dynamic bottleneck switch generation. 10th Middleware Conference, 2009

Testing Approach

IaC Execution Model

  • Model the Execution as a State Transition Graph (STG)
    • Directed Graph
      • Nodes = system states
      • Arrows = task executions (state transitions)
    • Pre-state: state before execution
    • Post-state: state after execution
  • Example: 3 Tasks, 1 Branch
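
As a rough illustration of this model, the sketch below stores an STG as a list of (pre-state, task, post-state) edges. The class name, the property strings, and the concrete graph (tasks a1 to a3, branching after a1) are hypothetical and chosen only to show the shape of the data.

```ruby
require "set"

# A minimal state transition graph: nodes are system states,
# edges are task executions leading from a pre-state to a post-state.
class StateTransitionGraph
  Edge = Struct.new(:pre_state, :task, :post_state)

  attr_reader :edges

  def initialize
    @edges = []
  end

  # Records that executing `task` in `pre_state` led to `post_state`.
  def add_transition(pre_state, task, post_state)
    @edges << Edge.new(pre_state, task, post_state)
    self
  end

  # All distinct states that occur as a pre-state or post-state.
  def states
    @edges.flat_map { |e| [e.pre_state, e.post_state] }.uniq
  end

  # Outgoing transitions of a state; more than one means a branch.
  def transitions_from(state)
    @edges.select { |e| e.pre_state == state }
  end
end

# Hypothetical graph with three tasks and one branch (after a1).
def example_stg
  s0 = Set[]
  s1 = Set["dir:/webapps"]
  s2 = Set["dir:/webapps", "pkg:tomcat6"]
  s3 = Set["dir:/webapps", "pkg:tomcat6", "svc:tomcat6"]
  s4 = Set["dir:/webapps", "svc:tomcat6"]
  StateTransitionGraph.new
    .add_transition(s0, "a1", s1)
    .add_transition(s1, "a2", s2)
    .add_transition(s2, "a3", s3)
    .add_transition(s1, "a3", s4) # branch: a2 is skipped
end
```

States are modelled as sets of property strings so that two executions reaching the same configuration map to the same node.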

Notion of Idempotence in the Context of IaC

  • Example Task Sequence
  • The Sequence is Idempotent Iff:
    • EITHER: no state changes on second execution, i.e., sc2 = ∅
    • OR: post2 is non-conflicting with post1, i.e., one state eventually leads to the other
      • example: post1 = "server starting", post2 = "server started"
  • Detailed Definition: See Paper
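
The two conditions can be written down as a small predicate. This is only a sketch of the slide-level definition, not the formal one from the paper: the `leads_to` relation (pairs of states where the first eventually turns into the second) is a hypothetical input, and treating identical post-states as non-conflicting is an assumption.

```ruby
require "set"

# Two post-states are treated as non-conflicting if they are equal
# (assumption) or if one is known to eventually lead to the other.
def non_conflicting?(post1, post2, leads_to)
  post1 == post2 ||
    leads_to.include?([post1, post2]) ||
    leads_to.include?([post2, post1])
end

# sc2:      state changes observed during the second execution
# post1/2:  post-states of the first and second execution
# leads_to: hypothetical list of [from, to] state pairs
def idempotent?(sc2:, post1:, post2:, leads_to: [])
  return true if sc2.empty? # EITHER: second execution changed nothing
  non_conflicting?(post1, post2, leads_to) # OR: post-states do not conflict
end
```

For the slide's example one would pass `leads_to: [["server starting", "server started"]]`, so a second run that only observes the server finishing its startup is not flagged.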

Testing of Automation Scripts (1)

  • Note: Differences to Traditional Software Testing
    • Input Parameters: not as important to test!
      • Input parameters: e.g., DB password, target install directory, etc.
      • More important:
        • order and repeated execution of tasks
        • environment (OS, hardware, peripherals, ...)
    • Symbolic Execution: hardly applicable!
      • Hard to capture all state changes and side effects
      • Hence: We target real execution in target environments

Testing of Automation Scripts (2)

  • Goal: Modify and "Perturb" the STG
    • exhibit non-idempotent behavior
  • Simulating Real-World Fault Situations, e.g.:
    • software package download fails (temporarily)
    • machine crashes during automation run
    • power outage
  • Example: Package Download Error
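
The effect of such a perturbation can be imitated with a toy in-memory model, sketched below. Everything here is hypothetical: the "system" is a Hash, tasks are lambdas, and `InjectedFault` simulates an interruption such as a failed download just before a given task runs. The real approach executes scripts in real environments.

```ruby
# Hypothetical in-memory model: tasks are named lambdas that mutate a Hash.
Task = Struct.new(:name, :action)

class InjectedFault < StandardError; end

# Runs all tasks in order. If `fail_at` names a task, the run is
# interrupted just before that task executes.
def run_automation(tasks, state, fail_at: nil)
  tasks.each do |task|
    raise InjectedFault, "simulated failure before #{task.name}" if task.name == fail_at
    task.action.call(state)
  end
  state
end

# Interrupts the first run at `fail_at`, then restarts the whole automation
# on the same state. Returns :ok, or the error raised by the restart.
def rerun_after_fault(tasks, fail_at)
  state = {}
  begin
    run_automation(tasks, state, fail_at: fail_at)
  rescue InjectedFault
    # expected: the perturbation interrupts the first run
  end
  run_automation(tasks, state)
  :ok
rescue StandardError => e
  e
end

# Two tasks; the first one mimics "mkdir" or "mkdir -p" depending on the flag.
def sample_tasks(idempotent_mkdir)
  mkdir = lambda do |s|
    raise "directory exists" if s[:dir] && !idempotent_mkdir
    s[:dir] = true
  end
  [
    Task.new("create_dir", mkdir),
    Task.new("install_pkg", ->(s) { s[:pkg] = true })
  ]
end
```

With `sample_tasks(false)` the restart fails in `create_dir`, because the interrupted first run already created the directory; with `sample_tasks(true)` the restart completes.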

STG-Based Test Generation (1)

  • Configurable Coverage Criteria
    • idemN ∈ {1, 2, ..., |A|}
      • maximum task sequence length to test for idempotence
    • repeatN ∈ {0, 1, 2, ...}
      • maximum number of times each task is repeated
    • restart ∈ {true, false}
      • whether to always restart the automation from the beginning
    • forcePre ∈ {true, false}
      • whether all possible pre-states should be covered for each task (if possible)
    • graph ∈ {transition, predicate, transition-pair, full sequence} [1]
      • which paths to follow in the final STG after applying previous criteria
  1. J. Offutt, S. Liu, A. Abdurazik, P. Ammann. Generating test data from state-based specifications, Software Testing, Verification and Reliability (STVR), 13(1), pp 25–53, 2003
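
To make the first three parameters concrete, here is one possible reading of them as a generator of task execution orders. This is an interpretation of the slide, not the paper's algorithm: each contiguous task sequence of length up to `idem_n` is executed once and then repeated `repeat_n` times, either on its own or, with `restart`, together with all tasks before it. `forcePre` and `graph` are not modelled.

```ruby
# tasks:    ordered task names, e.g. %w[a1 a2 a3]
# idem_n:   maximum length of the task sequence tested for idempotence
# repeat_n: how many times that sequence is repeated
# restart:  if true, every repetition starts again from the first task
def generate_test_cases(tasks, idem_n:, repeat_n:, restart:)
  cases = []
  (1..[idem_n, tasks.size].min).each do |len|
    (0..tasks.size - len).each do |start|
      stop = start + len # index just past the tested sequence
      repeated = restart ? tasks[0...stop] : tasks[start...stop]
      cases << tasks[0...stop] + repeated * repeat_n + tasks[stop..-1]
    end
  end
  cases.uniq
end

generate_test_cases(%w[a1 a2 a3], idem_n: 1, repeat_n: 1, restart: false).each do |tc|
  puts tc.join(" ")
end
```

Each returned array is one test case: the order in which tasks would be executed in a fresh environment.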

STG-Based Test Generation (2)

  • Construct the Modified STG
  • Derive Test Cases
    • Traverse the STG based on graph coverage criteria [1]
  1. J. Offutt, S. Liu, A. Abdurazik, P. Ammann. Generating test data from state-based specifications, Software Testing, Verification and Reliability (STVR), 13(1), pp 25–53, 2003
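
For the simplest criterion, "transition" coverage, deriving test cases amounts to finding paths from the start state that together use every edge. The sketch below is a naive depth-first version under that assumption; the graph encoding (a Hash from state to `[task, next_state]` pairs) is hypothetical, and the search can be exponential on large graphs.

```ruby
# graph: { state => [[task, next_state], ...] }
# Returns paths (lists of [state, task, next_state] edges) that start at
# `start` and together cover every edge reachable from it.
def transition_coverage_paths(graph, start)
  paths = []
  covered = {}
  walk = lambda do |state, path|
    # Within one path, never take the same edge twice (guarantees termination).
    unseen = (graph[state] || []).reject { |task, nxt| path.include?([state, task, nxt]) }
    if unseen.empty?
      # End of a path: keep it only if it covers at least one new edge.
      new_edges = path.reject { |edge| covered[edge] }
      unless new_edges.empty?
        paths << path
        new_edges.each { |edge| covered[edge] = true }
      end
    else
      unseen.each { |task, nxt| walk.call(nxt, path + [[state, task, nxt]]) }
    end
  end
  walk.call(start, [])
  paths
end
```

Each resulting path corresponds to one test case; the task names along the path give the execution order.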

Test Execution and Analysis

  • Execute Test Cases
    • Execute each test case in a clean, isolated environment
    • Implementation details on next slides ...
  • Capture State Changes
    • Intercept the task execution, measure pre-state and post-state
  • Result Analysis
    • Intra-Test-Case:
      • Compare repeated executions of tasks and task sequences
      • Identify cases of non-idempotence
    • Inter-Test-Case:
      • Compare the post-states across all tests in the test suite
      • Determine convergent state properties
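
Both analysis steps reduce to simple comparisons once state is captured. The following sketch assumes a hypothetical representation: each post-state is a Hash from property name to value, and each repeated execution is summarized by the list of state changes it caused. It ignores the "non-conflicting" case of the idempotence definition.

```ruby
# Inter-test-case analysis.
# post_states: one Hash per test case (property => observed value).
# Returns the properties that have the same value in every post-state.
def convergent_properties(post_states)
  return {} if post_states.empty?
  first, *rest = post_states
  first.select { |key, value| rest.all? { |s| s.key?(key) && s[key] == value } }
end

# Intra-test-case analysis.
# changes_per_run: state changes of the 1st, 2nd, 3rd, ... execution of the
# same task. Returns the 1-based numbers of repeated runs that still
# changed the state.
def non_idempotent_runs(changes_per_run)
  changes_per_run.each_with_index
                 .select { |changes, index| index > 0 && !changes.empty? }
                 .map { |_changes, index| index + 1 }
end
```

A property missing from `convergent_properties` is a hint that the final state depends on the execution order or on the injected repetitions.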


Implementation: ToASTER Framework

[Figure: ToASTER architecture. A Testing Host runs LXC test containers 'tc1', 'tc2', ..., 'tcN' alongside a prototype container 'proto' on a copy-on-write file system. Further components: Test Manager, Test Agent, Test Queue, Database, User Interface, Squid HTTP Proxy, Software Repositories, Automation Scripts, Automation Parameters. Edge labels: download, invoke, generate, initialize, execute & intercept, forward state/results, save test data, load data, start tests.]
  • Execute Test Cases in Virtual Machine Containers (LXC)
    • Capture state changes within the containers

Implementation: Capturing State Changes (1)

  • How to capture state changes in a Chef run?
Declarative Tasks:
directory "/webapps" do
  mode 0755
  action :create
end
  • Check if directory "/webapps" exists
    • before and after task execution
  • Straightforward!
Imperative Tasks:
bash "build php" do
  cwd "/tmp"
  code <<-EOF
    tar -zxf php.tar.gz
    cd php
    make && make install
  EOF
  not_if "which php"
end
  • Complex!
  • No information about potential state changes
    • external processes are invoked
    • hundreds of files get created
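
When a task does not declare what it changes, a generic fallback is to compare the file system before and after the task. The sketch below does this with a snapshot diff. It is a swapped-in, simplified technique for illustration (not the system-call interception used by the implementation), it skips hidden files, and the helper names are hypothetical.

```ruby
require "tmpdir"
require "fileutils"

# Maps every (non-hidden) path below `root` to [type, mode, size].
# Size is only recorded for regular files.
def snapshot(root)
  Dir.glob(File.join(root, "**", "*")).each_with_object({}) do |path, snap|
    stat = File.lstat(path)
    snap[path] = [stat.ftype, stat.mode, stat.file? ? stat.size : nil]
  end
end

# Diff of two snapshots: the state changes caused by a task.
def state_changes(pre, post)
  {
    created:  post.keys - pre.keys,
    deleted:  pre.keys - post.keys,
    modified: (pre.keys & post.keys).reject { |p| pre[p] == post[p] }
  }
end

# Captures pre-state and post-state around the given block (the "task").
def capture(root)
  pre = snapshot(root)
  yield
  state_changes(pre, snapshot(root))
end

Dir.mktmpdir do |root|
  target = File.join(root, "webapps")
  p capture(root) { FileUtils.mkdir_p(target) } # first run creates the directory
  p capture(root) { FileUtils.mkdir_p(target) } # second run changes nothing
end
```

Snapshotting a whole file system per task would be slow and would miss changes outside the file system, which is one reason to intercept system calls instead.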

Implementation: Capturing State Changes (2)

  • Use a patched version of strace [1]
  • Intercept all system calls, capture pre-state and post-state
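
To give an idea of what system-call interception yields, the sketch below extracts modified paths from trace lines in strace's usual `name(args) = result` format. The list of syscalls, the regular expression, and the sample lines are illustrative assumptions; the patch applied to strace is not described on the slide and is not reproduced here.

```ruby
# Syscalls treated as state-changing in this sketch (an illustrative subset).
MUTATING_SYSCALLS = %w[mkdir rmdir unlink rename chmod chown creat].freeze

# Matches lines such as: mkdir("/webapps", 0755) = 0
SYSCALL_LINE = /\A(\w+)\("([^"]*)".*\)\s+=\s+(-?\d+)/

# Returns the paths touched by successful state-changing syscalls,
# in order of first appearance.
def touched_paths(trace_lines)
  trace_lines.each_with_object([]) do |line, paths|
    match = SYSCALL_LINE.match(line)
    next unless match
    name, path, result = match.captures
    next unless MUTATING_SYSCALLS.include?(name)
    next if result.to_i < 0 # failed calls did not change anything
    paths << path unless paths.include?(path)
  end
end
```

Only the paths reported this way need to be inspected before and after the task, which keeps state capture cheap even when a script touches hundreds of files.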

ToASTER: Implementation and Usage

  • Implemented as a Ruby gem
  • Based on docker.io (creation of LXC containers) and various other tools
  • Main Commands
    • toaster setup
      • set up testing host
    • toaster proto ubuntu1 ubuntu
      • initialize prototype container
    • toaster web -d
      • start the Web UI
    • toaster spawn lxc1 ubuntu1
      • create test container

ToASTER: Implementation and Usage (2)

  • Main Commands (cont'd)
    • toaster download COOKBOOK PROTOTYPE
      • Download Chef COOKBOOK from opscode.com into PROTOTYPE.
    • toaster runchef NAME IP CHEF_RUNLIST CHEF_JSON
      • Run Chef recipe in container with NAME and IP.

    • toaster testinit CHEF_NODE RECIPES PROTOTYPE
      • Initialize a test suite for the given CHEF_NODE and RECIPES

  • All commands also available via the Web UI!



Evaluation

  • Chef Scripts Available from OpsCode Community


  • Based on Real-World Chef Scripts Available From OpsCode
    • Tested roughly 300 automations ("cookbooks")
      • apache2, tomcat6, mysql, rabbitmq, php, drupal, git, java, hadoop, zabbix, ...
    • Goal: Identify faulty and in particular non-idempotent automations
  • Aggregated Test Statistics

    Tested Cookbooks                        298
    Number of Test Cases                    3,671
    Number of Tasks (min / max / total)     1 / 103 / 4,112
    Total Task Executions                   187,986
    Captured State Changes                  164,117
    Overall Net Execution Time              25.7 CPU-days
    Overall Gross Execution Time            44.07 CPU-days
    Total Non-Idempotent Tasks              263 (6.4%)
    Cookbooks With Non-Idempotent Tasks     92 (30.9%)

An Interesting Case: Chef Cookbook tomcat6

  • List of Tasks (Excerpt):
    • Second execution of a21 fails
    • Implicit dependency introduced, which "rescues" some test cases

Evaluation: Result Details (1)

  • Idempotence for Different Task Types
    • Imperative script tasks: > ⅓ of total non-idempotent tasks
    • Including "service" tasks: 60% of total non-idempotent tasks
  • Main Insight: Imperative script task types seem most problematic
    • coarse-grained and error-prone task logic
    • missing or inappropriate use of conditional statements (not_if, only_if)
    • service tasks often use custom script code for starting/restarting/enabling services

Evaluation: Result Details (2)

  • Idempotence for Different Cookbook Versions
    • i = latest version of cookbook
    • Looking (up to) 10 versions into the past
  • Positive Results for Evolution of Cookbooks
    • Some issues were fixed
    • No new issues have been introduced

Evaluation: Identified Bug in Chef

  • file resource has idempotence issue for certain attribute combinations
    • First execution produces incorrect result, second execution fails



Conclusion

  • Idempotence: Key Principle for IaC Automations
  • Testing Approach based on STGs
    • STGs are modified and perturbed to simulate real-world fault situations
  • Prototype Implementation "ToASTER"
    • Efficient test execution (using LXC containers), see demo @ Middleware'13
  • Comprehensive Evaluation based on Chef
    • Tested ~300 Chef cookbooks in different versions
    • > 30% of cookbooks contain non-idempotent tasks
    • Identified a bug in the Chef implementation itself
  • Future Work
    • Distributed automations with cross-node dependencies
    • Apply approach to other frameworks, e.g., Puppet
    • Integrate parameter-based testing, extend systematic debugging/analysis