Semantics and provenance of configuration programming language μPuppet
Nowadays computing infrastructures have grown bigger in scale and more complex. Automated configuration management tools have taken the place of traditional approaches of configuration tasks, such as manual configuration and writing simple scripts, which become tedious, inefficient and error-prone to human mistakes. These tools typically contain their own specific configuration languages. In general, users write configuration specifications of the system and the tools compile them to configuration files that are deployed on each machine to change its configuration. Configuration specifications tend to be large and involve thousands of parameters and the relations between them. However automating configuration management does not prevent configuration errors that might cause system failures that are costly and time-consuming to remedy. Configuration languages normally support separation of concerns of different users, but there is limited work on access control and work flow to guarantee the correctness of the configurations when there are changes in the specifications, or the analysis tools for root causes for configuration errors. Configuration errors remains one of the dominant reasons for system failures (Oppenheimer et al., 2003). In this thesis, we argue that provenance techniques, a dynamic technique of tracking data history developed in the database field, is a natural solution to providing configuration management tools with the ability of analysing configuration errors and identifying their root causes in the configuration specifications. For this purpose, we first choose a popular configuration management tool Puppet and present μPuppet which models the operational semantics of a core subset of Puppet language. In addition, we prototype an interpreter and a parser for μPuppet and compare their validation results by using more 50 Puppet manifests against the real Puppet language. Based on the formal semantics of μPuppet, we formalise where-, expression- and dependency-provenance which track the original inputs of an output data value, the primitive operation procedure that derives an output data value and all the inputs on which a generated output depended respectively in the process of compiling the configuration specifications. Further more, we establish the static correctness properties of three forms of provenance. We prove that where- and expression-provenance satisfy these properties. We find the limitation of dependency-provenance we define that does not record the full dependency information for some unusual value-overriding semantics, which reveals the complexity of the Puppet language. We prove the partial correctness of dependency-provenance confined to a subset of μPuppet that excludes the unusual cases and propose possible solutions to deal with these unusual cases.