Who am I?
Office of Technology
Zuul
Ansible
Who am I?
Technical Committee
Developer Infrastructure Core Team
Former Foundation Board of Directors
PTL of shade project
What are we going to talk about?
What is Zuul?
- Multi-cloud, scalable, elastic CI/CD engine
- Validation of speculative future states
- Single-use VM build nodes - safely run tests that need root
- Multi-node builds
- Multi-repo projects
- Native support for gating configuration
Terminology
- Periodic: jobs run in response to a timer
- Post: jobs run after a change merges
- Check: jobs run when someone proposes a change
- Gate: jobs run between change approval and landing
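As a rough illustration, the four pipelines can be read as routing different events to different sets of jobs. The Python sketch below is a toy dispatcher. The event names are hypothetical (loosely modeled on Gerrit's `patchset-created` and `change-merged` events). In Zuul itself this mapping is pipeline configuration, not code.

```python
from typing import Optional

# Hypothetical event name -> pipeline. In Zuul each pipeline declares its own
# triggers in configuration; this dict only illustrates the idea.
PIPELINE_FOR_EVENT = {
    "timer": "periodic",          # a timer fired
    "change-merged": "post",      # a change has landed
    "patchset-created": "check",  # someone proposed (or updated) a change
    "change-approved": "gate",    # a reviewer approved the change
}


def pipeline_for(event_kind: str) -> Optional[str]:
    """Return the pipeline an event enqueues into, or None if it triggers nothing."""
    return PIPELINE_FOR_EVENT.get(event_kind)
```

For example, `pipeline_for("change-merged")` gives `"post"`, while an event with no configured pipeline gives `None`.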
Why?
The original OpenStack use case
- Fully automated gated commits
- Full end-to-end integration tests from scratch for every commit
- Massive scale
OpenStack Scale by the numbers
- 2 KJPH (kilo-jobs / hour)
- 2376 arbitrary developers
- 1474 git repositories
- 11727 Jobs
- Merge 10k Changes / Month
For comparison, over its entire lifetime Ansible has received 13171 PRs (changes),
merged 8190 of them, and accumulated 37788 commits
Multi Repository Speculative Execution
- Zuul constructs speculative states as-if a change were merged
- Tests future states without landing those changes first
- The as-if state can span multiple repos
- In the Gate pipeline, approved changes are put into a virtual
serial queue, then tested in parallel, each as-if every change
ahead of it in the queue had already landed
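A minimal sketch of the gate queue idea, assuming a hypothetical `Change` model with only an id and a repo name (the repo names in the usage lines are illustrative). Each change is tested against every change ahead of it plus itself, grouped per repo, so the as-if state can cover several repos at once. All of these states can be built and tested in parallel.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    """A proposed change to one repository (hypothetical model)."""
    id: str
    repo: str


def speculative_states(queue: List[Change]) -> Dict[str, Dict[str, List[str]]]:
    """For each queued change, map repo -> changes applied on top of that repo's
    branch tip when testing it: every change ahead of it, plus itself."""
    states: Dict[str, Dict[str, List[str]]] = {}
    applied: Dict[str, List[str]] = {}
    for change in queue:
        applied.setdefault(change.repo, []).append(change.id)
        # Snapshot each repo's list so later changes don't leak into earlier states.
        states[change.id] = {repo: list(ids) for repo, ids in applied.items()}
    return states


queue = [Change("A", "nova"), Change("B", "neutron"), Change("C", "nova")]
states = speculative_states(queue)
# states["C"] == {"nova": ["A", "C"], "neutron": ["B"]}
```

If a change ahead fails, the changes behind it have to be retested against states that no longer include it. In this sketch that is just `speculative_states([c for c in queue if c.id != "A"])`.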
Multi-Repo Dependencies
commit 30039f04109efa2263aba6eb302a29bd8d5e8f53
Author: Monty Taylor
Date: Mon Sep 26 14:14:15 2016 -0500
Add simple field for disabled flavors
When we were equalizing out the old silly names from the new pretty
ones, we missed disabled.
Change-Id: I4cbf5f7c27f640c566460c18951ab9030aae84e4
Depends-On: I523e0ab6e376f5ff6205b1cc1748aa6d546919cb
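The dependency is declared purely in the commit message footer. A minimal Python sketch of extracting it, assuming the `Footer-Name: value` lines shown in the commit above. It only handles the Change-Id form used there and collects every `Depends-On` line it finds.

```python
import re
from typing import List, Optional, Tuple

# One footer per line, e.g. "Depends-On: I523e0ab6...". Only the Change-Id
# form shown in the commit above is handled here.
FOOTER_RE = re.compile(r"^(Change-Id|Depends-On):[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def parse_footers(message: str) -> Tuple[Optional[str], List[str]]:
    """Return (this change's Change-Id, Change-Ids it depends on)."""
    change_id: Optional[str] = None
    depends_on: List[str] = []
    for key, value in FOOTER_RE.findall(message):
        if key == "Change-Id":
            change_id = value
        else:
            depends_on.append(value)
    return change_id, depends_on
```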
Multi-Repo Dependencies - an example
- shade library runs functional tests against OpenStack
- a neutron change breaks shade tests (more later)
- a neutron fix is proposed
- shade change Depends-On the proposed neutron fix
- shade tests are run as-if the neutron fix has landed
- shade change cannot land until the neutron fix lands
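The two rules above can be sketched in Python as follows. The change names (`shade-fix`, `neutron-fix`) and the `depends_on` map are hypothetical. To test a change, apply every unmerged change it depends on first (transitively). To land it, everything it depends on must already have landed.

```python
from typing import Dict, List, Set


def changes_to_apply(change: str, depends_on: Dict[str, List[str]], merged: Set[str]) -> List[str]:
    """Unmerged changes to apply for a speculative test, dependencies first."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(c: str) -> None:
        if c in seen or c in merged:
            return
        seen.add(c)  # mark before recursing so a dependency cycle terminates
        for dep in depends_on.get(c, []):
            visit(dep)
        order.append(c)

    visit(change)
    return order


def can_merge(change: str, depends_on: Dict[str, List[str]], merged: Set[str]) -> bool:
    """A change may land only once everything it depends on has landed."""
    return all(dep in merged for dep in depends_on.get(change, []))


deps = {"shade-fix": ["neutron-fix"]}
# changes_to_apply("shade-fix", deps, set()) == ["neutron-fix", "shade-fix"]
# can_merge("shade-fix", deps, set()) is False
```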
Isn't this supposed to prevent neutron from breaking shade?
- shade and neutron do not share a gating relationship
- shade has, by choice, a test that runs against OpenStack master
- Such a test is 'risky' for shade devs
- shade devs desire the risk - can work around breaks, or submit bugs
Not Specific to OpenStack
- "Gate" and "Check" are configurations
- 50+ OpenStack Vendors use Zuul for "3rd Party CI" of drivers
- BMW and Wikimedia use Zuul
Pluggable
- Triggers
- Reporters
- Node Providers
- Execution content
Zuul v2
- In production for OpenStack for 4 years
- What most people run
- Triggers: Gerrit, Periodic
- Reporters: Gerrit, Email, MySQL
- Node Providers: Elastic OpenStack Nodepool, Static servers
- Jobs executed by Jenkins
Not for lack of trying
- OpenStack started on Jenkins (actually, on Hudson, remember that?)
- We funded the Jenkins JClouds Plugin
- We did deep dev in the Gerrit Trigger Plugin
- Maintained the SCP artifact plugin (added console log support)
- Added the 0mq notification plugin
- Added Gearman Worker plugin - allowed us to grow to 8 Masters/1000 slaves
- We wrote Jenkins Job Builder
Before I answer that ...
The world is a better place with Jenkins existing
Jenkins Problems
Security
- don't run the web UI on the internet
- ssh slave plugin - it's possible for a slave to run arbitrary
code on the master
Stability
- almost every Jenkins upgrade has broken us
Scalability
- Jenkins has global mutexes, especially in plugins
- An extra-large cloud server could handle ~100 concurrent jobs
- We ran 8 Jenkins Masters with slaves sharded across them
Overkill
- we only used it as a remote shell execution engine
We know a better engine for remote execution
Zuul v3
Test things like you deploy them
- Intended for broad use
- Jobs written in and executed with ansible
- GitHub Support
- Multi-node build clusters as first class resource
- Self-testing tests
- Multi-Tenant
- In-repo config
- Fully support Bare Metal, VMs and Containers
- AWS, GCE, MacStadium and Static Node Provider Support
- Docker and Kubernetes Build Resources
Ansible execution
Merge the mergers and the launchers
- Constructs relevant repo states and inventory
- Push repos to test nodes
- Execute playbook with inventory
- Playbook can be defined centrally or in each repo
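Before the playbook runs, the executor needs an inventory describing the nodes. A minimal Python sketch of building one in Ansible's YAML-inventory structure (`all` → `hosts` → per-host vars). The node names and addresses are hypothetical, and the output is JSON, which Ansible's YAML inventory plugin should also accept. This is not Zuul's actual inventory generator, which includes more variables.

```python
import json
from typing import Dict


def build_inventory(nodes: Dict[str, str]) -> dict:
    """Turn a node name -> IP address map into Ansible's YAML-inventory structure."""
    hosts = {name: {"ansible_host": ip} for name, ip in nodes.items()}
    return {"all": {"hosts": hosts}}


# Hypothetical nodes; addresses come from the documentation range 203.0.113.0/24.
inventory = build_inventory({"controller": "203.0.113.10", "db": "203.0.113.11"})
inventory_text = json.dumps(inventory, indent=2)
print(inventory_text)
```

Saved to a file, this could be passed to Ansible as `ansible-playbook -i inventory.json playbook.yaml`.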
In repo config
- Minimal config at start: little more than "where are my repos"
- 'trusted' config repos
- config per-repo
- At start, Zuul asks the fleet of merger-launchers to calculate config
Test tests before you land them
- Config is in git. Zuul does multi-repo git depends.
- v3 adds in-tree test definitions
- Speculative state includes the speculative state of the test jobs
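A minimal Python sketch of what "the speculative state includes the test jobs" implies, assuming a hypothetical model where a change's config edits map job names to new definitions (or `None` to delete a job). The jobs used to test a change are the branch-tip jobs with the edits of every change ahead of it, and its own, applied in order. A change that modifies its own job is therefore tested with the modified job before it lands.

```python
from typing import Dict, List, Optional

JobDef = Dict[str, str]
# Hypothetical: a change's config edits, job name -> new definition (None deletes it).
ConfigEdit = Dict[str, Optional[JobDef]]


def speculative_jobs(branch_tip: Dict[str, JobDef], edits: List[ConfigEdit]) -> Dict[str, JobDef]:
    """Job definitions as-if every change in `edits` had merged, in order."""
    jobs = dict(branch_tip)  # copy, so the branch-tip config is never mutated
    for edit in edits:
        for name, definition in edit.items():
            if definition is None:
                jobs.pop(name, None)
            else:
                jobs[name] = definition
    return jobs


tip = {"unit": {"run": "playbooks/unit.yaml"}}
proposed = {"unit": {"run": "playbooks/unit-v2.yaml"}, "docs": {"run": "playbooks/docs.yaml"}}
jobs = speculative_jobs(tip, [proposed])
```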
First class multi-node support
- job:
    name: ursula
    nodes:
      - name: controller
        image: fedora-26
      - name: db
        image: rhel-7
Active node requests
- Zuul requests named resources
- resources are created/checked out of pool
- at finish, checked back in
- Single-use environments delete on check in
- Multi-use environments become available for subsequent jobs
- Can request non-node resources, such as a Kubernetes cluster
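The check-out/check-in life cycle above can be sketched as a toy in-memory pool in Python. `Pool`, `Node`, the label names and the naming scheme are all hypothetical; this is not Nodepool's API. Single-use nodes are deleted when checked back in. Multi-use nodes go back into the pool and can be handed to a later request with the same label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    label: str
    single_use: bool


class Pool:
    """Hypothetical in-memory pool illustrating node check-out and check-in."""

    def __init__(self) -> None:
        self.ready: List[Node] = []    # multi-use nodes waiting for a new job
        self.deleted: List[str] = []   # names of single-use nodes torn down
        self._counter = 0

    def _create(self, label: str, single_use: bool) -> Node:
        self._counter += 1
        return Node(f"{label}-{self._counter}", label, single_use)

    def request(self, labels: List[str], single_use: bool = True) -> List[Node]:
        """Check out one node per label, reusing a ready multi-use node if possible."""
        nodes = []
        for label in labels:
            match: Optional[Node] = None
            if not single_use:
                match = next((n for n in self.ready if n.label == label), None)
            if match is not None:
                self.ready.remove(match)
            else:
                match = self._create(label, single_use)
            nodes.append(match)
        return nodes

    def release(self, nodes: List[Node]) -> None:
        """Check nodes back in: delete single-use ones, keep multi-use ones."""
        for node in nodes:
            if node.single_use:
                self.deleted.append(node.name)
            else:
                self.ready.append(node)
```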
Status
Focus
- OpenStack next week, and the hard problems that brings
- Extra-hard use cases are handled. So are simple ones, but Zuul is
complex to run if you only have the simple use cases
Next up
- Get it ready for Ansible project
- Making it truly suitable for infra teams outside OpenStack to run
- Making the easy tasks simple