What is Zuul?
- Multi-cloud, scalable, elastic CI/CD engine
- Validation of speculative future states
- Test things like you deploy them
- Single-use VM build nodes - safely run tests that need root
- Full support for bare metal, VMs and containers
- Multi-node builds
- Multi-repo projects
- Native support for gating configuration
Terminology
- Periodic: jobs run in response to a timer
- Post: jobs run after a change
- Check: jobs run when someone proposes a change
- Gate: jobs run between change approval and landing
Why - the original OpenStack use case
- Fully automated gated commits
- Full end-to-end integration tests from scratch for every commit
- Massive scale
OpenStack Scale by the numbers
- 1 KJPH (kilo-jobs / hour)
- 2500 arbitrary developers
- 11727 Jobs
- Merge 10k Changes / 42 days
- For comparison: Ansible has _received_ 13171 PRs (changes), has
merged 8190 of them, and has 37788 commits in its entire lifetime
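A rough back-of-the-envelope reading of the figures above, as a minimal sketch. The daily rate is derived here rather than quoted, and it assumes merges are spread evenly across the 42-day window; the function names are illustrative only.

```python
# Figures quoted on the slide above.
CHANGES_MERGED = 10_000          # changes merged in the window
WINDOW_DAYS = 42                 # length of the window in days
ANSIBLE_MERGED_LIFETIME = 8190   # PRs Ansible has merged in its lifetime


def merges_per_day(changes: int, days: int) -> float:
    """Average merge rate, assuming an even spread over the window."""
    return changes / days


def days_to_merge(target: int, changes: int, days: int) -> float:
    """Days needed to merge `target` changes at the average rate."""
    return target / merges_per_day(changes, days)


rate = merges_per_day(CHANGES_MERGED, WINDOW_DAYS)
catch_up = days_to_merge(ANSIBLE_MERGED_LIFETIME, CHANGES_MERGED, WINDOW_DAYS)

print(f"{rate:.0f} merges/day")
print(f"{catch_up:.1f} days to merge Ansible's lifetime total")
```

Under that even-spread assumption, OpenStack merges as many changes in about five weeks as Ansible has merged in its entire lifetime.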
Multi Repository Speculative Execution
- Zuul constructs speculative states as if a change were merged
- Tests future states without landing those changes first
- The speculative state can span multiple repos
- In the Gate pipeline, changes are put into a virtual serial
queue, then tested in parallel, each as if every change in front
of it had already landed
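The queueing idea above can be sketched in a few lines. This is an illustration of the concept only, not Zuul's implementation: the `Change` type, the `speculative_states` function and the change numbers are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Change(NamedTuple):
    """A proposed change; repo names and numbers are made up for the example."""
    repo: str
    number: int


def speculative_states(queue: List[Change]) -> List[Dict[str, List[int]]]:
    """For each queued change, build the state its tests should assume.

    The state for a change contains that change plus every change ahead
    of it in the queue, grouped by repository, as if all had merged.
    """
    states = []
    for position, _ in enumerate(queue):
        state: Dict[str, List[int]] = {}
        for ahead in queue[: position + 1]:
            state.setdefault(ahead.repo, []).append(ahead.number)
        states.append(state)
    return states


# Three approved changes across two repos, in queue order.
queue = [Change("nova", 101), Change("neutron", 202), Change("nova", 103)]
states = speculative_states(queue)

# All three states are known up front, so their test runs can start in parallel.
for change, state in zip(queue, states):
    print(f"{change.repo}#{change.number} is tested against {state}")
```

Because every state is computed from queue order alone, none of the test runs has to wait for the change ahead of it to finish.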
Not Specific to OpenStack
- "Gate" and "Check" are merely configurations
- 50+ OpenStack Vendors use Zuul for "3rd Party CI" of drivers
- HP uses Zuul for both OpenStack and non-OpenStack products
- Wikimedia uses Zuul
Pluggable
- Triggers
- Reporters
- Node Providers
- Execution content (Ansible)
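One way to picture how these pluggable pieces fit together is the sketch below. It is a hypothetical model written for this outline, not Zuul's actual classes or API: `Pipeline`, its parameters and the event names are all assumptions.

```python
from typing import Callable, List, Set, Tuple


class Pipeline:
    """Hypothetical pipeline: a named combination of pluggable parts."""

    def __init__(
        self,
        name: str,
        trigger_events: Set[str],                 # events that start this pipeline
        node_provider: Callable[[], str],         # returns a node to run on
        run_job: Callable[[str, str], str],       # (node, change) -> result
        reporters: List[Callable[[str, str], None]],
    ) -> None:
        self.name = name
        self.trigger_events = trigger_events
        self.node_provider = node_provider
        self.run_job = run_job
        self.reporters = reporters

    def handle(self, event: str, change: str) -> bool:
        """Run the job if the event matches; return whether it ran."""
        if event not in self.trigger_events:
            return False
        node = self.node_provider()
        result = self.run_job(node, change)
        for report in self.reporters:
            report(change, result)
        return True


reports: List[Tuple[str, str]] = []

# "check" here is just one configuration of the generic parts.
check = Pipeline(
    name="check",
    trigger_events={"change-proposed"},
    node_provider=lambda: "single-use-node",
    run_job=lambda node, change: "SUCCESS",
    reporters=[lambda change, result: reports.append((change, result))],
)

check.handle("change-proposed", "change-1")
print(reports)
```

Swapping the trigger set, node provider or reporter list yields a different pipeline without touching the pipeline logic, which is the sense in which "Gate" and "Check" are merely configurations.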
Zuul v2
- In production for OpenStack for 4 years
- What most people run
- Triggers: Gerrit, Periodic
- Reporters: Gerrit, Email, MySQL
- Node Providers: OpenStack, Long-lived non-managed servers
- Jobs executed by Jenkins
Not for lack of trying
- OpenStack started on Jenkins (actually, on Hudson, remember that?)
- We funded the Jenkins Jclouds Plugin
- We did deep dev in the Gerrit Trigger Plugin
- We maintain the SCP artifact plugin (added console log support)
- We added the 0mq notification plugin
- We added the Gearman Worker plugin, which allowed us to grow to 8 Masters/1000 slaves
- We wrote Jenkins Job Builder
Jenkins Problems
Security
- Don't run the web UI on the internet
- SSH slave plugin - it's possible for a slave to run arbitrary
code on the master
Stability
- Almost every Jenkins upgrade has broken us
Scalability
- Jenkins has global mutexes, especially in plugins
- An extra-large cloud server could handle ~100 concurrent jobs
- We ran 8 Jenkins Masters with slaves sharded across them
Overkill
- We only used it as a remote shell execution engine
We know a better engine for remote execution
Zuul v3
- Intended for broad use
- Triggers: Gerrit, Periodic, GitHub
(maybe: Bitbucket, Stash, fedmsg, email)
- Reporters: Gerrit, Email, GitHub (maybe: Bitbucket, Stash, resultsdb)
- Node Providers: pre-existing servers, dynamic cloud slaves (OpenStack, AWS, GCE), Kubernetes clusters
- Jobs written in and executed with Ansible
- In-repo config
- Multi-node build clusters as first class resource
- Multi-Tenant
Focus
So far
- OpenStack, and the hard problems that brings
- The extra-hard cases are handled. So are the simple ones, but Zuul
is complex to run if you only have the simple use cases
Zuul v3
- Getting it ready for the Ansible project
- Making it truly suitable for teams other than OpenStack Infra to run
- Making the easy tasks simple
- Making Zuul the thing everyone WANTS to use
For More Information
- http://docs.openstack.org/infra/zuul/
- http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html
- freenode:#zuul
- https://post-office.corp.redhat.com/mailman/listinfo/zuul-discuss
- http://docs.openstack.org/infra/publications/zuul/#(1)