.. display in 68x24
.. display in 88x24
.. pygments yaml? (only file breaks (---) tinted)
.. slide on high level v3 changes
.. slide on nodepool

.. transition:: dissolve
   :duration: 0.4

Test Slide
==========
.. hidetitle::
.. ansi:: images/testslide.ans

Preshow
=======
.. hidetitle::
.. ansi:: images/cursor.ans images/cursor2.ans

Zuul
====
.. hidetitle::
.. ansi:: images/title.ans

This Talk
=========

* In git: http://git.inaugust.com/cgit/inaugust.com/

.. code:: bash

   git clone http://git.inaugust.com/cgit/inaugust.com
   cd inaugust.com/src/zuulv3

* Then:

.. code:: bash

   cat overview.rst

* Or:

.. code:: bash

   pip install presentty
   presentty overview.rst

Red Hat
=======
.. hidetitle::

.. container:: handout

   I work for

.. ansi:: images/redhat.ans

Ansible
=======
.. hidetitle::
.. ansi:: images/ansible.ans

OpenDev
=======

::

  "most insane CI infrastructure I've ever been a part of"
  -- Alex Gaynor

  "like the SpaceX of CI"
  -- Emily Dunham

Zuul
====
.. hidetitle::
.. ansi:: images/zuul.ans

What Zuul Does
==============

* "Speculative Future State"
* gated changes
* one or more git repositories
* integrated deliverable
* testing like deployment

Underlying Philosophy
=====================

* All changes flow through code review
* Changes only land if they pass all tests
* End-to-end integration testing is essential
* Computers are cheaper than humans

Ramifications of Philosophy
===========================

* No direct push access for anyone
* Software should be installable from source
* Testing should be automated and repeatable
* Developers write tests with their patches
* Code always works

Getting to Gating
=================

No Tests / Manual Tests
=======================

* No test automation exists or ...
* Developer runs test suite before pushing code
* Prone to developer skipping tests for "trivial" changes
* Doesn't scale organizationally

Periodic Testing
================

* Developers push changes directly to shared branch
* CI system runs tests from time to time - reports if things still work
* "Who broke the build?"
* Leads to hacks like NVIE model

Post-Merge Testing
==================

* Developers push changes directly to shared branch
* CI system is triggered by push - reports if push broke something
* Frequently batched / rolled up
* Easier to diagnose which change broke things
* Reactive - the bad changes are already in

Pre-Review Testing
==================

* Changes are pushed to code review (Gerrit Change, GitHub PR, etc.)
* CI system is triggered by code review change creation
* Test results inform review decisions
* Proactive - testing code before it lands
* Reviewers can get bored waiting for tests
* Only tests code as written, not potential result of merging code

Gating
======

* Changes are pushed to code review
* Gating system is triggered by code review approval
* Gating system merges code IFF tests pass
* Proactive - testing code before it lands
* Future state resulting from merge of code is tested
* Reviewers can fire-and-forget safely

Mix and Match
=============

* Zuul supports all of those modes
* Zuul users frequently combine them
* Run pre-review (check) and gating (gate) on each change
* Post-merge/post-tag for release/publication automation
* Periodic for catching bitrot

Multi-repository integration
============================

* Multiple source repositories are needed for deliverable
* Future state to be tested is the future state of all involved repos

To test proposed future state
=============================

* Get tip of each project. Merge appropriate change(s). Test.
* Changes must be serialized, otherwise state under test is invalid.
* Integrated deliverable repos share serialized queue

Speculative Execution
=====================

* Correct parallel processing of serialized future states
* Create virtual serial queue of changes for each deliverable
* Assume each change will pass its tests
* Test successive changes with previous changes applied to starting state

Nearest Non-Failing Change
==========================

(aka 'The Jim Blair Algorithm')

* If a change fails, move it aside
* Cancel all test jobs behind it in the queue
* Reparent queue items on the nearest non-failing change
* Restart tests with new state

Zuul Simulation
===============
.. transition:: pan
.. container:: handout

   * todo

.. ansi:: images/zsim-00.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-01.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-02.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-03.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-04.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-05.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-06.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-07.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-08.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-09.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-10.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-11.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-12.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-13.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-14.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-15.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-16.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-17.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-18.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-19.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-20.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-21.ans

Zuul Simulation
===============
.. transition:: cut
.. container:: handout

   * todo

.. ansi:: images/zsim-22.ans

Lock Step Changes
=================

* Circular Dependencies are not supported on purpose
* Rolling upgrades across interdependent services
* HOWEVER - many valid use cases (go/rust/c++) - support will be coming

Live Configuration Changes
==========================

.. container:: handout

   Zuul is a distributed system, with a distributed configuration.

.. code:: yaml

   - tenant:
       name: openstack
       source:
         gerrit:
           config-repos:
             - opendev/project-config
           project-repos:
             - zuul/zuul-jobs
             - zuul/zuul
             - zuul/nodepool
             - ansible/ansible
             - openstack/openstacksdk

Zuul Startup
============

* Read config file

Zuul Startup
============

* Read config file
* Ask mergers for branches of each repo

.. ansi:: images/startup1.ans

Zuul Startup
============

* Read config file
* Ask mergers for branches of each repo
* Ask mergers for .zuul.yaml for each branch of each repo

.. ansi:: images/startup2.ans

When .zuul.yaml Changes
=======================

.. container:: progressive

   * Zuul looks for changes to .zuul.yaml
   * Asks mergers for updated content
   * Splices into configuration used for that change
   * Works with cross-repo dependencies
     ("This change depends on a change to the job definition")

Explicit Cross-Project Dependencies
===================================

* Developers can mark changes as being dependent
* Depends-On: footer - in commit or PR
* Zuul uses depends-on when constructing virtual serial queue
* Will not merge changes in gate before depends-on changes
* Works cross-repo AND cross-source

Depends-On Example
==================

* Service 'nova' talks to service 'ironic'
* Currently using 'python-ironicclient'
* Want to replace python-ironicclient with openstacksdk:
* https://review.openstack.org/643664
* Need some plumbing in nova first:
* https://review.openstack.org/642899
* That change "Depends-On" a change to openstacksdk

Depends-On Example - openstacksdk
=================================

* In openstacksdk, need a new method to extract config differently
* https://review.openstack.org/643601
* The nova plumbing change adds this:

::

  Depends-On: https://review.openstack.org/643601

Depends-On Example - keystoneauth
=================================

* openstacksdk uses 'keystoneauth' library to make REST calls
* Config extraction change wants a new helper method in keystoneauth
* https://review.openstack.org/644251
* openstacksdk change adds:

::

  Depends-On: https://review.openstack.org/644251

Depends-On Example - In the Gate
================================

* When Zuul prepares git repos for the Ironic nova change:
* Tip of nova, plus nova plumbing change, plus nova ironic change
* Tip of openstacksdk, plus config method change
* Tip of keystoneauth, plus
  helper method change
* Developers iterate on the nova service change
* BEFORE finalizing and releasing keystoneauth and openstacksdk changes

Zuul Architecture
=================

We used to call "microservices" "distributed"

* Zuul is composed of several services (mostly python3)
* zuul-scheduler
* zuul-executor
* zuul-merger
* zuul-web
* zuul-dashboard (javascript/react)
* zuul-fingergw
* zuul-proxy (c++)
* RDBMS
* Gearman
* ZooKeeper
* Nodepool

Zuul Architecture
=================

.. ansi:: images/architecture.ans

Where Does Job Content Run?
===========================

Nodepool
========

* A separate program that works very closely with *Zuul*
* *Zuul* requires *Nodepool* but *Nodepool* can be used independently
* Creates and destroys zero or more node resources
* Resources can include VMs, containers, COE contexts or bare metal machines
* Static driver for allocating pre-existing nodes to jobs
* Optionally periodically builds images and uploads to clouds

Nodepool Launcher
=================

Where build nodes should come from

* OpenStack
* Static
* Kubernetes Pod
* Kubernetes Namespace
* AWS

In work / coming soon:

* Azure
* GCE

What about job content?
=======================

* Written in Ansible
* Ansible is excellent at running one or more tasks in one or more places
* The answer to "how do I" is almost always "Ansible"

What Zuul Does
==============

* Listens for code events
* Prepares appropriate job config and git repo states
* Requests nodes for test jobs from *Nodepool*
* Runs user-defined Ansible playbooks with nodes in an inventory
* Collects/reports results
* Potentially merges change

Jobs
====

* Jobs define test node needs
* Metadata defined in Zuul's configuration
* Execution content in Ansible
* Jobs may be defined centrally or in the repo being tested
* Jobs have contextual variants that simplify configuration

Job
===

.. code:: yaml

   - job:
       name: base
       parent: null
       description: |
         The base job for Zuul.
       timeout: 1800
       nodeset:
         nodes:
           - name: primary
             label: ubuntu-bionic
       pre-run: playbooks/base/pre.yaml
       post-run:
         - playbooks/base/post-ssh.yaml
         - playbooks/base/post-logs.yaml
       secrets:
         - site_logs

Simple Job
==========

.. code:: yaml

   - job:
       name: tox
       pre-run: playbooks/setup-tox.yaml
       run: playbooks/tox.yaml
       post-run: playbooks/fetch-tox-output.yaml

Simple Job Inheritance
======================

.. code:: yaml

   - job:
       name: tox-py36
       parent: tox
       vars:
         tox_envlist: py36

Inheritance Works Like An Onion
===============================

* pre-run playbooks run in order of inheritance
* run playbook of job runs
* post-run playbooks run in reverse order of inheritance
* If pre-run playbooks fail, job is re-tried
* All post-run playbooks run - as far as pre-run playbooks got

Inheritance Example
===================

For tox-py36 job

* base pre-run playbooks/base/pre.yaml
* tox pre-run playbooks/setup-tox.yaml
* tox run playbooks/tox.yaml
* tox post-run playbooks/fetch-tox-output.yaml
* base post-run playbooks/base/post-ssh.yaml
* base post-run playbooks/base/post-logs.yaml

Simple Job Variant
==================

.. code:: yaml

   - job:
       name: tox-py27
       branches: stable/mitaka
       nodeset:
         nodes:
           - name: ubuntu-trusty
             label: ubuntu-trusty

Nodesets for Multi-node Jobs
============================

.. code:: yaml

   - nodeset:
       name: ceph-cluster
       nodes:
         - name: controller
           label: centos-7
         - name: compute1
           label: fedora-28
         - name: compute2
           label: fedora-28
       groups:
         - name: ceph-osd
           nodes:
             - controller
         - name: ceph-monitor
           nodes:
             - controller
             - compute1
             - compute2

Multi-node Job
==============

* nodesets are provided to Ansible for jobs in inventory

.. code:: yaml

   - job:
       name: ceph-multinode
       nodeset: ceph-cluster
       run: playbooks/install-ceph.yaml

* Creates ansible inventory:

::

  controller ansible_host=1.2.3.4
  compute1 ansible_host=1.2.3.5
  compute2 ansible_host=1.2.3.6

  [ceph-osd]
  controller

  [ceph-monitor]
  controller
  compute1
  compute2

Multi-node Ceph Job Content
===========================

.. code:: yaml

   - hosts: all
     roles:
       - install-ceph

   - hosts: ceph-osd
     roles:
       - start-ceph-osd

   - hosts: ceph-monitor
     roles:
       - start-ceph-monitor

   - hosts: all
     roles:
       - do-something-interesting

Project With Central and Local Config
=====================================

.. code:: yaml

   # In opendev.org/openstack-infra/project-config:
   - project:
       name: openstack/nova
       templates:
         - openstack-tox-jobs

.. code:: yaml

   # In opendev.org/openstack/nova/.zuul.yaml:
   - project:
       check:
         jobs:
           - nova-placement-functional-devstack

zuul-jobs standard library
==========================

* https://opendev.org/openstack-infra/zuul-jobs
* Repo containing general purpose job definitions
* Add the git repo directly to a local Zuul config

Project with Job Dependencies
=============================

.. code:: yaml

   - project:
       release:
         jobs:
           - build-artifacts
           - upload-tarball:
               dependencies: build-artifacts
           - upload-pypi:
               dependencies: build-artifacts
           - notify-mirror:
               dependencies:
                 - upload-tarball
                 - upload-pypi

Secrets
=======

* Inspired by Kubernetes Secrets API
* Projects can add named encrypted secrets to their .zuul.yaml file
* Jobs can request to use secrets by name
* Jobs using secrets are not reconfigured speculatively
* Secrets can only be used by the same project they are defined in
* Public key per project:
  ``{{ zuul_url }}/{{ tenant }}/{{ project }}.pub``

::

  GET https://zuul.openstack.org/openstack-infra/shade.pub

Secret Example (note, no admins had to enable this)
===================================================

.. code:: yaml

   # In opendev.org/openstack/loci/.zuul.yaml:
   - secret:
       name: loci_docker_login
       data:
         user: loci-username
         password: !encrypted/pkcs1-oaep
           - gUEX4eY3JAk/Xt7Evmf/hF7xr6HpNRXTibZjrKTbmI4QYHlzEBrBbHey27Pt/eYvKKeKw
             hk8MDQ4rNX7ZK1v+CKTilUfOf4AkKYbe6JFDd4z+zIZ2PAA7ZedO5FY/OnqrG7nhLvQHE
             5nQrYwmxRp4O8eU5qG1dSrM9X+bzri8UnsI7URjqmEsIvlUqtybQKB9qQXT4d6mOeaKGE
             5h6Ydkb9Zdi4Qh+GpCGDYwHZKu1mBgVK5M1G6NFMy1DYz+4NJNkTRe9J+0TmWhQ/KZSqo
             4ck0x7Tb0Nr7hQzV8SxlwkaCTLDzvbiqmsJPLmzXY2jry6QsaRCpthS01vnj47itoZ/7p
             taH9CoJ0Gl7AkaxsrDSVjWSjatTQpsy1ub2fuzWHH4ASJFCiu83Lb2xwYts++r8ZSn+mA
             hbEs0GzPI6dIWg0u7aUsRWMOB4A+6t2IOJibVYwmwkG8TjHRXxVCLH5sY+i3MR+NicR9T
             IZFdY/AyH6vt5uHLQDU35+5n91pUG3F2lyiY5aeMOvBL05p27GTMuixR5ZoHcvSoHHtCq
             7Wnk21iHqmv/UnEzqUfXZOque9YP386RBWkshrHd0x3OHUfBK/WrpivxvIGBzGwMr2qAj
             /AhJsfDXKBBbhGOGk1u5oBLjeC4SRnAcIVh1+RWzR4/cAhOuy2EcbzxaGb6VTM=

Secret Example
==============

.. code:: yaml

   # In opendev.org/openstack/loci/.zuul.yaml:
   - job:
       name: publish-loci-cinder
       parent: loci-cinder
       post-run: playbooks/push
       secrets:
         - loci_docker_login

   # In opendev.org/openstack/loci/playbooks/push.yaml:
   - hosts: all
     tasks:
       - include_vars: vars.yaml

       - name: Push project to DockerHub
         block:
           - command: docker login -u {{ loci_docker_login.user }} -p {{ loci_docker_login.password }}
             no_log: True
           - command: docker push openstackloci/{{ project }}:{{ branch }}-{{ item.name }}
             with_items: "{{ distros }}"

Speculative Container Images
============================

* Gating applied to continuously deployed container images
* Build and test images that depend on other images
* Build and test deployments comprising multiple images
* Without publishing to final location
* Publish the actual image that was built in the gate

Zuul is not New
===============

* Has been in Production for OpenStack for Six Years
* Zuul is now a top-level effort of OpenStack Foundation
* Zuul v3 is the first release where non-OpenStack is a first-class use case

OpenDev - Largest Known Zuul
============================

* 2KJPH (2,000 jobs
  per hour)
* Build Nodes from 16 Regions of 5 Public and 3 Private OpenStack Clouds
* Rackspace, Internap, OVH, Vexxhost, CityCloud
* Linaro (ARM), Limestone, Packethost
* 10,000 changes merged per month

Not just for OpenStack
======================

* BMW (control plane in OpenShift)
* GoDaddy (control plane in private Kubernetes)
* GoodMoney (control plane in EKS, adding GKE)
* Le Bon Coin
* Easystack
* TungstenFabric
* OpenLab
* Red Hat
* others ...

Code Review Systems
===================

* Gerrit
* GitHub (Public and Enterprise)

In work / coming soon:

* Pagure
* Gitea

Commonly Requested:

* GitLab
* Bitbucket

Support for non-git
===================

.. container:: progressive

   * Nope
   * helix4git may work for perforce, but is untested

Installation of Software
========================

Ways to Install Zuul
====================

* Containers: https://hub.docker.com/_/zuul/
* Windmill: http://opendev.org/openstack/windmill
* Software Factory: https://softwarefactory-project.io/
* Puppet: http://opendev.org/openstack-infra/puppet-zuul

Zuul Containers
===============

* Published on every commit
* Application/Process containers
* Config / Data should be bind-mounted in

zuul/zuul-executor
==================

* In k8s, zuul-executor must be run privileged
* Uses bubblewrap for unprivileged sandboxing
* Restriction may be lifted in the future

Release Management
==================

* Zuul is Continuously Delivered and Deployed upstream
* Some users deploy Zuul with Zuul
* Releases are tagged from code run for OpenDev
* There is no intent to have a 'stable' release
* 'stable' is a synonym for "old and buggy"

zuul/zuul-scheduler
===================

* SPOF
* We're working on it - HA/Distributed scheduler is coming
* Recommend running scheduler from tags

Quick Start
===========

* docker-compose https://zuul-ci.org/docs/zuul/admin/quick-start.html

Important Links
===============

* https://zuul-ci.org/
* https://zuul-ci.org/docs/zuul
* https://zuul-ci.org/docs/zuul-jobs/
* freenode:#zuul
* https://opendev.org/zuul (https://git.zuul-ci.org/cgit/zuul)

Questions
=========

.. ansi:: images/questions.ans

Presentty
=========
.. hidetitle::
.. transition:: pan
.. figlet:: Presentty

* Console presentations written in reStructuredText
* Cross-fade, pan, tilt, cut transitions
* https://pypi.python.org/pypi/presentty
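
Appendix: Check and Gate Pipelines
==================================

.. container:: handout

   A minimal sketch of how the pre-review (check) and gating (gate)
   modes from the "Mix and Match" slide might be written as Zuul
   pipelines. It assumes a Gerrit connection named ``gerrit`` and
   Gerrit labels named ``Verified`` and ``Workflow``; those names are
   illustrative and do not come from this talk, so adjust them to the
   local setup. The ``dependent`` manager is what gives the gate its
   shared serial queue; ``independent`` only reports results.

.. code:: yaml

   # Hypothetical pipelines; connection and label names are assumptions.
   - pipeline:
       name: check
       # independent: each change is tested on its own, no shared queue
       manager: independent
       trigger:
         gerrit:
           - event: patchset-created
       success:
         gerrit:
           Verified: 1
       failure:
         gerrit:
           Verified: -1

   - pipeline:
       name: gate
       # dependent: changes are tested speculatively in a shared queue
       manager: dependent
       trigger:
         gerrit:
           - event: comment-added
             approval:
               - Workflow: 1
       success:
         gerrit:
           Verified: 2
           submit: true
       failure:
         gerrit:
           Verified: -2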