Behind the Seamless Rewrite: How Kubernetes Image Promoter Got Faster and Smaller

Asked 2026-05-01 20:16:36 Category: Cloud Computing

Every container image you pull from registry.k8s.io arrives there thanks to kpromo, the Kubernetes image promoter. It copies images from staging registries to production, signs them with cosign, replicates signatures across more than 20 regional mirrors, and generates SLSA provenance attestations. If this tool breaks, no Kubernetes release ships. Recently, we rewrote its core from scratch, deleted 20% of the codebase, made it dramatically faster, and nobody noticed — and that was exactly the goal. Here's how we did it, and why it matters.

What is the Kubernetes image promoter and why is it so critical?

The Kubernetes image promoter, known as kpromo, is the automated system that moves container images from staging registries to production. It handles copying, signing, replication, and provenance attestation. Without it, every Kubernetes release would grind to a halt. The tool ensures that images are securely and reliably distributed across more than 20 regional mirrors worldwide. It operates as a GitOps workflow: a developer pushes an image to a staging registry, opens a pull request with a YAML manifest, and after review and merge, kpromo takes over. This replaces the earlier manual, Googler-gated process, making the pipeline community-owned and transparent. In short, kpromo is the invisible backbone of Kubernetes image distribution.
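
To make the GitOps idea concrete, here is a minimal sketch of how a merged manifest could be turned into a list of copy operations. It assumes, purely for illustration, that a manifest boils down to a mapping of image name to digest to tags. The names `Copy` and `plan_promotion` and the digests are hypothetical and are not kpromo's actual schema or API. kpromo itself is written in Go; Python is used here only for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One image tag that still has to be copied to production (hypothetical type)."""
    image: str
    digest: str
    tag: str


def plan_promotion(desired, production):
    """Compare the manifest (desired) with production and list the missing copies.

    Both arguments map image name -> digest -> iterable of tags.
    """
    plan = []
    for image, digests in sorted(desired.items()):
        for digest, tags in sorted(digests.items()):
            existing = production.get(image, {}).get(digest, ())
            # Only tags absent from production need to be promoted.
            for tag in sorted(set(tags) - set(existing)):
                plan.append(Copy(image, digest, tag))
    return plan


# Illustrative data: one tag is already in production, one is new.
desired = {"pause": {"sha256:aaa": ["3.10"], "sha256:bbb": ["3.9"]}}
production = {"pause": {"sha256:bbb": ["3.9"]}}
plan = plan_promotion(desired, production)
```

Because the plan is derived from the difference between the reviewed manifest and the current state, running it a second time against an up-to-date production registry yields an empty plan, which is what makes a merge-triggered workflow safe to retry.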

How did the image promoter originate and evolve over time?

The promoter began in late 2018 as an internal Google project by Linus Arver. Its goal was to replace the manual, Googler-gated process of copying images into k8s.gcr.io with a community-owned, GitOps-based workflow. KEP-1734 formalized the proposal. In early 2019, the code moved to kubernetes-sigs/k8s-container-image-promoter and rapidly expanded. Over the next few years, Stephen Augustus consolidated multiple tools (cip, gh2gcs, krel promote-images, promobot-files) into a single CLI called kpromo, and the repository was renamed to promo-tools. Later, Adolfo Garcia Veytia (Puerco) added cosign signing and SBOM support, Tyler Ferrara built vulnerability scanning, and Carlos Panato kept the project healthy and managed its releases. In total, 42 contributors made ~3,500 commits across 60+ releases, building a powerful but increasingly complex system.

What specific problems prompted the 2026 rewrite?

By 2025, the codebase carried seven years of incremental additions from multiple SIGs and subprojects. The README itself warned of “duplicated code, multiple techniques for accomplishing the same thing, and several TODOs.” Production promotion jobs for Kubernetes core images regularly took over 30 minutes and frequently failed due to rate limit errors. The core promotion logic had become a monolith that was hard to extend and difficult to test, making new features like provenance or vulnerability scanning painful to add. Two items sat on the SIG Release roadmap for a while: “Rewrite artifact promoter” and “Make artifact validation more robust.” After discussions at SIG Release meetings and KubeCons, open research spikes on project board #171 captured eight critical questions that needed answers before any rewrite could proceed.

How was the rewrite structured and executed in phases?

In February 2026, we opened issue #1701 (“Rewrite artifact promoter pipeline”), a single tracking issue that answered all eight spike questions. The rewrite was deliberately phased so that each step could be reviewed, merged, and validated independently. The three main phases were:

  • Phase 1: Rate Limiting (#1702). Rewrote rate limiting to properly throttle all registry operations with adaptive backoff.
  • Phase 2: Interfaces (#1704). Put registry and auth operations behind clean interfaces so they can be swapped out and tested independently.
  • Phase 3: Pipeline. Refactored the core promotion pipeline into a modular, testable flow.

This phased approach minimized risk and allowed each improvement to be validated before moving to the next.
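
The interface idea from Phase 2 can be sketched as follows. This is an illustration under stated assumptions, not the promoter's real code: the `Registry` protocol, the `FakeRegistry` test double, the `promote` function, and the image references are all hypothetical names, and kpromo itself is written in Go rather than Python.

```python
from typing import Iterable, Protocol, Tuple


class Registry(Protocol):
    """Hypothetical interface covering the registry operations a promotion needs."""

    def has(self, ref: str) -> bool: ...

    def copy(self, src: str, dst: str) -> None: ...


class FakeRegistry:
    """In-memory stand-in that satisfies Registry without any network access."""

    def __init__(self, existing=()):
        self.images = set(existing)
        self.copies = []

    def has(self, ref):
        return ref in self.images

    def copy(self, src, dst):
        self.copies.append((src, dst))
        self.images.add(dst)


def promote(registry: Registry, edges: Iterable[Tuple[str, str]]) -> int:
    """Copy each (src, dst) pair unless dst already exists; return the number copied."""
    copied = 0
    for src, dst in edges:
        if registry.has(dst):
            continue  # already promoted, nothing to do
        registry.copy(src, dst)
        copied += 1
    return copied


fake = FakeRegistry(existing={"prod/pause:3.9"})
copied = promote(fake, [
    ("staging/pause:3.9", "prod/pause:3.9"),
    ("staging/pause:3.10", "prod/pause:3.10"),
])
```

The promotion logic depends only on the interface, so a test can swap in the fake and assert on exactly which copies were attempted, while production code would pass an implementation that talks to a real registry.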

What specific improvements did the rewrite deliver?

The rewrite delivered three major technical improvements. First, adaptive backoff rate limiting dramatically reduced failures from rate limit errors, which had plagued long-running promotion jobs. Second, clean interfaces for registry and authentication operations made the code more modular, enabling independent testing and future swapping of backends. Third, the core promotion pipeline was refactored into a well-defined, testable flow. These changes allowed the team to delete 20% of the codebase — removing duplicate logic and dead code. The result is a smaller, faster, and more reliable system. Production promotion jobs now complete in a fraction of the previous time, and the code is far easier to extend for upcoming features like enhanced provenance validation or integration with new signing tools.
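
As a rough illustration of adaptive backoff, the sketch below paces every call, doubles the wait when the registry signals throttling, and relaxes the wait again after a success. The class name, the `RateLimited` exception (standing in for an HTTP 429 response), and all parameter values are assumptions made for this example. The source does not describe the promoter's actual algorithm or tuning.

```python
import time


class RateLimited(Exception):
    """Hypothetical error standing in for a registry's HTTP 429 response."""


class AdaptiveLimiter:
    """Paces calls and adapts the delay to how the registry responds."""

    def __init__(self, base=0.1, max_delay=30.0, sleep=time.sleep):
        self.base = base
        self.max_delay = max_delay
        self.delay = base
        self.sleep = sleep  # injectable so tests do not actually wait

    def call(self, op, max_attempts=5):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        for attempt in range(1, max_attempts + 1):
            self.sleep(self.delay)  # throttle every operation, not only retries
            try:
                result = op()
            except RateLimited:
                if attempt == max_attempts:
                    raise
                # Throttled: back off harder, up to the ceiling.
                self.delay = min(self.delay * 2, self.max_delay)
                continue
            # Success: ease the delay back toward the base rate.
            self.delay = max(self.base, self.delay / 2)
            return result


# Usage: an operation that is throttled twice before succeeding.
waits = []
limiter = AdaptiveLimiter(base=1.0, max_delay=8.0, sleep=waits.append)
responses = iter([RateLimited(), RateLimited(), "ok"])


def flaky_copy():
    response = next(responses)
    if isinstance(response, Exception):
        raise response
    return response


result = limiter.call(flaky_copy)
```

Keeping the delay on the limiter rather than per call means that one throttled operation slows down the operations that follow it, which is the property that matters for long-running jobs issuing many registry requests.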

Why was the rewrite designed to be invisible to users?

The rewrite’s invisibility was intentional. The image promoter is a critical piece of infrastructure — if it breaks, Kubernetes releases stop. By making changes behind the scenes, we ensured zero disruption to developers, release managers, and end users who pull images from registry.k8s.io. The phased approach meant we could merge and validate each step without changing external behavior. All improvements — speed, reliability, code cleanliness — were internal. The only observable change is that images arrive faster and with fewer failures. This “invisible rewrite” philosophy reflects the mature DevOps principle that infrastructure upgrades should be transparent. The team prioritized stability and backward compatibility over flashy announcements, knowing that the real win is a system that just works better.

What future benefits does the rewrite enable?

The rewrite unlocks several strategic advantages. With a modular codebase and clean interfaces, adding new features like advanced provenance attestation, real-time vulnerability scanning, or support for additional signing mechanisms becomes straightforward. The removal of 20% of the code reduces maintenance burden and accelerates onboarding of new contributors. The adaptive rate limiting makes the system more resilient to traffic spikes and registry throttling. Most importantly, the improved testability allows the team to confidently evolve the promoter without fear of regressions. This foundation supports SIG Release’s roadmap items for more robust artifact validation and paves the way for deeper integration with Kubernetes release automation. For the broader community, it means even more reliable and secure container image distribution, all without anyone ever noticing the upgrade.