Not Just Go — Building Operators With Kopf, Metacontroller, and Shell Operator
Author: Youngju Kim (@fjvbn20031)
- Introduction — An Operator Is Not Exclusive to Go
- Big Picture — Three Branches of Non-Go Operator Frameworks
- Kopf — Building an Operator With Python
- Metacontroller — Declarative Hooks Without Code
- shell-operator and Ansible Operator — Script/Playbook Based
- Kopf in More Depth — Periodic Execution and Idempotency
- Deployment Comparison — How Each Framework Goes Onto the Cluster
- Comparison Table — Options at a Glance
- Prototyping vs Production
- Language Selection Criteria
- Performance and Resources — The Hidden Cost of Language Choice
- Migration — Moving Between Frameworks
- Ecosystem Maturity — A Realistic View
- A Collection of Pitfalls
- Metacontroller DecoratorController — Attaching to Existing Resources
- Fitting Scenarios — Seen Through Concrete Cases
- A Decision Checklist — What Should Our Team Pick
- One-Line Summary — The Identity of the Five Frameworks
- Ecosystem Trends — As of 2026
- Closing
- References
Introduction — An Operator Is Not Exclusive to Go
When we say "build an Operator," Go and kubebuilder come to mind almost reflexively. Operator SDK and kubebuilder did grow up around the Go ecosystem, and controller-runtime, the library both build on, is a Go library. As a result, many teams give up in advance, thinking "we don't know Go well, so an Operator is out of reach."
But this is not true. The essence of an Operator is "a controller that watches and reconciles the Kubernetes API," not a specific language. The API server is language-agnostic and offers everyone the same watch/REST interface. So you can build an Operator with Python, with shell scripts, or even with almost no code.
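To make "watch and reconcile" concrete, here is a minimal, framework-free sketch. It is not how any particular library implements reconciliation; `reconcile`, `desired`, and `observed` are hypothetical names, and plain dicts stand in for the objects a real controller would read through the API server's watch/REST interface.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (verb, resource name)


def reconcile(desired: Dict[str, dict], observed: Dict[str, dict]) -> List[Action]:
    """Return the actions needed to move `observed` toward `desired`."""
    actions: List[Action] = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical state: what users declared vs. what currently exists.
desired = {"db-a": {"size": 3}, "db-b": {"size": 1}}
observed = {"db-a": {"size": 1}, "db-c": {"size": 2}}

print(reconcile(desired, observed))
# [('update', 'db-a'), ('create', 'db-b'), ('delete', 'db-c')]
```

Nothing in this loop is specific to Go. Every framework discussed below is a different way of packaging the same compare-and-act step.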
This article compares the representative options beyond Go/kubebuilder with real code, and offers criteria for which fits which situation.
Big Picture — Three Branches of Non-Go Operator Frameworks
Non-Go options divide broadly into three branches.
1) Write reconcile directly in another language
   - Kopf (Python): decorator-based, closest experience to the Go Operator
2) Declarative hooks (minimal code)
   - Metacontroller: delegate desired-state computation to a JSON in/out function, language-agnostic
3) Shell/config-based
   - shell-operator: hooks as shell scripts
   - Ansible Operator: reconcile as Ansible playbooks
The key difference is "where, and in what form, the reconcile logic lives." Kopf puts it in Python functions, Metacontroller in an external webhook function, and shell/Ansible Operators in scripts/playbooks.
Kopf — Building an Operator With Python
Kopf (Kubernetes Operator Pythonic Framework) is the most natural choice for Python developers. Registering event handlers with decorators is intuitive, and it is conceptually close to the Go Operator's reconcile model.
The following is a simple Kopf handler that reacts to a hypothetical CRD (kind: Database) and creates a ConfigMap.
import kopf
import kubernetes


@kopf.on.create('example.com', 'v1', 'databases')
def create_fn(spec, name, namespace, logger, **kwargs):
    size = spec.get('size', 1)
    logger.info(f"Database {name} creation requested: size={size}")

    api = kubernetes.client.CoreV1Api()
    cm = kubernetes.client.V1ConfigMap(
        metadata=kubernetes.client.V1ObjectMeta(name=f"{name}-config"),
        data={"size": str(size)},
    )
    api.create_namespaced_config_map(namespace=namespace, body=cm)

    # The returned value is reflected in status; Kopf stores it under
    # status.create_fn (the handler name) automatically.
    return {"provisioned": True, "size": size}


@kopf.on.update('example.com', 'v1', 'databases')
def update_fn(spec, status, name, namespace, logger, **kwargs):
    new_size = spec.get('size', 1)
    logger.info(f"Database {name} update: new size={new_size}")
    # Change reconcile logic...


@kopf.on.delete('example.com', 'v1', 'databases')
def delete_fn(name, logger, **kwargs):
    logger.info(f"Database {name} cleanup")
    # Kopf adds the finalizer and removes it once this handler succeeds.
Kopf's strengths are as follows.
- A clear decorator-based event model: intent is sharp with on.create/on.update/on.delete/on.timer, etc.
- Automated finalizers, retries, backoff, and status management: Kopf handles much of what you must wire by hand in Go.
- A rich Python ecosystem: you can pull in Python libraries for data processing, external API integration, and even machine learning as-is.
A caveat is that the CRD itself must be defined and applied separately. Kopf handles handlers but does not generate the CRD schema for you. Also, due to the Python runtime, memory usage and cold start are heavier than Go.
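Since the CRD has to be written by hand, it can help to see how small it is for the Database example. The sketch below builds a minimal `apiextensions.k8s.io/v1` manifest as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. The schema (an integer `size` with a minimum of 1, and a free-form `status`) is an assumption made for illustration, not something the handlers above require.

```python
import json


def database_crd() -> dict:
    """Build a minimal CustomResourceDefinition for the hypothetical Database kind."""
    group, plural = "example.com", "databases"
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # The CRD name must be "<plural>.<group>".
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"plural": plural, "singular": "database", "kind": "Database"},
            "versions": [{
                "name": "v1",
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {
                        "spec": {
                            "type": "object",
                            # Assumed schema: a single integer field.
                            "properties": {"size": {"type": "integer", "minimum": 1}},
                        },
                        # Handler return values land under status, so this sketch
                        # keeps status free-form instead of enumerating fields.
                        "status": {
                            "type": "object",
                            "x-kubernetes-preserve-unknown-fields": True,
                        },
                    },
                }},
            }],
        },
    }


crd = database_crd()
print(json.dumps(crd, indent=2))
```

The group, version, and plural here have to match the arguments passed to the `@kopf.on.*` decorators, otherwise the handlers watch a resource that does not exist.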
Metacontroller — Declarative Hooks Without Code
Metacontroller takes a different approach. Metacontroller operates the entire reconcile loop, and the user provides only "a function that takes the current state (JSON) and returns the desired child resources (JSON)." This function is just an HTTP endpoint, so it can be written in any language.
Metacontroller's two core controllers are:
- CompositeController: looks at a parent CR and creates and maintains a set of child resources (e.g., a homegrown CR -> Deployment + Service).
- DecoratorController: attaches extra children or changes to existing resources.
The CompositeController's flow is as follows.
[Metacontroller]
  bundles the parent CR + current children as JSON
        | POST (sync request)
        v
[User's webhook function] (Python/JS/anything)
  returns the list of desired child resources (JSON)
        |
        v
[Metacontroller]
  creates/updates/deletes child resources to match the returned desired state
Both the request the sync hook receives and the response it returns are plain JSON. For example, the core of a sync hook written in Python is:
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/sync', methods=['POST'])
def sync():
    observed = request.get_json()
    parent = observed['parent']
    replicas = parent['spec'].get('replicas', 1)
    name = parent['metadata']['name']

    # Build and return the desired child resource (Deployment) as JSON
    desired_deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": "app", "image": "nginx"}]},
            },
        },
    }
    return jsonify({"status": {"replicas": replicas}, "children": [desired_deployment]})
Metacontroller's appeal is that it takes responsibility for the hard parts of reconcile (watch, queue, retries, garbage collection, ownership). The user writes only a pure function ("the desired output for this input"). If you are comfortable with functional thinking, it is very clean. The downsides are that you must install and operate Metacontroller itself in the cluster, and that it lacks expressiveness for complex imperative operational logic (sequential tasks, external system integration).
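One practical consequence of the "pure function" framing is that the hook's core can be unit-tested with no cluster and no HTTP server. The following is a sketch under that assumption: `compute_sync` is a hypothetical helper that holds the same logic as the Flask handler above, and the test data is invented.

```python
import copy


def compute_sync(observed: dict) -> dict:
    """Pure core of a sync hook: observed parent in, desired children out."""
    parent = observed["parent"]
    name = parent["metadata"]["name"]
    replicas = parent["spec"].get("replicas", 1)
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": "app", "image": "nginx"}]},
            },
        },
    }
    return {"status": {"replicas": replicas}, "children": [deployment]}


# Hypothetical sync request, reduced to the fields this sketch reads.
request_body = {
    "parent": {"metadata": {"name": "web"}, "spec": {"replicas": 3}},
    "children": {},
}
snapshot = copy.deepcopy(request_body)

first = compute_sync(request_body)
second = compute_sync(request_body)

# Same input gives the same output, and the input is left untouched.
print(first == second, request_body == snapshot)
```

Keeping the web framework as a thin wrapper around a function like this makes it easy to confirm that the hook has no hidden state or side effects.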
shell-operator and Ansible Operator — Script/Playbook Based
shell-operator
shell-operator is a simple, powerful framework that "runs a shell script (or any executable) when an event occurs." When invoked with --config, each hook prints its configuration (YAML or JSON) declaring which events it reacts to; otherwise the body runs when an actual event arrives.
#!/usr/bin/env bash
if [[ $1 == "--config" ]]; then
  # Declare the events this hook subscribes to
  cat <<EOF
configVersion: v1
kubernetes:
- apiVersion: v1
  kind: ConfigMap
  executeHookOnEvent: ["Added", "Modified"]
EOF
else
  # Actual event handling: the binding context is passed as a file path
  # in the BINDING_CONTEXT_PATH environment variable.
  echo "ConfigMap event detected, perform follow-up work with kubectl"
  # reconcile with commands like kubectl ...
fi
shell-operator is familiar to teams already doing operational automation with shell/kubectl. However, idempotency, error handling, and retries are all the script author's responsibility, so it gets harder to manage as it grows complex.
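Because a hook can be any executable, the event-handling half can also be written in Python. The sketch below parses a binding context file and lists the affected objects. The JSON shape used here (a list of entries with `type`, `watchEvent`, and either `object` or `objects`) is an assumption based on shell-operator's documented binding context and should be checked against the version in use; `changed_objects` and the sample data are hypothetical.

```python
import json
import os
import tempfile


def load_binding_contexts(path: str) -> list:
    """Read the JSON array that shell-operator writes for the hook."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def changed_objects(contexts: list) -> list:
    """Flatten binding contexts into (event, namespace, name) tuples."""
    result = []
    for ctx in contexts:
        if ctx.get("type") == "Synchronization":
            # Initial listing: already existing objects arrive together.
            objs = [item["object"] for item in ctx.get("objects", [])]
            event = "Synchronization"
        else:
            objs = [ctx["object"]] if "object" in ctx else []
            event = ctx.get("watchEvent", "Unknown")
        for obj in objs:
            meta = obj["metadata"]
            result.append((event, meta.get("namespace"), meta["name"]))
    return result


# Invented sample standing in for what shell-operator would provide.
sample = [
    {"binding": "kubernetes", "type": "Synchronization", "objects": [
        {"object": {"kind": "ConfigMap",
                    "metadata": {"namespace": "default", "name": "existing"}}},
    ]},
    {"binding": "kubernetes", "type": "Event", "watchEvent": "Added",
     "object": {"kind": "ConfigMap",
                "metadata": {"namespace": "default", "name": "fresh"}}},
]

# In a real hook the path would come from os.environ["BINDING_CONTEXT_PATH"];
# a temporary file is used here so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)

events = changed_objects(load_binding_contexts(tmp.name))
os.unlink(tmp.name)
print(events)
```

Handling the initial synchronization and later events through the same code path is one way to keep such a hook idempotent, since every run then starts from the objects it was handed rather than from remembered state.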
Ansible Operator
Operator SDK supports Ansible-based Operators in addition to Go. When a change to a watched CR is observed, the corresponding Ansible playbook runs to realize the desired state. This suits organizations that already manage infrastructure with Ansible and want to move that asset directly into Kubernetes reconcile. An advantage is that most Ansible modules are designed to be idempotent.
Kopf in More Depth — Periodic Execution and Idempotency
Kopf supports not only one-shot event handlers but also periodic reconcile. It is useful for periodically checking external state (e.g., cloud resources, a SaaS API) and matching the cluster to it.
import kopf


@kopf.timer('example.com', 'v1', 'databases', interval=60.0)
def reconcile_periodically(spec, status, name, logger, **kwargs):
    # Called every 60 seconds to compare desired state against the outside.
    desired = spec.get('size', 1)
    # Kopf stores a handler's return value under status.<handler name>,
    # so the previous result is read back from that key.
    current = status.get('reconcile_periodically', {}).get('observedSize')
    if current != desired:
        logger.info(f"drift detected: current={current}, target={desired}")
        # Logic to match the external system to desired
        return {"observedSize": desired}
    # If nothing changed, return nothing to avoid status updates (idempotent)
The important principle here is idempotency. Since the timer handler is called every 60 seconds, unconditionally calling the external API or updating status every time invites load and an infinite update loop. You must encode "if already at the target state, do nothing." This principle is identical to the Go Operator, showing that even when the language changes, the essence of reconcile does not.
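The "if already at the target state, do nothing" rule can be shown in isolation. This is a framework-free sketch: `FakeExternalAPI` is a hypothetical stand-in for a cloud or SaaS API that counts mutating calls, and `reconcile_once` is an invented helper, not part of Kopf.

```python
from typing import Optional


class FakeExternalAPI:
    """Hypothetical external system; counts how often it is mutated."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.calls = 0

    def resize(self, size: int) -> None:
        self.calls += 1
        self.size = size


def reconcile_once(desired_size: int, api: FakeExternalAPI) -> Optional[dict]:
    """Return a status patch, or None when already converged."""
    if api.size == desired_size:
        # Converged: no external call and no status write.
        return None
    api.resize(desired_size)
    return {"observedSize": desired_size}


api = FakeExternalAPI(size=1)
first = reconcile_once(3, api)   # drift: acts and reports
second = reconcile_once(3, api)  # converged: does nothing
print(first, second, api.calls)
# {'observedSize': 3} None 1
```

However many times the timer fires, the external system is touched only once per actual change, which is what keeps a 60-second interval cheap.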
Kopf also provides low-level event watching that can be pointed at any resource, including child resources (on.event), along with sub-handlers, exponential backoff on error, concurrency control, and more. This feature set is sufficient for a Python team to build a production-grade Operator.
Deployment Comparison — How Each Framework Goes Onto the Cluster
The way you actually put your built Operator onto the cluster also differs by framework.
- Kopf: build a container image holding the handler code and run it as a Deployment. Apply the CRD as separate YAML. You must also define and attach RBAC yourself.
- Metacontroller: first install Metacontroller itself in the cluster, then run the sync hook (a web server) as a Deployment, and declare "which parent to look at and which hook to call" via a CompositeController resource.
- shell-operator: build an image with the hook scripts on top of the shell-operator base image and run it as a Deployment.
- Ansible Operator: Operator SDK generates the image holding the playbooks and the manifests.
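For the Metacontroller item above, the "which parent to look at and which hook to call" declaration is itself a small resource. The sketch below builds one as a Python dict and prints it as JSON. The field names follow the CompositeController API as documented by Metacontroller; the parent resource `webapps.example.com` and the hook URL are hypothetical values chosen for illustration.

```python
import json


def composite_controller(name: str, parent_api_version: str,
                         parent_resource: str, hook_url: str) -> dict:
    """Build a CompositeController that manages Deployments for a parent CR."""
    return {
        "apiVersion": "metacontroller.k8s.io/v1alpha1",
        "kind": "CompositeController",
        "metadata": {"name": name},
        "spec": {
            # Let Metacontroller generate a label selector that ties
            # children to their parent.
            "generateSelector": True,
            "parentResource": {
                "apiVersion": parent_api_version,
                "resource": parent_resource,
            },
            "childResources": [{
                "apiVersion": "apps/v1",
                "resource": "deployments",
                "updateStrategy": {"method": "InPlace"},
            }],
            "hooks": {"sync": {"webhook": {"url": hook_url}}},
        },
    }


# Hypothetical parent CRD and hook Service address.
controller = composite_controller(
    name="webapp-controller",
    parent_api_version="example.com/v1",
    parent_resource="webapps",
    hook_url="http://webapp-sync-hook.default/sync",
)
print(json.dumps(controller, indent=2))
```

The hook Deployment and its Service are ordinary workloads; this resource is the only Metacontroller-specific piece that connects them to the parent CR.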
Common structure (any framework)
[container image] <- reconcile logic (code/script/playbook)
+
[CRD] <- the vocabulary users declare
+
[RBAC] <- permissions over the resources the controller handles
+
[Deployment] <- actually runs the controller
Even with different languages and frameworks, the skeleton of "image + CRD + RBAC + running workload" is the same. Once you understand this common structure, you can adapt quickly to any framework you meet.
Comparison Table — Options at a Glance
| Item | Go/kubebuilder | Kopf (Python) | Metacontroller | shell-operator | Ansible Operator |
|---|---|---|---|---|---|
| Main language | Go | Python | Language-agnostic (JSON) | Shell/any | Ansible YAML |
| Learning curve | High | Medium | Medium | Low | Low (with Ansible experience) |
| Expressiveness | Very high | High | Medium (declarative) | Low to medium | Medium |
| Performance/resources | Best | Moderate | Moderate | Light | Moderate |
| Idempotency responsibility | Developer | Partly automatic | Framework | Developer | Ansible modules |
| Ecosystem maturity | Most mature | Mature | Stable/niche | Stable/niche | Mature |
| Suitable for | Production standard | Python teams, ML integration | Simple composition patterns | Lightweight ops automation | Owning Ansible assets |
The simple guidance derived from this table is: if you need a production standard and expect long-term maintenance, Go is still the top choice. But depending on team skills, automation complexity, and existing assets, a non-Go option can be a much faster and more sensible path.
Prototyping vs Production
One of the most important questions in tool choice is "is this a prototype or production?"
- Prototyping stage: if the goal is idea validation and fast iteration, Kopf or Metacontroller is overwhelmingly faster. You can build a working Operator in a few hours and quickly verify the validity of the CRD design.
- Production stage: if you must handle thousands of resources, guarantee an SLA, and maintain for years, performance and ecosystem maturity become important. Here, the performance, memory efficiency, and rich tooling of Go's controller-runtime are favorable.
One wise strategy is to "quickly build a prototype with Kopf to validate the value of the CRD schema and reconcile logic, then reimplement in Go once large scale and long life are truly confirmed." Since the CRD schema itself is language-independent, you can swap only the internal implementation without changing the user interface (CR).
Language Selection Criteria
The practical factors that drive the choice are as follows.
- Team skills: if the team is strong in Python and has no Go experience, an Operator well built with Kopf is operationally safer than forcing Go. The best code is code the maintainers can read and fix.
- The nature of external integration: if you need ML inference, complex data transformation, or specific SaaS SDK calls inside reconcile, a language with a well-developed SDK (often Python) is favorable.
- Automation complexity: for simple "parent -> child" composition, you can use Metacontroller with almost no code. Conversely, if sequential steps and a state machine are complex, an imperative language (Go/Python) is better.
- Existing assets: if you already run infrastructure with Ansible, an Ansible Operator is sensible for asset reuse.
Performance and Resources — The Hidden Cost of Language Choice
The appeal of non-Go frameworks is development speed, but you must also weigh runtime cost.
- Memory footprint: a Go controller is usually light at tens of MB. Python (Kopf) is heavier due to the interpreter and libraries, and as the number of handled objects grows, cache memory grows with it.
- Cold start: a Go binary starts instantly. Python takes more time for imports and initialization. Most Operators are long-running, so cold start is rarely fatal, but in environments with frequent restarts the difference is felt.
- Throughput: at large scale where tens of thousands of resources must be reconciled quickly, Go's efficiency is a clear advantage. Metacontroller and webhook-based approaches add latency from the HTTP round trip.
- Number of operational components: Metacontroller must run two components — itself plus the user hook. shell-operator is likewise a separate runtime. Do not forget that "my logic is light, but the framework on top may be heavy."
These costs are nearly negligible at small scale or for prototypes. But accumulated in large-scale production, they become non-negligible. That is why the typical path of "prototype with a fast tool, large-scale production in Go" is sensible.
Migration — Moving Between Frameworks
Moving from non-Go to Go (or vice versa) is smoother than you might think. The key is keeping the contract called the CRD as-is and changing only the implementation.
- Freeze the CR schema. The interface users see (apiVersion, kind, spec fields) must not change during migration.
- Prepare equivalence tests. Compare whether the old and new implementations produce identical child resources/status for the same CR input.
- Replace the controller. At any single moment in a cluster, only one controller should reconcile a given CRD. If two implementations touch the same CR at once, they conflict.
- Migrate incrementally. Where possible, validate the new implementation in a separate namespace/separate CRD version before switching.
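The equivalence test in the second step can be as simple as running both implementations on the same CR inputs and comparing normalized output. This is a sketch with invented pieces: `old_impl` and `new_impl` stand in for the two controllers' "CR in, children out" logic, and `new_impl` deliberately caps replicas at 5 so the check has something to catch.

```python
import json


def normalize(children: list) -> list:
    """Sort by (kind, name) and serialize canonically so ordering is ignored."""
    ordered = sorted(children, key=lambda c: (c["kind"], c["metadata"]["name"]))
    return [json.dumps(c, sort_keys=True) for c in ordered]


def find_mismatches(old_impl, new_impl, cases: list) -> list:
    """Return the names of CR inputs on which the implementations disagree."""
    return [
        cr["metadata"]["name"]
        for cr in cases
        if normalize(old_impl(cr)) != normalize(new_impl(cr))
    ]


def old_impl(cr: dict) -> list:
    name = cr["metadata"]["name"]
    return [
        {"kind": "Deployment", "metadata": {"name": name},
         "spec": {"replicas": cr["spec"]["replicas"]}},
        {"kind": "Service", "metadata": {"name": name}},
    ]


def new_impl(cr: dict) -> list:
    name = cr["metadata"]["name"]
    replicas = min(cr["spec"]["replicas"], 5)  # deliberate behavioral difference
    return [
        {"metadata": {"name": name}, "kind": "Service"},
        {"kind": "Deployment", "metadata": {"name": name},
         "spec": {"replicas": replicas}},
    ]


cases = [
    {"metadata": {"name": "small"}, "spec": {"replicas": 2}},
    {"metadata": {"name": "large"}, "spec": {"replicas": 9}},
]
print(find_mismatches(old_impl, new_impl, cases))
# ['large']
```

Differences in child ordering or key ordering are ignored, while a real behavioral difference is reported by CR name, which is the signal needed before switching controllers.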
Ecosystem Maturity — A Realistic View
The maturity of each tool deserves an honest assessment.
- Go/kubebuilder/controller-runtime: the most active, evolving in step with the Kubernetes core. Almost every commercial Operator is on this stack. It is the safest choice in the long run.
- Kopf: effectively the standard Operator framework in the Python ecosystem, and stable. A proven path for Python teams.
- Metacontroller: elegant in concept and stable, but its applicability is specialized to the niche of "declarative composition." It is not an all-purpose tool.
- shell-operator/Ansible Operator: very practical for specific operational patterns, but clearly limited for complex state management.
When assessing maturity, it is good to also consider "is the project alive, do the use cases resemble my problem, and is there material and community to consult when something goes wrong?"
A Collection of Pitfalls
- An easy language does not mean easy reconcile: even if Python makes the code short, the controller's inherent challenges like idempotency, races, and infinite loops remain. The framework merely hides some of them.
- Infinite status-update loop: the pattern where a handler updates status and that update wakes the handler again can happen in every framework. Add a guard so that status is written only when a change is truly needed.
- Purity of the Metacontroller hook: the sync hook must compute only "input -> desired output" without side effects. Creating resources directly with kubectl inside the hook conflicts with Metacontroller's model.
- Missing idempotency in shell/Ansible: if a script does not guarantee the same result every time, side effects accumulate with each reconcile. Explicitly implement "do nothing if it already exists."
- Shifting the operational burden: Metacontroller and shell-operator are themselves another component you must install and operate in the cluster. Remember that "my Operator is light, but the framework that runs it may be heavy."
- Distributed responsibility for CRD management: many non-Go frameworks do not auto-generate the CRD. You must write and version the CRD YAML separately, and handle schema validation (OpenAPI v3) yourself. If you forget the CRD, the controller has nothing to watch and silently does nothing.
- Silent failure from missing RBAC: when permissions are insufficient, watch or create is denied, and depending on the framework this error is buried deep in logs, leaving you wandering over "why isn't reconcile working?" Right after deployment, first check the controller logs for any forbidden messages.
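The RBAC pitfall in particular can be caught before deployment with a small lint over the role's rules. The sketch below checks a list of Role/ClusterRole rules against the permissions a controller is expected to need. It is a simplification that only understands `apiGroups`, `resources`, `verbs`, and the `*` wildcard (it ignores `resourceNames` and non-resource URLs), and the required permissions listed are assumptions for the Database example.

```python
def allows(rules: list, group: str, resource: str, verb: str) -> bool:
    """True if any rule grants `verb` on `resource` in `group`."""
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        if ((group in groups or "*" in groups)
                and (resource in resources or "*" in resources)
                and (verb in verbs or "*" in verbs)):
            return True
    return False


def missing_permissions(rules: list, required: list) -> list:
    """Return the (group, resource, verb) triples that no rule covers."""
    return [need for need in required if not allows(rules, *need)]


# Rules as they would appear under `rules:` in a Role or ClusterRole.
rules = [
    {"apiGroups": ["example.com"], "resources": ["databases"],
     "verbs": ["get", "list", "watch", "patch"]},
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["create", "get"]},
]

# Assumed needs of the controller: watch the CR, create and patch ConfigMaps.
required = [
    ("example.com", "databases", "watch"),
    ("", "configmaps", "create"),
    ("", "configmaps", "patch"),
]

print(missing_permissions(rules, required))
# [('', 'configmaps', 'patch')]
```

A check like this does not replace reading the controller logs, but it turns one class of silent failure into a failing test.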
Metacontroller DecoratorController — Attaching to Existing Resources
While the earlier CompositeController handles "parent CR -> child resources," the DecoratorController has a slightly different idea. It "decorates" existing resources (built-in or another CR) with additional behavior. For example, you can implement a rule like "whenever a Deployment with a specific label appears, always create a paired Service alongside it" with almost no code.
The flow of using a DecoratorController
Target: Deployments labeled app-type=web
        |
        v
Metacontroller calls the sync hook (passing the Deployment as JSON)
        |
        v
The hook returns the desired JSON of "the Service attached to this Deployment"
        |
        v
Metacontroller creates/maintains the Service
This pattern connects to the idea of the "CRD-less controller" covered in the previous article. That is, with a DecoratorController you can implement automation over existing resources as a declarative hook, without creating a new CRD. The barrier to entry is far lower than writing a Go controller.
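The hook for this "paired Service" rule can be sketched as another pure function. In the DecoratorController's sync hook, the target arrives under `object` and the desired extras are returned under `attachments`. The function name `decorate`, the port numbers, and the sample Deployment are assumptions for illustration.

```python
def decorate(request_body: dict) -> dict:
    """Return the Service that should accompany the observed Deployment."""
    deployment = request_body["object"]
    meta = deployment["metadata"]
    pod_labels = deployment["spec"]["template"]["metadata"]["labels"]
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
        "spec": {
            # Route to the same pods the Deployment creates.
            "selector": dict(pod_labels),
            # Assumed port; a real hook might read it from the pod spec.
            "ports": [{"port": 80, "targetPort": 80}],
        },
    }
    return {"attachments": [service]}


# Invented sync request, reduced to the fields this sketch reads.
sample_request = {
    "object": {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "shop", "namespace": "default",
                     "labels": {"app-type": "web"}},
        "spec": {"template": {"metadata": {"labels": {"app": "shop"}}}},
    },
    "attachments": {},
}

response = decorate(sample_request)
print(response["attachments"][0]["metadata"]["name"])
# shop
```

Which Deployments reach the hook is decided by the label selector declared on the DecoratorController resource, so the function itself does not need to filter on `app-type=web`.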
Fitting Scenarios — Seen Through Concrete Cases
Picturing the actual situations where each framework shines makes the choice much easier.
When Kopf Fits
An internal ML platform team wants automation where "when a CR called TrainingJob appears, validate the dataset, call an external feature store API to fetch metadata, then create an appropriate Job." This logic depends deeply on Python data libraries and an internal SDK (also Python). The team's primary language is Python too. Going to Go here would waste time re-wrapping the SDK. Kopf is the clear answer.
When Metacontroller Fits
There is a requirement that "when you create our company's standard microservice CR (kind: Microservice), a set of Deployment + Service + HPA + ServiceMonitor must always be created together." The logic is purely "this spec -> these children." No imperative procedure, no external calls. With a CompositeController's sync hook, you take the input spec and return the desired child list, and that is it. There is no reason to write a new Go controller.
When shell-operator Fits
An ops team already does all automation with kubectl and bash, and needs just one small automation: "when a Secret is added to a specific namespace, sync it to an external secret management system." They have no bandwidth to learn a new language or build a pipeline. With shell-operator, they can turn existing script assets into event-driven ones almost as-is.
When Ansible Operator Fits
An organization that already manages on-premises infrastructure with hundreds of Ansible playbooks wants to move that operational knowledge into Kubernetes reconcile. They can reuse the playbooks almost entirely while triggering them on a CR basis, giving the lowest learning cost.
The common lesson of these cases is "context decides the tool." Even for the same goal of "building an Operator," the optimal tool changes entirely depending on the team's language, existing assets, and the nature of the logic.
A Decision Checklist — What Should Our Team Pick
Finally, a checklist to aid the actual choice. Answer in order from the top.
- Is there an already well-maintained external Operator? If so, do not build one yourself; use it. It is the cheapest choice.
- Do you really need a new CRD, or is built-in resource automation enough? If the latter, consider the CRD-less controller (from the previous article) or a Metacontroller DecoratorController.
- Is it a prototype or production? For a prototype, use Kopf or Metacontroller for speed. For large-scale production, seriously consider Go.
- What is the team's primary language? For a Python team, Kopf is natural. If you have Ansible assets, an Ansible Operator.
- Is the logic declarative or imperative? If it expresses cleanly as "input -> desired output," Metacontroller. For a complex sequential procedure, Kopf/Go.
- Can the long-term maintainers read and fix it? This is the most important. No matter how cool the tool, if the team cannot handle it, it is debt.
Pass these six questions and, in most cases, you naturally narrow to one or two candidates. The key is to choose by "it fits our problem and our team," not by "others use it."
One-Line Summary — The Identity of the Five Frameworks
Finally, let us fix each tool in memory with a single sentence.
- Go/kubebuilder: the most powerful and mature orthodox path. The production standard, but with a high cost of entry.
- Kopf (Python): most faithfully reproduces the Go Operator experience in Python. The top choice for Python teams.
- Metacontroller: delegates the hard parts of reconcile to the framework, leaving the user with only a pure function. A master of declarative composition.
- shell-operator: the lightest bridge that lifts shell/kubectl assets into event-driven ones.
- Ansible Operator: a passage that moves existing Ansible operational knowledge into Kubernetes reconcile.
Keep these five lines in mind, and when you meet a new automation request, you can quickly recall the candidates.
Ecosystem Trends — As of 2026
Before adopting non-Go tools, it is worth noting where the ecosystem currently stands.
- Continuous evolution of the Go stack: kubebuilder and controller-runtime evolve in step with the Kubernetes core. The latest line supports Kubernetes 1.36 and Go 1.26, and operational simplification is underway, such as removing the separate sidecar (kube-rbac-proxy) for protecting the metrics endpoint and replacing it with controller-runtime's built-in authentication/authorization. If you build your own Operator, you must follow this flow.
- Stabilization of non-Go tools: Kopf and Metacontroller are at a stage that weights stability and maturity over active new-feature additions. This is not a weakness but a signal that they are trustworthy as foundational technology.
- The boundary with policy engines: simple validation/mutation is increasingly absorbed by policy engines like Kyverno. Always first ask yourself "isn't a policy enough for this rather than an Operator?"
In summary, even in 2026, the broad framing of "non-Go for prototypes and special-language needs, Go for large-scale standard production" holds.
Closing
An Operator is not exclusive to Go. The Kubernetes API is open equally to all languages, and Kopf, Metacontroller, shell-operator, and Ansible Operator each pass through that door with different strengths. What matters is not "the coolest tool" but "the tool our team can build quickly and maintain for a long time."
For fast validation, Kopf and Metacontroller shine; for large-scale, long-lived production, Go is still strong. And between the two stands the stable contract of the CRD, so if needed you can swap the implementation without breaking the interface. Once you understand the essence of the problem (reconcile) rather than the language, the tool follows naturally.
Finally, one more thing to emphasize. "Using a non-Go framework" does not mean "compromising." Many organizations run Operators built with Kopf stably in production for years, and many have built standard platform abstractions with Metacontroller. What matters is to know the tool's limits precisely and choose the most productive path within the range where those limits do not touch your problem. Match the actual capability of your team and the shape of the problem, not the tool's reputation. That is the surest way to build automation that survives over the long term.
References
- Kubernetes Operator pattern: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
- Kopf official docs: https://kopf.readthedocs.io/
- Kopf GitHub: https://github.com/nolar/kopf
- Metacontroller official docs: https://metacontroller.github.io/metacontroller/
- shell-operator GitHub: https://github.com/flant/shell-operator
- Operator SDK (incl. Ansible): https://sdk.operatorframework.io/docs/building-operators/ansible/
- Kubebuilder Book: https://book.kubebuilder.io/
- controller-runtime: https://pkg.go.dev/sigs.k8s.io/controller-runtime
- Operator SDK framework overview: https://sdk.operatorframework.io/docs/overview/
- Metacontroller CompositeController API: https://metacontroller.github.io/metacontroller/api/compositecontroller.html
- Metacontroller DecoratorController API: https://metacontroller.github.io/metacontroller/api/decoratorcontroller.html
- Kopf handlers reference: https://kopf.readthedocs.io/en/stable/handlers/
- kubebuilder GitHub: https://github.com/kubernetes-sigs/kubebuilder
- Kyverno official docs: https://kyverno.io/docs/