Platform Engineering in Banking — The Marriage of Backstage and Governance

Introduction

What does it take to create one new microservice at a bank? Writing the code might take a day. Reality looks like this: an approval form for Git repository creation, a build pipeline registration request, a firewall opening request (days of round trips), a database account request, a security review submission, asset inventory registration, a monitoring onboarding request. Each step has a different owning department, a different approval chain, a different SLA. Add it up: three weeks to a month. The developer spends that time writing request forms instead of code.

The interesting part: every one of those procedures has a rational basis. Access control under electronic finance supervision rules, security review under information protection policy, asset management obligations, change control. The problem is not that the procedures exist, but that they are executed by hand, serially, on paper.

Platform engineering attacks exactly this point. Not removing the procedures, but embedding them in code so they execute automatically and leave evidence automatically. This article covers designing and operating an internal developer platform (IDP) around Backstage in a bank. It is a technical write-up; how the regulations apply at any specific institution is a decision for that institution's compliance organization.

The Value of Platform Engineering in Finance — Standardization Is Compliance Automation

In a typical company, platform engineering is justified mainly by "better developer experience" and "reduced cognitive load." In finance, a decisive third value is added:

A standardized golden path is, in itself, a compliance automation device.

  • If every service is born from the same template, security settings cannot be omitted by accident; leaving them out takes a deliberate, visible change.
  • If every deployment passes through the same pipeline, change-control evidence accumulates automatically.
  • If every asset is registered in the catalog, asset inventory and ownership obligations are met by default.

From the security and audit teams' perspective, the inspection surface shrinks from "300 services configured 300 different ways" to "5 templates and 1 pipeline." With that few inspection targets, inspection cost can drop by an order of magnitude, and confidence rises because the same checks run identically for every service. This is the business case for platform engineering in finance, stated in the language that wins over executives and the security organization.

[Before]  every service configured differently
  inspection = number of services x number of checks  (manual, omissions possible)

[IDP]     convergence onto golden paths
  inspection = number of templates x number of checks  (replaceable by code review)
  + automated detection of template violations (policy engine)
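
To make the arithmetic concrete, here is a minimal sketch. The 300 services and 5 templates are the article's illustrative figures; the 40 checks per inspected unit is an assumption invented for this example.

```typescript
// Inspection surface = number of inspected units x checks per unit.
function inspectionSurface(units: number, checksPerUnit: number): number {
  return units * checksPerUnit;
}

const checksPerUnit = 40; // hypothetical number of checks per service or template
const surfaceBefore = inspectionSurface(300, checksPerUnit); // every service inspected by hand
const surfaceAfter = inspectionSurface(5, checksPerUnit);    // only the templates inspected

console.log(surfaceBefore, surfaceAfter, surfaceBefore / surfaceAfter); // 12000 200 60
```

The ratio depends only on the number of units, not on the number of checks, which is why converging on a few templates pays off regardless of how long the checklist is.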

Building Regulatory Requirements into the Golden Path — Backstage Templates

The Backstage Scaffolder template is the entrance to the golden path. The key is to make security and audit requirements defaults, not options. Services are born with security scanning, audit logging, and encryption already on — without the developer thinking about it.

# template.yaml — a new-service template with financial controls built in
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: secure-spring-service
  title: Spring Boot service (secure defaults built in)
  description: Standard service with security scanning, audit logging, and encryption enabled by default
  tags:
    - recommended
    - java
    - compliance-ready
spec:
  owner: platform-team
  type: service

  parameters:
    - title: Basic service information
      required:
        - serviceName
        - ownerGroup
        - dataClassification
      properties:
        serviceName:
          title: Service name
          type: string
          pattern: '^[a-z0-9-]+$'
        ownerGroup:
          title: Owning organization (synced with HR department codes)
          type: string
          ui:field: OwnerPicker
        dataClassification:
          title: Data classification (used in asset reporting)
          type: string
          enum:
            - public
            - internal
            - personal-credit    # personal credit data — extra controls auto-applied
          default: internal

  steps:
    # 1) Generate the skeleton — a codebase with security config already included
    - id: fetch
      name: Generate standard skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          serviceName: ${{ parameters.serviceName }}
          owner: ${{ parameters.ownerGroup }}
          dataClass: ${{ parameters.dataClassification }}

    # 2) If classification is personal-credit, force-inject the encryption module
    - id: inject-crypto
      name: Inject field-encryption module
      if: ${{ parameters.dataClassification === 'personal-credit' }}
      action: fetch:template
      input:
        url: ./addons/field-encryption
        targetPath: ./src

    # 3) Create the internal Git repo + branch protection + CODEOWNERS
    - id: publish
      name: Create repository
      action: publish:gitlab
      input:
        repoUrl: gitlab.bank.internal?owner=${{ parameters.ownerGroup }}&repo=${{ parameters.serviceName }}
        defaultBranch: main
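        # Verify against your Backstage version: protectDefaultBranch is documented
        # for publish:github, and publish:gitlab may need a different input.
        protectDefaultBranch: true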

    # 4) Register the security pipeline (SAST + SCA + image scan + signing)
    - id: pipeline
      name: Attach standard pipeline
      action: bank:pipeline:register
      input:
        repo: ${{ steps.publish.output.repoContentsUrl }}
        stages: ['sast', 'sca', 'image-scan', 'cosign-sign']

    # 5) Draft e-approval — new asset registration sign-off (custom action, see below)
    - id: approval
      name: Submit asset registration approval
      action: bank:approval:submit
      input:
        docType: NEW_SERVICE_REGISTRATION
        serviceName: ${{ parameters.serviceName }}
        dataClass: ${{ parameters.dataClassification }}
        requester: ${{ user.entity.metadata.name }}

    # 6) Catalog registration — asset inventory obligation met automatically
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'

The skeleton side also carries regulation as code: logging configuration with audit logging (who, when, what) enabled, TLS enforced, secrets allowed only as external vault references, standard health checks and metrics endpoints included. Developers do not "study the security requirements and apply them" — they are in compliance unless they actively turn the defaults off.
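
A policy check over the generated configuration could look like the sketch below. It is an illustration only: the `ServiceConfig` shape and the `vault:path#key` reference convention are assumptions, not part of Backstage or of any particular vault product.

```typescript
// Hypothetical shape of a generated service's runtime configuration.
interface ServiceConfig {
  auditLogging: boolean;
  tlsEnforced: boolean;
  secrets: Record<string, string>; // secret name -> value or reference
}

// Assumed convention: secrets must be references such as "vault:secret/app/db#password".
const VAULT_REF = /^vault:[\w\/-]+#[\w-]+$/;

// Returns one message per secure default that has been switched off or bypassed.
function findDefaultViolations(config: ServiceConfig): string[] {
  const violations: string[] = [];
  if (!config.auditLogging) violations.push('audit logging is disabled');
  if (!config.tlsEnforced) violations.push('TLS is not enforced');
  for (const [key, value] of Object.entries(config.secrets)) {
    if (!VAULT_REF.test(value)) {
      violations.push(`secret "${key}" is not a vault reference`);
    }
  }
  return violations;
}

// A service that kept the defaults passes; one with an inline password does not.
const compliant: ServiceConfig = {
  auditLogging: true,
  tlsEnforced: true,
  secrets: { dbPassword: 'vault:secret/loan-api/db#password' },
};
const drifted: ServiceConfig = { ...compliant, secrets: { dbPassword: 'hunter2' } };

console.log(findDefaultViolations(compliant)); // no violations
console.log(findDefaultViolations(drifted));   // one violation, for dbPassword
```

Run in the pipeline or by the policy engine, a check like this is what turns "defaults were turned off" from an audit finding into a failed build.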

Integrating Approval Workflows — Wiring e-Approval into the Scaffolder

Every action at a bank comes with an approval. If the IDP bypasses approvals, adoption is vetoed; if approvals stay manual, the automation is pointless. The answer: a Scaffolder custom action that calls the e-approval system and waits for completion before proceeding.

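// plugins/scaffolder-backend-module-bank/src/actions/submitApproval.ts
// Concept code: custom action integrating the e-approval system.
// ApprovalClient and renderBody are hypothetical stand-ins for the bank's own
// e-approval adapter; they are injected so the action can be tested with fakes.
// The JSON Schema form of createTemplateAction is used here; check which schema
// form your Backstage release expects.
import { createTemplateAction } from '@backstage/plugin-scaffolder-node';

type ApprovalInput = {
  docType: string;
  serviceName: string;
  dataClass: string;
  requester: string;
};

export interface ApprovalClient {
  createDocument(doc: {
    type: string;
    title: string;
    body: string;
    approvers: string[];
    drafter: string;
  }): Promise<{ id: string }>;
  waitForCompletion(
    docId: string,
    opts: { timeoutHours: number; pollIntervalSec: number },
  ): Promise<{
    status: 'APPROVED' | 'REJECTED' | 'TIMED_OUT';
    comment?: string;
    completedAt: string;
    approverHistory: string[];
  }>;
}

export const submitApprovalAction = (options: {
  approvalClient: ApprovalClient;
  renderBody: (docType: string, input: ApprovalInput) => string;
}) => {
  const { approvalClient, renderBody } = options;

  return createTemplateAction<ApprovalInput>({
    id: 'bank:approval:submit',
    description: 'Submit a draft to the e-approval system and wait for sign-off',
    schema: {
      input: {
        required: ['docType', 'serviceName', 'dataClass', 'requester'],
        type: 'object',
        properties: {
          docType: { type: 'string' },
          serviceName: { type: 'string' },
          dataClass: { type: 'string' },
          requester: { type: 'string' },
        },
      },
    },
    async handler(ctx) {
      const { docType, serviceName, dataClass, requester } = ctx.input;

      // 1) Create the approval document; the chain depends on data class
      //    internal: team lead only / personal-credit: lead + infosec + CISO
      const approvalLine = dataClass === 'personal-credit'
        ? ['TEAM_LEAD', 'INFOSEC_REVIEW', 'CISO']
        : ['TEAM_LEAD'];

      const doc = await approvalClient.createDocument({
        type: docType,
        title: `New service registration: ${serviceName}`,
        body: renderBody(docType, ctx.input),
        approvers: approvalLine,
        drafter: requester,
      });
      ctx.logger.info(`Approval submitted: ${doc.id}`);

      // 2) Wait for completion by polling (or event-driven if webhooks exist).
      //    The Scaffolder task stays open for the whole wait.
      const result = await approvalClient.waitForCompletion(doc.id, {
        timeoutHours: 72,
        pollIntervalSec: 60,
      });

      // Fail the task on anything but approval, and name the outcome so that
      // a timeout is not reported as a rejection.
      if (result.status !== 'APPROVED') {
        throw new Error(
          `Approval not granted (${result.status}): ${result.comment ?? 'no reason given'}`,
        );
      }

      // 3) Emit the approval evidence for later steps and catalog annotations
      ctx.output('approvalDocId', doc.id);
      ctx.output('approvedAt', result.completedAt);
      ctx.output('approvers', result.approverHistory);
    },
  });
};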

Design points:

  • Make the approval chain a function of data classification. Low-risk work is signed off by the team lead in minutes; only high-risk work involves the security organization. This alone cuts average lead time substantially.
  • Persist the approval evidence (document ID, approvers, timestamps) as catalog entity annotations. At audit time, "the creation approval evidence for this service" is one click away.
  • Even if the approval system exposes only a creaky SOAP API, you build the adapter once. That adapter becomes the automation gateway for the whole organization.
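
The second design point, persisting approval evidence as annotations, can be sketched as a small mapping function. Only the `bank.example.com/approval-doc` key appears in this article's catalog example; the `approved-at` and `approvers` keys, and the `ApprovalEvidence` shape, are assumptions added for illustration.

```typescript
// Hypothetical shape of what the approval step emits (mirrors its three outputs).
interface ApprovalEvidence {
  approvalDocId: string;
  approvedAt: string;   // ISO 8601 timestamp
  approvers: string[];  // roles or IDs, in sign-off order
}

// Annotation values must be strings, so the approver list is joined.
function toApprovalAnnotations(e: ApprovalEvidence): Record<string, string> {
  return {
    'bank.example.com/approval-doc': e.approvalDocId,
    'bank.example.com/approved-at': e.approvedAt,   // assumed key
    'bank.example.com/approvers': e.approvers.join(','), // assumed key
  };
}

const annotations = toApprovalAnnotations({
  approvalDocId: 'APR-2026-08731',
  approvedAt: '2026-03-02T05:30:00Z',
  approvers: ['TEAM_LEAD', 'INFOSEC_REVIEW', 'CISO'],
});

console.log(annotations['bank.example.com/approvers']); // TEAM_LEAD,INFOSEC_REVIEW,CISO
```

The resulting object could be merged into the `metadata.annotations` of the generated catalog-info.yaml, which is what makes the evidence retrievable from the entity page at audit time.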

Asset Management via the Catalog — System Inventory, Owners, Classifications

Vulnerability assessments of electronic financial infrastructure, asset audits, supervisory reporting — all demand "the list of systems, their owners, and their criticality." When the Backstage catalog replaces the spreadsheet, the list is always current (auto-registered at creation), owners are always valid (HR-synced groups), and classifications become live data consumed by the policy engine.

# catalog-info.yaml — metadata mapped to supervisory reporting fields
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: loan-application-api
  description: Loan application intake API
  annotations:
    bank.example.com/asset-id: "AST-2026-0142"        # asset management number
    bank.example.com/approval-doc: "APR-2026-08731"   # creation approval document
    bank.example.com/last-vuln-assessment: "2026-03"  # latest vulnerability assessment
  labels:
    bank.example.com/data-class: personal-credit
    bank.example.com/criticality: critical
    bank.example.com/dr-tier: tier-2
spec:
  type: service
  lifecycle: production
  owner: group:retail-lending-dev        # synced with HR department codes
  system: loan-origination
  dependsOn:
    - resource:loan-db
    - component:customer-master-api

| Supervisory/audit requirement | Catalog mapping | How it is produced |
|---|---|---|
| Information asset inventory | All Component/Resource entities | Daily extract via catalog API |
| Responsible owner per asset | spec.owner (HR-synced group) | Group leader interpreted as owner |
| System criticality | criticality label | Assessment results recorded as labels |
| Systems processing personal credit data | data-class label filter | A one-line label query |
| Inter-system dependency map | dependsOn graph | Dependency graph export |
| Change approval evidence | approval-doc annotation + Git history | Trace back by approval ID |
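
The "format conversion" half of such an extract might look like the following sketch. `CatalogEntity` is a deliberately reduced subset of the Backstage entity shape, fetching entities from the catalog API is left out, and the fallback strings such as `UNASSIGNED` are assumptions.

```typescript
// Reduced subset of a Backstage entity; only the fields used here.
interface CatalogEntity {
  kind: string;
  metadata: {
    name: string;
    annotations?: Record<string, string>;
    labels?: Record<string, string>;
  };
  spec?: { owner?: string };
}

interface InventoryRow {
  assetId: string;
  name: string;
  owner: string;
  dataClass: string;
  criticality: string;
}

const PREFIX = 'bank.example.com/';

// Maps one catalog entity to one row of the asset inventory report.
function toInventoryRow(e: CatalogEntity): InventoryRow {
  const labels = e.metadata.labels ?? {};
  const annotations = e.metadata.annotations ?? {};
  return {
    assetId: annotations[`${PREFIX}asset-id`] ?? 'UNASSIGNED',
    name: e.metadata.name,
    owner: e.spec?.owner ?? 'UNOWNED',
    dataClass: labels[`${PREFIX}data-class`] ?? 'unclassified',
    criticality: labels[`${PREFIX}criticality`] ?? 'unrated',
  };
}

// The "one-line label query": names of systems that process personal credit data.
function personalCreditSystems(entities: CatalogEntity[]): string[] {
  return entities
    .filter(e => e.metadata.labels?.[`${PREFIX}data-class`] === 'personal-credit')
    .map(e => e.metadata.name);
}

// Two made-up entities: one fully described, one missing its metadata.
const entities: CatalogEntity[] = [
  {
    kind: 'Component',
    metadata: {
      name: 'loan-application-api',
      annotations: { 'bank.example.com/asset-id': 'AST-2026-0142' },
      labels: {
        'bank.example.com/data-class': 'personal-credit',
        'bank.example.com/criticality': 'critical',
      },
    },
    spec: { owner: 'group:retail-lending-dev' },
  },
  { kind: 'Component', metadata: { name: 'branch-locator' } },
];

console.log(entities.map(toInventoryRow));
console.log(personalCreditSystems(entities)); // only loan-application-api carries the label
```

Making gaps explicit (`UNASSIGNED`, `UNOWNED`) rather than dropping incomplete rows keeps the report honest: missing metadata shows up as a finding instead of disappearing.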

The quarterly asset-audit spreadsheet becomes a batch job: "catalog API call plus format conversion." Even more valuable is the reverse check: when the policy engine detects a workload running in the cluster that is absent from the catalog, that is your "unregistered asset detection" control.
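
At its core the reverse check is a set difference, sketched below. The `Workload` shape is hypothetical, and the sketch assumes a workload's name equals its catalog entity name; a real setup would more likely match on a label that carries the entity reference.

```typescript
// Hypothetical summary of a workload discovered in the cluster.
interface Workload {
  namespace: string;
  name: string;
}

// Returns every running workload that has no catalog entry of the same name.
function findUnregisteredWorkloads(
  workloads: Workload[],
  catalogNames: string[],
): Workload[] {
  const registered = new Set(catalogNames);
  return workloads.filter(w => !registered.has(w.name));
}

const running: Workload[] = [
  { namespace: 'lending', name: 'loan-application-api' },
  { namespace: 'lending', name: 'shadow-batch' },
];
const catalogNames = ['loan-application-api', 'customer-master-api'];

console.log(findUnregisteredWorkloads(running, catalogNames)); // only shadow-batch
```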

Operating the IDP on an Air-Gapped Network — Mirroring

Bank development networks block the internet or restrict it through proxies. Backstage and its ecosystem are a pile of external dependencies — npm packages, container images, plugins — so a mirroring regime must come first.

[Internet zone — inbound only]
  registry.npmjs.org      ghcr.io / docker.io     plugin sources
        |                       |                       |
        v                       v                       v
  +---------------------------------------------------------+
  |  Ingestion gateway (periodic sync + vuln scan + approval)|
  +---------------------------------------------------------+
        |                       |
        v                       v
[Internal network]
  Nexus/Verdaccio          Harbor
  (internal npm mirror)    (internal image registry)
        |                       |
        +----------+------------+
                   v
            Backstage build/deploy
            (yarn.lock resolved URLs pinned to internal mirrors)

A few operational lessons:

  • Version pinning and reproducible builds. "Pull the latest and see" does not exist on an air-gapped network. Pin yarn.lock and image digests, and treat Backstage upgrades as planned quarterly work.
  • Standardize the plugin ingestion procedure. Using a community plugin means: source ingestion, SCA scan, security review, publication to the internal npm registry. Make that procedure itself a Backstage template (the platform bootstrapping the platform).
  • TechDocs needs air-gap treatment too. Host the mkdocs images, fonts, and plugins on internal mirrors and strip external CDN references at build time.
  • Remove external auth dependencies. Replace GitHub OAuth with OIDC against the internal IdP (Keycloak, AD FS).
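
The pinning rule from the first lesson can be enforced with a small lockfile check, sketched here. It assumes the Yarn classic (v1) lockfile format, where each package carries a `resolved "<url>"` line; the mirror hostname `nexus.bank.internal` is made up for the example.

```typescript
// Scans a Yarn classic (v1) lockfile for `resolved "<url>"` lines and returns
// every URL whose host differs from the internal mirror host.
function findExternalResolutions(lockfile: string, internalHost: string): string[] {
  const external: string[] = [];
  const resolvedLine = /^\s*resolved\s+"([^"]+)"/;
  for (const line of lockfile.split('\n')) {
    const match = resolvedLine.exec(line);
    if (match === null) continue;
    const url = match[1];
    // Strip the scheme, then take everything up to the first slash as the host.
    const host = url.replace(/^[a-z]+:\/\//i, '').split('/')[0];
    if (host !== internalHost) external.push(url);
  }
  return external;
}

// Hypothetical lockfile excerpt: one internal entry, one that leaked through.
const sampleLock = [
  'lodash@^4.17.21:',
  '  version "4.17.21"',
  '  resolved "https://nexus.bank.internal/repository/npm/lodash/-/lodash-4.17.21.tgz#abc"',
  'left-pad@^1.3.0:',
  '  version "1.3.0"',
  '  resolved "https://registry.yarnpkg.com/left-pad/-/left-pad-1.3.0.tgz#def"',
].join('\n');

const leaks = findExternalResolutions(sampleLock, 'nexus.bank.internal');
console.log(leaks); // one entry: the registry.yarnpkg.com URL
```

Failing the build whenever the returned list is non-empty catches the common case where a lockfile regenerated on a developer laptop quietly reintroduces public registry URLs.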

The Boundaries of Self-Service — What Is Automatic, What Is Reviewed

"Everything self-service" does not fly in finance; "everything approved" erases the platform's reason to exist. Draw the boundary with a risk-tier matrix.

| Action | Low risk (auto-approve) | Medium risk (team lead) | High risk (security review) |
|---|---|---|---|
| Dev-environment namespace creation | Yes | - | - |
| New service from standard template | Yes (internal class) | Yes (externally exposed) | Yes (personal credit data) |
| Production deploy (standard pipeline) | Yes (within pre-approved windows) | Yes (outside windows) | - |
| Firewall rule addition | - | Yes (internal segment, standard ports) | Yes (external segment, non-standard) |
| DB schema change | Yes (dev) | Yes (prod, non-destructive) | Yes (prod, destructive) |
| Secret issuance/rotation | Yes (auto-rotation) | - | Yes (manual exception issuance) |
| Production data read access | - | - | Yes (time-boxed + mandatory reason) |

Three principles:

  1. For low-risk actions, approval is not removed — it is replaced by policy passage. "Auto-approve" really means machine adjudication by the policy engine (quotas, classifications, template conformance), with the verdict preserved as evidence.
  2. High-risk actions keep human review, but the platform auto-attaches everything the reviewer needs. When the security team receives a review request with the data classification, dependency graph, and scan results already attached, review time shrinks because nobody has to chase the requester for missing information.
  3. Manage the boundary itself as code. Declare the matrix in YAML and have approval-chain routing read it; then boundary adjustments are themselves tracked through PRs and approvals.
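
The third principle could be sketched as follows. The `matrix` array stands in for the parsed YAML file and covers only the firewall and DB schema rows of the table above; the rule format, the attribute names, and the fail-closed fallback are all assumptions of this sketch.

```typescript
type Tier = 'auto' | 'team-lead' | 'security-review';

interface Rule {
  action: string;
  when: Record<string, string>; // every listed attribute must match; {} matches anything
  tier: Tier;
}

// Stand-in for the parsed YAML matrix. The first matching rule wins.
const matrix: Rule[] = [
  { action: 'firewall-rule', when: { segment: 'internal', ports: 'standard' }, tier: 'team-lead' },
  { action: 'firewall-rule', when: {}, tier: 'security-review' },
  { action: 'db-schema-change', when: { env: 'dev' }, tier: 'auto' },
  { action: 'db-schema-change', when: { env: 'prod', destructive: 'false' }, tier: 'team-lead' },
  { action: 'db-schema-change', when: { env: 'prod', destructive: 'true' }, tier: 'security-review' },
];

function routeApproval(action: string, attrs: Record<string, string>): Tier {
  for (const rule of matrix) {
    if (rule.action !== action) continue;
    const matches = Object.entries(rule.when).every(([k, v]) => attrs[k] === v);
    if (matches) return rule.tier;
  }
  // Fail closed: anything the matrix does not cover goes to human security review.
  return 'security-review';
}

console.log(routeApproval('firewall-rule', { segment: 'internal', ports: 'standard' })); // team-lead
console.log(routeApproval('db-schema-change', { env: 'dev' }));                          // auto
console.log(routeApproval('prod-data-read', {}));                                        // security-review
```

Because the matrix is plain data, loosening or tightening a boundary is a one-line diff that security and compliance can review like any other change.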

Measuring Maturity — DORA Plus Regulatory Metrics

A platform earns budget only when its outcomes are measured. In finance, use a scorecard that combines the four DORA metrics with regulatory and developer-experience metrics.

| Category | Metric | Before (example) | One-year target |
|---|---|---|---|
| DORA | Deployment frequency | Monthly | Twice weekly or more |
| DORA | Change lead time | 3 weeks | 2 days |
| DORA | Change failure rate | 18% | 8% or lower |
| DORA | Time to restore (MTTR) | 4 hours | 30 minutes |
| Regulatory | Secure-default adoption for new services | Unmeasurable | 100% (template-enforced) |
| Regulatory | Asset inventory accuracy (vs audit) | About 70% | 99% or higher |
| Regulatory | Vulnerability remediation lead time (Critical) | 30 days | 7 days |
| Regulatory | Unregistered assets detected | Undetectable | Converging to 0 per month |
| Experience | New service setup time | 15-20 business days | 1 business day |
| Experience | Developer satisfaction (quarterly survey) | - | Upward trend |

One caution: never weaponize the metrics for performance evaluation. The moment you build a team leaderboard, the metrics get gamed and the platform is perceived as surveillance — adoption collapses. State publicly that the metrics serve only platform improvement and executive reporting.
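
Three of the DORA figures can be derived from deployment records alone, as the sketch below illustrates. The `Deployment` record shape is an assumption, and lead time is measured here from first commit to production deploy, which is one common definition rather than the only one.

```typescript
// Hypothetical deployment record, e.g. exported from the standard pipeline.
interface Deployment {
  committedAt: string; // ISO timestamp of the first commit in the change
  deployedAt: string;  // ISO timestamp of the production deploy
  failed: boolean;     // true if the deploy led to an incident or rollback
}

function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const HOUR_MS = 3600000;

function doraScorecard(deploys: Deployment[], periodDays: number) {
  const leadTimesHours = deploys.map(
    d => (Date.parse(d.deployedAt) - Date.parse(d.committedAt)) / HOUR_MS,
  );
  const failures = deploys.filter(d => d.failed).length;
  return {
    deploysPerWeek: (deploys.length * 7) / periodDays,
    medianLeadTimeHours: median(leadTimesHours),
    changeFailureRate: deploys.length === 0 ? 0 : failures / deploys.length,
  };
}

// Four made-up deployments over a 14-day window.
const sampleDeploys: Deployment[] = [
  { committedAt: '2026-01-05T09:00:00Z', deployedAt: '2026-01-05T15:00:00Z', failed: false },
  { committedAt: '2026-01-06T09:00:00Z', deployedAt: '2026-01-07T09:00:00Z', failed: false },
  { committedAt: '2026-01-08T09:00:00Z', deployedAt: '2026-01-10T09:00:00Z', failed: true },
  { committedAt: '2026-01-12T09:00:00Z', deployedAt: '2026-01-12T11:00:00Z', failed: false },
];

console.log(doraScorecard(sampleDeploys, 14));
// { deploysPerWeek: 2, medianLeadTimeHours: 15, changeFailureRate: 0.25 }
```

Computing the numbers per platform or per division, rather than per team, is one way to keep the scorecard usable for improvement and reporting without turning it into a leaderboard.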

Culture Shift — The Security Team as a Customer of the Platform

In banking, platform engineering succeeds or fails on the relationship with security and audit organizations more than on technology. The common failure: the platform team treats developers as the only customers and the security team as "a gate to pass." Gates retaliate.

Flip the framing and define the security team as the second customer.

  • Interview the security team's pain points (manual checks, evidence collection, asset visibility) and turn them into platform features. A feature like "scan results for every in-scope service on one screen" turns the security team into the platform's most fervent advocates.
  • Co-own the golden path's secure defaults with the security team. Put them in the template repository's CODEOWNERS so security-relevant changes require their review. The structure is not "security approves the platform" but "security co-builds the platform."
  • When the platform takes over evidence production during audit season, that experience cements the platform's value across the organization.

Case Scenario — The Path from Three Weeks to One Day

A fictional before/after: a retail lending team creates a loan limit inquiry API.

[Before — 21 business days]
D1-D3   : Git repo request form, approval wait, manual creation by infra team
D4-D8   : pipeline registration request, CI staff queue
D6-D10  : firewall requests (DB plus 2 internal APIs) — security review queue
D8-D12  : DB account issuance, secrets handed over manually (via messenger... risky)
D10-D15 : security review — checklist filled by hand, rejected once
D15-D18 : asset inventory registration, monitoring onboarding request
D19-D21 : production deploy approval, first deploy
Developer verdict: "I wrote more request forms than code."

[After — 1 business day]
09:00  Run the secure-spring-service template in the Backstage portal
       (service name, owning group, data class = personal-credit)
09:01  Repo, pipeline, monitoring, catalog all auto-created
       standard firewall policies auto-requested (DB rule needs the team lead only: internal segment, standard port)
09:02  Data class is personal-credit, so approval auto-submitted
       (team lead + infosec review + CISO, with scan results and dependency graph attached)
14:30  Infosec and CISO approve (fast: everything needed for review was attached)
14:31  Production namespace provisioned, secrets injected as vault references
16:00  First production deploy done — every artifact from creation to deploy preserved
Developer verdict: "Built it in the morning, shipped it in the afternoon."

The point is that approvals did not disappear. The approval is still there (infosec at 14:30). What disappeared is form-writing, queue-waiting, information re-collection, manual registration — that is, the friction between the approvals.

Failure Patterns

Knowing the common failures in advance avoids half of them.

  • Building the portal first. Backstage UI without backend automation is "a pretty request-form generator." One vertical slice where a template run truly creates every resource end to end is worth more than ten screens.
  • Onboarding every team at once. Refine the golden path deeply with one or two pilot teams, then expand. The first impression of a half-baked platform is hard to recover.
  • Automation without governance consensus. If the platform team unilaterally decides to streamline approvals, an audit overturns everything. Co-sign the risk-tier matrix with security and compliance before starting.
  • Avoiding the legacy approval system integration. Postponing the integration "because there is no API" leaves the platform forever half-built. Even screen automation (RPA) beats nothing.
  • The platform team becoming a ticket desk. If the model degrades to "ask the platform team and they do it," you have merely concentrated the bottleneck. Every capability must default to portal/API self-service.
  • Sailing without measurement. If you do not record pre-adoption lead times, you cannot prove outcomes a year later — and the budget dies.

Adoption Roadmap

| Phase | Duration (example) | Key deliverables |
|---|---|---|
| 0. Alignment | 1-2 months | Risk-tier matrix agreed with security/compliance, pilot team selection, baseline lead-time measurement |
| 1. Vertical slice | 2-3 months | One template fully automated from repo to production deploy (approval integration included) |
| 2. Catalog | 2 months | Bulk-register existing services, HR-synced owners, asset reporting batch |
| 3. Expansion | 3-6 months | Grow to 3-5 templates, onboard divisions beyond the pilot, stabilize air-gap mirrors |
| 4. Governance depth | Ongoing | Unregistered asset detection, automated evidence production, regular scorecard reporting |

Checklist

Governance

  • Is the risk-tier matrix jointly agreed and signed with security/compliance?
  • Are policy adjudication results for auto-approvals preserved as evidence?
  • Does the security team co-own template secure defaults (CODEOWNERS)?

Golden path

  • Does a single template run complete repo, pipeline, monitoring, and catalog registration?
  • Do encryption and approval chains branch automatically by data classification?
  • Are security scans (SAST/SCA/image) and signing included in the pipeline by default?

Catalog

  • Are owners synced with HR so that departures and transfers update them automatically?
  • Are supervisory reporting fields (asset ID, classification, approval evidence) preserved as metadata?
  • Does detection of unregistered assets (runtime workloads absent from the catalog) work?

Air-gapped operations

  • Are npm/image/plugin mirrors and the ingestion procedure standardized?
  • Is the build reproducible from internal mirrors alone (zero external dependencies)?
  • Is there a planned cadence and procedure for Backstage upgrades?

Measurement

  • Is the pre-adoption baseline (lead time, deploy frequency) recorded?
  • Is the DORA-plus-regulatory scorecard reported on a regular cadence?
  • Is it publicly stated that metrics are not used for team evaluation?

Closing

Doing platform engineering at a bank ultimately means becoming a translator between two worlds. To developers, the platform is "the experience of a three-week procedure shrinking to a day." To the security team, it is "the structure that shrinks the inspection surface from 300 to 5." To executives, it is "the investment that converts compliance cost into automation." The same platform must be explained in three languages.

The marriage of Backstage and governance is not a metaphor but an implementation task: putting regulatory requirements into templates, wiring approvals into the Scaffolder, turning the catalog into the asset ledger. It is unglamorous plumbing — but in organizations where this plumbing is done, you watch the conventional wisdom about development speed in regulated industries fall apart.
