Platform Engineering and Building an Internal Developer Platform with Backstage: A Practical Guide

Introduction - Why Platform Engineering Is Replacing Traditional DevOps

The DevOps principle of "You build it, you run it" gave developers autonomy but also imposed an enormous cognitive load. Developers found themselves writing Kubernetes manifests, configuring CI/CD pipelines, setting up monitoring dashboards, and provisioning infrastructure on their own. Gartner predicted that 80% of large software engineering organizations would establish Platform Engineering teams by 2026, and adoption has moved steadily in that direction.

Platform Engineering is a discipline that builds Internal Developer Platforms (IDPs) with self-service capabilities to reduce developer cognitive load and accelerate software delivery across the organization. While DevOps focused on "culture and practices," Platform Engineering focuses on "platform as a product." Developers are the IDP's customers, and the platform team develops and operates this product.

This article covers the entire process of building an IDP based on Backstage, a CNCF Incubating project. It provides a practice-oriented guide from Software Catalog configuration, Golden Path Template design, and plugin development to failure cases and recovery procedures encountered during operations.

Internal Developer Platform Concepts and Components

What Is an IDP?

An Internal Developer Platform (IDP) is a platform that provides a self-service interface so developers can focus on code without worrying about infrastructure and operational complexity. It integrates infrastructure provisioning, deployment, monitoring, and documentation through a unified interface.

The core components of an IDP are:

  • Service Catalog: Centralized management of metadata for all services, APIs, resources, and teams in the organization
  • Self-Service Portal: Developers directly provision environments without infrastructure requests
  • Golden Path Template: Project scaffolding templates that conform to organizational standards
  • Documentation Hub: Auto-rendering of per-service technical docs linked to code repositories
  • Integration Layer: Integration with CI/CD, monitoring, and incident management tools

IDP vs Developer Portal

The terms IDP and Developer Portal are often used interchangeably, but strictly speaking they are different things. A Developer Portal is the frontend layer of an IDP. The IDP is the broader concept: it also includes the automation layer, infrastructure abstraction, policy engine, and workflow orchestration behind the portal.

┌─────────────────────────────────────────────────────────┐
│  Developer Portal (UI)                                  │
│  ┌─────────────┐ ┌─────────────┐ ┌───────────────────┐  │
│  │ Catalog     │ │ Scaffolder  │ │ TechDocs          │  │
│  └─────────────┘ └─────────────┘ └───────────────────┘  │
├─────────────────────────────────────────────────────────┤
│  Internal Developer Platform                            │
│  ┌─────────────┐ ┌─────────────┐ ┌───────────────────┐  │
│  │ IaC Engine  │ │ CI/CD       │ │ Policy Engine     │  │
│  │ (Terraform, │ │ (GitHub     │ │ (OPA/Kyverno)     │  │
│  │ Crossplane) │ │ Actions)    │ │                   │  │
│  └─────────────┘ └─────────────┘ └───────────────────┘  │
│  ┌─────────────┐ ┌─────────────┐ ┌───────────────────┐  │
│  │ K8s         │ │ Observ-     │ │ Secret Mgmt       │  │
│  │ Clusters    │ │ ability     │ │ (Vault)           │  │
│  └─────────────┘ └─────────────┘ └───────────────────┘  │
└─────────────────────────────────────────────────────────┘

Backstage Architecture and Core Features

What Is Backstage?

Backstage began as the Developer Portal Spotify used internally and was open-sourced in 2020. It joined the CNCF as a Sandbox project that same year and was promoted to an Incubating project in March 2022. As of 2026, it is one of the most widely used open-source frameworks for building IDPs.

Four Core Features

  1. Software Catalog: Register and search all software assets in the organization (services, libraries, pipelines, infrastructure, etc.) using YAML-based metadata. You can see service dependencies, ownership, and API specifications at a glance.

  2. Software Templates (Scaffolder): Codify Golden Paths. When a developer enters a few parameters in the UI, a project conforming to organizational standards is automatically generated, including Git repository creation, CI/CD pipeline setup, and K8s namespace provisioning -- all in one step.

  3. TechDocs: Automatically build MkDocs-based technical documentation from service repositories using a docs-as-code approach and render it in the Backstage UI. Documentation stays up-to-date because it is managed alongside the code.

  4. Plugins: The core of Backstage's extensibility. Over 200 community plugins exist, integrating with GitHub, GitLab, PagerDuty, Datadog, ArgoCD, Kubernetes, and more. You can also develop your own plugins.

Architecture Structure

┌─────────────────────────────────────────────────────┐
│  Backstage App (React)                              │
│  ┌────────────┐ ┌────────────┐ ┌─────────────────┐  │
│  │ Catalog    │ │ Scaffolder │ │ TechDocs        │  │
│  │ Plugin     │ │ Plugin     │ │ Plugin          │  │
│  └────────────┘ └────────────┘ └─────────────────┘  │
│  ┌────────────┐ ┌────────────┐ ┌─────────────────┐  │
│  │ K8s        │ │ CI/CD      │ │ Custom Plugins  │  │
│  │ Plugin     │ │ Plugin     │ │                 │  │
│  └────────────┘ └────────────┘ └─────────────────┘  │
├─────────────────────────────────────────────────────┤
│  Backstage Backend (Node.js)                        │
│  ┌────────────┐ ┌────────────┐ ┌─────────────────┐  │
│  │ Catalog    │ │ Scaffolder │ │ Auth Provider   │  │
│  │ Backend    │ │ Backend    │ │ (GitHub/Okta)   │  │
│  └────────────┘ └────────────┘ └─────────────────┘  │
├─────────────────────────────────────────────────────┤
│  Database (PostgreSQL) / Search                     │
└─────────────────────────────────────────────────────┘

Backstage has a separated architecture of frontend (React SPA) and backend (Node.js). Frontend plugins are implemented as React components, and backend plugins as Express routers. PostgreSQL (production) or SQLite (development) is used as the data store.

Backstage Installation and Initial Setup

Project Creation

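The first step is creating a Backstage app. You need an Active LTS release of Node.js and Yarn. Check the Backstage prerequisites for the exact versions: older releases were built around Node.js 18 and Yarn Classic (1.x), while recent releases expect a newer Node.js LTS and Yarn 4.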

# Create Backstage app
npx @backstage/create-app@latest

# Project directory structure
# my-backstage-app/
# ├── app-config.yaml           # Main configuration file
# ├── app-config.production.yaml # Production overrides
# ├── catalog-info.yaml         # Backstage's own catalog entry
# ├── packages/
# │   ├── app/                  # Frontend (React)
# │   └── backend/              # Backend (Node.js)
# ├── plugins/                  # Custom plugins
# └── package.json

# Start local dev server
# (apps generated by recent create-app versions use `yarn start` instead)
cd my-backstage-app
yarn dev

Core Configuration - app-config.yaml

app-config.yaml is Backstage's central configuration file. It defines the database, authentication, catalog sources, and integrations.

# app-config.yaml
app:
  title: 'ACME Developer Platform'
  baseUrl: http://localhost:3000

organization:
  name: 'ACME Corp'

backend:
  baseUrl: http://localhost:7007
  listen:
    port: 7007
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}

# GitHub integration settings
integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

# Authentication provider settings
auth:
  environment: production
  providers:
    github:
      production:
        clientId: ${AUTH_GITHUB_CLIENT_ID}
        clientSecret: ${AUTH_GITHUB_CLIENT_SECRET}

# Catalog source registration
catalog:
  import:
    entityFilename: catalog-info.yaml
    pullRequestBranchName: backstage-integration
  rules:
    - allow: [Component, System, API, Resource, Location, Template, Group, User]
  locations:
    # Auto-discover all GitHub repos in the organization
    - type: github-discovery
      target: https://github.com/acme-corp/*/blob/main/catalog-info.yaml
    # Register templates
    - type: file
      target: ../../templates/all-templates.yaml
    # Sync organization structure
    - type: github-org
      target: https://github.com/acme-corp
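
Because this configuration relies on several `${VAR}` placeholders, a pre-flight check before startup can catch unset variables early. The following is a minimal sketch, not a Backstage API: `findMissingEnvVars` is a hypothetical helper that only scans the raw config text, and the sample values are made up.

```typescript
// Hypothetical helper: lists ${VAR} placeholders that have no value in `env`.
function findMissingEnvVars(
  configText: string,
  env: Record<string, string | undefined>,
): string[] {
  const missingNames: string[] = [];
  // Matches ${NAME} where NAME looks like a conventional environment variable.
  const placeholder = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;
  let match: RegExpExecArray | null;
  while ((match = placeholder.exec(configText)) !== null) {
    const varName = match[1];
    // Empty strings count as missing; each name is reported once.
    if (!env[varName] && missingNames.indexOf(varName) === -1) {
      missingNames.push(varName);
    }
  }
  return missingNames.sort();
}

const sampleConfig = `
backend:
  database:
    connection:
      host: \${POSTGRES_HOST}
      password: \${POSTGRES_PASSWORD}
integrations:
  github:
    - host: github.com
      token: \${GITHUB_TOKEN}
`;

const missingVars = findMissingEnvVars(sampleConfig, {
  POSTGRES_HOST: 'db.internal',
  GITHUB_TOKEN: '',
});
console.log(missingVars); // [ 'GITHUB_TOKEN', 'POSTGRES_PASSWORD' ]
```

In a deployment script you would pass the real file contents and `process.env`, and fail the rollout if the returned list is not empty.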

Note: GITHUB_TOKEN requires repo, read:org, and read:user permissions. When using Fine-grained Tokens, you must explicitly grant Contents and Metadata read permissions for target repositories. Insufficient token permissions will cause NotFoundError during catalog registration.
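
For classic personal access tokens, GitHub reports the granted scopes in the `X-OAuth-Scopes` response header, so the scope requirement above can be checked before Backstage ever starts. This is a rough sketch under that assumption: `missingScopes` is a hypothetical helper, it compares scope names literally (it does not know that a broader scope can imply a narrower one), and fine-grained tokens do not expose this header.

```typescript
// Scopes named in the note above; adjust to what your integrations need.
const REQUIRED_SCOPES = ['repo', 'read:org', 'read:user'];

// Returns the required scopes that do not appear in an X-OAuth-Scopes value
// such as "repo, read:user".
function missingScopes(
  headerValue: string | null,
  required: string[] = REQUIRED_SCOPES,
): string[] {
  const granted = (headerValue ?? '')
    .split(',')
    .map(scope => scope.trim())
    .filter(scope => scope.length > 0);
  return required.filter(scope => granted.indexOf(scope) === -1);
}

console.log(missingScopes('repo, read:user')); // [ 'read:org' ]
console.log(missingScopes('repo, read:org, read:user')); // []
```

The header value would come from a request such as `GET https://api.github.com/user` made with the token, for example via `response.headers.get('x-oauth-scopes')`.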

Production Deployment Configuration

# app-config.production.yaml
app:
  baseUrl: https://developer.acme.com

backend:
  baseUrl: https://developer-api.acme.com
  cors:
    origin: https://developer.acme.com

# Configure TechDocs with external storage
techdocs:
  builder: 'external'
  generator:
    runIn: 'local'
  publisher:
    type: 'awsS3'
    awsS3:
      bucketName: 'acme-techdocs'
      region: 'ap-northeast-2'

Software Catalog Configuration

Writing catalog-info.yaml

All software entities are registered in Backstage through a catalog-info.yaml file. This file is placed at the root of the service repository and defines the service's metadata, ownership, dependencies, and API specifications.

# catalog-info.yaml - Microservice registration example
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  title: 'Order Service'
  description: 'Core domain service responsible for order creation, payment processing, and inventory deduction'
  annotations:
    # GitHub integration
    github.com/project-slug: acme-corp/order-service
    # TechDocs source location
    backstage.io/techdocs-ref: dir:.
    # CI/CD integration
    github.com/workflows: build-and-deploy.yaml
    # Kubernetes integration
    backstage.io/kubernetes-id: order-service
    backstage.io/kubernetes-namespace: order
    # PagerDuty incident integration
    pagerduty.com/service-id: PXXXXXX
    # Datadog dashboard
    datadoghq.com/dashboard-url: https://app.datadoghq.com/dashboard/xxx
  tags:
    - java
    - spring-boot
    - grpc
  links:
    - url: https://grafana.acme.com/d/order-svc
      title: 'Grafana Dashboard'
      icon: dashboard
    - url: https://wiki.acme.com/order-domain
      title: 'Domain Wiki'
      icon: docs
spec:
  type: service
  lifecycle: production
  owner: team-order
  system: ecommerce-platform
  providesApis:
    - order-api
  consumesApis:
    - inventory-api
    - payment-api
  dependsOn:
    - resource:order-database
    - resource:order-redis
---
# API specification registration
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: order-api
  description: 'Order processing gRPC API'
spec:
  type: grpc
  lifecycle: production
  owner: team-order
  system: ecommerce-platform
  definition:
    $text: ./proto/order.proto
---
# Database resource registration
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: order-database
  description: 'Order Service PostgreSQL database'
spec:
  type: database
  owner: team-order
  system: ecommerce-platform
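
A few of the rules this file has to satisfy can be checked mechanically before it reaches the catalog. The sketch below covers only a small subset (the name format and the required `spec` fields of a Component) and is not the validation Backstage itself performs. `validateComponent` and the `ComponentEntity` shape are illustrative names.

```typescript
interface ComponentEntity {
  apiVersion: string;
  kind: string;
  metadata: { name: string };
  spec?: { type?: string; lifecycle?: string; owner?: string };
}

// Name rule from the Backstage descriptor format: 1-63 characters, runs of
// [a-zA-Z0-9] separated by single '-', '_' or '.' characters.
const ENTITY_NAME_PATTERN = /^[a-zA-Z0-9]+([-_.][a-zA-Z0-9]+)*$/;

function validateComponent(entity: ComponentEntity): string[] {
  const problems: string[] = [];
  if (entity.apiVersion !== 'backstage.io/v1alpha1') {
    problems.push('unexpected apiVersion');
  }
  if (entity.kind !== 'Component') {
    problems.push('kind must be Component');
  }
  const entityName = entity.metadata.name;
  if (entityName.length > 63 || !ENTITY_NAME_PATTERN.test(entityName)) {
    problems.push('invalid metadata.name');
  }
  const spec = entity.spec ?? {};
  if (!spec.type) problems.push('missing spec.type');
  if (!spec.lifecycle) problems.push('missing spec.lifecycle');
  if (!spec.owner) problems.push('missing spec.owner');
  return problems;
}

const orderService: ComponentEntity = {
  apiVersion: 'backstage.io/v1alpha1',
  kind: 'Component',
  metadata: { name: 'order-service' },
  spec: { type: 'service', lifecycle: 'production', owner: 'team-order' },
};
console.log(validateComponent(orderService)); // []
```

A check like this fits naturally into a pull request pipeline, where the parsed YAML document is passed in as the entity.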

Entity Relationship Structure

Backstage's entity model has a hierarchical structure:

| Entity Kind | Role | Example |
|-------------|------|---------|
| Domain | Business domain area | commerce, logistics |
| System | Logical group of related components | ecommerce-platform |
| Component | Individual software unit (service, library, website) | order-service |
| API | Interface provided by a component | order-api (gRPC/REST/GraphQL) |
| Resource | Infrastructure resource | order-database, order-redis |
| Group | Team/organizational unit | team-order |
| User | Individual user | jane.doe |

Through this relationship structure, you can instantly determine "who owns this service, what APIs it provides, what infrastructure it depends on, and who to contact during an incident." In microservice environments where services grow to hundreds, this visibility is critical.
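
These relationships are expressed as entity references of the form `[kind:][namespace/]name`, such as `resource:order-database` in the example above. The following simplified sketch shows how such a reference expands to its full form. `parseRef` and `formatRef` are illustrative names, not the helpers shipped in `@backstage/catalog-model`, and the default kind passed in is an assumption made by the caller.

```typescript
interface EntityRef {
  kind: string;
  namespace: string;
  name: string;
}

// Simplified parser for "[kind:][namespace/]name".
// Missing parts fall back to the supplied defaults.
function parseRef(
  ref: string,
  defaults: { kind: string; namespace?: string },
): EntityRef {
  let rest = ref.trim();
  let kind = defaults.kind;
  let namespace = defaults.namespace ?? 'default';

  const colon = rest.indexOf(':');
  if (colon !== -1) {
    kind = rest.slice(0, colon);
    rest = rest.slice(colon + 1);
  }
  const slash = rest.indexOf('/');
  if (slash !== -1) {
    namespace = rest.slice(0, slash);
    rest = rest.slice(slash + 1);
  }
  if (!kind || !namespace || !rest) {
    throw new Error(`Invalid entity reference: "${ref}"`);
  }
  return { kind: kind.toLowerCase(), namespace, name: rest };
}

function formatRef(ref: EntityRef): string {
  return `${ref.kind}:${ref.namespace}/${ref.name}`;
}

// References taken from the order-service example.
const dependsOnRefs = ['resource:order-database', 'resource:order-redis'];
for (const raw of dependsOnRefs) {
  console.log(formatRef(parseRef(raw, { kind: 'component' })));
}
// resource:default/order-database
// resource:default/order-redis
console.log(formatRef(parseRef('team-order', { kind: 'group' })));
// group:default/team-order
```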

Creating Golden Path Templates

A Golden Path codifies the organization's recommended "right way to start." When developers begin a new service, they can automatically generate a project with CI/CD, testing, monitoring, and security settings built in by default.

Writing a Scaffolder Template

# templates/spring-boot-service/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: spring-boot-grpc-service
  title: 'Spring Boot gRPC Microservice'
  description: |
    Creates a Spring Boot 3.x + gRPC-based microservice project.
    Includes: Dockerfile, Helm Chart, GitHub Actions CI/CD,
    Prometheus metrics, Health Check, catalog-info.yaml
  tags:
    - java
    - spring-boot
    - grpc
    - recommended
spec:
  owner: platform-team
  type: service

  # Define user input parameters
  parameters:
    - title: 'Service Basic Information'
      required:
        - serviceName
        - owner
        - system
      properties:
        serviceName:
          title: 'Service Name'
          type: string
          description: 'Lowercase letters and hyphens only (e.g., order-service)'
          pattern: '^[a-z][a-z0-9-]*$'
          ui:autofocus: true
        description:
          title: 'Service Description'
          type: string
        owner:
          title: 'Owning Team'
          type: string
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
        system:
          title: 'Parent System'
          type: string
          ui:field: EntityPicker
          ui:options:
            catalogFilter:
              kind: System

    - title: 'Technical Options'
      properties:
        javaVersion:
          title: 'Java Version'
          type: string
          enum: ['17', '21']
          default: '21'
        database:
          title: 'Database'
          type: string
          enum: ['postgresql', 'mysql', 'none']
          default: 'postgresql'
        enableKafka:
          title: 'Kafka Integration'
          type: boolean
          default: false

    - title: 'Infrastructure Settings'
      properties:
        namespace:
          title: 'K8s Namespace'
          type: string
          default: 'default'
        cluster:
          title: 'Deployment Cluster'
          type: string
          enum: ['dev', 'staging', 'production']
          default: 'dev'

  # Define execution steps
  steps:
    # 1. Generate project code from template
    - id: fetch-template
      name: 'Generate Project Code'
      action: fetch:template
      input:
        url: ./skeleton
        values:
          serviceName: ${{ parameters.serviceName }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          system: ${{ parameters.system }}
          javaVersion: ${{ parameters.javaVersion }}
          database: ${{ parameters.database }}
          enableKafka: ${{ parameters.enableKafka }}
          namespace: ${{ parameters.namespace }}

    # 2. Create GitHub repository
    - id: publish-github
      name: 'Create GitHub Repository'
      action: publish:github
      input:
        allowedHosts: ['github.com']
        repoUrl: github.com?owner=acme-corp&repo=${{ parameters.serviceName }}
        description: ${{ parameters.description }}
        defaultBranch: main
        protectDefaultBranch: true
        requireCodeOwnerReviews: true
        repoVisibility: internal

    # 3. Register in Backstage catalog
    - id: register-catalog
      name: 'Register in Catalog'
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['publish-github'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml

    # 4. Create ArgoCD Application
    - id: create-argocd-app
      name: 'Configure ArgoCD Deployment'
      action: argocd:create-resources
      input:
        appName: ${{ parameters.serviceName }}
        argoInstance: main
        namespace: ${{ parameters.namespace }}
        repoUrl: ${{ steps['publish-github'].output.remoteUrl }}
        path: deploy/helm

  # Post-completion guidance
  output:
    links:
      - title: 'GitHub Repository'
        url: ${{ steps['publish-github'].output.remoteUrl }}
      - title: 'View in Catalog'
        icon: catalog
        entityRef: ${{ steps['register-catalog'].output.entityRef }}

With this single template, a developer picks a service name and a few options in the UI, and the whole workflow runs end to end: project generation from the skeleton (including its CI/CD workflow), GitHub repository creation, catalog registration, and ArgoCD deployment setup. The run can finish within about 5 minutes, so new service onboarding that previously took an average of 3 days is dramatically shortened.
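
One detail worth checking is the `serviceName` pattern, because the same value becomes both the repository name and, typically, the catalog entity name. The sketch below compares the template's pattern with a stricter variant and builds the `repoUrl` string in the format the `publish:github` step uses. The stricter pattern and the `buildRepoUrl` helper are suggestions for illustration, not part of the template above.

```typescript
// Pattern used by the template's serviceName parameter.
const TEMPLATE_PATTERN = /^[a-z][a-z0-9-]*$/;

// Stricter variant (an assumption, not from the template): no trailing or
// doubled hyphens, which the catalog's name rules would also reject.
const STRICT_PATTERN = /^[a-z][a-z0-9]*(-[a-z0-9]+)*$/;

// Builds a repoUrl value in the "host?owner=...&repo=..." form.
function buildRepoUrl(
  serviceName: string,
  owner: string = 'acme-corp',
  host: string = 'github.com',
): string {
  if (!STRICT_PATTERN.test(serviceName)) {
    throw new Error(`Invalid service name: "${serviceName}"`);
  }
  return `${host}?owner=${encodeURIComponent(owner)}&repo=${encodeURIComponent(serviceName)}`;
}

console.log(TEMPLATE_PATTERN.test('order-')); // true
console.log(STRICT_PATTERN.test('order-')); // false
console.log(buildRepoUrl('order-service'));
// github.com?owner=acme-corp&repo=order-service
```

If the stricter rule suits your naming convention, it can replace the `pattern` value in the template so that invalid names are rejected in the form rather than at catalog registration.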

Plugin Development and Integration

Creating Custom Plugins

Use the Backstage CLI to scaffold frontend or backend plugins.

# Create frontend plugin
cd my-backstage-app
yarn new --select plugin

# Create backend plugin
yarn new --select backend-plugin

# Generated plugin structure
# plugins/
# └── my-custom-plugin/
#     ├── src/
#     │   ├── components/
#     │   │   └── ExampleComponent/
#     │   ├── plugin.ts          # Plugin definition
#     │   ├── routes.ts          # Routing configuration
#     │   └── index.ts
#     ├── dev/                   # Standalone dev environment
#     │   └── index.tsx
#     └── package.json

Frontend Plugin Implementation Example

Here is an example plugin that shows a dashboard of per-team service health status.

// plugins/team-health-dashboard/src/plugin.ts
import { createPlugin, createRoutableExtension } from '@backstage/core-plugin-api'
// rootRouteRef is exported by the generated routes.ts
import { rootRouteRef } from './routes'

export const teamHealthDashboardPlugin = createPlugin({
  id: 'team-health-dashboard',
  routes: {
    root: rootRouteRef,
  },
})

// Lazily loads the dashboard page and mounts it at the plugin's root route
export const TeamHealthDashboardPage = teamHealthDashboardPlugin.provide(
  createRoutableExtension({
    name: 'TeamHealthDashboardPage',
    component: () => import('./components/DashboardPage').then((m) => m.DashboardPage),
    mountPoint: rootRouteRef,
  })
)

// plugins/team-health-dashboard/src/components/DashboardPage.tsx
import React, { useEffect, useState } from 'react';
import {
  Table,
  TableColumn,
  StatusOK,
  StatusError,
  StatusWarning,
  Progress,
  ResponseErrorPanel,
} from '@backstage/core-components';
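import { useApi } from '@backstage/core-plugin-api';
import { catalogApiRef } from '@backstage/plugin-catalog-react';

interface ServiceHealth {
  name: string;
  owner: string;
  status: 'healthy' | 'degraded' | 'down';
  uptime: number;
  lastIncident: string;
  errorRate: number;
}

// Hypothetical helper, not part of Backstage: loads health metrics for one
// service. The endpoint path is an assumption; point it at your own
// Prometheus/Datadog proxy.
async function fetchServiceMetrics(
  serviceName: string,
): Promise<Pick<ServiceHealth, 'status' | 'uptime' | 'lastIncident' | 'errorRate'>> {
  const response = await fetch(
    `/api/proxy/service-health/${encodeURIComponent(serviceName)}`,
  );
  if (!response.ok) {
    throw new Error(`Failed to load metrics for ${serviceName}: ${response.status}`);
  }
  return response.json();
}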

const columns: TableColumn<ServiceHealth>[] = [
  { title: 'Service', field: 'name' },
  { title: 'Owner', field: 'owner' },
  {
    title: 'Status',
    field: 'status',
    render: (row: ServiceHealth) => {
      switch (row.status) {
        case 'healthy':
          return <StatusOK>Healthy</StatusOK>;
        case 'degraded':
          return <StatusWarning>Degraded</StatusWarning>;
        case 'down':
          return <StatusError>Down</StatusError>;
        default:
          return null;
      }
    },
  },
  { title: 'Uptime (%)', field: 'uptime', type: 'numeric' },
  { title: 'Error Rate (%)', field: 'errorRate', type: 'numeric' },
  { title: 'Last Incident', field: 'lastIncident' },
];

export const DashboardPage = () => {
  const catalogApi = useApi(catalogApiRef);
  const [services, setServices] = useState<ServiceHealth[]>([]);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState<Error>();

  useEffect(() => {
    const fetchData = async () => {
      try {
        const entities = await catalogApi.getEntities({
          filter: { kind: 'Component', 'spec.type': 'service' },
        });

        // Fetch health data for each service from Prometheus/Datadog API
        const healthData = await Promise.all(
          entities.items.map(async entity => {
            const metrics = await fetchServiceMetrics(
              entity.metadata.name,
            );
            return {
              name: entity.metadata.name,
              owner:
                entity.spec?.owner?.toString() ?? 'unknown',
              status: metrics.status,
              uptime: metrics.uptime,
              lastIncident: metrics.lastIncident,
              errorRate: metrics.errorRate,
            };
          }),
        );

        setServices(healthData);
      } catch (e) {
        setError(e as Error);
      } finally {
        setLoading(false);
      }
    };

    fetchData();
  }, [catalogApi]);

  if (loading) return <Progress />;
  if (error) return <ResponseErrorPanel error={error} />;

  return (
    <Table
      title="Service Health Dashboard"
      columns={columns}
      data={services}
      options={{
        sorting: true,
        paging: true,
        pageSize: 20,
        search: true,
      }}
    />
  );
};
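
The dashboard expects each service to arrive with a `status` of healthy, degraded, or down, but the component does not show how that value is derived. One possible approach is to classify raw metrics against thresholds, as in the sketch below. The thresholds and the `classifyHealth` function are illustrative assumptions, not values from Backstage or from any monitoring product.

```typescript
type HealthStatus = 'healthy' | 'degraded' | 'down';

interface RawMetrics {
  uptime: number; // percent, 0-100
  errorRate: number; // percent, 0-100
}

interface Thresholds {
  downErrorRate: number;
  degradedErrorRate: number;
  minUptime: number;
}

// Example thresholds only; tune them to your own SLOs.
const DEFAULT_THRESHOLDS: Thresholds = {
  downErrorRate: 10,
  degradedErrorRate: 1,
  minUptime: 99,
};

function classifyHealth(
  metrics: RawMetrics,
  thresholds: Thresholds = DEFAULT_THRESHOLDS,
): HealthStatus {
  if (metrics.errorRate >= thresholds.downErrorRate) return 'down';
  if (
    metrics.errorRate >= thresholds.degradedErrorRate ||
    metrics.uptime < thresholds.minUptime
  ) {
    return 'degraded';
  }
  return 'healthy';
}

console.log(classifyHealth({ uptime: 99.95, errorRate: 0.2 })); // healthy
console.log(classifyHealth({ uptime: 98.5, errorRate: 0.2 })); // degraded
console.log(classifyHealth({ uptime: 99.9, errorRate: 12 })); // down
```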

Key Plugin Integration List

Frequently integrated plugins and their roles in production IDPs:

| Plugin | Purpose | Configuration Point |
|--------|---------|---------------------|
| @backstage/plugin-kubernetes | Real-time Pod status, logs, event viewing | ServiceAccount, RBAC settings |
| @backstage/plugin-techdocs | docs-as-code based doc rendering | MkDocs config, S3/GCS publisher |
| @roadiehq/backstage-plugin-github-insights | PR status, contributor stats | GitHub App token |
| @backstage/plugin-catalog-import | Register catalog-info.yaml from UI | Catalog rules |
| @backstage-community/plugin-cost-insights | Cloud cost visualization | Cost API integration |
| @pagerduty/backstage-plugin | On-call status, incident list | PagerDuty API key |
| @backstage/plugin-scaffolder-backend-module-github | Automatic GitHub repository creation | GitHub App permissions |

IDP Tool Comparison

Besides Backstage, several IDP tools exist in the market. The right tool varies based on organizational size, technical capability, and budget.

| Item | Backstage | Port | Cortex | OpsLevel |
|------|-----------|------|--------|----------|
| License | Apache 2.0 (Open Source) | SaaS (free tier available) | SaaS | SaaS |
| Hosting | Self-hosted | Cloud managed | Cloud managed | Cloud managed |
| Customization | Very high (plugin dev) | Medium (widgets/blueprints) | Low | Medium |
| Initial Setup Cost | High (dedicated team needed) | Low | Low | Low |
| Operations Burden | High (upgrades, security) | None (SaaS) | None | None |
| Service Catalog | YAML-based, powerful | UI-based, intuitive | Auto-discovery | Auto-discovery |
| Scaffolding | Built-in (Scaffolder) | Self-service actions | Limited | Service templates |
| Technical Docs | TechDocs built-in | External integration | External integration | External integration |
| Scorecards | Added via plugin | Built-in | Core feature | Built-in |
| Best For | Large, highly technical teams | Small-medium, fast adoption | Service maturity focus | Mid-size orgs |
| Price (50 users) | Free (infra/personnel costs separate) | ~$2,000/month | ~$5,000/month | ~$3,000/month |

Selection Criteria Summary: If you have a dedicated Platform Engineering team and need high customization, choose Backstage. For fast adoption and low operational burden, choose Port. If you want to focus on service maturity measurement and standards compliance, Cortex is the right fit.

Operational Considerations

Tracking Adoption Metrics

Building an IDP is easier than getting developers to actually use it. Track the following metrics to measure adoption:

| Metric | Measurement Method | Target |
|--------|--------------------|--------|
| Catalog Coverage | Registered services / Total services | Over 95% |
| Template Usage Rate | Services created via template / Total new services | Over 80% |
| DAU/WAU | Backstage daily/weekly active users | Over 60% of developers |
| Onboarding Time | Time from new service creation to first deployment | Under 1 hour |
| MTTR Improvement | Time from incident to responsible team awareness | 50% reduction from baseline |
| TechDocs Freshness | Percentage of docs updated within 30 days | Over 70% |
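
The ratio-based metrics in this table are simple to compute once the raw counts are available. The sketch below shows the arithmetic for three of them and flags any that miss the stated target. The input numbers are made up for illustration, and `adoptionReport` is a hypothetical helper.

```typescript
interface AdoptionSnapshot {
  registeredServices: number;
  totalServices: number;
  templatedNewServices: number;
  totalNewServices: number;
  weeklyActiveUsers: number;
  totalDevelopers: number;
}

// Targets from the table above, expressed as fractions.
const TARGETS = { catalogCoverage: 0.95, templateUsage: 0.8, activeUsers: 0.6 };

function share(part: number, whole: number): number {
  return whole > 0 ? part / whole : 0;
}

function adoptionReport(s: AdoptionSnapshot) {
  const catalogCoverage = share(s.registeredServices, s.totalServices);
  const templateUsage = share(s.templatedNewServices, s.totalNewServices);
  const activeUsers = share(s.weeklyActiveUsers, s.totalDevelopers);

  // Each target is "over X%", so a value equal to the target still counts as a miss.
  const belowTarget: string[] = [];
  if (!(catalogCoverage > TARGETS.catalogCoverage)) belowTarget.push('catalogCoverage');
  if (!(templateUsage > TARGETS.templateUsage)) belowTarget.push('templateUsage');
  if (!(activeUsers > TARGETS.activeUsers)) belowTarget.push('activeUsers');

  return { catalogCoverage, templateUsage, activeUsers, belowTarget };
}

// Made-up sample counts.
const report = adoptionReport({
  registeredServices: 192,
  totalServices: 200,
  templatedNewServices: 17,
  totalNewServices: 20,
  weeklyActiveUsers: 130,
  totalDevelopers: 250,
});
console.log(report.belowTarget); // [ 'activeUsers' ]
```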

Anti-Patterns to Avoid

  1. Big Bang Launch: Trying to launch all features at once leads to failure. Start with an MVP. The first step is just the Software Catalog. Registering all services in the catalog and visualizing ownership and dependencies alone provides significant value.

  2. Platform = Portal Misconception: If you build a pretty UI with no automation behind it, developers will quickly leave. The key is practical automation -- "one click to set up a K8s namespace and CI/CD."

  3. Ignoring Developer Feedback: A platform is a product. A platform team that skips developer surveys, usage data analysis, and regular feedback sessions ends up building features it merely assumes are "nice to have," and adoption drops.

  4. Neglecting Backstage Version Upgrades: Backstage has a fast release cycle, with a new main release roughly every month. Postponing upgrades for over 6 months makes migration extremely difficult. Run backstage-cli versions:bump regularly.

  5. Forced Adoption: If the IDP devolves into a management tool, it breeds developer resentment. An IDP should improve developer experience (DX). The message should not be "use this or face consequences" but rather "use this and what took 3 days now takes 5 minutes."

Backstage Upgrade Procedure

# 1. Check current version
yarn backstage-cli info

# 2. Version bump (auto-update dependencies)
yarn backstage-cli versions:bump

# 3. Review changes
git diff

# 4. Type check
yarn tsc

# 5. Run tests
yarn test

# 6. Verify build
yarn build

# 7. Check migration guide
# https://backstage.io/docs/getting-started/keeping-backstage-updated

# Recommended: Auto-generate upgrade PRs in CI
# .github/workflows/backstage-upgrade.yaml
# Runs weekly on Monday to bump versions and create PR

Warning: Always commit current changes before running versions:bump. Upgrades may include changes to package.json, yarn.lock, and sometimes code migration for breaking changes. It is recommended to verify in a staging environment first before applying to production.

Failure Cases and Recovery Procedures

Case 1: Catalog Entity Sync Failure

Symptom: catalog-info.yaml exists on GitHub but does not appear in the Backstage UI

Root Cause Analysis:

  • GitHub token expired or insufficient permissions (most common)
  • YAML syntax error (indentation, invalid kind value)
  • The Kind is not in the allow list in catalog.rules
  • Network issues causing GitHub API call failures

Recovery Procedure:

  1. Check Backstage logs: kubectl logs -l app=backstage -c backstage --tail=100
  2. Validate token: curl -H "Authorization: token ${GITHUB_TOKEN}" https://api.github.com/user
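  3. Validate YAML: Check catalog-info.yaml with an entity validator, for example the community @roadiehq/backstage-entity-validator package, or a YAML linter combined with a review of the kind and apiVersion values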
  4. Manual refresh: Click the "Refresh" button on the entity page in the Backstage UI
  5. If cache reset is needed: Restart the Backstage Pod
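
The allow-list cause is easy to overlook, so it helps to check it directly. The sketch below tests a kind against a `catalog.rules` style allow list. It is a simplified illustration: `isKindAllowed` is a hypothetical helper, the case-insensitive comparison is an assumption, and per-location rules are not considered.

```typescript
interface CatalogRule {
  allow: string[];
}

// Returns true if any rule allows the given kind (compared case-insensitively).
function isKindAllowed(rules: CatalogRule[], kind: string): boolean {
  const wanted = kind.toLowerCase();
  return rules.some(rule =>
    rule.allow.some(allowed => allowed.toLowerCase() === wanted),
  );
}

// Allow list as written in the app-config.yaml example in this article.
const catalogRules: CatalogRule[] = [
  {
    allow: ['Component', 'System', 'API', 'Resource', 'Location', 'Template', 'Group', 'User'],
  },
];

console.log(isKindAllowed(catalogRules, 'Component')); // true
console.log(isKindAllowed(catalogRules, 'Domain')); // false
```

With that allow list, a Domain entity would be rejected even though its YAML is valid, so the kind has to be added to `catalog.rules` before Domain entities can be registered.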

Case 2: Scaffolder Template Execution Fails During GitHub Repository Creation

Symptom: RequestError: HttpError: Resource not accessible by integration error at the "publish:github" step

Root Cause Analysis:

  • The GitHub App lacks the Administration repository permission (read and write)
  • Blocked by the organization's repository creation policy
  • A repository with the same name already exists

Recovery Procedure:

  1. Check and update GitHub App permissions
  2. Verify the App is allowed to create repositories in the organization settings
  3. Check the failed task logs in the Backstage UI (Scaffolder task log page)
  4. If partially created resources (empty repos, etc.) exist, manually clean up and re-run

Case 3: PostgreSQL Connection Pool Exhaustion

Symptom: Backstage response time sharply degrades, intermittent ConnectionTimeoutError

Root Cause Analysis:

  • Catalog entities grew to thousands, exceeding available DB connections
  • Default connection pool size (10) cannot handle the traffic

Recovery Procedure and Prevention:

# app-config.yaml - Connection pool tuning
backend:
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}
    # Pool options are passed through to Knex via knexConfig
    knexConfig:
      pool:
        min: 5
        max: 30
        acquireTimeoutMillis: 60000
        idleTimeoutMillis: 30000
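
Before raising the pool maximum, it is worth estimating the worst-case number of connections the whole deployment can open. The sketch below does that arithmetic under one assumption: each backend plugin that uses the database keeps its own pool in every replica, which matches Backstage's default of a separate database per plugin. The replica and plugin counts are made-up example values.

```typescript
interface DeploymentShape {
  replicas: number; // Backstage backend Pods
  pluginsWithDatabase: number; // backend plugins that open their own pool
  poolMax: number; // pool "max" setting
}

// Upper bound on simultaneously open connections across the deployment.
function worstCaseConnections(d: DeploymentShape): number {
  return d.replicas * d.pluginsWithDatabase * d.poolMax;
}

// Leaves some connections free for migrations, backups, and admin sessions.
function fitsWithin(
  d: DeploymentShape,
  maxConnections: number,
  reserved: number = 10,
): boolean {
  return worstCaseConnections(d) <= maxConnections - reserved;
}

const shape: DeploymentShape = { replicas: 3, pluginsWithDatabase: 8, poolMax: 30 };
console.log(worstCaseConnections(shape)); // 720
console.log(fitsWithin(shape, 100)); // false
```

In this example a pool maximum of 30 could far exceed PostgreSQL's default `max_connections` of 100, so the server limit (or a connection pooler in front of it) has to be sized together with the pool setting.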

Case 4: TechDocs Build Failure

Symptom: TechDocs tab on the service page is empty or shows a build error

Root Cause Analysis:

  • mkdocs.yml file is missing from the repository
  • Python dependency (mkdocs-techdocs-core) installation failure
  • S3 bucket permission issues (when using external publisher)

Recovery Procedure:

  1. Verify that mkdocs.yml and docs/index.md exist in the repository
  2. Test the build locally with npx @techdocs/cli generate
  3. Check IAM permissions when using an external publisher
  4. Change techdocs.builder to local so Backstage builds directly (for small environments)
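
The first check in this procedure can be automated in CI so that missing documentation files are caught before a build is attempted. The following is a minimal sketch: `missingTechDocsFiles` is a hypothetical helper that takes a list of repository-relative paths (for example from `git ls-files`) and reports which of the two files named above are absent.

```typescript
// Files the recovery procedure above asks you to verify.
const REQUIRED_DOC_FILES = ['mkdocs.yml', 'docs/index.md'];

// Returns the required files that are not present in the repository listing.
function missingTechDocsFiles(repoFiles: string[]): string[] {
  // Normalize a leading "./" so "./mkdocs.yml" and "mkdocs.yml" match.
  const normalized = repoFiles.map(path => path.replace(/^\.\//, ''));
  return REQUIRED_DOC_FILES.filter(file => normalized.indexOf(file) === -1);
}

console.log(missingTechDocsFiles(['catalog-info.yaml', './mkdocs.yml', 'src/main.ts']));
// [ 'docs/index.md' ]
console.log(missingTechDocsFiles(['mkdocs.yml', 'docs/index.md']));
// []
```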

Case 5: Plugin Compatibility Break After Backstage Upgrade

Symptom: White screen or runtime error on specific plugin pages after upgrade

Root Cause Analysis:

  • Version mismatch between Backstage core packages and plugins
  • API was deprecated and then removed
  • Migration to the New Backend System is required

Recovery Procedure:

  1. Immediately roll back to the previous version (Helm rollback or deploy with the previous image tag)
  2. Check error messages in the browser console
  3. Search the plugin's GitHub issues
  4. Update the plugin to the latest compatible version, or temporarily disable it
  5. Re-deploy to production after verifying in staging

Checklists

IDP Build Preparation Checklist

  • Form a dedicated Platform Engineering team (minimum 2-3 people)
  • Conduct developer pain point survey (identify areas with highest cognitive load)
  • Decide on the first MVP scope for the IDP (recommended: start with Software Catalog)
  • Complete comparative evaluation of Backstage vs SaaS tools
  • Provision PostgreSQL instance (production)
  • Create and configure permissions for GitHub App or OAuth App
  • Plan SSO provider integration (Okta, Azure AD, Google, etc.)

Backstage Operations Checklist

  • Separate app-config.production.yaml and manage secrets (Vault, K8s Secret)
  • Configure HTTPS/TLS (Ingress or load balancer)
  • Set up database backup schedule
  • Configure monitoring (Backstage metrics + infrastructure metrics)
  • Automate weekly version upgrade workflow
  • Catalog entity validation CI (validate catalog-info.yaml in PRs)
  • Write developer onboarding guide documentation
  • Plan quarterly developer satisfaction surveys

Catalog Registration Checklist (Per Service)

  • Write catalog-info.yaml and place it at the repository root
  • Verify metadata.name follows organizational naming conventions
  • Set spec.owner to the correct team group
  • Assign spec.system to the correct system
  • Register API specifications (OpenAPI, gRPC, AsyncAPI)
  • Add CI/CD, monitoring, and incident tool integration info in annotations
  • Configure TechDocs (mkdocs.yml + docs/ directory)
  • Set up Kubernetes annotations (workload mapping within the cluster)
