- Introduction - Why Platform Engineering Is Replacing Traditional DevOps
- Internal Developer Platform Concepts and Components
- Backstage Architecture and Core Features
- Backstage Installation and Initial Setup
- Software Catalog Configuration
- Creating Golden Path Templates
- Plugin Development and Integration
- IDP Tool Comparison
- Operational Considerations
- Failure Cases and Recovery Procedures
- Checklists

Introduction - Why Platform Engineering Is Replacing Traditional DevOps
The DevOps principle of "You build it, you run it" gave developers autonomy, but also imposed an enormous cognitive load. Developers found themselves writing Kubernetes manifests, configuring CI/CD pipelines, setting up monitoring dashboards, and handling infrastructure provisioning all on their own. Gartner predicted that 80% of software engineering organizations would establish Platform Engineering teams by 2026, and that trend has become reality.
Platform Engineering is a discipline that builds Internal Developer Platforms (IDPs) with self-service capabilities to reduce developer cognitive load and accelerate software delivery across the organization. While DevOps focused on "culture and practices," Platform Engineering focuses on "platform as a product." Developers are the IDP's customers, and the platform team develops and operates this product.
This article covers the entire process of building an IDP based on Backstage, a CNCF Incubating project. It provides a practice-oriented guide from Software Catalog configuration, Golden Path Template design, and plugin development to failure cases and recovery procedures encountered during operations.
Internal Developer Platform Concepts and Components
What Is an IDP?
An Internal Developer Platform (IDP) is a platform that provides a self-service interface so developers can focus on code without worrying about infrastructure and operational complexity. It integrates infrastructure provisioning, deployment, monitoring, and documentation through a unified interface.
The core components of an IDP are:
- Service Catalog: Centralized management of metadata for all services, APIs, resources, and teams in the organization
- Self-Service Portal: Developers directly provision environments without infrastructure requests
- Golden Path Template: Project scaffolding templates that conform to organizational standards
- Documentation Hub: Auto-rendering of per-service technical docs linked to code repositories
- Integration Layer: Integration with CI/CD, monitoring, and incident management tools
IDP vs Developer Portal
IDP and Developer Portal are often used interchangeably, but they are not the same thing. A Developer Portal is the frontend layer of an IDP. The IDP is the broader concept: it also includes the automation layer, infrastructure abstraction, policy engine, and workflow orchestration behind the Portal.
```text
┌────────────────────────────────────────────────────┐
│ Developer Portal (UI)                              │
│ ┌────────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │ Catalog    │ │ Scaffolder │ │ TechDocs         │ │
│ └────────────┘ └────────────┘ └──────────────────┘ │
├────────────────────────────────────────────────────┤
│ Internal Developer Platform                        │
│ ┌────────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │ IaC Engine │ │ CI/CD      │ │ Policy Engine    │ │
│ │ (Terraform,│ │ (GitHub    │ │ (OPA/Kyverno)    │ │
│ │ Crossplane)│ │ Actions)   │ │                  │ │
│ └────────────┘ └────────────┘ └──────────────────┘ │
│ ┌────────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │ K8s        │ │ Observ-    │ │ Secret Mgmt      │ │
│ │ Clusters   │ │ ability    │ │ (Vault)          │ │
│ └────────────┘ └────────────┘ └──────────────────┘ │
└────────────────────────────────────────────────────┘
```
Backstage Architecture and Core Features
What Is Backstage?
Backstage is the Developer Portal that Spotify used internally and open-sourced in 2020. It joined the CNCF as a Sandbox project in 2020 and was promoted to Incubating in 2022, and it has become one of the most widely adopted open-source IDP frameworks in the community.
Four Core Features
Software Catalog: Register and search all software assets in the organization (services, libraries, pipelines, infrastructure, etc.) using YAML-based metadata. You can see service dependencies, ownership, and API specifications at a glance.
Software Templates (Scaffolder): Codify Golden Paths. When a developer enters a few parameters in the UI, a project conforming to organizational standards is automatically generated, including Git repository creation, CI/CD pipeline setup, and K8s namespace provisioning -- all in one step.
TechDocs: Automatically build MkDocs-based technical documentation from service repositories using a docs-as-code approach and render it in the Backstage UI. Documentation stays up-to-date because it is managed alongside the code.
Plugins: The core of Backstage's extensibility. Over 200 community plugins exist, integrating with GitHub, GitLab, PagerDuty, Datadog, ArgoCD, Kubernetes, and more. You can also develop your own plugins.
Architecture Structure
```text
┌────────────────────────────────────────────────────┐
│ Backstage App (React)                              │
│ ┌────────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │ Catalog    │ │ Scaffolder │ │ TechDocs         │ │
│ │ Plugin     │ │ Plugin     │ │ Plugin           │ │
│ └────────────┘ └────────────┘ └──────────────────┘ │
│ ┌────────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │ K8s        │ │ CI/CD      │ │ Custom Plugins   │ │
│ │ Plugin     │ │ Plugin     │ │                  │ │
│ └────────────┘ └────────────┘ └──────────────────┘ │
├────────────────────────────────────────────────────┤
│ Backstage Backend (Node.js)                        │
│ ┌────────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │ Catalog    │ │ Scaffolder │ │ Auth Provider    │ │
│ │ Backend    │ │ Backend    │ │ (GitHub/Okta)    │ │
│ └────────────┘ └────────────┘ └──────────────────┘ │
├────────────────────────────────────────────────────┤
│ Database (PostgreSQL) / Search                     │
└────────────────────────────────────────────────────┘
```
Backstage has a separated architecture of frontend (React SPA) and backend (Node.js). Frontend plugins are implemented as React components, and backend plugins as Express routers. PostgreSQL (production) or SQLite (development) is used as the data store.
Backstage Installation and Initial Setup
Project Creation
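The first step is creating a Backstage app. You need a current Node.js Active LTS release (Node.js 18 has reached end-of-life) and Yarn; newer app templates use Yarn 4 rather than Yarn Classic (1.x), so check the official prerequisites before starting.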
```bash
# Create Backstage app (enter an app name such as my-backstage-app when prompted)
npx @backstage/create-app@latest

# Project directory structure
# my-backstage-app/
# ├── app-config.yaml              # Main configuration file
# ├── app-config.production.yaml   # Production overrides
# ├── catalog-info.yaml            # Backstage's own catalog entry
# ├── packages/
# │   ├── app/                     # Frontend (React)
# │   └── backend/                 # Backend (Node.js)
# ├── plugins/                     # Custom plugins
# └── package.json

# Start the local dev server (older app templates use `yarn dev`)
cd my-backstage-app
yarn start
```
Core Configuration - app-config.yaml
app-config.yaml is Backstage's central configuration file. It defines the database, authentication, catalog sources, and integrations.
```yaml
# app-config.yaml
app:
  title: 'ACME Developer Platform'
  baseUrl: http://localhost:3000

organization:
  name: 'ACME Corp'

backend:
  baseUrl: http://localhost:7007
  listen:
    port: 7007
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}

# GitHub integration settings
integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

# Authentication provider settings
auth:
  environment: production
  providers:
    github:
      production:
        clientId: ${AUTH_GITHUB_CLIENT_ID}
        clientSecret: ${AUTH_GITHUB_CLIENT_SECRET}

# Catalog source registration
catalog:
  import:
    entityFilename: catalog-info.yaml
    pullRequestBranchName: backstage-integration
  rules:
    - allow: [Component, System, Domain, API, Resource, Location, Template, Group, User]
  locations:
    # Auto-discover all GitHub repos in the organization
    # (github-discovery and github-org need the GitHub catalog backend modules installed)
    - type: github-discovery
      target: https://github.com/acme-corp/*/blob/main/catalog-info.yaml
    # Register templates
    - type: file
      target: ../../templates/all-templates.yaml
    # Sync organization structure
    - type: github-org
      target: https://github.com/acme-corp
```
Note:
`GITHUB_TOKEN` requires the `repo`, `read:org`, and `read:user` scopes. When using fine-grained tokens, you must explicitly grant Contents and Metadata read permissions for the target repositories. Insufficient token permissions will cause a `NotFoundError` during catalog registration.
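Every `${...}` placeholder in the config above must resolve from the environment when the backend starts, and a missing one tends to surface later as a confusing database or authentication error. As a minimal sketch, a pre-start check could scan the config file and report unset variables. `findMissingEnvVars` and `checkConfigFile` are hypothetical helpers, not part of Backstage, and they only recognize the plain `${NAME}` form used in this article.

```typescript
import { readFileSync } from "node:fs";

// Matches `${NAME}` placeholders such as `${POSTGRES_HOST}`.
const PLACEHOLDER = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;

/** Returns the sorted names of placeholders that are unset or empty in `env`. */
export function findMissingEnvVars(
  configText: string,
  env: Record<string, string | undefined>,
): string[] {
  const missing = new Set<string>();
  for (const match of configText.matchAll(PLACEHOLDER)) {
    const name = match[1];
    if (!env[name]) missing.add(name);
  }
  return [...missing].sort();
}

// Hypothetical pre-start hook: throw before the backend boots with a bad environment.
export function checkConfigFile(path: string): void {
  const missing = findMissingEnvVars(readFileSync(path, "utf8"), process.env);
  if (missing.length > 0) {
    throw new Error(`Unset environment variables: ${missing.join(", ")}`);
  }
}
```

For example, `checkConfigFile("app-config.yaml")` could run as a step before `yarn start` or in the container entrypoint.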
Production Deployment Configuration
```yaml
# app-config.production.yaml
app:
  baseUrl: https://developer.acme.com

backend:
  baseUrl: https://developer-api.acme.com
  cors:
    origin: https://developer.acme.com

# Configure TechDocs with external storage
techdocs:
  # 'external': CI builds and publishes the docs; Backstage only reads them
  builder: 'external'
  generator:
    runIn: 'local'
  publisher:
    type: 'awsS3'
    awsS3:
      bucketName: 'acme-techdocs'
      region: 'ap-northeast-2'
```
Software Catalog Configuration
Writing catalog-info.yaml
All software entities are registered in Backstage through a catalog-info.yaml file. This file is placed at the root of the service repository and defines the service's metadata, ownership, dependencies, and API specifications.
```yaml
# catalog-info.yaml - Microservice registration example
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  title: 'Order Service'
  description: 'Core domain service responsible for order creation, payment processing, and inventory deduction'
  annotations:
    # GitHub integration
    github.com/project-slug: acme-corp/order-service
    # TechDocs source (docs live in this repository)
    backstage.io/techdocs-ref: dir:.
    # CI/CD integration
    github.com/workflows: build-and-deploy.yaml
    # Kubernetes integration
    backstage.io/kubernetes-id: order-service
    backstage.io/kubernetes-namespace: order
    # PagerDuty incident integration
    pagerduty.com/service-id: PXXXXXX
    # Datadog dashboard
    datadoghq.com/dashboard-url: https://app.datadoghq.com/dashboard/xxx
  tags:
    - java
    - spring-boot
    - grpc
  links:
    - url: https://grafana.acme.com/d/order-svc
      title: 'Grafana Dashboard'
      icon: dashboard
    - url: https://wiki.acme.com/order-domain
      title: 'Domain Wiki'
      icon: docs
spec:
  type: service
  lifecycle: production
  owner: team-order
  system: ecommerce-platform
  providesApis:
    - order-api
  consumesApis:
    - inventory-api
    - payment-api
  dependsOn:
    - resource:order-database
    - resource:order-redis
---
# API specification registration
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: order-api
  description: 'Order processing gRPC API'
spec:
  type: grpc
  lifecycle: production
  owner: team-order
  system: ecommerce-platform
  definition:
    $text: ./proto/order.proto
---
# Database resource registration
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: order-database
  description: 'Order Service PostgreSQL database'
spec:
  type: database
  owner: team-order
  system: ecommerce-platform
```
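References such as `resource:order-database` follow the `[kind:][namespace/]name` format, with `default` as the namespace when none is given. Fields such as `spec.owner` and `providesApis` imply a default kind (Group and API respectively), which is why the YAML above can write `owner: team-order`, while `dependsOn` spells the kind out. The sketch below is a simplified, hypothetical `parseRef`; the catalog itself uses `parseEntityRef` from `@backstage/catalog-model`, which validates more strictly.

```typescript
export interface EntityRef {
  kind: string;
  namespace: string;
  name: string;
}

/**
 * Simplified parser for `[kind:][namespace/]name` references.
 * `defaultKind` mimics fields such as spec.owner (Group) or providesApis (API).
 */
export function parseRef(ref: string, defaultKind?: string): EntityRef {
  const colon = ref.indexOf(":");
  const kind = colon >= 0 ? ref.slice(0, colon) : defaultKind;
  const rest = colon >= 0 ? ref.slice(colon + 1) : ref;
  const slash = rest.indexOf("/");
  const namespace = slash >= 0 ? rest.slice(0, slash) : "default";
  const name = slash >= 0 ? rest.slice(slash + 1) : rest;
  if (!kind || !name) {
    throw new Error(`Cannot resolve entity ref "${ref}" without a kind and name`);
  }
  // Lowercase the kind so "Resource" and "resource" compare equal in this sketch.
  return { kind: kind.toLowerCase(), namespace, name };
}
```

A bare name such as `order-api` only resolves when the caller supplies the kind the field implies, for example `parseRef("order-api", "API")`.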
Entity Relationship Structure
Backstage's entity model has a hierarchical structure:
| Entity Kind | Role | Example |
|---|---|---|
| Domain | Business domain area | commerce, logistics |
| System | Logical group of related components | ecommerce-platform |
| Component | Individual software unit (service, library, website) | order-service |
| API | Interface provided by a component | order-api (gRPC/REST/GraphQL) |
| Resource | Infrastructure resource | order-database, order-redis |
| Group | Team/organizational unit | team-order |
| User | Individual user | jane.doe |
Through this relationship structure, you can instantly determine "who owns this service, what APIs it provides, what infrastructure it depends on, and who to contact during an incident." In microservice environments where services grow to hundreds, this visibility is critical.
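To make the "who owns it, what does it provide, what does it depend on, whom do I contact" question concrete, here is a small in-memory sketch that answers it from a list of entities. The `CatalogEntity` shape is a hypothetical, flattened view for illustration (real entities nest these fields under `metadata` and `spec`), and `email` stands in for a Group's `spec.profile.email`.

```typescript
// Hypothetical, flattened view of catalog entities.
export interface CatalogEntity {
  kind: "Component" | "API" | "Resource" | "Group" | "System";
  name: string;
  owner?: string; // name of the owning Group
  providesApis?: string[];
  dependsOn?: string[]; // entity refs such as "resource:order-database"
  email?: string; // stands in for a Group's spec.profile.email
}

export interface ServiceSummary {
  owner: string;
  email: string;
  providesApis: string[];
  dependsOn: string[];
}

export function summarize(entities: CatalogEntity[], componentName: string): ServiceSummary {
  // Index entities by "Kind:name" for constant-time lookups.
  const byKey = new Map<string, CatalogEntity>(
    entities.map(e => [`${e.kind}:${e.name}`, e] as [string, CatalogEntity]),
  );
  const component = byKey.get(`Component:${componentName}`);
  if (!component || !component.owner) {
    throw new Error(`No owned Component named ${componentName}`);
  }
  // Follow the ownership relation to the Group to find a contact.
  const group = byKey.get(`Group:${component.owner}`);
  return {
    owner: component.owner,
    email: group?.email ?? "unknown",
    providesApis: component.providesApis ?? [],
    dependsOn: component.dependsOn ?? [],
  };
}
```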
Creating Golden Path Templates
A Golden Path codifies the organization's recommended "right way to start." When developers begin a new service, they can automatically generate a project with CI/CD, testing, monitoring, and security settings built in by default.
Writing a Scaffolder Template
```yaml
# templates/spring-boot-service/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: spring-boot-grpc-service
  title: 'Spring Boot gRPC Microservice'
  description: |
    Creates a Spring Boot 3.x + gRPC-based microservice project.
    Includes: Dockerfile, Helm Chart, GitHub Actions CI/CD,
    Prometheus metrics, Health Check, catalog-info.yaml
  tags:
    - java
    - spring-boot
    - grpc
    - recommended
spec:
  owner: platform-team
  type: service

  # Define user input parameters
  parameters:
    - title: 'Service Basic Information'
      required:
        - serviceName
        - owner
        - system
      properties:
        serviceName:
          title: 'Service Name'
          type: string
          description: 'Lowercase letters and hyphens only (e.g., order-service)'
          pattern: '^[a-z][a-z0-9-]*$'
          ui:autofocus: true
        description:
          title: 'Service Description'
          type: string
        owner:
          title: 'Owning Team'
          type: string
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
        system:
          title: 'Parent System'
          type: string
          ui:field: EntityPicker
          ui:options:
            catalogFilter:
              kind: System
    - title: 'Technical Options'
      properties:
        javaVersion:
          title: 'Java Version'
          type: string
          enum: ['17', '21']
          default: '21'
        database:
          title: 'Database'
          type: string
          enum: ['postgresql', 'mysql', 'none']
          default: 'postgresql'
        enableKafka:
          title: 'Kafka Integration'
          type: boolean
          default: false
    - title: 'Infrastructure Settings'
      properties:
        namespace:
          title: 'K8s Namespace'
          type: string
          default: 'default'
        cluster:
          title: 'Deployment Cluster'
          type: string
          enum: ['dev', 'staging', 'production']
          default: 'dev'

  # Define execution steps
  steps:
    # 1. Generate project code from template
    - id: fetch-template
      name: 'Generate Project Code'
      action: fetch:template
      input:
        url: ./skeleton
        values:
          serviceName: ${{ parameters.serviceName }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          system: ${{ parameters.system }}
          javaVersion: ${{ parameters.javaVersion }}
          database: ${{ parameters.database }}
          enableKafka: ${{ parameters.enableKafka }}
          namespace: ${{ parameters.namespace }}
          # Pass the selected cluster so the skeleton's deploy config can use it
          cluster: ${{ parameters.cluster }}

    # 2. Create GitHub repository
    - id: publish-github
      name: 'Create GitHub Repository'
      action: publish:github
      input:
        allowedHosts: ['github.com']
        repoUrl: github.com?owner=acme-corp&repo=${{ parameters.serviceName }}
        description: ${{ parameters.description }}
        defaultBranch: main
        protectDefaultBranch: true
        requireCodeOwnerReviews: true
        repoVisibility: internal

    # 3. Register in Backstage catalog
    - id: register-catalog
      name: 'Register in Catalog'
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['publish-github'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml

    # 4. Create ArgoCD Application
    #    (requires a separately installed ArgoCD scaffolder action module)
    - id: create-argocd-app
      name: 'Configure ArgoCD Deployment'
      action: argocd:create-resources
      input:
        appName: ${{ parameters.serviceName }}
        argoInstance: main
        namespace: ${{ parameters.namespace }}
        repoUrl: ${{ steps['publish-github'].output.remoteUrl }}
        path: deploy/helm

  # Post-completion guidance
  output:
    links:
      - title: 'GitHub Repository'
        url: ${{ steps['publish-github'].output.remoteUrl }}
      - title: 'View in Catalog'
        icon: catalog
        entityRef: ${{ steps['register-catalog'].output.entityRef }}
```
With this single template, a developer can select a service name and a few options from the UI, and everything from GitHub repository creation to CI/CD pipeline setup, K8s deployment, and catalog registration is completed within 5 minutes. New service onboarding that previously took an average of 3 days is dramatically shortened.
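The Scaffolder fills `${{ parameters.* }}` expressions from the form input before each step runs. The real engine uses a Nunjucks-based templater that also supports filters and `steps[...]` outputs; the sketch below is a deliberately reduced, hypothetical version that only resolves `${{ parameters.<key> }}` and checks `serviceName` against the same pattern the template declares.

```typescript
export type Params = Record<string, string | boolean>;

// Same pattern as the `serviceName` field in the template above.
const SERVICE_NAME = /^[a-z][a-z0-9-]*$/;

// Matches `${{ parameters.<key> }}` with optional inner whitespace.
const EXPR = /\$\{\{\s*parameters\.([A-Za-z0-9_]+)\s*\}\}/g;

export function validateParams(params: Params): void {
  const name = params.serviceName;
  if (typeof name !== "string" || !SERVICE_NAME.test(name)) {
    throw new Error(`Invalid serviceName: ${String(name)}`);
  }
}

// Replaces each parameter expression with the stringified parameter value.
export function renderValue(template: string, params: Params): string {
  return template.replace(EXPR, (_match, key: string) => {
    if (!(key in params)) throw new Error(`Unknown parameter: ${key}`);
    return String(params[key]);
  });
}
```

For instance, with `serviceName: "order-service"`, the `repoUrl` input from step 2 renders to `github.com?owner=acme-corp&repo=order-service`.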
Plugin Development and Integration
Creating Custom Plugins
Use the Backstage CLI to scaffold frontend or backend plugins.
```bash
# Create frontend plugin
cd my-backstage-app
yarn new --select plugin

# Create backend plugin
yarn new --select backend-plugin

# Generated plugin structure
# plugins/
# └── my-custom-plugin/
#     ├── src/
#     │   ├── components/
#     │   │   └── ExampleComponent/
#     │   ├── plugin.ts            # Plugin definition
#     │   ├── routes.ts            # Routing configuration
#     │   └── index.ts
#     ├── dev/                     # Standalone dev environment
#     │   └── index.tsx
#     └── package.json
```
Frontend Plugin Implementation Example
Here is an example plugin that shows a dashboard of per-team service health status.
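```ts
// plugins/team-health-dashboard/src/routes.ts
import { createRouteRef } from '@backstage/core-plugin-api';

export const rootRouteRef = createRouteRef({
  id: 'team-health-dashboard',
});

// plugins/team-health-dashboard/src/plugin.ts
import { createPlugin, createRoutableExtension } from '@backstage/core-plugin-api';
import { rootRouteRef } from './routes';

export const teamHealthDashboardPlugin = createPlugin({
  id: 'team-health-dashboard',
  routes: {
    root: rootRouteRef,
  },
});

// Lazily loads the page component the first time the route is visited
export const TeamHealthDashboardPage = teamHealthDashboardPlugin.provide(
  createRoutableExtension({
    name: 'TeamHealthDashboardPage',
    component: () =>
      import('./components/DashboardPage').then(m => m.DashboardPage),
    mountPoint: rootRouteRef,
  }),
);
```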
```tsx
// plugins/team-health-dashboard/src/components/DashboardPage.tsx
import React, { useEffect, useState } from 'react';
import {
  Table,
  TableColumn,
  StatusOK,
  StatusError,
  StatusWarning,
  Progress,
  ResponseErrorPanel,
} from '@backstage/core-components';
import { useApi } from '@backstage/core-plugin-api';
import { catalogApiRef } from '@backstage/plugin-catalog-react';

interface ServiceHealth {
  name: string;
  owner: string;
  status: 'healthy' | 'degraded' | 'down';
  uptime: number;
  lastIncident: string;
  errorRate: number;
}

// Hypothetical helper: replace with a call to your own metrics backend
// (for example a backend plugin that proxies Prometheus or Datadog).
// Until then it rejects, so the page renders the error panel.
async function fetchServiceMetrics(
  serviceName: string,
): Promise<Omit<ServiceHealth, 'name' | 'owner'>> {
  throw new Error(`fetchServiceMetrics is not implemented (${serviceName})`);
}

const columns: TableColumn<ServiceHealth>[] = [
  { title: 'Service', field: 'name' },
  { title: 'Owner', field: 'owner' },
  {
    title: 'Status',
    field: 'status',
    render: (row: ServiceHealth) => {
      switch (row.status) {
        case 'healthy':
          return <StatusOK>Healthy</StatusOK>;
        case 'degraded':
          return <StatusWarning>Degraded</StatusWarning>;
        case 'down':
          return <StatusError>Down</StatusError>;
        default:
          return null;
      }
    },
  },
  { title: 'Uptime (%)', field: 'uptime', type: 'numeric' },
  { title: 'Error Rate (%)', field: 'errorRate', type: 'numeric' },
  { title: 'Last Incident', field: 'lastIncident' },
];

export const DashboardPage = () => {
  const catalogApi = useApi(catalogApiRef);
  const [services, setServices] = useState<ServiceHealth[]>([]);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState<Error>();

  useEffect(() => {
    const fetchData = async () => {
      try {
        // All Components whose spec.type is "service"
        const entities = await catalogApi.getEntities({
          filter: { kind: 'Component', 'spec.type': 'service' },
        });
        // Fetch health data for each service from the metrics backend
        const healthData = await Promise.all(
          entities.items.map(async entity => {
            const metrics = await fetchServiceMetrics(entity.metadata.name);
            return {
              name: entity.metadata.name,
              owner: entity.spec?.owner?.toString() ?? 'unknown',
              status: metrics.status,
              uptime: metrics.uptime,
              lastIncident: metrics.lastIncident,
              errorRate: metrics.errorRate,
            };
          }),
        );
        setServices(healthData);
      } catch (e) {
        setError(e as Error);
      } finally {
        setLoading(false);
      }
    };
    fetchData();
  }, [catalogApi]);

  if (loading) return <Progress />;
  if (error) return <ResponseErrorPanel error={error} />;

  return (
    <Table
      title="Service Health Dashboard"
      columns={columns}
      data={services}
      options={{
        sorting: true,
        paging: true,
        pageSize: 20,
        search: true,
      }}
    />
  );
};
```
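The dashboard relies on `fetchServiceMetrics` to turn raw metrics into the `healthy` / `degraded` / `down` states. One possible mapping is sketched below; the `RawMetrics` shape and the 1% and 25% error-rate thresholds are hypothetical and should be replaced with your own SLO definitions.

```typescript
export type HealthStatus = "healthy" | "degraded" | "down";

// Hypothetical raw inputs, e.g. from a Prometheus `up` query and an HTTP error-rate query.
export interface RawMetrics {
  up: boolean;
  errorRatePct: number; // failed requests / all requests * 100
}

// Hypothetical thresholds; align these with your SLOs.
const DEGRADED_AT_PCT = 1;
const DOWN_AT_PCT = 25;

export function deriveStatus(m: RawMetrics): HealthStatus {
  // A target that is not scraping, or is failing most requests, counts as down.
  if (!m.up || m.errorRatePct >= DOWN_AT_PCT) return "down";
  if (m.errorRatePct >= DEGRADED_AT_PCT) return "degraded";
  return "healthy";
}
```

Keeping this mapping in one pure function makes it easy to unit-test and to reuse on the backend if the metrics lookup later moves there.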
Key Plugin Integration List
Frequently integrated plugins and their roles in production IDPs:
| Plugin | Purpose | Configuration Point |
|---|---|---|
| `@backstage/plugin-kubernetes` | Real-time Pod status, logs, event viewing | ServiceAccount, RBAC settings |
| `@backstage/plugin-techdocs` | docs-as-code based doc rendering | MkDocs config, S3/GCS publisher |
| `@roadiehq/backstage-plugin-github-insights` | PR status, contributor stats | GitHub App token |
| `@backstage/plugin-catalog-import` | Register catalog-info.yaml from UI | Catalog rules |
| `@backstage-community/plugin-cost-insights` | Cloud cost visualization | Cost API integration |
| `@pagerduty/backstage-plugin` | On-call status, incident list | PagerDuty API key |
| `@backstage/plugin-scaffolder-backend-module-github` | Automatic GitHub repository creation | GitHub App permissions |
IDP Tool Comparison
Besides Backstage, several IDP tools exist in the market. The right tool varies based on organizational size, technical capability, and budget.
| Item | Backstage | Port | Cortex | OpsLevel |
|---|---|---|---|---|
| License | Apache 2.0 (Open Source) | SaaS (free tier available) | SaaS | SaaS |
| Hosting | Self-hosted | Cloud managed | Cloud managed | Cloud managed |
| Customization | Very high (plugin dev) | Medium (widgets/blueprints) | Low | Medium |
| Initial Setup Cost | High (dedicated team needed) | Low | Low | Low |
| Operations Burden | High (upgrades, security) | None (SaaS) | None | None |
| Service Catalog | YAML-based, powerful | UI-based, intuitive | Auto-discovery | Auto-discovery |
| Scaffolding | Built-in (Scaffolder) | Self-service actions | Limited | Service templates |
| Technical Docs | TechDocs built-in | External integration | External integration | External integration |
| Scorecards | Added via plugin | Built-in | Core feature | Built-in |
| Best For | Large, highly technical teams | Small-medium, fast adoption | Service maturity focus | Mid-size orgs |
| Price (50 users) | Free (infra/personnel costs separate) | ~$2,000/month | ~$5,000/month | ~$3,000/month |
Selection Criteria Summary: If you have a dedicated Platform Engineering team and need high customization, choose Backstage. For fast adoption and low operational burden, choose Port. If you want to focus on service maturity measurement and standards compliance, Cortex is the right fit.
Operational Considerations
Tracking Adoption Metrics
Building an IDP is easier than getting developers to actually use it. Track the following metrics to measure adoption:
| Metric | Measurement Method | Target |
|---|---|---|
| Catalog Coverage | Registered services / Total services | Over 95% |
| Template Usage Rate | Services created via template / Total new services | Over 80% |
| DAU/WAU | Backstage daily/weekly active users | Over 60% of developers |
| Onboarding Time | Time from new service creation to first deployment | Under 1 hour |
| MTTR Improvement | Time from incident to responsible team awareness | 50% reduction from baseline |
| TechDocs Freshness | Percentage of docs updated within 30 days | Over 70% |
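The ratio-based metrics in this table reduce to simple arithmetic. A minimal sketch that scores three of them against the targets above (treating "Over N%" as strictly greater than N) might look like this; the input field names are assumptions about what your own tracking collects.

```typescript
export interface AdoptionInputs {
  registeredServices: number;
  totalServices: number;
  servicesFromTemplates: number;
  newServices: number;
  weeklyActiveUsers: number;
  totalDevelopers: number;
}

export interface MetricResult {
  name: string;
  valuePct: number;
  targetPct: number;
  met: boolean;
}

// Ratio as a percentage rounded to one decimal place; 0 when there is no denominator.
function pct(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : Math.round((numerator / denominator) * 1000) / 10;
}

export function evaluateAdoption(i: AdoptionInputs): MetricResult[] {
  const rows: Array<[string, number, number]> = [
    ["Catalog Coverage", pct(i.registeredServices, i.totalServices), 95],
    ["Template Usage Rate", pct(i.servicesFromTemplates, i.newServices), 80],
    ["WAU share of developers", pct(i.weeklyActiveUsers, i.totalDevelopers), 60],
  ];
  return rows.map(([name, valuePct, targetPct]) => ({
    name,
    valuePct,
    targetPct,
    met: valuePct > targetPct, // "Over N%" means strictly greater
  }));
}
```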
Anti-Patterns to Avoid
Big Bang Launch: Trying to launch all features at once leads to failure. Start with an MVP. The first step is just the Software Catalog. Registering all services in the catalog and visualizing ownership and dependencies alone provides significant value.
Platform = Portal Misconception: If you build a pretty UI with no automation behind it, developers will quickly leave. The key is practical automation -- "one click to set up a K8s namespace and CI/CD."
Ignoring Developer Feedback: A platform is a product. Without developer surveys, usage data analysis, and regular feedback sessions, the platform team building features they think are "nice to have" will see adoption rates drop.
Neglecting Backstage Version Upgrades: Backstage has a fast release cycle (1-2 times per month). Postponing upgrades for over 6 months makes migration extremely difficult. Run `backstage-cli versions:bump` regularly.
Forced Adoption: If the IDP devolves into a management tool, it breeds developer resentment. An IDP should improve developer experience (DX). The message should not be "use this or face consequences" but rather "use this and what took 3 days now takes 5 minutes."
Backstage Upgrade Procedure
```bash
# 1. Check current version
yarn backstage-cli info

# 2. Version bump (auto-update dependencies)
yarn backstage-cli versions:bump

# 3. Review changes
git diff

# 4. Type check
yarn tsc

# 5. Run tests
yarn test

# 6. Verify build
yarn build

# 7. Check migration guide
# https://backstage.io/docs/getting-started/keeping-backstage-updated

# Recommended: Auto-generate upgrade PRs in CI
# .github/workflows/backstage-upgrade.yaml
# Runs weekly on Monday to bump versions and create PR
```
Warning: Always commit current changes before running `versions:bump`. Upgrades may include changes to `package.json`, `yarn.lock`, and sometimes code migrations for breaking changes. Verify the upgrade in a staging environment before applying it to production.
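A simple guard against drifting too far behind is to compare the release recorded in `backstage.json` (created with the app and updated by `versions:bump`) with the latest release. The sketch below only does the comparison; how you obtain the latest version, and the six-minor-release threshold, are assumptions to adapt to your own policy.

```typescript
import { readFileSync } from "node:fs";

interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Hypothetical policy: six or more minor releases behind counts as "too far".
const MAX_MINOR_LAG = 6;

export function parseVersion(v: string): Version {
  // Accepts "1.36.0" as well as pre-releases such as "1.36.0-next.1".
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  if (!m) throw new Error(`Not a semver version: ${v}`);
  return { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) };
}

export function minorReleasesBehind(current: string, latest: string): number {
  const c = parseVersion(current);
  const l = parseVersion(latest);
  if (c.major !== l.major) {
    throw new Error("Major versions differ; follow the migration guide instead");
  }
  return Math.max(0, l.minor - c.minor);
}

export function isTooFarBehind(current: string, latest: string): boolean {
  return minorReleasesBehind(current, latest) >= MAX_MINOR_LAG;
}

// Reads the release recorded in backstage.json at the repository root.
export function readAppVersion(file = "backstage.json"): string {
  const parsed = JSON.parse(readFileSync(file, "utf8")) as { version: string };
  return parsed.version;
}
```

In the weekly CI workflow mentioned above, a check like `isTooFarBehind(readAppVersion(), latest)` could turn a silently stale app into a failing job.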
Failure Cases and Recovery Procedures
Case 1: Catalog Entity Sync Failure
Symptom: catalog-info.yaml exists on GitHub but does not appear in the Backstage UI
Root Cause Analysis:
- GitHub token expired or insufficient permissions (most common)
- YAML syntax error (indentation, invalid `kind` value)
- The Kind is not in the allowlist in `catalog.rules`
- Network issues causing GitHub API call failures
Recovery Procedure:
- Check Backstage logs: `kubectl logs -l app=backstage -c backstage --tail=100`
- Validate the token: `curl -H "Authorization: token ${GITHUB_TOKEN}" https://api.github.com/user`
- Validate the YAML with an entity validator such as `npx @roadiehq/backstage-entity-validator catalog-info.yaml`
- Manual refresh: Click the "Refresh" button on the entity page in the Backstage UI
- If cache reset is needed: Restart the Backstage Pod
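For classic personal access tokens, GitHub lists the granted scopes in the `X-OAuth-Scopes` response header (visible with `curl -sI -H "Authorization: token ${GITHUB_TOKEN}" https://api.github.com/user`). The sketch below checks that header value against the scopes this article requires, counting GitHub's broader scopes that include narrower ones. Fine-grained tokens may not report scopes this way, so the check is only meaningful for classic tokens; the helper names are hypothetical.

```typescript
// Broader classic-token scopes and the narrower scopes they include.
const IMPLIED = new Map<string, string[]>([
  ["user", ["read:user", "user:email", "user:follow"]],
  ["admin:org", ["write:org", "read:org"]],
  ["write:org", ["read:org"]],
]);

// Scopes the GITHUB_TOKEN in this article needs.
const REQUIRED_SCOPES = ["repo", "read:org", "read:user"];

export function grantedScopes(header: string | null): Set<string> {
  const granted = new Set<string>();
  for (const raw of (header ?? "").split(",")) {
    const scope = raw.trim();
    if (scope === "") continue;
    granted.add(scope);
    for (const implied of IMPLIED.get(scope) ?? []) granted.add(implied);
  }
  return granted;
}

/** Returns the required scopes that the header value does not grant. */
export function missingScopes(
  header: string | null,
  required: string[] = REQUIRED_SCOPES,
): string[] {
  const granted = grantedScopes(header);
  return required.filter(scope => !granted.has(scope));
}
```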
Case 2: Scaffolder Template Execution Fails During GitHub Repository Creation
Symptom: RequestError: HttpError: Resource not accessible by integration error at the "publish:github" step
Root Cause Analysis:
- Missing `Repository: Administration` permission on the GitHub App
- Blocked by the organization's repository creation policy
- A repository with the same name already exists
Recovery Procedure:
- Check and update GitHub App permissions
- Verify the App is allowed to create repositories in the organization settings
- Check the failed task logs in the Backstage UI (Scaffolder task log page)
- If partially created resources (empty repos, etc.) exist, manually clean up and re-run
Case 3: PostgreSQL Connection Pool Exhaustion
Symptom: Backstage response time sharply degrades, with intermittent `KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full.` errors in the logs
Root Cause Analysis:
- Catalog entities grew into the thousands, and catalog processing plus UI traffic now needs more concurrent database connections than the pool allows
- The Knex default pool (min 2, max 10 connections) cannot handle the load
Recovery Procedure and Prevention:
```yaml
# app-config.yaml - Connection pool tuning
backend:
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}
    # Passed through to the underlying Knex client
    knexConfig:
      pool:
        min: 5
        max: 30
        acquireTimeoutMillis: 60000
        idleTimeoutMillis: 30000
```
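Raising `max` has a cost. Assuming the default setup in which each backend plugin gets its own database client (and therefore its own pool), the worst case is replicas × plugins × `pool.max` connections. With the `max: 30` above, three replicas and six database-backed plugins could request 540 connections, far beyond PostgreSQL's default `max_connections` of 100. The sketch below makes that budget explicit; all numbers are illustrative.

```typescript
export interface PoolPlan {
  replicas: number; // Backstage backend pods
  pluginsWithDb: number; // backend plugins that open their own database client
  poolMax: number; // knexConfig.pool.max
  maxConnections: number; // PostgreSQL max_connections
  reserved: number; // headroom for admin sessions, migrations, other clients
}

// Worst case: every pool in every replica is full at the same time.
export function worstCaseConnections(p: PoolPlan): number {
  return p.replicas * p.pluginsWithDb * p.poolMax;
}

export function fitsBudget(p: PoolPlan): boolean {
  return worstCaseConnections(p) <= p.maxConnections - p.reserved;
}

// Largest pool.max that keeps the worst case inside the budget.
export function maxSafePoolSize(p: Omit<PoolPlan, "poolMax">): number {
  return Math.floor((p.maxConnections - p.reserved) / (p.replicas * p.pluginsWithDb));
}
```

In the example above, `maxSafePoolSize` returns 5 for three replicas, six plugins, and 90 usable connections, which points toward a smaller pool, a higher `max_connections`, or a connection pooler such as PgBouncer in front of PostgreSQL.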
Case 4: TechDocs Build Failure
Symptom: TechDocs tab on the service page is empty or shows a build error
Root Cause Analysis:
- `mkdocs.yml` file is missing from the repository
- Python dependency (`mkdocs-techdocs-core`) installation failure
- S3 bucket permission issues (when using external publisher)
Recovery Procedure:
- Verify that `mkdocs.yml` and `docs/index.md` exist in the repository
- Test the build locally with `npx @techdocs/cli generate`
- Check IAM permissions when using an external publisher
- Change `techdocs.builder` to `local` so Backstage builds the docs directly (for small environments)
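The first recovery step can run as a CI pre-check before `techdocs-cli generate`, so a missing file fails fast with a readable message. A minimal sketch, assuming the repository layout used in this article (`mkdocs.yml` at the root and `docs/index.md`); `missingTechDocsFiles` and `assertTechDocsReady` are hypothetical helpers.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Files the TechDocs build expects, relative to the repository root.
const REQUIRED_FILES = ["mkdocs.yml", path.join("docs", "index.md")];

/** Returns the required TechDocs files that are missing under `repoRoot`. */
export function missingTechDocsFiles(repoRoot: string): string[] {
  return REQUIRED_FILES.filter(rel => !fs.existsSync(path.join(repoRoot, rel)));
}

// Usage in CI: fail the job early instead of publishing an empty docs site.
export function assertTechDocsReady(repoRoot: string): void {
  const missing = missingTechDocsFiles(repoRoot);
  if (missing.length > 0) {
    throw new Error(`TechDocs prerequisites missing: ${missing.join(", ")}`);
  }
}
```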
Case 5: Plugin Compatibility Break After Backstage Upgrade
Symptom: White screen or runtime error on specific plugin pages after upgrade
Root Cause Analysis:
- Version mismatch between Backstage core packages and plugins
- API was deprecated and then removed
- Migration to the New Backend System is required
Recovery Procedure:
- Immediately roll back to the previous version (Helm rollback or deploy with the previous image tag)
- Check error messages in the browser console
- Search the plugin's GitHub issues
- Update the plugin to the latest compatible version, or temporarily disable it
- Re-deploy to production after verifying in staging
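A frequent cause of the version mismatch above is two copies of a core package such as `@backstage/core-plugin-api` being resolved because one plugin depends on an older range; `yarn why @backstage/core-plugin-api` shows which packages pull in each copy. Given the resolved versions per package (collected however suits your tooling, which is an assumption here), the hypothetical helper below flags `@backstage/*` packages that resolve to more than one version.

```typescript
// Hypothetical input: package name -> every version resolved for it.
export type ResolvedVersions = Record<string, string[]>;

export function duplicatedPackages(
  resolved: ResolvedVersions,
  prefix = "@backstage/",
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const [pkg, versions] of Object.entries(resolved)) {
    if (!pkg.startsWith(prefix)) continue;
    // De-duplicate, then sort lexically (not by semver) for stable output.
    const unique = [...new Set(versions)].sort();
    if (unique.length > 1) result[pkg] = unique;
  }
  return result;
}
```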
Checklists
IDP Build Preparation Checklist
- Form a dedicated Platform Engineering team (minimum 2-3 people)
- Conduct developer pain point survey (identify areas with highest cognitive load)
- Decide on the first MVP scope for the IDP (recommended: start with Software Catalog)
- Complete comparative evaluation of Backstage vs SaaS tools
- Provision PostgreSQL instance (production)
- Create and configure permissions for GitHub App or OAuth App
- Plan SSO provider integration (Okta, Azure AD, Google, etc.)
Backstage Operations Checklist
- Separate `app-config.production.yaml` and manage secrets externally (Vault, K8s Secret)
- Configure HTTPS/TLS (Ingress or load balancer)
- Set up database backup schedule
- Configure monitoring (Backstage metrics + infrastructure metrics)
- Automate weekly version upgrade workflow
- Catalog entity validation CI (validate catalog-info.yaml in PRs)
- Write developer onboarding guide documentation
- Plan quarterly developer satisfaction surveys
Catalog Registration Checklist (Per Service)
- Write `catalog-info.yaml` and place it at the repository root
- Verify `metadata.name` follows organizational naming conventions
- Set `spec.owner` to the correct team group
- Assign `spec.system` to the correct system
- Register API specifications (OpenAPI, gRPC, AsyncAPI)
- Add CI/CD, monitoring, and incident tool integration info in `annotations`
- Configure TechDocs (`mkdocs.yml` + `docs/` directory)
- Set up Kubernetes annotations (workload mapping within the cluster)