Building an IDP with Backstage Part 1 — The Software Catalog Is Everything

Introduction — Why an IDP, and Why Now

The moment an organization crosses roughly fifty microservices, the same questions start pouring in: "Which team owns this service?", "Where is the spec for the payments API?", "Which services use this database?", "Who should a new hire ask to understand our whole system?" The cost of answering these questions is your organization's cognitive load, and reducing that load systematically is the core mission of platform engineering.

The Internal Developer Portal (IDP) is the industry's answer to this problem, and the de facto standard is Backstage, which Spotify open-sourced in 2020 and donated to the CNCF. Backstage matured through the CNCF Sandbox (2020) and Incubating (2022) stages and, judging by its public adopters list, has been adopted by thousands of organizations, making it one of the most active projects in the CNCF. Platform engineering reports from Gartner and Puppet have consistently pointed in the same direction: the majority of large engineering organizations will be running internal platform teams by 2026.

This series covers building an IDP with Backstage in three parts. Part 1 has exactly one topic: the software catalog. Let me state the conclusion up front. A Backstage instance with an empty catalog is nothing. The Scaffolder, TechDocs, and the Kubernetes plugin all hang off catalog entities to function. Catalog design decides whether your IDP project succeeds or fails.

Where Backstage Sits in the IDP Landscape

First, let us sort out the terminology. The acronym IDP is used in two ways.

| Term | Meaning | Focus |
|---|---|---|
| Internal Developer Platform | The entire self-service infrastructure platform | Provisioning, golden paths, environment management |
| Internal Developer Portal | The single entry-point UI for the platform | Catalog, docs, self-service UI |

Strictly speaking, Backstage is the latter — a portal framework. It is not an engine that provisions infrastructure directly; it is a framework that brings all of an organization's software assets and tools into a single pane of glass. There are four core building blocks.

+---------------------------------------------------------------+
|                  Backstage (portal framework)                  |
|                                                               |
|  +----------------+  +----------------+  +-----------------+  |
|  |   Software     |  |   Scaffolder   |  |    TechDocs     |  |
|  |   Catalog      |  | (golden paths) |  |  (docs-as-code) |  |
|  | (this article) |  |    [Part 2]    |  |    [Part 3]     |  |
|  +-------+--------+  +-------+--------+  +--------+--------+  |
|          |                   |                    |           |
|  +-------v-------------------v--------------------v--------+  |
|  |         Plugin ecosystem (K8s, ArgoCD, ...)              |  |
|  +----------------------------------------------------------+  |
+---------------------------------------------------------------+

The catalog is the data foundation for everything else. When the Scaffolder creates a new service, it registers it in the catalog; TechDocs attaches documentation to catalog entities; the Kubernetes plugin reads catalog annotations to decide which workloads to display.

The Catalog Data Model — Six Core Kinds

The Backstage catalog is a graph of entities. Every entity is expressed as a YAML document with apiVersion, kind, metadata, and spec, deliberately inspired by the Kubernetes resource model. The core kinds are as follows.

| Kind | Description | Examples |
|---|---|---|
| Component | A unit of software built from code | Backend service, web app, library |
| API | An interface a component provides/consumes | OpenAPI, gRPC, GraphQL, AsyncAPI |
| Resource | Infrastructure a component depends on | RDS, S3 bucket, Kafka topic |
| System | A bundle that delivers one capability together | Payment system, search system |
| Domain | A business area grouping systems | Commerce, settlement, membership |
| Group / User | The subjects of ownership | Teams, squads, individuals |

These connect through relations to form a graph.

                      +------------------+
                      |     Domain       |   e.g. commerce
                      | (business area)  |
                      +--------^---------+
                               | partOf
                      +--------+---------+
                      |     System       |   e.g. payment-system
                      +--------^---------+
                               | partOf
          +--------------------+--------------------+
          |                    |                    |
  +-------+--------+   +------+---------+   +------+--------+
  |   Component    |   |   Component    |   |   Component   |
  |  payment-api   |   | payment-worker |   |  payment-web  |
  +---+-------+----+   +-------+--------+   +---------------+
      |       |                |
      |       | providesApi    | consumesApi
      |       v                v
      |    +--+----------------+--+
      |    |         API          |   e.g. payment-v1 (OpenAPI)
      |    +----------------------+
      | dependsOn
      v
  +---+------------+        +-----------------+
  |   Resource     |        |     Group       |
  |  payment-db    |        | team-payments   |
  +----------------+        +--------^--------+
                                     | ownedBy (from every entity)

Relations are generated bidirectionally. When a Component declares providesApis, the API side automatically gets the inverse apiProvidedBy relation. Thanks to this graph, queries like "every service that uses payment-db" or "every asset owned by team-payments" become a single click.
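
To make the bidirectional behavior concrete, here is a minimal in-memory sketch. It is not how the catalog is implemented (the real catalog stores relations in its database); the Relation type and the withInverses and targetsOf helpers are hypothetical names for this illustration. The relation type names themselves are the catalog's standard pairs.

```typescript
// Hypothetical, simplified model of catalog relations for illustration only.
type Relation = { source: string; type: string; target: string };

// Forward relation type -> inverse relation type.
const INVERSE: Record<string, string> = {
  ownedBy: "ownerOf",
  partOf: "hasPart",
  providesApi: "apiProvidedBy",
  consumesApi: "apiConsumedBy",
  dependsOn: "dependencyOf",
};

// For every declared relation, also emit the inverse edge.
function withInverses(declared: Relation[]): Relation[] {
  const all: Relation[] = [];
  for (const rel of declared) {
    all.push(rel);
    const inverse = INVERSE[rel.type];
    if (inverse !== undefined) {
      all.push({ source: rel.target, type: inverse, target: rel.source });
    }
  }
  return all;
}

// List the targets of all edges of one type leaving one entity.
function targetsOf(graph: Relation[], source: string, type: string): string[] {
  return graph
    .filter(r => r.source === source && r.type === type)
    .map(r => r.target);
}

// Declared relations, matching the diagram above.
const graph = withInverses([
  { source: "component:default/payment-api", type: "dependsOn", target: "resource:default/payment-db" },
  { source: "component:default/payment-api", type: "providesApi", target: "api:default/payment-v1" },
  { source: "component:default/payment-worker", type: "consumesApi", target: "api:default/payment-v1" },
]);

// "Every service that uses payment-db" is a lookup on the inverse edge.
const dbUsers = targetsOf(graph, "resource:default/payment-db", "dependencyOf");
// "Who consumes payment-v1" works the same way.
const apiConsumers = targetsOf(graph, "api:default/payment-v1", "apiConsumedBy");
```

Nobody has to declare the inverse edges by hand: the teams only write the forward relation in their own catalog-info.yaml, and the reverse lookup comes for free.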

Writing catalog-info.yaml in Practice

The standard convention is to place a catalog-info.yaml file at the root of each repository. Let us look at three practical examples.

A backend service (Component + provided API)

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-api
  title: Payment API Server
  description: Core backend service handling payment authorization and cancellation
  annotations:
    github.com/project-slug: acme-corp/payment-api
    backstage.io/techdocs-ref: dir:.
    backstage.io/kubernetes-id: payment-api
    pagerduty.com/integration-key: PD-INTEGRATION-KEY
    sonarqube.org/project-key: acme_payment-api
  tags:
    - java
    - spring-boot
    - payments
  links:
    - url: https://grafana.acme.io/d/payment-api
      title: Grafana Dashboard
      icon: dashboard
spec:
  type: service
  lifecycle: production
  owner: group:default/team-payments
  system: payment-system
  providesApis:
    - payment-v1
  dependsOn:
    - resource:payment-db
    - resource:payment-events-topic

Pay attention to annotations. The catalog itself does not interpret these values, but each plugin reads its own annotations to function. The Kubernetes tab only appears when backstage.io/kubernetes-id is present, and on-call information only shows up when pagerduty.com/integration-key exists. Annotations are effectively the activation switches for plugins.
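
The "activation switch" idea can be sketched in a few lines. This is an illustration of the pattern, not code from any Backstage plugin: the TAB_SWITCHES map, the tab labels, and the enabledTabs helper are assumptions made for this example. Only the annotation keys come from the YAML above.

```typescript
// Minimal entity shape, reduced to what this sketch needs.
type Entity = {
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
};

// Annotation key -> tab it unlocks. Tab labels are illustrative, not Backstage identifiers.
const TAB_SWITCHES: Record<string, string> = {
  "backstage.io/kubernetes-id": "Kubernetes",
  "backstage.io/techdocs-ref": "Docs",
  "pagerduty.com/integration-key": "On-call",
};

// A tab is shown only when its annotation is present and non-empty.
function enabledTabs(entity: Entity): string[] {
  const annotations = entity.metadata.annotations ?? {};
  return Object.entries(TAB_SWITCHES)
    .filter(([key]) => Boolean(annotations[key]))
    .map(([, tab]) => tab);
}

const tabs = enabledTabs({
  kind: "Component",
  metadata: {
    name: "payment-api",
    annotations: {
      "backstage.io/kubernetes-id": "payment-api",
      "backstage.io/techdocs-ref": "dir:.",
    },
  },
});
```

Here tabs contains the Kubernetes and Docs entries but not On-call, because the PagerDuty annotation is missing.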

An API entity (linking an OpenAPI spec)

apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: payment-v1
  title: Payment API v1
  description: REST API for payment authorization, cancellation, and lookup
spec:
  type: openapi
  lifecycle: production
  owner: group:default/team-payments
  system: payment-system
  definition:
    $text: ./openapi/payment-v1.yaml

When you attach an OpenAPI document to definition, Backstage renders Swagger UI in the API definition tab. The $text placeholder shown in the example loads a file relative to the location of catalog-info.yaml; a full URL can be given instead. For gRPC you would use type: grpc with a proto file, and for event-driven interfaces, type: asyncapi.

Infrastructure resources, systems, and domains

apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: payment-db
  description: Payment ledger PostgreSQL (AWS RDS)
  annotations:
    amazonaws.com/arn: arn:aws:rds:ap-northeast-2:111122223333:db:payment-db
spec:
  type: database
  owner: group:default/team-payments
  system: payment-system
---
apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: payment-system
  description: System responsible for payment authorization through settlement
spec:
  owner: group:default/team-payments
  domain: commerce
---
apiVersion: backstage.io/v1alpha1
kind: Domain
metadata:
  name: commerce
  description: Commerce business domain
spec:
  owner: group:default/commerce-tribe

Multiple entities can be declared in one file using --- separators. Systems and Domains are usually cleaner to manage in a dedicated governance repository (for example acme-corp/software-catalog).

Discovery — Manual Registration Does Not Scale

Registering entities one by one in the UI collapses as soon as you pass thirty repositories. To understand the mechanisms that populate the catalog, you need to distinguish two concepts.

| Concept | Role | Examples |
|---|---|---|
| Entity Provider | Injects entities into the catalog from external sources | GitHub discovery, LDAP, static files |
| Processor | Validates/enriches injected entities and builds relations | Schema validation, relation building, CODEOWNERS resolution |

  GitHub Org          LDAP/AD          static locations
      |                  |                    |
      v                  v                    v
+-----+------------------+--------------------+-----+
|           Entity Providers (ingestion layer)       |
+--------------------------+-------------------------+
                           v
+--------------------------+-------------------------+
| Processing loop: validate -> transform -> relations |
| -> store (Processors intervene at each stage)       |
+--------------------------+-------------------------+
                           v
                 +---------+---------+
                 |  PostgreSQL (DB)  |
                 +---------+---------+
                           v
                 +---------+---------+
                 |  Catalog REST API |
                 +-------------------+

The discovery configuration that automatically scans an entire GitHub organization looks like this.

# app-config.yaml
catalog:
  providers:
    github:
      acmeProvider:
        organization: 'acme-corp'
        catalogPath: '/catalog-info.yaml'
        filters:
          branch: 'main'
          repository: '.*'        # filter target repos with a regex
          topic:
            include: ['backstage-managed']
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }

With this single block, "every repository that has catalog-info.yaml on the main branch and carries the backstage-managed topic" is synchronized automatically every 30 minutes. The corresponding module must be registered in the backend.

// packages/backend/src/index.ts
// Recent releases export the module from the package root; older releases used the '/alpha' subpath.
backend.add(import('@backstage/plugin-catalog-backend-module-github'));
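
The filters in the discovery block can be read as a predicate over repositories. The sketch below is a simplified model of that selection, not the provider's actual code: the Repo and Filters shapes and the isDiscovered name are hypothetical, and details such as how the real provider anchors the repository regex should be checked against its documentation.

```typescript
// Hypothetical repository summary; field names are assumptions for this sketch.
type Repo = { name: string; topics: string[]; filesOnBranch: string[] };

// Mirrors the three conditions in the config: repository regex, topic include, catalogPath.
type Filters = { repository: RegExp; topicInclude: string[]; catalogPath: string };

function isDiscovered(repo: Repo, filters: Filters): boolean {
  const nameMatches = filters.repository.test(repo.name);
  // An empty include list means "no topic restriction" in this sketch.
  const topicMatches =
    filters.topicInclude.length === 0 ||
    filters.topicInclude.some(topic => repo.topics.includes(topic));
  const hasCatalogFile = repo.filesOnBranch.includes(filters.catalogPath);
  return nameMatches && topicMatches && hasCatalogFile;
}

const filters: Filters = {
  repository: /.*/,
  topicInclude: ["backstage-managed"],
  catalogPath: "/catalog-info.yaml",
};

const managed = isDiscovered(
  { name: "payment-api", topics: ["backstage-managed", "java"], filesOnBranch: ["/catalog-info.yaml", "/README.md"] },
  filters,
);
const unmanaged = isDiscovered(
  { name: "scratchpad", topics: [], filesOnBranch: ["/catalog-info.yaml"] },
  filters,
);
```

The topic filter is what makes rollout opt-in: a repository with a catalog-info.yaml but without the backstage-managed topic is skipped.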

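For the organizational structure (Groups/Users), use org discovery, which imports GitHub teams and their members directly. It ships as a separate backend module, @backstage/plugin-catalog-backend-module-github-org, which must also be added to the backend.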

catalog:
  providers:
    githubOrg:
      - id: acme-github-org
        githubUrl: https://github.com
        orgs: ['acme-corp']
        schedule:
          frequency: { hours: 1 }
          timeout: { minutes: 15 }

The Ownership Model — Every Entity Must Have an Owner

The most important invariant of the catalog is "no entity without an owner." The subjects of ownership are Group and User entities.

apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: team-payments
  description: Payments platform team
spec:
  type: team
  profile:
    displayName: Payments Team
    email: team-payments@acme.io
  parent: commerce-tribe
  children: []
---
apiVersion: backstage.io/v1alpha1
kind: User
metadata:
  name: youngju.kim
spec:
  profile:
    displayName: Youngju Kim
    email: youngju.kim@acme.io
  memberOf:
    - team-payments

Groups form an organizational tree via parent/children. Whether you run a tribe-squad structure or a division-team structure, it maps directly, and the tree is used for ownership aggregation (rolling up assets of child teams into parent organization views).
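
Ownership aggregation is essentially a subtree walk. The following sketch shows the idea on plain arrays; the Group and Owned shapes and the subtree and ownedUnder helpers are hypothetical, and the team-search and discovery-tribe names are invented for contrast.

```typescript
// Hypothetical in-memory shapes for this sketch.
type Group = { name: string; parent?: string };
type Owned = { name: string; owner: string };

// Collect a group and all of its descendants by following parent links.
function subtree(groups: Group[], root: string): Set<string> {
  const result = new Set<string>([root]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const g of groups) {
      if (g.parent !== undefined && result.has(g.parent) && !result.has(g.name)) {
        result.add(g.name);
        grew = true;
      }
    }
  }
  return result;
}

// Everything owned by the root group or any team below it.
function ownedUnder(groups: Group[], entities: Owned[], root: string): string[] {
  const members = subtree(groups, root);
  return entities.filter(e => members.has(e.owner)).map(e => e.name);
}

const groups: Group[] = [
  { name: "commerce-tribe" },
  { name: "team-payments", parent: "commerce-tribe" },
  { name: "team-search", parent: "discovery-tribe" },
];
const entities: Owned[] = [
  { name: "payment-api", owner: "team-payments" },
  { name: "payment-system", owner: "team-payments" },
  { name: "search-api", owner: "team-search" },
];

const rolledUp = ownedUnder(groups, entities, "commerce-tribe");
```

With this data, rolledUp holds payment-api and payment-system: the commerce-tribe view picks up the assets of team-payments without anyone assigning them to the tribe directly.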

For repositories where someone forgot to set an owner, CODEOWNERS integration is available. If you enable the catalog backend CodeOwnersProcessor, it reads the repository CODEOWNERS file and infers the owning team for entities whose spec.owner is empty. That said, treat this only as a fallback and make explicit owner declarations the rule: CODEOWNERS means "code review responsibility," while catalog owner means "operational responsibility" — subtly different things.
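
As a rough picture of what "inferring an owner from CODEOWNERS" involves, consider the sketch below. It is not the CodeOwnersProcessor implementation: inferOwner is a hypothetical helper that handles only the catch-all pattern and a literal catalog-info.yaml path, whereas real CODEOWNERS matching supports full glob patterns.

```typescript
// Returns the team slug from the last applicable rule that names a team (@org/team).
// Only "*" and literal catalog-info.yaml patterns are handled in this sketch.
function inferOwner(codeowners: string): string | undefined {
  let owner: string | undefined;
  for (const rawLine of codeowners.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const [pattern, ...owners] = line.split(/\s+/);
    const applies =
      pattern === "*" ||
      pattern === "/catalog-info.yaml" ||
      pattern === "catalog-info.yaml";
    if (!applies) continue;
    // Prefer team handles over individual users such as @alice.
    const team = owners.find(o => /^@[^/]+\/[^/]+$/.test(o));
    if (team !== undefined) owner = team.split("/")[1];
  }
  return owner;
}

const sample = [
  "# default owners",
  "* @acme-corp/platform",
  "/docs/ @acme-corp/docs-team",
  "/catalog-info.yaml @alice @acme-corp/team-payments",
].join("\n");

const inferred = inferOwner(sample);
```

Later rules override earlier ones, as in CODEOWNERS itself, so inferred ends up as team-payments rather than platform. The result is still only a review-responsibility guess, which is why it belongs in the fallback role described above.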

Metadata Governance — Annotation Policy and Linting

As the catalog grows, metadata quality becomes the catalog's credibility. It pays to plant governance mechanisms from day one.

1) Declare a required-metadata policy as a document. For example:

| Field/annotation | Required | Notes |
|---|---|---|
| spec.owner | Required | Groups only, no individuals |
| spec.lifecycle | Required | One of experimental, production, deprecated |
| description | Required | At least one sentence |
| github.com/project-slug | Required | Source linkage |
| backstage.io/techdocs-ref | Recommended | Required for documented services |
| pagerduty.com/integration-key | Recommended | Required for production services |
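
A policy written as a table is easier to keep honest when it is also expressed as a check. The sketch below encodes the "Required" rows as a plain function; policyViolations is a hypothetical helper, and requiring an explicit group: prefix on the owner is a stricter choice made for this example (Backstage itself also accepts a bare name as a group reference).

```typescript
type Entity = {
  kind: string;
  metadata: { name: string; description?: string; annotations?: Record<string, string> };
  spec?: { owner?: string; lifecycle?: string };
};

const LIFECYCLES = ["experimental", "production", "deprecated"];

// Returns one message per violated "Required" row of the policy table.
function policyViolations(entity: Entity): string[] {
  const problems: string[] = [];
  const owner = entity.spec?.owner ?? "";
  // Assumption for this sketch: owners must be written as explicit group references.
  if (!owner.startsWith("group:")) {
    problems.push("spec.owner must be a group reference");
  }
  if (!LIFECYCLES.includes(entity.spec?.lifecycle ?? "")) {
    problems.push("spec.lifecycle must be one of " + LIFECYCLES.join(", "));
  }
  if ((entity.metadata.description ?? "").trim() === "") {
    problems.push("metadata.description is required");
  }
  if (!entity.metadata.annotations?.["github.com/project-slug"]) {
    problems.push("github.com/project-slug annotation is required");
  }
  return problems;
}

const good: Entity = {
  kind: "Component",
  metadata: {
    name: "payment-api",
    description: "Core backend service handling payment authorization and cancellation",
    annotations: { "github.com/project-slug": "acme-corp/payment-api" },
  },
  spec: { owner: "group:default/team-payments", lifecycle: "production" },
};
const bad: Entity = {
  kind: "Component",
  metadata: { name: "mystery-service" },
  spec: { owner: "user:default/alice", lifecycle: "prod" },
};

const goodProblems = policyViolations(good);
const badProblems = policyViolations(bad);
```

The compliant entity produces no messages, while the second one fails all four required rows. Returning messages instead of throwing lets a CI job report every problem in one run.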

2) Lint catalog-info.yaml in CI. Putting an entity validation tool into the PR pipeline filters out broken files before they merge.

# .github/workflows/catalog-lint.yaml
name: catalog-lint
on:
  pull_request:
    paths:
      - 'catalog-info.yaml'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate catalog entity
        run: npx @roadiehq/backstage-entity-validator catalog-info.yaml

3) Enforce organizational policy with a custom processor. For example, "a production-lifecycle component without a PagerDuty annotation is a validation error" can be expressed in code.

// Skeleton of an org-policy validation processor
import { CatalogProcessor } from '@backstage/plugin-catalog-node';
import { Entity } from '@backstage/catalog-model';

export class RequiredAnnotationsProcessor implements CatalogProcessor {
  getProcessorName(): string {
    return 'RequiredAnnotationsProcessor';
  }

  async validateEntityKind(entity: Entity): Promise<boolean> {
    if (entity.kind === 'Component' && entity.spec?.lifecycle === 'production') {
      const annotations = entity.metadata.annotations ?? {};
      if (!annotations['pagerduty.com/integration-key']) {
        throw new Error(
          `production component ${entity.metadata.name} requires pagerduty annotation`,
        );
      }
    }
    return false; // false = this processor does not claim the kind; the built-in processors still validate it
  }
}

Production Deployment Configuration

SQLite is fine for a local demo, but production must use PostgreSQL. The essential app-config looks like this.

# app-config.production.yaml
app:
  baseUrl: https://backstage.acme.io

backend:
  baseUrl: https://backstage.acme.io
  listen:
    port: 7007
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}
      ssl:
        rejectUnauthorized: true
  cache:
    store: memory

integrations:
  github:
    - host: github.com
      apps:
        - $include: github-app-credentials.yaml

catalog:
  rules:
    - allow: [Component, API, Resource, System, Domain, Group, User, Location]

For GitHub integration, prefer the GitHub App approach over a personal access token (PAT). Rate limits are separated per installation, permission scopes can be controlled precisely, and credentials are not tied to a human account.

The skeleton of a Kubernetes deployment manifest follows.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backstage
  namespace: backstage
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backstage
  template:
    metadata:
      labels:
        app: backstage
    spec:
      containers:
        - name: backstage
          image: ghcr.io/acme-corp/backstage:1.0.3
          ports:
            - containerPort: 7007
          envFrom:
            - secretRef:
                name: backstage-secrets
          readinessProbe:
            httpGet:
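              # Readiness endpoint of the new backend system; legacy backends served /healthcheck
              path: /.backstage/health/v1/readiness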
              port: 7007
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              memory: 2Gi
---
apiVersion: v1
kind: Service
metadata:
  name: backstage
  namespace: backstage
spec:
  selector:
    app: backstage
  ports:
    - port: 80
      targetPort: 7007

If you run two or more replicas, it is worth knowing that Backstage uses database-backed coordination so that entity provider scheduled tasks are not executed redundantly. It works without any separate leader-election setup.

Authentication — GitHub Login and OIDC

Since this is an internal portal, authentication is mandatory. The simplest GitHub OAuth configuration:

auth:
  environment: production
  providers:
    github:
      production:
        clientId: ${AUTH_GITHUB_CLIENT_ID}
        clientSecret: ${AUTH_GITHUB_CLIENT_SECRET}
        signIn:
          resolvers:
            - resolver: usernameMatchingUserEntityName

A sign-in resolver is the rule that maps "an identity from the external IdP" to "a User entity in the catalog." In other words, a User entity must exist in the catalog for login to succeed. If you auto-ingest Users via org discovery, this resolves naturally. If you run an internal IdP such as Okta, Keycloak, or Azure AD, use the OIDC provider.

auth:
  environment: production
  providers:
    oidc:
      production:
        metadataUrl: https://keycloak.acme.io/realms/acme/.well-known/openid-configuration
        clientId: ${AUTH_OIDC_CLIENT_ID}
        clientSecret: ${AUTH_OIDC_CLIENT_SECRET}
        prompt: auto
        signIn:
          resolvers:
            - resolver: emailMatchingUserEntityProfileEmail
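
Both resolvers named in the configurations above reduce to the same idea: look up a User entity and fail the sign-in if none matches. The sketch below models that lookup on an in-memory list. The resolveByUsername and resolveByEmail functions are hypothetical stand-ins, not the Backstage implementations, and exact string matching is an assumption of this example.

```typescript
// Hypothetical, reduced view of a User entity.
type UserEntity = { name: string; email?: string };

// Stand-in for a username-based resolver: IdP username must equal metadata.name.
function resolveByUsername(catalog: UserEntity[], username: string): string {
  const user = catalog.find(u => u.name === username);
  if (user === undefined) throw new Error(`no User entity named "${username}"`);
  return `user:default/${user.name}`;
}

// Stand-in for an email-based resolver: IdP email must equal the profile email.
function resolveByEmail(catalog: UserEntity[], email: string): string {
  const user = catalog.find(u => u.email === email);
  if (user === undefined) throw new Error(`no User entity with email "${email}"`);
  return `user:default/${user.name}`;
}

const users: UserEntity[] = [
  { name: "youngju.kim", email: "youngju.kim@acme.io" },
];

const byName = resolveByUsername(users, "youngju.kim");
const byEmail = resolveByEmail(users, "youngju.kim@acme.io");
```

Both calls yield the entity reference user:default/youngju.kim. A login from someone who is not in the catalog throws, which is the behavior that makes automated User ingestion a prerequisite for rolling out authentication.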

The auth modules must be added to the backend.

backend.add(import('@backstage/plugin-auth-backend'));
backend.add(import('@backstage/plugin-auth-backend-module-github-provider'));
// or the OIDC module
backend.add(import('@backstage/plugin-auth-backend-module-oidc-provider'));

The Value the Catalog Creates — Three Scenes

Instead of abstract benefits, let us look at concrete scenes.

Scene 1: Finding the on-call at 2 a.m. during an incident. The order service fires a payment API timeout alert. Open payment-api in the catalog and you get the owning team, the PagerDuty on-call engineer, the Grafana dashboard link, and recent deployment history on one screen. The twenty minutes burned shouting "who owns this?" in Slack disappears.

Scene 2: Dependency tracking and change impact analysis. You are planning a PostgreSQL major version upgrade for payment-db. Follow the dependencyOf relation on payment-db (the automatically generated inverse of dependsOn) and you immediately get every component that depends on the database, along with each owning team. Selecting the notification audience takes one minute.

Scene 3: New-hire onboarding. In their first week, a new hire can walk down the Domain to System to Component hierarchy and map out the organization's software landscape on their own — because "the senior engineer's mental map of the whole picture" has been externalized into a system.
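
Scene 2 can also be scripted against the catalog REST API. The sketch below only builds the request URL. It assumes the backend serves the catalog under /api/catalog, that relations can be filtered with a relations.<type>=<entity ref> expression, and that the caller adds whatever authentication the instance requires; dependentsQuery is a hypothetical helper, so verify the filter syntax against your Backstage version before relying on it.

```typescript
// Builds a catalog API URL listing components that depend on a given resource.
// The filter combines conditions with commas (logical AND).
function dependentsQuery(baseUrl: string, resourceRef: string): string {
  const filter = ["kind=component", `relations.dependsOn=${resourceRef}`].join(",");
  const fields = ["metadata.name", "spec.owner"].join(",");
  return (
    `${baseUrl}/api/catalog/entities` +
    `?filter=${encodeURIComponent(filter)}` +
    `&fields=${encodeURIComponent(fields)}`
  );
}

const url = dependentsQuery("https://backstage.acme.io", "resource:default/payment-db");
```

Restricting fields to the name and owner keeps the response small, which is all that is needed to build the notification list for the upgrade.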

Adoption Strategy — Pilot Teams and Automated Population

The hardest part of catalog adoption is not technology but uptake. The proven playbook:

  1. Start with one or two pilot teams. Company-wide big bangs almost always fail. Pick a cooperative team with a reasonable number of services, observe the friction of writing catalog-info.yaml and onboarding firsthand, and turn it into templates.
  2. Automate the initial population. Iterating over the repository list with a script and generating baseline catalog-info.yaml files as bulk PRs works well. Infer language/framework from repository contents, propose owner candidates from CODEOWNERS or commit history in the PR body, and each team only has to "review and merge."
  3. Create enforcement by wiring the catalog into other processes. For example, once a policy exists that "the production deployment pipeline only accepts services registered in the catalog," registration stops being optional and becomes a precondition of deployment.
  4. Deliver one visible win within the first 90 days. Whether it is consolidated on-call information or an API documentation hub, voluntary adoption only starts after someone experiences "the portal saved me time."
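
The bulk-population step in item 2 can be sketched as a function that turns a few facts about a repository into a baseline catalog-info.yaml. The RepoFacts shape, the baselineCatalogInfo name, and the placeholder defaults (type service, lifecycle experimental, owner unknown) are assumptions for this example; collecting the facts and opening the pull requests is left out.

```typescript
// Hypothetical summary of what a script could learn about one repository.
type RepoFacts = {
  org: string;
  name: string;
  description: string;
  language: string;
  ownerGuess?: string; // e.g. proposed from CODEOWNERS or commit history
};

// Renders a baseline catalog-info.yaml for the owning team to review and correct.
function baselineCatalogInfo(repo: RepoFacts): string {
  const lines = [
    "apiVersion: backstage.io/v1alpha1",
    "kind: Component",
    "metadata:",
    `  name: ${repo.name}`,
    // JSON string quoting is also valid YAML and protects special characters.
    `  description: ${JSON.stringify(repo.description)}`,
    "  annotations:",
    `    github.com/project-slug: ${repo.org}/${repo.name}`,
    "  tags:",
    `    - ${repo.language.toLowerCase()}`,
    "spec:",
    "  type: service",
    "  lifecycle: experimental",
    `  owner: group:default/${repo.ownerGuess ?? "unknown"}`,
  ];
  return lines.join("\n") + "\n";
}

const generated = baselineCatalogInfo({
  org: "acme-corp",
  name: "payment-api",
  description: "Core payment service",
  language: "Java",
  ownerGuess: "team-payments",
});
```

Deliberately conservative defaults make the pull request safe to open in bulk: the team confirms or fixes the type, lifecycle, and owner during review instead of writing the file from scratch.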

Failure Patterns — How to Ruin It

Empty catalog syndrome. Installing Backstage, registering a few demo entities, and leaving it with "the teams will register things themselves." A developer who lands on an empty portal does not come back. Do not launch without discovery automation and an initial bulk population.

Manual maintenance rot. If registration happened but updates are manual, metadata starts diverging from reality within six months. The first time catalog information is proven wrong (the moment an incident inquiry lands on the wrong team), trust collapses. The fix is a one-way principle: pin the source of truth to catalog-info.yaml in Git repositories, minimize manual UI registration, and auto-sync data like org charts from IdP/HR systems.

Ownership inflation. Dumping every owner onto a single "platform-team" or assigning entities to individuals who have left the company. Owners must be real team groups, and updating group entities must be part of the reorganization process.

Over-modeling. Trying to design the full Domain/System/Component/API/Resource hierarchy perfectly on day one. Starting with just Components and Groups, then adding Systems and Domains when the need arises, is the realistic incremental approach.

Checklist

Verify the following items during adoption.

  • PostgreSQL-backed production configuration (no SQLite)
  • GitHub App based integration (avoid PATs)
  • GitHub discovery + org (Group/User) discovery enabled
  • Authentication (GitHub OAuth or OIDC) with a sign-in resolver configured
  • Allowed kinds declared explicitly via catalog rules
  • Required-metadata policy documented (owner, lifecycle, description, key annotations)
  • catalog-info.yaml linting added to the PR pipeline
  • Bulk registration automation script executed for existing repositories
  • Pilot team selected with a feedback loop in place
  • Owners are always groups, with a defined process for reorgs
  • A visible-win goal set for the first 90 days (e.g. consolidated on-call info)

Closing

This article covered the software catalog, the foundation of a Backstage IDP: the entity model and relation graph, catalog-info.yaml authoring, discovery automation, ownership and governance, and production deployment. The core message is singular: the catalog is not a feature but a foundation, and it only survives if it is designed to populate and refresh itself automatically.

Part 2 covers the Scaffolder that operates on top of the catalog — software templates that turn golden paths into code. It is the mechanism that produces a new service in five minutes with every organizational standard built in.

References