The Hands That Build Surveillance — Developer Ethics and Privacy Engineering

Introduction — Why This Topic, Why Now

In the tech community of 2026, surveillance is no longer a word for conspiracy theorists. Three currents that topped GeekNews and Hacker News in recent months have pushed the topic front and center.

First, a remark by Oracle co-founder Larry Ellison is circulating again. Discussing AI-based surveillance systems, he said that "citizens will be on their best behavior because we are constantly recording and reporting everything that's going on." Packaged as a promise of better public safety, the sentence read to many like a dystopian warning about the erosion of privacy. And Oracle is a company that can actually sell such infrastructure.

Second, the piece "A walking tour of surveillance infrastructure in Seattle" on coveillance.org became a major topic on Hacker News. Walking ordinary streets while pointing out where and how CCTV, automated license plate readers (ALPR), traffic sensors, and private doorbell cameras are embedded, the piece showed that surveillance is not abstract discourse but physically existing urban infrastructure.

Third, age-verification mandates are moving through legislatures in multiple countries. Under the banner of protecting minors, laws requiring identity checks for adult sites and social networks are spreading. The protective aim is legitimate, but depending on implementation, such a law can become infrastructure that attaches a real-name tag to every citizen's internet usage, which is why the debate in the tech community is so heated.

What do these three currents have in common? Every one of those systems is built by developers like us. Camera firmware, video analysis models, location data pipelines, age verification APIs — none of it springs into being on its own. This post is not a manifesto declaring surveillance technology evil. It is an attempt to lay out the judgment criteria and the practical toolbox (privacy engineering) for developers who sit in the seat where such technology gets built.

The Terrain of the 2026 Surveillance Debate

Let us map the axes first. Positions on surveillance technology are not a simple for/against; they fall roughly into four quadrants.

                     safety first
                        |
        (A) full        |   (B) conditional
        deployment      |   deployment
        "technology     |   "proven efficacy +
         prevents       |    warrants +
         crime"         |    auditability required"
                        |
  state/corporate ------+------ citizen
  control               |       control
                        |
        (C) full        |   (D) constrain
        opposition      |   by design
        "the mere       |   "if built at all, force
         existence of   |    minimization, decentral-
         the infra      |    ization, transparency
         is dangerous"  |    through engineering"
                        |
                     liberty first

For developers, the realistic position is mostly somewhere between (B) and (D). We are rarely in the seat that decides whether a system gets deployed, but how it is built is substantially in our hands. Given the same requirement, "store the location history of the entire population in a central DB in plaintext for five years" and "on-device processing + transmit aggregates only + automatic 90-day deletion" are completely different systems.

The Spectrum of Surveillance Tech — What Has Arrived, and How Far

As the walking-tour piece shows, surveillance is not a single technology but layer upon layer.

| Layer | Representative tech | Data collected | Core issue |
|---|---|---|---|
| Fixed video | CCTV, private doorbell cameras | video, time, place | the boundary between public and private space |
| Vehicle tracking | ALPR (license plate recognition) | vehicle movement history | one capture is harmless; linked, it is the whole movement pattern |
| Biometrics | face recognition, gait recognition | identity itself | irrevocable identifiers, misidentification bias |
| Location data | carrier towers, app SDKs, data brokers | precise location history | hollowed-out consent, a resale ecosystem |
| Online behavior | tracking pixels, ad IDs, fingerprinting | browsing/purchases/interests | profiling and dark patterns |
| Identity gates | age verification, real-name systems | the identity-to-behavior link | the possible end of anonymity |

The crucial insight is that the danger comes not from any single layer but from combination. One ALPR unit is a tool for finding stolen cars; link the ALPR records of an entire city in time order and you can reconstruct when a given citizen visited a hospital, a protest, a place of worship. The data broker ecosystem performs this combination commercially. That is why the central question of privacy engineering is not "what if this data leaks?" but "what if this data is combined with other data?"
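The combination risk is easy to underestimate until it is written down. The toy sketch below uses invented plate reads and an invented camera-to-place mapping (every name and value here is hypothetical) to show that the linkage step is nothing more than a group-by and a sort.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical data: individually harmless reads from separate ALPR cameras.
PLATE_READS = [
    ("CAM-12", "ABC123", "2026-03-02T08:55"),
    ("CAM-40", "ABC123", "2026-03-02T19:10"),
    ("CAM-12", "ABC123", "2026-03-09T08:57"),
    ("CAM-07", "XYZ789", "2026-03-02T12:00"),
]

# Hypothetical mapping from each camera to the nearest notable place.
CAMERA_PLACES = {"CAM-12": "clinic", "CAM-40": "place of worship", "CAM-07": "mall"}


def movement_profile(reads, places):
    """Group reads by plate and order them in time: this is the whole linkage step."""
    profile = defaultdict(list)
    for camera, plate, ts in reads:
        profile[plate].append((datetime.fromisoformat(ts), places[camera]))
    return {plate: sorted(visits) for plate, visits in profile.items()}


profile = movement_profile(PLATE_READS, CAMERA_PLACES)
for when, place in profile["ABC123"]:
    print(when.isoformat(), place)
```

No single row reveals much; the sorted list for one plate already reads like a weekly routine. That is the argument for short retention and for never storing reads from different sources under a shared, stable key.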

The Moments Developers Actually Face — Ethics Arrives as a Ticket, Not an Abstraction

Ethical choices usually arrive not as grand meetings but in the shape of ordinary work tickets. Consider the scenes you will actually encounter.

Scene 1: Over-collection in logging

A ticket says: "Please log the entire request for debugging." The request body contains the search terms of users, their location, sometimes health-related keywords. Full logging is convenient for debugging, but that log effectively becomes a shadow profile. A log system with sloppy access control is often the largest unofficial surveillance infrastructure in the company.
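One way to answer this ticket without creating a shadow profile is to make masking the default at the logging layer. The following is a minimal sketch using the standard `logging` module; the field names in `SENSITIVE_KEYS` and the logger name are assumptions for illustration, and nested payloads would need a recursive variant.

```python
import logging

# Assumed field names; adjust to the actual request schema.
SENSITIVE_KEYS = {"query", "lat", "lon", "email", "ip"}


def mask_fields(payload: dict) -> dict:
    """Return a copy that is safe to log: keys are kept, sensitive values are replaced."""
    return {k: ("[REDACTED]" if k in SENSITIVE_KEYS else v) for k, v in payload.items()}


class RedactingFilter(logging.Filter):
    """Masks dict arguments on every record, so masking is the default rather than opt-in."""

    def filter(self, record: logging.LogRecord) -> bool:
        if isinstance(record.args, dict):
            # logging unwraps a single dict argument into record.args itself
            record.args = mask_fields(record.args)
        elif isinstance(record.args, tuple):
            record.args = tuple(
                mask_fields(a) if isinstance(a, dict) else a for a in record.args
            )
        return True  # never drop the record, only rewrite it


logger = logging.getLogger("app.requests")  # hypothetical logger name
logger.addFilter(RedactingFilter())
logger.warning("request %s", {"path": "/search", "query": "chest pain clinic", "lat": 37.5})
```

The debugging value (which path, which status, which timing) survives, while the search term and location never reach the log store.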

Scene 2: A request to add a tracking SDK

The marketing team requests a third-party SDK for conversion measurement. Read the SDK documentation and you find it collects the advertising ID, the list of installed apps, even precise location. "We only use the conversion data" does not help — the data goes to the servers of the SDK vendor and is reused under their terms. The single line you add can become the first gateway sending user locations into the data broker ecosystem.

Scene 3: Dark pattern demands

"Consent rates are low, so please move the decline button two levels deeper." Legally one might still claim consent was obtained, but the design intent itself is to circumvent the genuine will of the user. Both the GDPR and the Korean Personal Information Protection Act have moved toward enforcing substantive rather than formal consent, and sanctions specifically targeting dark patterns are accumulating.

Scene 4: "Collect everything, just in case"

An architecture decision: "We do not know what we will need later, so collect everything into the data lake." A common temptation in the machine-learning era, but purposeless collection is itself legally questionable under most privacy regimes, and it expands the blast radius of any security incident without bound. As the 2026 npm supply-chain attacks showed, breaches are a matter of when, not if. Data you do not hold cannot leak.

Privacy Engineering in Practice 1 — Designing for Data Minimization

The principle fits in one sentence: only the minimum needed for the purpose, only for as long as needed, only to those who need it. Privacy engineering is enforcing this at the level of code and schema.

Minimize at collection time — enforce it in the schema

-- Bad design: "store everything for now"
CREATE TABLE user_events_bad (
    id          BIGSERIAL PRIMARY KEY,
    user_id     BIGINT,
    raw_request JSONB,          -- full request dump (search terms, location, everything)
    ip_address  INET,
    user_agent  TEXT,
    created_at  TIMESTAMPTZ
);

-- Better design: purpose-scoped columns + pseudonymization + built-in retention
CREATE TABLE user_events (
    id            BIGSERIAL PRIMARY KEY,
    user_pseudo   TEXT NOT NULL,        -- one-way pseudonymous ID (mapping stored separately)
    event_type    TEXT NOT NULL,        -- defined events only (no free text)
    coarse_geo    TEXT,                 -- location truncated to region level
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    expires_at    TIMESTAMPTZ NOT NULL  -- row-level retention deadline embedded in the data
        DEFAULT now() + INTERVAL '90 days'
);

-- Make deletion a system behavior, not an operational procedure
CREATE INDEX idx_user_events_expiry ON user_events (expires_at);

The point is expires_at. Write retention periods only in a policy document and nobody honors them. Stamp the expiry on each row and make the deletion job operate on that column alone, and the retention policy becomes code.
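The schema above leaves open how `user_pseudo` and `coarse_geo` get filled. As a sketch under stated assumptions: a keyed hash (HMAC-SHA256) is one common way to derive a one-way pseudonym, and rounding coordinates is one crude way to truncate location. Both function names and the key handling are hypothetical, not the post's prescribed method.

```python
import hashlib
import hmac


def pseudonymize(user_id: int, secret_key: bytes) -> str:
    """Keyed one-way hash: without the key, a guessed user_id cannot be checked
    against a stored pseudonym. The key must live outside the analytics store."""
    return hmac.new(secret_key, str(user_id).encode(), hashlib.sha256).hexdigest()


def coarse_geo(lat: float, lon: float, decimals: int = 1) -> str:
    """Round coordinates to a grid cell. One decimal place is roughly 11 km of
    latitude; a real system might map to administrative regions instead."""
    return f"{round(lat, decimals)},{round(lon, decimals)}"


key = b"example-key-load-from-a-secret-manager"  # placeholder value
print(pseudonymize(42, key)[:16], coarse_geo(37.5665, 126.9780))
```

A plain unkeyed hash of a sequential user ID would not be enough, since anyone can hash every possible ID and build the reverse table. Rotating or destroying the key is also what turns pseudonymous rows into effectively unlinkable ones.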

Automated retention — make deletion the default

# retention_worker.py — a worker that automatically enforces retention deadlines
import logging
from datetime import datetime, timezone

RETENTION_POLICIES = {
    # table: (basis for retention, deletion method)
    "user_events":      ("service quality analytics, 90 days", "hard_delete"),
    "access_logs":      ("security audit, 180 days",           "hard_delete"),
    "payment_records":  ("tax-law mandated retention, 5 years", "archive_then_delete"),
    "support_tickets":  ("dispute handling, 3 years",          "anonymize"),  # anonymize instead of delete
}

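def archive_then_delete(db, table, now):
    # Assumed helper (not defined in the original listing): copies expired rows into a
    # same-shaped "<table>_archive" table, then deletes them from the live table.
    db.execute(
        f"INSERT INTO {table}_archive SELECT * FROM {table} WHERE expires_at < %s", (now,)
    )
    return db.execute(f"DELETE FROM {table} WHERE expires_at < %s", (now,)).rowcount

def enforce_retention(db, now=None):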
    now = now or datetime.now(timezone.utc)
    report = []
    for table, (basis, method) in RETENTION_POLICIES.items():
        if method == "hard_delete":
            n = db.execute(
                f"DELETE FROM {table} WHERE expires_at < %s", (now,)
            ).rowcount
        elif method == "anonymize":
            n = db.execute(
                f"UPDATE {table} SET user_pseudo = 'expired', "
                f"free_text = NULL WHERE expires_at < %s", (now,)
            ).rowcount
        else:
            n = archive_then_delete(db, table, now)
        report.append((table, basis, method, n))
        logging.info("retention: table=%s purged=%d basis=%s", table, n, basis)
    return report  # this report itself becomes compliance evidence

Three things to stress.

  1. The deletion log is the evidence. The claim "we delete after 90 days" only means something when execution records of the deletion job exist.
  2. Make conflicts with statutory retention explicit in the policy table. For data you must keep under tax law, like payment records, write the reason it stays right next to the code.
  3. Track backups too. Deleting from the production DB while seven years of backups remain is not deletion. Aligning backup rotation cycles with the retention policy is part of the job.
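The backup point can be turned into simple arithmetic. The sketch below assumes daily backups that are each kept for a fixed number of days; the function name and that backup model are illustrative assumptions, not a description of any particular backup product.

```python
from datetime import date, timedelta


def last_copy_gone(created: date, retention_days: int, backup_keep_days: int) -> date:
    """Date on which the final copy of a row disappears.

    The row leaves production at created + retention_days. The last backup that
    still contains it was taken around that moment and survives for another
    backup_keep_days, so the two periods add up.
    """
    return created + timedelta(days=retention_days + backup_keep_days)


created = date(2026, 1, 1)
print(last_copy_gone(created, 90, 30))       # 90-day policy, 30-day backup rotation
print(last_copy_gone(created, 90, 7 * 365))  # 90-day policy, seven years of backups
```

With seven years of backups, the "90-day" policy describes the production table only. The number worth publishing in a privacy notice is the sum.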

Purpose limitation — tagging data with its allowed uses

# data-catalog.yaml — example of declaring purpose limitations per dataset
datasets:
  - name: user_events
    purpose: ["service_quality", "abuse_prevention"]
    prohibited: ["ad_targeting", "sale_to_third_party", "model_training"]
    legal_basis: "legitimate interest (Terms of Service art. 7)"
    owner: data-platform-team
    review_cycle: quarterly

  - name: precise_location
    purpose: ["delivery_routing"]
    prohibited: ["analytics", "retention_beyond_delivery"]
    legal_basis: "explicit consent (location-based services consent)"
    owner: delivery-team
    review_cycle: monthly

With this catalog, the question "may we use this data to train the recommendation model?" is no longer answered by gut feeling. The ideal endpoint is a data platform that automatically checks purpose matching on access requests, but even one YAML file and a quarterly review block much of purpose creep.
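A first step toward the automated check can be very small. The sketch below mirrors the two catalog entries as an in-memory dict (in practice it would be loaded from the YAML file); `check_access` and its return messages are hypothetical names chosen for illustration.

```python
from typing import Tuple

# Mirrors the purpose/prohibited fields of the catalog entries above.
CATALOG = {
    "user_events": {
        "purpose": {"service_quality", "abuse_prevention"},
        "prohibited": {"ad_targeting", "sale_to_third_party", "model_training"},
    },
    "precise_location": {
        "purpose": {"delivery_routing"},
        "prohibited": {"analytics", "retention_beyond_delivery"},
    },
}


def check_access(dataset: str, requested_purpose: str) -> Tuple[bool, str]:
    """Allow only purposes that are explicitly declared; everything else is denied."""
    entry = CATALOG.get(dataset)
    if entry is None:
        return False, "dataset not in catalog"
    if requested_purpose in entry["prohibited"]:
        return False, "explicitly prohibited"
    if requested_purpose not in entry["purpose"]:
        return False, "purpose not declared"
    return True, "declared purpose"


print(check_access("user_events", "model_training"))
print(check_access("precise_location", "delivery_routing"))
```

The design choice that matters is default-deny: a purpose nobody thought to list is refused, so adding a new use of old data requires an explicit, reviewable change to the catalog.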

Privacy Engineering in Practice 2 — Anonymization and Differential Privacy Basics

Anonymization is harder than you think

The idea that removing names and ID numbers makes data anonymous was shattered long ago. Latanya Sweeney's classic study estimated that the combination of birth date, gender, and 5-digit ZIP code uniquely identifies about 87% of the US population, and behavioral data such as the Netflix Prize viewing records was re-identified from just a few known ratings. Location history is worse: three or four spatio-temporal points are enough to single out most individuals.

So practical anonymization is not "deleting identifiers" but design that assumes linkage attacks. In practice that means combining k-anonymity (every combination of quasi-identifying attributes is shared by at least k people), truncation (location reduced to region level), and aggregation (store only statistics, not individual rows).
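To make k-anonymity and truncation concrete, here is a small sketch that measures k for a dataset before and after generalizing the quasi-identifiers. The rows, the column names, and the `generalize` rules (birth year to decade, postal code to a three-digit prefix) are invented for illustration.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("birth_year", "gender", "zip")


def k_of(rows, quasi_identifiers=QUASI_IDENTIFIERS) -> int:
    """Size of the smallest group sharing one quasi-identifier combination."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)


def generalize(row: dict) -> dict:
    """Truncate precision: birth year to its decade, postal code to a 3-digit prefix."""
    return {
        **row,
        "birth_year": row["birth_year"] // 10 * 10,
        "zip": row["zip"][:3] + "**",
    }


rows = [
    {"birth_year": 1984, "gender": "F", "zip": "98101"},
    {"birth_year": 1986, "gender": "F", "zip": "98104"},
    {"birth_year": 1991, "gender": "M", "zip": "98109"},
    {"birth_year": 1995, "gender": "M", "zip": "98122"},
]

print(k_of(rows))                           # every row is unique
print(k_of([generalize(r) for r in rows]))  # each row now hides in a group
```

A check like `k_of(...) >= k_min` can run as a gate before any export. Note that k-anonymity alone says nothing about the sensitive values inside a group, which is why it is combined with aggregation rather than used on its own.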

Differential privacy — the concept and the limits

Differential privacy is a mathematical framework that injects noise so that the result of an analysis barely changes whether any given individual is included in the dataset or not. The intuition in code:

# dp_count.py — a differentially private count using Laplace noise (conceptual example)
import numpy as np

def dp_count(true_count: int, epsilon: float) -> float:
    """Smaller epsilon means stronger privacy and a less accurate result.
    The sensitivity of a count query is 1: the presence or absence of
    one person changes the result by at most 1."""
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Usage: publishing "number of users of feature A last week" in an external report
true_value = 1843
print(dp_count(true_value, epsilon=1.0))   # e.g. 1841.7 — still useful
print(dp_count(true_value, epsilon=0.05))  # e.g. 1812.3 — strong protection, large error

The limits are equally clear.

  1. The privacy budget (epsilon) is a consumable. Repeated queries against the same data accumulate exposure. Saying "we mixed in some noise so it is safe" without budget accounting is not differential privacy.
  2. It is brutal on small data. Inject enough noise into an analysis of a few dozen users and the result becomes meaningless. It is a tool for large-scale aggregate statistics.
  3. DP protects outputs; it is not absolution for collection. Over-collecting raw data and applying DP only at the output is half a solution. Minimizing collection always comes first.
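The first limit, budget accounting, can be sketched in a few lines. This version assumes basic sequential composition (the epsilons of successive queries simply add up) and samples Laplace noise from the standard library instead of NumPy; the class and function names are illustrative.

```python
import random


class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:  # tolerance for float rounding
            raise RuntimeError("privacy budget exhausted; refuse the query")
        self.spent += epsilon


def laplace_noise(scale: float) -> float:
    # The difference of two independent exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)


def dp_count(true_count: int, epsilon: float, budget: PrivacyBudget) -> float:
    budget.charge(epsilon)  # pay before answering, so a refusal leaks nothing
    return true_count + laplace_noise(1.0 / epsilon)  # sensitivity of a count is 1


budget = PrivacyBudget(total_epsilon=1.0)
print(dp_count(1843, 0.5, budget))
print(dp_count(1843, 0.5, budget))
print(budget.spent)  # the budget is now fully used; a third query would be refused
```

The same numbers explain the second limit. The expected absolute error of Laplace noise equals its scale, 1/epsilon, so at epsilon = 0.1 a count is off by about 10 on average: negligible for 1843 users, a third of the signal for a group of 30.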

Privacy Impact Assessment (PIA) — A Lightweight Template

A formal PIA is heavy, but a lightweight per-feature version fits inside a sprint. Start by adding this one page to your design review template.

=== Lightweight privacy impact assessment (per feature, 30-minute version) ===

1. What is being collected
   - Newly collected data items:
   - Which of them are personal/sensitive data:
   - Possibility of data from minors: yes / no

2. Why it is needed (purpose specification)
   - Purpose of this feature:
   - Why each item is essential to that purpose (one line each):
   - Result of considering alternatives that achieve the purpose with less data:

3. Where it flows
   - Storage location and encryption status:
   - Transfers to third parties (including SDKs/APIs):
   - Possibility of being combined with other data:

4. When it disappears
   - Retention period and its basis:
   - Deletion method (automatic/manual, hard delete/anonymization):
   - Residual lifetime inside backups:

5. What could go wrong
   - Worst-case scenario on a leak:
   - Linkage-attack scenario (combined with other data?):
   - Insider misuse scenario and access controls:

6. Decision
   - proceed / proceed with changes / hold
   - Conditions for changes:
   - Reviewer sign-off (engineering/security/legal):

The real value of this template is item 5. The moment you ask not only "what if it leaks?" but "what if it is combined?" and "what if an insider misuses it?", the quality of the assessment changes. And as the records under item 6 accumulate, the organization can prove "what we knew at the time and what we decided."

The Craft of Saying No — Raising Dissent Inside an Organization

Feeling an ethical concern and raising it effectively inside an organization are different skills. There is a sequence.

Step 1: Translate into the language of facts and costs

"This is unethical" is a sentence that closes a discussion. Translate the same content into the language of cost and risk and the discussion opens.

  • Weak phrasing: "Secretly tracking users is wrong."
  • Stronger phrasing: "This SDK transmits precise location to the vendor servers. That fact is absent from our consent screen, so there is exposure under privacy law, with fines and reputational loss if found. If the goal is conversion measurement, an aggregate measurement API gives us the same metrics."

The core structure is fact (what happens) + risk (legal/security/reputation) + alternative (another way to achieve the goal). Objection without an alternative sounds like a veto; objection with an alternative is engineering.

Step 2: Dissent that leaves a record

Verbal objections evaporate. Leave them in traceable channels: review comments on the design doc, the concerns field of the ticket, dissents recorded in meeting notes. This is self-protection, but it is also a public good — it lets the organization later learn that someone did warn at the time.

Step 3: The escalation ladder — one rung at a time

  1. one-on-one with your direct lead     -- "I have concerns about this design + alternative"
       | if unresolved
       v
  2. design review / architecture forum   -- table it as a formal agenda item, on the record
       | if unresolved
       v
  3. security/privacy/legal channels      -- request advice from the DPO, security, compliance
       | if unresolved
       v
  4. ethics reporting channel/ombudsman   -- use the anonymous channel if one exists
       | if unresolved
       v
  5. personal choice                      -- request a project transfer, resign, (last) whistleblowing

Not skipping rungs matters. Carrying a problem solvable at steps 1 to 3 straight to step 5 raises the cost for both the organization and the individual. Conversely, stopping because step 1 brushed you off is not the fulfillment of professional responsibility either.

The reality of whistleblowing — with caution

If steps 3 and 4 have all failed and the matter concerns public safety or violations of law, the option of whistleblowing remains. But face the reality squarely. Whistleblower protection regimes vary in strength by country, and even where protection exists, the costs to career and livelihood are commonly borne by the discloser. So the recommendations are conservative. First, consult a lawyer before anything else (before the press). Second, removing confidential company material without authorization is itself a legal risk, so get legal advice on how evidence may be gathered. Third, examine official routes first — the Korean Protection of Public Interest Reporters Act and the equivalent regimes elsewhere. Whistleblowing is not an ethical hero story; it is rightly reserved as the last resort when every other route has failed.

The ACM Code of Ethics — Annotated Excerpts for Developers

The ACM Code of Ethics is the most widely cited professional code in the software field. Here are the provisions that bear directly on surveillance technology.

| Provision | Gist of the principle | Meaning in the surveillance context |
|---|---|---|
| 1.1 | Contribute to society and human well-being | impact assessment measured against everyone affected, not the shareholders |
| 1.2 | Avoid harm | design responsibility includes potential misuse, not only intended function |
| 1.6 | Respect privacy | explicitly demands minimal collection, clear notice, no use beyond purpose |
| 1.7 | Honor confidentiality | handling of information learned on the job, but never to conceal violations of law and ethics |
| 2.5 | Give comprehensive evaluations of impacts | assessments like a PIA are not a nice-to-have but a baseline professional duty |
| 3.1 | Ensure the public good is central | leaders are responsible for structures where members can raise ethical concerns |

Provision 1.6 is especially concrete. It states that collection of personal information should be the minimum necessary for a legitimate purpose, that the persons concerned should be able to know, and that use beyond the collection purpose requires consent. Data minimization, purpose limitation, retention deadlines — everything covered in this post is literally what the code demands. A code of ethics is not decoration; it is the professional baseline that says "if you are a professional, this much is table stakes."

Balance — Safety vs Liberty, Both Sides Stated Honestly

This topic tempts people to hear only one side, so let me summarize both honestly.

Arguments on the safety-first side:

  1. Cases where surveillance technology genuinely contributed to solving crimes are hard to deny. In missing-person searches, vehicle theft, violent crime investigations, CCTV and ALPR are everyday tools.
  2. The problem of protecting minors — the starting point of the age-verification debate — is real. As evidence accumulates about the harm algorithmic feeds do to children, "doing nothing" has also stopped being an ethical option.
  3. The doctrine that expectations of privacy in public space are inherently limited is itself an old one.

Arguments on the liberty-first side:

  1. The harm of surveillance comes not from detection but from the chilling effect. Exactly as the Ellison "best behavior" remark reveals, humans under constant observation self-censor even lawful behavior — assembly, journalism, counseling.
  2. Infrastructure transfers with regimes. A surveillance network built under a benign government is inherited intact by the next one. System design must assume the worst-case operator.
  3. Misuse is not a hypothesis but a record. Documented cases abound: police officers running personal ALPR lookups, warrant-free purchases of location data through brokers.
  4. The evidence of efficacy is weak. Studies of the effect of expanded camera coverage on crime rates are mixed, with many meta-analyses finding significance only for specific crimes such as vehicle crime in parking facilities.

The honest conclusion: safety and liberty are not zero-sum, but neither is there a free lunch. That is exactly why the developer matters. A design that achieves the same safety goal at a lower cost in liberty often exists, and the people who know that design are engineers, not politicians. Warrant-gated access control, short retention windows, on-device processing, publicly auditable logs — these are the toolbox of quadrant (D), constraint by design.
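Two items from that toolbox, warrant-gated access and tamper-evident audit logs, fit in one short sketch. Everything here is hypothetical: the `lookup_plate` function stands in for a real query, the warrant check only tests that a reference was supplied (it does not validate it), and the hash chain is a simplified illustration rather than a production audit system.

```python
import hashlib
import json


class AuditLog:
    """Append-only log in which each entry commits to the previous one (a hash chain)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev: str, event: dict) -> str:
        body = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = self._digest(prev, event)
        self.entries.append({"event": event, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed entry breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or self._digest(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


def lookup_plate(plate: str, officer_id: str, warrant_id, log: AuditLog) -> str:
    """Refuse lookups without a warrant reference; record every attempt either way."""
    allowed = bool(warrant_id)
    log.append({"officer": officer_id, "plate": plate, "warrant": warrant_id, "allowed": allowed})
    if not allowed:
        raise PermissionError("warrant reference required")
    return f"records for {plate}"  # placeholder for the real query


log = AuditLog()
print(lookup_plate("ABC123", "officer-17", "W-2026-0042", log))
print(log.verify())
```

Logging the denied attempts matters as much as logging the granted ones, because the documented misuse cases are mostly personal lookups that no one reviewed. Publishing the latest chain hash to an outside party is one way to make later tampering by the operator detectable.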

A Developer Action Checklist

Finally, a checklist you can use the moment you get to work tomorrow.

[ Design stage ]
[ ] For every new data item, asked "is the feature impossible without this?"
[ ] Included the lightweight PIA in the design review template
[ ] Embedded retention in the schema (row-level expires_at)
[ ] Routed location/biometric/minor data through a separate approval process

[ Implementation stage ]
[ ] Installed middleware that masks personal data fields in logs by default
[ ] Verified and documented the collection scope of third-party SDKs
[ ] Wired deletion-worker results into monitoring/alerts
[ ] Attached audit logs and alerts to insider access (who looked up whom)

[ Organization stage ]
[ ] Declared purposes/prohibited uses in the data catalog
[ ] Raised concerns through recordable channels (review comments, tickets)
[ ] Know where the next rung of the escalation ladder is
[ ] Linked ACM code provision 1.6 in the team onboarding docs

[ Personal stage ]
[ ] Imagined the scenario where a hostile operator owns the system I am building
[ ] Habitually ask "what happens when this is combined?"
[ ] When refusing, speak in the structure of fact + risk + alternative

The Korean and Japanese Context — The Surveillance Terrain Next Door

Last, the issues closest to developers in Korea and Japan.

Korea has one of the highest CCTV densities in the world, with integrated public CCTV control centers operated at the municipal level. The Personal Information Protection Act requires signage and purpose limitation for fixed video devices, and the 2023 amendment added rules for mobile video devices (drones, autonomous vehicles). For developers, the important shifts are that the fine ceiling of the Personal Information Protection Commission is now based on total revenue, and that the pseudonymized-data regime increasingly requires implementing the balance between data use and protection in code. In an environment with a powerful universal identifier — the resident registration number system — linkage attacks are more potent than elsewhere, and that fact should be a design premise.

Japan updates its Act on the Protection of Personal Information (APPI) on a three-year review cycle, and rules on facial recognition data and the pseudonymously processed information regime have been refined. Debates over guidelines for investigative use of security camera footage, and concerns accompanying the expanding uses of the My Number card, parallel the Korean issues closely. The shared practical lesson in both countries is the same: there is always a gap between what the law permits and what users expect, and reputational risk grows in that gap. Do not stop at the legality check — add a surprise test to design review: would users be startled to learn this fact?

Frequently Asked Questions

Q1. We are B2B — do we still need all this?

Yes. Responsibility for end-user data processed by B2B SaaS flows through contractual processor structures, but the reputational cost of a breach and the duties of a processor remain. If anything, B2B vendors face ever deeper scrutiny of data minimization and retention policies in customer security reviews.

Q2. We are a startup with no dedicated privacy staff. Where do we start?

In the order of this post. Week one: a single data-catalog YAML. Week two: the retention column and the deletion worker. Week three: the lightweight PIA added to the design review template. These three deliver most of the value for the cost.

Q3. I received a job offer from a surveillance tech company. Should I take it?

That is a question of personal values, but a decision frame can be offered. Ask in the interview which quadrant the product aims for (does it build warrant requirements, auditability, retention limits into the product?), whether internal dissent structures exist, and whether there are sales restrictions (export controls, no sales to authoritarian regimes). The reaction to those questions is itself the best data you will get.

Closing — The Responsibility of the Hands That Build

The power of the Seattle walking-tour piece lay not in accusation but in visibility — making infrastructure that was always there but unseen, seen. This post attempts the same. Surveillance infrastructure is not built by villains somewhere; it is built by our hands, processing ordinary tickets in ordinary sprints.

That is no reason for despair. To say the building hands carry responsibility is also to say the building hands carry power. One retention column, one masking middleware, one question in a design review changes the exposure surface of millions of users. Even if the world Ellison described — where everything is recorded — arrives, what gets recorded, for how long, and visible to whom remains a design question, and design is our job.

Tech ethics is, in the end, a form of professional excellence. Just as a good engineer imagines failure scenarios, a good engineer imagines misuse scenarios. That imagination, I believe, is the new fundamental skill demanded of the developer of 2026.
