AI Era Survival Guide Part 2: Transitioning from Developer to LLM Engineer / AI Engineer

Introduction: In an Age Where AI Writes Code, What Is Your Value?

Do you remember the first time you used Cursor?

The moment you think "this... is faster than me," a vague unease washes over you. GitHub Copilot has already become part of many developers' daily lives. Claude and GPT-4 generate an entire backend API in 30 minutes from a spec document. AI agents like Devin independently process GitHub issues and open PRs.

"Where am I supposed to be now?"

I will answer that question honestly. Differentiating yourself through simple code writing alone is getting harder. But paradoxically, demand is exploding for engineers who deeply understand AI technology itself and build AI products.

That is the LLM engineer — also called the AI engineer.

These people are not afraid of AI. They are the ones who build, connect, and optimize AI. And people with existing software development experience are best positioned to move into this role.

This post presents a realistic roadmap for anyone with backend, full-stack, or any software development experience to transition to LLM engineer within 12 to 18 months.


1. What Is an LLM Engineer?

Differences Between Similar Job Titles

Let us start by clarifying the titles that look similar.

Prompt engineer: Someone who designs prompts to elicit better responses from LLMs. It drew a lot of attention early on, but instead of solidifying into an independent role, it has been trending toward integration with other functions.

LLM engineer / AI engineer: A software engineer who designs and implements AI applications. Covers prompt design, RAG systems, AI agents, fine-tuning, and production deployment — a comprehensive role. Currently the highest-demand category.

ML engineer: Someone who researches and develops machine learning models themselves. Usually requires an ML or statistics background and deep mathematical understanding. Higher barrier to entry than LLM engineer.

MLOps engineer: Focused on the deployment, operation, and monitoring of ML models. The intersection of DevOps and ML.

For software developers making a transition, the most realistic path is LLM engineer / AI engineer.

What LLM Engineers Actually Do

Analyzing job postings and real-world work, an LLM engineer's daily life looks like this:

LLM API integration and optimization

  • Implementing features using OpenAI, Anthropic, and Google Gemini APIs
  • Cost optimization: which model to use when, caching strategies, batch processing
  • Minimizing latency: streaming responses, parallel processing
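The caching strategy mentioned above can be sketched in a few lines: a naive in-memory cache keyed on the model and message list. Names like `PromptCache` and `get_or_call` are illustrative, not a real library API.

```python
import hashlib
import json

class PromptCache:
    """Naive in-memory cache for LLM calls, keyed on (model, messages).

    Illustrative sketch only: production systems typically use Redis with a
    TTL, or the provider's built-in prompt caching.
    """
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model, messages):
        # Deterministic key: same model + same messages -> same hash
        raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get_or_call(self, model, messages, call_fn):
        key = self._key(model, messages)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call_fn(model, messages)  # the real (paid) API call
        self._store[key] = result
        return result
```

Repeated identical requests then hit the cache instead of the API, which directly cuts both cost and latency for FAQ-style traffic.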

RAG system design and implementation

  • Building pipelines that allow LLMs to utilize company documents and databases
  • Chunking strategy, embedding model selection, vector DB operations
  • Measuring and improving retrieval quality
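Measuring retrieval quality can start with something as simple as recall@k and precision@k over a small hand-labeled set of (query, relevant chunk IDs) pairs. A minimal sketch, with illustrative function names:

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant chunks that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k results that are actually relevant."""
    if k == 0:
        return 0.0
    relevant = set(relevant_ids)
    return sum(1 for cid in retrieved_ids[:k] if cid in relevant) / k

# Example: the retriever returned chunks [3, 7, 1, 9]; chunks {3, 1, 5} are relevant
print(recall_at_k([3, 7, 1, 9], {3, 1, 5}, k=4))     # 2 of 3 relevant found
print(precision_at_k([3, 7, 1, 9], {3, 1, 5}, k=4))  # 2 of 4 results relevant
```

Tracking these two numbers while you vary chunk size or the embedding model turns "improving retrieval" from guesswork into an experiment.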

AI agent and workflow construction

  • Developing AI agents that autonomously use multiple tools
  • Multi-agent systems (CrewAI, LangGraph)
  • Human-in-the-loop design: deciding when a human needs to intervene

LLM fine-tuning

  • Additional training of existing models for specific domains or styles
  • Dataset collection, cleaning, and training pipeline construction
  • Designing evaluation metrics

AI product evaluation and monitoring

  • Hallucination detection
  • Measuring response quality (automated + human evaluation)
  • Cost and latency dashboards
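A first pass at automated response-quality measurement is often a keyword-based smoke test over a fixed set of questions, long before an LLM-as-judge setup. A minimal sketch, where the `answer_fn` callback stands in for your actual pipeline:

```python
def evaluate_answers(cases, answer_fn):
    """cases: list of (question, required_keywords). Returns (pass_rate, failures).

    A cheap regression check: each answer must mention every required keyword.
    """
    failures = []
    for question, keywords in cases:
        answer = answer_fn(question)
        if not all(kw.lower() in answer.lower() for kw in keywords):
            failures.append(question)
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate, failures

# Example with a stubbed pipeline (a real run would call your RAG chain)
def stub_pipeline(question):
    return "You get 15 days of vacation per year."

cases = [
    ("How many vacation days do I get?", ["15 days"]),
    ("When is payday?", ["25th"]),
]
rate, failed = evaluate_answers(cases, stub_pipeline)
print(f"Pass rate: {rate:.0%}, failed: {failed}")  # Pass rate: 50%
```

Running a check like this on every prompt or model change catches silent regressions cheaply; richer metrics come later with frameworks like RAGAS.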

2. Why Your Development Experience Is a Major Advantage

Many developers hesitate about the LLM engineer transition because they think "ML is hard and needs a lot of math." But an LLM engineer is not an ML researcher. Your software development experience is actually a strength here.

Why SW Development Experience Shines in AI Product Development

API design skills: Building RAG systems and AI agents to production quality requires well-designed APIs and service structure. Pure data scientists often have weaknesses here.

Database knowledge: Vector databases are a new technology, but concepts of indexing, query optimization, and transactions carry over naturally from existing DB knowledge.

Performance optimization: LLM API calls are expensive. Caching, batch processing, async processing — these are techniques you already learned from backend development.

Systems thinking: Understanding how each component in a complex AI pipeline connects to the others, and how failures propagate, comes from system design experience.

Code quality: The code quality in many AI development projects is surprisingly low — notebooks going straight to production with no tests. Software developers' habits around test writing, code review, and CI/CD shine here.


3. LLM Engineer Transition Roadmap (12 Months)

Months 1–2: LLM Fundamentals

Understanding How LLMs Work

You do not need to know all the deep learning math, but you do need to understand the basic concepts:

  • Transformer architecture: Why the attention mechanism matters, how self-attention understands context
  • Tokenization: How text is split into tokens, why Korean and English have different token counts
  • Context window: The limit on how much text a model can process in one shot
  • Temperature, Top-p: Parameters that control the diversity of output
  • Few-shot, Zero-shot: Ways of providing examples in a prompt
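Temperature and top-p are easiest to internalize by implementing the sampling step yourself. This is a toy sketch over a hand-written logit table; real models do the same thing over vocabularies of ~100k tokens.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Toy nucleus sampling: temperature rescales logits, top-p truncates the tail."""
    # Temperature: lower values sharpen the distribution, higher values flatten it
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}  # stable softmax
    z = sum(exps.values())
    probs = sorted(((tok, e / z) for tok, e in exps.items()),
                   key=lambda kv: kv[1], reverse=True)

    # Top-p: keep the smallest set of tokens whose cumulative probability >= top_p
    kept, cum = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break

    # Sample from the renormalized kept set
    r = rng.random() * sum(p for _, p in kept)
    for tok, p in kept:
        r -= p
        if r <= 0:
            return tok
    return kept[-1][0]

logits = {"Paris": 6.0, "London": 3.0, "Berlin": 2.0}
print(sample_next_token(logits, temperature=0.7, top_p=0.5))  # "Paris"
```

With temperature=0.7 and top_p=0.5, "Paris" dominates the distribution so strongly that it is the only token kept; raise both and the other candidates start to appear.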

Recommended resource: Andrej Karpathy's YouTube series "Neural Networks: Zero to Hero" — the best material for building intuition before formulas.

Hands-On with ChatGPT, Claude, and Gemini APIs

Create accounts and actually call the APIs.

# Basic OpenAI API usage
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful code reviewer."},
        {"role": "user", "content": "Please review this Python function: def add(a, b): return a + b"}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")
input_cost = response.usage.prompt_tokens * 0.000005       # $5 per 1M input tokens
output_cost = response.usage.completion_tokens * 0.000015  # $15 per 1M output tokens
print(f"Cost (gpt-4o rates): ${input_cost + output_cost:.6f}")

Prompt Engineering Fundamentals

  • Zero-shot: Asking directly without examples
  • Few-shot: Providing 2–3 examples before asking
  • Chain-of-Thought (CoT): Asking the model to "think step by step" improves complex reasoning quality
  • System prompt: Defining the model's role and behavior
  • Structured Output: Forcing answers into JSON format

# Structured output example
from openai import OpenAI
from pydantic import BaseModel
from typing import List

class CodeReview(BaseModel):
    issues: List[str]
    suggestions: List[str]
    severity: str  # "low", "medium", "high"
    score: int     # 1-10

client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a senior developer. Review the code and return results as JSON."},
        {"role": "user", "content": "Please review the following code: [code content]"}
    ],
    response_format=CodeReview
)

review = response.choices[0].message.parsed
print(f"Severity: {review.severity}")
print(f"Score: {review.score}/10")

Months 3–4: RAG Systems

RAG (Retrieval-Augmented Generation) is a core skill for LLM engineers. It allows LLMs to utilize company internal documents and recent data that they would otherwise know nothing about.

How RAG Works

  1. Split documents into small chunks
  2. Convert each chunk into a vector (an array of numbers) via embedding
  3. Store in a vector database
  4. When a question comes in, retrieve chunks similar to the question
  5. Provide the retrieved chunks and the question to the LLM
  6. The LLM generates an answer based on the context
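Step 4 is the heart of the pipeline: stripped of the libraries, "retrieve similar chunks" is just cosine similarity over the embedding vectors. A toy sketch with hand-made 3-dimensional vectors; real embeddings have hundreds to thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, chunk_vectors, k=2):
    """Return the ids of the k chunks most similar to the query (RAG step 4)."""
    ranked = sorted(chunk_vectors.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

# Toy 3-d "embeddings" — in practice these come from an embedding model (step 2)
chunks = {
    "vacation_policy": [0.9, 0.1, 0.0],
    "payroll_schedule": [0.1, 0.9, 0.1],
    "office_wifi":     [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # embedding of "How many vacation days do I get?"
print(retrieve(query, chunks, k=2))  # ['vacation_policy', 'payroll_schedule']
```

A vector database does exactly this, plus approximate-nearest-neighbor indexes so it stays fast over millions of chunks.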

Vector Databases

  • Chroma: Perfect for local development. One Python package and you can start immediately
  • Pinecone: Cloud-managed. Widely used in production
  • pgvector: Adds vector capability to existing PostgreSQL. Attractive if you already use Postgres

# Building a RAG system with LangChain
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

# 1. Load documents
loader = PyPDFLoader("company_docs.pdf")
documents = loader.load()

# 2. Chunking
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = text_splitter.split_documents(documents)

# 3. Embedding and vector DB storage
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# 4. Create RAG chain
llm = ChatOpenAI(model="gpt-4o", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5})
)

# 5. Ask a question
result = qa_chain.invoke({"query": "What is the company's vacation policy?"})
print(result["result"])

Hands-On Project: Build a QA chatbot on company internal documents (HR policies, technical docs, FAQs). This is the most realistic first RAG project.


Months 5–6: AI Agents

Agents go beyond simply answering questions — they are AI systems that use tools to achieve goals.

Core Components of an Agent

  • Tools: Code executor, web search, DB queries, API calls, etc.
  • Memory: Conversation history, long-term memory
  • Planning: Formulating step-by-step plans to achieve goals

# LangChain Agent example: code review agent
from langchain import hub
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain.tools import Tool

# Define tools
def analyze_code_complexity(code: str) -> str:
    """Analyzes the complexity of the code."""
    # In a real implementation, use a library like radon
    lines = code.count('\n')
    functions = code.count('def ')
    return f"Lines of code: {lines}, Function count: {functions}, Complexity score: {lines // 10}"

def check_security_issues(code: str) -> str:
    """Checks for basic security vulnerabilities."""
    issues = []
    if 'eval(' in code:
        issues.append("eval() usage detected - security risk")
    if 'shell=True' in code:
        issues.append("shell=True usage detected - command injection risk")
    if not issues:
        return "No obvious security vulnerabilities found"
    return "\n".join(issues)

tools = [
    Tool(
        name="code_complexity_analysis",
        func=analyze_code_complexity,
        description="Analyzes the complexity of Python code. Input: code string"
    ),
    Tool(
        name="security_vulnerability_check",
        func=check_security_issues,
        description="Checks code for security vulnerabilities. Input: code string"
    )
]

llm = ChatOpenAI(model="gpt-4o", temperature=0)
prompt = hub.pull("hwchase17/react")

agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = agent_executor.invoke({
    "input": "Please do a comprehensive review of the following code: [code content]"
})

Complex Agent Workflows with LangGraph

LangGraph defines agent state machines as graphs. Well-suited for complex branching, loops, and multi-agent collaboration.
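LangGraph's own API is worth learning from its documentation, but the underlying idea fits in a few lines of plain Python: nodes transform a shared state, and edges decide which node runs next. This sketch mimics the concept only, not the LangGraph API.

```python
def run_graph(nodes, router, state, start):
    """Tiny state-machine runner: nodes update state, the router picks the next node."""
    current = start
    while current != "END":
        state = nodes[current](state)
        current = router(current, state)
    return state

# A draft -> review loop: revise until the "reviewer" approves (budget of 3 rounds)
nodes = {
    "draft":  lambda s: {**s, "text": s["text"] + " draft", "rounds": s["rounds"] + 1},
    "review": lambda s: {**s, "approved": s["rounds"] >= 2},
}

def router(node, state):
    if node == "draft":
        return "review"
    # From "review": loop back to "draft" unless approved or out of budget
    if state["approved"] or state["rounds"] >= 3:
        return "END"
    return "draft"

result = run_graph(nodes, router, {"text": "", "rounds": 0, "approved": False}, "draft")
print(result["rounds"])  # 2 — draft ran twice before the reviewer approved
```

The conditional routing is exactly what plain chains lack: loops, retries, and branches become explicit edges in the graph instead of ad-hoc control flow.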

CrewAI Multi-Agent Systems

A system in which multiple specialized agents collaborate:

from crewai import Agent, Task, Crew

# Define agents
researcher = Agent(
    role="Technical Researcher",
    goal="Research the latest technology trends on a given topic",
    backstory="Technical analyst with 10 years of experience",
    verbose=True
)

writer = Agent(
    role="Technical Blog Writer",
    goal="Write research findings as an accessible blog post",
    backstory="Technical blog writer specializing in developer content",
    verbose=True
)

# Define tasks
research_task = Task(
    description="Research the latest trends in LLM engineering",
    expected_output="A bullet-point summary of the key trends",
    agent=researcher
)

writing_task = Task(
    description="Write a blog post based on research findings",
    expected_output="A complete blog post draft",
    agent=writer
)

# Run the crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task]
)
result = crew.kickoff()

Months 7–9: LLM Fine-Tuning and Evaluation

When Is Fine-Tuning Needed?

Fine-tuning is not a cure-all. Only consider it in these situations:

  • When specific domain terminology or style is needed (legal, medical, financial)
  • When you want to get a specific output format consistently
  • When prompt iteration cannot produce the desired result
  • When the fine-tuning cost is lower than the API cost

If the problem can be solved with RAG or prompt engineering, start there.

Efficient Fine-Tuning with LoRA/QLoRA

Full fine-tuning requires enormous GPU resources. LoRA (Low-Rank Adaptation) trains only a subset of the model's parameters, dramatically reducing cost.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

# Load base model
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,  # QLoRA: 4-bit quantization
    device_map="auto"
)

# LoRA configuration
lora_config = LoraConfig(
    r=16,           # Rank (fewer parameters as rank decreases)
    lora_alpha=32,  # Scaling parameter
    target_modules=["q_proj", "v_proj"],  # Apply only to attention layers
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output example: trainable params: 3,407,872 (0.04% of all parameters)
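The training data fed into a pipeline like this is typically a JSONL file of chat-formatted examples, one JSON object per line. The exact schema varies by trainer, so treat this as a sketch of the common chat-messages convention:

```python
import json

def to_chat_example(instruction, response, system="You are a helpful assistant."):
    """One training example in the widely used chat-messages format."""
    return {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction},
        {"role": "assistant", "content": response},
    ]}

def write_jsonl(examples, path):
    # One JSON object per line; ensure_ascii=False keeps non-ASCII text readable
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")

examples = [
    to_chat_example("Summarize this clause in plain language.", "It means ..."),
    to_chat_example("What does 'force majeure' mean?", "An unforeseeable event ..."),
]
write_jsonl(examples, "train.jsonl")
print(sum(1 for _ in open("train.jsonl", encoding="utf-8")))  # 2
```

Most of the real work in fine-tuning is producing and cleaning a few thousand of these lines; the training code is the easy part.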

LLM Evaluation: RAGAS

A framework for measuring the quality of RAG systems:

from ragas import evaluate
from ragas.metrics import (
    faithfulness,        # Is the answer faithful to the context?
    answer_relevancy,    # Is the answer relevant to the question?
    context_precision,   # Is the retrieved context accurate?
    context_recall       # Were all necessary contexts retrieved?
)

from datasets import Dataset

data = {
    "question": ["What is the company vacation policy?", "When is payday?"],
    "answer": ["15 days of vacation are provided.", "Payment is made on the 25th of each month."],
    "contexts": [
        ["15 days of vacation are provided annually; unused days cannot be carried over..."],
        ["Payday is the 25th of each month; if a holiday, payment is made the day before..."]
    ],
    "ground_truth": ["15 days annually", "25th of each month"]
}

dataset = Dataset.from_dict(data)
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)

Months 10–12: Production AI Systems

Serving Open-Source Models with vLLM

When you need to serve open-source models yourself for cost savings or data privacy:

# Install vLLM and serve a model
pip install vllm

# Start the API server
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.9

vLLM provides a server that is compatible with the OpenAI API, so you can use existing OpenAI SDK code by just changing the URL.

Cost Optimization Strategies

# Cost monitoring wrapper example
import time
from functools import wraps
from dataclasses import dataclass

COST_PER_TOKEN = {
    "gpt-4o": {"input": 0.000005, "output": 0.000015},
    "gpt-4o-mini": {"input": 0.00000015, "output": 0.0000006},
    "claude-3-5-sonnet": {"input": 0.000003, "output": 0.000015},
}

@dataclass
class APICall:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float

def track_llm_cost(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        response = func(*args, **kwargs)
        elapsed = (time.time() - start) * 1000

        # Compute the cost from the token usage reported in the response
        model = kwargs.get("model", "unknown")
        rates = COST_PER_TOKEN.get(model, {"input": 0, "output": 0})
        call = APICall(
            model=model,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
            latency_ms=elapsed,
            cost_usd=response.usage.prompt_tokens * rates["input"]
            + response.usage.completion_tokens * rates["output"],
        )
        # In a real implementation, save each APICall record to a DB
        print(f"Latency: {call.latency_ms:.0f}ms, cost: ${call.cost_usd:.6f}")
        return response
    return wrapper

Monitoring with LangSmith

LangSmith is the monitoring and debugging tool for the LangChain ecosystem. It allows you to track how prompts execute in production, and where failures occur.

import os

# Set environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"

# All subsequent LangChain calls will be traced automatically

4. Real Project Ideas (For Your Portfolio)

"I learned this" is far less convincing than "I built this" when applying for jobs. The following projects are the ones that catch attention in actual hiring interviews.

Project 1: Personalized Code Review Bot

  • Automatically posts AI code review comments on GitHub PRs
  • Learns the codebase's style guide for consistent reviews
  • Tech stack: GitHub Actions + Claude API + GitHub API
  • Differentiator: Uses RAG to learn team-specific conventions, not just generic reviews

Project 2: Company Knowledge Base QA Chatbot

  • A chatbot that searches internal documents from Confluence, Notion, Slack, etc.
  • Multimodal: handles documents containing PDFs and images
  • Tech stack: FastAPI + LangChain + pgvector + React
  • Differentiator: Source citation feature, confidence score display

Project 3: News Analysis Agent

  • Collects, summarizes, and categorizes major news daily
  • Sentiment analysis on specific companies or keywords
  • Tech stack: CrewAI + SerpAPI + Anthropic Claude + Cron
  • Differentiator: Optimized for language-specific content using a fine-tuned LLM

Project 4: SQL Auto-Generator (Text-to-SQL)

  • A system for querying databases in natural language
  • "Show me the top 10 customers by purchase volume last month"
  • Tech stack: LangChain SQL Agent + Streamlit + DuckDB
  • Differentiator: Advanced handling of complex joins and subqueries

Project 5: AI Interview Coach

  • Analyzes a JD (Job Description) and resume to generate expected interview questions
  • Listens to answers and provides feedback (including voice input)
  • Tech stack: OpenAI Whisper + GPT-4o + FastAPI + Next.js
  • Differentiator: Evaluation model fine-tuned on real interview data

5. The LLM Engineer Job Market

Skills Required in Job Postings

Analyzing actual LLM engineer job postings:

Required skills

  • Python proficiency (goes without saying)
  • LLM API usage experience (OpenAI, Anthropic, Google)
  • RAG system implementation experience
  • Vector DB (Pinecone, Weaviate, pgvector, etc.)
  • LangChain or LlamaIndex experience

Preferred skills

  • Fine-tuning experience (LoRA, PEFT)
  • Model serving experience (vLLM, TGI)
  • MLOps (MLflow, W&B)
  • Cloud experience (AWS Bedrock, GCP Vertex AI, Azure AI)

The key point: They want to see things you have actually built, not theory. Your portfolio is everything.

Salary Ranges (Estimated, 2026)

  • Junior LLM engineer (1–2 years): 50–70 million KRW
  • Mid-level (3–5 years): 70–100 million KRW
  • Senior (5+ years): 100–150 million KRW
  • Big tech / AI startup: 150+ million KRW + stock options

Among software developers, the LLM engineer transition is currently one of the career moves with the highest salary growth.

Korea vs. Overseas

Demand for LLM engineers in the Korean market is explosive too. Naver, Kakao, Samsung, LG, and countless startups are building AI products.

Remote work overseas is also worth considering. Search "LLM Engineer remote" on LinkedIn and Glassdoor and you can find positions at US startups while living in Korea. If your English is strong enough, the salary difference can be 2 to 3 times.


6. The First Project You Can Start Right Now

Enough theory. Here is the simplest but most impressive project you can build today.

Build "Your Own Book Recommendation Chatbot" (2–3 hours)

This project lets you experience the essence of RAG while producing something genuinely useful.

Setup

pip install openai langchain langchain-openai langchain-community chromadb tiktoken

Step 1: Prepare your data

Create a text file with information on 10–20 books you like: title, author, plot summary, main themes, recommended reader.

Step 2: Build the vector DB

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

# Load book data
loader = TextLoader("books.txt", encoding="utf-8")
documents = loader.load()

# Chunking
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
chunks = splitter.split_documents(documents)

# Create vector DB
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="books_db"
)
print(f"Chunks stored: {len(chunks)}")

Step 3: Implement the chatbot

from langchain_openai import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# LLM configuration
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

# Conversation memory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# RAG chain
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
    verbose=True
)

# Conversation loop
print("Book recommendation chatbot. What kind of book are you looking for?")
while True:
    query = input("You: ")
    if query.lower() in ["quit", "exit"]:
        break

    response = qa_chain.invoke({"question": query})
    print(f"Bot: {response['answer']}\n")

Step 4: Improve it

This is already a working RAG chatbot. From here, explore ways to improve it:

  • Add more book data
  • Display sources in search results
  • Build a web interface with Streamlit
  • Add long-term memory that remembers reader preferences

Upload this project to GitHub, write up how it works in the README, add screenshots, and you have a strong portfolio item.


Closing: Move Faster Than the Fear

The speed at which AI is replacing developers, versus the speed at which developers using AI are creating new value — which is faster?

I believe the latter is faster. And LLM engineers are at the center of that second speed.

Already knowing software development is an enormous head start. If you can handle Python, connect APIs, and have experience designing systems — the foundational fitness for an LLM engineer is already there.

What remains is understanding the characteristics of LLMs and practicing building practical things with them.

Today, get an OpenAI API key and run the first code block above. In the moment when AI first responds to your input, you will begin to see, just a little, what you will look like twelve months from now.
