From Shell Basics to Advanced Operations: A Practical Shell Guide for Engineers


Introduction

SSH into a server, write CI/CD pipelines, analyze logs, run deployment scripts: an engineer's day starts and ends in the shell. Yet surprisingly, many developers keep repeating "commands that work" without a deep understanding of how the shell actually behaves.

This article starts with basic Bash/Zsh syntax and works up through pipelines, process substitution, signal handling, and performance optimization -- the shell techniques engineers need to know, with a focus on practical examples.


1. Choosing a Shell: Bash vs Zsh vs Fish

| Feature | Bash | Zsh | Fish |
|---|---|---|---|
| Built-in | Most Linux distros | macOS (Catalina+) | Separate install |
| POSIX compat | Nearly complete | Nearly complete | Not compatible |
| Auto-completion | Basic | Powerful with plugins | Best built-in |
| Script compat | Standard | Bash compat mode available | Unique syntax |
| Prompt customization | Manual PS1 | Oh My Zsh / Powerlevel10k | Built-in config |
| Recommended for | Server scripts, CI/CD | Local dev environment | Personal terminal |

Practical rule: Write server scripts with #!/usr/bin/env bash, and use Zsh for your local interactive shell.
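The shebang alone doesn't guarantee the right interpreter -- someone can still invoke the file as `sh script.sh`. A minimal guard makes the requirement explicit (a sketch; the version floor of 4 is an example, chosen because associative arrays and namerefs need it):

```shell
#!/usr/bin/env bash
# Fail fast if not actually running under Bash (e.g. invoked via sh/dash).
if [ -z "${BASH_VERSION:-}" ]; then
  echo "This script requires bash; run it as: bash $0" >&2
  exit 1
fi

# Associative arrays, namerefs, etc. need Bash 4+.
if (( BASH_VERSINFO[0] < 4 )); then
  echo "bash >= 4 required (found $BASH_VERSION)" >&2
  exit 1
fi

echo "Running under bash $BASH_VERSION"
```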


2. Fundamentals: Variables, Conditionals, Loops

2.1 Variable Declaration and Scope

# Local variable (current shell only)
APP_NAME="my-service"

# Environment variable (passed to child processes)
export DB_HOST="db.prod.internal"

# readonly - Prevent accidental overwrite
readonly CONFIG_PATH="/etc/app/config.yaml"

# Variable default value patterns
: "${LOG_LEVEL:=info}"          # Assign info if unset
: "${TIMEOUT:?TIMEOUT env var required}"  # Error and exit if unset
echo "${USER:-unknown}"         # Print unknown if unset (no assignment)

2.2 Conditional Patterns

# String comparison - use [[ ]] (Bash/Zsh extension)
if [[ "$ENV" == "production" ]]; then
  echo "Production mode"
elif [[ "$ENV" =~ ^(staging|dev)$ ]]; then
  echo "Non-production environment: $ENV"
else
  echo "Unknown environment"
fi

# File tests
[[ -f /etc/hosts ]]   # File exists
[[ -d /var/log ]]     # Directory exists
[[ -r "$file" ]]      # Read permission
[[ -s "$file" ]]      # File size > 0
[[ "$f1" -nt "$f2" ]] # f1 is newer than f2

# Arithmetic comparison - use (( ))
if (( retries > 3 )); then
  echo "Retry limit exceeded"
fi

2.3 Loop Patterns

# Iterate over file list - use glob (never parse ls!)
for f in /var/log/*.log; do
  [[ -f "$f" ]] || continue
  echo "Processing: $f ($(wc -l < "$f") lines)"
done

# C-style for
for (( i=0; i<10; i++ )); do
  curl -s "http://api.local/health" > /dev/null && break
  sleep 1
done

# while + read - Process file/command output line by line
while IFS=',' read -r name email role; do
  echo "Creating user: $name ($role)"
done < users.csv

# Infinite loop + exit condition
while true; do
  status=$(curl -s -o /dev/null -w '%{http_code}' http://api/health)
  [[ "$status" == "200" ]] && break
  sleep 5
done

3. Pipeline Deep Dive

3.1 Pipeline Fundamentals

A pipe (|) connects the stdout of the preceding command to the stdin of the following one. All stages of a pipeline start at the same time and run concurrently, each in its own subshell.

# Top 10 connecting IPs
awk '{print $1}' /var/log/nginx/access.log \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -10

# pipefail - Detect mid-pipeline failures
set -o pipefail
curl -s "$URL" | jq '.items[]' | wc -l
# If curl fails, the entire pipeline exit code != 0
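Even without pipefail, Bash records the exit code of every stage of the last pipeline in the PIPESTATUS array (Zsh offers a similar lowercase pipestatus). A small sketch:

```shell
#!/usr/bin/env bash
# PIPESTATUS holds one exit code per pipeline stage.
# Copy it immediately: any subsequent command overwrites it.
false | true | cat

statuses=("${PIPESTATUS[@]}")
echo "stage exit codes: ${statuses[*]}"   # 1 0 0

if (( statuses[0] != 0 )); then
  echo "stage 1 failed with code ${statuses[0]}" >&2
fi
```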

3.2 Process Substitution

Process substitution, <(command), exposes a command's output to another command as if it were a file -- ideal for comparing two live data sources.

# Compare package lists from two servers
diff <(ssh server1 'rpm -qa | sort') <(ssh server2 'rpm -qa | sort')

# Compare two API responses
diff <(curl -s api-v1/users | jq -S .) <(curl -s api-v2/users | jq -S .)

# tee + process substitution: Send one stream to multiple destinations simultaneously
cat access.log \
  | tee >(grep 'ERROR' > errors.log) \
  | tee >(awk '{print $1}' | sort -u > unique_ips.txt) \
  | wc -l

3.3 Advanced Redirection Patterns

# Capture stderr only
errors=$(command 2>&1 1>/dev/null)

# Redirect both stdout + stderr to file
command &> output.log        # Bash shorthand (not POSIX sh)
command > output.log 2>&1    # POSIX compatible

# Here String
grep "pattern" <<< "$variable"

# File Descriptor usage
exec 3>/tmp/audit.log         # Open FD 3
echo "Task started: $(date)" >&3
do_something
echo "Task completed: $(date)" >&3
exec 3>&-                     # Close FD 3

4. Functions and Error Handling

4.1 Function Definition Patterns

# Defensive function structure
log() {
  local level="${1:?level required (INFO|WARN|ERROR)}"
  local message="${2:?message required}"
  printf '[%s] [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$message" >&2
}

retry() {
  local max_attempts="${1:?}"
  local delay="${2:?}"
  shift 2
  local attempt=1

  until "$@"; do
    if (( attempt >= max_attempts )); then
      log ERROR "Command failed ($max_attempts attempts): $*"
      return 1
    fi
    log WARN "Retry $attempt/$max_attempts (after ${delay}s): $*"
    sleep "$delay"
    (( attempt++ ))
  done
}

# Usage
retry 5 3 curl -sf http://api.internal/health

4.2 Safe Script Header

#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'

# set -e: Exit immediately on command failure
# set -E: ERR trap is inherited by functions and subshells
# set -u: Error on undefined variable usage
# set -o pipefail: Detect mid-pipeline failures
# IFS: Restrict word splitting to newline and tab

# Cleanup trap
cleanup() {
  local exit_code=$?
  rm -f "${TMPFILE:-}"   # guard: TMPFILE may be unset if we exit before mktemp
  log INFO "Exiting (exit code: $exit_code)"
  exit "$exit_code"
}
trap cleanup EXIT
trap 'log ERROR "Error at line $LINENO"; exit 1' ERR

TMPFILE=$(mktemp)

5. Text Processing Pipelines

5.1 Tool Comparison Table

| Tool | Purpose | Speed | Complexity |
|---|---|---|---|
| grep | Pattern matching / filtering | Very fast | Low |
| sed | Stream editing / replacing | Fast | Medium |
| awk | Field-based processing | Fast | High |
| jq | JSON processing | Fast | Medium |
| yq | YAML processing | Moderate | Medium |
| cut/paste | Simple field extract / merge | Very fast | Low |
| xargs | stdin to arguments | Fast | Medium |

5.2 Practical Examples

# 1. Top 10 request paths with 5xx errors from log
awk '$9 ~ /^5[0-9]{2}$/ {print $7}' access.log \
  | sort | uniq -c | sort -rn | head -10

# 2. Extract specific fields from JSON API response + convert to CSV
curl -s https://api.example.com/users \
  | jq -r '.[] | [.id, .name, .email] | @csv'

# 3. Batch update image tags in YAML config
yq -i '.spec.template.spec.containers[].image |= sub("v1\\.2\\.3", "v1.2.4")' \
  k8s/deployment.yaml

# 4. Parallel search of large logs (xargs + grep)
find /var/log -name '*.log' -mtime -1 -print0 \
  | xargs -0 -P4 grep -l 'OutOfMemoryError'

# 5. Sum of 3rd column in CSV
awk -F',' '{sum += $3} END {printf "Total: %.2f\n", sum}' sales.csv

6. Signal Handling and Process Management

6.1 Key Signals

| Signal | Number | Default Action | Purpose |
|---|---|---|---|
| SIGHUP | 1 | Terminate | Daemon config reload |
| SIGINT | 2 | Terminate | Ctrl+C |
| SIGQUIT | 3 | Core dump | Ctrl+\ |
| SIGKILL | 9 | Terminate | Force kill; cannot be trapped |
| SIGUSR1 | 10 | Terminate | User-defined (e.g. log level change) |
| SIGTERM | 15 | Terminate | Graceful shutdown |
| SIGSTOP | 19 | Suspend | Pause process; cannot be trapped |

(Signal numbers shown are those on Linux x86-64; they can differ by platform.)
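As the table hints, SIGUSR1 is a natural hook for runtime tweaks such as toggling debug logging without a restart. A sketch (the handler and variable names are illustrative) that signals itself to demonstrate the trap:

```shell
#!/usr/bin/env bash
# Toggle a debug flag at runtime via SIGUSR1.
# In production an operator would send the signal from outside: kill -USR1 <pid>
DEBUG=0

toggle_debug() {
  DEBUG=$(( 1 - DEBUG ))
  echo "debug logging now: $DEBUG" >&2
}
trap toggle_debug USR1

kill -USR1 $$   # simulate an operator signaling this process
echo "DEBUG=$DEBUG"
kill -USR1 $$
echo "DEBUG=$DEBUG"
```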

6.2 Graceful Shutdown Pattern

#!/usr/bin/env bash
set -euo pipefail

RUNNING=true
CHILD_PID=""

shutdown() {
  log INFO "Shutdown signal received, starting graceful shutdown"
  RUNNING=false
  if [[ -n "$CHILD_PID" ]]; then
    kill -TERM "$CHILD_PID" 2>/dev/null || true
    wait "$CHILD_PID" 2>/dev/null || true
  fi
}

trap shutdown SIGTERM SIGINT

while $RUNNING; do
  process_job &
  CHILD_PID=$!
  wait "$CHILD_PID" || true
  CHILD_PID=""
  sleep 5
done

log INFO "Clean shutdown complete"

6.3 Job Control

# Background execution + wait for completion
build_frontend &
pid1=$!
build_backend &
pid2=$!

wait "$pid1" "$pid2"
echo "Build complete"

# nohup - Keep running after session ends
nohup long_task.sh > /var/log/task.log 2>&1 &
disown

# timeout - Limit command execution time
timeout 30s curl -s http://slow-api.com/data

7. Arrays and Associative Arrays

# Indexed array
servers=("web01" "web02" "web03" "db01")
echo "Server count: ${#servers[@]}"
echo "First: ${servers[0]}"
echo "All: ${servers[*]}"   # [*] joins elements into one word inside quotes

# Array slice
web_servers=("${servers[@]:0:3}")

# Append to array
servers+=("cache01")

# Associative array (Bash 4+)
declare -A service_ports
service_ports=(
  [nginx]=80
  [api]=8080
  [redis]=6379
  [postgres]=5432
)

for svc in "${!service_ports[@]}"; do
  echo "$svc -> ${service_ports[$svc]}"
done

# Safely construct commands with arrays
curl_opts=(
  -s
  --max-time 10
  --retry 3
  -H "Authorization: Bearer $TOKEN"
  -H "Content-Type: application/json"
)
curl "${curl_opts[@]}" "$API_URL"

8. Advanced Patterns

8.1 Subshell vs Command Group

# Subshell () - Separate process, does not modify parent variables
(cd /tmp && tar czf backup.tar.gz /var/data)
# Current directory unchanged

# Command Group {} - Runs in the current shell
{
  echo "=== System Info ==="
  uname -a
  free -h
  df -h
} > system_report.txt

8.2 Dynamic Variable Names (nameref)

# Bash 4.3+ nameref
setup_db() {
  local -n result=$1  # nameref
  result="postgresql://localhost:5432/app"
}

setup_db DB_URL
echo "$DB_URL"  # postgresql://localhost:5432/app

8.3 Parallel Execution Patterns

# Parallel processing with GNU parallel
parallel -j10 'ssh {} "df -h / | tail -1"' < server_list.txt

# xargs parallel
find . -name '*.png' -print0 \
  | xargs -0 -P "$(nproc)" -I{} convert {} -resize 50% resized/{}

# wait + array for parallel control
pids=()
for host in web0{1..5}; do
  deploy.sh "$host" &
  pids+=($!)
done

failed=0
for pid in "${pids[@]}"; do
  wait "$pid" || (( failed++ ))
done
echo "Deployment complete: $failed failure(s)"

9. Performance Optimization Checklist

| Item | Slow Pattern | Fast Pattern |
|---|---|---|
| External cmds in loops | for f in ...; do cat "$f" \| grep ...; done | grep -r ... /path/ |
| Excessive subshells | result=$(echo "$var" \| sed ...) | result="${var//old/new}" |
| Unnecessary pipes | cat file \| grep pattern | grep pattern file |
| Sort then unique | sort \| uniq | sort -u |
| Line count of large file | cat file \| wc -l | wc -l < file |
| File existence check | ls /path/file 2>/dev/null | [[ -f /path/file ]] |
| Extract from string | echo "$s" \| cut -d. -f1 | "${s%%.*}" (parameter expansion) |
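The gap is easy to measure yourself. A rough micro-benchmark sketch -- absolute timings vary by machine; the relative difference is the point:

```shell
#!/usr/bin/env bash
# Compare forking a subshell + sed per iteration vs. pure parameter
# expansion. Expect the expansion loop to be dramatically faster.
s="app.log.2024"

echo "subshell + sed:"
time for (( i = 0; i < 1000; i++ )); do
  r=$(echo "$s" | sed 's/\..*//')   # forks at least two processes each time
done

echo "parameter expansion:"
time for (( i = 0; i < 1000; i++ )); do
  r="${s%%.*}"                      # handled entirely inside the shell
done

echo "result: $r"   # both loops produce: app
```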

Key Parameter Expansion Patterns

file="/var/log/nginx/access.log"

echo "${file##*/}"    # access.log (strip path)
echo "${file%.*}"     # /var/log/nginx/access (strip extension)
echo "${file%%/*}"    # (empty string, before first /)
echo "${file%.log}.bak"  # /var/log/nginx/access.bak

version="v1.2.3-rc1"
echo "${version#v}"       # 1.2.3-rc1
echo "${version%-*}"      # v1.2.3
echo "${version^^}"       # V1.2.3-RC1 (uppercase)
echo "${version,,}"       # v1.2.3-rc1 (lowercase)
echo "${#version}"        # 10 (string length)

10. Practical Script Template

Deployment Script

#!/usr/bin/env bash
set -euo pipefail

readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly APP_NAME="${1:?Usage: $0 <app-name> <version>}"
readonly VERSION="${2:?Usage: $0 <app-name> <version>}"
readonly DEPLOY_ENV="${DEPLOY_ENV:-staging}"
readonly LOG_FILE="/var/log/deploy/${APP_NAME}-$(date +%Y%m%d-%H%M%S).log"

# --- Logging ---
log()  { printf '[%s] [%-5s] %s\n' "$(date +%T)" "$1" "$2" | tee -a "$LOG_FILE" >&2; }
info() { log INFO "$1"; }
warn() { log WARN "$1"; }
die()  { log ERROR "$1"; exit 1; }

# --- Pre-flight checks ---
preflight() {
  info "Starting pre-flight checks"
  command -v docker   >/dev/null || die "docker is not installed"
  command -v kubectl  >/dev/null || die "kubectl is not installed"

  local context
  context=$(kubectl config current-context)
  [[ "$context" == *"$DEPLOY_ENV"* ]] || die "kubectl context($context) does not match $DEPLOY_ENV"
  info "Pre-flight checks passed (context: $context)"
}

# --- Deploy ---
deploy() {
  info "Starting deployment: $APP_NAME:$VERSION -> $DEPLOY_ENV"

  kubectl set image "deployment/$APP_NAME" \
    "$APP_NAME=registry.internal/$APP_NAME:$VERSION"

  info "Waiting for rollout..."
  if ! kubectl rollout status "deployment/$APP_NAME" --timeout=300s; then
    warn "Rollout failed, executing rollback"
    kubectl rollout undo "deployment/$APP_NAME"
    die "Deployment failed -> Rollback complete"
  fi

  info "Deployment successful"
}

# --- Main ---
main() {
  mkdir -p "$(dirname "$LOG_FILE")"
  info "=== Deploying $APP_NAME $VERSION ($DEPLOY_ENV) ==="
  preflight
  deploy
  info "=== Deployment complete ==="
}

main "$@"

Final Checklist

  • Did you declare set -euo pipefail at the top of the script?
  • Did you wrap all variables in double quotes ("$var")?
  • Did you avoid passing external input (user input, filenames) directly to commands?
  • Did you ensure temp file/process cleanup with trap?
  • Did you minimize unnecessary external command calls inside loops?
  • Did you pass static analysis with ShellCheck (shellcheck script.sh)?
  • If POSIX compatibility is needed, did you avoid Bash-specific syntax?

Shell is a tool that is "fast when you know it, dangerous when you don't." Build a strong foundation in the basics and make safe patterns a habit, and you can confidently solve problems in any server environment.