English Speaking Shadowing System Design: 90-Day Routine and Measurement Metrics

Scientific Basis for Why Shadowing Is Effective for Speaking

Shadowing is a training method in which you listen to a native speaker's audio and repeat it aloud with a delay of about 0.5-1 second. The technique comes from interpreter training, where it is still widely used, and it was systematized for second language acquisition research by Professor Kadota Shuhei, a Japanese researcher in that field.

The case for shadowing rests on three mechanisms drawn from cognitive psychology and motor learning.

Working Memory Training: Processing listening and speaking simultaneously exercises the phonological loop, the component of Baddeley's working memory model that holds and rehearses speech sounds.

Motor Learning: Repeatedly moving the mouth, tongue, and vocal cords in the same pattern forms muscle memory. It works on the same principle as piano practice. Initially done consciously, it becomes automated through repetition.

Prosody Internalization: English rhythm, stress, and intonation patterns cannot be learned word by word. They are naturally internalized by repeating entire sentences as a whole.

According to Stephen Krashen's Input Hypothesis, "comprehensible input" is the key to language acquisition. Shadowing adds "forced output" to this, bridging the gap between input and output.

The 4-Stage Shadowing Progression Model

Shadowing requires only 15 minutes per day, but skipping stages sharply reduces its effectiveness. Follow these 4 stages in order.

Stage 1: Mumbling - Weeks 1-2

Goal: Getting accustomed to the rhythm and flow of speech

Method:

  1. Listen to the audio and move your lips, following along in a quiet voice
  2. You don't need to say every word accurately. The goal is to ride the rhythm
  3. Don't look at the script

What to measure at this stage: The proportion of 15 daily minutes where you "followed along without stopping." 30-40% is normal at first.

Stage 2: Synchronized Shadowing - Weeks 3-4

Goal: Speaking simultaneously with the native speaker

Method:

  1. Listen to the same audio at least 5 times before starting to shadow
  2. Speak as simultaneously as possible with the native speaker's utterance (within a 0.5-second delay)
  3. Follow along with the script while shadowing

What to measure at this stage: Synchronization rate (proportion of words spoken simultaneously with the native speaker). Move to the next stage when you reach 60% or higher.
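One way to make the synchronization rate concrete is to compare word start times between the original audio and your recording. The following is a minimal sketch, assuming you already have word-level timestamps for both tracks measured from the same starting point (for example, from a speech recognizer that returns word time offsets). The function name `sync_rate`, the tuple layout, and the choice to count a word as synchronized when its start time is within 0.5 seconds of the original are assumptions of this illustration, not part of the original method.

```python
from difflib import SequenceMatcher

def sync_rate(original, learner, max_delay=0.5):
    """Percentage of original words the learner spoke within max_delay seconds.

    original, learner: lists of (word, start_time_in_seconds) tuples.
    """
    if not original:
        return 0.0
    orig_words = [word.lower() for word, _ in original]
    learner_words = [word.lower() for word, _ in learner]
    # Align the two word sequences so skipped or extra words do not shift the pairing
    matcher = SequenceMatcher(None, orig_words, learner_words, autojunk=False)
    in_sync = 0
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            t_orig = original[block.a + k][1]
            t_learner = learner[block.b + k][1]
            if abs(t_learner - t_orig) <= max_delay:
                in_sync += 1
    return round(100 * in_sync / len(original), 1)

# Hypothetical timestamps for illustration only
original = [("pick", 0.0), ("it", 0.3), ("up", 0.5), ("now", 1.0)]
learner = [("pick", 0.4), ("it", 0.7), ("up", 1.2), ("now", 1.3)]
print(sync_rate(original, learner))  # 75.0, because "up" trails by 0.7 s
```

Words the learner skipped never appear in a matching block, so they count against the rate automatically.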

Stage 3: Prosody Shadowing - Weeks 5-8

Goal: Accurately reproducing stress, rhythm, and intonation

Method:

  1. Shadow while recording yourself
  2. Compare the original with your own recording
  3. Focus particularly on stress placement, linking, and reduction

Key Point - English pronunciation patterns that Korean speakers often miss:

| Pattern | Example | Korean pronunciation | Native pronunciation |
| --- | --- | --- | --- |
| Function word reduction | want to | won-teu tu | wanna |
| Linking | pick it up | pik it eop | pi-ki-dup |
| Stress shift | I didn't SAY that | Even stress | Stress on SAY |
| Vowel reduction | comfortable | keom-po-teo-beul | KUMF-ter-bul |
| t-dropping | internet | in-teo-net | in-er-net |
| Consonant clusters | asked | e-seu-keu-deu | askt |

Stage 4: Content Shadowing - Weeks 9-12

Goal: Shadowing naturally while understanding the content

Method:

  1. Shadow without a script while simultaneously grasping the content
  2. Immediately after shadowing, summarize what you heard in your own words (Retelling)
  3. Measure your first-attempt shadowing success rate with new material

90-Day Training Program Details

Material Selection Criteria

Material selection is half the battle. Choosing the wrong material wastes 90 days.

Characteristics of good material:

  • Speed of 120-150 words per minute (for beginner-intermediate level)
  • 2-5 minutes per session (too long causes loss of focus)
  • Content with available scripts (subtitles)
  • Topics related to your interests or work
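The speed and length criteria above can be checked before committing to a clip. This is a rough sketch, assuming you have the clip's script as plain text and its duration in seconds. The function names and default ranges are illustrative choices that mirror the 120-150 WPM and 2-5 minute guidelines in the list.

```python
def material_wpm(script: str, duration_seconds: float) -> float:
    """Average speaking rate of a clip in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(script.split()) / duration_seconds * 60

def is_suitable(script: str, duration_seconds: float,
                wpm_range=(120, 150), length_range=(120, 300)) -> bool:
    """Check a clip against the speed (WPM) and length (seconds) guidelines."""
    wpm = material_wpm(script, duration_seconds)
    return (wpm_range[0] <= wpm <= wpm_range[1]
            and length_range[0] <= duration_seconds <= length_range[1])

# Hypothetical clip: 405 words spoken over 3 minutes
script = "word " * 405
print(material_wpm(script, 180))  # 135.0
print(is_suitable(script, 180))   # True
```

Whitespace-based word counting is only an approximation, but it is close enough for choosing between difficulty levels.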

Recommended material (by difficulty):

| Difficulty | Material | Speed (WPM) | Features |
| --- | --- | --- | --- |
| Beginner | VOA Learning English | 90-110 | Slow speed, clear pronunciation |
| Beginner | TED-Ed (educational animations) | 110-130 | Easy to understand with visual aids |
| Intermediate | TED Talks (with subtitles) | 130-160 | Various topics, clear delivery |
| Intermediate | BBC 6 Minute English | 140-160 | British pronunciation, structured dialogue |
| Upper-intermediate | NPR Planet Money | 150-170 | Economics/business topics |
| Upper-intermediate | The Daily (NYT podcast) | 160-180 | News pace, various accents |
| Advanced | Joe Rogan Experience | 170-200+ | Natural conversation, includes slang |
| Advanced | Movie/TV drama dialogue | Varies | Emotional expression, fast dialogue switching |

Weekly Training Plan

# 90-Day Shadowing Program Configuration
program:
  name: '90-Day Shadowing System'
  daily_minutes: 15
  phases:
    - name: 'Foundation'
      weeks: 1-2
      stage: 'Mumbling'
      material: 'VOA Learning English'
      daily_routine:
        - listen_without_speaking: 3min
        - mumbling_practice: 10min
        - vocabulary_review: 2min
      target: 'Rhythm following rate 40% or above'

    - name: 'Sync'
      weeks: 3-4
      stage: 'Synchronized'
      material: 'TED-Ed'
      daily_routine:
        - full_speed_listening: 2min
        - synchronized_shadowing: 10min
        - script_check: 3min
      target: 'Synchronization rate 60% or above'

    - name: 'Prosody'
      weeks: 5-8
      stage: 'Prosody Shadowing'
      material: 'TED Talks'
      daily_routine:
        - shadowing_with_recording: 10min
        - compare_with_original: 3min
        - note_problem_sounds: 2min
      target: 'Intonation similarity self-assessment 7/10 or above'

    - name: 'Content'
      weeks: 9-12
      stage: 'Content Shadowing'
      material: 'NPR / BBC'
      daily_routine:
        - blind_shadowing: 8min
        - retelling_in_own_words: 5min
        - new_vocabulary_log: 2min
      target: 'First-attempt shadowing success rate 70% or above'
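To find out which phase applies on a given day, the week ranges in the configuration can be mirrored in a small lookup. This is a sketch under stated assumptions: the `PHASES` table is copied by hand from the configuration rather than parsed from it, and because 90 days run slightly past 12 full weeks (84 days), days 85-90 are treated here as a continuation of week 12. That last choice is an assumption of this example, not something the program specifies.

```python
PHASES = [
    # (first_week, last_week, phase name, stage), mirrored from the configuration
    (1, 2, "Foundation", "Mumbling"),
    (3, 4, "Sync", "Synchronized"),
    (5, 8, "Prosody", "Prosody Shadowing"),
    (9, 12, "Content", "Content Shadowing"),
]

def phase_for_day(day: int) -> dict:
    """Map a program day (1-90) to its week number, phase, and stage."""
    if not 1 <= day <= 90:
        raise ValueError("day must be between 1 and 90")
    # Days 85-90 fall after week 12 ends, so they are capped at week 12
    week = min((day - 1) // 7 + 1, 12)
    for first, last, name, stage in PHASES:
        if first <= week <= last:
            return {"week": week, "phase": name, "stage": stage}
    raise AssertionError("unreachable: weeks 1-12 are all covered")

print(phase_for_day(29))  # {'week': 5, 'phase': 'Prosody', 'stage': 'Prosody Shadowing'}
```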

Actual Daily 15-Minute Routine (Phase 3 Example)

[0:00-0:30]  Play today's material - Listen through once (grasp key points)
[0:30-2:00]  1st shadowing - Start recording
[2:00-4:00]  Play recording + Compare with original - Note 3 differences
[4:00-8:00]  Focused repetition on problem sections (at least 5 times)
[8:00-12:00] 2nd shadowing - Record full pass
[12:00-14:00] Compare 2nd recording vs original - Confirm improvements
[14:00-15:00] Note 3 expressions learned today (add to vocabulary app)

Measurement Metrics System

If you evaluate by "feeling," you won't know if your skills have improved. Record the following 4 metrics weekly.

4 Core Metrics

| Metric | Measurement Method | Target (After 12 weeks) |
| --- | --- | --- |
| WPM (Words Per Minute) | Record 1-minute free speech, count words | 120 WPM or above |
| Synchronization rate | Timing match rate between original and shadowing recording | 80% or above |
| Pronunciation accuracy | Measure recognition rate with Google Speech-to-Text | 85% or above |
| Retelling completeness | Listen to 2-min audio and summarize within 30 seconds (key point reproduction rate) | 2 out of 3 key points or more |
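The four 12-week targets can be encoded once and checked against each week's numbers. The sketch below assumes a flat dictionary of metrics. The key names and the `check_targets` helper are hypothetical, and the sample values are illustrative rather than measured data.

```python
# 12-week targets taken from the metrics table
TARGETS = {
    "wpm": 120,
    "sync_rate_pct": 80,
    "pronunciation_accuracy_pct": 85,
    "retelling_key_points": 2,  # key points reproduced, out of 3
}

def check_targets(metrics: dict) -> dict:
    """Return {metric_name: True/False}; a missing metric counts as not met."""
    return {name: metrics.get(name, 0) >= target
            for name, target in TARGETS.items()}

# Illustrative values only
week_12 = {
    "wpm": 128,
    "sync_rate_pct": 80,
    "pronunciation_accuracy_pct": 84,
    "retelling_key_points": 2,
}
for name, met in check_targets(week_12).items():
    print(f"{name}: {'met' if met else 'not yet'}")
```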

Automated Pronunciation Accuracy Measurement Script

You can automatically measure the recognition rate of shadowing recordings using the Google Cloud Speech-to-Text API.

import difflib

from google.cloud import speech_v1 as speech

_PUNCTUATION = ".,!?;:\"'()"


def _words(text: str) -> list:
    """Lowercases text and strips surrounding punctuation from each word."""
    stripped = (w.strip(_PUNCTUATION) for w in text.lower().split())
    return [w for w in stripped if w]


def measure_accuracy(audio_path: str, expected_text: str) -> dict:
    """Measures pronunciation accuracy of a shadowing recording file.

    Expects a 16 kHz LINEAR16 (16-bit PCM) WAV file. The synchronous
    recognize() call only accepts short clips (roughly one minute); longer
    recordings need long_running_recognize() instead.
    """
    client = speech.SpeechClient()

    with open(audio_path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,  # must match the recording's actual sample rate
        language_code="en-US",
        enable_word_time_offsets=True,  # timings are not used below
    )

    response = client.recognize(config=config, audio=audio)

    recognized = _words(" ".join(
        result.alternatives[0].transcript
        for result in response.results
    ))
    expected = _words(expected_text)

    # Word-level similarity: 2 * matches / (len(expected) + len(recognized))
    matcher = difflib.SequenceMatcher(None, expected, recognized)
    accuracy = matcher.ratio() * 100

    # Collect expected words that were replaced or dropped in the transcript
    missed_words = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            missed_words.extend(expected[i1:i2])

    return {
        "accuracy_pct": round(accuracy, 1),
        "total_expected": len(expected),
        "total_recognized": len(recognized),
        "missed_words": missed_words[:10],  # First 10 only
    }


# Usage example
if __name__ == "__main__":
    result = measure_accuracy(
        "recordings/2026-03-04-ted-talk.wav",
        "The greatest glory in living lies not in never falling "
        "but in rising every time we fall"
    )
    print(f"Accuracy: {result['accuracy_pct']}%")
    print(f"Missed words: {result['missed_words']}")
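The script above scores recognition with a sequence-similarity ratio. A complementary measure, not used in the original script, is word error rate (WER), the metric commonly used to evaluate speech recognition output: the number of substituted, deleted, and inserted words divided by the number of words in the reference script. The following is a minimal sketch that works on plain strings, so it can be applied to any transcript. The function name and the simple whitespace tokenization are assumptions of this example.

```python
def word_error_rate(expected_text: str, recognized_text: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of expected words."""
    ref = expected_text.lower().split()
    hyp = recognized_text.lower().split()
    if not ref:
        raise ValueError("expected_text must contain at least one word")
    # Word-level edit distance via dynamic programming, keeping one row at a time
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # expected word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical transcript with one substitution and one dropped word
expected = "the greatest glory in living lies not in never falling"
recognized = "the greatest glory in leaving lies not never falling"
print(word_error_rate(expected, recognized))  # 0.2
```

Lower is better here, and unlike the similarity ratio, WER can exceed 1.0 when the transcript contains many extra words.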

Weekly Log Template

import json
from datetime import date, timedelta

def weekly_log(week_num: int, data: dict) -> dict:
    """Records weekly shadowing performance and appends it to a cumulative JSON log."""
    log = {
        "week": week_num,
        "period": f"{date.today() - timedelta(days=6)} ~ {date.today()}",
        "metrics": {
            "wpm": data.get("wpm", 0),
            "sync_rate_pct": data.get("sync_rate", 0),
            "pronunciation_accuracy_pct": data.get("accuracy", 0),
            "retelling_score": data.get("retelling", 0),
        },
        "practice_days": data.get("days_practiced", 0),
        "total_minutes": data.get("total_minutes", 0),
        "material_used": data.get("materials", []),
        "problem_sounds": data.get("problem_sounds", []),
        "notes": data.get("notes", ""),
    }

    # Save cumulatively to JSON file (UTF-8, because ensure_ascii=False may write non-ASCII text)
    log_file = "shadowing_progress.json"
    try:
        with open(log_file, "r", encoding="utf-8") as f:
            all_logs = json.load(f)
    except FileNotFoundError:
        all_logs = []

    all_logs.append(log)

    with open(log_file, "w", encoding="utf-8") as f:
        json.dump(all_logs, f, indent=2, ensure_ascii=False)

    return log

# Usage example
weekly_log(5, {
    "wpm": 105,
    "sync_rate": 65,
    "accuracy": 72,
    "retelling": 5,
    "days_practiced": 6,
    "total_minutes": 90,
    "materials": ["TED: The power of vulnerability"],
    "problem_sounds": ["th sound", "r/l distinction", "word-final consonants"],
    "notes": "Linking has improved a lot, but th pronunciation is still difficult",
})
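The log function stores each week's numbers but does not compute a week-over-week comparison itself. A small helper can do that from the accumulated list. This sketch assumes the list has the same shape as the contents of `shadowing_progress.json` (one dictionary per week with a `metrics` entry). The name `compare_weeks` and the sample numbers are hypothetical.

```python
def compare_weeks(all_logs: list) -> dict:
    """Change in each metric between the two most recent logged weeks."""
    if len(all_logs) < 2:
        return {}  # nothing to compare against yet
    previous = all_logs[-2]["metrics"]
    current = all_logs[-1]["metrics"]
    return {key: round(current[key] - previous.get(key, 0), 1)
            for key in current}

# Illustrative entries only
logs = [
    {"week": 4, "metrics": {"wpm": 98, "sync_rate_pct": 55}},
    {"week": 5, "metrics": {"wpm": 105, "sync_rate_pct": 65}},
]
print(compare_weeks(logs))  # {'wpm': 7, 'sync_rate_pct': 10}
```

In practice you would load the list with `json.load` from the log file and print the deltas after each weekly entry.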

Plateau Breakthrough Strategies

After 4-6 weeks of shadowing, most learners hit a plateau. Don't quit at this point: a plateau usually means the brain is still consolidating new patterns, not that progress has stopped.

Plateau Signals and Countermeasures

| Plateau Signal | Cause | Breakthrough Strategy |
| --- | --- | --- |
| "I think this is good enough" feeling | Settling for comfortable material | Increase the difficulty by one level |
| WPM stuck at the same level for 3 weeks | Using only same-speed material | Try shadowing at 1.1x-1.2x speed |
| Can shadow but can't speak freely | Lack of output practice | Extend Retelling time to 5 minutes after shadowing |
| Loss of motivation | Lack of perceived growth | Compare Week 1 recording with current recording |
| Specific pronunciation won't improve | Muscle pattern fixation | Focused training on that sound (minimal pair practice) |
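The "WPM stuck for 3 weeks" signal can be detected from the weekly numbers rather than by feel. Below is a rough sketch, assuming a list of weekly WPM readings in chronological order. The 2-WPM tolerance for what counts as "stuck" is an arbitrary assumption of this example, as are the function names. The second helper shows the pace you are effectively shadowing when playback is sped up.

```python
def is_plateau(wpm_history: list, weeks: int = 3, tolerance: float = 2.0) -> bool:
    """True if the last `weeks` readings stay within `tolerance` WPM of each other."""
    if len(wpm_history) < weeks:
        return False  # not enough data to call it a plateau
    recent = wpm_history[-weeks:]
    return max(recent) - min(recent) <= tolerance

def effective_wpm(material_wpm: float, playback_speed: float) -> float:
    """Speaking rate you must keep up with at a given playback speed."""
    return round(material_wpm * playback_speed, 1)

# Illustrative weekly readings
history = [95, 104, 110, 110, 111]
if is_plateau(history):
    # 150 WPM material played at 1.2x demands a 180 WPM pace
    print("Plateau detected; try", effective_wpm(150, 1.2), "WPM material pace")
```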

Minimal Pairs Focused Training

Practice the phoneme pairs that Korean speakers find most difficult for an additional 5 minutes daily.

R vs L:

  • right / light
  • road / load
  • correct / collect
  • crowd / cloud
  • Sentence: "The right light was really lovely."

B vs V:

  • base / vase
  • boat / vote
  • berry / very
  • best / vest
  • Sentence: "The best vote came from a very brave person."

F vs P:

  • fan / pan
  • feel / peel
  • coffee / copy
  • fast / past
  • Sentence: "I feel like peeling a fresh peach."

TH vs S/D:

  • think / sink
  • three / tree (or free)
  • that / dat (Korean-style)
  • bath / bass
  • Sentence: "I think three thousand is the right number."

Tool Comparison: Shadowing Apps and Platforms

| Tool | Price | Script | Recording Comparison | Speed Control | Recommended For |
| --- | --- | --- | --- | --- | --- |
| ELSA Speak | ~$9/month | O | O (AI evaluation) | O | Pronunciation focused |
| Shadowing.app | Free/Premium | O | O | O | Shadowing-dedicated |
| YouTube + Recorder | Free | Subtitles | Manual | O (speed) | Zero-cost start |
| Audacity (desktop) | Free | X | Waveform comparison | O | Detailed analysis |
| Otter.ai | Free/$16.99/mo | Auto-generated | X | X | English meeting review |
| Speechling | Free/$19.99/mo | O | O (coach eval) | O | Professional coach feedback |

Recommended combination (cost minimized):

  • YouTube (material) + Smartphone recorder (recording) + Google Docs voice typing (recognition rate check)

This combination lets you complete all four stages at zero cost.

Shadowing vs Other Speaking Training Methods

| Training Method | Core Effect | Limitations | When Combined with Shadowing |
| --- | --- | --- | --- |
| Shadowing | Pronunciation, rhythm, speed | Weak free speaking skills | Foundation |
| Dictation | Listening, spelling | Not a speaking exercise | Listening supplement |
| Read Aloud | Pronunciation practice | Lacks natural rhythm | Accuracy supplement |
| Role Play | Real conversation | Can't do alone | Application expansion |
| Free Talking | Fluency | May ignore accuracy | Output expansion |
| Retelling | Summary/expression | No speed training | Comprehension check |

Best combination: 10 min shadowing + 5 min Retelling (daily) + 30 min Free Talking (twice weekly)

Real-World Scenario: Office Worker A's 90 Days

Profile: 32 years old, IT company PM, TOEIC 780, can listen in English meetings but can't speak

Week 1-2: Started with VOA Learning English. Day 1 synchronization rate 25%. Lost the rhythm most of the time. Shocked to find that "my mouth doesn't move as well as I expected."

Week 3-4: Switched material to TED-Ed. 15 minutes every morning before work. Synchronization rate rose to 55%. Started recognizing linking patterns in particular (pick it up -> pi-ki-dup).

Week 5-6: Listened to recordings for the first time. Response: "This is my English?" Korean intonation carried over completely. Started focusing on th pronunciation and r/l distinction.

Week 7-8: Plateau. WPM stuck at 110 for 3 weeks. Switched to 1.2x speed TED Talk. Painful for 2 days, but returning to normal speed felt much more comfortable.

Week 9-10: Switched material to NPR Planet Money. Difficult at first due to many economics terms, but work-related vocabulary expanded. After starting Retelling practice, began opening remarks in English meetings with "I think the main point is..."

Week 11-12: First-attempt shadowing success rate 72%. WPM 128. Successfully spoke 3 consecutive sentences in an English meeting. Received feedback from a colleague: "Your English has really improved."

90-Day Final Results:

  • WPM: 85 -> 128 (50% improvement)
  • Meeting utterance frequency: 0-1 times -> 4-5 times/meeting
  • Google Speech-to-Text recognition rate: 58% -> 84%
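The improvement figures above are relative changes, which are easy to confuse with percentage-point changes. A short worked calculation, using a hypothetical `pct_change` helper, makes the difference explicit.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, as a percentage."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return round((after - before) / before * 100, 1)

print(pct_change(85, 128))  # 50.6 -> the "50% improvement" in WPM
print(pct_change(58, 84))   # 44.8 -> relative gain in recognition rate
print(84 - 58)              # 26   -> the same gain in percentage points
```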

Quiz

Q1. List the 4 stages of shadowing in order.

Answer: Mumbling -> Synchronized Shadowing -> Prosody Shadowing -> Content Shadowing

Q2. What is the appropriate speed (WPM) for beginners when selecting shadowing material?

Answer: 90-130 words per minute (WPM). VOA Learning English (90-110) or TED-Ed (110-130) are suitable.

Q3. Why doesn't shadowing alone sufficiently improve free speaking ability?

Answer: Since shadowing is "repeating," it lacks the training to construct your own thoughts in English. It must be combined with Retelling and Free Talking.

Q4. What is an effective breakthrough strategy when WPM has been stuck for 3 weeks during a plateau?

Answer: Shadow at 1.1x-1.2x playback speed, then return to 1.0x speed and you'll feel much more at ease. This is the principle of overload training - adapting to faster speeds.

Q5. Which phoneme contrasts in English pronunciation do Korean speakers most commonly confuse?

Answer: R/L, B/V, F/P, and TH versus S (or D). These are corrected through minimal pair training.

Q6. What is the principle behind measuring pronunciation accuracy using Google Speech-to-Text?

Answer: The shadowing recording is converted to text via STT and compared word-by-word with the original script to calculate the match rate. The 12-week target is 85% or above; treat the recognition rate as a rough proxy for how intelligible your speech is, not a guarantee.

Q7. Why is recording essential in shadowing practice?

Answer: Objective evaluation is impossible while you're speaking. You need to compare recordings with the original to specifically identify differences in stress placement, linking, and intonation patterns.

Q8. Why is "10 min shadowing + 5 min Retelling" the most efficient combination?

Answer: Shadowing trains input (listening) and imitation (pronunciation/rhythm), while Retelling adds output training by reconstructing in your own language, balancing input and output.
