English Speaking Shadowing System Design: 90-Day Routine and Measurement Metrics

Scientific Basis for Why Shadowing Is Effective for Speaking

Shadowing is a training method in which you listen to a native speaker's audio and repeat it aloud with a delay of 0.5-1 second. Professor Kadota Shuhei (門田修平) of Japan systematized it within second language acquisition research, and it is used as a core exercise in interpreter training programs worldwide.

Its effectiveness is usually explained by three mechanisms from cognitive science.

Working Memory Training: Processing listening and speaking at the same time exercises the phonological loop, the component of Baddeley's working memory model that holds and rehearses speech sounds.

Motor Learning: Repeatedly moving the mouth, tongue, and vocal cords in the same pattern forms muscle memory. It works on the same principle as piano practice. Initially done consciously, it becomes automated through repetition.

Prosody Internalization: English rhythm, stress, and intonation patterns cannot be learned word by word. They are naturally internalized by repeating entire sentences as a whole.

According to Stephen Krashen's Input Hypothesis, "comprehensible input" is the key to language acquisition. Shadowing adds "forced output" to this, bridging the gap between input and output.

The 4-Stage Shadowing Progression Model

Shadowing requires only 15 minutes per day, but skipping stages sharply reduces its effectiveness. Follow these 4 stages in order.

Stage 1: Mumbling - Weeks 1-2

Goal: Getting accustomed to the rhythm and flow of speech

Method:

  1. Listen to the audio and move your lips, following along in a quiet voice
  2. You don't need to say every word accurately. The goal is to ride the rhythm
  3. Don't look at the script

What to measure at this stage: The proportion of 15 daily minutes where you "followed along without stopping." 30-40% is normal at first.

Stage 2: Synchronized Shadowing - Weeks 3-4

Goal: Speaking simultaneously with the native speaker

Method:

  1. Repeat the same audio at least 5 times before starting to shadow
  2. Speak along with the native speaker as closely as possible (a delay of no more than 0.5 seconds)
  3. Check with the script while shadowing

What to measure at this stage: Synchronization rate (proportion of words spoken simultaneously with the native speaker). Move to the next stage when you reach 60% or higher.
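The synchronization rate can be estimated automatically if word-level start times are available for both the original audio and your recording (for example, from a speech recognizer that returns word timestamps). The following is a minimal sketch under that assumption. The function name `sync_rate`, the `(word, start_seconds)` tuple format, and the rule of counting only words spoken at or after the original are illustrative choices, not something the method itself prescribes; only the 0.5-second window comes from the description above.

```python
from typing import List, Tuple

Word = Tuple[str, float]  # (word, start time in seconds); hypothetical input format


def sync_rate(original: List[Word], shadow: List[Word], max_delay: float = 0.5) -> float:
    """Percentage of original words that the learner started within max_delay seconds.

    Assumes both word lists share the same time origin (recording started
    together with playback).
    """
    if not original:
        return 0.0

    used = set()  # indices of shadow words already matched to an original word
    matched = 0
    for word, start in original:
        for idx, (s_word, s_start) in enumerate(shadow):
            if idx in used:
                continue
            delay = s_start - start
            # Same word, spoken at or after the original and inside the window
            if s_word.lower() == word.lower() and 0.0 <= delay <= max_delay:
                used.add(idx)
                matched += 1
                break
    return 100.0 * matched / len(original)


# Illustrative timings: "it" arrives 0.6 s late, so 2 of 3 words count as synchronized
original = [("pick", 0.0), ("it", 0.3), ("up", 0.5)]
shadow = [("pick", 0.2), ("it", 0.9), ("up", 0.8)]
print(f"Synchronization rate: {sync_rate(original, shadow):.1f}%")
```

A result at or above 60 would meet the Stage 2 threshold. The nested loop is quadratic in the number of words, which is acceptable for clips of a few minutes.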

Stage 3: Prosody Shadowing - Weeks 5-8

Goal: Accurately reproducing stress, rhythm, and intonation

Method:

  1. Shadow while recording yourself
  2. Compare the original with your own recording
  3. Focus particularly on stress placement, linking, and reduction

Key Point - English pronunciation patterns that Korean speakers often miss:

| Pattern | Example | Korean pronunciation | Native pronunciation |
|---|---|---|---|
| Function word reduction | want to | won-teu tu | wanna |
| Linking | pick it up | pik it eop | pi-ki-dup |
| Stress shift | I didn't SAY that | Even stress | Stress on SAY |
| Vowel reduction | comfortable | keom-po-teo-beul | KUMF-ter-bul |
| t-dropping | internet | in-teo-net | in-er-net |
| Consonant clusters | asked | e-seu-keu-deu | askt |

Stage 4: Content Shadowing - Weeks 9-12

Goal: Shadowing naturally while understanding the content

Method:

  1. Shadow without a script while simultaneously grasping the content
  2. Immediately after shadowing, summarize what you heard in your own words (Retelling)
  3. Measure your first-attempt shadowing success rate with new material

90-Day Training Program Details

Material Selection Criteria

Material selection is half the battle. Choosing the wrong material wastes 90 days.

Characteristics of good material:

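  • Speed matched to your level: about 120-150 words per minute suits beginner-to-intermediate learners, and true beginners can start slower (see the table below)
  • 2-5 minutes per clip (longer clips make it hard to stay focused)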
  • Content with available scripts (subtitles)
  • Topics related to your interests or work

Recommended material (by difficulty):

| Difficulty | Material | Speed (WPM) | Features |
|---|---|---|---|
| Beginner | VOA Learning English | 90-110 | Slow speed, clear pronunciation |
| Beginner | TED-Ed (educational animations) | 110-130 | Easy to understand with visual aids |
| Intermediate | TED Talks (with subtitles) | 130-160 | Various topics, clear delivery |
| Intermediate | BBC 6 Minute English | 140-160 | British pronunciation, structured dialogue |
| Upper-intermediate | NPR Planet Money | 150-170 | Economics/business topics |
| Upper-intermediate | The Daily (NYT podcast) | 160-180 | News pace, various accents |
| Advanced | Joe Rogan Experience | 170-200+ | Natural conversation, includes slang |
| Advanced | Movie/TV drama dialogue | Varies | Emotional expression, fast dialogue switching |
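To check where an unfamiliar clip falls in this table, you can compute its speed from the transcript and the clip length. This is a rough sketch: `words_per_minute` and `difficulty_band` are hypothetical helper names, and because the table's ranges overlap, the cut-offs below (130, 150, 170) are an assumption that takes the lowest speed listed for each harder band.

```python
def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Speaking rate of a clip: whitespace-separated words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) * 60.0 / duration_seconds


def difficulty_band(wpm: float) -> str:
    """Map a speaking rate to a difficulty band (assumed cut-offs, see lead-in)."""
    if wpm < 130:
        return "Beginner"
    if wpm < 150:
        return "Intermediate"
    if wpm < 170:
        return "Upper-intermediate"
    return "Advanced"


# Example: a 300-word transcript spoken in 2 minutes
sample_transcript = " ".join(["word"] * 300)
wpm = words_per_minute(sample_transcript, 120)
print(wpm, difficulty_band(wpm))
```

The same `words_per_minute` function works for measuring your own one-minute free-speech recording once it has been transcribed.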

Weekly Training Plan

# 90-Day Shadowing Program Configuration
program:
  name: '90-Day Shadowing System'
  daily_minutes: 15
  phases:
    - name: 'Foundation'
      weeks: 1-2
      stage: 'Mumbling'
      material: 'VOA Learning English'
      daily_routine:
        - listen_without_speaking: 3min
        - mumbling_practice: 10min
        - vocabulary_review: 2min
      target: 'Rhythm following rate 40% or above'

    - name: 'Sync'
      weeks: 3-4
      stage: 'Synchronized'
      material: 'TED-Ed'
      daily_routine:
        - full_speed_listening: 2min
        - synchronized_shadowing: 10min
        - script_check: 3min
      target: 'Synchronization rate 60% or above'

    - name: 'Prosody'
      weeks: 5-8
      stage: 'Prosody Shadowing'
      material: 'TED Talks'
      daily_routine:
        - shadowing_with_recording: 10min
        - compare_with_original: 3min
        - note_problem_sounds: 2min
      target: 'Intonation similarity self-assessment 7/10 or above'

    - name: 'Content'
      weeks: 9-12
      stage: 'Content Shadowing'
      material: 'NPR / BBC'
      daily_routine:
        - blind_shadowing: 8min
        - retelling_in_own_words: 5min
        - new_vocabulary_log: 2min
      target: 'First-attempt shadowing success rate 70% or above'
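
If you want a script to tell you which phase applies on a given day, the plan above can be mirrored in a small lookup. This is only a sketch: the Python structure, the function names, and the handling of days 85-90 (kept in the final phase, since 12 weeks cover only 84 days) are assumptions. The phase names, week ranges, and minute values are copied from the configuration.

```python
from typing import Dict, List, Tuple

DAILY_MINUTES = 15

# (phase name, first week, last week, daily routine in minutes)
PHASES: List[Tuple[str, int, int, Dict[str, int]]] = [
    ("Foundation", 1, 2, {"listen_without_speaking": 3, "mumbling_practice": 10, "vocabulary_review": 2}),
    ("Sync", 3, 4, {"full_speed_listening": 2, "synchronized_shadowing": 10, "script_check": 3}),
    ("Prosody", 5, 8, {"shadowing_with_recording": 10, "compare_with_original": 3, "note_problem_sounds": 2}),
    ("Content", 9, 12, {"blind_shadowing": 8, "retelling_in_own_words": 5, "new_vocabulary_log": 2}),
]


def phase_for_day(day: int) -> str:
    """Return the phase name for day 1-90 of the program."""
    if not 1 <= day <= 90:
        raise ValueError("day must be between 1 and 90")
    # Days 85-90 fall after week 12; keep them in the last phase (assumption)
    week = min((day - 1) // 7 + 1, 12)
    for name, first_week, last_week, _routine in PHASES:
        if first_week <= week <= last_week:
            return name
    raise AssertionError("PHASES must cover weeks 1-12")


def routines_fill_daily_budget() -> bool:
    """Check that every phase's routine adds up to the 15-minute daily budget."""
    return all(sum(routine.values()) == DAILY_MINUTES for _, _, _, routine in PHASES)


print(phase_for_day(30))             # day 30 is in week 5
print(routines_fill_daily_budget())
```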

Actual Daily 15-Minute Routine (Phase 3 Example)

[0:00-0:30]  Play today's material - Listen through once (grasp key points)
[0:30-2:00]  1st shadowing - Start recording
[2:00-4:00]  Play recording + Compare with original - Note 3 differences
[4:00-8:00]  Focused repetition on problem sections (at least 5 times)
[8:00-12:00] 2nd shadowing - Record full pass
[12:00-14:00] Compare 2nd recording vs original - Confirm improvements
[14:00-15:00] Note 3 expressions learned today (add to vocabulary app)

Measurement Metrics System

If you evaluate by "feeling," you won't know if your skills have improved. Record the following 4 metrics weekly.

4 Core Metrics

| Metric | Measurement Method | Target (After 12 weeks) |
|---|---|---|
| WPM (Words Per Minute) | Record 1 minute of free speech, count the words | 120 WPM or above |
| Synchronization rate | Timing match rate between the original and the shadowing recording | 80% or above |
| Pronunciation accuracy | Measure recognition rate with Google Speech-to-Text | 85% or above |
| Retelling completeness | Listen to 2-min audio and summarize within 30 seconds (key point reproduction rate) | 2 out of 3 key points or more |
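
A weekly review can be reduced to checking each recorded value against these targets. The sketch below assumes a flat dictionary of metrics. The key names (including the hypothetical `retelling_key_points`) and the sample values are illustrative, while the thresholds are the 12-week targets from the table.

```python
# 12-week targets from the table above; key names are an assumption
TARGETS = {
    "wpm": 120,
    "sync_rate_pct": 80,
    "pronunciation_accuracy_pct": 85,
    "retelling_key_points": 2,  # key points reproduced out of 3
}


def unmet_targets(metrics: dict) -> list:
    """Return the names of metrics that are still below their target."""
    return [name for name, target in TARGETS.items() if metrics.get(name, 0) < target]


# Illustrative values only
this_week = {
    "wpm": 128,
    "sync_rate_pct": 82,
    "pronunciation_accuracy_pct": 84,
    "retelling_key_points": 2,
}
print(unmet_targets(this_week))
```

A missing metric is treated as 0, so anything you forgot to record shows up as unmet rather than silently passing.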

Automated Pronunciation Accuracy Measurement Script

You can automatically measure the recognition rate of shadowing recordings using the Google Cloud Speech-to-Text API.

import difflib
import re

from google.cloud import speech_v1 as speech


def normalize(text: str) -> list[str]:
    """Lowercases the text and splits it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def measure_accuracy(audio_path: str, expected_text: str) -> dict:
    """Measures pronunciation accuracy of a shadowing recording file.

    Expects 16 kHz, 16-bit PCM (LINEAR16) audio. The synchronous recognize()
    call only accepts short clips (about one minute), so split longer
    recordings before calling this function.
    """
    client = speech.SpeechClient()

    with open(audio_path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        enable_word_time_offsets=True,  # not used below; useful for timing analysis
    )

    response = client.recognize(config=config, audio=audio)

    # Join the top transcript of each result and normalize it like the script
    recognized = normalize(" ".join(
        result.alternatives[0].transcript
        for result in response.results
    ))
    expected = normalize(expected_text)

    # Word-level similarity (SequenceMatcher ratio: 2 * matches / total words).
    # autojunk=False keeps frequent words such as "the" from being ignored
    # when the sequences are 200 words or longer.
    matcher = difflib.SequenceMatcher(None, expected, recognized, autojunk=False)
    accuracy = matcher.ratio() * 100

    # Collect expected words that were replaced or dropped in the transcript
    missed_words = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            missed_words.extend(expected[i1:i2])

    return {
        "accuracy_pct": round(accuracy, 1),
        "total_expected": len(expected),
        "total_recognized": len(recognized),
        "missed_words": missed_words[:10],  # First 10 only
    }

# Usage example
result = measure_accuracy(
    "recordings/2026-03-04-ted-talk.wav",
    "The greatest glory in living lies not in never falling "
    "but in rising every time we fall"
)
print(f"Accuracy: {result['accuracy_pct']}%")
print(f"Missed words: {result['missed_words']}")

Weekly Log Template

import json
from datetime import date, timedelta

LOG_FILE = "shadowing_progress.json"


def weekly_log(week_num: int, data: dict) -> dict:
    """Records weekly shadowing performance and compares it with the previous entry."""
    today = date.today()
    log = {
        "week": week_num,
        "period": f"{today - timedelta(days=6)} ~ {today}",
        "metrics": {
            "wpm": data.get("wpm", 0),
            "sync_rate_pct": data.get("sync_rate", 0),
            "pronunciation_accuracy_pct": data.get("accuracy", 0),
            "retelling_score": data.get("retelling", 0),
        },
        "practice_days": data.get("days_practiced", 0),
        "total_minutes": data.get("total_minutes", 0),
        "material_used": data.get("materials", []),
        "problem_sounds": data.get("problem_sounds", []),
        "notes": data.get("notes", ""),
    }

    # Load earlier entries; start with an empty list if the file does not exist yet
    try:
        with open(LOG_FILE, "r", encoding="utf-8") as f:
            all_logs = json.load(f)
    except FileNotFoundError:
        all_logs = []

    # Change in each metric relative to the most recently saved entry
    if all_logs:
        previous = all_logs[-1]["metrics"]
        log["change_vs_previous"] = {
            name: value - previous.get(name, 0)
            for name, value in log["metrics"].items()
        }

    all_logs.append(log)

    # Save cumulatively to the JSON file (UTF-8 so non-ASCII notes survive)
    with open(LOG_FILE, "w", encoding="utf-8") as f:
        json.dump(all_logs, f, indent=2, ensure_ascii=False)

    return log

# Usage example
weekly_log(5, {
    "wpm": 105,
    "sync_rate": 65,
    "accuracy": 72,
    "retelling": 5,
    "days_practiced": 6,
    "total_minutes": 90,
    "materials": ["TED: The power of vulnerability"],
    "problem_sounds": ["th sound", "r/l distinction", "word-final consonants"],
    "notes": "Linking has improved a lot, but th pronunciation is still difficult",
})

Plateau Breakthrough Strategies

After 4-6 weeks of shadowing, most learners hit a plateau. Don't quit at this point: a plateau is best read as a consolidation period in which the newly practiced patterns are still becoming automatic.

Plateau Signals and Countermeasures

| Plateau Signal | Cause | Breakthrough Strategy |
|---|---|---|
| "I think this is good enough" feeling | Settling for comfortable material | Increase the difficulty by one level |
| WPM stuck at the same level for 3 weeks | Using only same-speed material | Try shadowing at 1.1x-1.2x speed |
| Can shadow but can't speak freely | Lack of output practice | Extend Retelling time to 5 minutes after shadowing |
| Loss of motivation | Lack of perceived growth | Compare Week 1 recording with current recording |
| Specific pronunciation won't improve | Muscle pattern fixation | Focused training on that sound (minimal pair practice) |
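
The "WPM stuck for 3 weeks" signal is easy to check from weekly records. The following sketch assumes a simple list of weekly WPM readings. The function names and the 2-WPM tolerance for "the same level" are assumptions, and the second helper only shows the arithmetic behind faster playback.

```python
def is_plateau(weekly_wpm: list, weeks: int = 3, tolerance: float = 2.0) -> bool:
    """True if the last `weeks` readings differ by no more than `tolerance` WPM."""
    if len(weekly_wpm) < weeks:
        return False  # not enough history to judge
    recent = weekly_wpm[-weeks:]
    return max(recent) - min(recent) <= tolerance


def overspeed_wpm(material_wpm: float, playback_rate: float) -> float:
    """Effective speaking rate of material played back faster than normal."""
    return material_wpm * playback_rate


# Illustrative history: progress, then three flat weeks
history = [85, 95, 105, 110, 110, 110]
if is_plateau(history):
    print(f"Plateau detected; try material at about {overspeed_wpm(150, 1.2):.0f} WPM")
```

Material at 150 WPM played at 1.2x comes out at about 180 WPM, which is roughly the step from the intermediate to the upper-intermediate sources listed earlier.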

Minimal Pairs Focused Training

Practice the phoneme pairs that Korean speakers find most difficult for an additional 5 minutes daily.

R vs L:

  • right / light
  • road / load
  • correct / collect
  • crowd / cloud
  • Sentence: "The right light was really lovely."

B vs V:

  • base / vase
  • boat / vote
  • berry / very
  • best / vest
  • Sentence: "The best vote came from a very brave person."

F vs P:

  • fan / pan
  • feel / peel
  • coffee / copy
  • fast / past
  • Sentence: "I feel like peeling a fresh peach."

TH (/θ/, /ð/) vs S/T/D:

  • think / sink
  • three / tree (or free)
  • they / day
  • bath / bass
  • Sentence: "I think three thousand is the right number."
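
If you read these word pairs into a speech recognizer (for example, voice typing), you can tally which contrasts are being confused. This is a sketch under assumptions: the `(intended, recognized)` attempt format and the function name are hypothetical, and only a subset of the pairs above is included.

```python
# A subset of the minimal pairs listed above, grouped by contrast
MINIMAL_PAIRS = {
    "R/L": [("right", "light"), ("road", "load"), ("correct", "collect"), ("crowd", "cloud")],
    "B/V": [("base", "vase"), ("boat", "vote"), ("berry", "very"), ("best", "vest")],
    "F/P": [("fan", "pan"), ("feel", "peel"), ("fast", "past")],
    "TH/S": [("think", "sink"), ("bath", "bass")],
}


def confusion_counts(attempts: list) -> dict:
    """Count, per contrast, how often a word was recognized as its pair partner.

    attempts: list of (intended_word, recognized_word) tuples.
    """
    # Map each word to (its partner, the contrast it belongs to)
    partner = {}
    for contrast, pairs in MINIMAL_PAIRS.items():
        for first, second in pairs:
            partner[first] = (second, contrast)
            partner[second] = (first, contrast)

    counts = {contrast: 0 for contrast in MINIMAL_PAIRS}
    for intended, recognized in attempts:
        entry = partner.get(intended.lower())
        if entry is not None and recognized.lower() == entry[0]:
            counts[entry[1]] += 1
    return counts


# Illustrative session: two R/L confusions and one B/V confusion
session = [("right", "light"), ("vote", "boat"), ("fan", "fan"), ("road", "load")]
print(confusion_counts(session))
```

The contrast with the highest count is the one to prioritize in the extra 5 minutes.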

Tool Comparison: Shadowing Apps and Platforms

| Tool | Price | Script | Recording Comparison | Speed Control | Recommended For |
|---|---|---|---|---|---|
| ELSA Speak | ~$9/month | Yes | Yes (AI evaluation) | Yes | Pronunciation focused |
| Shadowing.app | Free/Premium | Yes | Yes | Yes | Shadowing-dedicated |
| YouTube + Recorder | Free | Subtitles | Manual | Yes (playback speed) | Zero-cost start |
| Audacity (desktop) | Free | No | Waveform comparison | Yes | Detailed analysis |
| Otter.ai | Free/$16.99/mo | Auto-generated | No | No | English meeting review |
| Speechling | Free/$19.99/mo | Yes | Yes (coach eval) | Yes | Professional coach feedback |

Recommended combination (cost minimized):

  • YouTube (material) + smartphone recorder (recording) + Google Docs voice typing (recognition rate check)

This combination lets you work through every stage at zero cost.

Shadowing vs Other Speaking Training Methods

| Training Method | Core Effect | Limitations | Role When Combined with Shadowing |
|---|---|---|---|
| Shadowing | Pronunciation, rhythm, speed | Weak free speaking skills | Foundation |
| Dictation | Listening, spelling | Not a speaking exercise | Listening supplement |
| Read Aloud | Pronunciation practice | Lacks natural rhythm | Accuracy supplement |
| Role Play | Real conversation | Can't do alone | Application expansion |
| Free Talking | Fluency | May ignore accuracy | Output expansion |
| Retelling | Summary/expression | No speed training | Comprehension check |

Best combination: 10 min shadowing + 5 min Retelling (daily) + 30 min Free Talking (twice weekly)

Real-World Scenario: Office Worker A's 90 Days

Profile: 32 years old, PM at an IT company, TOEIC 780, can follow English meetings but struggles to speak up

Week 1-2: Started with VOA Learning English. Day 1 synchronization rate 25%. Lost the rhythm most of the time. The shock: "My mouth doesn't move as well as I expected."

Week 3-4: Switched material to TED-Ed. 15 minutes every morning before work. Synchronization rate rose to 55%. Started recognizing linking patterns in particular (pick it up -> pi-ki-dup).

Week 5-6: Listened to recordings for the first time. Response: "This is my English?" Korean intonation carried over completely. Started focusing on th pronunciation and r/l distinction.

Week 7-8: Plateau. WPM stuck at 110 for 3 weeks. Switched to 1.2x speed TED Talk. Painful for 2 days, but returning to normal speed felt much more comfortable.

Week 9-10: Switched material to NPR Planet Money. Difficult at first because of the many economics terms, but work-related vocabulary expanded. After starting Retelling, A was able to open a contribution in English meetings with "I think the main point is..."

Week 11-12: First-attempt shadowing success rate 72%. WPM 128. Successfully spoke 3 consecutive sentences in an English meeting. Received feedback from a colleague: "Your English has really improved."

90-Day Final Results:

  • WPM: 85 -> 128 (50% improvement)
  • Meeting utterance frequency: 0-1 times -> 4-5 times/meeting
  • Google Speech-to-Text recognition rate: 58% -> 84%
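
The improvement figures above are relative changes against the starting value. A short sketch of that arithmetic follows. The helper name `percent_change` is an assumption, and the inputs are the before/after values from the list.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100


# WPM: 85 -> 128
print(f"WPM improvement: {percent_change(85, 128):.1f}%")

# Recognition rate: 58% -> 84% is a gain of 26 percentage points;
# the relative change is computed the same way as for WPM
print(f"Recognition rate gain: {84 - 58} points ({percent_change(58, 84):.1f}% relative)")
```

Keeping percentage points and relative percent separate avoids overstating or understating progress on metrics that are already expressed as percentages.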

Quiz

Q1. List the 4 stages of shadowing in order.

Answer: Mumbling -> Synchronized Shadowing -> Prosody Shadowing -> Content Shadowing

Q2. What is the appropriate speed (WPM) for beginners when selecting shadowing material?

Answer: 90-130 words per minute (WPM). VOA Learning English (90-110) or TED-Ed (110-130) are suitable.

Q3. Why doesn't shadowing alone sufficiently improve free speaking ability?

Answer: Since shadowing is "repeating," it lacks the training to construct your own thoughts in English. It must be combined with Retelling and Free Talking.

Q4. What is an effective breakthrough strategy when WPM has been stuck for 3 weeks during a plateau?

Answer: Shadow at 1.1x-1.2x playback speed, then return to 1.0x speed and you'll feel much more at ease. This is the principle of overload training - adapting to faster speeds.

Q5. What are the 3 most commonly confused phoneme pairs for Korean speakers in English pronunciation?

Answer: R/L distinction, B/V distinction, and TH/S (or D) distinction. These are corrected through minimal pair training.

Q6. What is the principle behind measuring pronunciation accuracy using Google Speech-to-Text?

Answer: The shadowing recording is converted to text via STT and compared word by word with the original script to calculate the match rate. This guide uses 85% or above as its target, on the assumption that speech a recognizer transcribes this reliably is also easy for listeners to follow.

Q7. Why is recording essential in shadowing practice?

Answer: Objective evaluation is impossible while you're speaking. You need to compare recordings with the original to specifically identify differences in stress placement, linking, and intonation patterns.

Q8. Why is "10 min shadowing + 5 min Retelling" the most efficient combination?

Answer: Shadowing trains input (listening) and imitation (pronunciation/rhythm), while Retelling adds output training by reconstructing in your own language, balancing input and output.
