Japanese Pronunciation and Pitch Accent — The Secret to Sounding Natural

Introduction

Even people who have studied Japanese for years often hit a wall when talking with native speakers: "My pronunciation is accurate, so why does it sound off?" The individual sounds (vowels and consonants) are nearly perfect, yet whole sentences come out unnatural.

Much of the cause lies not in individual sounds but in pitch accent and mora-timed rhythm. Unlike English, Japanese does not distinguish words by stress; it distinguishes them by pitch (how high or low the voice is). And unlike a syllable-timed language, Japanese builds its rhythm from moras (拍), rhythmic units of roughly equal length.

This article starts from the basics of the Japanese sound system, moves through the four pitch accent types, covers the sounds learners struggle with most, and ends with practical drills — all backed by a generous set of example tables. By the end, you should have a clear answer to why your Japanese has sounded unnatural.

The Basic Units of Japanese Sound

The Five Vowels

Japanese has only five vowels — far simpler than English or Korean.

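| Kana | Romaji | IPA (approx.) | Notes |
| --- | --- | --- | --- |
| あ | a | a | Open mouth |
| い | i | i | Lips spread wide |
| う | u | ɯ | Unrounded, flat lips |
| え | e | e | |
| お | o | o | |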

The tricky one is う. Unlike the English "oo," it is pronounced with the lips relaxed and flat rather than rounded and pushed forward. This is especially true in standard Tokyo speech.

The Consonant System

Japanese consonants are easiest to organize by kana row (行).

| Row | Consonant | Examples | Notes for learners |
| --- | --- | --- | --- |
| か row | k | か き く け こ | Do not turn into a tense stop word-initially |
| が row | g | が ぎ ぐ げ ご | Nasalized (鼻濁音) word-medially |
| さ row | s | さ し す せ そ | し is an sh sound |
| ざ row | z | ざ じ ず ぜ ぞ | Voiced fricatives, hard for many |
| た row | t | た ち つ て と | つ is the biggest hurdle |
| だ row | d | だ ぢ づ で ど | |
| な row | n | な に ぬ ね の | |
| は row | h | は ひ ふ へ ほ | ふ is a bilabial fricative |
| ば row | b | ば び ぶ べ ぼ | |
| ぱ row | p | ぱ ぴ ぷ ぺ ぽ | |
| ま row | m | ま み む め も | |
| や row | y | や ゆ よ | |
| ら row | r | ら り る れ ろ | A flap, not English r or l |
| わ row | w | わ を | |

The ざ row, つ, ふ, and the ら row get detailed treatment later.

Special Beats: Long Vowels, Geminates, and the Moraic Nasal

To understand Japanese rhythm you must know three special beats. Each occupies one full mora (beat).

| Name | Example | Nature | Meaning |
| --- | --- | --- | --- |
| Long vowel (長音) | おばあさん | Extends a vowel by one beat | Grandmother |
| Geminate (促音) | きって | っ, a one-beat consonant hold | Postage stamp |
| Moraic nasal (撥音) | ほん | ん, a one-beat nasal | Book |

The fact that each takes a full beat is central to Japanese rhythm. Learners often rush through these beats, which is a major source of unnaturalness.

The Sense of Mora Timing

Syllable Timing vs. Mora Timing

English is largely stress-timed and Korean is roughly syllable-timed. Japanese, by contrast, is mora-timed. A mora is a minimal rhythmic unit of nearly equal length.

The basic rules:

| Element | Mora count | Explanation |
| --- | --- | --- |
| One kana (plain or voiced) | 1 mora | か, し, ぐ, etc. |
| Contracted sound (拗音) きゃ | 1 mora | The small ゃ, ゅ, or ょ fuses with the preceding kana into a single mora |
| Long vowel | 1 mora | The lengthened vowel is a separate beat |
| Geminate っ | 1 mora | The pause is a beat |
| Moraic nasal ん | 1 mora | The nasal is a beat |

A Mora-Counting Table

Let us count moras word by word — cases where learners often drop a beat.

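| Word | Meaning | Kana | Mora breakdown | Mora count |
| --- | --- | --- | --- | --- |
| 東京 | Tokyo | とうきょう | と/う/きょ/う | 4 |
| 学校 | School | がっこう | が/っ/こ/う | 4 |
| 切手 | Stamp | きって | き/っ/て | 3 |
| 新聞 | Newspaper | しんぶん | し/ん/ぶ/ん | 4 |
| 病院 | Hospital | びょういん | びょ/う/い/ん | 4 |
| 先生 | Teacher | せんせい | せ/ん/せ/い | 4 |
| 空港 | Airport | くうこう | く/う/こ/う | 4 |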

The key point is that とうきょう sounds like a two-syllable "Tokyo" to an English ear but is four moras in Japanese. Each of と, う, きょ, う must be given equal weight. In particular, do not drop the long vowel う.
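The counting rules above can be sketched in a few lines of Python. This is a minimal illustration, not a full tokenizer: the function name `split_moras` and the `SMALL_KANA` set are assumptions made for this example, and the input is assumed to be plain kana with no kanji or punctuation.

```python
# Small kana that fuse with the preceding kana into one mora (contracted sounds).
# Note that small っ is deliberately NOT in this set: the geminate is its own mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def split_moras(kana: str) -> list[str]:
    """Split a kana string into moras using the basic counting rules."""
    moras: list[str] = []
    for ch in kana:
        if ch in SMALL_KANA and moras:
            # Attach the small kana to the previous mora (き + ょ -> きょ).
            moras[-1] += ch
        else:
            # Plain kana, long-vowel kana, っ, ん, and ー each take one beat.
            moras.append(ch)
    return moras


for word in ["とうきょう", "がっこう", "きって", "びょういん"]:
    parts = split_moras(word)
    print(word, "/".join(parts), len(parts))
```

Running it on the words in the table reproduces the breakdowns shown there, for example と/う/きょ/う for とうきょう.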

The Principle of Pitch Accent

Why Pitch Matters

Japanese distinguishes words not by stress but by pitch patterns. Each mora carries one of two pitches: High (H) or Low (L).

Consider the most famous minimal pair (Tokyo dialect):

| Word | Kana | Pitch pattern | Meaning |
| --- | --- | --- | --- |
| 橋 | はし | LH | Bridge |
| 箸 | はし | HL | Chopsticks |
| 端 | はし | LH (distinguished by particle) | Edge |

The same はし becomes entirely different words depending on pitch. Chopsticks (箸) are high then low (HL); bridge (橋) is low then high (LH).

Two Cardinal Rules of Pitch

Tokyo-dialect pitch has two absolute rules.

| Rule | Content |
| --- | --- |
| Rule 1 | The first and second mora must differ in pitch |
| Rule 2 | Once pitch falls within a word, it never rises again |

If the first mora is high, the second must be low, and vice versa. After a fall, pitch stays low for the rest of the word. The last high mora before the fall is called the accent kernel (核).
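As a quick way to check a pattern against these two rules, here is a small Python sketch. It assumes patterns are written as strings of `H` and `L`, one letter per mora; the function name `follows_tokyo_rules` is made up for this example.

```python
def follows_tokyo_rules(pattern: str) -> bool:
    """Return True if an H/L string obeys the two Tokyo-dialect rules."""
    if not pattern or set(pattern) - {"H", "L"}:
        raise ValueError("pattern must be a non-empty string of H and L")

    # Rule 1: the first and second mora must differ in pitch.
    if len(pattern) >= 2 and pattern[0] == pattern[1]:
        return False

    # Rule 2: once pitch falls (H followed by L), it never rises again.
    fall = pattern.find("HL")
    if fall != -1 and "H" in pattern[fall + 2:]:
        return False

    return True


print(follows_tokyo_rules("LHL"))   # a rise followed by one fall
print(follows_tokyo_rules("LHLH"))  # rises again after the fall
```

The first call passes both rules; the second breaks Rule 2 because pitch rises again after the fall.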

The Four Pitch Accent Types

Tokyo-dialect word accents fall into four broad types, defined by where the kernel sits.

Overview Table

| Type | Japanese | Kernel position | Pattern (3-mora) | Feature |
| --- | --- | --- | --- | --- |
| Flat | 平板型 | None | LHH | Stays high through the particle |
| Head-high | 頭高型 | First mora | HLL | Only the first beat is high |
| Middle-high | 中高型 | Middle | LHL | The middle is high |
| Tail-high | 尾高型 | Last mora | LHH (falls on particle) | Word is high, particle is low |

A particle here means a following word such as が, を, or は. Flat and tail-high look identical (LHH) on the word alone; the difference emerges the moment a particle attaches.
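The four-way classification reduces to a lookup on the kernel position. The sketch below assumes the numbering convention used by many accent dictionaries (0 for no kernel, k for a kernel on the k-th mora); the function name `accent_type` is illustrative only.

```python
def accent_type(n_moras: int, kernel: int) -> str:
    """Name the accent type of a word from its mora count and kernel position.

    kernel = 0 means no kernel; kernel = k means the k-th mora is the last high one.
    """
    if n_moras < 1 or not 0 <= kernel <= n_moras:
        raise ValueError("kernel must be between 0 and the mora count")
    if kernel == 0:
        return "flat"
    if kernel == 1:
        return "head-high"
    if kernel == n_moras:
        return "tail-high"
    return "middle-high"


print(accent_type(3, 0))  # flat
print(accent_type(3, 3))  # tail-high
```

Note that flat and tail-high only differ in the kernel value, which matches the point that the two are indistinguishable until a particle follows.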

Flat (平板型)

A kernel-less type. Only the first mora is low; the rest stays high all the way, even through the particle. It is the most common type in Tokyo speech.

| Word | Kana | Word pattern | With particle |
| --- | --- | --- | --- |
| 桜 | さくら | LHH | さくらが LHHH |
| 学生 | がくせい | LHHH | がくせいが LHHHH |
| 日本語 | にほんご | LHHH | にほんごが LHHHH |
| 友達 | ともだち | LHHH | ともだちが LHHHH |

The particle が staying high is the decisive marker of the flat type.

Head-High (頭高型)

Only the first mora is high, dropping sharply from the second — a high-then-low type. The kernel is on the first mora.

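| Word | Kana | Pattern | Meaning |
| --- | --- | --- | --- |
| 雨 | あめ | HL | Rain |
| 箸 | はし | HL | Chopsticks |
| 猫 | ねこ | HL | Cat |
| 電気 | でんき | HLL | Electricity |
| 天気 | てんき | HLL | Weather |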

Hit the first beat clearly high and drop immediately.

Middle-High (中高型)

The kernel sits in the middle, so the contour rises then falls. The first mora is low, it rises, and it drops again at the kernel.

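| Word | Kana | Pattern | Meaning |
| --- | --- | --- | --- |
| お菓子 | おかし | LHL | Sweets |
| 卵 | たまご | LHL | Egg |
| 一人 | ひとり | LHL | One person |
| 心 | こころ | LHL | Heart, mind |
| 湖 | みずうみ | LHHL | Lake |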

Aim for a peak in a middle mora, then come down.

Tail-High (尾高型)

The word itself stays high to the end, but the moment a particle attaches, that particle drops. The kernel is on the last mora of the word.

| Word | Kana | Word pattern | With particle |
| --- | --- | --- | --- |
| 花 | はな | LH | はなが LHL |
| 男 | おとこ | LHH | おとこが LHHL |
| 山 | やま | LH | やまが LHL |
| 弟 | おとうと | LHHH | おとうとが LHHHL |

Flat vs. Tail-High

This distinction confuses learners most, because the words sound identical in isolation. You must attach a particle to tell them apart.

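| Word | Kana | Type | Word alone | With particle |
| --- | --- | --- | --- | --- |
| 端 | はし | Flat | LH | はしが LHH |
| 橋 | はし | Tail-high | LH | はしが LHL |
| 花 | はな | Tail-high | LH | はなが LHL |
| 鼻 | はな | Flat | LH | はなが LHH |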

If はな means nose (鼻) it is flat, so the particle stays high (LHH); if it means flower (花) it is tail-high, so the particle drops (LHL).

Accent Types by Kernel Position

Organizing by which mora holds the kernel in an n-mora word yields a clean rule.

| Kernel position | Type name | 3-mora pattern |
| --- | --- | --- |
| No kernel | Flat | LHH (particle H) |
| 1st | Head-high | HLL |
| 2nd | Middle-high | LHL |
| 3rd (last) | Tail-high | LHH (L on particle) |
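Because the kernel position fully determines the contour, the table can be generated rather than memorized. The following is a minimal sketch under the two cardinal rules given earlier; `pitch_pattern` and its `particle` flag are names invented for this example, and the kernel is numbered from 1 with 0 meaning "no kernel."

```python
def pitch_pattern(n_moras: int, kernel: int, particle: bool = False) -> str:
    """Build the H/L contour of a word, optionally with one following particle."""
    if n_moras < 1 or not 0 <= kernel <= n_moras:
        raise ValueError("kernel must be between 0 and the mora count")

    total = n_moras + (1 if particle else 0)
    contour = []
    for position in range(1, total + 1):
        if kernel == 0:
            # Flat: low first mora, then high through the particle.
            high = position > 1
        elif kernel == 1:
            # Head-high: only the first mora is high.
            high = position == 1
        else:
            # Middle-high or tail-high: high from the 2nd mora up to the kernel.
            high = 2 <= position <= kernel
        contour.append("H" if high else "L")
    return "".join(contour)


for kernel in range(4):
    print(kernel, pitch_pattern(3, kernel), pitch_pattern(3, kernel, particle=True))
```

For a 3-mora word this prints one row per kernel position, and the flat and tail-high rows differ only in the final letter of the with-particle column.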

Homophones and Pitch

Here are more pairs where pitch alone carries the meaning — often distinguished without any context.

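| Kana | Pattern A | Meaning A | Pattern B | Meaning B |
| --- | --- | --- | --- | --- |
| あめ | HL | Rain (雨) | LH | Candy (飴) |
| はし | HL | Chopsticks (箸) | LH | Bridge (橋) |
| かき | HL | Oyster (牡蠣) | LH | Persimmon (柿) |
| いま | HL | Now (今) | LH | Living room (居間) |
| かみ | HL | God (神) | LH (tail-high) | Paper (紙), hair (髪) |
| せき | HL | Seat (席) | LH | Cough (咳) |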

Native speakers distinguish these perfectly and unconsciously. Learners pick them up much faster by memorizing minimal pairs in sets.

Sounds That Are Especially Hard

つ (tsu)

つ is an affricate that is absent from Korean and never begins a native English word (compare the ts at the end of "cats"). Do not substitute an "s" or "ch." Touch the tongue tip behind the upper teeth, then release the t directly into an s as a single burst.

| Wrong substitute | Correct direction | Example word |
| --- | --- | --- |
| Replace with "s" | ts as one burst | つき (moon) |
| Replace with "ch" | ts as one burst | つくえ (desk) |
| Replace with "tu" | ts as one burst | みつ (honey) |

The ざ Row (Voiced Fricatives and Affricates)

ざ, じ, ず, ぜ, ぞ are voiced sounds that learners often devoice or slide into a "j" sound.

| Kana | Common mistake | Target sound |
| --- | --- | --- |
| ざ | Slides to "ja" | Voiced z fricative |
| じ | Devoiced "chi" | Voiced j |
| ず | Devoiced | Voiced z |
| ぜ | Devoiced | Voiced z |
| ぞ | Devoiced | Voiced z |

They are harder word-medially and word-finally than word-initially. Practice with words like かぜ (wind) and みず (water).

Voicing Contrasts (清濁)

Many languages auto-voice consonants medially, but in Japanese the voiced-versus-voiceless contrast carries meaning.

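| Voiceless (清音) | Meaning | Voiced (濁音) | Meaning |
| --- | --- | --- | --- |
| かい | Shellfish | がい | Harm |
| てん | Point | でん | Electric (bound form) |
| こま | Spinning top | ごま | Sesame |
| たい | Sea bream | だい | Stand, base |
| きん | Gold | ぎん | Silver |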

Train yourself to clearly separate か and が, た and だ word-initially in particular.

Long Vowels (長音)

Dropping or adding a long vowel produces a completely different word — one of the most common learner errors.

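| Short | Meaning | Long | Meaning |
| --- | --- | --- | --- |
| おばさん | Aunt | おばあさん | Grandmother |
| おじさん | Uncle | おじいさん | Grandfather |
| ゆき | Snow | ゆうき | Courage |
| とる | To take | とおる | To pass through |
| ここ | Here | こうこう | High school |
| びよういん | Beauty salon | びょういん | Hospital |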

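The last pair, びよういん (salon) and びょういん (hospital), is a famous example. It differs not in vowel length but in whether よ is a full-sized kana with its own beat (び/よ/う/い/ん, 5 moras) or a small ょ fused into the contracted sound びょ (びょ/う/い/ん, 4 moras).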

Geminates っ

Rushing through a geminate yields a different word.

| No geminate | Meaning | With geminate | Meaning |
| --- | --- | --- | --- |
| いか | Squid | いっか | A family |
| かこ | Past | かっこ | Parentheses |
| さか | Slope | さっか | Author |
| ぶか | Subordinate | ぶっか | Prices (物価) |

The ら Row (Flap)

The Japanese ら row is neither English r nor l but a flap (a single light tap of the tongue). It resembles syllable-initial ㄹ in Korean but is softer.

| Kana | Common mistake | Target |
| --- | --- | --- |
| | Rolled English r | Light flap |
| | Pronounced as l | Light flap |
| | Over-rolled | Light flap |

ふ (Bilabial Fricative)

ふ is not made with the upper teeth and lower lip like English f, but by narrowing both lips and pushing air through.

| Wrong direction | Correct direction |
| --- | --- |
| English f (teeth + lip) | Air between both lips |
| Plain "hu" | Narrow the lips |

Nasalized が Row (鼻濁音)

In traditional Tokyo speech, the が row nasalizes word-medially, adding a nasal quality. It is fading nowadays but survives in announcer speech.

| Word | Medial が row | Nasalized? |
| --- | --- | --- |
| 学校 | None (initial) | Not nasalized |
| 鏡 かがみ | が | Traditionally nasalized |
| りんご | ご | Traditionally nasalized |

Intonation and Rhythm

Sentence-Level Intonation

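When word accents combine into a sentence, a gentle overall downward drift appears. It is commonly discussed as downstep: after each accented phrase, the pitch range available to the following words is lowered, so overall pitch tends to sink gradually toward the end of the sentence.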

| Element | Explanation |
| --- | --- |
| Downstep | The overall register lowers toward the end |
| Phrase boundary | A slight reset at each meaning unit |
| Sentence-final | Questions rise at the end; statements fall |

Even Beats Keep the Rhythm Alive

The single biggest trick to natural-sounding Japanese is giving each mora equal length. Speakers of stress-timed languages tend to lengthen important beats and shorten the rest, which breaks Japanese's even rhythm.

| Word | Wrong rhythm | Correct rhythm |
| --- | --- | --- |
| ありがとう | "arigato" (like 4 beats) | あ/り/が/と/う, 5 even moras |
| がっこう | "gakko" (rushed) | が/っ/こ/う, 4 even moras |
| おはよう | "ohayo" (3 beats) | お/は/よ/う, 4 even moras |
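If you can measure how long you held each mora in a recording (for example by marking boundaries in an audio editor), a rough evenness check is easy to script. The sketch below is only an illustration: the function name `rushed_moras`, the 30% tolerance, and the sample durations are all assumptions, and real native speech is only approximately even, so treat the output as a hint rather than a verdict.

```python
from statistics import mean


def rushed_moras(timed: list[tuple[str, float]], tolerance: float = 0.3) -> list[str]:
    """Return the moras whose duration is well below the word's average.

    timed     -- (mora, duration in milliseconds) pairs, in spoken order
    tolerance -- how far below the average a mora may fall before it is flagged
    """
    average = mean(duration for _, duration in timed)
    floor = average * (1 - tolerance)
    return [mora for mora, duration in timed if duration < floor]


# Hypothetical measurements for a rushed がっこう: the geminate is cut short.
gakkou = [("が", 150), ("っ", 60), ("こ", 150), ("う", 140)]
print(rushed_moras(gakkou))
```

With these made-up numbers the average is 125 ms, so only the 60 ms geminate falls below the 87.5 ms floor and gets flagged.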

Practical Drills

Shadowing (シャドーイング)

The single most effective pronunciation drill. While listening to native audio, follow along like a shadow about half a second behind.

| Stage | Content | Goal |
| --- | --- | --- |
| Stage 1 | Confirm sounds while reading the script | Grasp pitch and beats |
| Stage 2 | Speak in sync while reading the script | Match the rhythm |
| Stage 3 | Shadow without the script | Internalize intonation |
| Stage 4 | Record and compare | Self-correct |

Minimal-Pair Training

Repeat pitch-distinguished pairs as sets. Pairing あめ (rain/candy) and はし (chopsticks/bridge) builds a fast sense of pitch.

Using OJAD

OJAD (Online Japanese Accent Dictionary), built at the University of Tokyo, is a free accent dictionary. It shows the pitch curve of words and sentences visually and provides audio. Strongly recommended for learners.

Record and Self-Compare

Recording your own voice and comparing waveforms and pitch with a native speaker reveals objectively where you differ. It is especially good for checking the length of long vowels, geminates, and the moraic nasal.
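Raw pitch values are hard to compare across speakers because voices sit in different ranges. One common workaround is to convert each speaker's pitch track to semitones relative to that speaker's own median, so that only the shape of the contour remains. The sketch below assumes you already have a list of fundamental-frequency values in Hz from some pitch tracker, with `None` for unvoiced frames; the function name `to_semitones` is hypothetical.

```python
import math
from statistics import median
from typing import Optional


def to_semitones(
    f0_hz: list[Optional[float]], ref_hz: Optional[float] = None
) -> list[Optional[float]]:
    """Convert a pitch track in Hz to semitones relative to a reference.

    Unvoiced frames (None) are passed through unchanged. If no reference is
    given, the median of the voiced frames is used.
    """
    voiced = [f for f in f0_hz if f is not None]
    if not voiced:
        raise ValueError("pitch track has no voiced frames")
    ref = ref_hz if ref_hz is not None else median(voiced)
    # 12 semitones per octave; one octave is a doubling of frequency.
    return [None if f is None else 12 * math.log2(f / ref) for f in f0_hz]


# Made-up tracks for a low voice and a high voice saying an LH word.
print(to_semitones([100.0, 100.0, 130.0, 130.0]))
print(to_semitones([200.0, 200.0, 260.0, 260.0]))
```

Both made-up tracks give the same semitone contour even though one voice is an octave higher, which is the kind of comparison you want between your recording and a native model.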

A Sample Practice Routine

| Day | Activity | Time |
| --- | --- | --- |
| Mon | 30 minimal pairs | 15 min |
| Tue | Shadow a 1-minute script | 20 min |
| Wed | Check new-word accent on OJAD | 15 min |
| Thu | Record and compare | 20 min |
| Fri | Shadow news audio | 20 min |

Dialects — Tokyo-Type and Keihan-Type

Japanese pitch accent varies greatly by region, splitting into two broad systems.

| Category | Representative area | Feature |
| --- | --- | --- |
| Tokyo-type (東京式) | Tokyo, most of eastern Japan | The standard baseline; first two moras always differ |
| Keihan-type (京阪式) | Osaka, Kyoto, Kansai | Can start high on the first mora; more complex patterns |

The same word can have opposite pitch depending on the region.

| Word | Tokyo-type | Keihan-type (Kansai) |
| --- | --- | --- |
| 橋 はし | LH (bridge) | Differs, HL-family |
| 雨 あめ | HL | Differs, LH-family |

Early on, focus on the standard Tokyo-type alone. Just knowing that Kansai content (manzai, dramas) uses different pitch will reduce confusion when you hear it.

Common Misconceptions

"You do not need to learn pitch"?

Meaning still gets across with accurate sounds. But mismatched pitch is immediately heard as a "foreign accent," and minimal pairs can cause real misunderstanding. If you aim for advanced fluency, pitch is essential.

"Knowing the kanji reading is enough"?

The same kanji can take different accent types depending on the word. Memorize accent together with each word.

"Speaking slowly makes it natural"?

Often the opposite. You should maintain even mora timing at a natural speed to keep the rhythm alive. Too slow, and the sense of beat collapses.

Conclusion

The real key to Japanese pronunciation is not individual sounds but pitch accent and mora rhythm. The five vowels and the consonant system are relatively easy to acquire, but the four accent types (flat, head-high, middle-high, tail-high) and the sense of even timing require conscious training.

To summarize: first, build the habit of counting moras evenly. Second, internalize pitch differences through minimal pairs. Third, compare yourself with real audio using shadowing and OJAD. Fourth, never rush the beats of long vowels, geminates, and the moraic nasal.

Practice these four consistently and you can move past the "accurate but off" wall to genuinely natural Japanese.
