Chapter 1 — The Sounds of Japanese

Japanese has a sound system that differs from English in fundamental ways — not just in which sounds exist, but in how rhythm, length, and pitch operate. This chapter lays out those differences honestly. Understanding them now will save you from persistent listening problems later. Most textbooks skip or minimize this material. That is a mistake.


1.1 Mora, Not Syllables

English is a stress-timed language. Some syllables are long and loud, others are crushed and swallowed. The word "comfortable" has four syllables in the dictionary, but most speakers produce something closer to three. Japanese does not work this way.

Japanese is a mora-timed language. Each kana character represents one mora — a unit of timing that receives roughly equal duration. The word おばあさん (obaasan, "grandmother") is not three syllables. It is five morae: お・ば・あ・さ・ん. A Japanese speaker gives each of those five units approximately the same length. English speakers tend to rush through them and produce something that sounds like "obasan" — which is a different word entirely (おばさん, "aunt," four morae).

This distinction between mora-timing and stress-timing is not academic trivia. It is the rhythmic foundation of the language. In English, you can stretch or compress syllables freely without changing meaning. In Japanese, the number of morae in a word is fixed and meaningful. If you produce three morae where the word requires four, you have said a different word.

Three elements in Japanese each occupy exactly one mora despite not being full syllables in the English sense:

The moraic nasal ん counts as its own mora. The word さんぽ (sanpo, "walk") has three morae, さ・ん・ぽ, even though an English speaker would hear only two syllables. The ん receives the same duration as さ or ぽ.

Small っ (the geminate marker) counts as its own mora. It represents a beat of held silence before the following consonant is released. The word きって (kitte, "stamp") has three morae: き・っ・て. Compare きて (kite, "come"), which has only two morae: き・て.

Long vowels add an extra mora. The あ after ば in おばあさん is a separate unit of time, not merely a longer version of the same syllable.

Consider these minimal pairs, where mora count alone changes meaning:

Word | Romaji | Morae | Meaning
きて | kite | 2 (き・て) | come
きって | kitte | 3 (き・っ・て) | stamp / cut (command)
おばさん | obasan | 4 (お・ば・さ・ん) | aunt
おばあさん | obaasan | 5 (お・ば・あ・さ・ん) | grandmother

If you do not internalize mora timing, you will consistently mishear words. This is not a minor detail.
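Because each kana is one beat, mora counting can be sketched mechanically. The Python below is a minimal illustration, not a complete tokenizer: it assumes hiragana-only input, and it makes one assumption beyond this section, namely that small ゃ, ゅ, ょ (and small ぁ, ぃ, ぅ, ぇ, ぉ) merge with the kana before them into a single mora. The function names split_morae and count_morae are hypothetical.

```python
from typing import List

# Small kana that merge with the preceding kana into one mora (e.g. き + ょ = きょ).
# Small っ is deliberately NOT in this set: it is a full mora on its own.
COMBINING_SMALL_KANA = set("ゃゅょぁぃぅぇぉ")


def split_morae(word: str) -> List[str]:
    """Split a hiragana word into morae (simplified sketch, hiragana only)."""
    morae: List[str] = []
    for ch in word:
        if ch in COMBINING_SMALL_KANA and morae:
            morae[-1] += ch  # attach to the previous kana
        else:
            morae.append(ch)  # every other kana, including ん and っ, is one beat
    return morae


def count_morae(word: str) -> int:
    return len(split_morae(word))


if __name__ == "__main__":
    for w in ["きて", "きって", "おばさん", "おばあさん", "さんぽ"]:
        # e.g. "おばあさん お・ば・あ・さ・ん 5"
        print(w, "・".join(split_morae(w)), count_morae(w))
```

Run on the minimal pairs above, the counts differ by exactly one wherever っ or a vowel-extension kana is present, which is the whole contrast.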


1.2 The Five Vowels

Japanese has exactly five vowels. They are simple, pure, and stable. The following table shows where each vowel sits in terms of tongue position — how far forward or back in your mouth, and how open or closed your jaw is.

How to use this chapter: Focus on the descriptions and English comparisons. The IPA symbols in square brackets are reference notation — you don't need to memorize them. They are included so you can look up precise pronunciations later if you wish.

Vowel | Romaji | IPA | Approximate quality
あ | a | [a] | Open, like the "a" in "father"; never like "cat" or "cake"
い | i | [i] | Close front, like "ee" in "see"; never reduced to "ih"
う | u | [ɯ] | Close back, unrounded; not like English "oo"
え | e | [e] | Mid front, like "e" in "bet"; never like "ay"
お | o | [o] | Mid back, rounded, like "o" in "coat" but without the glide

Three properties distinguish these vowels from their English counterparts.

First, Japanese vowels are pure — they do not diphthongize. English vowels frequently glide: the "o" in "go" actually moves from [o] toward [ʊ], producing a diphthong [oʊ]. Japanese お does not glide. It starts as [o] and stays [o]. Similarly, English "ay" in "day" is really [eɪ] — a vowel that shifts. Japanese え is a steady [e] throughout.

Second, Japanese vowels show no reduction. In English, unstressed vowels collapse into a neutral "schwa" sound [ə], as in the second syllable of "sofa" or the first syllable of "about." Japanese has no schwa. Every vowel keeps its full quality regardless of position or speaking rate: the あ in the middle of a long word is the same あ you would produce in isolation.

Third, the Japanese う is unrounded. English speakers producing "oo" push their lips forward into a tight circle. For Japanese う [ɯ], the lips remain relatively flat and relaxed, while the tongue position is similar to English "oo." Rounding the lips here is one of the most common errors English speakers make, and Japanese listeners notice it immediately. To produce う, aim for the vowel quality of "oo" while keeping your lips in a neutral, slightly spread position.


1.3 Consonants

Most Japanese consonants will feel reasonably familiar to English speakers. A few, however, differ in ways that matter for both listening and production.

The Japanese r — Between "d," "l," and "r"

The Japanese r-sound (ら・り・る・れ・ろ — ra ri ru re ro) is neither the English "r" nor the English "l." It is an alveolar tap [ɾ]: the tip of the tongue strikes the alveolar ridge (the bump behind your upper front teeth) once, briefly, and drops away. It lives somewhere between English "d," "l," and "r."

If you have ever produced a fast, casual American English "d" in a word like "butter" or "ladder" — that quick tap in the middle — you are close to the Japanese r. Do not curl your tongue back as you would for English "r." Do not press your tongue against the ridge and hold it there as you would for English "l." One quick tap. That is it. All five sounds らりるれろ use this same mechanism.

Practice words: さくら (sakura, cherry blossom), そら (sora, sky), りんご (ringo, apple).

ふ — Not Quite an F

The kana ふ is typically romanized as "fu," but it is not produced like English "f." English "f" is a labiodental fricative [f] — the lower lip presses against the upper teeth, and air is forced through the gap. Japanese ふ is a bilabial fricative [ɸ]: air passes between the two lips, which are brought close together but do not touch the teeth. The sound is softer and less sharp than English "f."

To produce it, bring both lips close together — as if you were about to blow out a candle very gently — and let air pass between them. No teeth are involved. Practice: ふね (fune, ship), ふゆ (fuyu, winter).

ん — One Letter, Many Sounds

The kana ん represents a moraic nasal that takes one full mora of time. Its actual pronunciation shifts depending on what follows it. All of these realizations belong to a single phoneme (one sound unit); only the surface form changes, automatically, according to the environment:

Environment | Realization | IPA | Example
Before b, p, m | Bilabial nasal | [m] | さんぽ (sanpo) → [sampo]
Before t, d, n | Alveolar nasal | [n] | さんど (sando) → [sando]
Before k, g | Velar nasal | [ŋ] | さんかい (sankai) → [saŋkai]
End of utterance | Uvular nasal | [ɴ] | ほん (hon) → [hoɴ]

You do not need to consciously control this variation. If you simply hold the ん for its full mora of time, your mouth will naturally move into the correct position for the following sound. What matters is that you recognize ん as a full beat of time, not a quick throwaway nasal tacked onto the previous syllable.
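The table amounts to a lookup from the following sound to the nasal's place of articulation. Here is a minimal Python sketch of that lookup, assuming the following sound is given as Hepburn romaji and covering only the four environments listed; the name n_realization is hypothetical.

```python
from typing import Optional

# The four environments from the table, keyed on the first romaji letter of
# whatever follows ん. Other environments are intentionally not modeled.
_N_BY_NEXT_LETTER = {
    **dict.fromkeys("bpm", "m"),  # bilabial nasal
    **dict.fromkeys("tdn", "n"),  # alveolar nasal
    **dict.fromkeys("kg", "ŋ"),   # velar nasal
}


def n_realization(next_romaji: Optional[str]) -> str:
    """Predict the IPA realization of ん from the mora that follows it.

    next_romaji is the romaji of the following mora (e.g. "po"),
    or None / "" when ん ends the utterance.
    """
    if not next_romaji:
        return "ɴ"  # uvular nasal at the end of an utterance
    first = next_romaji[0].lower()
    if first not in _N_BY_NEXT_LETTER:
        raise ValueError(f"environment {next_romaji!r} is not covered by the table")
    return _N_BY_NEXT_LETTER[first]
```

For example, n_realization("po") gives "m" for さんぽ and n_realization(None) gives "ɴ" for ほん. Raising an error outside the table is a deliberate choice: before vowels, s, h, and some other sounds, ん has realizations the table does not list, so the sketch does not guess.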

Sounds That Don't Match Their Row

Several kana do not match what their row membership would predict:

  • し is [ɕi], not [si]. It sounds close to English "she," but the blade of the tongue approaches the alveolo-palatal region rather than the postalveolar area.
  • ち is [tɕi], not [ti]. It sounds like the "chee" in English "cheese."
  • つ is [tsɯ], not [tu]. It is an affricate: the tongue briefly stops airflow behind the teeth, then releases into a hissing [s].
  • ひ is [çi], a voiceless palatal fricative, softer and more forward than English "h."
  • ふ is [ɸɯ], the bilabial fricative described above.

These are not exceptions to apologize for. They are simply the sounds of modern Japanese, shaped by historical sound changes. The rest of the consonant inventory (k, s, t, n, m, y, w, g, z, d, b, p, and the h of は, へ, ほ) is close enough to English equivalents that no special instruction is needed at this stage.

Brief consonant inventory overview

Row | Romaji | Notes
か行 | ka ki ku ke ko | Like English k
さ行 | sa shi su se so | し = [ɕi]
た行 | ta chi tsu te to | ち = [tɕi], つ = [tsɯ]
な行 | na ni nu ne no | Like English n
は行 | ha hi fu he ho | ひ = [çi], ふ = [ɸɯ]
ま行 | ma mi mu me mo | Like English m
や行 | ya — yu — yo | Glides
ら行 | ra ri ru re ro | Alveolar tap [ɾ]
わ行 | wa — — — (w)o | を is pronounced [o] in modern standard Japanese


1.4 Devoicing

This section describes a phenomenon that most beginner textbooks ignore and that causes real comprehension problems.

The vowels い [i] and う [ɯ] are regularly devoiced — produced without vocal cord vibration — when they occur between two voiceless consonants (k, s, sh, t, ch, ts, h, f, p), or between a voiceless consonant and a pause (such as the end of a sentence). The vowel is still technically present in terms of tongue and lip position, but the voicing is stripped away, making it nearly inaudible.

This means that common words sound very different from what a naive reading of the romaji would predict:

Written | Naive reading | Actual pronunciation | Meaning
です | desu | [des] | is / am / are (copula)
ます | masu | [mas] | polite verb ending
すき | suki | [sɯ̥ki] or [ski] | like / fond of
くさ | kusa | [kɯ̥sa] or [ksa] | grass
ひと | hito | [çi̥to] or [çto] | person
きく | kiku | [ki̥kɯ] or [ki̥kɯ̥] | to listen

The devoicing of です is especially important. Learners who expect to hear "de-su" at the end of every polite sentence will be confused when they hear [des] instead, which is the usual pronunciation in natural speech. The same applies to ます, which routinely sounds like [mas].

This process is automatic and natural — it is not "lazy" pronunciation or casual speech. It occurs in formal registers, news broadcasts, and careful speech just as readily as in casual conversation. It is a regular phonological rule of standard Japanese.

The rule is predictable: voiceless consonants on both sides (or a voiceless consonant followed by a pause) trigger devoicing of high vowels い and う. The vowels あ, え, and お are not affected.

Why this matters for listening: if you expect to hear [desɯ] and the speaker produces [des], you may not recognize the word. If you expect [sɯki] and hear [ski], you will be confused. Once you know that devoicing is systematic, words that previously sounded garbled become transparent. This single piece of knowledge will immediately improve your listening comprehension.
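The devoicing rule is mechanical enough to write down as code. The Python sketch below takes a word already split into romaji morae (such as ["de", "su"]) and flags which vowels the stated rule predicts will devoice. It applies the rule exactly as stated and nothing more: it does not model factors that make devoicing vary in real speech, it handles only plain morae such as "shi" or "tsu" (not "kyo" or っ), and the names split_mora and devoiced_positions are hypothetical.

```python
from typing import List, Tuple

# Voiceless consonants from the rule, as they appear at the start of romaji morae.
VOICELESS_ONSETS = {"k", "s", "sh", "t", "ch", "ts", "h", "f", "p"}
HIGH_VOWELS = {"i", "u"}


def split_mora(mora: str) -> Tuple[str, str]:
    """Split a plain romaji mora into (onset, vowel): "shi" -> ("sh", "i")."""
    if mora == "n":  # moraic nasal ん: voiced, no vowel
        return "n", ""
    if mora and mora[-1] in "aiueo":
        return mora[:-1], mora[-1]
    raise ValueError(f"unsupported mora: {mora!r}")


def devoiced_positions(morae: List[str], devoice_before_pause: bool = True) -> List[bool]:
    """Flag each mora whose vowel the stated rule predicts will devoice."""
    parts = [split_mora(m) for m in morae]
    flags = []
    for i, (onset, vowel) in enumerate(parts):
        if vowel not in HIGH_VOWELS or onset not in VOICELESS_ONSETS:
            flags.append(False)  # only い/う after a voiceless consonant qualify
        elif i + 1 < len(parts):
            flags.append(parts[i + 1][0] in VOICELESS_ONSETS)  # next consonant voiceless?
        else:
            flags.append(devoice_before_pause)  # last mora: followed by a pause
    return flags
```

For ["de", "su"] this returns [False, True], matching [des]. Because the stated rule also targets a high vowel before a pause, ["su", "ki"] returns [True, True] by default; passing devoice_before_pause=False restricts the sketch to the between-consonants case and gives [True, False], the [sɯ̥ki] form in the table.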


1.5 Long Vowels and Geminate Consonants

Length is contrastive in Japanese. Producing a vowel or consonant for one mora versus two morae changes the word's meaning. This is not a subtle stylistic difference — it is as fundamental as the difference between "bit" and "beat" in English, except that in Japanese, the mechanism is pure duration, applied systematically to both vowels and consonants.

Long vowels

A long vowel is an extra mora of the same vowel. In hiragana, long vowels in native Japanese words are typically written by adding the corresponding vowel kana:

  • おかあさん (okaasan, "mother") — か + あ = two morae of [a]
  • おにいさん (oniisan, "older brother") — に + い
  • くうき (kuuki, "air") — く + う
  • おねえさん (oneesan, "older sister") — ね + え
  • おおきい (ookii, "big") — お + お

For the お-column, an orthographic complication: in Sino-Japanese words (those borrowed from Chinese through kanji readings), the long [oː] sound is usually written おう rather than おお. Thus とうきょう (toukyou, "Tokyo") and がっこう (gakkou, "school") are pronounced with a sustained [oː] even though the kana show おう. This is a spelling convention — the pronunciation is the same sustained vowel.

Minimal pairs showing the contrast:

Short | Meaning | Long | Meaning
おじさん ojisan | uncle | おじいさん ojiisan | grandfather
おばさん obasan | aunt | おばあさん obaasan | grandmother
え e | picture | ええ ee | yes (informal)
ここ koko | here | こうこう koukou | high school

Geminate consonants

A geminate consonant is a consonant held for an extra mora, written with っ (small tsu) in front of it. With stop consonants such as k, t, and p, your mouth moves into position for the consonant during the っ mora and holds it: no air is released and no sound is produced. The consonant is then released on the next mora. Before s or sh, the extra mora is a sustained hiss rather than silence.

The extra vowel mora of a long vowel and the っ of a geminate each occupy exactly one mora of time and are heard as a distinct beat. Dropping or shortening either one produces a different word.

Without geminate | Meaning | With geminate | Meaning
きて kite | come | きって kitte | stamp / cut
かた kata | shoulder | かった katta | won / bought
さか saka | slope | さっか sakka | author
もと moto | origin | もっと motto | more
いた ita | was / stayed (of a person or animal) | いった itta | went / said

When listening, train yourself to hear the brief silence before the consonant is released. Before stops such as k, t, and p, that silence is the っ: one full mora of nothing.
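Romaji shows the same length contrasts by doubling letters: a doubled consonant letter stands for っ, and a repeated vowel letter (or ou, as in toukyou) adds a vowel mora. A minimal Python sketch of reading such spellings back into morae follows, assuming the Hepburn-style romaji used in this chapter. It writes っ as Q and ん as N, a convention borrowed from phonological notation rather than one this book defines, and it does not handle spellings such as tch or the apostrophe some systems place after n. The function name romaji_morae is hypothetical.

```python
from typing import List

VOWELS = set("aiueo")


def romaji_morae(word: str) -> List[str]:
    """Split Hepburn-style romaji into morae; Q = っ, N = ん (simplified)."""
    w = word.lower()
    morae: List[str] = []
    i = 0
    while i < len(w):
        c = w[i]
        nxt = w[i + 1] if i + 1 < len(w) else ""
        if c in VOWELS:
            morae.append(c)  # a bare vowel is its own mora (this is how length shows up)
            i += 1
        elif c == "n" and nxt not in VOWELS and nxt != "y":
            morae.append("N")  # n not followed by a vowel or y is the moraic nasal ん
            i += 1
        elif nxt == c:
            morae.append("Q")  # first half of a doubled consonant is the っ mora
            i += 1
        else:
            j = i
            while j < len(w) and w[j] not in VOWELS:
                j += 1  # collect the onset: k, sh, ts, ky, ...
            if j == len(w):
                raise ValueError(f"consonant without a vowel in {word!r}")
            morae.append(w[i : j + 1])
            i = j + 1
    return morae
```

Applied to the tables above, kite gives ["ki", "te"] and kitte gives ["ki", "Q", "te"]; obasan and obaasan differ only by the extra "a" mora.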


1.6 Pitch Accent

Japanese is a pitch-accent language. It is not stress-accented like English (where stressed syllables are louder, longer, and higher) and it is not fully tonal like Mandarin Chinese (where each syllable carries an independent tone). Instead, Japanese uses relative pitch — high (H) versus low (L) — across the morae of a word, with one key feature: the location of the pitch drop.

Two pitch levels

Standard Japanese operates with two relative pitch levels: high and low. There is no "medium." Each mora in a word is either H or L relative to its neighbors. The first two morae of a word are always different in pitch — if the first is H, the second is L, and vice versa. This is a reliable rule of Tokyo-dialect Japanese.

The four Tokyo noun patterns

Every noun in standard Japanese (based on Tokyo dialect) has a pitch accent pattern defined by where — if anywhere — the pitch drops. There are four types:

頭高型 atamadaka-gata (head-high) — The pitch starts high on the first mora and drops immediately. Pattern: HL...

  • あめ (ame, 雨, rain) = HL: あ is high, め is low
  • いのち (inochi, 命, life) = HLL: い is high, の and ち are low

中高型 nakadaka-gata (middle-high) — The pitch rises after the first mora, stays high for one or more morae, then drops before the word ends, so the final mora is always low. The drop falls somewhere inside the word, never after its last mora. Pattern: LH...L

  • たまご (tamago, 卵, egg) = LHL: た is low, ま is high, ご is low

尾高型 odaka-gata (tail-high) — The pitch rises after the first mora and stays high through the end of the word — but drops on the particle that follows. In isolation, odaka words sound identical to heiban words. The difference only surfaces when a particle like が or を is attached. Pattern: LH...H(L on particle)

  • おとこ (otoko, 男, man) = LHH in isolation, but おとこが = LHHL (the が drops)

平板型 heiban-gata (flat) — The pitch starts low on the first mora, rises to high on the second, and stays high through the word and any following particles. There is no drop at all. Pattern: LH...H(H on particle)

  • さくら (sakura, 桜, cherry blossom) = LHH, and さくらが = LHHH (the が stays high)

Notation used in this book

This book marks pitch accent using the downstep number system, which is standard in Japanese dictionaries like the NHK日本語発音アクセント新辞典:

Notation | Name | Meaning | Example
⓪ | 平板 heiban | No drop anywhere | さくら⓪ (桜, cherry blossom)
① | 頭高 atamadaka | Drop after mora 1 | あめ① (雨, rain)
② | Odaka for 2-mora words, nakadaka for longer | Drop after mora 2 | はし② (橋, bridge)
③ | Odaka for 3-mora words, nakadaka for longer | Drop after mora 3 | おとこ③ (男, man)

The number indicates the mora after which the pitch drops. ⓪ means no drop occurs (heiban). ① means the pitch drops after the first mora (atamadaka). For a two-mora word, ② means the drop comes after the final mora — which means it is odaka (the drop falls on the following particle). For a three-mora word, ② is nakadaka (the drop is in the middle).
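Because the downstep number fully determines the H/L contour of a Tokyo-dialect noun, the four patterns can be generated mechanically. The Python sketch below does this under the rules described in this section (the first mora is low unless the accent is ①; morae are high up to the drop; a following one-mora particle stays high only for heiban). The names pitch_pattern and accent_type are hypothetical.

```python
from typing import Tuple


def pitch_pattern(mora_count: int, accent: int) -> Tuple[str, str]:
    """Return (word alone, word + one-mora particle) as strings of H/L.

    accent is the downstep number: 0 = no drop, n = drop after mora n.
    """
    if mora_count < 1 or not 0 <= accent <= mora_count:
        raise ValueError("accent must be between 0 and the number of morae")
    levels = []
    for pos in range(1, mora_count + 1):
        if accent == 1:
            levels.append("H" if pos == 1 else "L")  # atamadaka: high, then drop
        elif pos == 1:
            levels.append("L")  # otherwise the first mora is low
        elif accent == 0 or pos <= accent:
            levels.append("H")  # high up to the drop (or to the end if no drop)
        else:
            levels.append("L")  # after the drop
    word = "".join(levels)
    particle = "H" if accent == 0 else "L"  # only heiban keeps the particle high
    return word, word + particle


def accent_type(mora_count: int, accent: int) -> str:
    """Name the pattern for a downstep number."""
    if accent == 0:
        return "heiban"
    if accent == 1:
        return "atamadaka"
    if accent == mora_count:
        return "odaka"
    return "nakadaka"
```

For example, pitch_pattern(3, 3) for おとこ③ gives ("LHH", "LHHL") and pitch_pattern(3, 0) for さくら⓪ gives ("LHH", "LHHH"): identical in isolation and distinct once a particle follows, which is exactly the odaka/heiban contrast described above.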

Minimal pairs

These word pairs are identical in their individual sounds. Only pitch distinguishes them:

Word | Pitch | Notation | Meaning
あめ | HL | ① | 雨 rain
あめ | LH | ⓪ | 飴 candy
はし | HL | ① | 箸 chopsticks
はし | LH | ② | 橋 bridge
かき | LH | ⓪ | 柿 persimmon
かき | HL | ① | 牡蠣 oyster

Pitch accent is taught in this book primarily for listening comprehension — hearing the difference between words that are otherwise identical. At this stage, your goal is awareness. When you encounter a new word, notice its pitch pattern. Over time, this awareness will sharpen your ear and make natural speech more intelligible. This book marks pitch accent for all new vocabulary throughout.


1.7 Intonation

Pitch accent operates at the word level: it is a property of individual words. Intonation operates at the sentence level. It is the melody overlaid on top of word-level pitch patterns. These are independent systems: intonation does not replace pitch accent; it is layered on top of it. A word keeps its inherent pitch pattern even as sentence-level intonation modulates the overall contour.

Japanese intonation is subtler than English intonation, but it performs several important functions.

Questions without か

In formal speech, yes/no questions are marked with the particle か (ka) at the end of the sentence. In casual speech, か is often dropped, and the question is signaled by rising intonation alone.

  • たべる。 (Taberu.) — "I eat." / "I'll eat." — falling or level intonation at the end.
  • たべる? (Taberu?) — "Are you going to eat?" / "Will you eat?" — the final mora rises in pitch.

In pairs like this, the rising intonation is the only thing that distinguishes a casual question from a statement. If you flatten it, your question will be heard as a statement. Listening for this rise is essential from the very beginning.

Confirmation with ね

The particle ね (ne) at the end of a sentence seeks confirmation or agreement — roughly, "right?" or "isn't it?" It is produced with a slight rise in pitch, inviting the listener to affirm.

  • いいてんきですね。 (Ii tenki desu ne.) — "Nice weather, isn't it." — ね rises softly.

The rise on ね is gentler than a question. It signals that the speaker assumes agreement and is inviting the listener to confirm, not genuinely asking for new information.

Surprise

A sharp, sudden rise in pitch expresses surprise. This is similar to English but tends to be more contained — a quick spike rather than a prolonged exclamation.

  • えっ? (E?) — A short, sharp rise expressing surprise.
  • ほんとうに? (Hontou ni?) — "Really?" — rising sharply on the final に.

Trailing off (けど...)

Japanese speakers frequently leave sentences unfinished, trailing off with particles like けど (kedo, "but...") or が (ga, "but..."), or simply letting the sentence hang without completion. This is not sloppy speech. It is a deliberate pragmatic device that softens assertions, implies shared understanding, or avoids being overly direct.

  • ちょっとむずかしいんですけど… (Chotto muzukashii n desu kedo...) — "It's a bit difficult, but..." — the sentence trails off rather than stating a conclusion. The trailing pitch typically falls gently or remains level.

This indicates hesitation, softening, or implied continuation. You will encounter it constantly in natural Japanese. Recognizing it as intentional rather than accidental is important — the speaker has communicated something by not finishing the sentence.


Looking Ahead

This chapter has introduced the sound system of Japanese: its mora-timed rhythm, its five stable vowels, its consonants (including the tapped r, the bilabial ふ, and the variable ん), the systematic devoicing of high vowels, the meaningful contrasts created by vowel and consonant length, the pitch accent system that distinguishes otherwise identical words, and the sentence-level intonation patterns that carry pragmatic meaning. None of this is decoration. These are the acoustic facts you will rely on every time you listen to Japanese.

In the next chapter, you will learn to read hiragana — the first of the two phonetic scripts.


Vocabulary

Core vocabulary for this chapter, listed with pitch accent notation.

Word | Romaji | Pitch | Meaning
あめ | ame | ① | 雨 rain
あめ | ame | ⓪ | 飴 candy
あさ | asa |  | 朝 morning
いけ | ike |  | 池 pond
いのち | inochi | ① | 命 life
いもうと | imouto |  | 妹 younger sister
うた | uta |  | 歌 song
えき | eki |  | 駅 station
おとこ | otoko | ③ | 男 man
おとうと | otouto |  | 弟 younger brother
おばさん | obasan |  | 叔母さん aunt
おばあさん | obaasan |  | お祖母さん grandmother
おじさん | ojisan |  | 叔父さん uncle
おじいさん | ojiisan |  | お祖父さん grandfather
おかあさん | okaasan |  | お母さん mother
おにいさん | oniisan |  | お兄さん older brother
おねえさん | oneesan |  | お姉さん older sister
おおきい | ookii |  | 大きい big
かお | kao |  | 顔 face
かき | kaki | ① | 牡蠣 oyster
かき | kaki | ⓪ | 柿 persimmon
かぜ | kaze |  | 風 wind
かた | kata |  | 肩 shoulder
かど | kado |  | 角 corner
きく | kiku |  | 聞く to listen
きて | kite |  | 来て come (て-form)
きって | kitte |  | 切手 stamp
くさ | kusa |  | 草 grass
くうき | kuuki |  | 空気 air
くつ | kutsu |  | 靴 shoes
ここ | koko |  | here
こうこう | koukou |  | 高校 high school
さか | saka |  | 坂 slope
さっか | sakka |  | 作家 author
さくら | sakura | ⓪ | 桜 cherry blossom
さんぽ | sanpo |  | 散歩 walk
した | shita |  | 下 below
すき | suki |  | 好き like / fond of
すし | sushi |  | 寿司 sushi
そら | sora |  | 空 sky
たまご | tamago | ② | 卵 egg
てら | tera |  | 寺 temple
とうきょう | toukyou |  | 東京 Tokyo
なつ | natsu |  | 夏 summer
はし | hashi | ② | 橋 bridge
はし | hashi | ① | 箸 chopsticks
ひと | hito |  | 人 person
ふね | fune |  | 船 ship
ふゆ | fuyu |  | 冬 winter
ほん | hon |  | 本 book
もと | moto |  | 元 origin
もっと | motto |  | more
もの | mono |  | 物 thing
りんご | ringo |  | 林檎 apple