Chapter 1 — The Sounds of Japanese
Japanese has a sound system that differs from English in fundamental ways — not just in which sounds exist, but in how rhythm, length, and pitch operate. This chapter lays out those differences honestly. Understanding them now will save you from persistent listening problems later. Most textbooks skip or minimize this material. That is a mistake.
1.1 Mora, Not Syllables
English is a stress-timed language. Some syllables are long and loud, others are crushed and swallowed. The word "comfortable" has four syllables in the dictionary, but most speakers produce only three ("KUMF-ter-bul"). Japanese does not work this way.
Japanese is a mora-timed language. Each kana character represents one mora, a unit of timing that receives roughly equal duration. (The main exception is a small ゃ, ゅ, or ょ, which combines with the kana before it into a single mora, as in きょ.) The word おばあさん (obaasan, "grandmother") is not three syllables. It is five morae: お・ば・あ・さ・ん. A Japanese speaker gives each of those five units approximately the same length. English speakers tend to rush through them and produce something that sounds like "obasan", which is a different word entirely (おばさん, "aunt," four morae).
This distinction between mora-timing and stress-timing is not academic trivia. It is the rhythmic foundation of the language. In English, you can stretch or compress syllables freely without changing meaning. In Japanese, the number of morae in a word is fixed and meaningful. If you produce three morae where the word requires four, you have said a different word.
Three elements in Japanese each occupy exactly one mora despite not being full syllables in the English sense:
The moraic nasal ん counts as its own mora. The word さんぽ (sanpo, "walk") has three morae: さ・ん・ぽ — not two syllables. The ん receives the same duration as さ or ぽ.
Small っ (the geminate) counts as its own mora. It represents a beat of held silence before the following consonant is released. The word きって (kitte, "stamp") has three morae: き・っ・て. Compare きて (kite, "come"), which has only two morae: き・て.
Long vowels add an extra mora. The あ after ば in おばあさん is a separate unit of time, not merely a longer version of the same syllable.
Consider these minimal pairs, where mora count alone changes meaning:
| Word | Romaji | Morae | Meaning |
|---|---|---|---|
| きて | kite | 2 (き・て) | come |
| きって | kitte | 3 (き・っ・て) | stamp / cut (command) |
| おばさん | obasan | 4 (お・ば・さ・ん) | aunt |
| おばあさん | obaasan | 5 (お・ば・あ・さ・ん) | grandmother |
If you do not internalize mora timing, you will consistently mishear words. This is not a minor detail.
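For readers who like to check rules mechanically, the mora counts in the table above can be reproduced with a short Python sketch. The names `SMALL_GLIDE_KANA`, `split_morae`, and `count_morae` are illustrative, not part of any library. The sketch assumes the input is written entirely in kana, and it adds one convention beyond the three elements listed above: a small ゃ, ゅ, or ょ (or a small vowel kana) shares a mora with the kana before it.

```python
# Small kana that merge with the preceding kana into a single mora (e.g. きょ, しゃ).
# Small っ is deliberately NOT in this set: it counts as a mora of its own.
SMALL_GLIDE_KANA = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def split_morae(word: str) -> list[str]:
    """Split a kana string into a list of morae."""
    morae: list[str] = []
    for ch in word:
        if ch in SMALL_GLIDE_KANA and morae:
            morae[-1] += ch   # attach the small kana to the previous mora
        else:
            morae.append(ch)  # ordinary kana, ん, っ, and ー each get one beat
    return morae


def count_morae(word: str) -> int:
    """Return the number of morae in a kana string."""
    return len(split_morae(word))


if __name__ == "__main__":
    for w in ["きて", "きって", "おばさん", "おばあさん", "さんぽ"]:
        print(w, count_morae(w), "・".join(split_morae(w)))
```

Counting characters rather than syllables is the whole point: ん, っ, and the second half of a long vowel each add one to the total, exactly as they add one beat in speech.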
1.2 The Five Vowels
Japanese has exactly five vowels. They are simple, pure, and stable. The following table shows where each vowel sits in terms of tongue position — how far forward or back in your mouth, and how open or closed your jaw is.
How to use this chapter: Focus on the descriptions and English comparisons. The IPA symbols in square brackets are reference notation — you don't need to memorize them. They are included so you can look up precise pronunciations later if you wish.
| Vowel | Romaji | IPA | Approximate quality |
|---|---|---|---|
| あ | a | [a] | Open, like the "a" in "father" — never like "cat" or "cake" |
| い | i | [i] | Close front, like "ee" in "see" — never reduced to "ih" |
| う | u | [ɯ] | Close back, unrounded — not like English "oo" |
| え | e | [e] | Mid front, like "e" in "bet" — never like "ay" |
| お | o | [o] | Mid back, rounded, like "o" in "coat" but without the glide |
Three properties distinguish these vowels from their English counterparts.
First, Japanese vowels are pure — they do not diphthongize. English vowels frequently glide: the "o" in "go" actually moves from [o] toward [ʊ], producing a diphthong [oʊ]. Japanese お does not glide. It starts as [o] and stays [o]. Similarly, English "ay" in "day" is really [eɪ] — a vowel that shifts. Japanese え is a steady [e] throughout.
Second, Japanese vowels show no reduction. In English, unstressed vowels collapse into a neutral "schwa" sound [ə] — the second syllable of "sofa," the first syllable of "about." Japanese has no schwa. Every vowel maintains its full quality regardless of position or speed. The あ in the middle of a long word is the same あ you would produce in isolation. Vowels do not weaken or change quality in unstressed positions the way English vowels constantly do.
Third, the Japanese う is unrounded. English speakers producing "oo" push their lips forward into a tight circle. For Japanese う [ɯ], the lips remain relatively flat and relaxed. The tongue position is similar to English "oo," but without the lip rounding. This is one of the most common errors English speakers make, and it is immediately noticeable to Japanese listeners. To produce it, aim for the vowel quality of "oo" while keeping your lips in a neutral, slightly spread position.
1.3 Consonants
Most Japanese consonants will feel reasonably familiar to English speakers. A few, however, differ in ways that matter for both listening and production.
The Japanese r — Between "d," "l," and "r"
The Japanese r-sound (ら・り・る・れ・ろ — ra ri ru re ro) is neither the English "r" nor the English "l." It is an alveolar tap [ɾ]: the tip of the tongue strikes the alveolar ridge (the bump behind your upper front teeth) once, briefly, and drops away. It lives somewhere between English "d," "l," and "r."
If you speak American English, the quick tap in the middle of a fast, casual "butter" or "ladder" (the sound spelled "tt" or "dd") is close to the Japanese r. Do not curl your tongue back as you would for English "r." Do not press your tongue against the ridge and hold it there as you would for English "l." One quick tap. That is it. All five sounds らりるれろ use this same mechanism.
Practice words: さくら (sakura, cherry blossom), そら (sora, sky), りんご (ringo, apple).
ふ — Not Quite an F
The kana ふ is typically romanized as "fu," but it is not produced like English "f." English "f" is a labiodental fricative [f] — the lower lip presses against the upper teeth, and air is forced through the gap. Japanese ふ is a bilabial fricative [ɸ]: air passes between the two lips, which are brought close together but do not touch the teeth. The sound is softer and less sharp than English "f."
To produce it, bring both lips close together — as if you were about to blow out a candle very gently — and let air pass between them. No teeth are involved. Practice: ふね (fune, ship), ふゆ (fuyu, winter).
ん — One Letter, Many Sounds
The kana ん represents a moraic nasal that takes one full mora of time. Its actual pronunciation shifts depending on what follows it. All of these realizations are a single phoneme — one sound unit — but its surface form changes automatically based on environment:
| Environment | Realization | IPA | Example |
|---|---|---|---|
| Before b, p, m | Bilabial nasal | [m] | さんぽ (sanpo) → [sampo] |
| Before t, d, n | Alveolar nasal | [n] | さんど (sando) → [sando] |
| Before k, g | Velar nasal | [ŋ] | さんかい (sankai) → [saŋkai] |
| End of utterance | Uvular nasal | [ɴ] | ほん (hon) → [hoɴ] |
You do not need to consciously control this variation. If you simply hold the ん for its full mora of time, your mouth will naturally move into the correct position for the following sound. What matters is that you recognize ん as a full beat of time, not a quick throwaway nasal tacked onto the previous syllable.
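The table above is effectively a lookup rule, and it can be written down as one. The following Python sketch is only an illustration: the function name `realize_n` is made up for this example, the input is assumed to be the romaji of whatever follows ん, and environments the table does not list (such as a following vowel or s) are reported as not covered rather than guessed.

```python
from typing import Optional


def realize_n(next_sound: Optional[str]) -> str:
    """Return the IPA symbol for ん, given the romaji of the sound that follows.

    Pass None (or an empty string) when ん ends the utterance.
    """
    if not next_sound:
        return "ɴ"      # end of utterance: uvular nasal
    first = next_sound[0].lower()
    if first in "bpm":
        return "m"      # bilabial nasal
    if first in "tdn":
        return "n"      # alveolar nasal
    if first in "kg":
        return "ŋ"      # velar nasal
    # Hypothetical handling: the table in this section does not list this case.
    raise ValueError(f"environment not covered by the table: {next_sound!r}")


if __name__ == "__main__":
    print(realize_n("po"))   # さんぽ
    print(realize_n("do"))   # さんど
    print(realize_n("kai"))  # さんかい
    print(realize_n(None))   # ほん
```

The function only ever looks at the following sound, which mirrors the advice above: the speaker does not choose a variant, the next consonant chooses it.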
Sounds That Don't Match Their Row
Several kana do not match what their row membership would predict:
- し is [ɕi], not [si]. It sounds close to English "she" but with a slightly different tongue position: the blade of the tongue approaches the alveolo-palatal region rather than the postalveolar area.
- ち is [tɕi], not [ti]. It sounds close to the "chee" in English "cheese."
- つ is [tsɯ], not [tu]. An affricate: the tongue briefly stops airflow behind the teeth, then releases into a hissing [s].
- ひ is [çi], a voiceless palatal fricative — softer and more forward than English "h."
- ふ is [ɸɯ], the bilabial fricative described above.
These are not exceptions to apologize for. They are simply the sounds of modern Japanese, shaped by historical sound changes. The rest of the consonant inventory (k, s, t, n, m, y, w, g, z, d, b, p) is close enough to English equivalents that no special instruction is needed at this stage.
Brief consonant inventory overview
| Row | Romaji | Notes |
|---|---|---|
| か行 | ka ki ku ke ko | Like English k |
| さ行 | sa shi su se so | し = [ɕi] |
| た行 | ta chi tsu te to | ち = [tɕi], つ = [tsɯ] |
| な行 | na ni nu ne no | Like English n |
| は行 | ha hi fu he ho | ひ = [çi], ふ = [ɸɯ] |
| ま行 | ma mi mu me mo | Like English m |
| や行 | ya — yu — yo | Glides |
| ら行 | ra ri ru re ro | Alveolar tap [ɾ] |
| わ行 | wa — — — (w)o | を is pronounced [o] in modern standard Japanese |
1.4 Devoicing
This section describes a phenomenon that most beginner textbooks ignore and that causes real comprehension problems.
The vowels い [i] and う [ɯ] are regularly devoiced — produced without vocal cord vibration — when they occur between two voiceless consonants (k, s, sh, t, ch, ts, h, f, p), or between a voiceless consonant and a pause (such as the end of a sentence). The vowel is still technically present in terms of tongue and lip position, but the voicing is stripped away, making it nearly inaudible.
This means that common words sound very different from what a naive reading of the romaji would predict:
| Written | Naive reading | Actual pronunciation | Meaning |
|---|---|---|---|
| です | desu | [des] | is / am / are (copula) |
| ます | masu | [mas] | polite verb ending |
| すき | suki | [sɯ̥ki] or [ski] | like / fond of |
| くさ | kusa | [kɯ̥sa] or [ksa] | grass |
| ひと | hito | [çi̥to] or [çto] | person |
| きく | kiku | [ki̥kɯ] or [kkɯ] | to listen |
The devoicing of です is especially important. Learners who expect to hear "de-su" at the end of every polite sentence will be confused when they hear [des] instead, which is what speakers virtually always produce in natural speech. The same applies to ます, which routinely sounds like [mas].
This process is automatic and natural — it is not "lazy" pronunciation or casual speech. It occurs in formal registers, news broadcasts, and careful speech just as readily as in casual conversation. It is a regular phonological rule of standard Japanese.
The rule is predictable: voiceless consonants on both sides (or a voiceless consonant followed by a pause) trigger devoicing of high vowels い and う. The vowels あ, え, and お are not affected.
Why this matters for listening: if you expect to hear [desɯ] and the speaker produces [des], you may not recognize the word. If you expect [sɯki] and hear [ski], you will be confused. Once you know that devoicing is systematic, words that previously sounded garbled become transparent. This single piece of knowledge will immediately improve your listening comprehension.
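Because the devoicing rule is predictable, it can be sketched as a small Python function. This is a rough illustration under stated assumptions, not a full model of Japanese phonology: the input is assumed to be Hepburn-style romaji for a single word spoken before a pause, the names `tokenize` and `mark_devoiced` are made up for this example, and devoiced vowels are shown in parentheses. One restriction is added that this section does not state: a vowel is left voiced if the vowel just before it was already devoiced, which keeps the output in line with the pronunciations listed for すき and the other words above.

```python
DIGRAPHS = ("sh", "ch", "ts")
VOICELESS = {"k", "s", "sh", "t", "ch", "ts", "h", "f", "p"}
VOWELS = set("aiueo")
HIGH_VOWELS = {"i", "u"}


def tokenize(romaji: str) -> list[str]:
    """Split romaji into sound tokens, keeping sh, ch, and ts together."""
    tokens: list[str] = []
    i = 0
    while i < len(romaji):
        pair = romaji[i:i + 2]
        if pair in DIGRAPHS:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(romaji[i])
            i += 1
    return tokens


def mark_devoiced(romaji: str) -> str:
    """Return the word with predicted devoiced vowels wrapped in parentheses."""
    tokens = tokenize(romaji.lower())
    out: list[str] = []
    prev_vowel_devoiced = False
    for idx, tok in enumerate(tokens):
        if tok not in VOWELS:
            out.append(tok)
            continue
        prev_voiceless = idx > 0 and tokens[idx - 1] in VOICELESS
        at_end = idx == len(tokens) - 1  # end of the string is treated as a pause
        next_voiceless = not at_end and tokens[idx + 1] in VOICELESS
        devoice = (
            tok in HIGH_VOWELS
            and prev_voiceless
            and (next_voiceless or at_end)
            and not prev_vowel_devoiced  # assumption: no two devoiced vowels in a row
        )
        out.append(f"({tok})" if devoice else tok)
        prev_vowel_devoiced = devoice
    return "".join(out)


if __name__ == "__main__":
    for word in ["desu", "masu", "suki", "kusa", "hito", "sakura"]:
        print(word, "->", mark_devoiced(word))
```

A word like さくら comes back unchanged because its う sits before the voiced r, which is a useful reminder that the rule depends on both neighbors, not on the vowel alone.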
1.5 Long Vowels and Geminate Consonants
Length is contrastive in Japanese. Producing a vowel or consonant for one mora versus two morae changes the word's meaning. This is not a subtle stylistic difference — it is as fundamental as the difference between "bit" and "beat" in English, except that in Japanese, the mechanism is pure duration, applied systematically to both vowels and consonants.
Long vowels
A long vowel is an extra mora of the same vowel. In hiragana, long vowels in native Japanese words are typically written by adding the corresponding vowel kana:
- おかあさん (okaasan, "mother") — か + あ = two morae of [a]
- おにいさん (oniisan, "older brother") — に + い
- くうき (kuuki, "air") — く + う
- おねえさん (oneesan, "older sister") — ね + え
- おおきい (ookii, "big") — お + お
For the お-column, an orthographic complication: in Sino-Japanese words (those borrowed from Chinese through kanji readings), the long [oː] sound is usually written おう rather than おお. Thus とうきょう (toukyou, "Tokyo") and がっこう (gakkou, "school") are pronounced with a sustained [oː] even though the kana show おう. This is a spelling convention — the pronunciation is the same sustained vowel.
Minimal pairs showing the contrast:
| Short | Meaning | Long | Meaning |
|---|---|---|---|
| おじさん ojisan | uncle | おじいさん ojiisan | grandfather |
| おばさん obasan | aunt | おばあさん obaasan | grandmother |
| え e | picture | ええ ee | yes (informal) |
| ここ koko | here | こうこう koukou | high school |
Geminate consonants
A geminate consonant is a doubled consonant: the consonant is held for one extra mora before it is released, and that extra mora is written with っ (small tsu). For stop consonants such as k, t, and p, your mouth moves into position for the consonant and holds, with no air released and no sound produced, and then the consonant is released on the next mora. For fricatives such as s, the hiss itself is sustained for the extra mora, as in ざっし (zasshi, "magazine").
The extra mora of a long vowel and the っ each occupy exactly one mora of time. Each is heard as a distinct beat. Dropping or shortening either one produces a different word.
| Without geminate | Meaning | With geminate | Meaning |
|---|---|---|---|
| きて kite | come | きって kitte | stamp / cut |
| かた kata | shoulder | かった katta | won / bought |
| さか saka | slope | さっか sakka | author |
| もと moto | origin | もっと motto | more |
| いた ita | was (location) | いった itta | went / said |
In listening, training yourself to hear the brief moment of silence before the consonant release is critical. That silence is the っ — one full mora of nothing.
1.6 Pitch Accent
Japanese is a pitch-accent language. It is not stress-accented like English (where stressed syllables are louder, longer, and higher) and it is not fully tonal like Mandarin Chinese (where each syllable carries an independent tone). Instead, Japanese uses relative pitch — high (H) versus low (L) — across the morae of a word, with one key feature: the location of the pitch drop.
Two pitch levels
Standard Japanese operates with two relative pitch levels: high and low. There is no "medium." Each mora in a word is either H or L relative to its neighbors. The first two morae of a word are always different in pitch — if the first is H, the second is L, and vice versa. This is a reliable rule of Tokyo-dialect Japanese.
The four Tokyo noun patterns
Every noun in standard Japanese (based on Tokyo dialect) has a pitch accent pattern defined by where — if anywhere — the pitch drops. There are four types:
頭高型 atamadaka-gata (head-high) — The pitch starts high on the first mora and drops immediately. Pattern: HL...
- あめ (ame, 雨, rain) = HL: あ is high, め is low
- いのち (inochi, 命, life) = HLL: い is high, の and ち are low
中高型 nakadaka-gata (middle-high): The pitch rises after the first mora, stays high for one or more morae, then drops before the word ends. The drop occurs somewhere in the middle of the word, so at least the final mora is low. Pattern: LH...L
- たまご (tamago, 卵, egg) = LHL: た is low, ま is high, ご is low
尾高型 odaka-gata (tail-high) — The pitch rises after the first mora and stays high through the end of the word — but drops on the particle that follows. In isolation, odaka words sound identical to heiban words. The difference only surfaces when a particle like が or を is attached. Pattern: LH...H(L on particle)
- おとこ (otoko, 男, man) = LHH in isolation, but おとこが = LHHL (the が drops)
平板型 heiban-gata (flat) — The pitch starts low on the first mora, rises to high on the second, and stays high through the word and any following particles. There is no drop at all. Pattern: LH...H(H on particle)
- さくら (sakura, 桜, cherry blossom) = LHH, and さくらが = LHHH (the が stays high)
Notation used in this book
This book marks pitch accent using the downstep number system, a convention found in Japanese dictionaries such as 大辞林. Accent dictionaries like the NHK日本語発音アクセント新辞典 record the same drop positions with their own symbols:
| Notation | Name | Meaning | Example |
|---|---|---|---|
| ⓪ | 平板 heiban | No drop anywhere | さくら⓪ (桜, cherry blossom) |
| ① | 頭高 atamadaka | Drop after mora 1 | あめ① (雨, rain) |
| ② | 尾高 odaka (2-mora words) / 中高 nakadaka (longer words) | Drop after mora 2 | はし② (橋, bridge) |
| ③ | 尾高 odaka (3-mora words) / 中高 nakadaka (longer words) | Drop after mora 3 | おとこ③ (男, man) |
The number indicates the mora after which the pitch drops. ⓪ means no drop occurs (heiban). ① means the pitch drops after the first mora (atamadaka). For a two-mora word, ② means the drop comes after the final mora — which means it is odaka (the drop falls on the following particle). For a three-mora word, ② is nakadaka (the drop is in the middle).
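The downstep number is enough to reconstruct the whole H/L pattern of a word, and a short Python sketch makes the procedure explicit. It assumes the Tokyo-type rules described in this section; the function names `pitch_pattern` and `accent_type` are illustrative only.

```python
def pitch_pattern(mora_count: int, accent: int) -> tuple[str, str]:
    """Return (H/L pattern of the word, pitch of a following particle).

    accent is the downstep number: 0 means no drop, n means a drop after mora n.
    """
    if mora_count < 1 or not 0 <= accent <= mora_count:
        raise ValueError("accent must be between 0 and the number of morae")
    word = []
    for i in range(1, mora_count + 1):
        if accent == 0:
            high = i > 1            # heiban: low first mora, then high throughout
        elif accent == 1:
            high = i == 1           # atamadaka: only the first mora is high
        else:
            high = 1 < i <= accent  # rise after mora 1, drop after the accented mora
        word.append("H" if high else "L")
    particle = "H" if accent == 0 else "L"  # only heiban keeps the particle high
    return "".join(word), particle


def accent_type(mora_count: int, accent: int) -> str:
    """Name the pattern type for a given word length and downstep number."""
    if accent == 0:
        return "heiban"
    if accent == 1:
        return "atamadaka"
    return "odaka" if accent == mora_count else "nakadaka"


if __name__ == "__main__":
    for word, morae, accent in [("あめ", 2, 1), ("はし", 2, 2), ("たまご", 3, 2),
                                ("おとこ", 3, 3), ("さくら", 3, 0)]:
        print(word, pitch_pattern(morae, accent), accent_type(morae, accent))
```

Note that おとこ③ and さくら⓪ produce the same pattern for the word itself and differ only in the particle pitch, which is the odaka versus heiban distinction described above.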
Minimal pairs
These word pairs are identical in their individual sounds. Only pitch distinguishes them:
| Word | Pitch | Notation | Meaning |
|---|---|---|---|
| あめ | HL | ① | 雨 rain |
| あめ | LH | ⓪ | 飴 candy |
| はし | HL | ① | 箸 chopsticks |
| はし | LH | ② | 橋 bridge |
| かき | HL | ① | 牡蠣 oyster |
| かき | LH | ⓪ | 柿 persimmon |
Pitch accent is taught in this book primarily for listening comprehension — hearing the difference between words that are otherwise identical. At this stage, your goal is awareness. When you encounter a new word, notice its pitch pattern. Over time, this awareness will sharpen your ear and make natural speech more intelligible. This book marks pitch accent for all new vocabulary throughout.
1.7 Intonation
Pitch accent operates at the word level — it is a property of individual words. Intonation operates at the sentence level. It is the melody overlaid on top of word-level pitch patterns. These are independent systems: intonation does not replace pitch accent, it is layered on top of it. A word keeps its inherent pitch pattern even as sentence-level intonation modulates the overall contour.
Japanese intonation is subtler than English intonation, but it performs several important functions.
Questions without か
In formal speech, yes/no questions are marked with the particle か (ka) at the end of the sentence. In casual speech, か is often dropped, and the question is signaled by rising intonation alone.
- たべる。 (Taberu.) — "I eat." / "I'll eat." — falling or level intonation at the end.
- たべる? (Taberu?) — "You eat?" / "Are you eating?" — the final mora rises in pitch.
This rising intonation is the only difference between a statement and a question in casual speech. If you flatten it, your question will be heard as a statement. Listening for this rise is essential from the very beginning.
Confirmation with ね
The particle ね (ne) at the end of a sentence seeks confirmation or agreement — roughly, "right?" or "isn't it?" It is produced with a slight rise in pitch, inviting the listener to affirm.
- いいてんきですね。 (Ii tenki desu ne.) — "Nice weather, isn't it." — ね rises softly.
The rise on ね is gentler than a question. It signals that the speaker assumes agreement and is inviting the listener to confirm, not genuinely asking for new information.
Surprise
A sharp, sudden rise in pitch expresses surprise. This is similar to English but tends to be more contained — a quick spike rather than a prolonged exclamation.
- えっ? (E?) — A short, sharp rise expressing surprise.
- ほんとうに? (Hontou ni?) — "Really?" — rising sharply on the final に.
Trailing off (けど...)
Japanese speakers frequently leave sentences unfinished, trailing off with particles like けど (kedo, "but...") or が (ga, "but..."), or simply letting the sentence hang without completion. This is not sloppy speech. It is a deliberate pragmatic device that softens assertions, implies shared understanding, or avoids being overly direct.
- ちょっとむずかしいんですけど… (Chotto muzukashii n desu kedo...) — "It's a bit difficult, but..." — the sentence trails off rather than stating a conclusion. The trailing pitch typically falls gently or remains level.
This indicates hesitation, softening, or implied continuation. You will encounter it constantly in natural Japanese. Recognizing it as intentional rather than accidental is important — the speaker has communicated something by not finishing the sentence.
Looking Ahead
This chapter has introduced the sound system of Japanese: its mora-timed rhythm, its five stable vowels, its consonants (including the tapped r, the bilabial ふ, and the variable ん), the systematic devoicing of high vowels, the meaningful contrasts created by vowel and consonant length, the pitch accent system that distinguishes otherwise identical words, and the sentence-level intonation patterns that carry pragmatic meaning. None of this is decoration. These are the acoustic facts you will rely on every time you listen to Japanese.
In the next chapter, you will learn to read hiragana — the first of the two phonetic scripts.
Vocabulary
Selected example words from this chapter, together with additional practice words, listed with pitch accent notation.
| Word | Romaji | Pitch | Meaning |
|---|---|---|---|
| あめ | ame | ① | 雨 rain |
| あめ | ame | ⓪ | 飴 candy |
| あさ | asa | ① | 朝 morning |
| いけ | ike | ② | 池 pond |
| いのち | inochi | ① | 命 life |
| いもうと | imouto | ④ | 妹 younger sister |
| うた | uta | ② | 歌 song |
| えき | eki | ① | 駅 station |
| おとこ | otoko | ③ | 男 man |
| おとうと | otouto | ④ | 弟 younger brother |
| おばさん | obasan | ⓪ | 叔母さん aunt |
| おばあさん | obaasan | ② | お祖母さん grandmother |
| おじさん | ojisan | ⓪ | 叔父さん uncle |
| おじいさん | ojiisan | ② | お祖父さん grandfather |
| おかあさん | okaasan | ② | お母さん mother |
| おにいさん | oniisan | ② | お兄さん older brother |
| おねえさん | oneesan | ② | お姉さん older sister |
| おおきい | ookii | ③ | 大きい big |
| かお | kao | ⓪ | 顔 face |
| かき | kaki | ⓪ | 柿 persimmon |
| かき | kaki | ① | 牡蠣 oyster |
| かぜ | kaze | ⓪ | 風 wind |
| かた | kata | ① | 肩 shoulder |
| かど | kado | ① | 角 corner |
| きく | kiku | ⓪ | 聞く to listen |
| きて | kite | ① | 来て come (て-form) |
| きって | kitte | ⓪ | 切手 stamp |
| くさ | kusa | ② | 草 grass |
| くうき | kuuki | ① | 空気 air |
| くつ | kutsu | ② | 靴 shoes |
| ここ | koko | ⓪ | here |
| こうこう | koukou | ⓪ | 高校 high school |
| さか | saka | ② | 坂 slope |
| さっか | sakka | ⓪ | 作家 author |
| さくら | sakura | ⓪ | 桜 cherry blossom |
| さんぽ | sanpo | ⓪ | 散歩 walk |
| した | shita | ⓪ | 下 below |
| すき | suki | ② | 好き like / fond of |
| すし | sushi | ① | 寿司 sushi |
| そら | sora | ① | 空 sky |
| たまご | tamago | ② | 卵 egg |
| てら | tera | ② | 寺 temple |
| とうきょう | toukyou | ⓪ | 東京 Tokyo |
| なつ | natsu | ② | 夏 summer |
| はし | hashi | ② | 橋 bridge |
| はし | hashi | ① | 箸 chopsticks |
| ひと | hito | ⓪ | 人 person |
| ふね | fune | ① | 船 ship |
| ふゆ | fuyu | ② | 冬 winter |
| ほん | hon | ① | 本 book |
| もと | moto | ② | 元 origin |
| もっと | motto | ① | more |
| もの | mono | ② | 物 thing |
| りんご | ringo | ⓪ | 林檎 apple |