Taalportaal - the digital language portal

Segment frequencies of consonants and vowels

The following list of segmental frequencies in Dutch was extracted from the phonetically transcribed part of the Dutch Celex database (Baayen et al. 1995). The syllable boundaries provided in Celex were used. All syllables were classified as monosyllables (originating from monosyllabic words), stressed polysyllables or unstressed polysyllables (i.e. the stressed or an unstressed syllable of a polysyllabic word). Subsequently, each syllable was parsed into a positional syllable template differentiating onset, nucleus and coda positions. The type and token frequencies given in Table 1 are based on the number of entities per syllable position.
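The procedure described above can be illustrated with a minimal Python sketch. It is not the script used for this article: the vowel set, the function names and the two toy lexicon entries (including their frequencies) are invented for illustration. It also assumes that a type count adds one per lexical entry and a token count adds the word frequency of that entry, which is a common reading of the two terms rather than something the Celex documentation is quoted on here.

```python
from collections import Counter

# Hypothetical vowel inventory used only to locate the nucleus.
# Diphthongs are treated as single segments. This is not the Celex symbol set.
VOWELS = {"a", "e", "i", "o", "u", "y", "ø", "ɑ", "ɛ", "ɪ", "ɔ", "ʏ", "ə",
          "ɛi", "œy", "ɑu"}


def parse_syllable(segments):
    """Split a list of segments into (onset, nucleus, coda)."""
    vowel_positions = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if not vowel_positions:
        raise ValueError(f"no vowel found in syllable: {segments}")
    first, last = vowel_positions[0], vowel_positions[-1]
    return segments[:first], segments[first:last + 1], segments[last + 1:]


def syllable_type(n_syllables, stressed):
    """Classify a syllable by word length and stress."""
    if n_syllables == 1:
        return "monosyllable"
    return "stressed polysyllable" if stressed else "unstressed polysyllable"


def count_segments(lexicon):
    """Count segments per (syllable type, position, segment).

    Assumption: a type count adds 1 per lexical entry, a token count adds
    the word frequency of that entry.
    """
    type_counts, token_counts = Counter(), Counter()
    for syllables, word_freq in lexicon:
        for segments, stressed in syllables:
            kind = syllable_type(len(syllables), stressed)
            onset, nucleus, coda = parse_syllable(segments)
            positions = (("onset", onset), ("nucleus", nucleus), ("coda", coda))
            for position, segs in positions:
                for seg in segs:
                    key = (kind, position, seg)
                    type_counts[key] += 1
                    token_counts[key] += word_freq
    return type_counts, token_counts


def per_segment(counts):
    """Collapse the counts over all syllable types and positions."""
    totals = Counter()
    for (_kind, _position, seg), n in counts.items():
        totals[seg] += n
    return totals


# Toy data with invented frequencies: each entry is (syllables, word frequency),
# each syllable is (segments, stressed).
TOY_LEXICON = [
    ([(["k", "ɑ", "t"], True)], 10),
    ([(["ʋ", "a"], True), (["t", "ə", "r"], False)], 30),
]

type_counts, token_counts = count_segments(TOY_LEXICON)
type_totals = per_segment(type_counts)
token_totals = per_segment(token_counts)
```

Dividing each entry of `type_totals` or `token_totals` by the sum of all entries would give percentages of the kind listed in a segment frequency table.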

Please note that ambisyllabic consonants are not tagged as such in the Celex database. They are consistently classified as onset consonants, which means that B-class vowels in polysyllabic words appear in open syllables in the Celex transcriptions. As a result, the numbers presented for coda consonants in polysyllabic words and in all words combined may be skewed.

Furthermore, for 486 cases (out of 5380), the Celex (word) frequency count is specified as zero, although these words are present in the Celex database. These zero counts were carried over unchanged into the syllable counts.

A searchable xls-file with the raw Celex count data can be found here. Examples are provided for each syllable type. Moreover, the data set can be filtered with respect to word type (monosyllabic or polysyllabic word), stress type (stressed or unstressed syllable), each syllable position and all combinations of these elements. Celex token and type frequencies of the filtered data are given in the top left corner of the xls-file.
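The kind of filtering described above can be sketched in Python as follows. The row layout, the field names and the example rows are hypothetical and do not reflect the actual columns of the xls-file. The sketch only shows how combining criteria for word type, stress type and syllable position yields a type count (number of matching rows) and a token count (sum of their frequencies).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyllableRow:
    """One syllable entry. All field names are hypothetical."""
    word_type: str   # "monosyllabic" or "polysyllabic"
    stressed: bool
    onset: str
    nucleus: str
    coda: str
    frequency: int   # word frequency; may be zero


def filter_rows(rows, word_type=None, stressed=None,
                onset=None, nucleus=None, coda=None):
    """Keep rows that match every criterion that is not None."""
    criteria = {
        "word_type": word_type,
        "stressed": stressed,
        "onset": onset,
        "nucleus": nucleus,
        "coda": coda,
    }
    return [
        row for row in rows
        if all(value is None or getattr(row, name) == value
               for name, value in criteria.items())
    ]


def summarize(rows):
    """Return (type frequency, token frequency) of the given rows."""
    return len(rows), sum(row.frequency for row in rows)


# Toy rows with invented frequencies.
ROWS = [
    SyllableRow("monosyllabic", True, "k", "ɑ", "t", 10),
    SyllableRow("polysyllabic", True, "ʋ", "a", "", 30),
    SyllableRow("polysyllabic", False, "t", "ə", "r", 30),
    SyllableRow("monosyllabic", True, "m", "ɑ", "t", 0),
]

unstressed_poly = filter_rows(ROWS, word_type="polysyllabic", stressed=False)
coda_t = filter_rows(ROWS, coda="t")
```

Here `summarize(coda_t)` counts two rows but only ten tokens, because one of the two matching rows has a frequency of zero.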

Distribution of all segments across all syllable types and all syllable positions

Table 1

Segment	Type frequency	Segment	Token frequency
[s]	8.7%	[n]	11.1%
[r]	8.6%	[t]	9.1%
[t]	8.5%	[ə]	8.7%
[l]	6.7%	[d]	5.9%
[k]	6.1%	[r]	5.8%
[n]	4.9%	[ɑ]	4.8%
[p]	4.4%	[ɛ]	4.3%
[ɑ]	4.3%	[z]	3.5%
[x]	3.5%	[ɛi]	3.4%
[m]	3.3%	[l]	3.4%
[ɛ]	3.3%	[k]	3.1%
[ɔ]	3.2%	[m]	3.1%
[ɪ]	2.7%	[a]	2.9%
[f]	2.4%	[v]	2.8%
[a]	2.3%	[ɔ]	2.7%
[b]	2.2%	[ɪ]	2.7%
[ʋ]	2.1%	[s]	2.6%
[e]	2.0%	[x]	2.6%
[i]	1.8%	[h]	2.3%
[o]	1.8%	[ʋ]	2.1%
[ʏ]	1.7%	[p]	2.0%
[u]	1.7%	[o]	2.0%
[d]	1.6%	[e]	1.9%
[ɛi]	1.3%	[i]	1.9%
[v]	1.3%	[u]	0.9%
[j]	1.3%	[b]	0.9%
[h]	1.1%	[f]	0.8%
[z]	1.1%	[j]	0.6%
[ŋ]	1.1%	[ŋ]	0.5%
[œy]	0.9%	[ʏ]	0.5%
[ə]	0.8%	[œy]	0.4%
[ʃ]	0.7%	[y]	0.3%
[ø]	0.7%	[ɑu]	0.2%
[y]	0.6%	[χ]	0.2%
[ɑu]	0.5%	[ø]	0.1%
[χ]	0.3%	[ʃ]	<0.1%
[g]	0.2%	[ʒ]	<0.1%
[ɛː]	0.2%	[ɛː]	<0.1%
[ʒ]	0.1%	[ɔː]	<0.1%
[dʒ]	0.1%	[g]	<0.1%
[ɔː]	<0.1%	[dʒ]	<0.1%
[œː]	<0.1%	[œː]	<0.1%
[c]	<0.1%	[c]	<0.1%
[ɲ]	<0.1%	[ɲ]	<0.1%

Segmental frequency data are also available separately for vowels and consonants. Furthermore, frequency data for even more fine-grained positions within onsets and codas are given.

References

Baayen, R. Harald, Piepenbrock, Richard & Gulikers, L. 1995. The CELEX Lexical Database (CD-ROM), Release 2, Dutch Version 3.1.

About this article

Author(s): Kathrin Linke, Marc van Oostendorp

Category: Dutch Phonology

Keywords: segment, frequency, Celex

Version: 1.0, December 2015

Version history:

version	editor(s)	date	remarks
1.0	Kathrin Linke, Marc van Oostendorp	December 2015
