Explaining a pattern of sound change in language

Dinoj Surendran

For several decades, there have been people interested in quantitatively describing phenomena generally studied in the humanities and social sciences. However, their work is not as well known as it should be, and most people continue to think that computational models have no place outside the physical and biological sciences. This article describes a simple phenomenon of language change and gives a few examples of how theories for it can be tested computationally.

There have always been purists who complain that English, or whatever language is in question, isn't spoken as it used to be. They are quite right in their observation, but wrong if they think it is normal for languages to remain static. With speech, shift happens.

The precise shift we wish to look at is the pronunciation of a single word. We want to know how people pronounced a word w at different times in the last, say, hundred years. Unfortunately, it is hard to find this out directly. Written sources aren't any use, as spellings tend to stay constant when pronunciations change. Audio recordings from that far back, where they exist, are rarely representative of the general public.

However, the linguist William Labov suggested in 1966 an ingenious method, called "apparent time", of indirectly finding out how people many years ago pronounced a word. It is based on the reasonable assumption that individuals learn the pronunciation of common words when they are children and keep this pronunciation throughout their life. This means that to estimate the distribution of pronunciations N years ago, the researcher goes out into the population, finds a group of people N (plus a bit, since children need some time to learn pronunciations) years old and observes how they pronounce the word.
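The bookkeeping behind apparent time can be sketched in a few lines of Python. This is only an illustration: the function name, the ten-year bins, the acquisition age of 5 and the six observations are all invented for the example and are not Labov's or Shen's procedure or data.

```python
from collections import defaultdict

def apparent_time_profile(observations, survey_year, acquisition_age=5, bin_width=10):
    """Estimate the share of speakers using the new form, per period of acquisition.

    observations: iterable of (age, uses_new_form) pairs collected in survey_year.
    acquisition_age is an assumed age by which pronunciation is fixed.
    """
    counts = defaultdict(lambda: [0, 0])  # period -> [new-form speakers, all speakers]
    for age, uses_new in observations:
        learned = survey_year - age + acquisition_age    # year the form was acquired
        period = (learned // bin_width) * bin_width      # start of the 10-year bin
        counts[period][0] += int(uses_new)
        counts[period][1] += 1
    return {period: new / total for period, (new, total) in sorted(counts.items())}

# Invented toy observations: (age of speaker, uses the new pronunciation?)
toy = [(72, False), (68, False), (45, False), (41, True), (20, True), (18, True)]
profile = apparent_time_profile(toy, survey_year=1990)
print(profile)
```

Each speaker is treated as a frozen sample of the period in which he or she learned the word, which is exactly the assumption the method rests on.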

Examples of the data produced by Labov's method are given in Figure 1. This is data collected by Zhongwei Shen in 1990 when he was a graduate student at Berkeley. He went to a part of China where the Wenzhou dialect of Chinese is spoken and observed how people of varying ages pronounced several words whose pronunciation had changed this century. In Wenzhou, the sounds 'oy' and 'o' have recently merged. These are the sounds in the English words 'coil' and 'coal'. If the same merger occurred in English, both words would be pronounced 'coal', i.e. only the pronunciation of 'coil' would change.

Figure 1: The changing distribution of pronunciations of three Wenzhou words, based on data collected by Zhongwei Shen, in apparent time.

The shape of the curves for the three words is a distinctive S-shape. This shape is also observed in the other words looked at by Shen, and in other situations as well. Linguists would like to know why this particular shape has to occur. Was it due to contact with people who spoke other languages? In the Wenzhou case, there wasn't much contact, especially in the pre-radio days. Another possible source of language change is language learning; children often do not speak quite the same language their parents do. But can a theory of language learning explain the data? This is what we wish to test.

To test a theory, the computational linguist must convert it to a model that predicts what the data should be like. If the prediction does not agree with the actual data, then the computationalist blames the theorist for not having a good enough theory and the theorist blames the computationalist for not implementing the theory properly. Both can be right, since the conversion process often requires that simplifications be made to the theory to make it efficient enough to run on a computer and/or analyze mathematically. (Another common occurrence is that the theory does not give enough details to implement as is.)

This article will focus on variants of just one theory of pronunciation learning, for space reasons:

A child has a maturational period (e.g. the first five years of life), during which it hears several possible pronunciations. It ends up using the one it heard most often.

This theory, in several forms, was tested by Partha Niyogi and Robert Berwick. The general framework they used was dynamical systems. Suppose that the people under analysis speak only languages amongst a finite set $\{L_1, \ldots, L_n\}$. If the proportion of people at time $t$ speaking language $L_i$ is $p_i(t)$, then $P(t) = [p_1(t), \ldots, p_n(t)]^T$ is a vector function from $\mathbb{R}$ to $\mathbb{R}^n$. An Update Rule, provided by the theory being tested, shows how to obtain $P(t)$ from $P(s)$, $s < t$.

Note that the 'languages' may not correspond to languages in the everyday sense of the word. For example, to model the Wenzhou example, we use $n=2$, with $L_1$ and $L_2$ being Wenzhou before and after the o-oy merger respectively. Some readers will shudder at this oversimplification. Why should $n$ be only 2? How can one ignore all the other languages in the world, let alone dialectal variations? The answer is that we shouldn't, but if we want to analyze something, we have to. If simple models explain the data, excellent. Otherwise, we make the models more sophisticated.

So in this particular case, we assume that people in this geographical region were constrained, by other factors, to speak Wenzhou, though with small changes here and there. One such change would be in the pronunciations of words, whether the person still used the sound 'oy' or not.

What is t? Something representing time. To do any computation, it must be a discrete unit of time. This is a reasonable assumption.

How do we obtain $P(t)$ from previous $P(s)$, $s < t$? In practice we need, for some finite $r$, $P(t)$ to be a function only of $P(t-1), P(t-2), \ldots, P(t-r)$. [In the jargon, $P$ would then be said to be an $r$-th order Markov process.] Higher $r$ means a better model, but also means more data is needed to test the model's predictions. The cases analyzed here use $r=1$. If $t$ is taken to index generations rather than, say, years, then the assumption being made is that children learn language from people in the previous generation only, which is a major assumption!

How should the maturational period be represented? We convert years to instances, so that the theory we are now testing says:

A child hears N pronunciations of a word, and ends up using the one it hears most often. All pronunciations it hears are by members of the previous generation.

In the Wenzhou case, there are only $n=2$ possible pronunciations in the data, so $p_1(t)$ is the proportion of people using the pre-merger pronunciation, and $p_2(t)$ ditto for post-merger, at time $t$. Since $p_1(t) + p_2(t) = 1$, we take $p(t) = p_2(t)$ for convenience.

The probability that the child in generation $t+1$ hears the post-merger pronunciation $k$ times out of $N$ is $\binom{N}{k} (1-p(t))^{N-k} p(t)^k$. The probability $p(t+1)$ that the child adopts this pronunciation is the probability that $k \geq N/2$, i.e. $\sum_{k \geq N/2} \binom{N}{k} (1-p(t))^{N-k} p(t)^k$. Thus

$$p(t+1) = \sum_{k \geq N/2} \binom{N}{k} (1-p(t))^{N-k} \, p(t)^k$$

A fixed point analysis of this system is done by solving for $p$ in $p(t+1) = p(t) = p$. As the technical details are not that interesting, we leave them to the imagination of the reader. Suffice it to say that $p=0$ and $p=1$ are stable fixed points and there is an unstable fixed point around (exactly, if $N$ is odd) $p = 1/2$. In other words, $p(t)$ heads for 0 or 1 when $p(1)$ is below or above 0.5 respectively. Unfortunately, this does not explain the data, where $p(t)$ heads for 1 even when it starts from way below 0.5.
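The behaviour around $p = 1/2$ can be checked numerically. The following is a minimal sketch of the majority update rule, assuming $N = 5$ and ten generations purely for illustration; the function names are made up for this example.

```python
from math import comb

def majority_update(p, N):
    """Probability that at least N/2 of N heard examples are post-merger,
    when each example is post-merger with probability p."""
    # 2 * k >= N is the integer-safe way of writing k >= N/2
    return sum(comb(N, k) * (1 - p) ** (N - k) * p ** k
               for k in range(N + 1) if 2 * k >= N)

def iterate(p0, N, generations):
    """Apply the update rule repeatedly and return the whole trajectory."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(majority_update(trajectory[-1], N))
    return trajectory

if __name__ == "__main__":
    # Start just below, exactly at, and just above the unstable point 1/2.
    for p0 in (0.4, 0.5, 0.6):
        print(p0, [round(p, 3) for p in iterate(p0, N=5, generations=10)])
```

A start below one half collapses to 0 and a start above it climbs to 1, while a start at exactly one half stays put (for odd $N$), which is the pattern that fails to match the Wenzhou curves.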

In Wenzhou, the sounds oy and o merged to become o. Why didn't they merge to become oy? It has been suggested that this is because o is easier to pronounce and/or to hear. This can be modelled by generalizing our update rule. Before, we said that the child settles on the post-merger pronunciation if over $1/2$ of the first $N$ example pronunciations it hears are post-merger. Now we replace $1/2$ by a fraction $q$, which can be anything between 0 and 1. If $q < 1/2$ then the post-merger pronunciation has an edge over the pre-merger one, since the child can still end up using it even after hearing more pre-merger examples.

A child hears N examples of a word pronounced in one of two ways, by members of the previous generation. It uses the (say) second way if the fraction of examples pronounced that way is at least q.

The dynamical system with $q$ added still has 0 and 1 as stable fixed points, and an unstable fixed point $q^*$ between them. Writing $w_2$ for the second (post-merger) pronunciation, the population ends up all using $w_2$ iff the initial fraction using it is at least $q^*$; otherwise $w_2$ dies out. Such a requirement, while less than before, is still asking quite a lot, especially since psychologically plausible values of $q$ don't permit $q^*$ to be very small. If we conservatively place $q^* = 0.25$, this still means that a quarter of the population would have to be using $w_2$ to begin with. Why would they be? They would have to come from a completely different language group, but Wenzhou had little contact.
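To get a feel for how large $q^*$ is, one can locate the unstable fixed point numerically. The sketch below uses bisection, which is a choice made here and not necessarily how Niyogi and Berwick proceeded; the values $N = 10$ and $q = 0.4$ are assumptions for illustration, and the function names are hypothetical.

```python
from math import ceil, comb

def threshold_update(p, N, q):
    """Probability that at least q*N of N heard examples are post-merger,
    when each example is post-merger with probability p."""
    k_min = ceil(q * N - 1e-9)  # small guard against floating-point rounding of q*N
    return sum(comb(N, k) * (1 - p) ** (N - k) * p ** k
               for k in range(k_min, N + 1))

def interior_fixed_point(N, q, tol=1e-10):
    """Bisect for the interior p with threshold_update(p) == p.

    Assumes the update lies below p near 0 and above p near 1,
    which holds when 2 <= ceil(q*N) <= N - 1.
    """
    lo, hi = 1e-6, 1 - 1e-6
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if threshold_update(mid, N, q) < mid:
            lo = mid   # population would drift down from here: root is higher
        else:
            hi = mid   # population would drift up from here: root is lower
    return (lo + hi) / 2

if __name__ == "__main__":
    print(interior_fixed_point(N=10, q=0.4))
```

Even with a clear advantage for the new pronunciation, the threshold found this way stays far from zero, which is the difficulty described above.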

So giving one pronunciation an advantage is a good (as it models linguistic knowledge) modification but not good enough.

In our present model, the N pronunciations heard by the child were drawn using a probability distribution on the entire previous generation. In other words, it was as if a child was getting samples from all members of the previous generation. A more realistic model would have a child only get samples from a subset of the previous generation, such as its parents. This model was proposed by Luigi Cavalli-Sforza and Marcus Feldman, in the context of general cultural transmission, and applied by Niyogi and Berwick to language transmission.

A child hears $N$ examples of a word $w$ pronounced in one of two ways, $w_1$ and $w_2$, by its parents. It uses $w_2$ if at least $qN$ of the $w$ examples are $w_2$. We will also assume that the probability that a parent uses a particular pronunciation is the same as that in the general population. That is, the probability that two people become parents is independent of the pronunciations they use.

Clearly, if both parents use the same pronunciation, so do their children. The probability that a child in generation $t+1$ has two parents using pronunciation $w_2$ is $p(t)^2$; all such children will use $w_2$. The probability that a child has one parent using $w_1$ and one using $w_2$ is $2p(t)(1-p(t))$; a fraction of such children will use $w_2$. The update rule is now


$$p(t+1) = p(t)^2 + 2p(t)(1-p(t)) \, \frac{1}{2^N} \sum_{k \geq qN} \binom{N}{k}$$

Figure 2: The progress of $p(t)$ for different choices of maturational time $N$. The child only hears examples of word $w$ from its parents, and ends up using pronunciation $w_2$ if at least $q$ = 40% of the first $N$ examples it hears are pronounced $w_2$.
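Curves of the kind shown in Figure 2 can be regenerated with a short script. This is a sketch under stated assumptions: the starting value $p = 0.01$, the 30 generations and the choices of $N$ are picked here for illustration and are not taken from the figure; the function names are hypothetical.

```python
from math import ceil, comb

def mixed_parent_bias(N, q):
    """Chance that a child with one w1 parent and one w2 parent adopts w2:
    at least q*N of N examples are w2, each example being w2 with probability 1/2."""
    k_min = ceil(q * N - 1e-9)  # small guard against floating-point rounding of q*N
    return sum(comb(N, k) for k in range(k_min, N + 1)) / 2 ** N

def parental_update(p, b):
    """Both parents use w2 (p^2), or exactly one does and the child picks w2 (b)."""
    return p * p + 2 * p * (1 - p) * b

def trajectory(p0, N, q, generations):
    """Fraction of w2 users over successive generations."""
    b = mixed_parent_bias(N, q)
    values = [p0]
    for _ in range(generations):
        values.append(parental_update(values[-1], b))
    return values

if __name__ == "__main__":
    for N in (2, 5, 10):
        print(N, [round(p, 3) for p in trajectory(0.01, N, q=0.4, generations=30)])
```

Starting from one speaker in a hundred, the fraction of $w_2$ users rises slowly, then quickly, then levels off near 1, which gives the S-shape seen in the data.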

This time, we will not leave all of the fixed-point analysis of this dynamical system to the reader. Suppose $p(t+1) = p(t) = p$ and that $b = \frac{1}{2^N} \sum_{k \geq qN} \binom{N}{k}$. Note that $0 \leq b \leq 1$, and $b$ is $1/2$ iff $N$ is odd and $q = 1/2$; otherwise $b$ is above or below $1/2$ according to whether $q$ is below or above $1/2$. We have to solve $p = p^2 + 2p(1-p)b =: f(p)$. Some minor-league algebraic skullduggery makes it $0 = p(1-p)(2b-1)$. This means that the fixed points are $p=0$ and $p=1$, and every $p$ is a fixed point iff $b = 1/2$.
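For readers who want the skullduggery spelled out, the step from the fixed-point equation to its factored form runs as follows (a routine rearrangement, using only the definitions above):

```latex
\begin{align*}
p &= p^2 + 2p(1-p)b \\
0 &= p^2 - p + 2p(1-p)b \\
  &= -p(1-p) + 2b\,p(1-p) \\
  &= p(1-p)(2b-1).
\end{align*}
```

The product vanishes when $p = 0$, when $p = 1$, or, for every $p$ at once, when $b = 1/2$.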

Now to see if the fixed points are stable. Here $f'(p) = 2p(1-2b) + 2b$. A fixed point $p$ is stable if $|f'(p)| < 1$ and unstable if $|f'(p)| > 1$. If $b = 1/2 = q$, then $f'(p) = 1$ and we can't say anything until we notice that now $f(p) = p$, so that every $p$ is fixed and stable. Now $|f'(0)| = 2b$ and $|f'(1)| = 2(1-b)$. If $b < 1/2 < q$, $p=0$ is stable and $p=1$ unstable; if $b > 1/2 > q$ the reverse holds. This explains the Wenzhou data a lot better, since 'o' being more salient than 'oy' corresponds to $q < 1/2$. Once even a few people began using 'o' for 'oy', which we can, this time, get away with blaming on language contact, it was only a matter of time before the whole population followed suit.
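The stability test is easy to run for particular values of $b$. In the sketch below the function names are made up, and the sample values are assumptions: $b = 0.8125$ is what the formula for $b$ gives when $N = 5$ and $q = 0.4$, while $b = 0.25$ simply stands in for a case with $q > 1/2$.

```python
def f(p, b):
    """Update map for the parental model."""
    return p * p + 2 * p * (1 - p) * b

def f_prime(p, b):
    """Derivative of f with respect to p."""
    return 2 * p * (1 - 2 * b) + 2 * b

def stability(p, b):
    """Classify a fixed point p by the size of the slope of f there."""
    slope = abs(f_prime(p, b))
    if slope < 1:
        return "stable"
    if slope > 1:
        return "unstable"
    return "undetermined"  # slope exactly 1: the linear test says nothing

if __name__ == "__main__":
    for b in (0.25, 0.5, 0.8125):
        print(b, "p=0:", stability(0, b), "p=1:", stability(1, b))
```

With $b$ above one half, $p = 0$ is unstable and $p = 1$ stable, so any small seed of 'o' speakers grows; with $b$ below one half the roles are swapped.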

There are still problems of course. A closer look at Shen's data shows that the stable point finally reached may not be 1, but a fraction close to 1. There is sometimes a drop-off after the highest point of the S-curve. But that is another story.

References

L. Cavalli-Sforza and M. W. Feldman, Cultural Transmission and Change: A Quantitative Approach, Princeton University Press, 1981.
William Labov, Principles of Linguistic Change: Internal Factors, Blackwell Publishers, 1994.
Partha Niyogi and Robert C. Berwick, Evolutionary Consequences of Language Learning, Linguistics and Philosophy 20:697-719, 1997.
Partha Niyogi, Language Learning and Language Change, MIT Press, 2004.
Zhongwei Shen, Stochastic Diffusion and Regularity of Sound Change, paper given at NWAVE-XXII, Ottawa, 1993.

The author's day job is being a doctoral student in the Computer Science Department at the University of Chicago, studying topics in computational linguistics and speech recognition. He was formerly at Kutama College and University of Zimbabwe. Much of the material in this article is based on a chapter in Partha Niyogi's book to appear in 2004.

How can you get to the island?

Just a reminder that A PRIZE was offered in Issue 7.2 (page 4) for the best solution to this nice little puzzle, and nobody has yet attempted it. Go for it! If you were really in that position and there was a Big Prize on the island, you would find the way in less than 10 minutes....

· The island is in the middle of a small square pond. You cannot swim, but you have two identical planks, each just too short to reach from the side to the island. How can you use them to get to the island dry? (And there are more questions about the island on page 4 to be answered.)





