Wednesday, 17 February 2010

The Language Evolution Tree Returns

Recently, I found that people have an innate bias to associate Evolutionary Linguistics with acacia trees. I've just noticed that Babel's Dawn, a blog about the evolution of language by Edmund Blair Bolles, also has an acacia tree at sunrise on its banner. Alas, it's not the language evolution tree, but it's pretty close. With this in mind, I propose that there is a deep association between acacia trees and language evolution. Just look at the following two graphs ...

Distribution of Acacia Trees

Distribution of Languages using complex tone systems

(yes, but this isn't)

Tuesday, 16 February 2010

How many words for Red? Part 4

Xan Gregg has done a data quality analysis of the Wikipedia colour data I used in recent posts (here). Unsurprisingly, the data is not of great quality. The outliers discussed are not included in my analysis, but the colour conversions are not totally consistent either. As Gregg points out, Wikipedia is hardly a good source for Visual Psychophysics research, but it's still an interesting proof-of-concept.

Thursday, 11 February 2010

How many words for Red? Part 3

This week I've been writing about a bin-packing approach to the bilingual lexicon (here and here). Here's why:

The mutual exclusivity bias is a default assumption that infants bring to word learning: that every object has a single, unique name. However, some studies show that bilingual infants do not follow this bias. The question I'm researching is why infants would assume different things about the world if they hear two languages instead of one. Certainly, bilingual children hear more synonyms, but monolinguals also hear many words for the same objects.

The literature has not produced clear-cut results (see my posts here and here), so I've been using a simple model to try to organise my ideas. The model is based on the Categorisation Game, in which a population of agents tries to agree on words for colours. That is, a speaker is presented with a scene of several colours and refers to one of them. A listener must decide which colour the speaker is talking about. Agents begin with no categories and no words, but then divide the colour spectrum into categories and associate words with them, based on verbal interactions. The algorithm is reproduced here.

The algorithm makes some assumptions about mutual exclusivity. The first (what I call Heuristic A) is that, when you see two objects within the same category (e.g. two shades of red), you should divide that category so that there is only one object in each, then assign to each a new unique name. That is, assume that different objects have different names. The second (Heuristic B) is that, when you communicate successfully, delete all other names associated with the category.

These two heuristics limit bilingualism and introduce a mutual exclusivity bias. I ran the categorisation game model without these heuristics to see what would happen. Below are the results of two conditions - one with Heuristic A (black) and one without (red) (10 runs each, 25 agents, a maximum of 100 perceptual categories, 20,000 rounds). Measures include the number of perceptual categories (both conditions rise and plateau at the same rate), communicative success rate (similar in both), the average number of names per agent, the bin packing depth (bpDepth), bin packing wastage (bpE) and the amount of lexical overlap (using the overlap function from Baronchelli, Gong, Puglisi and Loreto, 2010).

Removing this heuristic has some interesting consequences for the model. Firstly, average communicative success is unaffected by removing Heuristic A, and the number of perceptual categories still increases to the maximum over the same timescale. However, removing the heuristic produces agents that are more memory-efficient: they store fewer labels, and those labels describe the meaning space more efficiently. That is, the bin packing metric suggests fewer synonyms and a more efficient coverage of the meaning space. In fact, agents without Heuristic A were near optimal in their bin packing.

I suggest that removing Heuristic A (each different object has a different name) changes the demands on memory in such a way as to favour agents that have several complete descriptions of the meaning space. Dropping Heuristic A also reduces the number of lexical items that are stored.

Children exposed to two languages have extra demands on lexical memory. The above analysis suggests that it's a good idea for these children to drop heuristic A in order to save storage space. That is, if you're bilingual, you shouldn't assume that every object has a different name. Indeed, this predicts some of the findings in the experimental literature (i.e. that bilinguals do not apply mutual exclusivity).

However, part of the problem is that this is a model of emergent structure in labelling perception, not a model of acquisition. The method of splitting a perceptual space into categories may also be unrealistic.

I've just come back from a talk by Kenny Smith, who's been running experiments on mutual exclusivity. He trained participants to associate novel words with novel objects, with some participants experiencing more synonymy than others (some objects had two associated words). After this, participants did a mutual exclusivity task: they were shown two objects, one from the training set and one new object, and were asked which was associated with a novel word. The degree to which participants adhered to mutual exclusivity was proportional to the amount of synonymy they had experienced.

Therefore, it seems that deciding to drop the mutual exclusivity bias may occur on-line. It remains to be seen whether the same results are obtained for children.

Wednesday, 10 February 2010

The Minimal Naming Game

This is the Minimal Naming Game Algorithm from Puglisi et al. (2008). I point out two heuristics that affect an agent's ability to acquire two language systems.

There is a population of agents, each with a partitioning of the perceptual space into categories. Each category has a list of associated words. Each agent has a minimum perceptual difference threshold dmin, below which stimuli appear the same. At each time step:

1. Two individuals are chosen at random to be the speaker and the listener.

2. They both have access to a scene containing M stimuli. The stimuli must be perceptually distinguishable by the agents (perceptual distance ≥ dmin).

3. The speaker selects a topic and discriminates it in the following way:
• Each stimulus is assigned to a perceptual category
• If one or more other stimuli are assigned to the same category as the topic, the agent splits its perceptual categories so that each stimulus belongs to only one perceptual category.
• The new partitions inherit the associated words of the old partition.
Heuristic A: Each new partition is given a new, unique name.

4. The speaker transmits a word that it associates with the topic to the listener. If it has no words associated with the category, it creates a new one. If it has more than one associated word, it transmits the one that was last used in a successful communication.

5. The hearer receives the word and finds all categories which have the associated word and which identify one of the stimuli in the scene. Then:
• If there are no such categories, the agent does nothing.
• If there is one such category, the agent points to the associated stimulus.
• If there is more than one such category, the agent points randomly at an associated stimulus.

6. The hearer discriminates the scene, as above.

7. The speaker reveals the topic to the listener.

8. If the hearer did not point to the topic, the communication is a failure. The hearer adds the transmitted word to the category discriminating the topic.

9. If the hearer pointed to the topic, the communication is a success.
Heuristic B: Both agents delete all other words but the transmitted one from the inventory of the category discriminating the topic.
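The steps above can be sketched in code. The sketch below is my own minimal Python rendering of the algorithm on a one-dimensional colour space [0, 1) with M = 2 stimuli; the representation (a sorted list of category boundaries per agent) and all names and parameter values are my own simplifications, not Puglisi et al.'s implementation.

```python
import itertools
import random

_ids = itertools.count()

def new_word():
    """Invent a globally unique name (Heuristic A / step 4)."""
    return "w%d" % next(_ids)

DMIN = 0.05  # minimum perceptual difference threshold (invented value)

class Agent:
    def __init__(self):
        self.bounds = [0.0, 1.0]  # category i spans [bounds[i], bounds[i+1])
        self.words = [[]]         # words[i]: names for category i, best first

    def cat(self, x):
        for i in range(len(self.bounds) - 1):
            if self.bounds[i] <= x < self.bounds[i + 1]:
                return i

    def discriminate(self, a, b, heuristic_a=True):
        """Steps 3 and 6: split a category that contains both stimuli."""
        i = self.cat(a)
        if i == self.cat(b):
            self.bounds.insert(i + 1, (a + b) / 2.0)
            left, right = list(self.words[i]), list(self.words[i])  # inherit
            if heuristic_a:  # each new partition gets a new, unique name
                left.append(new_word())
                right.append(new_word())
            self.words[i:i + 1] = [left, right]

def play_round(pop, heuristic_a=True, heuristic_b=True):
    speaker, listener = random.sample(pop, 2)             # step 1
    while True:                                           # step 2
        a, b = random.random(), random.random()
        if abs(a - b) >= DMIN:
            break
    topic = a
    speaker.discriminate(a, b, heuristic_a)               # step 3
    tc = speaker.cat(topic)
    if not speaker.words[tc]:                             # step 4
        speaker.words[tc].append(new_word())
    word = speaker.words[tc][0]
    matches = [s for s in (a, b)                          # step 5
               if word in listener.words[listener.cat(s)]]
    guess = random.choice(matches) if matches else None
    listener.discriminate(a, b, heuristic_a)              # step 6
    if guess != topic:                                    # step 8: failure
        c = listener.cat(topic)
        if word not in listener.words[c]:
            listener.words[c].append(word)
        return False
    for ag in (speaker, listener):                        # step 9: success
        c = ag.cat(topic)
        if heuristic_b:  # Heuristic B: delete all other words
            ag.words[c] = [word]
        else:            # otherwise just remember it as last successful
            ag.words[c].remove(word)
            ag.words[c].insert(0, word)
    return True

random.seed(1)
population = [Agent() for _ in range(5)]
results = [play_round(population) for _ in range(3000)]
print("categories:", len(population[0].words),
      "late success rate:", sum(results[-500:]) / 500.0)
```

Passing `heuristic_a=False` or `heuristic_b=False` to `play_round` gives the lesioned conditions discussed in the previous post.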

Tuesday, 9 February 2010

How many words for Red? Part 2

Just an update on the last post (here). I looked at the distribution of colour terms from Wikipedia according to their hue value and suggested that people have more words for some colour ranges than others.

First, I ran a linear mixed effects model on the data, and came up with slightly different results. The spectrum was split into 10 equally sized bins, and the number of words falling into each bin was counted. This count was predicted by bin number, with the slope and intercept allowed to vary by language. Bin number significantly improved the fit of the model (Log likelihood difference = 5.12, Chi squared = 10.2, df = 1, p = 0.001). This suggests that the distribution is not flat (i.e. there are more words for some colour ranges than others). However, allowing different languages to have their own fit did not significantly improve the model (Log likelihood difference = 0.65, Chi squared = 1.3, df = 2, p = 0.52). This suggests that languages do not differ significantly in the distribution of colour words.
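For concreteness, the binning step can be sketched as below. This only reproduces the counting (the mixed effects model itself was fitted separately), and the example hue values are invented for illustration:

```python
def bin_counts(hues, n_bins=10, hue_max=360.0):
    """Count how many colour terms fall into each of n_bins equal hue ranges."""
    counts = [0] * n_bins
    for h in hues:
        # min() guards against a hue of exactly hue_max landing out of range
        counts[min(int(h / hue_max * n_bins), n_bins - 1)] += 1
    return counts

# Invented hue angles (degrees) for illustration only
example_hues = [0, 5, 16, 120, 211, 220, 240, 275, 348, 350]
print(bin_counts(example_hues))  # → [3, 0, 0, 1, 0, 1, 2, 1, 0, 2]
```

The resulting per-bin counts, one vector per language, are what the model predicts from bin number.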

However, I also noticed that the distribution looks very similar to the Just Noticeable Difference (JND) curve for colour. Human eyes are not uniformly sensitive to colour. We can distinguish colours better at some ranges than others. Below is the distribution for colour terms from the English Wikipedia site, with an overlay of the human JND curve (from Long et al., 2006).

You'll notice that the curve is a very good fit. Indeed, the two distributions are correlated (r=0.6, df=18, p=0.005). That is, the distribution of colour words may not be uniform over the physical spectrum, but it is pretty even across the perceptual spectrum. Put another way, humans have lots of words for ranges of the spectrum that they are good at discerning.

For 8 languages, the number of colour categories and the JND are correlated (r = 0.28, df = 94, p = 0.005), and more so for all non-monochromatic colours (r = 0.3, df = 93, p = 0.002814).

A mixed effects model shows that the perceptually normalised number of colours (num colours/JND) is still significantly skewed (Log likelihood difference = 10.22, Chi squared = 20.42, p < 0.001). But this skew does not differ much between languages (Log likelihood difference = 1.95, Chi squared = 3.8, p = 0.14). (The p values become 0.0002 and 0.55 when considering non-monochromatic colours.)

This suggests that there is still a non-perceptually motivated colour term distribution.

Long, F., Yang, Z., & Purves, D. (2006). Spectral statistics in natural scenes predict hue, saturation, and brightness. Proceedings of the National Academy of Sciences of the United States of America, 103(15), 6013-6018. PMID: 16595630

Monday, 8 February 2010

How many words for Red?

Just how different are languages in the way they label colours? Since Berlin & Kay's 1969 study of colour terms in many languages, the debate over cross-linguistic similarities has raged. Interestingly, most of the subjects in Berlin & Kay's experiments were bilingual, but they didn't think that other languages could influence the results of individuals.

In the last few years, Dr Panos Athanasopoulos at the ESRC Centre for Research on Bilingualism, Bangor, Wales, has been investigating colour perception in Bilinguals. In 2009, Dr. Athanasopoulos studied Greek-English bilinguals. Greek makes a distinction between dark blue ble and light blue ghalazio. Results suggested that bilinguals' perceptions shift towards those of native speakers of their second language. The study is set to be extended into Japanese this year by two forthcoming publications (in Bilingualism: Language and Cognition and Language and Bilingual Cognition).

Last week I was thinking about models of colour terms and bilinguals. What do we mean when we say someone is 'bilingual'? On a syntactic level this may be a bit easier to answer, but on the lexical level (where I am at the moment) it's a bit more difficult. For example, what's the difference between having two 'languages', and having one 'language' but many words for the same category?

My current suggestion is that bilinguals differ from monolinguals because they have more sets of categories that span their entire perceptual field. What does that mean? Imagine the colour spectrum (red -> yellow -> green -> blue). A monolingual will be able to label any point in the spectrum with one word, and some points with more than one. For instance, a certain red may be 'red' and 'crimson'. However, a bilingual will be able to label any point with more than one word. That is, the spread of their synonyms will be more even.

Here's an illustration: Below is how a monolingual might break up the colour spectrum. The speaker has several categories in memory (category 'E' spans from red to yellow, 'C' covers greens, 'A' covers a specific kind of green, etc.):

The bottom of the image is labelled 'bin packing'. This represents the results of the bin packing algorithm, which tries to fit the categories into the smallest space possible. The monolingual above has lots of wasted space (striped areas) because it has lots of synonyms for a few categories.

The next illustration is of a bilingual, but note I haven't specified which 'language' each category belongs to. They have synonyms too, but their categories pack much more efficiently. In fact, the speaker below has two options with which to describe any colour. In contrast, the monolingual has only one complete system.

Ok, these examples are contrived. But I still predict that the bin packing of bilingual categories will be more efficient than that of monolingual categories. I'll use this metric to examine the results of a model in an upcoming post.
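The metric can be sketched in code. The post doesn't pin down an exact packing procedure, so this is a hedged sketch using first-fit decreasing: categories are represented only by their widths (as fractions of a unit spectrum), and the width values below are invented for illustration.

```python
def pack(widths):
    """First-fit decreasing: drop each category into the first unit-width
    bin that still has room, opening a new bin when none does."""
    bins = []
    for w in sorted(widths, reverse=True):
        for b in bins:
            if sum(b) + w <= 1.0 + 1e-9:
                b.append(w)
                break
        else:
            bins.append([w])
    return bins

def depth_and_wastage(widths):
    """bpDepth: number of bins used; bpE: empty space left in those bins."""
    bins = pack(widths)
    return len(bins), len(bins) - sum(widths)

# Monolingual-like inventory: one full cover plus clustered synonyms
d, e = depth_and_wastage([0.5, 0.5, 0.2, 0.2, 0.1])
print(d, round(e, 2))  # depth 2, wastage 0.5

# Bilingual-like inventory: two complete covers of the spectrum
d, e = depth_and_wastage([0.6, 0.4, 0.5, 0.5])
print(d, round(e, 2))  # depth 2, wastage 0.0
```

The striped "wasted space" in the illustrations corresponds to the wastage term: same depth, but the bilingual-like inventory fills its bins completely.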

However, this metric assumes that colour categories within a language are unevenly dispersed. That is, speakers know many words for some colours, but not others. Let's put this to the test. The problem with doing so is that most colour category experiments involve getting people to assign colours to labels, meaning that they can't declare two labels for the same colour. So, I went to the Wikipedia List of Colours page which lists details of all colours mentioned in technical articles on colour. These are mainly standardised names for use in HTML, which is a problem for our current analysis, but let's see what happens.

The names for colours in English and other languages were gathered along with their hue angle (i.e. position in the spectrum). Below is the histogram of the number of colour terms in different portions of the spectrum (coloured by the average colour of the terms, taking saturation and brightness into account) for English, followed by the histograms for some other languages:

First, the histogram for English is certainly not even. There are more names for reds and blues than for greens. This works intuitively - how many types of red can you think of in comparison to types of green?

Let's look cross-linguistically: First of all, the histograms are not identical - the Wikipedia pages are not just translations. Second, they all seem to have lots of names for reds. This may be an artefact of the circular meaning space (hue runs from 0 degrees to 360 degrees), but that would not entirely explain the imbalances. Let's do an ANOVA of the number of colour terms per section of the spectrum (splitting it into 10 even sections) by section position and language.
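On the circularity point: reds sit at both ends of the hue scale (near 0 and near 360 degrees), so linear binning splits them across the first and last bins even though they are perceptually adjacent. A small sketch of the wrap-around, with invented values:

```python
def hue_dist(a, b):
    """Angular distance on the 360-degree hue circle."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

# Two reds that look far apart on a linear scale are perceptually close
print(hue_dist(350, 10))  # → 20.0
print(abs(350 - 10))      # → 340 on a linear scale
```

Any analysis treating hue as a straight line will inherit this edge effect, which is why it can only partly explain the apparent surplus of red terms.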

Colour names were not evenly distributed within languages (F(1,64)=11.35, p<0.01), and were distributed significantly differently across languages (F(7,64)=6.31, p<0.001). Having said this, I'm not completely sure of my stats here.

However, if the analysis is correct, then bilinguals should have a better packing efficiency than monolinguals because acquiring a whole extra 'language' is more likely to normalise the distribution rather than increase the skewness. This may be a useful metric in the analysis of feature-level models of bilingualism.

Friday, 5 February 2010

Linguistic Leniency

I was once on holiday with a motley crew of tourists when a guy called Dave decided to show off his linguistic skills. He'd lived in China for a few months and picked up some Mandarin, which he proceeded to flaunt in front of two girls from Hong Kong. However, no matter how hard he tried, the girls had no idea what Dave was saying. After he explained in English what he was trying to convey, the girls did recognise traces of meaning. His Mandarin was off partly in tone (notoriously difficult for foreigners) and partly in pronunciation.

Dave was very disappointed and a bit embarrassed, but also confused. How had he managed to order all those meals, to greet all those people, to have all those conversations, if he couldn't actually be understood? One possible answer is, instead of Dave learning Chinese, the Chinese had learned Dave. That is, after a few interactions and a lot of contextual help, they had understood that a particular incoherent sound meant that he wanted some noodles/was asking the time/was saying hello.

Recently I've been thinking about what happens when two populations with different languages mix. Several studies have considered this using computational models, and several suggest that one language usually takes over, and that bilingualism is not stable. What Dave's example suggests is that speakers are extremely forgiving when dealing with people who do not speak their language, especially when gesture and context can make most meanings clear. It's also a good example of the fact that people can communicate despite large discrepancies in their mental representations.

Tuesday, 2 February 2010

Pun of the Day #7

L: I need more washing powder... did I say that already?
Me: Yes, you're experiencing déshàmpoo.