The Adventures of Auck: November 2009

Monday 30 November 2009

Where the Welsh Things Are

Maurice Sendak's classic children's story 'Where the Wild Things Are' has been made into a movie and is released in the UK this week. Like millions of others, I read it as a child. However, I will feel an extra level of betrayal compared to those who merely disagree with the movie's changes to the voices/plot/characters, because I first read it in Welsh.

The Welsh-speaking characters in 'Gwlad y pethau Gwyllt' always felt closer, more special to me, in a way the characters in English books did not. Indeed, they may have been more dear to me literally - I discovered that my Welsh version of 'Where the Wild Things Are' is a collectable now, selling for $450 on Amazon! I should really try to find it...

It's not the first time this has happened. I remember seeing Superted in English for the first time (Superted was originally in Welsh), and feeling like he had abandoned Wales and the fight to 'achub yr iaith' (rescue the language). I also remember being surprised that my English-speaking friends knew about Fireman Sam too.

I wonder if other children around the world have been similarly confused. In the end, I guess I shouldn't have been that surprised that my childhood heroes were bilingual, but why did they never code-switch?

Talking of code-switching cartoons, I can't help including possibly the most complicated example of code-switching ever, from DreamWorks' The Prince of Egypt:

Friday 27 November 2009

Conceptions of bilingualism in Canada

I just finished watching the documentary 'Incident at Restigouche' about tensions between the Canadian native Micmac Indians and the Quebec government over fishing rights in 1981. The Quebec authorities raided the local reserve because, in their eyes, the Micmac were over-fishing the river. The Micmac deny this, pointing out that sport fishing alone took a greater number of fish each year. I was linked to it by The AQ's blog on the Micmac language and identity, here.

Several things made me upset. First was the labelling of the Micmac as 'not bilingual' because of speaking English and Micmac, not English and French.

Secondly, I was particularly struck by what Quebec Minister of Fisheries, Lucien Lessard, was heard to say:

"You cannot ask for sovereignty, because to have sovereignty, one must have one's own culture, language and land."

The Micmac have all of these, so it's not clear what on earth Lessard was thinking. It's clear, though, that people's conceptions of other languages and how they relate to culture and sovereignty can be radically different.

Tuesday 24 November 2009

Bayesian Bilingualism

I've been wondering about Bayesian models of language learning and bilingualism. Models such as Griffiths & Kalish (2005) assume learners have probabilities for hypotheses of the structure of a language in a large hypothesis space, based on utterances heard. The posterior probability represents the learner’s model of a speaker’s language (compatible with a view of trying to learn the parents’ Medium). Two methods drive convergence to a best hypothesis in the learner: The MAP (maximum a posteriori) process assumes the maximally probable ‘language’ and only produces strings created by that ‘language’. The sampling approach (SAM) does not rule out any nonzero probability hypothesis and may produce mixed strings occasionally.

In a monolingual environment, MAP should be most efficient, but SAM is better for acquiring more than one language. A sampling approach also models observations of better task switching but worse inhibition in bilinguals than monolinguals. This may be another factor in the differences between monolingual and bilingual development.
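To make the contrast concrete, here is a minimal sketch of the two learners over a toy hypothesis space. It is not the model from Griffiths & Kalish (2005): the two 'languages', their utterance probabilities and the input data are all invented for illustration.

```python
import random

def posterior(prior, likelihood, data):
    """Bayes' rule over a finite set of hypotheses ('languages')."""
    unnormalised = {}
    for hypothesis, p in prior.items():
        for utterance in data:
            p *= likelihood[hypothesis].get(utterance, 0.0)
        unnormalised[hypothesis] = p
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}

def map_learner(post):
    """MAP: always commit to the single most probable hypothesis."""
    return max(post, key=post.get)

def sampling_learner(post, rng):
    """SAM: pick a hypothesis with probability equal to its posterior."""
    hypotheses = list(post)
    weights = [post[h] for h in hypotheses]
    return rng.choices(hypotheses, weights=weights, k=1)[0]

# Invented toy 'languages': each gives probabilities to two utterance types.
likelihood = {"A": {"a": 0.9, "b": 0.1}, "B": {"a": 0.1, "b": 0.9}}
prior = {"A": 0.5, "B": 0.5}
data = ["a", "a", "b", "a", "b"]  # mixed input, slightly more 'a'

post = posterior(prior, likelihood, data)
rng = random.Random(0)
draws = [sampling_learner(post, rng) for _ in range(1000)]
print(post, map_learner(post), draws.count("B"))
```

With this input the MAP learner only ever uses 'A', while the sampler still produces 'B' some of the time, which is the sense in which sampling keeps a second language alive.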

However, I'm not completely sure about the maths, and suspect that MAP can define a best hypothesis over any number of 'languages', so they may be equivalent.

Wednesday 18 November 2009

Bilingualism in Singapore

Singapore is certainly multilingual. It has four official languages, and Ethnologue catalogues 21 different languages and dialects, all within about 5 million people.

The latest post on Language Log is on Bilingualism in Singapore, charting the dubious theorising of Minister Lee Kuan Yew (for more on Lee, see LeeWatch.info). Lee essentially forged an education plan based on the idea that people only have so much 'space' to store languages, and so bilingualism can only be bad. Surprisingly, this was not so far from well-received linguistic theory until Martin-Jones and Romaine (1986) argued it was 'half-baked'.

However, the minister seems to have recently changed his mind about bilingual education for the better. Maybe it'll trickle down to the guy I met the other day who congratulated me on my very good English despite my Welsh medium education.

Tuesday 17 November 2009

Laughter

Last week I heard someone laugh, and I thought it was such a good laugh that I would use it from then on. It was sort of halfway between a cackle and a guffaw - definitely a mocking, cruel, delighted burst of air.

This reminded me of another conscious adoption of a cultural trait that I read about recently from Papua New Guinea: McElhanon describes a community meeting in which one group decided to change their word for ‘no’ in order to distinguish themselves from another group (Kulick, 1992). Although there are many examples of people changing to conform, it's not often you find such an organised move away from the norm. I couldn't find out how successful the change was, though.

Surprisingly, my own adoption seems to have worked, and I now involuntarily use my new laugh quite a lot. Laughter, it seems, is infectious.

Sunday 15 November 2009

Check, Fold, Race

Saturday 14 November 2009

Edinburgh Skyline

Me and Keelin just made a stencil of the Edinburgh skyline from Princes Street. I was really pleased with it, and will definitely use sponging again. The whole thing only took about 4 hours.

Here's the original panorama. I cut out the boring trees in the middle.

The contrast on the castle wasn't great, so I ended up splicing in a separate image:

Desaturate, boost contrast, find edges, print:

Cut out!

Stencil Ready:

Sponge acrylic paint onto wall:

Stencil done!

Friday 13 November 2009

Ha Long Time Coming

David Graddol predicted in 1996 (theory quantified by Lupyan and Dale, 2009) that, as more people learned English as a second language, native English speakers would lose their grip on the language. Indeed, there are probably far more second language speakers of English now than native speakers, so the non-natives have the power to change the language to suit themselves.

This was highlighted on a friend's travel blog recently (I've been enjoying living vicariously, especially when there are puns involved in the titles). The latest post in Eric's South East Asia blog finds the protagonist on board a ship, floating between the myriad of islands in Ha Long Bay, Vietnam. Incidentally, I have to agree that it is one of the most beautiful areas I have ever seen. Having signed up for an English tour, Eric is surprised to find that he is the only native speaker of English. However, he found himself translating between 'versions' of English. There seemed to be non-reciprocal intelligibility between them: Eric could understand them all, but they had difficulty translating between themselves. Indeed, one traveller resorts to using his iPhone for translation (see my post on Lingua Tecnologia).

So it seems that, instead of a sprawling continent, English may be eroded by the seas of time into thousands of tiny little islands.

Thursday 12 November 2009

Ghost in the Shell

This was my favourite stencil. Unfortunately, the original got lost in a move and I haven't had the heart to do another. It's inspired by Shirow Masamune's original Ghost in the Shell comic. In this scene, the hero glimpses the birth of a new, intelligent life form. The thing in the middle is some kind of neural network, and descending onto it is an angel (the shape pointing down is the shade on a foot; some people don't notice it at first, but once it's pointed out, you can't help seeing it).

The idea is similar to the Sistine Chapel ceiling fresco, where God and Adam reach out towards each other. Michelangelo captures the question of man's relationship with God. Masamune questions the relationship between man's body and man's spirit.

The medium here also hints at a direction for answers: Stencils are made up of small, abstract, isolated shapes. However, in a particular configuration, they create an impression of a unified image to human perceivers. Not all shapes are allowable, though - you can't have a stencil with free-standing opaque parts (e.g. the outline of a full circle). In fact, even convex shapes decrease the integrity of the stencil. So, even as the whole influences how one interprets the parts, the parts influence what that whole can be.

Wednesday 11 November 2009

Codeswitching as a Move to Markedness

One advantage of having two languages is having an extra tool with which to avoid ambiguity. For example, in English, ‘Thirteen’ and ‘Thirty’ are often confused, whereas in German ‘dreizehn’ and ‘dreissig’ are more different, and in Chinese ‘三十’ and ‘十三’ are very different. Montanari (2008, p. 622) gives an example of this tactic in a trilingual child (KAT) interacting with their grandmother (GRA) in Spanish and Tagalog:

%sit : KAT and GRA are engaged in book reading
*KAT : [‘ota].
%gls : pelota
%eng : ball in Spanish
*GRA : ¿botas ? zapatos ? zapatos.
%eng : boots ? shoes ? shoes.
*KAT : bola bola !
%eng : ball in Tagalog
*GRA : ah la pelota ahí detrás, ahí está la pelota.
%eng : ah the ball right behind, there is the ball.

Because the child cannot pronounce the ‘pel’ of ‘pelota’ (ball), their attempt is confused with ‘botas’ (boots). Instead of attempting the word again, or using pragmatics, the child uses the word in a different language. This makes it easier to pronounce and thus easier to understand. Perhaps, then, some codeswitching can be accounted for by this tactic.

One might assume that the optimal strategy, given two different languages, is to switch at every word. However, individual languages tend to display a move to markedness (Shillcock, Hick, Cairns, Chater & Levy, 1995). This principle is ‘that when consonant interactions introduce phonological ambiguity, the ambiguity introduced is always in the direction of a less frequent phoneme’ (Tamariz & Shillcock, 2001). That is, frequently occurring words should be optimised for pronunciation within a language, while words from another language will be free from this pressure. This suggests that frequent constructions (e.g. Noun Phrases) should be most salient in the same language. However, at larger phrase/constituent boundaries, where the probability of words co-occurring is less, words from other languages may be more salient. Code-switching phenomena such as Myers-Scotton’s embedded language frames may fall out of this interaction.

A modelling approach could be used to investigate this. A list of cognates and sentence templates in two languages will be required. Sentence templates will be filled with words from either language, based on maximising the phonetic distinctness of the sentence. This will be calculated using Markov Chain assumptions, with words as nodes and transition costs as the phonetic difference between the last phone of the current word and the first phone of the next word. To model this for children, extra costs could be imposed on transitions to words with complex consonant clusters.

This will produce sentences which are maximally phonetically distinct. Inferences about the choice of language could be drawn over many sentences and many sentence types, with particular attention being paid to constituent boundaries.
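As a rough sketch of the proposal above, the choice of language for each slot in a sentence template can be found by dynamic programming over the chain of word boundaries. Everything specific here is an assumption made for illustration: the `phone_distance` measure is a crude stand-in that treats each letter as a phone, the template is a hypothetical English/Spanish one in plain ASCII spelling, and no extra cost for consonant clusters is included.

```python
VOWELS = set("aeiou")

def phone_distance(p, q):
    """Toy phonetic difference: 0 if identical, 1 if same class, 2 otherwise."""
    if p == q:
        return 0
    return 1 if (p in VOWELS) == (q in VOWELS) else 2

def most_distinct(slots):
    """Pick one word per slot to maximise the summed boundary distances.

    slots: list of dicts mapping a language label to that slot's word form.
    Returns (score, [(language, word), ...]).
    """
    # best[lang] holds the best (score, path) ending in that language's word.
    best = {lang: (0, [(lang, word)]) for lang, word in slots[0].items()}
    for slot in slots[1:]:
        new_best = {}
        for lang, word in slot.items():
            candidates = []
            for score, path in best.values():
                previous_word = path[-1][1]
                # Cost of the boundary: last phone of previous, first of next.
                gain = phone_distance(previous_word[-1], word[0])
                candidates.append((score + gain, path + [(lang, word)]))
            new_best[lang] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])

# Hypothetical template of translation equivalents (ASCII spellings).
slots = [
    {"en": "the", "es": "la"},
    {"en": "ball", "es": "pelota"},
    {"en": "is", "es": "esta"},
    {"en": "there", "es": "ahi"},
]
score, path = most_distinct(slots)
print(score, path)
```

Several paths can tie for the best score, so conclusions about where switches fall would need to be drawn over many templates rather than from any single winning path.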

Tuesday 10 November 2009

Mixing into weaker language

Studies above show that bilingual children can differentiate between their languages and show sensitivity to their interlocutor’s linguistic abilities from a very early age. Yet it is still implied that mixing occurs for qualitatively different reasons to adults. For instance, Cantone & Müller (2007) suggest that children mix more often into their weaker language. However, adults also have lexical gaps in weaker languages. The assumption of separate lexicons is weakened in a draft of Cantone & Müller’s paper:

“… one word und two word utterances …” (Cantone & Müller, 2007b, p.8, my emphasis).

Cantone & Müller’s theory could explain this mixing – that is, they are mixing into the language they are less ‘ready’ to speak (both authors are stronger speakers of German). The example is harsh, but demonstrates that it is unclear why their theory applies to developing children alone. The observation that both children and adults mix more into a weaker language is not surprising, and may only be a quantitative difference.

Monday 9 November 2009

Modeling Bilingualism

When children are brought up speaking two languages, they often go through a stage of 'mixing' where they appear to be unable to separate their languages. For instance, a Welsh word might be inserted into an English sentence: As an example, when I first realised the implications of death, my parents told me that I cried and said "I don't want to go into the pridd" (earth, dirt).

Several theories have been put forward to explain this. Firstly, I may simply not have known the word for 'dirt', and had to rely on a word in another language. Back then, Welsh was probably my stronger language, so this would be an example of mixing into my weaker language. Alternatively, I had not yet learned to tell the difference properly between Welsh and English.

However, both my parents speak Welsh and both languages are used, probably with quite a lot of mixing. Therefore, I may have known the English word, and been aware that I was mixing, but I knew that using a bilingual code was permissible, given my interlocutors.

Indeed, Montanari (2008) finds that the child she studies mixes some words even when they know the word in the language of context. Does this suggest, then, that the child simply didn't know which words belonged to which language? I argue that this isn't necessarily the case.

Adults mix their languages for many reasons. In fact, it's often difficult to decide which language a word belongs to without a lot of context (e.g. 'zeitgeist'). Let's forget about languages for a minute and ask 'to what extent has the child acquired the communicative code of its parents'? By this, I mean how closely does the child's output mirror the parents' input?

To do this, let's look at Quay's (2008) study of a trilingual child. Japanese is the language of the environment, the father is strongest in English and also speaks Japanese and the mother strongest in Chinese and also speaks English and Japanese. Weekly recordings were made from 1;10 to 2;4 years. The utterances of both the child and the parents were coded along with the addressee. The summary of the data is very detailed - containing the proportions of mixing between any two people in Japanese/English, Japanese/Chinese, Chinese/English and Japanese/Chinese/English.

Let's model the child's mixing proportions as a function of the parents' mixing proportions. Each cell in the table below contains the correlation between the model’s predictions and the child’s actual mixing proportions. The first two models use the mother and father’s data separately. The third model is an additive model which combines the parents’ utterances and the fourth uses the difference between the parents’ mixed utterance types. The difference model was provided as a conceivable, but unlikely model. The correlations in the first column correspond to a model using the total input, whereas the last two columns correspond to a model using only utterances directed to the child (direct) and utterances directed to the other parent (indirect).
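For anyone wanting to try this kind of comparison, here is a minimal sketch of the four models scored by Pearson correlation. The proportions are made up purely to show the mechanics; they are not Quay's (2008) data, and the variable names are my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# MADE-UP proportions of mixed utterance types (not real data).
# Order: Japanese/English, Japanese/Chinese, Chinese/English, all three.
mother = [0.10, 0.30, 0.15, 0.05]
father = [0.25, 0.05, 0.10, 0.02]
child = [0.20, 0.18, 0.12, 0.04]

models = {
    "mother": mother,
    "father": father,
    "additive": [m + f for m, f in zip(mother, father)],
    "difference": [m - f for m, f in zip(mother, father)],
}

# Each model's prediction is scored against the child's proportions.
scores = {name: pearson(prediction, child) for name, prediction in models.items()}
for name, r in scores.items():
    print(f"{name}: r = {r:.3f}")
```

The same function can be rerun on direct-only or indirect-only utterance counts to fill in the other columns.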

Although the mother spends more time with the child than the father, the total mixing behaviour of the child is equally predicted by the mother and the father. However, the best model is an additive model of the direct utterances to the child. That is, the child's output is closest to a model which tries to imitate the mixing behaviour of both parents.

Interestingly, the highest correlation between the mixing proportions is between the parents (0.999), which is nearly perfect. Perhaps, then, the child is simply trying to acquire the adult’s mixing strategies or 'Code'.

We can look at the data in more detail by calculating the correlations between mixing proportions for each interlocutor separately:

When addressing the mother, the child's mixing proportions reflect the mother’s total mixing proportions better than the father’s and vice versa, indicating pragmatic differentiation to each parent’s mixing. When addressing the father, the child’s mixing proportions reflect the mother’s indirect input. This could indicate that the child is mimicking the mother’s interaction with the father. The opposite isn't true, but any mimicry may be masked since the child spends so much time alone with the mother.

These two analyses suggest that the child’s mixing reflects the mixing of the parents from a very young age. Modelling allows us to gain extra insights on the potential learning mechanism for the child, but it relies on detailed data, as in Quay (2008). The model could be taken further to include considerations of location, the societal status of each language and the parents' tactics (negative evidence, implicit allowance of mixing, teaching of translation equivalents etc.).

Now for the ambitious, unfounded part: Considering a communicative code, there may be no qualitative difference between mono- and bi-lingual language acquisition. How, then, do bilinguals select words? One possible solution is to use a sort of Bayesian probability distribution over the linguistic, social and pragmatic contexts for each word that represents the best estimation of when to use a word. If a mapping between words and pragmatic and social contexts is acquired, a discrete mapping between words and ‘languages’ becomes irrelevant. This approach works equally well for acquiring one ‘language’, or several levels of tone or dialect.

In this sense, the ‘remarkable’ ability to keep languages separate (Costa & Santesteban, 2004) seems less remarkable and less specific to bilinguals: We don’t find it remarkable that an adult refrains from using terms of endearment during a boardroom speech.
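One way to picture this is a learner that never labels words by 'language' at all, and only tracks which word forms it has heard for a meaning in which context. The sketch below is only an illustration under my own assumptions: contexts are reduced to a single label (the interlocutor), the class and method names are hypothetical, and the estimate is a simple smoothed count (the posterior mean under a symmetric Dirichlet prior).

```python
from collections import Counter, defaultdict

class CodeLearner:
    """Tracks word forms per (meaning, context); no language labels anywhere."""

    def __init__(self, smoothing=1.0):
        self.counts = defaultdict(Counter)  # (meaning, context) -> form counts
        self.forms = defaultdict(set)       # meaning -> all forms heard so far
        self.smoothing = smoothing

    def hear(self, meaning, context, word):
        self.counts[(meaning, context)][word] += 1
        self.forms[meaning].add(word)

    def distribution(self, meaning, context):
        """Smoothed estimate of P(word | meaning, context)."""
        forms = sorted(self.forms[meaning])
        seen = self.counts[(meaning, context)]
        total = sum(seen[w] for w in forms) + self.smoothing * len(forms)
        return {w: (seen[w] + self.smoothing) / total for w in forms}

# Invented input: what a child might hear for the meaning EARTH.
learner = CodeLearner()
for _ in range(3):
    learner.hear("EARTH", "parents", "pridd")
learner.hear("EARTH", "parents", "earth")
for _ in range(4):
    learner.hear("EARTH", "teacher", "earth")

with_parents = learner.distribution("EARTH", "parents")
with_teacher = learner.distribution("EARTH", "teacher")
print(with_parents, with_teacher)
```

Here 'pridd' comes out as the likelier choice with the parents and 'earth' with the teacher, without the learner ever deciding which word is Welsh and which is English.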

This approach would be extended to syntactic acquisition by assuming that, as the mapping between words and meanings developed, strings of words themselves became a context which was encodable in the probability distributions of words. This is essentially a constructivist approach to bilingual acquisition: Before linguistic acquisition, infants first learn an embodied perceptual ‘language’ – an iconic mapping between form and meaning – which allows them to relate structure in the world to an interaction between sensory and motor activity. The mapping between structure in the world and symbolic, linguistic representations would build itself on top of this system in the same way as syntactic (Bernardini & Schlyter) and lexical (Nicoladis & Secco) acquisition can build on pre-existing structures.

Following from this, the ‘difficult’ bit of language acquisition is not the segmentation of strings into words or words into lexicons, but the initial segmentation of the world into functional concepts. The development of this more fundamental understanding of the world may be an additional factor in the qualitative differences between mixing in children and adults.

Sunday 8 November 2009

Languages and Poetry

Languages

by Carl Sandburg

There are no handles upon a language

Whereby men take hold of it

And mark it with signs for its remembrance.

It is a river, this language,

Once in a thousand years

Breaking a new course

Changing its way to the ocean.

It is mountain effluvia

Moving to valleys

And from nation to nation

Crossing borders and mixing.

Languages die like rivers.

Words wrapped round your tongue today

And broken to shape of thought

Between your teeth and lips speaking

Now and today

Shall be faded hieroglyphics

Ten thousand years from now.

Sing—and singing—remember

Your song dies and changes

And is not here to-morrow

Any more than the wind

Blowing ten thousand years ago.

Good point, Carl. However, poetry may be a particularly bad way to make points about language change. As Paul Valéry says in The Art of Poetry, "poetry can be recognised by its ability to get us to reproduce it in its own form: it stimulates us to reconstruct it identically."

On the other hand, although poetry has a small transmission error in terms of phonetic reproduction, the fidelity of conceptual interpretation may be a different story. Show me a class of high school English Literature students, and I'll show you eleven different, badly written interpretations.

Friday 6 November 2009

Lazy Linking Friday

Time for some lazy linking!

I've been listening to Archers of Loaf recently - especially the incredible ending to their penultimate album, White Trash Heroes. This is likely to be in my collection forever.

I have been utterly captivated by guitarist hironou2525: Videos of beautiful guitar playing together with links to mp3s and very high quality tabs. Obviously someone who knows a thing or two about cultural transmission. They have a blog here, although I couldn't read it. I especially liked I do:

The Speculative Grammarian is a collection of linguistic satire. There is a load of stuff there, but this set of puns caught my eye.

This recent paper by Novembre et al. is interesting - Genetic distance between people is correlated with the geographic distance between them. In fact, a PCA graph draws a pretty good map of Europe. I was especially interested to see the analysis of Swiss genes - they divide on primary language!

While Linguistic Exogamy (marrying someone outside your linguistic group) is common in linguistically diverse areas of the world (Papua New Guinea, Amazon basin), this analysis may suggest it wasn't practised so much in Europe.

Thursday 5 November 2009

Levels of Bilingualism

How many people in the world speak more than one language? Probably the vast majority. In the USA it's estimated at 18% (US Census Bureau), in Canada it's about 34% (Statistics Canada) and in the EU it's about 66% (European Commission). But getting data is hard - even in countries with the infrastructure to support a large scale census, the issue of bilingualism is often not prioritised. The metric of number of languages spoken in each country (linguistic density) has been used (e.g. Nettle, 1999), as well as the number of neighbour groups (Lupyan & Dale, 2009) and is probably correlated, but is not the same as bilingualism.

So, maybe we can estimate a different way. The Ethnologue has data on the estimated number of speakers for each language within a country, along with the number of people in a country. Subtracting the number of people from the total number of speakers gives, in theory, the maximum number of bilinguals in a country.

Maximum Number of Bilinguals =
total number of speakers for all languages – total number of people
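The calculation is simple enough to sketch in a few lines. The function names and the three-language country below are hypothetical, and flooring the result at zero (for countries where speakers are under-counted) is my own addition rather than part of the formula.

```python
def max_bilinguals(speakers_by_language, population):
    """Total speakers across all languages minus population, floored at zero."""
    total_speakers = sum(speakers_by_language.values())
    return max(0, total_speakers - population)

def speaker_ratio(speakers_by_language, population):
    """Ratio of speakers to people; values above 1 imply some bilingualism."""
    return sum(speakers_by_language.values()) / population

# Hypothetical country with three languages (not Ethnologue figures).
speakers = {"X": 600_000, "Y": 300_000, "Z": 400_000}
population = 1_000_000

print(max_bilinguals(speakers, population))
print(speaker_ratio(speakers, population))
```

With more than two languages the figure is an upper bound rather than an exact count, because someone who speaks three languages is counted twice in the surplus.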

For example, if a country has 1 million people, and 500,000 speakers of language A and 750,000 speakers of language B, then 250,000 must be bilingual (if there are no other languages spoken). The figure below shows the ratio of speakers to people with darker areas indicating higher levels of bilingualism (data from Ethnologue, created with R):


As expected, the data is not good enough to warrant a proper analysis. The number of speakers is underestimated (total population of world = 6 billion, total number of speakers = 5.7 billion). 12% of entries in the Ethnologue have no population data and for more than half of the countries the number of speakers is less than the number of people. One exception was Saudi Arabia, with a ratio of 9.4, possibly because 23% of the population are foreign nationals or, more intriguingly, because the majority of the population were nomadic until the 1960s.

At any rate, there appears to be no correlation with latitude (r= -0.1, t = -1.4, df = 197, p-value = 0.15) or longitude (r = -0.01, t = -0.28, df = 198, p-value = 0.8).

Ah well, back to counting people instead of numbers.
Lupyan, G. & Dale, R. (in press). Linguistic Structure is Partly Determined by Social Structure.

Wednesday 4 November 2009

Genes vs Language

A point about the difference between genetic and linguistic inheritance:

Tuesday 3 November 2009

OneKind

I can count the number of ad banners I have clicked on one mouse. But I had to click on this one:

[picture of cat and human] - "We're not that different, if you believe we're all OneKind".

As a linguist trying to understand the genetic basis of language in humans, I immediately thought "Uh-Oh".

The link took me to a sign up sheet with a picture of a dog and Paul O'Grady. This is what it said:

"Feelings. We all have them and so do animals. 9 out of 10 people agree. We’re all OneKind."

Amazing! A whole new way to approach the study of cognition in animals - tap into the 'Wisdom of Crowds' collective subconscious of the masses. It continued:

"Can an animal feel lonely, can an animal feel scared, can an animal feel pain? Common sense and experience have long implied that animals are capable of feeling. New research reveals that 91% of people believe this."

Who could argue with those statistics? I've always said that science gets things wrong sometimes - look at the classification of the tomato as a fruit! Surely this is having an adverse effect? Yes, OneKind tells me - Seals are having a bad time in Scotland, there are snares in the world and scientists are running experiments with monkeys.

What are OneKind doing about this? The only answer on the whole site? A petition! Which doesn't seem to be being sent anywhere!

"I believe that animals can feel.
I believe we’re all OneKind.
HumanKind.
AnimalKind.
OneKind."

I emphatically hope that my intrigued ad-click did not boost the ratings of a brainwashing cult. Not even brainwashing - just a group of people saying a meaningless statement, a sentence with a made up word, then three categories. And probably getting money and sympathy for it.

As an evolutionary linguist, I am very happy with the idea that we share many cognitive abilities with animals, and I'm not sure whether we'll ever find out whether animals have what people would describe as feelings. And, of course, I believe that all organisms are probably related at some level. What really annoys me, as a linguist, is that this site is exploiting a strange hypocrisy that, as Nettle & Romaine (2000) point out, people care about some obscure endangered species without caring that whole human languages are going extinct at a comparable rate. I'll be keeping my ad clicking in check from now on.

Cross-Dimensional Linguistics

It's time for some Socio Linguistics! How many names do you have? What do you call other people? Would you use first names with some but never others?

So far, I have been Sean, Seanny, D, Lep and Monyn, but not yet Mr. Roberts.

Can we tell anything about people by the names they use? Let's look at a corpus. I have selected that pillar of linguistic research and popular 90s sci-fi tv program Sliders. Let's see what the four main characters call each other while dimension hopping around alternative versions of San Francisco and Los Angeles (from EarthPrime, with Regular Expressions):

In the graph above, Quinn, Wade, Rembrandt and Professor Arturo are represented by dashed circles with the different names people use in squares. Arrows between a person and a name indicate that person using that name, with the relative frequency indicated by the thickness of the line. All arrows from the same origin sum to 100%.
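For the curious, the counting behind a graph like this can be sketched with a couple of regular expressions. This is not the script I used: the `SPEAKER: text` transcript format, the list of name variants and the sample lines are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Name variants to look for (taken from the names discussed in this post).
VARIANTS = ["Quinn", "Q-Ball", "Mr. Mallory", "Wade", "Professor",
            "Rembrandt", "Remmy", "Crying Man", "Mr. Brown"]

# Longest variants first, so multi-word names are tried before shorter ones.
NAME_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(v) for v in sorted(VARIANTS, key=len, reverse=True)) + r")\b"
)
LINE_RE = re.compile(r"^(\w+):\s*(.*)$")  # assumed 'SPEAKER: text' format

def name_usage(lines):
    """Return, for each speaker, the proportion of each name variant they use."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        speaker, text = match.groups()
        for variant in NAME_RE.findall(text):
            counts[speaker][variant] += 1
    return {
        speaker: {v: n / sum(c.values()) for v, n in c.items()}
        for speaker, c in counts.items()
    }

# Invented lines, not real dialogue from the show.
lines = [
    "Wade: Quinn, wait!",
    "Wade: Nice one, Q-Ball.",
    "Quinn: Wade, over here.",
    "Rembrandt: Q-Ball, what now?",
    "Wade: Quinn?",
]
usage = name_usage(lines)
print(usage)
```

Each speaker's proportions sum to 1, matching the arrows from one origin summing to 100%.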

So what can we learn? The professor and Wade have the most reciprocal relationship - they use each other's first name and title in similar proportions. This indicates a courteous respect.

The Professor and Quinn, however, have opposite proportions of first names and titles - this indicates that Quinn recognises his intellectual superior (although Quinn is much better, obviously).

Next, Quinn only calls Wade "Wade", while Wade also uses Quinn's first name the most, but also plays with "Q-Ball" (Rembrandt's favourite nickname for Quinn) and "Mr. Mallory". This pattern is typical of people who fancy each other, but, in the end, will never do anything about it.

The group's most complex relationship is with Rembrandt "Crying Man" Brown. Everyone uses his first name and affectionate diminutive "Remmy", while only the youngsters use his stage name and only the Professor uses "Mr. Brown". All this confusion is, lamentably, because even dimension hoppers can struggle socially around black people.

Monday 2 November 2009

K**g Fu

Sunday 1 November 2009

Etymologies

Where do our words come from? English is Germanic, right? Ok, but what about words like Cappuccino, Revolution and Smorgasbord? Well, those were just 'borrowed' - they don't really count, since we're intending to give them back. But how valid is this view? Over the centuries, speakers have adopted words from all over the place, yet the diversity of the sources of words is under-appreciated.

Sounds like a job for ... etymology!

The general view of languages is that they are related like a family tree. English is seen as a Germanic language, along with Dutch and Flemish, while Welsh is seen as a Celtic language along with Irish and Cornish. The tree diagram below shows this idea, and gives the impression that the last 'common ancestor' of English and Welsh was way-back Proto-Indo-European:

However, this masks the complexity of languages and language change. A strict family tree marginalises the borrowing of words from other languages. For example, there are a huge number of 'English' words with roots in French, Italian and Spanish.

Hurford & Dediu (2009) encourage us to see languages as made up of sets of linguistic units (e.g. a word), each of which can have a separate ancestry. I wondered what this would look like, so I used the Online Etymology Dictionary to create one.

The Etymology Dictionary lists the heritage of English words, for example:

Cabinet: 1549, from M.Fr. cabinet "small room," dim. of O.Fr. cabane "cabin" (see cabin); perhaps infl. by It. gabbinetto, dim. of gabbia, from L. cavea "stall, stoop, cage." Sense of "private room where advisors meet" (1607) led to modern political meaning (1644).

That is, the ancestry of 'Cabinet' can be traced back through Middle French, Old French, Italian and Latin. Similarly, the word 'Tower' also comes from Latin, but via Old English. By crawling the website, I processed the relationships for about 5000 words. I used hypergraph to display them in an interactive hyperbolic graph. You can play about with it below, or visit here. Click and drag portions of the graph on the edges closer to the middle to explore. For some reason, it starts off zoomed in on Latin, but there's a lot of detail to the right (see here for abbreviations).
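Here is a minimal sketch of how the language abbreviations might be pulled out of an entry like the one quoted above. It is an illustration rather than the crawler I used: the abbreviation table only covers the four that appear in that entry, and chaining the languages in order of mention is a crude simplification (it treats 'influenced by' the same as 'from').

```python
import re

# Only the abbreviations appearing in the quoted entry; a real table is longer.
ABBREVIATIONS = {
    "M.Fr.": "Middle French",
    "O.Fr.": "Old French",
    "It.": "Italian",
    "L.": "Latin",
}

# Longest abbreviations first; the lookbehind stops matches inside other tokens.
ABBR_RE = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")"
)

def lineage(entry):
    """Languages mentioned in an entry, in order of first appearance."""
    seen = []
    for abbreviation in ABBR_RE.findall(entry):
        language = ABBREVIATIONS[abbreviation]
        if language not in seen:
            seen.append(language)
    return seen

def edges(languages, target="Mod.Eng."):
    """Chain the languages into (source, descendant) pairs ending at target."""
    chain = [target] + languages
    return list(zip(chain[1:], chain[:-1]))

entry = ('from M.Fr. cabinet "small room," dim. of O.Fr. cabane "cabin" '
         '(see cabin); perhaps infl. by It. gabbinetto, dim. of gabbia, '
         'from L. cavea "stall, stoop, cage."')

languages = lineage(entry)
links = edges(languages)
print(languages)
print(links)
```

Counting how often each pair turns up across many entries would give the weighted links between language nodes.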

For ease of presentation, the graph is simplified, with lineages of words between 'languages' first going through a language node. Also, Modern English words are not represented, but all contained within the 'Mod.Eng.' node.

Some bits of the graph are tree-like: Words with roots in Middle High German are only borrowed through (New High) German. However, in general, the graph is not tree-like at all. The lineages of English words have all sorts of routes through earlier languages. For example, words can come from Greek via German or French. And this is only for English words. Imagine if etymological data from German and French were added.

Ok, so the graph is pretty useless for research - it's just way too complicated (part of the problem is that hypergraph is designed for trees). What I'm aiming at is questioning the idea of a 'language' as a stable set cut off from other 'languages'. We don't inherit a 'dictionary' from just two individuals, like our genes; we pick up individual words from a wide range of sources, and keep adding, borrowing and changing them throughout our lifetime.