Bartender
Last week, Shashi Borade was asking a few people whether they found the following joke to be funny: C.E. Shannon goes into a bar and orders a beer. The bartender says, "I don't see where this joke is going." Shannon replies, "It's an open problem."
I thought it was mildly amusing because of the breaking of the fourth wall. Others did not agree.
At this point, you must be thinking, "I don't see where this post is going."
Some people find the joke funny and some do not, but that disagreement is not measurement noise, so averaging across subjects does not make sense. A paper that I was flipping through recently develops a probability model for exactly this type of individual variation. I reproduce the first paragraph of "Modeling Individual Differences Using Dirichlet Processes," by Navarro, Griffiths, Steyvers, and Lee (2005) here: Suppose we asked one hundred people which number was the most unlucky. Of those people, fifty said ‘13’, forty said ‘4’, and ten said ‘87’. This variation is unlikely to be due to noise in the cognitive process by which people make unluckiness judgments: If we replicated the experiment with the same people, the same fifty people would probably say 13 again. It seems much more likely that most of the observed variation arises from genuine differences in what those people believe. A complete explanation of people’s answers would have to account for this variation.
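To make the contrast concrete, here is a minimal Python sketch that uses the unlucky-number figures from the quote. Averaging the answers gives 16.8, which nobody said, while simply counting them keeps the three groups apart. The `crp_partition` function is a hypothetical helper of my own, not code from the paper. It samples group sizes from a Chinese restaurant process, one standard way of drawing from a Dirichlet process prior: each new person joins an existing group with probability proportional to that group's size, or starts a new group with probability proportional to a concentration parameter `theta`. It only illustrates the prior over groupings; the paper's model also ties each group to its own pattern of answers.

```python
import random
from collections import Counter

# Answers from the unlucky-number example quoted above.
answers = [13] * 50 + [4] * 40 + [87] * 10

# Averaging treats the spread as noise around one "true" answer...
mean_answer = sum(answers) / len(answers)  # 16.8: nobody gave this answer
# ...whereas the empirical distribution keeps the three groups apart.
groups = Counter(answers)  # counts: 13 -> 50, 4 -> 40, 87 -> 10


def crp_partition(n_people: int, theta: float, rng: random.Random) -> list[int]:
    """Sample group sizes for n_people from a Chinese restaurant process.

    Hypothetical illustration: person i (0-indexed) joins existing group k
    with probability sizes[k] / (i + theta) and starts a new group with
    probability theta / (i + theta).
    """
    sizes: list[int] = []
    for _ in range(n_people):
        # The last choice index stands for "start a new group".
        k = rng.choices(range(len(sizes) + 1), weights=sizes + [theta])[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
    return sizes


if __name__ == "__main__":
    print("mean answer:", mean_answer)
    print("groups:", dict(groups))
    print("CRP group sizes:", crp_partition(100, theta=1.0, rng=random.Random(0)))
```

A larger `theta` tends to produce more, smaller groups; a smaller one lumps people into a few large groups.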
As I have become aware through the course I am taking this semester, linguists (syntacticians in particular) are sometimes criticized for getting data from just one speaker of a language and then basing theories on that data. The common thinking is that data must be gathered from many subjects to be reliable.
This contention was somewhat implicit in a workshop last week entitled "Where Does Syntax Come From? Have We All Been Wrong?" co-organized by Bob Berwick. One of the speakers, Christopher Manning, talked about learning language from very large corpora, such as a corpus of Wall Street Journal articles, but his probability models did not seem to be mixture models, which would be necessary to capture individual variation. Noam Chomsky, in his talk, basically brushed off statistics within his first two sentences.
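What a mixture model buys you can be seen in a few lines. The sketch below assumes hypothetical data (eight people, each asked five times whether the joke is funny; the numbers are invented for illustration) and compares a single shared "funniness" rate against a two-group binomial mixture fitted by expectation-maximization. The function names are my own. If the population really splits into people who laugh and people who do not, the mixture should give the data a much higher log-likelihood, and its two fitted rates should land near 0.9 and 0.1 rather than at the pooled 0.5.

```python
from math import comb, log

# Hypothetical data: (times the person found the joke funny, times asked).
data = [(5, 5), (4, 5), (5, 5), (4, 5), (0, 5), (1, 5), (0, 5), (1, 5)]


def binom_pmf(k: int, m: int, p: float) -> float:
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def loglik_single(data) -> float:
    """One shared rate for everyone; its MLE is the pooled rate."""
    p = sum(k for k, _ in data) / sum(m for _, m in data)
    return sum(log(binom_pmf(k, m, p)) for k, m in data)


def fit_mixture(data, iters: int = 200):
    """EM for a two-component binomial mixture: each person belongs to one
    latent group with its own rate, and group membership is inferred."""
    pi, p1, p2 = 0.5, 0.7, 0.3  # arbitrary starting values
    for _ in range(iters):
        # E-step: responsibility of group 1 for each person.
        r = []
        for k, m in data:
            a = pi * binom_pmf(k, m, p1)
            b = (1 - pi) * binom_pmf(k, m, p2)
            r.append(a / (a + b))
        # M-step: re-estimate the mixing weight and both rates.
        pi = sum(r) / len(r)
        p1 = (sum(ri * k for ri, (k, _) in zip(r, data))
              / sum(ri * m for ri, (_, m) in zip(r, data)))
        p2 = (sum((1 - ri) * k for ri, (k, _) in zip(r, data))
              / sum((1 - ri) * m for ri, (_, m) in zip(r, data)))
    ll = sum(log(pi * binom_pmf(k, m, p1) + (1 - pi) * binom_pmf(k, m, p2))
             for k, m in data)
    return pi, p1, p2, ll


if __name__ == "__main__":
    print("single-rate log-likelihood:", loglik_single(data))
    pi, p1, p2, ll = fit_mixture(data)
    print(f"mixture: weight={pi:.2f} rates={p1:.3f},{p2:.3f} log-lik={ll:.2f}")
```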
The 'LIDS-iest' of the talks during the workshop was given by Partha Niyogi, who applied learning theory to language acquisition and evolution. The starting point for the talk was that the structure of all languages is the same except for certain parameters that may be set differently. The running example of a parameter was whether a language is head-initial or head-final. In English, the tree-structure of a sentence is something like this:
[Figure: tree structure of an English (head-initial) sentence]
In Hindi-Urdu, the tree-structure is something like this:
[Figure: tree structure of the corresponding Hindi-Urdu (head-final) sentence]
As anyone familiar with graph theory can see, the two trees are the same graph, just drawn differently: in one, each head sits to the left of its complement, and in the other, to the right. How to draw the tree is the parameter.
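The "same tree, drawn differently" idea can be written down directly. In this minimal Python sketch, the `Node` class and `linearize` function are hypothetical, and each phrase is simplified to a head word plus at most one complement. The tree records only which words combine; a single boolean, standing in for the head parameter, decides whether each head is spelled out before or after its complement. The English glosses are placeholders, not an analysis of either language.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """A phrase: a head word plus an optional complement phrase.

    The structure fixes what combines with what, not left-to-right order.
    """
    head: str
    complement: Node | None = None


def linearize(node: Node, head_initial: bool) -> list[str]:
    """Spell out the tree; the head parameter only decides whether each
    head goes to the left or to the right of its complement."""
    if node.complement is None:
        return [node.head]
    comp = linearize(node.complement, head_initial)
    return [node.head] + comp if head_initial else comp + [node.head]


if __name__ == "__main__":
    vp = Node("sat", Node("on", Node("the chair")))
    print(" ".join(linearize(vp, head_initial=True)))
    print(" ".join(linearize(vp, head_initial=False)))
```

With `head_initial=True` the phrase comes out as "sat on the chair"; with `False` it comes out in the mirror-image order "the chair on sat", the verb-last, postposition order typical of head-final languages such as Hindi-Urdu.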
As another example, consider two sentences: (1) Norbert thinks that he is a genius; (2) Norbert thinks that himself is a genius. Sentence (1) is grammatical, but he need not refer to Norbert. In sentence (2), himself refers to Norbert, but the sentence is ill-formed. What we would really like is the sentence (3) Norbert thinks that heself is a genius. A parameter called the Nominative Island Condition prevents English from having a sentence like (3), but Chinese has the parameter set differently: in Chinese, the word taziji plays the role of heself.
Niyogi presented an analysis showing that if an ideal learning algorithm learns from a heterogeneous population of speakers (most with parameter setting A and a few with setting A'), the language can evolve very quickly from setting A to setting A'. One such historical change was the shift from Old English, a head-final language, to Modern English, a head-initial language. Slides from a similar talk are available here.
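Niyogi's actual models are more careful than anything that fits here, so the following is only a toy of my own, loosely in the same spirit, to show how a small minority can take over. All of its assumptions are hypothetical: each child hears n sentences, each from a randomly chosen adult; a fraction a of the sentences produced by A speakers are ambiguous (compatible with both settings), and a fraction b of those produced by A' speakers are; the child adopts A' only if it hears strictly more unambiguous A' sentences than unambiguous A sentences. The function computes the exact fraction of the next generation that ends up with A', and iterating it traces the change across generations.

```python
from math import comb


def next_fraction(alpha: float, n: int = 10, a: float = 0.9, b: float = 0.1) -> float:
    """Fraction of the next generation that acquires setting A' (toy model).

    alpha: current fraction of adults with setting A'
    n:     sentences each child hears (hypothetical)
    a, b:  probability that an A / A' speaker's sentence is ambiguous
    Hypothetical learner: adopt A' iff strictly more unambiguous A' cues
    than unambiguous A cues are heard.
    """
    p1 = alpha * (1 - b)        # sentence is an unambiguous cue for A'
    p0 = (1 - alpha) * (1 - a)  # sentence is an unambiguous cue for A
    pz = 1 - p1 - p0            # sentence is ambiguous
    total = 0.0
    for k1 in range(n + 1):
        for k0 in range(n - k1 + 1):
            if k1 > k0:
                coef = comb(n, k1) * comb(n - k1, k0)  # multinomial coefficient
                total += coef * p1**k1 * p0**k0 * pz ** (n - k1 - k0)
    return total


def trajectory(alpha0: float, generations: int, **kw) -> list[float]:
    """Iterate next_fraction to follow the A' fraction over generations."""
    traj = [alpha0]
    for _ in range(generations):
        traj.append(next_fraction(traj[-1], **kw))
    return traj


if __name__ == "__main__":
    for t, x in enumerate(trajectory(0.05, 6)):
        print(f"generation {t}: {x:.3f} of speakers use A'")
```

With the default values (A speakers mostly produce ambiguous sentences, A' speakers mostly unambiguous ones), a 5% minority grows to a majority within a few generations; with a and b swapped, the same minority dies out instead.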
If I were clever, I would now tie together the penultimate paragraph about ideal learning algorithms and the opening gambit about the bartender. However, how to do so is an open problem.