Ultrafilters and Social Choice II

November 5, 2009 by Preno

Another example of the use of ultrafilters in social choice theory is provided by the Gibbard-Satterthwaite theorem, which states that any non-dictatorial election in which it is possible for any candidate to win is susceptible to tactical voting. That is, in some situations, a voter will achieve a better result, according to his genuine preferences, by not voting according to his genuine preferences. The Duggan-Schwartz theorem is a related result concerning elections with multiple winners.

More explicitly: we again have a set of voters V, each voting by imposing a linear order \prec_i on a set of alternatives A, |A|>3 (we will call any such V-tuple of linear orders a “profile”). This time, however, our goal is more modest: we only wish to pick a single winning alternative from A (i.e. provide a function c from the set of voter preferences \Pi to A, called the collective choice procedure). Our demands are:

  • every candidate can win, given a suitable set of voter preferences (non-imposition)
  • knowing the votes of other voters does not give anyone an incentive to change their own vote (stability / non-manipulability)

That is, given the preferences of all voters except for i, the winning alternative when i votes according to his sincere preferences is preferred by i to the alternatives which would win if i voted otherwise. We will demonstrate that again, any function which satisfies these two demands is dictatorial in the sense of simply returning the top preference of some particular voter. (The proof is based on the paper Stability of Aggregation Procedures, Ultrafilters and Simple games by P. Bateau, J.-M. Blin and B. Monjardet. I tried to make the argument as informal as possible without losing the rigour. For the definition of an ultrafilter, see the previous post.)

First, a quick ultrafilter lemma:

Lemma (a characterization of ultrafilters):

Every family \mathcal{F} of subsets of a non-empty set X satisfying the following conditions is an ultrafilter:

  • A \in \mathcal{F} ~ \& ~ A \subseteq B \subseteq X \Rightarrow B \in \mathcal{F}
  • A \in \mathcal{F} \Leftrightarrow X \backslash A \notin \mathcal{F}
  • A,B,C \in \mathcal{F} \Rightarrow A \cap B \cap C \neq \emptyset

Proof: if \emptyset \in \mathcal{F}, then we would have both X \in \mathcal{F} and X \notin \mathcal{F}, therefore \emptyset \notin \mathcal{F} (non-triviality). Now let us assume that A,B \in \mathcal{F}, but A \cap B \notin \mathcal{F}. Then X \setminus (A \cap B) \in \mathcal{F}, but A \cap (X \setminus (A \cap B)) \cap B = \emptyset. Therefore \mathcal{F} has the finite intersection property and is an ultrafilter.

Now, back to social choice theory. Let’s say that for a profile \pi, the collective choice is c(\pi)=x. We would like know what happens when some of the voters change their preferences. For notational convenience, let V_{xy} be the set of voters which prefer x to y.

Lemma (about stable collective choice):

For any stable collective choice procedure c, if c(\pi)=x and V_{xy}(\pi^\prime) \supseteq V_{xy}(\pi), then c(\pi^\prime) \neq y.

Proof: first, we will define a function \varphi_i^{xy}(\pi) which takes a profile as an argument and returns the set of all profiles which result from applying the follow operation to \pi:

  • if voter i prefers x to y in \pi, the pair x, y gets moved to the beginning of his ranking in this order in \pi^\prime
  • if voter i prefers y to x in \pi, the pair x, y gets moved to the beginning of his ranking in either order in \pi^\prime
  • every other ranking in the profile remains the same

In other words, x and y get moved to the front of \pi_i and x doesn’t drop in ranking relative to y. Likewise, \varphi^{xy} is the set of all profiles which result from successively applying this operation to every voter.

It is not difficult use the stability of c to prove that since c(\pi)=x, we have c[\varphi_i^{xy}(\pi)]=\{x\} (meaning: every profile which results from applying the operation to \pi picks out x). By applying this to each voter, we obtain the same result for \varphi^{xy}.

Now, by the hypothesis of this theorem, \varphi^{xy}(\pi^\prime) \subseteq \varphi^{xy}(\pi) (\pi^\prime places at least as many constraints on the operation as \pi). Thus, c[\varphi^{xy}(\pi^\prime)]=\{x\}. But if it were the case that c(\pi^\prime)=y for some y \neq x, we would have c[\varphi^{yx}(\pi^\prime)]=\{y\}, which is a contradiction, since \varphi^{xy}(\pi^\prime) and \varphi^{yx}(\pi^\prime) have a non-empty intersection.

Combining this lemma with the assumption of non-imposition yields the weak Pareto property: if everyone prefers x to y, y cannot be the collective choice.

Armed with these results, we can now explore the structure of “preventing families”. A set of voters S is said to be a member of the preventing family \mathcal{F}_{xy} for a collective choice procedure iff y can never get chosen by the procedure when all members of S prefer x to y (the members of S can prevent y from being chosen by unanimously preferring x).

Lemma (about preventing families):

  1. \mathcal{F}_{xy} is non-empty: trivial
  2. S\in\mathcal{F}_{xy} \Leftrightarrow \exists \pi: V_{xy}(\pi) = S \& c(\pi)=x: follows easily from the weak Pareto property
  3. S\in\mathcal{F}_{xy} \& S \subseteq T \Rightarrow T\in\mathcal{F}_{xy}: by the previous property and the theorem about collective choice
  4. S\in\mathcal{F}_{yx} \Leftrightarrow V\setminus S \notin \mathcal{F}_{xy}: easy application of the weak Pareto property

It would seem reasonable to demand that \mathcal{F}_{xy} be the same for every x, y. Fortunately, we do not have to, as the following theorem shows:

Lemma (neutrality of stable collective choice procedures):

\mathcal{F}_{xy} = \mathcal{F}_{zt} = \mathcal{F} for any x, y, z, t.

Proof: let us first show that \mathcal{F}_{xy} \subseteq \mathcal{F}_{xz} (hence \mathcal{F}_{xy} = \mathcal{F}_{xz}) for any x, y, z. Pick S\in\mathcal{F}_{xy} and T\in\mathcal{F}_{yz} and consider a profile \pi such that x, y, z are the top three choices of every voter and:

  • x >_i z >_i y for i \in S \setminus T
  • x >_i y >_i z for i \in S \cap T
  • y >_i z >_i x for i \in T \setminus S
  • z >_i y >_i x for i \in V \setminus (S \cup T)

Then by the weak Pareto property, c(\pi) \in \{x, y, z\}, but S \in \mathcal{F}_{xy} \Rightarrow c(\pi) \neq y and T \in \mathcal{F}_{yz} \Rightarrow c(\pi) \neq z. Thus, c(\pi)=x and by property (2) of preventing families, S \in \mathcal{F}_{xz}.

The same reasoning applied to S\in\mathcal{F}_{xy}, T\in\mathcal{F}_{zx} and the profile:

  • x \succ_i z \succ_i y for i \in S \setminus T
  • z \succ_i x \succ_i y for i \in S \cap T
  • y \succ_i z \succ_i x for i \in T \setminus S
  • y \succ_i x \succ_i z for i \in V \setminus (S \cup T)

shows that S\in\mathcal{F}_{zy}.

Applying this result to properties (3) and (4) of preventing families, we see that the unique preventing family \mathcal{F} in fact satisfies the first two conditions of the ultrafilter lemma. The last step of the proof of the Gibbard-Satterthwaite theorem will be to show that it also satisfies the last one.

Lemma:

The preventing family \mathcal{F} is an ultrafilter.

Proof: the first two conditions of the ultrafilter lemma are satisfied. As for the last one, let us assume that S, T, U \in \mathcal{F}, but S \cap T \cap U = \emptyset. We could then construct a profile \pi such that:

  • x \succ_i y for i \in S
  • y \succ_i z for i \in T
  • z \succ_i x for i \in U
  • x \succ_i y \succ_i z otherwise

By the weak Pareto property, c(\pi) \in \{x, y, z\}. But since S, T, U \in \mathcal{F}, c(\pi) \neq y, c(\pi) \neq z, c(\pi) \neq x.

In view of the fact that every ultrafilter on a finite set is principal (consists precisely of its subsets containing a given element), this concludes the proof of the Gibbard-Satterthwaite theorem: every stable, non-imposed collective choice procedure for at least three alternatives is dictatorial. Conversely, a free ultrafilter on an infinite set corresponds to a stable non-dictatorial collective choice procedure (or, more accurately, a collective choice function).

Ultrafilters and Social Choice

August 11, 2009 by Preno

I was somewhat suprised recently to learn that Arrow’s well-known impossibility theorem (which, as many of you will know, states that there is in general no way of aggregating individual preferences into a single ordering satisfying certain natural conditions) fails for infinitely many voters. If anything, shouldn’t it be more difficult to come up with a “fair” voting procedure for an infinite electorate? Well, as it turns out, the answer to this question has to do with an abstract notion often used in topology and model theory, that of an ultrafilter. For those already familiar with this notion, Arrow’s theorem essentially corresponds to the statement that every ultrafilter on a finite set is principal.

Let us first define a more general notion. A filter \mathcal{F} on a set X is a family of subsets of X such that:

  • \emptyset\notin \mathcal{F} (non-triviality)
  • A\in\mathcal{F} ~ \& ~ A\subseteq B \subseteq X \Rightarrow B \in\mathcal{F} (superset property)
  • A,B\in\mathcal{F} \Rightarrow A\cap B \in\mathcal{F} (finite intersection property)

An ultrafilter \mathcal{F} is a filter such that for all A\subseteq X:

  • A\in\mathcal{F} \vee X \backslash A\in\mathcal{F} (maximality)

By non-triviality and the finite intersection property, only one of these options can hold, hence every ultrafilter on X is maximal in the sense that it cannot be extended to a larger filter on X. A trivial example of an ultrafilter is provided by the family \mathcal{F}_a(X) of subsets of X which contain a given element a. Such ultrafilters are called “principal”. It is easy to show that every ultrafilter on a finite set is principal. On the other hand, the existence of non-principal ultrafilters (also called “free”) cannot be proven in ZF. Proving their existence requires adding a non-constructive choice principle to the axioms of ZF. In particular, assuming that every filter can be extended to an ultrafilter is equivalent to the Boolean prime ideal theorem, a principle strictly weaker than the Axiom of Choice but still independent of ZF. Applying this to, for example, the filter of cofinite subsets of an infinite set yields a free ultrafilter.

The intuitive interpretation of an (ultra)filter is that it specifies which subsets are to be considered “large”. A property holds for “almost all” elements of a set if the set of elements for which it holds is contained in the relevant (ultra)filter – for example, if its complement is finite. (This is a generalization of the measure-theoretic notion of “almost everywhere”, i.e. on a set of full measure. Indeed, an ultrafilter can be thought of as a finitely additive zero-one measure.) This interpretation suggests a natural connection to social choice theory – if “almost all” voters prefer A to B, we would ideally like to rank A higher than B.

And indeed, this is what Arrow’s conditions imply (the following proof is based on these notes by A. McLennan [pdf]). Arrow considers a situation in which we have a set of voters V, each linearly ordering a set of three or more alternatives A according to his preferences (we will symbolize the i-th voter’s ordering by \prec_i). The task is to find a reasonable procedure (“social welfare function”) for aggregating their preferences into a single linear ordering of social preference (\prec). The natural conditions which Arrow imposes on this procedure are:

  • If every voters prefers x to y, the society prefers x to y. (unanimity)
  • The social preference between x and y depends only on the voters’ preferences between x and y, i.e. not on their ranking of other alternatives. (independence of irrelevant alternatives)

We will say that a set of voters C_{xy} is decisive for x over y if, whenever all the voters in C_{xy} prefer x to y and the rest of the voters V \backslash C_{xy} prefer y to x, the society prefers x to y. First let us prove that if C is a decisive coalition for some x over some y, it is decisive for every alternative over any other. If C is decisive for x over y, then let us consider a set of voter preferences such that x \succ_i y \succ _i z for i \in C and y \succ_i z \succ_i x for i \in V \backslash C. Because C is decisive for x over y, we have x \succ y, unanimity implies y \succ z and transitivity means that x \succ z. Therefore (by IIA), C is also decisive for x over z. A symmetric argument shows that C is also decisive for t over z for any t,z.

We now wish to prove that the family of decisive sets does in fact form an ultrafilter. First, non-triviality immediately follows from unanimity. Second, the intersection property. Let us assume that C and D are decisive and consider the set of voter preferences defined thus:

  • x \succ_i z \succ_i y for i \in C \cap D
  • z \succ_i y \succ_i x for i \in C \backslash D
  • y \succ_i x \succ_i z for i \in D \backslash C
  • y \succ_i z \succ_i x for i \in V \backslash (C \cap D)

Then x \succ z by the decisiveness of D and z \succ y by the decisiveness of C, hence x \succ y by transitivity and C \cap D is decisive by IIA.

Third, maximality. Consider the set of preferences x \succ_i z \succ_i y for C and y \succ_i x \succ_i z for V \backslash C. If neither C nor V \backslash C were decisive, then x cannot be socially preferred to y, nor y to z, but by unanimity x \succ z.

And finally, the superset property follows from the fact that if C \subseteq D and C is decisive, V \backslash D cannot be decisive (by the finite intersection property and non-triviality).

If V is finite, we can conclude the proof of Arrow’s theorem by noting that every ultrafilter on V is principal, i.e. for some a \in V, \{a\} is a decisive coalition and the social preference is always simply a copy of a’s preferences. Therefore, every social welfare function for a finite number number of voters at least three alternatives which satisfies unanimity and IIA is dictatorial in this sense.

For an infinite number of voters, the existence of such a function cannot be disproven and in fact follows from the Axiom of Choice or the weaker Boolean Prime Ideal Theorem (although calling it a “procedure” is somewhat hyperbolic due to the non-constructive nature of free ultrafilters). If there are only two alternatives to choose from, Arrow’s result is complemented by May’s theorem, which states that simple majority voting is the canonical choice.

And an interesting, if only tangentially related factoid to end with: for exactly three choices and a large number of voters, the relative prevalence of the Condorcet effect (cycles in social preference which prevent us from applying a simple majority rule) tends to 1 - \frac{3}{\pi}\arccos\left(\frac{1}{\sqrt{3}}\right) \approx 9 \% according to this paper by G.-T. Guilbaud [pdf].

Thermodynamics and economics

February 22, 2009 by Preno

If you’ve ever studied thermodynamics, you too were probably struck by the sheer abstractness and generality of its laws/postulates. It manages to build a rich physical theory (or perhaps a “framework” is a better description) based merely on assuming the existence of a handful of functions and quantities with some very general properties. The natural question seems to be, is it possible to give those axioms an interpretation other than the intended one (where, for example, T roughly corresponds to our pre-theoretical notion of temperature)? If it is, one would (or at least I would) expect the alternative interpretation to deal with phenomena which are still statistical in nature (we assume that the fluctuations in different parts of the system are averaged out, spatiotemporally in thermodynamics), unidirectional and governed by a general class of quantitative relationships rather than one specific equation.

Well, I don’t know the first thing about economics, but it was the first thing to come to my mind. And it turns out my hunch was correct, there is indeed a portion of economics which can be axiomatized in essentially the same way as thermodynamics, as shown by E. Smith and D.K. Foley in their paper Classical thermodynamics and economic general equilibrium theory. (It may be worthwhile to think about how you would draw an analogy between the two before continuing.)

The role of thermodynamic systems in the formalism is taken over by the so-called quasi-linear neoclassical economies, which means we assume that each agent j is assigned a vector of n+1 commodities x (a commodity bundle) and tries to maximize his (quasiconcave, differentiable) ordinal utility function u^j by exchanging goods with others. Quasi-linearity requires that each agent’s utility depend on one of the commodities x^j_0 (called the “linear commodity” or “money”) as: u^j(x^j) = x^j_0 + \overline{u}^j(\overline{x}^j) (where x^j = (x^j_0,\overline{x}^j)). We will only be concerned with economic exchange, which corresponds to assuming the conservation of commodities (energy1). If any two such (groups of) agents are brought into “thermal contact” (are allowed to exchange their commodities), they will ideally keep trading their commodities until they reach a state of equilibrium where no two agents can both increase their utility by trading their commodities – a Pareto optimal state. The final Pareto optimal state given some initial endowment of goods is in general non-unique, but as it turns out, in a quasi-linear economy, the allocation of all non-linear commodities is the same for all such Pareto optimal allocations. In such an economic equilibrium, there is an economy-wide price for every commodity (otherwise, one could get richer by buying cheap and selling dear). And this price is uniquely defined by the distribution of commodities (it’s the common value of the agents’ marginal utilities at that distribution).

In fact, in a quasi-linear economy, we can aggregate the agents’ individual endowments and postulate that the equilibrium state maximizes their total utility from the non-linear goods, and the price p_i of commodity i is the intensive variable conjugate to the (obviously extensive) total amount of the commodity x_i in the economy. The total utility is equivalent to entropy, the total amount of commodity i is equivalent to the extensive variable X (say, internal energy), and the price of commodity i is equivalent to the (entropic) conjugate intensive variable to X (say, the inverse of the temperature). An economic reinterpretation can also be found for the Legendre transforms of internal energy (see the paper).


1 The authors draw an analogy between x_i and energy/volume, but given that there are n different commodities, an analogy between x_i and the number of moles n_i seems more natural to my (lay) eye, given the absence of any a priori difference between the non-linear commodities and the obvious analogy between the nature of the two quantities. It would also mean we don’t have to assume that the utility function is increasing in the commodity or worry about the third law of thermodynamics (I don’t see any meaningful way to interpret it economically and the authors don’t even mention it).

The truth about the Tunguska event

November 25, 2008 by Preno

Well, not really, I just like how the title makes me sound like a bit of a crackpot. But if you’ve ever wondered how the natives themselves interpreted the event

(The native Evenki are a Tungusic people related to the Manchu.)

The any-thesis: Hintikka vs. Chomsky

November 5, 2008 by Preno

A question which still remains from last time is the order in which we apply the rules. Well, the general principles proposed in [1] appear quite plausible: rules are to be applied to phrases in ‘higher’ (dominating) clauses before they are applied to phrases in ‘lower’ (subordinate) clauses, and if we have two different phrases in the same clause, they are to be applied to the first one first. These principles can be overridden by the special rules of precedence associated with specific words. In particular, (G.any) has priority over (G.not), (G.if), “modal rules” and, as claimed in [2], also over (G.or). (G.some) has priority over (G.not), (G.each) has priority over other quantifier rules and over (G.and) and (G.or). (G.every) is not assumed to be subject to a special ordering rule.

I mentioned all of those mostly so that you can, if you wish, think them through and come up with examples / counter-examples. The only ordering principles relevant here are those pertaining to any.

Let’s look at some examples from [1] (the asterisk marks ungrammaticality):

If any members contributes, I’ll be happy.

If every member contributes, I’ll be happy.

* If Jane wins, anybody is happy.

If Jane wins, everybody is happy.

Jane can win any match.

Jane can win every match.

* Jane has won any match.

Jane has won every match.

Jane hasn’t won any match.

Jane hasn’t won every match.

* Mary believes that Jane will win any match.

John has not dated any girl.

John has not dated every girl.

* Any girl has not been dated by John.

Every girl has not been dated by John.

I included a plenty of examples for a reason – try analyzing some of them for yourself to see how (G.any/every) along with the rules for preference determine the meaning of these sentences. Some of them are straightforward, some of them probably need some commentary. The difference in meaning between sentences like “Jane can win any/every match” is seen as a matter of talking about all possible matches vs. talking about actual matches only. Perhaps a clearer example is seen in [3]:

Any candidate will be considered on his merits.

Every candidate will be considered on his merits.

There is clearly a difference in meaning between, again involving a difference in the range of quantification. Either we first choose a particular future (the actual one) and quantify over individuals in it, or we first quantify over individuals in various possible futures and claim that all of them will be considered on their own merits (or would be if such a future were to occur). We can say that every candidate happens to be left-handed, but we can’t really say that any candidate happens to be left-handed. On the other hand, “any” is not assumed to have any special precedence over “intensional operators” like “hope”, hence in sentences like “Mary hopes that Jane will win any/every match” Hintikka claims that “the choice of the ‘world’ is absorbed to the intensional operator and hence precedes the choice of an individual involved either in (G.every) or (G.any)”.

You may have noticed a regularity in the un/grammaticality of these any constructions. In fact, studying such constructions led Hintikka to formulate the following principle, which he calls the any-thesis:

The word ‘any’ is acceptable (grammatical) in a given context X – and Y – Z if and only if an exchange of ‘any’ for ‘every’ results in a grammatical expression which is not identical in meaning with X – any Y – Z.

This is a well-formed condition, as the rules of game semantics for English unambiguously determine whether the sentences are equivalent or not (even when one of them is ungrammatical according to this rule). I’ll give you some time to ponder on the linguistic implications of this thesis. Meanwhile, a technical note: in [3], Hintikka relaxes the condition that the resulting expression be grammatical, as it can fail to be for completely unrelated reasons (for example not being a suitable antecedent for a pronoun, as in “if Bill has every donkey, he beats it”).

The obvious thing to notice is that in defining grammaticality in terms of meaning, it goes against the classical generativist idea of syntax (grammar) being independent of semantics. But even if semantics is taken into account, it is usually in the form of some simple, algorithmic rules. That is, we still have an algorithm which can tell us whether any given sentence belongs to the set of grammatical English sentences or not (such a set is then called “recursive”). In contrast, the any-thesis would imply that the set of grammatical English sentences is not recursive. Indeed, it is not even recursively enumerable! (That would mean having an algorithm which is guaranteed to confirm that a grammatical sentence indeed is grammatical but which is not guaranteed to terminate if it is not.)

Why? Well, the set of theorems of first-order predicate logic is not recursive (assuming we have at least one two-place predicate), and having a procedure to determine the logical equivalence of sentence of the form “if any X, then Y” and “if every X, then Y” would enable us to create a decision procedure for first-order predicate logic (details in [1]). So the set of grammatical sentences of the fragment of English we are considering is not recursive. What’s more, if we have a recursive test for sentences which would be grammatical if it were not for the any-thesis, then it cannot even be recursively enumerable – since there is a recursive enumeration of the ungrammatical sentences of our fragment (why?), it would contradict our weaker result, as a set is recursive iff it and its complement (in a recursive set) are both recursively enumerable (intuitively speaking, one could run both enumerations in parallel and every sentence is bound to pop up in one or the other). By a similar argument, it follows that the whole set of grammatical English sentences is not recursively enumerable either.

But this flatly contradicts the basic principles of generativism and what Hintikka calls the “recursive paradigm”. If the thesis is true, it follows that grammar cannot be conceived as based on generative rules as Chomsky would have us believe. And even if you disagree with this particular thesis (i.e. you disagree with some ordering principle proposed by Hintikka), it demonstrates that you cannot a priori exclude such rules from consideration, so there doesn’t appear to be any a priori reason to assume that the generativist enterprise can be successful and restrict ourselves to generativist accounts only, as we cannot rule out the existence of some similar semantic-based criterion of grammaticality. At the very least, the Chomskian must not only present valid counter-examples against the any-thesis, but also convincing arguments that such rules simply cannot operate in natural languages.

There is much more to discuss about game semantics and Hintikka, but I think I’ll end my attempt at an introduction here, for now at any rate. I hope it has been at least somewhat informative. If anyone’s interested in Hintikka’s polemic with Chomsky, check out article [2] (available in its entirety on Google Books). For a bit more about the “recursive” vs. “strategic” paradigm, try [4] (partly available online).

As for my opinion about this whole thing, I dunno. On the one hand, it seems unrealistic to assume that people would judge that grammaticality of complicated sentences including any by testing, on some level, their equivalence with the corresponding every construction. On the other hand, the concept of grammaticality partly breaks down when we’re dealing with sufficiently complicated sentences, so the set of “all” grammatical English sentences is to a large extent an artifact of the theory anyway, a matter of how we choose to extrapolate from the actual data which consists of grammaticality judgements of relatively simple sentences. In line with this, one way of interpreting the result (assuming the thesis holds for sentences of “normal” length) which is not completely unrealistic psychologically would be to assume that we do have a procedure which raises a red flag if it notices that the any construction is equivalent to the corresponding every construction and does so quite reliably for constructions of reasonable complexity and to extrapolate our notion of grammaticality from this. And Hintikka’s way of extrapolating seems quite elegant.


References (all papers by Hintikka):

[1] Quantifiers in Natural Languages: Some Logical Problems (1977), in Game-Theoretical Semantics

[2] On the Any-Thesis and the Methodology of Linguistics (1980), in Paradigms for Language Theory and Other Essays

[3] Rejoinder to Peacocke (1979), in Game-Theoretical Semantics

[4] Paradigms for Language Theory (1990), in Paradigms for Language Theory and Other Essays

Quantification in English

October 31, 2008 by Preno

The rules

Now that we have an idea of what game semantics looks like, let’s see how one could apply it to English. The natural rules that correspond to our rules for propositional connectives (\vee, \wedge, \neg, \rightarrow) can be adapted by simply replacing the formal symbol with the corresponding English conjunctive construction. This presupposes that we know how to get rid of pronominal reference from one conjunct to the other (or that we limit ourselves to some fragment of English where this can be done easily enough, for example we require that pronominal antecedents be proper names) and how to transform a negative sentence into a positive one (we’re going from the outside in, not generating something from the inside out). As for universal and existential quantification, the corresponding English words which Hintikka sets up analogical rules for are some / a(n) and every / any / each.

For existential quantification it is:

(G.some) When the game has reached a sentence of the form

X – some Y who Z – W,

a person may be chosen by myself. Let the proper name of that person be ‘b‘. Then the game is continued with respect to the sentence

X – b – W, b is a Y, and b Z.

An analogical rule (G.a(n)) can be devised for a(n).For universal quantification:

(G.every) When the game has reached a sentence of the form

X – every Y who Z – W,

a person may be chosen by Nature. Let the proper name of that person be ‘d‘. Then the game is continued with respect to the sentence

X – d – W if d is a Y and if d Z.

The rule (G.any) is analogical. Here we assume that Y and Z are singular and that the ‘who’ is the subject of ‘who Z’. The rules could be modified to deal with more general situations, but Hintikka does not go into that here.

The dependencies

So, we have defined some transformations which allow us to change an English sentence into a simpler English sentence. The natural question is, which order to we apply them in, and, since we are dealing with a game, what is their “informational status” (i.e. dependence, in the sense I talked about last time). I will discuss the first question in the next post (where we’ll finally get to the relevance of all this to linguistics itself), let us now concentrate on the latter. Hintikka argues that there are English sentences which do not exhibit a linear ordering of quantification. He provides an example:

Some novel by every novelist is mentioned in some survey by every critic.

This he interprets as being true only if the novel depends only on the novelist and the survey only on the critic. I.e. it is true in a state of affairs where

The bestselling book by every author is referred to in the longest essay by every critic

but not in a state of affairs where

The bestselling book by every author is referred to in the obituary essay on him by every critic.

I don’t know how about you, but I’d probably be quite willing to assent to the original statement even in such a situation. I would probably interpret it is “for every author and every critic, there is a book by the former which is mentioned in an essay by the latter” (but don’t mind my opinions). Against this, Hintikka gives an argument which I’m not sure I like, namely he appeals to the principle of charity and claims that even though the “literal meaning” of the sentence is as he says, people tend to interpret it the other way, because that’s the way “which is most likely to make the intended meaning of [the] utterance true” – just like the natural interpretation of “someone is hit by a car every week on this street” is simply an instance of the principle of charity, rather than an exception to the left-to-right ordering principle of English quantifiers.

Consider the sentence

John has not shown any of his paintings to some of his friends.

The natural interpretation is clearly that there are friends of John’s to whom he has shown none of paintings. But this seems to violate the left-to-right ordering principle (we would expect the scope of “any” to be greater than the scope of “some”). The principle of charity line of argument doesn’t seem to be applicable here, as it would not be anything out of ordinary if there were there were no paintings of John’s which he has shown to all of his friends. Partially-ordered quantification comes to the rescue, as we can simply interpret the two quantifiers to be independent of each other, which however turns out to be equivalent to the natural interpretation. Hintikka makes it clear that this is not due to some special behaviour of the “not”, as a similar analysis can be given for “John has shown all his paintings to some of his friends” (think it through yourself).

The other argument given is that “some novel by John is mentioned in some survey by every critic” clearly follows from the original sentence, but the “linear” interpretation (“there is one novel by John which is mentioned in some survey by every author”) is, according to Hintikka, “patently counter-intuitive”. Similarly with “some novel by every author is mentioned by every critic”. It is also argued that this nesting be arbitrarily complicated, that is that every sentence with branching quantifiers corresponds to some grammatical English sentence. Myself, I wouldn’t go as far as calling them patently counter-intuitive but I admit that in this case Hintikka’s interpretation seems acceptable. What I think Hintikka neglects, however, is that his sentences are a bit too abstract – I would say that their interpretation can also be expected to depend on stress (and possibly other prosodic features, too). I don’t think it’s possible to interpret the sentence “some novel by John is mentioned in some survey by every critic” (italics denoting stress) the way Hintikka does.

The implications

But however you feel about his particular examples, it seems undeniable that there are English sentences which can be naturally interpreted his way (for me, it would probably be sentences like “a particular book of every author is mentioned in a particular essay of every critic”). At the very least, even if you are unimpressed by any of his examples, they show that there is no a priori reason to assume that classical first-order predicate calculus can in general adequately capture quantification in natural language.

The importance of these findings (if true) is summarized at the end of Hintikka’s 1973 paper Quantifiers vs. Quantification Theory (according to the editor the first paper to discuss game semantics for natural language):

“Since our results suggest that the whole of f.p.o. quantification theory is built into the semantics of English quantifiers, it follows that the semantics of a relatively small fragment of English, viz. of English quantifiers plus a few supporting constructions, is a much subtler and much more complicated subject than anyone seems to have suspected. In the eyes of a logician, it seems to be powerful beyond the wildest dreams of linguists and of philosophers of language.”

Next time, I’ll try to wrap up this little introduction to Hintikka with his thesis concerning the interplay of syntax and semantics.

Game semantics

October 31, 2008 by Preno

Okay, three weeks of sheer suspense should be enough, it’s about time for some actual content.

A philosopher/logician I have long wanted to familiarize myself with is Jaakko Hintikka, one of the originators of game semantics – a semantics for logic based on game theory. So I picked up Game-Theoretical Semantics (edited by Esa Saarinen) at my library and plunged in. It turns out Hintikka has some interesting things to say about both formal logic and natural language. I originally intended to make a single post, but it seems best to split it into two parts, one dealing with the semantics itself, the other with applying it to natural language.

As you see, I am terrible at writing introductory fluff, so let’s get down to business. This post should be understandable to anyone with at least a cursory knowledge of formal logic. Just to refresh your memory: \vee, \wedge, \exists x,\forall x,\neg correspond to, respectively, “or”, “and”, “there is an x such that”, “for all x” and “not”. The basic idea behind game semantics is quite straight-forward: the truth-value of a sentence is determined by the outcome of a tug-of-war between ‘Myself’ (trying to prove the sentence true) and ‘Nature’ (trying to prove the sentence false). We fix a domain of individuals D and to every sentence S, we associate a recursively defined game G(S). If the sentence S is of the form:

  • F \vee G, I choose F or G and the game is continued with respect to it
  • F \wedge G, Nature chooses F or G and the game is continued with respect to it
  • \exists x ~ F(x), I choose a member of D, give it a proper name if does not already have one (a, for example), and the game is continued with respect to F(a)
  • \forall x ~ F(x), Nature chooses a member of D, give it a proper name if does not already have one (a, for example), and the game is continued with respect to F(a)
  • \neg F, the roles of the two players are switched

The rules are called, naturally enough, (G.\vee), (G.\wedge), (G.E), (G.U), (G.\neg) respectively. Implication can be translated into disjunction and negation or a new rule can be added (in G(F \rightarrow G), I choose G or \neg F). Likewise, we can abandon the negation rule in favour of rewriting sentences using De Morgan’s laws.

Using this process, we ultimately arrive at an atomic sentence A (a sentence like F(a,b,c) which we do not further analyze logically). If A is true, I have won and Nature lost; if A is false, vice versa (rule (G.A)). Finally, the original sentence S is true iff I have a winning strategy in G(S). (In game theory, a strategy is a function describing which option one is going to choose given the previous ones, a winning strategy is naturally a strategy which guarantees you victory.)

Thus we have defined a semantics for first-order predicate calculus which determines which sentences are true and which are false. That this definition is reasonable should be obvious. Assuming the Axiom of Choice, one can easily use induction to prove that this definition of truth is equivalent to Tarski’s (truth = satisfaction by every assignment of objects to variables). The problematic step is where we have to jump from “for every a, I have a winning strategy for F(a)” to “I have a winning strategy for \forall x F(x)” – without the Axiom of Choice, there needn’t be any, and sentences true according to Tarski would no longer have to be true according to Hintikka, resulting in a non-classical logic.

In fact, the easy modifiability to deal with non-classical logics is one of the main advantages of game semantics. For one thing, we might require the strategies to be computable functions. For another, the semantics works for a more general theory than classical logic if we relax a condition we have been tacitly assuming all along. Namely, the requirement of perfect information – that both players are informed about the previous choices made by their adversary (and remember their own) and their subsequent choices depend on them. The theory we get when we no longer assume such a dependence is called finite partially-ordered quantification or branching quantification. A typical formula with a branching quantifier might look like this:

\begin{pmatrix} \forall x \exists y \\ \forall z \exists t \end{pmatrix} F(x,y,z,t)

This implies that the choice of y depends only on x and the choice of t only on z (the branching may of course be further nested). We can make this dependence explicit by using the so-called Skolem functions f,g:

\exists f \exists g ~ \forall x \forall z ~ F(x,f(x),z,g(z))

This sentence is not equivalent to any sentence of regular first-order predicate logic. On the other hand, the sentence:

\begin{pmatrix} \exists x \\ \forall y \end{pmatrix} F(x,y)

is equivalent simply to:

\exists x \forall y ~ F(x,y).

This, obviously, is an exception rather than the norm – it can be proven that (validity in) this logic of finite partially-ordered quantification is recursively equivalent to (validity in) second-order logic, i.e. there is an algorithm to turn a sentence in second-order logic (where we can quantify over predicates in addition to individuals) into an equivalent sentence in this “branching” logic. And second-order logic is, of course, a much stronger theory than first-order logic.

What implications all this has for natural language (English in particular) will be discussed next time.

First post

October 8, 2008 by Preno

No, as cool as that would be, this blog is not going to be about algorithmic information theory.

Instead, I’d like to write about some not too advanced maths and logic, language(s), analytic philosophy and political theory. Possibly more, possibly less.

It goes without saying that you’re encouraged to leave comments of any kind. As much as writing one’s thoughts down in a coherent manner helps to expose their hidden inconsistencies, feedback or not, the point of having a blog isn’t really to just keep throwing browserfuls of text into an insatiable electronic abyss, is it?

So enjoy this blog, write comments and let’s hope that I won’t look back at this post in a couple of weeks and laugh at how naive I was to think that I would keep updating it semi-regularly.

Cheers.