Joseph Tassarotti home

Tarski's High School Algebra Problem


In high school, we all learn algebraic rules for simplifying expressions. The following eleven identities over the positive integers cover most of the cases we encounter in school:
  • \(x + y = y + x\)
  • \((x + y) + z = x + (y + z)\)
  • \(x \cdot 1 = x \)
  • \(x \cdot y = y \cdot x \)
  • \((x \cdot y) \cdot z = x \cdot (y \cdot z)\)
  • \(1^x = 1\)
  • \(x^1 = x\)
  • \(x^{y+z} = x^y \cdot x^z\)
  • \((x \cdot y)^z = x^z \cdot y^z\)
  • \((x^y)^z = x^{y \cdot z}\)
One question that arises is whether these rules are complete. That is, can we prove every valid identity over the positive integers holds just using these eleven rules?

This problem is known as Tarski's high school algebra problem. The Wikipedia page gives a good summary of the problem, its solution, and related work. In short, the answer is no. Andrew Wilkie proved this by giving an equation:

\[\begin{equation} \begin{split} {\left( (1 + x)^y + (1 + x + x^2)^y\right)^x \cdot \left((1 + x^3)^x + (1 + x^2 + x^4)^x\right)^y} \\ = {\left((1 + x)^x + (1 + x + x^2)^x\right)^y \cdot \left((1 + x^3)^y + (1 + x^2 + x^4)^y\right)^x} \end{split} \end{equation}\]

that is true for all positive integers \(x\) and \(y\). He then showed that it is not possible to give a proof of this equation just using the eleven rewrite rules above.

In this post, I will give a mechanized proof of Wilkie's theorem in the Coq proof assistant. However, I will formalize a proof due to Burris and Yeats instead of Wilkie's original proof.

We begin by defining the algebraic expressions we are talking about as a Coq data type:

Instead of representing variables by letters like "x" and "y", we will index them using natural numbers. Var 0 will be the first variable, Var 1 will be the second, and so on. We also define some notation to make it easier to write and read these expressions.

Infix "+" := Eplus : expression_scope.
Infix "*" := Etimes : expression_scope.
Infix "^" := Epow : expression_scope.

Next, we would like to give a denotation to these algebraic expressions. What we mean here is, when we have an expression like:

\[ x^2 + x + 4 \]

We can think of this as the description of a function from positive integers (which will be bound to the variable \(x\)) to positive integers. We would like to capture this idea formally.

Unfortunately, we first need to define exponentiation on the positive integers. Although the Coq standard library has definitions for exponentiation over general integers and natural numbers, I did not find one for the positive integer type. This is easy enough to do ourselves. This definition is just like Zpower from the standard library, but uses iter_pos instead of iter_nat

Definition Ppower (x y: positive) :=
  iter_pos y positive (fun q: positive => x * q) 1.
Infix "^" := Ppower : positive_scope.

We also prove some lemmas about this operation. The proofs are omitted here, but they can be found in the full source file

Lemma unit_power: forall x, 1^x = 1.
Lemma exp_one: forall x, x^1 = x.
Lemma exp_plus_aux: forall y z x,
  Ppower_nat x (y + z) = Ppower_nat x y * Ppower_nat x z.
Lemma exp_plus: forall x y z, x ^ (y + z) = x^y * x^z.
Lemma exp_mult: forall z x y, (x * y) ^ z = x^z * y^z.
Lemma exp_exp: forall z x y, (x ^ y) ^ z = x ^ (y * z).

Since all of the other operators we need are in the standard library, we are ready to define the denotation of algebraic expressions:

Fixpoint denote (e: expression) (f: nat -> positive) : positive :=
  match e with
    | Var n => f n
    | Const n => n
    | Eplus e1 e2 => denote e1 f + denote e2 f
    | Etimes e1 e2 => denote e1 f * denote e2 f
    | Epow e1 e2 => denote e1 f ^ denote e2 f
  end.

Given an expression e and a function f from nat -> positive, denote uses the function f as map to look up the values of the variables in the expression. That is, Var n gets mapped to f n. Constant terms are mapped to the corresponding positive, and Eplus, Etimes, and Epow are mapped to the corresponding operator for the positive type.

Now we give an inductive data type corresponding to proofs using the eleven high school identities:

Local Open Scope expression_scope.

Inductive td : expression -> expression -> Prop :=
  | plus: forall x x' y y', td x x' -> td y y' -> td (x + y) (x' + y')
  | times: forall x x' y y', td x x' -> td y y' -> td (x * y) (x' * y')
  | pow: forall x x' y y', td x x' -> td y y' -> td (x ^ y) (x' ^ y')
  | cplus: forall p1 p2, td (Const p1 + Const p2) (Const (p1 + p2))
  | ctimes: forall p1 p2, td (Const p1 * Const p2) (Const (p1 * p2))
  | cpow: forall p1 p2, td (Const p1 ^ Const p2) (Const (p1 ^ p2))
  | refl: forall x, td x x
  | trans: forall x y z, td x y -> td y z -> td x z
  | sym: forall x y, td x y -> td y x
  | t1: forall x y, td (x + y) (y + x)
  | t2: forall x y z, td ((x + y) + z) (x + (y + z))
  | t3: forall x, td (x * (Const 1)) x
  | t4: forall x y, td (x * y) (y * x)
  | t5: forall x y z, td ((x * y) * z) (x * (y * z))
  | t6: forall x y z, td (x * (y + z)) (x * y + x * z)
  | t7: forall x, td ((Const 1) ^ x) (Const 1)
  | t8: forall x, td (x^(Const 1)) x
  | t9: forall x y z, td (x^(y + z)) (x^y * x^z)
  | t10: forall x y z, td ((x * y)^z) (x^z * y^z)
  | t11: forall x y z, td ((x^y) ^ z) (x^(y * z)).

Given two expression e1 and e2, if we can build a term of type td e1 e2 then there is a proof that e1 = e2 using one of the eleven identities. The constructors t1 through t11 correspond to these identities, but we also have some extras. plus, times, and pow state that if we have a proof that e1 and e1' are equal, and e2 and e2' are equal then e1 + e2 is equal to e1' + e2', and e1 * e2 is equal to e1' * e2', etc. cplus, ctimes, and cpow let us fold constants so that we can conclude td (Const 1 + Const 1) (Const 2)

Then we have refl, sym and trans which correspond to the idea that e1 is always equal to itself, if e1 equals e2 then e2 equals e1, and that if we have a proof that e1 is equal to e2, and e2 is equal to e3, then we also have a proof that e1 is equal to e3.

We also establish notation. We use [=] instead of "=" because this equality is distinct from the built-in equality in Coq

Infix "[=]" := td (at level 70) : expression_scope.

Now we can prove that these rules are sound. That is, if we can construct a proof that e1 [=] e2, then for each possible denotation of e1 and e2, the resulting values are equal.

Lemma td_sound: forall x y, x [=] y -> forall f, denote x f = denote y f.
Proof.
  intros.
  induction H; simpl; auto.
  rewrite IHtd1; rewrite IHtd2; auto with *.
  rewrite IHtd1; rewrite IHtd2; auto with *.
  rewrite IHtd1; rewrite IHtd2; auto with *.
  rewrite IHtd1; rewrite IHtd2; auto with *.
  apply Pplus_comm.
  rewrite Pplus_assoc; auto.
  apply Pmult_1_r.
  apply Pmult_comm.
  rewrite Pmult_assoc; auto.
  apply Pmult_plus_distr_l.
  apply unit_power.
  apply exp_one.
  apply exp_plus.
  apply exp_mult.
  apply exp_exp.
Qed.

Now we can prove identities in Coq about positive integers by constructing corresponding expressions, say e1 and e2, building a term of the type td e1 e2, and then using the previous lemma.

In practice, we don't really want to be doing this. Proving the following fact "directly" using the usual Coq tactics is a lot more pleasant and efficient, this is just merely to demonstrate that it can be done.

Remark example:
  (forall x y, (x + y) * (x + y) = x ^ 2 + 2 * (x * y) + y ^2)%positive.
Proof.
  intros.
  pose (e1 := (Var 0 + Var 1) * (Var 0 + Var 1)).
  pose (e2 := (Var 0) ^ (Const 2) + (Const 2) * (Var 0 * Var 1) +
    (Var 1) ^ (Const 2)).
  pose (f := fun (n:nat) => match n with O => x | _ => y end).
  specialize (td_sound e1 e2). intros.
  pose (e1' := (Var 0 + Var 1) * (Var 0) + (Var 0 + Var 1) * Var 1).
  pose (e1'' := (Var 0 * Var 0) + (Var 1 * Var 0) + (Var 0 + Var 1)
    * Var 1).
  assert (e1 [=] e2).
  eapply trans.
    eapply t6.
  eapply trans.
  eapply plus.

I will omit the rest of this proof since it's quite tedious

  specialize (H H0 f).
  simpl in H. auto.
Qed.

Now, what we'd like to prove is that there is some pair of expressions e1 and e2 such that forall f, denote e1 f = denote e2 f (which implies that they correspond to a valid identity over the positive integers) but for which there does not exist a term of type td e1 e2 (ie we cannot prove the identity using the high school identities

We begin by constructing expression terms w1 and w2 corresponding to Wilkie's counterexample.
We'll now want to prove that forall f, denote w1 f = denote w2 f. Let's first talk about how this proof would proceed on paper. In Wilkie's original paper he lets \(A = 1 + x\), \(B = 1 + x + x^2 \), \(C = 1 + x^3\) and \(D = 1 + x^2 + x^4\), so that the left hand side of his counterexample becomes \( (A^x + B^x)^y(C^y + d^y)^x\) and the right hand side becomes \( (A^y + B^y)^x (C^x + D^x)^y \).

Then by observing that \(C = A \cdot (1 + x + x^2) \) and \(D = B \cdot (1 + x + x^2)\), we can factor out \((1 + x + x^2)^{xy}\) from both sides and then the equality of the two expressions is clear.

To do this in Coq we first construct a function denote_z that instead of mapping our expressions onto positives, will instead map them onto integers

Fixpoint denote_z (e: expression) (f: nat -> positive) : Z :=
  (match e with
    | Var n => Zpos (f n)
    | Const n => Zpos n
    | Eplus e1 e2 => denote_z e1 f + denote_z e2 f
    | Etimes e1 e2 => denote_z e1 f * denote_z e2 f
    | Epow e1 e2 => denote_z e1 f ^ denote_z e2 f
  end)%Z.

The reason for doing this is that our proof will involve factoring out terms of the form \([1 - x + x^2\). This term involves subtraction. However, the positives are not closed under subtraction. The definition for subtraction in Coq's standard library deals with this by defining x - y to be 1 if y >= x.

Obviously, such a definition of subtraction does not have the properties we usually expect. By mapping things onto Z instead, we will work in an expanded domain where intermediary values can be negative, even though the final expression will still evaluate to a positive number.

We first quickly prove an intermediate result we need

Require Import Zpow_facts.
Lemma zpow_pos: forall p1 p2, (Zpos p1 ^ Zpos p2 = Zpos (p1 ^ p2))%Z.

The next two lemmas capture the idea that if we prove an identity is valid using our expanded domain of Z, it will still hold under our normal definition of denote

Lemma denote_to_denotez: forall e f, denote_z e f = Zpos (denote e f).
Proof.
 induction e; simpl; auto;
 intros;
 try (specialize (IHe1 f); specialize (IHe2 f);
   rewrite IHe1; rewrite IHe2; auto).
 apply zpow_pos.
Qed.

Lemma denote_z_implies_denote:
  forall e1 e2, (forall f, denote_z e1 f = denote_z e2 f)
    -> (forall f, denote e1 f = denote e2 f).
Proof.
  intros.
  specialize (denote_to_denotez e1 f).
  specialize (denote_to_denotez e2 f).
  intros.
  specialize (H f).
  rewrite H0 in H.
  rewrite H1 in H.
  inversion H.
  auto.
Qed.

Now we are ready to prove that w1 = w2 is a valid identity using our usual denote function. This proof is a little long, and it's probably not good Coq style, but I have tried to structure it according to the factorization suggested by Wilkie in his initial paper.

Local Open Scope Z_scope.

Lemma Zexp_exp:
  forall p2 x p1, (x ^ (Zpos p1)) ^ (Zpos p2) = x ^ (Zpos p1 * Zpos p2).
Lemma Zpower_pos_mult:
  forall x y p, (x * y) ^ (Zpos p) = x^(Zpos p) * y^(Zpos p).

Lemma w1_eq_w2: forall f, denote w1 f = denote w2 f.
Proof.
  apply denote_z_implies_denote.
  intros. unfold w1. unfold w2.
  unfold denote_z.
  pose (x := Zpos (f 0%nat)).
  pose (y := Zpos (f 1%nat)).
  fold x. fold y.
  pose (A := 1 + x).
  pose (B := 1 + x + x^2).
  pose (C := 1 + x^3).
  pose (D := 1 + x^2 + x^4).
  fold B. fold A.
  fold C. fold D.
  assert (C = A * (1 - x + x^2)).
    unfold C. unfold A.
    ring.
  assert (D = B * (1 - x + x^2)).
    unfold D. unfold B.
    ring.
  assert ((C ^ x + D ^ x) ^ y = (1 - x + x^2)^(x * y) * (A ^ x + B ^x)^y).
    rewrite H. rewrite H0.
    unfold x, y.
    rewrite Zpower_pos_mult.
    rewrite Zpower_pos_mult.
    fold x. fold y.
    rewrite <-Zmult_plus_distr_l.
    unfold y.
    rewrite Zpower_pos_mult.
    fold y.
    unfold x, y.
    rewrite Zexp_exp.
    fold x. fold y.
    ring.
  rewrite H1.
  assert ((C ^ y + D ^ y) ^ x = (1 - x + x^2)^(x * y) * (A ^ y + B ^y)^x).
    rewrite H. rewrite H0.
    unfold x, y.
    rewrite Zpower_pos_mult.
    rewrite Zpower_pos_mult.
    fold x. fold y.
    rewrite <-Zmult_plus_distr_l.
    unfold x.
    rewrite Zpower_pos_mult.
    fold x.
    unfold x, y.
    rewrite Zexp_exp.
    fold x. fold y.
    replace (y * x) with (x * y) by (auto with * ).
    ring.
  rewrite H2.
  ring.
Qed.

We must now prove that ~ td w1 w2. This is the critical step of the proof. How can we possibly prove that there does not exist a proof using these rules? Following the proof from Burris and Yeats, we define an algebra of 12 elements along with definitions of +, *, and ^ on this algebra.

Then we'll show that this algebra satisfies all of the eleven high school identities. We'll also show that w1 = w2 is not a valid identity over this 12 element algebra, by explicitly picking values for the variables at which the two expressions differ.

This will show that there cannot be a proof of w1 = w2 using the high school identities, because if there were that would mean that the equation w1 = w2 would also hold over the 12 element algebra.

We start by defining the algebra. We omit the definitions of addition, multiplication, and exponentiation on this algebra since they are just long match statements that elucidate little. This algebra was discovered by Burris and Yeats through the use of a computer program, so it is not very "natural".

Inductive byalg :=
  | n1 | n2 | n3 | n4
  | a | b | c | d
  | e | f | g | h.

We now prove that these operators have the desired properties. These "proofs" are just computation. We do proof by cases over all possible values for the variables and have Coq check each case holds.

Lemma byplus_comm: forall x y, byplus x y = byplus y x.
Proof.
  destruct x; destruct y; auto.
Qed.

Lemma byplus_assoc:
  forall x y z, byplus x (byplus y z) = byplus (byplus x y) z.

Lemma bymult_comm: forall x y, bymult x y = bymult y x.

Lemma bymult_assoc:
  forall x y z, bymult x (bymult y z) = bymult (bymult x y) z.

Lemma bymult_ident:
  forall x, bymult x n1 = x.

Lemma bymult_distr:
  forall x y z, bymult x (byplus y z) = byplus (bymult x y) (bymult x z).

Lemma byexp_one:
  forall x, byexp x n1 = x.

Lemma byexp_base_one:
  forall x, byexp n1 x = n1.

Lemma byexp_byplus:
  forall x y z, byexp x (byplus y z) = bymult (byexp x y) (byexp x z).

Lemma byexp_bymult:
  forall x y z, byexp (bymult x y) z = bymult (byexp x z) (byexp y z).

Lemma byexp_exp:
  forall x y z, byexp (byexp x y) z = byexp x (bymult y z).

Now we can define a denotation for expressions to byalg terms, just as we did with denote and denotez

Fixpoint denote_by (e: expression) (f: nat -> byalg) : byalg :=
  (match e with
    | Var n => f n
    | Const n =>
        match n with
         | 1 => n1
         | 2 => n2
         | 3 => n3
         | _ => n4
        end
    | Eplus e1 e2 => byplus (denote_by e1 f) (denote_by e2 f)
    | Etimes e1 e2 => bymult (denote_by e1 f) (denote_by e2 f)
    | Epow e1 e2 => byexp (denote_by e1 f) (denote_by e2 f)
  end)%positive.

And we can also prove that the td proofs are sound under this denotation. The properties of byplus, bymult, and byexp that we've shown above cover the eleven cases corresponding to the high school identities.

However, recall that we also had rules showing that constants could be "folded", e.g. that Const p1 + Const p2 [=] Const (p1 + p2) and so on. These turn out to be the most annoying part of the proof. Luckily, the proof works because of the way the operators treat the elements n1, n2, n3, and n4. For example:
  • byplus n1 n2 = n3
  • bymult n2 n2 = n4
  • bymult n3 n2 = n4
What's happening is that in the case of byplus n1 n2, we treat n1 and n2 as though they are 1 and 2 respectively, add them to get 3 and then we return n3. The same thing happens for the other operators. Now what happens if the resulting number is greater than 4, as in the case with bymult n3 n2, (since 3*2 = 6)? In that case we just return n4.

Once we realize this pattern, we can prove that the constant folding rules are sound. Addition and multiplication are handled just by brute force case work. However, this didn't work for exponentiation, so we need a few helper lemmas first.

Lemma denote_by_const_gt_3:
  (forall p1, p1 > 3 -> (forall f, denote_by (Const p1) f = n4))%positive.

Lemma Ppow_increase: (forall p2 p1, p1 ^ p2 > p1 \/ p1 ^ p2 = p1)%positive.

Lemma Ppow_gt_3: (forall p2 p1, p1 > 1 -> p2 > 1 -> p1 ^ p2 > 3)%positive.

Local Open Scope positive_scope.
Local Open Scope expression_scope.

Lemma positive_range: forall p: positive, p > 1 \/ p = 1.

Lemma constants_pow_gt_1: forall p1 p2 f, p1 > 1 -> p2 > 1 ->
  denote_by (Const p1 ^ Const p2) f =
      denote_by (Const (p1 ^ p2)) f.

Lemma constant_pow:
  forall p2 p1 f, denote_by (Const p1 ^ Const p2) f =
      denote_by (Const (p1 ^ p2)) f.

With those out of the way, we proceed with the soundness proof.

Lemma td_sound_by:
 forall x y, x [=] y -> forall f, denote_by x f = denote_by y f.
Proof.
  intros.
  induction H; try (apply constant_pow); simpl.
  simpl; rewrite IHtd1; rewrite IHtd2; auto with *.
  simpl; rewrite IHtd1; rewrite IHtd2; auto with *.
  simpl; rewrite IHtd1; rewrite IHtd2; auto with *.

  intros; simpl; repeat (induction p1; induction p2; auto with *);
  repeat (induction p1; auto with *);
  repeat (induction p2; auto with *).
  intros; simpl; repeat (induction p1; induction p2; auto with *);
  repeat (induction p1; auto with *);
  repeat (induction p2; auto with *).
  auto.
  rewrite IHtd1; rewrite IHtd2; auto.
  auto.

  apply byplus_comm.
  rewrite byplus_assoc; auto.
  apply bymult_ident.
  apply bymult_comm.
  rewrite bymult_assoc; auto.
  apply bymult_distr.
  apply byexp_base_one.
  apply byexp_one.
  apply byexp_byplus.
  apply byexp_bymult.
  apply byexp_exp.
Qed.

Next we show that under the Burris and Yeats algebra denotation, w1 is not equal to w2. Specifically, when x = a and y = e, the two expressions are not equal.

Lemma w1_neq_w2_by: ~ (forall f, denote_by w1 f = denote_by w2 f).
Proof.
  intros. unfold not.
  intros.
  specialize (H (fun x => match x with | O => a | _ => e end)).
  compute in H.
  inversion H.
Qed.

Which implies that we cannot have w1 [=] w2, as discussed earlier.

Theorem not_td_w1_w2: ~ (w1 [=] w2).
Proof.
 unfold not; intro.
 eapply w1_neq_w2_by.
 apply td_sound_by; auto.
Qed.

One might ask what needs to be added to the set of eleven identities to make them complete. Unfortunately, R. Gurevic showed that a finite list of identities cannot be complete. However, that proof is considerably harder, so this seems like a good place to wrap up!

Thanks to Edward Gan for his feedback.

References:

R. Gurevic. Equational theory of positive numbers with exponentiation is not finitely axiomatizable. Ann. Pure Appl. Logic, 49(1):1-30, 1990.

Stanley N. Burris and Karen A. Yeats. The saga of the high school identities. Algebra Universalis, 52(2-3):325-342, 2004.

A. J. Wilkie. On exponentiation - a solution to Tarski's high school algebra problem.