Computing with field extensions

After we released the Z3 source code, a friend of mine made the following comment: "you have so many different representations for numbers in Z3". He is correct, we have Naturals, Integers, Rationals, binary Rationals, multiple precision floating and fixed point, and Algebraic numbers. Binary rationals are just rational numbers of the form \(\frac{a}{2 ^ k}\). The basic operations (addition, multiplication, comparison) with binary Rationals are much more efficient than the respective ones with Rational numbers. Algebraic numbers are Real numbers that are roots of polynomials with integer coefficients. For example, the square root of 2 is an Algebraic number because it is a root of the polynomial \(x ^ 2 - 2\). Some Algebraic numbers cannot be represented using radicals. For example, the single Real root of \(x ^ 5 - x - 1\). Nonetheless, we can compute with and compare them, and they are used in the nonlinear solver nlsat that is available in Z3.

In this post, I describe yet another representation that Grant Passmore and I developed. It supports infinitesimal, transcendental and algebraic extensions of the Rational numbers. That is, we can perform precise computations using multiple precision Rational numbers and elements such as \( \sqrt{2},\, \pi,\, e,\, \sqrt[5]{3},\, \epsilon,\, \ldots \) The new package subsumes the existing Algebraic number package in Z3, and is orders of magnitude more efficient. It is already available in the Z3 C and Python APIs. We can also play with it online at rise4fun.

Decision methods for nonlinear real arithmetic are essential to the formal verication of cyber-physical systems and formalized mathematics. Classically, these decision methods operate over the theory of real closed fields (RCF). What is a RCF? Every RCF is a field, that is, addition and multiplication are associative and commutative, multiplication distributes over addition, additive and multiplicative inverses for non-zero elements exist, etc. The Real and Rational numbers are fields, but the Integers are not because only 1 and -1 have multiplicative inverses. For example, there is no integer \(a\) s.t. \(3 a = 1\). A RCF is also an ordered field, i.e., it is a field equipped with a total order upon its elements. The Reals and Rational numbers are ordered fields. A RCF also has two additional properties, every positive number is a square

\[ \forall x y \ \left(0 \leq x \Rightarrow \exists y (x = y^2)\right)\!, \]

and second, all polynomials of odd degree have a root. This latter property is expressed using an axiom scheme, with one axiom for each n.

\[ \forall a_0 a_1 \ldots a_{2n} \exists z \left( z^{2n+1} + a_{2n} z^{2n} + \ldots + a_1 z + a_0 = 0 \right)\!. \]

So, the Rational numbers are not a RCF, but the Real and Algebraic numbers are. The set of Algebraic numbers is countable with computable operations, and thus provides a logically sufficient computational substructure for making RCF decision procedures. Note that the Algebraic numbers do not contain transcendental elements such as \( \pi \mbox{ and } e \). Indeed, a Real number is transcendental precisely when it is not Algebraic.

When implementing an RCF decision procedure, one is free to compute over any RCF while still being sure that the resulting computations are valid over the Reals. This is important from a computational point of view, as Reals are uncountable with uncomputable basic operations. On the one hand, this lack of transcendental elements seems logically inconsequential and even computationally desirable, as transcendentals are undefinable over RCF and almost all of them are uncomputable. However, various new applications have given rise to a need for computing in real closed fields containing transcendentals. This need is especially apparent when one considers cyber-physical systems, or mainstream efforts in formalized mathematics, such as Thomas Hales's Flyspeck project.

Infinitesimals

In addition to RCFs containing common transcendental constants, there is also a need for computing in RCFs containing infinitesimals. Our package allows users to create an arbitrary number of infinitesimal extensions. Each extension adds a new positive element \(\epsilon\) that is smaller than all existing positive elements in the current ordered field \(\mathbb{K}\).

\[ \epsilon > 0 \ \wedge \ \forall k \in \mathbb{K} \ \left(k > 0 \ \Rightarrow \ \epsilon < k\right). \]

Note that, \(\frac{1}{\epsilon}\) is an infinite element, that is, it is bigger than any element in \(\mathbb{K}\). Infinitesimals are a very useful abstraction, and simplify the implementation of decision procedures. For example, Bruno Dutertre and I used infinitesimals to implement a decision procedure for linear real arithmetic. The basic idea was to treat a strict inequality \(p < 0\), where \(p\) is a linear polynomial as a non-strict inequality \(p \leq \epsilon\), where \(\epsilon\) is an infinitesimal. Note that, in this application, infinitesimals could be easily encoded because we considered only the linear fragment.

The PSPACE and singly exponential procedures of Canny, Grigor'ev-Vorobjov, and Basu-Pollack-Roy for the existential fragment of nonlinear real arithmetic also rely on infinitesimal elements. Most of these procedures were never fully implemented. The lack of a viable library for computing with real closed fields containing infinitesimals has been an impediment to this line of research

Grant and I are also using infinitesimals to implement a new procedure for nonlinear programming (aka optimization). Suppose we are trying to minimize a nonlinear polynomial \(p\) subject to a set of nonlinear polynomial constraints. We say \(p\) is an objective function. In this project, we use infinitesimals in two different places. To show that \(p\) is unbounded by constructing a solution where \(p\) evaluates to \(-\frac{1}{\epsilon}\), and to show that \(p\) has no minimum, but an infimum value \(a\), by constructing a solution where \(p\) evaluates to \(a + \epsilon\), and proving that there is no solution \(\leq a\).

Examples

The Python API for our package is very simple. It provides the object RCFNum that is a wrapper for the internal representation used in our package. This object overloads the usual operations +, -, *, <, etc. We also provide the operations MkInfinitesimal() for creating a new infinitesimal element, Pi() and E() for obtaining the transcendental constants \(\pi\) and \(e\), and MkRoots([a_0, ..., a_n]) that returns the roots of the polynomial \(a_0 + a_1 x + \ldots + a_n x ^ n\). In the following example, I use MkRoots to "build" \(-\sqrt{2}\) and \(\sqrt{2}\).

msqrt2, sqrt2 = MkRoots([-2, 0, 1])
print(msqrt2)
>> root(x^2 + -2, (-oo, 0), {})
print(sqrt2)
>> root(x^2 + -2, (0, +oo), {})

The actual encoding used to represent the numbers is described in the following draft. In the example above we distinguish \(-\sqrt{2}\) and \(\sqrt{2}\) using the intervals (-oo, 0) and (0, +oo). For example root(x^2 + -2, (0, +oo), {}) is the single root of \(x ^ 2 - 2\) in the interval \((0, \infty)\). The method RCFNum.decimal(k) displays a number in decimal notation using k decimal places. The contract in our package only guarantees that the decimal representation can be produced when the number does not depend on infinitesimal extensions.

print(sqrt2.decimal(10))
>> 1.4142135623?

The ? in the output above is used to denote that the result is truncated. Now, we show that \(\frac{1}{\sqrt{2}}\) is equal to \(\frac{\sqrt{2}}{2}\), and compute \(\sqrt{2} ^ 5 + 1\)

print(1/sqrt2 == sqrt2/2)
>> True
print(sqrt2**5 + 1)
>> 4*root(x^2 + -2, (0, +oo), {}) + 1
print((sqrt2**5 + 1).decimal(10))
>> 6.6568542494?

Now, we compute the roots of the polynomial \( \pi - \sqrt{2} x + x ^ 5 \).

pi = Pi()
rs = MkRoots([pi, - sqrt2, 0, 0, 0, 1])
print(len(rs))
>> 1
print(rs[0])
>> root(x^5 + -1 root(x^2 + -2, (0, +oo), {}) x + pi, (-oo, 0), {})
print(rs[0].decimal())
>> -1.3852383884?

Actually, the polynomial has only one root: -1.3852383884?. The output produced also suggests that our representation is recursive. The previous Algebraic number package in Z3 used a "flat" data-structure where every number is encoded using a polynomial with integer coefficients. The main problem in this flat representation is that the polynomials grow really fast. For example, \(\sqrt[11]{2} + \sqrt[7]{2}\) can be compactly encoded in our new package

[r1] = MkRoots([-2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
[r2] = MkRoots([-2, 0, 0, 0, 0, 0, 0, 1])
print(r1 + r2)
>> root(x^7 + -2, (0, +oo), {}) + root(x^11 + -2, (0, +oo), {})
print((r1 + r2).decimal(10))
>> 2.1691306031?

On the other hand, if we try to represent this value as the root of a polynomial with integer coefficients, we have to use the following polynomial of degree 77. Actually, it is the minimal polynomial (with integer coefficients) for defining this value.

\[ \begin{array}{l} -2176 - 19712 x - 670208 x^2 - 2050048 x^3 - 295680 x^4 + 96233984 x^5 - 495599104 x^6 + 11264 x^7 - \\ 198253440 x^8 - 20646191104 x^9 - 650890240 x^{10} + 448 x^{11} - 38011467648 x^{12} + 171371574528 x^{13} - \\ 28160 x^{14} - 37166976 x^{15} - 1268310460032 x^{16} - 11504100608 x^{17} + 34723106880 x^{19} - \\ 2544211567744 x^{20} + 42240 x^{21} - 672 x^{22} - 1992710577088 x^{23} - 43405281920 x^{24} - 186825408 x^{26} + \\ 4024746461120 x^{27} - 42240 x^{28} - 120170824928 x^{30} - 50308241984 x^{31} + 560 x^{33} - 850951467520 x^{34} + 29568 x^{35} - \\ 155813504 x^{37} - 20052576544 x^{38} + 20445649840 x^{41} - 14784 x^{42} - 280 x^{44} - 2670956288 x^{45} - 25531352 x^{48} + \\ 5280 x^{49} - 97853448 x^{52} + 84 x^{55} - 1320 x^{56} - 544236 x^{59} + 220 x^{63} - 14 x^{66} - 22 x^{70} + x^{77} \end{array} \]

We can try these examples online here.

In nonlinear solvers such as nlsat, a procedure such as MkRoots is extensively used. To demonstrate the basic idea, let us consider a very simple system of polynomial equations in triangular form.

\begin{eqnarray*} -1 - x + x^5 & = & 0 \\ -197 + 3131 x - 31 x^2 y^2 + x y^7 & = & 0 \\ -735 x y + 7 y^2 z - 1231 x^3 z^2 + y z^5 & = & 0 \end{eqnarray*}

We say the system is in triangular form because the first polynomial equation contains only one variable, the second contains two, and so on. We can solve this kind of system using MkRoots and "plug-in" the solutions up to step i when solving step i+1. This system has only one solution, and we can find it using

[x] = MkRoots([-1, -1, 0, 0, 0, 1])
[y] = MkRoots([-197, 3131, -31*x**2, 0, 0, 0, 0, x])
[z] = MkRoots([-735*x*y, 7*y**2, -1231*x**3, 0, 0, y])
print(x.decimal(10))
>> 1.1673039782?
print(y.decimal(10))
>> 0.0629726948?
print(z.decimal(10))
>> 31.4453571397?

This example in instantaneously solved using our new package. We can try it online here.

We can also try to solve it using Mathematica Root primitive that is roughly equivalent to MkRoots. However, Mathematica builds a "flat" representation like the old Algebraic package in Z3, and takes approximately 10min to find the solution on my machine. The solution for z is encoded using a polynomial with integer coefficients of degree 175 that is too big to display here. Here is the Mathematica script I used

x = Root[#^5 - # - 1 &, 1]
y = Root[x #^7 - 31 x^2 #^2 + 3131 # - 197 &, 1]
z = Root[y #^5 - 1231 x^3 #^2 + 7 y^2 # - 735 x y &, 1]

In the next example, we create infinitesimal elements, and perform some basic operations with them.

eps = MkInfinitesimal()
print(eps < 0.000000000000001)
>> True
print(1/eps > 10000000000000000000000)
>> True
print(1/eps + 1 > 1/eps)
>> True
print(eps**2 + 1/eps)
>> (eps!1^3 + 1)/(eps!1)
print(eps**2 < eps)
>> True
# r is the cubic root of eps
[r] = MkRoots([-eps, 0, 0, 1])
print(r)
>> root(x^3 + -1 eps!1, (0, +oo), {})
print(r > eps)
>> True
# We need to use sign conditions to distinguish the roots of 
for r in MkRoots([1, 0, -eps, -eps, 0, eps**2]):
    print(r)
>> root(eps!1^2*x^5 + -1*eps!1*x^3 + -1*eps!1*x^2 + 1, (-oo, 0), {})
>> root(eps!1^2*x^5 + -1*eps!1*x^3 + -1*eps!1*x^2 + 1, (0, +oo), {60*eps!1^2*x^2 + -6*eps!1 > 0})
>> root(eps!1^2*x^5 + -1*eps!1*x^3 + -1*eps!1*x^2 + 1, (0, +oo), {60*eps!1^2*x^2 + -6*eps!1 < 0})
# The irreducible polynomial:  eps x^3 - 6x^2 + 9x - 1
# has three infinite positive roots
for r in MkRoots([-1, 9, -6, eps]):
    print(r)
>> root(eps!1*x^3 + -6*x^2 + 9*x + -1, (0, +oo), {3*eps!1*x^2 + -12*x + 9 > 0, 6*eps!1*x + -12 > 0})
>> root(eps!1*x^3 + -6*x^2 + 9*x + -1, (0, +oo), {3*eps!1*x^2 + -12*x + 9 > 0, 6*eps!1*x + -12 < 0})
>> root(eps!1*x^3 + -6*x^2 + 9*x + -1, (0, +oo), {3*eps!1*x^2 + -12*x + 9 < 0, 6*eps!1*x + -12 < 0})

Note that 1/eps is an "infinite" value, and we use Rational functions to represent elements of infinitesimal (and transcendental) extensions. For example, eps**2 + 1/eps is internally encoded as \(\frac{\epsilon ^ 3 + 1}{\epsilon}\). Finally, we use sign conditions to identify the roots of \(1 - \epsilon x ^ 2 - \epsilon x ^ 3 + \epsilon ^ 2 x ^ 5\) and \(-1 + 9 x - 6 x ^ 2 + \epsilon x ^ 3 \). We cannot identify them using just intervals with binary Rational end-points because these roots are infinite values. A similar problem occur when we have roots that are infinitely close to each other. Thom's lemma guarantees that we can always identify the roots of a polynomial using the signs of its derivatives. In our implementation, we minimize the number of sign conditions needed using the intervals whenever possible. For example, our package did not use sign conditions to encode the negative root of \(1 - \epsilon x ^ 2 - \epsilon x ^ 3 + \epsilon ^ 2 x ^ 5\). Moreover, it only needed the sign of one derivative to identify the two positive roots. In principle, we can use the signs of arbitrary polynomials to identify/discriminate the different roots of a polynomial \(p\), Thom's lemma just says we can find these polynomials by checking the derivatives of \(p\). We can try this example online here. Many other examples are available at rise4fun. Well, we hope you will find new applications for this kind of package. The source code is available online at codeplex. I think Grant would agree that we had \(\frac{1}{\epsilon}\) fun working on this package :)