Chapter 1 Numerical mathematics

For reasons of computational efficiency, essentially all numerical and statistical computations are performed on computers using finite precision arithmetic. This finite precision must be taken into account when designing algorithms, because the numbers a computer can represent do not share all the properties of real numbers.

1.1 Floating point numbers

Most numerical and statistical algorithms and computations use floating point numbers. These are numbers of the form \[\begin{equation} x_{FP} = \pm c \cdot 2^q, \tag{1.1} \end{equation}\] where both the coefficient \(c\) and exponent \(q\) are integers chosen from a finite range of values. Floating point numbers are useful because they can easily represent values that are very small in absolute value (close to zero) and ones that are very large by using negative and positive exponents \(q\), respectively. The range of values is still finite and as a consequence there are real numbers \(x \neq 0\) for which the closest floating point number \(x_{FP}=0\), leading to what is known as underflow in computations. Conversely, computations with a result that is larger than the largest floating point number are said to overflow.
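
For instance, in Python (whose plain floats are IEEE double precision numbers), a constant that is too small in absolute value rounds to zero, while one that is too large becomes infinite:

print(1e-400)   # underflow: prints 0.0
print(1e400)    # overflow: prints inf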

The floating point numbers used by all modern computers are defined by the IEEE Standard for Floating-Point Arithmetic (IEEE 754). The most commonly used formats are 64-bit double precision and 32-bit single precision numbers, although others are also occasionally used.

Table 1.1: Key properties of common floating point number formats.

  Precision   Bits   \(c\) bits   \(q\) bits   Max value
  Single        32       24           8        \(\approx 3.4 \cdot 10^{38}\)
  Double        64       53          11        \(\approx 1.8 \cdot 10^{308}\)
  Half          16       11           5        \(\approx 6.6 \cdot 10^4\)

Python/NumPy allows exploring floating point type properties using np.finfo(); for example, printing the machine parameters for single and double precision produces the listing below:
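
import numpy as np

print(np.finfo(np.float32))
print(np.finfo(np.float64))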

## Machine parameters for float32
## ---------------------------------------------------------------
## precision =   6   resolution = 1.0000000e-06
## machep =    -23   eps =        1.1920929e-07
## negep =     -24   epsneg =     5.9604645e-08
## minexp =   -126   tiny =       1.1754944e-38
## maxexp =    128   max =        3.4028235e+38
## nexp =        8   min =        -max
## ---------------------------------------------------------------
## Machine parameters for float64
## ---------------------------------------------------------------
## precision =  15   resolution = 1.0000000000000001e-15
## machep =    -52   eps =        2.2204460492503131e-16
## negep =     -53   epsneg =     1.1102230246251565e-16
## minexp =  -1022   tiny =       2.2250738585072014e-308
## maxexp =   1024   max =        1.7976931348623157e+308
## nexp =       11   min =        -max
## ---------------------------------------------------------------

The value eps returned above is especially interesting: it is the difference between 1 and the smallest representable number greater than 1, so it characterises the relative precision of the type. In particular, \(1 + \text{eps} \neq 1\) in floating point arithmetic, while \(1 + \text{eps}/2 = 1\).
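
This is easy to verify directly in double precision:

import numpy as np

eps = np.finfo(np.float64).eps
print(1.0 + eps == 1.0)       # False: adding eps to 1 changes the sum
print(1.0 + eps / 2 == 1.0)   # True: eps/2 is lost in rounding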

1.1.1 Special values

In addition to numbers in the prescribed range, floating point numbers can represent the following special values:

  • \(\pm\)Inf (\(\pm \infty\))
  • NaN (not-a-number)

Infinity values appear when a computation overflows, e.g.:
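
For instance (a minimal sketch assuming NumPy; it produces the output below):

import numpy as np

print("exp(1000) =", np.exp(1000.0))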

## exp(1000) = inf
## 
## <string>:1: RuntimeWarning: overflow encountered in exp

Infinities obey the usual rules of arithmetic and can be compared as usual.

NaN is used to represent undefined values such as \(0/0\), \(0 \cdot \infty\), \(\infty - \infty\).

All arithmetic operations involving a NaN result in a NaN. All comparisons with NaN (except \(\neq\)) are always false, so to test whether a number is NaN you need to use np.isnan() instead of ==.
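
A short demonstration:

import numpy as np

x = np.nan
print(x == x)        # False: even NaN == NaN is false
print(x != x)        # True: the one comparison that holds
print(np.isnan(x))   # True: the reliable test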

1.2 Some important properties of floating point numbers

Some mathematical properties of floating point numbers differ from those of real numbers:

  • Uneven distribution of numbers: the representable values are denser near zero
  • Finite range of representable values
    • Example: the cumulative distribution function of the standard normal satisfies \(\Phi(10) \approx 1 - 7.619853 \cdot 10^{-24}\), which rounds to exactly 1 in double precision
    • Using \(\log p\) instead of \(p\) can help with probabilities
  • Rounding errors break many standard rules of arithmetic, such as associativity (see the examples after this list)
  • Equality comparison is non-trivial because of rounding: never use \(x == y\)
  • Avoid computing \(x - y\) for nearly equal \(x \approx y\): subtracting nearly equal numbers cancels the leading digits and amplifies earlier rounding errors
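
Some of these failures are easy to demonstrate in double precision:

print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False: addition is not associative
x = 1.0 + 1e-15
print(x - 1.0)   # 1.1102230246251565e-15, not 1e-15: cancellation exposes the rounding error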

1.3 Computing with probabilities

As probabilities are non-negative but can be very close to 0, it is often numerically advantageous to store their logarithms \(\log p\) instead of the raw values \(p\).
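
For example, the product of a thousand moderate probabilities underflows to zero, while the sum of their logarithms remains perfectly representable (a minimal sketch using NumPy):

import numpy as np

p = np.full(1000, 1e-3)
print(np.prod(p))         # 0.0: the true product 1e-3000 underflows
print(np.sum(np.log(p)))  # -6907.755...: log of the product, computed safely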

1.3.1 Logsumexp

(Re-)normalisation of probabilities defined on the log scale requires converting back to raw probabilities with exp(), which can overflow or underflow. This can be avoided using the so-called log-sum-exp trick.

Let us consider a vector \(\boldsymbol{\pi} = (\pi_i) = (\log p_i)\), where \(p_i > 0\), \(i = 1,\dots,n\). In order to compute \[\log S = \log \sum_{i = 1}^n p_i,\] we need to compute \[\log S = \log \sum_{i=1}^n \exp(\pi_i).\] However, the computation of \(\exp(\pi_i)\) can overflow even if the final \(\log S\) is not that large. The overflow can be avoided by finding \[M = \max_{i \in \{1, \dots, n\}} \pi_i\] and computing \[\begin{equation} \log S = M + \log \sum_{i=1}^n \exp(\pi_i - M), \tag{1.2} \end{equation}\] which is guaranteed to be well-behaved because \(\max_{i \in \{1, \dots, n\}} (\pi_i - M) = 0\). The same trick avoids potential underflows when the \(\pi_i\) are small, because at least one of the terms \(\pi_i - M\) equals zero, so the sum inside the logarithm is at least 1.
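
Eq. (1.2) translates directly into code; a minimal sketch (SciPy also offers scipy.special.logsumexp for the same purpose):

import numpy as np

def logsumexp(pi):
    """Compute log(sum(exp(pi))) stably following Eq. (1.2)."""
    M = np.max(pi)
    return M + np.log(np.sum(np.exp(pi - M)))

pi = np.array([1000.0, 999.0, 998.0])   # log-probabilities far outside exp()'s safe range
print(np.log(np.sum(np.exp(pi))))       # inf: the naive evaluation overflows
print(logsumexp(pi))                    # approximately 1000.40761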

1.3.2 Special functions

Mathematical libraries in various programming environments contain a number of special functions that are specifically designed to compute challenging expressions in a numerically stable manner, avoiding overflows and underflows. Some examples of such functions are listed in Table 1.2.

Table 1.2: Useful special functions for numerical computation in Python. Equivalent functions can be found in most other numerical languages. (import numpy as np; import scipy.special as scs)

  Name          Purpose
  np.log1p      Computes \(\log(1 + x)\), useful for \(x \approx 0\)
  np.expm1      Computes \(\exp(x) - 1\), useful for \(x \approx 0\)
  scs.gammaln   Computes \(\log \Gamma(x)\)
  scs.erf       Computes \(\operatorname{erf}(x) = 2/\sqrt{\pi} \int_{0}^x e^{-t^2} \,\mathrm{d} t\)
                (the cumulative distribution function of \(\mathcal{N}(0, 1)\) is \(\Phi(x) = \frac{1}{2}[1 + \operatorname{erf}(x/\sqrt{2})]\))
  scs.erfc      Computes \(1 - \operatorname{erf}(x)\), useful for \(x \gg 0\)
  scs.erfcx     Computes \(\exp(x^2) (1-\operatorname{erf}(x))\), useful for \(x \gg 0\)

Examples:
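
The outputs below can be reproduced by contrasting the stable functions with their naive counterparts (a reconstruction consistent with the printed values):

import numpy as np

print(np.log1p(1e-99))     # stable: returns 1e-99
print(np.log(1 + 1e-99))   # naive: 1 + 1e-99 rounds to 1, so the result is 0.0
print(np.expm1(1e-20))     # stable: returns 1e-20
print(np.exp(1e-20) - 1)   # naive: exp(1e-20) rounds to 1, so the result is 0.0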

## 1e-99
## 0.0
## 1e-20
## 0.0

1.4 Reporting extreme values computed using logarithms

Most people are used to working with numbers in base 10. When using logarithms to represent extreme values, it is therefore sometimes useful, but non-trivial, to print them in the form \(x \cdot 10^y\).

Assuming \(z = \log_{10}(p)\), we have for all \(y\), \[ p = 10^z = 10^{z - y} \cdot 10^y. \] By choosing \(y = \lfloor z \rfloor\), we have \[ 1 \le 10^{z - y} < 10. \] Numbers computed in different logarithmic bases can be converted as usual using \[ \log_{10}(p) = \frac{\log_b(p)}{\log_b(10)}. \]
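
For example, \(2^{100}\) can be reported this way from \(z = 100 \log_{10} 2\); the sketch below is consistent with the output that follows (the final digits differ from the exact integer because of rounding in the logarithm):

import math

z = 100 * math.log10(2)   # base-10 logarithm of 2**100
y = math.floor(z)         # decimal exponent
x = 10 ** (z - y)         # mantissa in [1, 10)
print("Computed using floating point values and logarithms")
print(f"{x:.16f}e{y}")
print("Computed using Python's arbitrary precision integer arithmetic")
print(2**100)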

## Computed using floating point values and logarithms
## 1.2676506002282257e30
## Computed using Python's arbitrary precision integer arithmetic
## 1267650600228229401496703205376

1.5 Best practices and hints

  • Be aware of your data types and their ranges, especially with GPUs and other special hardware
  • Compute and store values in a way that maintains precision (\(p\) vs. \(1 - p\), \(\log p\), renormalising things as necessary)
  • Order of evaluation does matter!
  • Never compare floating point numbers for exact equality because of possible rounding errors
    • Python has np.isclose() to test approximate equality (see the example after this list)
  • Equality tests for special values should be done using the special functions np.isfinite(), np.isinf() and np.isnan()
  • NaN values can be annoying, because any operation with NaN yields a NaN and thus a single NaN value often propagates to all values in a large computation.
    • It can be useful to use np.isnan() to check the output of a suspicious function
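
For instance, np.isclose() absorbs the rounding error that breaks exact comparison:

import numpy as np

print(0.1 + 0.2 == 0.3)             # False: rounding makes the exact test fail
print(np.isclose(0.1 + 0.2, 0.3))   # True: approximate equality within tolerance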

1.6 Literature

Gentle (2009) contains an introduction to basic challenges raised by floating point representation.

1.7 Exercises

  1. Write a program to increment x = 0.0 by 0.1 one hundred times. Compute x - 10. How do you interpret the result?
  2. Verify (prove) that the logsumexp trick of Eq. (1.2) gives the correct result for \(\log S\).
  3. Find examples of floating point computations that break standard rules of arithmetic.
  4. For \(x \sim \mathcal{N}(0, 1)\), compute \(\log p(28 < x < 30)\).
  5. Check if the following properties of real numbers hold for floating point numbers, and provide examples of how the invalid properties fail.
     1. \(\forall x < y, \exists z: x < z < y\)
     2. \(x, y \in \mathbb{R} \Rightarrow x + y \in \mathbb{R}\)
     3. \(x, y \in \mathbb{R} \Rightarrow x y \in \mathbb{R}\)
     4. There exists a unique element \(a = 0\) such that \(x + a = a + x = x\) for all \(x\)
     5. \(x, y, z \in \mathbb{R} \Rightarrow (x + y) + z = x + (y + z)\)
     6. \(x, y, z \in \mathbb{R} \Rightarrow (x y) z = x (y z)\)
     7. \(x, y, z \in \mathbb{R} \Rightarrow x (y + z) = xy + xz\)

References

Gentle, James E. 2009. Computational Statistics. 1st ed. New York, NY: Springer.