Intro
In the last post, we added persistence to our search by backing it with a database. This ensured that we didn't lose any completed work, but progress was still slow and we didn't get much past the millions in our search. We were considering every number as a potential candidate for a Riemann counterexample. On inspection, however, we can see that the high witness values all come from numbers with a lot of divisors. Can we focus our search on just these numbers?
As always, credit goes to the original author here.
Superabundant Numbers
In mathematics, numbers whose divisors add up to more than the number itself are called abundant numbers:
$$\sigma(n) > 2n$$
where $\sigma(n)$ is the sum of all divisors of $n$ (including $n$ itself, which is why the threshold is $2n$). It is easy to see why these numbers are prime candidates for a Riemann counterexample, since they have unusually large divisor sums by construction.
Among abundant numbers, some are called superabundant. These have “maximal relative divisor sums”, i.e., a number is superabundant if the ratio of its divisor sum to itself is larger than that of every smaller number. This can be expressed as:
$$\frac{\sigma(m)}{m} \lt \frac{\sigma(n)}{n} \hspace{1em} \forall \hspace{1em} m < n$$
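Before optimizing anything, it helps to see the definition in action. Below is a minimal brute-force sketch (my own illustration, not code from the original post) that finds superabundant numbers by scanning upward and tracking the best $\sigma(n)/n$ ratio seen so far; comparing fractions via cross-multiplication avoids floating-point error:

// superabundantUpTo is an illustrative brute-force search: n is
// superabundant when sigma(n)/n exceeds sigma(m)/m for every m < n.
func superabundantUpTo(limit int) []int {
	var result []int
	bestNum, bestDen := 0, 1 // best sigma(m)/m seen so far, as a fraction
	for n := 1; n <= limit; n++ {
		sigma := 0
		for d := 1; d <= n; d++ { // naive divisor sum
			if n%d == 0 {
				sigma += d
			}
		}
		// sigma/n > bestNum/bestDen  <=>  sigma*bestDen > bestNum*n
		if sigma*bestDen > bestNum*n {
			result = append(result, n)
			bestNum, bestDen = sigma, n
		}
	}
	return result
}

Running this shows the sequence begins 1, 2, 4, 6, 12, 24, 36, 48, 60, 120, ...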
While there is no easy strategy for finding superabundant numbers, it has been proved that a superabundant number must have a prime decomposition in which the prime factors occur with non-increasing exponents:
$$n = \prod_{i=1}^{k} {p_i}^{a_i}$$
where $p_i$ is the $i$-th prime and $a_i \leq a_{i-1}$.
A non-increasing list of numbers that adds up to a given number is called a partition of that number. We just need to find the partitions of a number, use them as exponents over consecutive primes, and we're well on our way to a better approach to this problem.
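To make the correspondence concrete, here is a small illustrative helper (my own sketch, not code from the repository) that turns a partition into its candidate number:

// candidateFromPartition treats a partition as exponents over
// consecutive primes. The hard-coded prime list is only for
// illustration; real code would generate as many primes as needed.
func candidateFromPartition(part []int) int64 {
	primes := []int64{2, 3, 5, 7, 11, 13, 17, 19, 23, 29}
	n := int64(1)
	for i, a := range part {
		for j := 0; j < a; j++ {
			n *= primes[i]
		}
	}
	return n
}

For instance, candidateFromPartition([]int{3, 2, 1}) returns $2^3 \cdot 3^2 \cdot 5^1 = 360$.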
Finding Partitions
There are pretty standard methods to find partitions, one of which was included in the original post. However, I found the algorithm a bit difficult to understand and hence ended up implementing a simple recursion algorithm instead.
// PartitionsOfN returns all partitions of n, each in non-increasing order.
func PartitionsOfN(n int) [][]int {
	// The partition consisting of n alone is always present.
	output := [][]int{{n}}
	// For each possible largest part i < n, prepend i to every
	// partition of n-i whose own parts do not exceed i.
	for i := n - 1; i > 0; i-- {
		newParts := PartitionsOfN(n - i)
		for _, part := range newParts {
			if part[0] <= i {
				output = append(output, append([]int{i}, part...))
			}
		}
	}
	return output
}
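For example, PartitionsOfN(4) returns [[4] [3 1] [2 2] [2 1 1] [1 1 1 1]], the five partitions of 4.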
Quite simply, the partitions of $n$ are $n$ itself, plus every partition formed by picking a largest part $i < n$ and prepending it to a partition of $n - i$ whose parts are at most $i$. While this approach worked and found the right set of partitions, the algorithm ground to a halt as we approached $n > 25$! A similar slowdown was seen for the Kun/Python algorithm only around $n > 70$, so we definitely needed to do better!
On inspection, I realized that the algorithm was recalculating the partitions of every smaller number again and again, duplicating work. In this commit, I implemented memoization, which calculates the partitions only if the number is new and otherwise reads from the existing cache. This combination of memoization and recursion is called Dynamic Programming.
// cachedPartitionsOfN is PartitionsOfN with a shared cache, so each
// sub-problem is solved only once.
func cachedPartitionsOfN(n int, cache map[int][][]int) [][]int {
	// Return the cached result if we have already computed it.
	if possibleOutput, ok := cache[n]; ok {
		return possibleOutput
	}
	output := [][]int{{n}}
	for i := n - 1; i > 0; i-- {
		newParts := cachedPartitionsOfN(n-i, cache)
		cache[n-i] = newParts
		for _, part := range newParts {
			if part[0] <= i {
				output = append(output, append([]int{i}, part...))
			}
		}
	}
	return output
}

func MemoizedPartitionsOfN(n int) [][]int {
	cache := make(map[int][][]int)
	return cachedPartitionsOfN(n, cache)
}
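Note that the cache is threaded through as a parameter, so each top-level call to MemoizedPartitionsOfN starts fresh and sub-results are only shared within a single call. A package-level cache would also reuse work across calls, at the cost of holding that memory for the lifetime of the program.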
This significantly sped up the process, and we were able to reach partition computation speeds on par with the Python approach.
At this point, I noticed a minor discrepancy between the results of my approach and the original author's. To verify, I implemented the author's algorithm myself, saw that the issue persisted, and raised a bug report, which the author fixed promptly!
Now that we have the partitions, finding the divisor sum from the prime factorization is straightforward.
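Since $\sigma$ is multiplicative, the divisor sum can be computed directly from the factorization as $\sigma(p^a) = 1 + p + \dots + p^a$ per prime power. Here is a minimal sketch of the idea; the {prime, exponent} pair format mirrors the property test below, but the repository's actual PrimeFactorDivisorSum may differ:

// primeFactorDivisorSum computes sigma(n) from n's factorization, given
// as {prime, exponent} pairs, by multiplying the geometric series
// 1 + p + p^2 + ... + p^a for each prime power.
// Plain int64 arithmetic overflows quickly; see the big.Int version later.
func primeFactorDivisorSum(factors [][]int) int64 {
	total := int64(1)
	for _, f := range factors {
		p, a := int64(f[0]), f[1]
		term, power := int64(1), int64(1)
		for j := 0; j < a; j++ { // term = 1 + p + p^2 + ... + p^a
			power *= p
			term += power
		}
		total *= term
	}
	return total
}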
Property Testing
In general, we add tests by manually computing the expected value and comparing a function's result against it. This kind of testing is useful primarily for preventing regressions, where a later change accidentally breaks existing functionality. However, it rarely captures edge cases that the programmer herself hasn't thought of. We can mitigate some of this by using property tests.
With property tests, the test cases are generated automatically at test time, and the output is then checked against properties that any correct output must satisfy.
For example, let us assume we are implementing a square function. We would usually test it by checking the value of squares for a few numbers:
- square(1) = 1
- square(2) = 4
- square(11) = 121
While this might look sufficient, it doesn't necessarily capture edge cases. For example, our algorithm might fail for:
- negative numbers
- fractional numbers
- irrational numbers
- large numbers
- very small numbers, etc.
So instead of trying to account for every possible number, we instead state properties of the squaring function that we know must hold:
- The square of any real number is non-negative
- The square of any number greater than 1 is greater than the number itself
- The square of any number between 0 and 1 is less than the number itself
- The square of any integer is an integer
- The square of any rational number is rational.
This kind of approach can uncover many hidden bugs in the code and is essential for mission-critical code!
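As a taste of what this looks like in practice, here is a minimal gopter sketch (my own illustration) checking the non-negativity property for integer squaring; the input range is kept small to avoid int overflow:

import (
	"testing"

	"github.com/leanovate/gopter"
	"github.com/leanovate/gopter/gen"
	"github.com/leanovate/gopter/prop"
)

func TestSquareIsNonNegative(t *testing.T) {
	properties := gopter.NewProperties(nil) // nil means default parameters
	properties.Property("square is non-negative", prop.ForAll(
		func(x int) bool {
			return x*x >= 0
		},
		gen.IntRange(-1000, 1000), // generated inputs, including negatives
	))
	properties.TestingRun(t)
}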
In our case, we add property tests that validate PrimeFactorDivisorSum by comparing it against the naive divisor-sum computation. In Golang, the gopter package allows you to quickly generate property tests and customize them to your heart's content.
parameters := gopter.DefaultTestParameters()
parameters.MinSuccessfulTests = 100
parameters.MaxSize = 10
properties := gopter.NewProperties(parameters)
properties.Property("Check Prime Factor Divisor Sum", prop.ForAll(
	func(a, b []int) bool {
		// Pair up bases and exponents; use the shorter slice's length.
		smallerSliceLength := len(b)
		if len(a) <= len(b) {
			smallerSliceLength = len(a)
		}
		input := [][]int{}
		n := int64(1)
		for i := 0; i < smallerSliceLength; i++ {
			input = append(input, []int{a[i], b[i]})
			n *= int64(math.Pow(float64(a[i]), float64(b[i])))
		}
		// The optimized and naive divisor sums must agree.
		resultA, err := riemann.PrimeFactorDivisorSum(input)
		if err != nil {
			return false
		}
		resultB, err := riemann.DivisorSum(n)
		if err != nil {
			return false
		}
		return resultA == resultB
	},
	// Bases: unique primes. Have to constrain the range because it
	// doesn't work for large numbers yet.
	gen.SliceOf(gen.IntRange(2, 20).SuchThat(func(v interface{}) bool {
		return riemann.CheckIfPrime(v.(int))
	}), reflect.TypeOf(int(0))).SuchThat(func(v interface{}) bool {
		return riemann.CheckUniqueness(v.([]int))
	}).WithLabel("a"),
	// Exponents: small values to keep n within int64 range.
	gen.SliceOf(gen.IntRange(2, 5), reflect.TypeOf(int(0))).WithLabel("b"),
))
Expect(properties.Run(gopter.ConsoleReporter(true))).To(BeTrue())
In the above code, I randomly generate bases between 2 and 20 (while constraining them to be prime and unique) and exponents between 2 and 5. This testing actually managed to find specific issues with my first implementation that I had missed!
- The algorithm didn't work with non-prime bases, for which I added guard rails.
- The algorithm didn't work if the bases were not unique, which I hadn't accounted for in the first implementation either.
- The algorithm didn't work for larger numbers because of integer overflow (and exponentiation creates large numbers really fast!), which I couldn't solve yet.
I added the fixes, along with the property tests and regular tests, in this commit.
Now that we are sold on property tests, we can also circle back and add property tests for the partitions code. We check for the following properties (a sketch follows the list):
- The parts of each partition add up to the original number
- Within each partition, the parts are non-increasing
- The list of partitions is sorted
- The partitions are unique.
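Here is a minimal sketch of what these checks can look like, assuming the MemoizedPartitionsOfN function from above (the sortedness check is omitted for brevity, and the real tests run through gopter):

import "fmt"

// partitionPropertiesHold verifies the partition properties for one n:
// every partition sums to n, has non-increasing parts, and appears once.
func partitionPropertiesHold(n int) bool {
	seen := map[string]bool{}
	for _, p := range MemoizedPartitionsOfN(n) {
		sum := 0
		for i, v := range p {
			sum += v
			if i > 0 && p[i-1] < v { // parts must be non-increasing
				return false
			}
		}
		if sum != n { // parts must add up to n
			return false
		}
		key := fmt.Sprint(p)
		if seen[key] { // partitions must be unique
			return false
		}
		seen[key] = true
	}
	return true
}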
Thankfully, our partitions algorithm was already beyond reproach ;) and we didn’t have to change any code!
Dealing with (REALLY) Large Numbers
The one issue we identified but had avoided until now was dealing with large numbers. Even using 64-bit integers (the largest signed integer primitive in Golang) for the math, it turns out that we max out at a paltry 9223372036854775807, which is not going to be nearly enough if we're going to find a counterexample.
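To see the problem concretely, signed 64-bit arithmetic in Go wraps around silently on overflow:

import (
	"fmt"
	"math"
)

func main() {
	x := int64(math.MaxInt64) // 9223372036854775807
	fmt.Println(x + 1)        // wraps to -9223372036854775808
}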
Native Python doesn't face this issue since large-number math is handled internally by Python, but the moment we try to optimize the code by using packages like numpy and numba, we have the same problem!
Thankfully, Golang has a standard way of dealing with this: the math/big package. While it is a bit clunkier to use than native numbers, it allows you to do all the math required to solve our problem. In this commit, I upgrade the numbers to big.Int and add a few more tests. This allows us to remove the constraint on large numbers (the remaining constraints are only there to make sure the tests run in reasonable time!).
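To give a feel for the math/big style, here is the divisor-sum-from-factorization sketch from earlier rewritten with big.Int, so intermediate powers can grow beyond 64 bits (again an illustration; the repository's types may differ):

import "math/big"

// bigPrimeFactorDivisorSum computes sigma(n) from {prime, exponent}
// pairs using the closed form (p^(a+1) - 1) / (p - 1) for each factor.
func bigPrimeFactorDivisorSum(factors [][]int) *big.Int {
	total := big.NewInt(1)
	one := big.NewInt(1)
	for _, f := range factors {
		p := big.NewInt(int64(f[0]))
		// numerator = p^(a+1) - 1
		num := new(big.Int).Exp(p, big.NewInt(int64(f[1]+1)), nil)
		num.Sub(num, one)
		// denominator = p - 1
		den := new(big.Int).Sub(p, one)
		total.Mul(total, num.Div(num, den))
	}
	return total
}

Every operation is a method call that mutates a receiver, which is exactly the clunkiness mentioned above, but overflow is no longer a concern.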
Next steps
All the commits above are condensed into this pull request.
We are all set to tackle the actual problem, since we now have the subset of numbers to search as candidates for a Riemann counterexample.
In the next post, we will use the candidates we found using this approach to solve the Riemann Hypothesis problem.