In the previous post, we started searching for Riemann counterexamples by computing, across a range of integers, a witness value that could disprove the hypothesis. While we were successful in getting the code working to find the witness value, we had no way to save the results. In effect, this meant that each program run started from scratch.
To preserve the results of our runs, we need to introduce some way of persisting both the results and the state at which each run stopped. This way, we can pick up where we left off the next time we run the program.
There are many ways to introduce persistence, the simplest being to write the values to a file and read that file at startup to see what has been computed before and where to start next. A database, however, is a better fit. In this post, we add the simplest of databases, SQLite, to our system.
As always, we will be closely following the original post from here.
The Interface pattern
As mentioned before, we are planning to use a SQLite database in our implementation. The simplest approach would be to write code that depends on this knowledge and uses the database functions directly. However, this means that if we decide to use a different implementation for persisting the data tomorrow, we would have to rewrite all the parts that interact with the database!
So instead of adding just a database implementation, we add an abstraction layer that decouples the intent from the implementation. This layer is called an interface. We first identify the operations that we want the interface to support, and then treat the actual code enabling each operation as a detail of the individual implementations. We also write tests against the interface and expect the same results irrespective of the implementation details.
Since we are getting started on a major feature addition now, it is a good idea to make all the changes in a separate branch. In our case, I'm using the branch `database` to handle all the code in this post.
Interface Methods
Let us think about the basic operations our persistence implementation needs to support:
- A way to insert/overwrite data in the persistence layer (Upsert)
- A way to read the data that has already been written (Load)
- A convenience method to get summary statistics of the data already present in the persistence layer (Summarize)
That is our interface! If we build the interface right, we should be able to plug any code into our system easily, as long as it has a way to support these three methods.
To see this in action, we will first add a non-persistent in-memory database (a simple hashmap of values) as our first implementation, and then add the SQLite database as a second implementation.
The First Implementation
Adding the Types
We first create a type `RiemannDivisorSum` to hold a single record, with the number (N), its divisor sum (DivisorSum), and the calculated witness value (WitnessValue).
```go
type RiemannDivisorSum struct {
	N            int
	DivisorSum   int
	WitnessValue float64
}
```
We can also add `SummaryStats`, which holds two `RiemannDivisorSum` values: one for the largest computed N and one for the largest witness value.
```go
type SummaryStats struct {
	LargestComputedN    RiemannDivisorSum
	LargestWitnessValue RiemannDivisorSum
}
```
Adding the Interface
Now that we have the types in place, we can add the interface with the methods as discussed before.
```go
type DivisorDb interface {
	Load() []RiemannDivisorSum
	Upsert([]RiemannDivisorSum)
	Summarize() SummaryStats
}
```
Now that we have the interface fully defined, we can add the implementation. The in-memory database implementation is defined in this file and the tests are in this one.
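To give a feel for what satisfying the interface looks like, here is a rough sketch of an in-memory implementation. The type and field names are illustrative and may not match the actual file exactly.

```go
// InMemoryDb is a sketch of the in-memory implementation of DivisorDb.
// The field is public for now so that callers can initialize it;
// we will revisit that decision later in this post.
type InMemoryDb struct {
	Data map[int]RiemannDivisorSum // keyed by N
}

// Load returns every record currently stored.
func (db *InMemoryDb) Load() []RiemannDivisorSum {
	sums := make([]RiemannDivisorSum, 0, len(db.Data))
	for _, rds := range db.Data {
		sums = append(sums, rds)
	}
	return sums
}

// Upsert inserts new records and overwrites existing ones, keyed on N.
func (db *InMemoryDb) Upsert(sums []RiemannDivisorSum) {
	for _, rds := range sums {
		db.Data[rds.N] = rds
	}
}

// Summarize scans the stored records for the largest N and the largest witness value.
func (db *InMemoryDb) Summarize() SummaryStats {
	stats := SummaryStats{}
	for _, rds := range db.Data {
		if rds.N > stats.LargestComputedN.N {
			stats.LargestComputedN = rds
		}
		if rds.WitnessValue > stats.LargestWitnessValue.WitnessValue {
			stats.LargestWitnessValue = rds
		}
	}
	return stats
}
```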
All of the above steps are done as a single feature in this commit.
Precomputing Divisor Sum
In some cases, we can provide the DivisorSum directly to the function without it having to compute it. For that, we need to extend the function to optionally take the precomputed DivisorSum value as a parameter, and calculate the DivisorSum only if this value is not provided. This is implemented in this commit.
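As a sketch of what that could look like, a variadic parameter is one way to make the precomputed value optional in Go; the actual signature in the commit may differ, and `DivisorSum` is the divisor-sum function from the previous post.

```go
import "math"

// WitnessValue computes DivisorSum(n) / (n * log(log(n))), the witness value
// defined in the previous post. The variadic parameter lets callers pass in a
// precomputed divisor sum; otherwise it is computed here.
func WitnessValue(n int, precomputedDivisorSum ...int) float64 {
	var divisorSum int
	if len(precomputedDivisorSum) > 0 {
		divisorSum = precomputedDivisorSum[0]
	} else {
		divisorSum = DivisorSum(n) // only computed when no value is passed in
	}
	denominator := float64(n) * math.Log(math.Log(float64(n)))
	return float64(divisorSum) / denominator
}
```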
Calculating Riemann sums for an array
Now that we know how to compute the Riemann divisor sum for a single integer, it is trivial to extend the computation to an array.
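A sketch of how this could look, with illustrative function and parameter names:

```go
// ComputeRiemannDivisorSums computes the divisor sum and witness value for
// every integer in the inclusive range [startingN, endingN].
func ComputeRiemannDivisorSums(startingN, endingN int) []RiemannDivisorSum {
	sums := make([]RiemannDivisorSum, 0, endingN-startingN+1)
	for n := startingN; n <= endingN; n++ {
		divisorSum := DivisorSum(n)
		sums = append(sums, RiemannDivisorSum{
			N:            n,
			DivisorSum:   divisorSum,
			WitnessValue: WitnessValue(n, divisorSum), // reuse the precomputed divisor sum
		})
	}
	return sums
}
```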
Adding the main
We have the individual components in place along with the unit tests. Now, it is time to add the `main` function. The `main` function in Golang is the entrypoint to your code. In our case, it calculates the witness values for a range of numbers and prints out the summary statistics across the range. Once we have `main` implemented, you can invoke it using `go run main.go` to see the output in the terminal.
In this commit, we also add the capability to search across arbitrary start and end ranges.
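A minimal sketch of such a `main` function, using the in-memory database and the illustrative helpers sketched above; the real `main` in the repo handles ranges, formatting, and errors differently.

```go
package main

import "fmt"

func main() {
	// Start with the in-memory implementation of the DivisorDb interface.
	db := &InMemoryDb{Data: map[int]RiemannDivisorSum{}}

	// Arbitrary start and end of the search range; these values are illustrative.
	startingN, endingN := 5041, 100000
	db.Upsert(ComputeRiemannDivisorSums(startingN, endingN))

	// Print the summary statistics across everything stored so far.
	stats := db.Summarize()
	fmt.Printf("Highest Number Analyzed\n======\n%+v\n", stats.LargestComputedN)
	fmt.Printf("Largest Witness Value\n======\n%+v\n", stats.LargestWitnessValue)
}
```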
Additional fixes
At this point, I identified a few issues with the code which I fixed in the following small commits:
- We were using `int` till now, which is soon going to be too small to represent the numbers we are going to handle! In this commit, we change all numbers to `int64`.
- While adding that fix, we introduced another bug. That was fixed and additional tests were added in this commit.
- We also add tests to capture the `panic` situations correctly in this commit.
- In addition, there are many small (mostly cosmetic) cleanups here, here, and here.
With these changes, we are well and truly done with our first implementation and can move on to the second (and more practical) implementation.
Adding the Second (SQLite) Implementation
While ideating on the SQLite implementation, I realized that I had made a mistake by missing one method in the interface: a method to initialize the implementation. This design error actually came up while we were adding the first implementation as well, when I had to make the data field in the in-memory database public just to initialize it. This is a code smell, since the way the data is stored is really an implementation concern and not something the user should be aware of. In this commit, I rectify that mistake by making the data field private and adding an Initialize method to the interface, along with its implementation (+ tests!).
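After that change, the interface looks roughly like this (the exact signature of Initialize in the repo may differ):

```go
type DivisorDb interface {
	// Initialize sets up whatever backing storage the implementation needs,
	// e.g. allocating the map for the in-memory database.
	Initialize()
	Load() []RiemannDivisorSum
	Upsert([]RiemannDivisorSum)
	Summarize() SummaryStats
}
```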
Now we can add the SQLite implementation, using the same methods as before. The implementation details are completely different from the in-memory database, but the interface remains the same. This is best showcased in this commit, where the test files are practically the same for the two implementations! This also allows us to parametrize the tests in this commit by making them specific to the interface rather than to a particular implementation. We can now go back and forth between the two implementations as we please!
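To illustrate, here is a rough sketch of what the SQLite-backed Initialize and Upsert could look like, assuming the mattn/go-sqlite3 driver and a hypothetical database file name; the actual code in the commit may be structured differently.

```go
import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3" // assumed driver; any database/sql-compatible SQLite driver works
)

// SqliteDb is a sketch of the SQLite-backed implementation of DivisorDb.
type SqliteDb struct {
	db *sql.DB
}

// Initialize opens the database file and creates the table if it does not exist.
func (s *SqliteDb) Initialize() {
	db, err := sql.Open("sqlite3", "divisorsums.db") // hypothetical file name
	if err != nil {
		panic(err)
	}
	s.db = db
	_, err = s.db.Exec(`CREATE TABLE IF NOT EXISTS RiemannDivisorSums (
		n INTEGER PRIMARY KEY,
		divisor_sum INTEGER,
		witness_value REAL
	)`)
	if err != nil {
		panic(err)
	}
}

// Upsert inserts or replaces each record, keyed on n.
func (s *SqliteDb) Upsert(sums []RiemannDivisorSum) {
	for _, rds := range sums {
		_, err := s.db.Exec(
			`INSERT OR REPLACE INTO RiemannDivisorSums (n, divisor_sum, witness_value) VALUES (?, ?, ?)`,
			rds.N, rds.DivisorSum, rds.WitnessValue,
		)
		if err != nil {
			panic(err)
		}
	}
}
```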
The last step is to update the main function to use the SQLite implementation instead of the in-memory DB. This gives us the following output:
❯ go run main.go
Computed Sums from 1,010,082 to 2,010,081 in 2.250402s
Computed Sums from 2,010,082 to 3,010,081 in 2.570138167s
Computed Sums from 3,010,082 to 4,010,081 in 2.820090959s
Computed Sums from 4,010,082 to 5,010,081 in 3.26657375s
^CGot Interrupt. Shutting down...
Highest Number Analyzed
======
{N:4010081 DivisorSum:4010082 WitnessValue:0.367434}
Largest Witness Value
======
{N:55440 DivisorSum:232128 WitnessValue:1.751247}
DB Closed!
On rerunning the function after stopping it:
❯ go run main.go
Computed Sums from 5,010,083 to 6,010,082 in 3.399881167s
Computed Sums from 6,010,083 to 7,010,082 in 3.468380792s
Computed Sums from 7,010,083 to 8,010,082 in 3.460063167s
Computed Sums from 8,010,083 to 9,010,082 in 3.5575695s
^CGot Interrupt. Shutting down...
Highest Number Analyzed
======
{N:8010082 DivisorSum:12027996 WitnessValue:0.542865}
Largest Witness Value
======
{N:55440 DivisorSum:232128 WitnessValue:1.751247}
DB Closed!
You can see that the function is able to save the state of the last run and start the calculations from that point instead of from scratch each time.
Wrapping up
Now that we have the entire feature added, we can create a new pull request which, after review, is merged onto the `main` branch.
Running the function up to 100M now takes just a few minutes, and we can stop and start any time we want. However, we're still nowhere close to finding a better candidate than the original `55440`. If we look at the top witness values in the SQLite DB, we can see a curious pattern:
sqlite> select * from RiemannDivisorSums ORDER BY witness_value DESC limit 10;
55440|232128|1.751247
27720|112320|1.742537
15120|59520|1.738559
110880|471744|1.734849
720720|3249792|1.733065
1441440|6604416|1.72774
All of the top numbers are multiples of 2520. This makes intuitive sense, since 2520 is a superabundant number, and it has been proven that a counterexample, if it exists, has to be a superabundant number and a multiple of 2520. Maybe we can use this idea to improve the way we search?
In the next post, we will see if we can do better than indiscriminately searching every number, thereby improving the speed of our calculations by multiple orders of magnitude!