🏷️ tech 🏷️ rust 🏷️ programming

📆 2024-04-11

Interleaving Computations With Iterators

Prime Number Factorization in Rust

Finding prime numbers and factorizing numbers into their prime components is one of my stock programs: Whenever I learn a new programming language, I try to solve the following problems:

Given a number n, find all prime numbers in the interval [2;n].
- Example: The prime numbers up to 20 are [2, 3, 5, 7, 11, 13, 17, 19].
Given a number x, factorize x into its prime components.
- Example: The number 234 can be factorized into [2, 3, 3, 13].

The two problems are related: In order to find the prime factors of a number, one needs to find the prime numbers that could possibly make up its factors. In order to find the prime factors of x, a naive approach would be to first find all prime numbers up to x—or up to the square root of x as an optimization. (See my explanation of the Trial Division Method for a rationale; I’ve written this paper when I implemented the same stock program concurrently in Elixir.) So let’s do what Rustaceans do: re-implement it in Rust!

Re-implementing `factor` in Rust

The factor(1) program, which probably is written in C—and therefore compulsively needs to be rewritten in Rust—factorizes the given number:

$ factor 234
234: 2 3 3 13

Let’s start with a new Cargo project:

$ cargo new factor
     Created binary (application) `factor` package

The Hello, World! program shall be replaced with this boiler-plate code:

fn main() {
    let mut args = std::env::args();
    args.next().expect("skipping the executable name");
    let argument = args.next().expect("number argument missing");
    let number: u64 = argument.parse().expect("argument is not a number");
    println!("{}", number);
}

This allows us to parse a single command line argument as a number. The program panics (i.e. exits) if no positive number was provided.

With this out of the way, let’s find the prime numbers up to number!

Prime Sieve

The Prime Sieve of Eratosthenes finds prime numbers up to n by only trying to divide i within [2;n] for values of i that are prime numbers themselves. (Rationale: if a number is divisible by 4, it is also divisible by 2, and therefore checking 2 is enough.)

Let’s implement this algorithm:

fn prime_sieve(n: u64) -> Vec<u64> {
    let mut primes = Vec::new();
    for i in 2..=n {
        if !primes.iter().any(|p| i % p == 0) {
            primes.push(i);
        }
    }
    primes
}

The numbers from 2 (which is the smallest prime number) up to and including n are processed in a for loop. For every number i in the interval [2;n] it is checked whether or not it is divisible by any prime number already found. If it is not divisible, i is a prime number, and thus added to the vector of prime numbers.

A little testing won’t hurt:

mod tests {
    use super::*;

    #[test]
    fn prime_sieve_up_to_20() {
        assert_eq!(prime_sieve(20), vec![2, 3, 5, 7, 11, 13, 17, 19]);
    }
}

Prime Factorization: Trial Division Method

Given a number n to factorize, and having found the prime numbers up to n (or up to the square root of n, for that matter), we can implement the Trial Division Method for prime factorization:

fn factorize(n: u64) -> Vec<u64> {
    let n_sqrt = (n as f64).sqrt().ceil() as u64;
    let primes = prime_sieve(n_sqrt);
    let mut factors = Vec::new();
    let mut n = n;
    let mut i = 0;
    while n > 1 {
        let prime = primes[i];
        if n % prime == 0 {
            factors.push(prime);
            n /= prime;
        } else {
            i += 1;
        }
    }
    factors
}

First, the prime numbers up to the square root of n are computed using the prime_sieve function just written.

Second, the trial division is repeated as long as the number n (shadowed with a mutable version) is bigger than 1, at which point the factorization is finished. The prime numbers are picked one by one. As long as a prime number divides n without a remanider, the prime number is a prime factors and pushed onto factors as such, and n becomes the remainder of dividing it by the current prime number. Otherwise, the next prime number is tried, until a prime number is reached that divides the remainder of n to 1, at which point the loop terminates.

This test case demonstrates that factors works for n=234:

#[test]
fn factorize_234() {
    assert_eq!(factorize(234), vec![2, 3, 3, 13]);
}

However, the following test fails:

#[test]
fn factorize_39() {
    assert_eq!(factorize(39), vec![3, 13]);
}

The algorithm runs out of prime numbers and fails to use the remainder 13 as a prime factor:

thread 'tests::factorize_39' panicked at src/main.rs:58:27:
index out of bounds: the len is 4 but the index is 4

The logic could be slightly modified by using an Iterator instead of the index variable i:

fn factorize(n: u64) -> Vec<u64> {
    let n_sqrt = (n as f64).sqrt().ceil() as u64;
    let primes = prime_sieve(n_sqrt);
    let mut primes = primes.iter();
    let mut factors = Vec::new();
    let mut n = n;
    let mut prime = match primes.next() {
        Some(p) => p,
        None => {
            return vec![n];
        }
    };
    while n > 1 {
        if n % prime == 0 {
            factors.push(*prime);
            n /= prime;
        } else {
            prime = match primes.next() {
                Some(p) => p,
                None => {
                    factors.push(n);
                    break;
                }
            }
        }
    }
    factors
}

The initial primes vector is shadowed by a (mutable) iterator over those prime numbers. Since our primes Iterator returns an Option<u64> instead of a plain u64, we make sure to handle the case of running out of prime numbers properly. (In this case, the remainder n is the last factor to be added.)

The code hasn’t gotten any shorter, quite the opposite, but the factorize function no longer violates index bounds—and is now ready to make use of an Iterator of prime numbers, which will come in very handy soon…

Minible Viable `factor`

Let’s extend the main function to replicate the core functionality of factor by making use of factorize:

fn main() {
    let mut args = std::env::args();
    args.next().expect("skipping the executable name");
    let argument = args.next().expect("number argument missing");
    let number: u64 = argument.parse().expect("argument is not a number");
    let factors = factorize(number);
    let factors = factors
        .iter()
        .map(|x| x.to_string())
        .collect::<Vec<String>>()
        .join(" ");
    println!("{}: {}", number, factors);
}

Our factor implementation now can be used from the command line (discarding compiler output):

$ cargo run -- 234 2>/dev/null
234: 2 3 3 13

Rewriting it in Rust should not lead to a performance penalty. However, our re-implementation works visibly slower than factor with big numbers (e.g. one billion), even with the release build profile:

$ time factor 1000000000
real	0m0.002s
user	0m0.000s
sys	    0m0.002s

$ cargo build --release && time target/release/factor 1000000000
1000000000: 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5

real	0m0.024s
user	0m0.020s
sys	    0m0.004s

Stripping down the main function into a prime number finder might give a hint:

fn main() {
    let mut args = std::env::args();
    args.next().expect("skipping the executable name");
    let argument = args.next().expect("number argument missing");
    let number: u64 = argument.parse().expect("argument is not a number");

    let n_sqrt = (number as f64).sqrt().ceil() as u64;
    println!("{} prime numbers computed", prime_sieve(n_sqrt).len());
}

Computing the prime numbers takes up most of the time:

$ cargo build --release && time target/release/factor 1000000000
3401 prime numbers computed

real	0m0.027s
user	0m0.023s
sys	    0m0.003s

However, only the two prime numbers 2 and 5 are actually used for this particular input. This is a waste of computing power!

Let’s make it faster by being lazy.

Iterators are Rust’s (Lazy) Streams

Instead of first finding all the prime numbers that possibly will be used and only then starting the factorization process, the prime numbers could be just computed on demand, i.e. when the next one is actually needed. In functional programming languages such as Elixir, this is achieved using a Stream, which is a virtually endless collection that computes the next value as needed. (For details, see my Elixir implementation of the prime sieve.)

Rust has a similar concept: Iterators. An Iterator has an internal state, which is advanced as the next() method is called on it. This returns an Option, i.e. either the next value, or None if all the elements have been consumed and the Iterator is exhausted. For prime numbers, the latter case won’t happen, at least not within the u64 value range.

So let’s implement an Iterator for finding prime numbers!

An Iterator for Prime Numbers

First, we need a data structure that holds the internal state of the iteration. For the prime sieve, this is the last number that has been checked for primality (i), and the prime numbers found so far (primes):

struct PrimeIterator {
    i: u64,
    primes: Vec<u64>,
}

For convenience, a new() associated method is provided, which initializes the iteration at the number 2, which is the first prime number:

impl PrimeIterator {
    fn new() -> Self {
        PrimeIterator {
            i: 2,
            primes: Vec::new(),
        }
    }
}

The vector of prime numbers is initially empty. (The current value of i, which is 2, will be moved into primes upon the first iteration.)

To implement the Iterator trait, the data type of the Item and the next() method need to be defined:

impl Iterator for PrimeIterator {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        let mut i = self.i;
        loop {
            if !self.primes.iter().any(|p| i % p == 0) {
                self.primes.push(i);
                self.i = i + 1;
                return Some(i);
            }
            i += 1;
        }
    }
}

We use u64 values, and next() returns an Option<u64>. First, the current value of i is grabbed from self. Then, i is checked for divisibility against the prime numbers found so far. If no prime number divides it, a new prime number is found, which is added to the primes vector. The variable i is advanced by one, so that the next iteration starts with the its successor. The found prime number is returned.

This test case demonstrates how the PrimeIterator can be used:

#[test]
fn prime_iterator() {
    let mut iter = PrimeIterator::new();
    assert_eq!(iter.next(), Some(2));
    assert_eq!(iter.next(), Some(3));
    assert_eq!(iter.next(), Some(5));
    assert_eq!(iter.next(), Some(7));
    assert_eq!(iter.next(), Some(11));
    assert_eq!(iter.next(), Some(13));
    assert_eq!(iter.next(), Some(17));
    assert_eq!(iter.next(), Some(19));
}

It would be possible to define an upper limit n for prime numbers, but it only complicates matters and is therefore left as an exercise for the reader.

Factorizing Faster with `PrimeIterator`

In factorize, we simply replace the Iterator acquired from the returned vector of prime_sieve with a PrimeIterator:

fn factorize(n: u64) -> Vec<u64> {
    let mut primes = PrimeIterator::new();
    let mut factors = Vec::new();
    let mut n = n;
    let mut prime = match primes.next() {
        Some(p) => p,
        None => {
            return vec![n];
        }
    };
    while n > 1 {
        if n % prime == 0 {
            factors.push(prime);
            n /= prime;
        } else {
            prime = match primes.next() {
                Some(p) => p,
                None => {
                    factors.push(n);
                    return factors;
                }
            }
        }
    }
    factors
}

Let’s retry our factor re-implementation with the input from before, i.e. with one billion:

$ cargo build --release && time target/release/factor 1000000000
1000000000: 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5

real	0m0.001s
user	0m0.001s
sys	    0m0.000s

Notice how the speedup vanishes as other inputs require more prime numbers to be computed (using one billion and one):

$ cargo build --release && time target/release/factor 1000000001
1000000001: 7 11 13 19 52579

real	0m0.073s
user	0m0.069s
sys     0m0.004s

However, this is still fast enough, expecially compared to the timings yielded by the concurrent Elixir implementation.

Conclusion

Iterators allow us to interleave two stateful computations without mixing up their code. The concerns—finding prime numbers, factorizing a number into its prime components—remain separated and can be tested independently. The state of the iterator is only advanced as much as needed, which avoids unnecessary computations and, compared to the original computation in advance, speeds up the program considerably (depending on the input).

The final code can be found in my factor repository.