# Interleaving Computations With Iterators

## Prime Number Factorization in Rust

Finding prime numbers and factorizing numbers into their prime components is one
of my *stock programs*: Whenever I learn a new programming language, I try to
solve the following problems:

- Given a number
`n`

, find all prime numbers in the interval`[2;n]`

.- Example: The prime numbers up to
`20`

are`[2, 3, 5, 7, 11, 13, 17, 19]`

.

- Example: The prime numbers up to
- Given a number
`x`

, factorize`x`

into its prime components.- Example: The number
`234`

can be factorized into`[2, 3, 3, 13]`

.

- Example: The number

The two problems are related: In order to find the prime factors of a number,
one needs to find the prime numbers that could possibly make up its factors. In
order to find the prime factors of `x`

, a naive approach would be to first find
all prime numbers up to `x`

â€”or up to the square root of `x`

as an optimization.
(See my explanation of the Trial Division
Method
for a rationale; I’ve written this paper when I implemented the same stock
program concurrently in Elixir.) So let’s do what *Rustaceans* do: re-implement
it in Rust!

# Re-implementing `factor`

in Rust

The `factor(1)`

program, which probably is written in Câ€”and therefore
compulsively needs to be rewritten in Rustâ€”factorizes the given number:

```
$ factor 234
234: 2 3 3 13
```

Let’s start with a new Cargo project:

```
$ cargo new factor
Created binary (application) `factor` package
```

The `Hello, World!`

program shall be replaced with this boiler-plate code:

```
fn main() {
let mut args = std::env::args();
args.next().expect("skipping the executable name");
let argument = args.next().expect("number argument missing");
let number: u64 = argument.parse().expect("argument is not a number");
println!("{}", number);
}
```

This allows us to parse a single command line argument as a number. The program panics (i.e. exits) if no positive number was provided.

With this out of the way, let’s find the prime numbers up to `number`

!

## Prime Sieve

The Prime Sieve of Eratosthenes finds prime numbers up to `n`

by only trying to
divide `i`

within `[2;n]`

for values of `i`

that are prime numbers themselves.
(Rationale: if a number is divisible by `4`

, it is also divisible by `2`

, and
therefore checking `2`

is enough.)

Let’s implement this algorithm:

```
fn prime_sieve(n: u64) -> Vec<u64> {
let mut primes = Vec::new();
for i in 2..=n {
if !primes.iter().any(|p| i % p == 0) {
primes.push(i);
}
}
primes
}
```

The numbers from `2`

(which is the smallest prime number) up to *and including*
`n`

are processed in a `for`

loop. For every number `i`

in the interval `[2;n]`

it is checked whether or not it is divisible by any prime number already found.
If it is *not* divisible, `i`

is a prime number, and thus added to the vector of
prime numbers.

A little testing won’t hurt:

```
mod tests {
use super::*;
#[test]
fn prime_sieve_up_to_20() {
assert_eq!(prime_sieve(20), vec![2, 3, 5, 7, 11, 13, 17, 19]);
}
}
```

## Prime Factorization: Trial Division Method

Given a number `n`

to factorize, and having found the prime numbers up to `n`

(or up to the square root of `n`

, for that matter), we can implement the Trial
Division Method for prime factorization:

```
fn factorize(n: u64) -> Vec<u64> {
let n_sqrt = (n as f64).sqrt().ceil() as u64;
let primes = prime_sieve(n_sqrt);
let mut factors = Vec::new();
let mut n = n;
let mut i = 0;
while n > 1 {
let prime = primes[i];
if n % prime == 0 {
factors.push(prime);
n /= prime;
} else {
i += 1;
}
}
factors
}
```

First, the prime numbers up to the square root of `n`

are computed using
the `prime_sieve`

function just written.

Second, the trial division is repeated as long as the number `n`

(shadowed with
a mutable version) is bigger than 1, at which point the factorization is
finished. The prime numbers are picked one by one. As long as a prime number
divides `n`

without a remanider, the prime number is a prime factors and pushed
onto `factors`

as such, and `n`

becomes the remainder of dividing it by the
current prime number. Otherwise, the next prime number is tried, until a prime
number is reached that divides the remainder of `n`

to 1, at which point the
loop terminates.

This test case demonstrates that `factors`

works for `n=234`

:

```
#[test]
fn factorize_234() {
assert_eq!(factorize(234), vec![2, 3, 3, 13]);
}
```

However, the following test fails:

```
#[test]
fn factorize_39() {
assert_eq!(factorize(39), vec![3, 13]);
}
```

The algorithm runs out of prime numbers and fails to use the remainder `13`

as a
prime factor:

```
thread 'tests::factorize_39' panicked at src/main.rs:58:27:
index out of bounds: the len is 4 but the index is 4
```

The logic could be slightly modified by using an `Iterator`

instead of the index
variable `i`

:

```
fn factorize(n: u64) -> Vec<u64> {
let n_sqrt = (n as f64).sqrt().ceil() as u64;
let primes = prime_sieve(n_sqrt);
let mut primes = primes.iter();
let mut factors = Vec::new();
let mut n = n;
let mut prime = match primes.next() {
Some(p) => p,
None => {
return vec![n];
}
};
while n > 1 {
if n % prime == 0 {
factors.push(*prime);
n /= prime;
} else {
prime = match primes.next() {
Some(p) => p,
None => {
factors.push(n);
break;
}
}
}
}
factors
}
```

The initial `primes`

vector is shadowed by a (mutable) iterator over those prime
numbers. Since our primes `Iterator`

returns an `Option<u64>`

instead of a plain
`u64`

, we make sure to handle the case of running out of prime numbers properly.
(In this case, the remainder `n`

is the last factor to be added.)

The code hasn’t gotten any shorter, quite the opposite, but the `factorize`

function no longer violates index boundsâ€”and is now ready to make use of an
`Iterator`

of prime numbers, which will come in very handy soonâ€¦

## Minible Viable `factor`

Let’s extend the main function to replicate the core functionality of `factor`

by making use of `factorize`

:

```
fn main() {
let mut args = std::env::args();
args.next().expect("skipping the executable name");
let argument = args.next().expect("number argument missing");
let number: u64 = argument.parse().expect("argument is not a number");
let factors = factorize(number);
let factors = factors
.iter()
.map(|x| x.to_string())
.collect::<Vec<String>>()
.join(" ");
println!("{}: {}", number, factors);
}
```

Our `factor`

implementation now can be used from the command line (discarding
compiler output):

```
$ cargo run -- 234 2>/dev/null
234: 2 3 3 13
```

*Rewriting it in Rust* should not lead to a performance penalty. However, our
re-implementation works visibly slower than `factor`

with big numbers (e.g. one
billion), even with the `release`

build profile:

```
$ time factor 1000000000
real 0m0.002s
user 0m0.000s
sys 0m0.002s
$ cargo build --release && time target/release/factor 1000000000
1000000000: 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5
real 0m0.024s
user 0m0.020s
sys 0m0.004s
```

Stripping down the `main`

function into a prime number finder might give a hint:

```
fn main() {
let mut args = std::env::args();
args.next().expect("skipping the executable name");
let argument = args.next().expect("number argument missing");
let number: u64 = argument.parse().expect("argument is not a number");
let n_sqrt = (number as f64).sqrt().ceil() as u64;
println!("{} prime numbers computed", prime_sieve(n_sqrt).len());
}
```

Computing the prime numbers takes up most of the time:

```
$ cargo build --release && time target/release/factor 1000000000
3401 prime numbers computed
real 0m0.027s
user 0m0.023s
sys 0m0.003s
```

However, only the *two* prime numbers `2`

and `5`

are actually used for this
particular input. This is a waste of computing power!

Let’s make it faster by being *lazy*.

# Iterators are Rust’s (Lazy) Streams

Instead of first finding *all* the prime numbers that possibly will be used and
only then starting the factorization process, the prime numbers could be just
computed on demand, i.e. when the next one is actually needed. In functional
programming languages such as Elixir, this is achieved using a *Stream*, which
is a virtually endless collection that computes the next value as needed. (For
details, see my Elixir
implementation
of the prime sieve.)

Rust has a similar concept: Iterators. An `Iterator`

has an internal state,
which is advanced as the `next()`

method is called on it. This returns an
`Option`

, i.e. either the next value, or `None`

if all the elements have been
consumed and the `Iterator`

is exhausted. For prime numbers, the latter case
won’t happen, at least not within the `u64`

value range.

So let’s implement an `Iterator`

for finding prime numbers!

## An Iterator for Prime Numbers

First, we need a data structure that holds the internal state of the iteration.
For the prime sieve, this is the last number that has been checked for
primality (`i`

), and the prime numbers found so far (`primes`

):

```
struct PrimeIterator {
i: u64,
primes: Vec<u64>,
}
```

For convenience, a `new()`

associated method is provided, which initializes the
iteration at the number `2`

, which is the first prime number:

```
impl PrimeIterator {
fn new() -> Self {
PrimeIterator {
i: 2,
primes: Vec::new(),
}
}
}
```

The vector of prime numbers is initially empty. (The current value of `i`

, which
is `2`

, will be moved into `primes`

upon the first iteration.)

To implement the `Iterator`

trait, the data type of the `Item`

and the `next()`

method need to be defined:

```
impl Iterator for PrimeIterator {
type Item = u64;
fn next(&mut self) -> Option<u64> {
let mut i = self.i;
loop {
if !self.primes.iter().any(|p| i % p == 0) {
self.primes.push(i);
self.i = i + 1;
return Some(i);
}
i += 1;
}
}
}
```

We use `u64`

values, and `next()`

returns an `Option<u64>`

. First, the current
value of `i`

is grabbed from `self`

. Then, `i`

is checked for divisibility
against the prime numbers found so far. If no prime number divides it, a new
prime number is found, which is added to the `primes`

vector. The variable `i`

is advanced by one, so that the next iteration starts with the its successor.
The found prime number is returned.

This test case demonstrates how the `PrimeIterator`

can be used:

```
#[test]
fn prime_iterator() {
let mut iter = PrimeIterator::new();
assert_eq!(iter.next(), Some(2));
assert_eq!(iter.next(), Some(3));
assert_eq!(iter.next(), Some(5));
assert_eq!(iter.next(), Some(7));
assert_eq!(iter.next(), Some(11));
assert_eq!(iter.next(), Some(13));
assert_eq!(iter.next(), Some(17));
assert_eq!(iter.next(), Some(19));
}
```

It would be possible to define an upper limit `n`

for prime numbers, but it only
complicates matters and is therefore left as an exercise for the reader.

## Factorizing Faster with `PrimeIterator`

In `factorize`

, we simply replace the `Iterator`

acquired from the returned
vector of `prime_sieve`

with a `PrimeIterator`

:

```
fn factorize(n: u64) -> Vec<u64> {
let mut primes = PrimeIterator::new();
let mut factors = Vec::new();
let mut n = n;
let mut prime = match primes.next() {
Some(p) => p,
None => {
return vec![n];
}
};
while n > 1 {
if n % prime == 0 {
factors.push(prime);
n /= prime;
} else {
prime = match primes.next() {
Some(p) => p,
None => {
factors.push(n);
return factors;
}
}
}
}
factors
}
```

Let’s retry our `factor`

re-implementation with the input from before, i.e. with
one billion:

```
$ cargo build --release && time target/release/factor 1000000000
1000000000: 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5
real 0m0.001s
user 0m0.001s
sys 0m0.000s
```

Notice how the speedup vanishes as other inputs require more prime numbers to be computed (using one billion and one):

```
$ cargo build --release && time target/release/factor 1000000001
1000000001: 7 11 13 19 52579
real 0m0.073s
user 0m0.069s
sys 0m0.004s
```

However, this is still fast enough, expecially compared to the timings yielded by the concurrent Elixir implementation.

# Conclusion

Iterators allow us to interleave two stateful computations without mixing up their code. The concernsâ€”finding prime numbers, factorizing a number into its prime componentsâ€”remain separated and can be tested independently. The state of the iterator is only advanced as much as needed, which avoids unnecessary computations and, compared to the original computation in advance, speeds up the program considerably (depending on the input).

The final code can be found in my factor repository.