Build your own CouchDB
Hi! I'm Arve and this is my adventure into building my own modern CouchDB with Rust.
What is CouchDB?
CouchDB is a document-oriented database written in Erlang, initially released in 2005. It has a simple replication protocol over HTTP, using revisions to ensure eventual consistency.
Why am I building my own?
CouchDB leaves me longing for better interoperability with modern browsers. Specifically, I want "real time" replication to IndexedDB, which is unpleasant with regular CouchDB. The unpleasantness is mainly due to the revision mechanism, which is fairly Erlang-specific: revision hashes are calculated over serialized Erlang data structures with MD5, neither of which is native to browsers. It is of course possible to achieve the revision calculation with some extra libraries. Still, I think it will be a fun challenge to implement a modern CouchDB variant in the build-your-own-X tradition.
Without further ado, let's start the journey and look at the objectives.
Objectives
Decisions made should reflect the main goals:
Goals
- Easy interoperability with other programming environments.
- Efficient and simple syncing.
- Use browser native protocols / APIs.
- Availability and Partition tolerance of CAP.
To help finish the project, some non-goals will restrict scope and complexity:
Non goals
- Compatibility with CouchDB.
- Writing low level code.
- Extensive server-side logic, like index lookups and design documents.
- Consistency of CAP.
Persistent storage
Let's start with designing the disk storage format. The non-goal Writing low level code steers us away from designing a file format and using direct file access. A good alternative is SQLite. Let's set it up first.
Bootstrapping the project
I'll call the project sakkosekk, which is Norwegian for bean bag chair.
Bootstrapping with Cargo:
~/g/build-your-own-couchdb $ cargo init sakkosekk
Created binary (application) package
~/g/build-your-own-couchdb $ cd sakkosekk/
~/g/b/sakkosekk $ cargo run
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 10.24s
Running `target/debug/sakkosekk`
Hello, world!
Adding SQLite
Bindings for SQLite are available through the rusqlite crate:
~/g/b/sakkosekk $ echo '[dependencies]' >> Cargo.toml
~/g/b/sakkosekk $ echo 'rusqlite = { version = "0.20", features = ["bundled"] }' >> Cargo.toml
The bundled feature is enabled for hassle-free sqlite3 linking.
Database schema
Documents in the database will have the columns:
- identifier,
- revision,
- hash and
- document data.
Open database:
```rust
use rusqlite::{named_params, Connection};

fn main() {
    let db = Connection::open("database.sqlite").expect("Unable to open 'database.sqlite'.");
```
Creating table:
```rust
    db.execute_batch(
        "create table documents (
            id text primary key not null,
            revision integer not null,
            hash blob not null,
            data text not null
        )",
    )
    .expect("Unable to create documents table.");
```
Inserting document:
```rust
    db.execute_named(
        "insert into documents (id, revision, hash, data)
         values (:id, :revision, :hash, :data)",
        named_params!(
            ":id": "asdf",
            ":revision": 0,
            ":hash": vec![0u8],
            ":data": r#"{ "a": 1, "b": 123 }"#
        ),
    )
    .expect("Unable to insert document.");
```
Reading document by the identifier:
```rust
    let data: String = db
        .query_row_named(
            "select data from documents where id=:id",
            named_params!(":id": "asdf"),
            |row| row.get(0),
        )
        .expect("Unable to get document with id 'asdf'");
    println!("data: {}", data);
}
```
Result:
~/g/b/sakkosekk $ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.31s
Running `target/debug/sakkosekk`
data: { "a": 1, "b": 123 }
Next up: Abstractions around creating the database schema, inserting documents and reading documents.
Abstract database access
In the previous chapter, we inserted and read documents directly with rusqlite. For cleaner code, let's abstract this away with a type and some methods.
Data fields
Rust provides structs to gather data fields:
```rust
struct Document {
    id: String,
    revision: i64,
    hash: Vec<u8>,
    data: String,
}
```
Document methods
Methods are implemented on the struct; they are more or less a copy of the previous main function:
```rust
impl Document {
    fn create_table(db: &Connection) -> Result<(), SqliteError> {
        db.execute_batch(
            "create table documents (
                id text primary key not null,
                revision integer not null,
                hash blob not null,
                data text not null
            )",
        )
    }

    fn insert(&self, db: &Connection) -> Result<usize, SqliteError> {
        db.execute_named(
            "insert into documents (id, revision, hash, data)
             values (:id, :revision, :hash, :data)",
            named_params!(
                ":id": &self.id,
                ":revision": self.revision,
                ":hash": &self.hash,
                ":data": &self.data,
            ),
        )
    }

    fn get_by_id(id: &str, db: &Connection) -> Result<Self, SqliteError> {
        db.query_row_named(
            "select id, revision, hash, data from documents where id=:id",
            named_params!(":id": id),
            Document::row_mapper,
        )
    }

    fn row_mapper(row: &Row) -> Result<Self, SqliteError> {
        Ok(Self {
            id: row.get(0)?,
            revision: row.get(1)?,
            hash: row.get(2)?,
            data: row.get(3)?,
        })
    }
}
```
`Row` and `SqliteError` are imported from rusqlite:
```rust
use rusqlite::{named_params, Connection, Error as SqliteError, Row};
```
Using the `Document` data type
The `main` function now reduces to:
```rust
fn main() {
    let db = Connection::open("database.sqlite").expect("Unable to open 'database.sqlite'.");
    Document::create_table(&db).expect("Unable to create documents table.");
    let document = Document {
        id: String::from("asdf"),
        revision: 0,
        hash: vec![0u8],
        data: String::from(r#"{ "a": 1, "b": 123 }"#),
    };
    document.insert(&db).expect("Unable to insert document.");
    let document_from_db =
        Document::get_by_id("asdf", &db).expect("Unable to get document with id 'asdf'");
    println!("data: {}", &document_from_db.data);
}
```
thread 'main' panicked
Running the code gives:
~/g/b/sakkosekk $ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.02s
Running `target/debug/sakkosekk`
thread 'main' panicked at 'Unable to create documents table.: SqliteFailure(Error { code: Unknown, extended_code: 1 }, Some("table documents already exists"))', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
~/g/b/sakkosekk [101] $
The code assumes an empty database and fails with exit code 101.
Removing the database before running works:
~/g/b/sakkosekk (master|✚2) [101] $ rm database.sqlite && cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.02s
Running `target/debug/sakkosekk`
data: { "a": 1, "b": 123 }
Next up: Fixing this error.
Create database only when missing
Currently, our application crashes when the database already exists. A helper function checks for existence and only creates the tables when the database file is missing:
```rust
fn get_db_create_if_missing(filename: &str) -> Connection {
    // Connection::open will create the file if missing, so check first.
    let exists = Path::new(filename).exists();
    let db = Connection::open(filename)
        .unwrap_or_else(|_| panic!("Unable to open database file {}", filename));
    if !exists {
        // create schema
        Document::create_table(&db).expect("Unable to create documents table.");
    }
    db
}
```
`Path` import:
```rust
use std::path::Path;
```
The `main` function simplifies to:
```rust
fn main() {
    let db = get_db_create_if_missing("database.sqlite");
    let document = Document {
        id: String::from("asdf"),
        revision: 0,
        hash: vec![0u8],
        data: String::from(r#"{ "a": 1, "b": 123 }"#),
    };
    document.insert(&db).expect("Unable to insert document.");
    let document_from_db =
        Document::get_by_id("asdf", &db).expect("Unable to get document with id 'asdf'");
    println!("data: {}", &document_from_db.data);
}
```
The application still crashes, but with a different error:
~/g/b/sakkosekk $ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/sakkosekk`
thread 'main' panicked at 'Unable to insert document.: SqliteFailure(Error { code: ConstraintViolation, extended_code: 1555 }, Some("UNIQUE constraint failed: documents.id"))', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Next up: Refactor the main function into tests.
Testing introduction
Cargo provides a test runner, `cargo test`, which runs functions annotated with `#[test]`. Let's create a test which checks that `get_db_create_if_missing` does not crash if called twice, src/tests.rs:
```rust
#[cfg(test)]
mod database {
    use crate::*;
    use std::fs::remove_file;

    #[test]
    fn creating_database_twice_should_not_fail() {
        get_db_create_if_missing("test.sqlite");
        get_db_create_if_missing("test.sqlite");
        remove_file("test.sqlite").unwrap();
    }
}
```
Here, `#[cfg(test)]` tells Rust that the module should only be compiled when compiling tests. `mod database` groups the database tests.
Running the test:
~/g/b/sakkosekk $ cargo test
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 0.73s
Running target/debug/deps/sakkosekk-5d572632b2e9bfcc
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running 0 tests? We need to add `mod tests` to src/main.rs, so that the `tests` module is found:
```rust
use rusqlite::{named_params, Connection, Error as SqliteError, Row};
use std::path::Path;

mod tests;

fn main() {
    let db = get_db_create_if_missing("database.sqlite");
    ...
```
Really run the test:
~/g/b/sakkosekk $ cargo test
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 1.85s
Running target/debug/deps/sakkosekk-5d572632b2e9bfcc
running 1 test
test tests::database::creating_database_twice_should_not_fail ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Insertion
Naively adding an insertion test:
```rust
#[cfg(test)]
mod database {
    use crate::*;
    use std::fs::remove_file;

    const TEST_DB_FILENAME: &str = "test.sqlite";

    #[test]
    fn creating_database_twice_should_not_fail() {
        get_db_create_if_missing(TEST_DB_FILENAME);
        get_db_create_if_missing(TEST_DB_FILENAME);
        clean_up();
    }

    #[test]
    fn insertion() {
        let db = get_db_create_if_missing(TEST_DB_FILENAME);
        let document = Document {
            id: String::from("asdf"),
            revision: 0,
            hash: vec![0u8],
            data: String::from(r#"{ "a": 1, "b": 123 }"#),
        };
        document.insert(&db).expect("Unable to insert document.");
        clean_up();
    }

    fn clean_up() {
        remove_file(TEST_DB_FILENAME).unwrap();
    }
}
```
This will fail:
~/g/b/sakkosekk (master|✚2…) $ cargo test
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 1.14s
Running target/debug/deps/sakkosekk-5d572632b2e9bfcc
running 2 tests
test tests::database::insertion ... ok
test tests::database::creating_database_twice_should_not_fail ... FAILED
failures:
---- tests::database::creating_database_twice_should_not_fail stdout ----
thread 'tests::database::creating_database_twice_should_not_fail' panicked at 'Unable to create documents table.: SqliteFailure(Error { code: Unknown, extended_code: 1 }, Some("table documents already exists"))', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
failures:
tests::database::creating_database_twice_should_not_fail
test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
error: test failed, to rerun pass '--bin sakkosekk'
The test `creating_database_twice_should_not_fail` fails because the tests run in parallel. We'll use a helper function `with` that:
- takes a filename and a test function,
- creates a database connection,
- runs the given test function with the created database connection,
- and removes the database file.
Using the function should look like:
```rust
with("filename.sqlite", |db| {
    // use db connection
});
// with will clean up / remove the database
```
`with` implementation:
```rust
fn with<F>(filename: &str, test: F)
where
    F: Fn(Connection) -> (),
{
    let db = get_db_create_if_missing(filename);
    test(db);
    remove_file(filename).unwrap();
}
```
Note: An alternative to `with` is to implement `Drop` for our own `Connection` type.
The tests rewritten to use `with`:
```rust
#[test]
fn creating_database_twice_should_not_fail() {
    with("creating_twice.sqlite", |_| {
        get_db_create_if_missing("creating_twice.sqlite");
    });
}

#[test]
fn insertion() {
    with("insertion.sqlite", |db| {
        let document = Document {
            id: String::from("asdf"),
            revision: 0,
            hash: vec![0u8],
            data: String::from(r#"{ "a": 1, "b": 123 }"#),
        };
        document.insert(&db).expect("Unable to insert document.");
    });
}
```
Note that the tests use different filenames for the database.
Running the tests does not fail:
~/g/b/sakkosekk $ cargo test
Finished dev [unoptimized + debuginfo] target(s) in 0.02s
Running target/debug/deps/sakkosekk-5d572632b2e9bfcc
running 2 tests
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::insertion ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Next up: Tests that are expected to fail, like inserting a document with the same identifier twice.
Tests that are expected to fail
We could make our tests panic and annotate them with `#[should_panic]`, but as our methods return `Result`, the `is_err` method lets us check that exactly the expected call failed.
Insertion failure
A double insertion of the same document should fail:
```rust
#[test]
fn double_insertion_should_fail() {
    with("double_insertion.sqlite", |db| {
        let document = Document {
            id: String::from("asdf"),
            revision: 0,
            hash: vec![0u8],
            data: String::from(r#"{ "a": 1, "b": 123 }"#),
        };
        document.insert(&db).expect("Unable to insert document.");
        let second_insert_result = document.insert(&db);
        assert!(second_insert_result.is_err());
    });
}
```
Actually, this test is wrong. In CouchDB a document can have many revisions, so the combination of `id`, `revision` and `hash` should be unique.
Insertion failure only when same revision
This fails as expected:
```rust
#[test]
fn insert_multiple_revisions() {
    with("insert_multiple_revisions.sqlite", |db| {
        let insert = |revision: i64| {
            let document = Document {
                id: String::from("asdf"),
                revision,
                hash: vec![0u8],
                data: String::from(r#"{ "a": 1, "b": 123 }"#),
            };
            document.insert(&db).expect("Unable to insert document.");
        };
        insert(0);
        insert(1);
    });
}
```
We'll fix it by removing `primary key` from `id`:
```rust
fn create_table(db: &Connection) -> Result<(), SqliteError> {
    db.execute_batch(
        "create table documents (
            id text not null,
            revision integer not null,
            hash blob not null,
            data text not null
        )",
    )
}
```
Running the tests again does not seem to have fixed `insert_multiple_revisions`, and now `double_insertion_should_fail` fails too:
running 4 tests
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::insert_multiple_revisions ... FAILED
test tests::database::double_insertion_should_fail ... FAILED
test tests::database::insertion ... ok
As the tests panicked, `remove_file` in `with` never runs. Fix it by removing the file before opening:
```rust
fn with<F>(filename: &str, test: F)
where
    F: Fn(Connection) -> (),
{
    remove_file(filename).unwrap_or(());
    let db = get_db_create_if_missing(filename);
    test(db);
    remove_file(filename).unwrap();
}
```
`unwrap_or(())` ignores any error from deleting the file.
Running tests again:
running 4 tests
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::double_insertion_should_fail ... FAILED
test tests::database::insert_multiple_revisions ... ok
test tests::database::insertion ... ok
`insert_multiple_revisions` is OK, but `double_insertion_should_fail` is still failing, as the `primary key` constraint was removed.
Adding a unique constraint over `id`, `revision` and `hash` fixes it:
```rust
fn create_table(db: &Connection) -> Result<(), SqliteError> {
    db.execute_batch(
        "create table documents (
            id text not null,
            revision integer not null,
            hash blob not null,
            data text not null,
            unique(id, revision, hash)
        );
        ",
    )
}
```
Get non-existent document
Getting a missing document should fail:
```rust
#[test]
fn get_by_missing_id_should_fail() {
    with("get_by_id_missing.sqlite", |db| {
        let result = Document::get_by_id("asdf", &db);
        assert!(result.is_err());
    });
}
```
Make sure all tests pass:
running 5 tests
test tests::database::get_by_missing_id_should_fail ... ok
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::double_insertion_should_fail ... ok
test tests::database::insert_multiple_revisions ... ok
test tests::database::insertion ... ok
Next up: Get by id should return all revisions of document.
Getting all revisions of document
In Persistent storage we naively used `id` to look up a single document, but a document can have multiple revisions. In other words, `Document::get_by_id` should return a list of documents.
Refactoring tests
First, the tests repeat themselves: currently `insertion`, `double_insertion_should_fail` and `insert_multiple_revisions` all repeat the declaration of a `document`.
Move document declaration into a function:
```rust
fn get_document(revision: i64) -> Document {
    Document {
        id: String::from("asdf"),
        revision,
        hash: vec![0u8],
        data: String::from(r#"{ "a": 1, "b": 123 }"#),
    }
}
```
Tests refactored to use `get_document`:
```rust
#[test]
fn insertion() {
    with("insertion.sqlite", |db| {
        get_document(0).insert(&db).expect("Unable to insert document.");
    });
}

#[test]
fn double_insertion_should_fail() {
    with("double_insertion.sqlite", |db| {
        get_document(0).insert(&db).expect("Unable to insert document.");
        let second_insert_result = get_document(0).insert(&db);
        assert!(second_insert_result.is_err());
    });
}

#[test]
fn insert_multiple_revisions() {
    with("insert_multiple_revisions.sqlite", |db| {
        get_document(0).insert(&db).expect("Unable to insert document.");
        get_document(1).insert(&db).expect("Unable to insert document.");
    });
}
```
Get document by identifier test
Now the test for `Document::get_by_id`, using `get_document`. The test should check that `Document::get_by_id` returns all documents:
```rust
#[test]
fn get_by_id() {
    with("get_by_id.sqlite", |db| {
        get_document(0)
            .insert(&db)
            .expect("Unable to insert document.");
        get_document(1)
            .insert(&db)
            .expect("Unable to insert document.");
        let documents_from_db = Document::get_by_id("asdf", &db);
        assert!(documents_from_db == Ok(vec![get_document(0), get_document(1)]));
    });
}
```
This fails to compile for two reasons. Let's tackle number one first: `Document` does not implement the `eq` method from the `PartialEq` trait.
Error message:
error[E0369]: binary operation `==` cannot be applied to type `std::result::Result<Document, rusqlite::error::Error>`
--> src/tests.rs:53:39
|
53 | assert!(documents_from_db == Ok([get_document(0), get_document(1)]));
| ----------------- ^^ -------------------------------------- std::result::Result<[Document; 2], _>
| |
| std::result::Result<Document, rusqlite::error::Error>
|
= note: an implementation of `std::cmp::PartialEq` might be missing for `std::result::Result<Document, rusqlite::error::Error>`
As all fields in `Document` implement `PartialEq`, we can derive `PartialEq`:
```rust
#[derive(PartialEq)]
struct Document {
    id: String,
    revision: i64,
    hash: Vec<u8>,
    data: String,
}
```
Now, `cargo test` yields the other error:
error[E0308]: mismatched types
--> src/tests.rs:53:45
|
53 | assert!(documents_from_db == Ok([get_document(0), get_document(1)]));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `Document`, found array of 2 elements
|
= note: expected type `Document`
found type `[Document; 2]`
`query_row_named` only returns a single row. `Statement` provides `query_map_named`, which yields more than one row, and `Connection::prepare` gives us a `Statement`.
`Document::get_by_id` rewritten to use a `Statement`:
```rust
fn get_by_id(id: &str, db: &Connection) -> Result<Vec<Self>, SqliteError> {
    db.prepare("select id, revision, hash, data from documents where id=:id")?
        .query_map_named(named_params!(":id": id), Document::row_mapper)?
        .collect()
}
```
`cargo test` gives a compilation error:
error[E0609]: no field `data` on type `std::vec::Vec<Document>`
--> src/main.rs:21:44
|
21 | println!("data: {}", &document_from_db.data);
| ^^^^ unknown field
`get_by_id` has a new return signature, and `Vec` does not have a `data` field. Fix it by removing the line.
Running the tests shows that `get_by_missing_id_should_fail` is failing:
running 5 tests
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::get_by_missing_id_should_fail ... FAILED
test tests::database::double_insertion_should_fail ... ok
test tests::database::insert_multiple_revisions ... ok
test tests::database::insertion ... ok
Fix the test by having it expect an empty vector:
```rust
#[test]
fn get_by_missing_id_should_give_no_results() {
    with("get_by_id_missing.sqlite", |db| {
        let documents = Document::get_by_id("asdf", &db).expect("Unable to get documents.");
        assert!(documents.is_empty());
    });
}
```
Understanding `get_by_id`
A lot happens in `get_by_id`. Looking at the types helps in understanding the code.
- `Result<T>` is here `rusqlite::Result<T>`, which is equivalent to `std::result::Result<T, rusqlite::Error>`.
- `prepare` returns `Result<Statement>`.
- The question mark `?` translates to "unwrap the result or exit early with the error". The first `?` unwraps the `Statement`.
- `query_map_named` returns `Result<MappedRows>`. The second `?` unwraps the `MappedRows`.
- `MappedRows` implements `IntoIterator`, giving us an iterator over `Result<Document>`.
- `Iterator::collect` uses the return signature, `Result<Vec<Self>, SqliteError>`, and unwraps the `Result<Document>`s one by one, exiting early if one of them fails.
Note on compiler warnings
You might have noticed compiler warnings like:
warning: unused variable: `db`
--> src/main.rs:7:9
|
7 | let db = get_db_create_if_missing("database.sqlite");
| ^^ help: consider prefixing with an underscore: `_db`
|
= note: #[warn(unused_variables)] on by default
We'll fix these warnings later when using the database interface in our actual application.
Next up: Benchmarking
Benchmarking
As you might have noticed, we did not add an index when removing the primary key. This may hurt performance when looking up entries by the `id` column.
The criterion crate gives some nice tools for statistics-driven benchmarking.
Splitting into library and binary
Criterion has some known limitations; one is that it cannot benchmark a binary crate. To overcome this limitation, we split our project into a library and a binary.
Move main.rs to lib.rs:
mv src/main.rs src/lib.rs
Remove the `main` function and make `get_db_create_if_missing`, `Document`, the `Document` fields and some of the `Document` methods public:
```rust
use rusqlite::{named_params, Connection, Error as SqliteError, Row};
use std::path::Path;

mod tests;

pub fn get_db_create_if_missing(filename: &str) -> Connection {
    ...
}

#[derive(PartialEq)]
pub struct Document {
    pub id: String,
    pub revision: i64,
    pub hash: Vec<u8>,
    pub data: String,
}

impl Document {
    ...
    pub fn insert(&self, db: &Connection) -> Result<usize, SqliteError> {
        ...
    }
    pub fn get_by_id(id: &str, db: &Connection) -> Result<Vec<Self>, SqliteError> {
        ...
    }
    ...
}
```
`...` marks omitted code that has not changed.
Create a new minimal `main` function in src/bin/sakkosekk.rs:
```rust
use sakkosekk::get_db_create_if_missing;

fn main() {
    let db = get_db_create_if_missing("database.sqlite");
    dbg!(db);
}
```
Check that both `cargo run` and `cargo test` work:
~/g/b/sakkosekk $ cargo run
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 11.00s
Running `target/debug/sakkosekk`
[src/bin/sakkosekk.rs:5] db = Connection {
path: Some(
"database.sqlite",
),
}
~/g/b/sakkosekk $ cargo test
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 2.38s
Running target/debug/deps/sakkosekk-c97044fdfdcd0074
running 6 tests
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::get_by_missing_id_should_give_no_results ... ok
test tests::database::double_insertion_should_fail ... ok
test tests::database::get_by_id ... ok
test tests::database::insertion ... ok
test tests::database::insert_multiple_revisions ... ok
test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/debug/deps/sakkosekk-7673159c8931cebe
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests sakkosekk
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Creating our first benchmark
Add criterion as a development dependency and define a benchmark in Cargo.toml:
[dev-dependencies]
criterion = "0.2"
[[bench]]
name = "database"
harness = false
`name = "database"` must match the filename, so create the benchmark as the file benches/database.rs:
```rust
use criterion::{criterion_group, criterion_main, Criterion};
use rusqlite::Connection;
use sakkosekk::{get_db_create_if_missing, Document};
use std::fs::remove_file;

criterion_group!(benches, benchmark);
criterion_main!(benches);

fn benchmark(c: &mut Criterion) {
    c.bench_function("get documents by id", |b| {
        let db = BenchDatabase::new("bench_get_documents_by_id.sqlite");
        let doc = Document::get_by_id("bench100", &db.connection).unwrap();
        assert!(doc.len() == 1);
        b.iter(|| Document::get_by_id("bench100", &db.connection))
    });
}

struct BenchDatabase {
    filename: &'static str,
    connection: Connection,
}

impl BenchDatabase {
    fn new(filename: &'static str) -> Self {
        remove_file(filename).unwrap_or(());
        let mut connection = get_db_create_if_missing(filename);
        let transaction = connection.transaction().unwrap();
        for i in 0..1000 {
            let document = Document {
                id: format!("bench{}", i),
                revision: 0,
                hash: vec![],
                data: format!(r#"{{ "number": {} }}"#, i),
            };
            document.insert(&transaction).unwrap();
        }
        transaction.commit().unwrap();
        Self {
            filename,
            connection,
        }
    }
}
```
`c.bench_function` takes a setup function whose last statement is `b.iter`. Criterion calls the closure given to `b.iter` in a tight loop and collects timing statistics.
Run it with `cargo bench`:
get documents by id time: [25.567 us 25.989 us 26.485 us]
Let's try adding an index on `id` and review the benchmark results again. Adding the index on `id` in lib.rs:
```rust
fn create_table(db: &Connection) -> Result<(), SqliteError> {
    db.execute_batch(
        "create table documents (
            id text not null,
            revision integer not null,
            hash blob not null,
            data text not null,
            unique(id, revision, hash)
        );
        create index documents_id_idx on documents(id);
        ",
    )
}
```
Run `cargo bench` to review the benchmark:
get documents by id time: [26.169 us 26.702 us 27.417 us]
change: [-0.4154% +2.4533% +5.1353%] (p = 0.10 > 0.05)
No change in performance detected.
No change in performance detected. Let's investigate with the SQLite REPL:
~/g/b/sakkosekk $ sqlite3 bench_get_documents_by_id.sqlite
SQLite version 3.24.0 2018-06-04 14:10:15
Enter ".help" for usage hints.
sqlite> EXPLAIN QUERY PLAN select * from documents where id='bench100';
QUERY PLAN
`--SEARCH TABLE documents USING INDEX documents_id_idx (id=?)
sqlite> drop index documents_id_idx;
sqlite> EXPLAIN QUERY PLAN select * from documents where id='bench100';
QUERY PLAN
`--SEARCH TABLE documents USING INDEX sqlite_autoindex_documents_1 (id=?)
sqlite>
SQLite actually creates an automatic index, so we see no performance gain from creating the index ourselves. A search in the SQLite query optimizer documentation reveals some details:
In SQLite version 3.8.0 (2013-08-26) and later, an SQLITE_WARNING_AUTOINDEX message is sent to the error log every time a statement is prepared that uses an automatic index. Application developers can and should use these warnings to identify the need for new persistent indices in the schema.
Following the recommendation, I'll keep the explicit index definition.
Enabling write-ahead logging (WAL)
A common performance recommendation for SQLite is to enable write-ahead logging.
Enable WAL mode:
```rust
...
    if !exists {
        // create schema
        Document::create_table(&db).expect("Unable to create documents table.");
        enable_write_ahead_logging(&db);
    }
    db
}

fn enable_write_ahead_logging(db: &Connection) {
    // PRAGMA journal_mode=wal;
    let result: String = db
        .pragma_update_and_check(None, "journal_mode", &"wal", |row| row.get(0))
        .unwrap();
    assert!("wal" == &result);
}
```
As before, we use cargo bench
to assess performance:
get documents by id time: [13.426 us 13.583 us 13.759 us]
change: [-51.227% -49.928% -48.677%] (p = 0.00 < 0.05)
Performance has improved.
A 2x speedup! 14 microseconds gives a decent upper limit for fetches: 1 s / 14 µs ≈ 70k fetches per second on this hardware (a 2013 MacBook Air with a 250 GB SSD). I'm happy with that, so let's continue with adding an HTTP layer in front of the disk persistence.
Next up: Adding HTTP.