Build your own CouchDB
Hi! I'm Arve and this is my adventure into building my own modern CouchDB with Rust.
What is CouchDB?
CouchDB is a document-oriented database written in Erlang, initially released in 2005. It has a simple replication protocol over HTTP, using revisions to ensure eventual consistency.
Why am I building my own?
CouchDB leaves me longing for better interoperability with modern browsers. Specifically, I want "real time" replication to IndexedDB, which is unpleasant with regular CouchDB. The unpleasantness is mainly due to the revision mechanism, which is fairly Erlang-specific: revision hashes are calculated over serialized Erlang data structures with MD5, neither of which is native to browsers. It is of course possible to achieve the revision calculation with some extra libraries. Still, I think it will be a fun challenge to implement a modern CouchDB variant in the build-your-own-X tradition.
Without further ado, let's start the journey and look at the objectives.
Objectives
Decisions made should reflect the main goals:
Goals
- Easy interoperability with other programming environments.
- Efficient and simple syncing.
- Use browser native protocols / APIs.
- Availability and Partition tolerance of CAP.
To help finish the project, some non-goals will restrict scope and complexity:
Non goals
- Compatibility with CouchDB.
- Writing low level code.
- Extensive server-side logic, like index lookups and design documents.
- Consistency of CAP.
Persistent storage
Let's start with designing the disk storage format. The non-goal Writing low level code steers us away from designing a file format and using direct file access. A good alternative is SQLite. Let's set it up first.
Bootstrapping the project
I'll call the project sakkosekk, which is Norwegian for bean bag chair.
Bootstrapping with Cargo:
~/g/build-your-own-couchdb $ cargo init sakkosekk
Created binary (application) package
~/g/build-your-own-couchdb $ cd sakkosekk/
~/g/b/sakkosekk $ cargo run
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 10.24s
Running `target/debug/sakkosekk`
Hello, world!
Adding SQLite
Bindings for SQLite are available through the rusqlite crate:
~/g/b/sakkosekk $ echo '[dependencies]' >> Cargo.toml
~/g/b/sakkosekk $ echo 'rusqlite = { version = "0.20", features = ["bundled"] }' >> Cargo.toml
The bundled feature is enabled for hassle-free sqlite3 linking.
Database schema
Documents in the database will have the columns:
- identifier,
- revision,
- hash and
- document data.
Open database:
```rust
use rusqlite::{named_params, Connection};

fn main() {
    let db = Connection::open("database.sqlite").expect("Unable to open 'database.sqlite'.");
```
Creating table:
```rust
    db.execute_batch(
        "create table documents (
            id text primary key not null,
            revision integer not null,
            hash blob not null,
            data text not null
        )",
    )
    .expect("Unable to create documents table.");
```
Inserting document:
```rust
    db.execute_named(
        "insert into documents (id, revision, hash, data)
         values (:id, :revision, :hash, :data)",
        named_params!(
            ":id": "asdf",
            ":revision": 0,
            ":hash": vec![0u8],
            ":data": r#"{ "a": 1, "b": 123 }"#
        ),
    )
    .expect("Unable to insert document.");
```
Reading document by the identifier:
```rust
    let data: String = db
        .query_row_named(
            "select data from documents where id=:id",
            named_params!(":id": "asdf"),
            |row| row.get(0),
        )
        .expect("Unable to get document with id 'asdf'");
    println!("data: {}", data);
}
```
Result:
~/g/b/sakkosekk $ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.31s
Running `target/debug/sakkosekk`
data: { "a": 1, "b": 123 }
Next up: Abstractions around creating the database schema, inserting documents and reading documents.
Abstract database access
In the previous chapter, we inserted and read documents directly with rusqlite. For cleaner code, let's abstract this away with a type and some methods.
Data fields
Rust provides structs to gather data fields:
```rust
struct Document {
    id: String,
    revision: i64,
    hash: Vec<u8>,
    data: String,
}
```
Document methods
Methods are implemented on the struct; they are more or less a copy of the previous main function:
```rust
impl Document {
    fn create_table(db: &Connection) -> Result<(), SqliteError> {
        db.execute_batch(
            "create table documents (
                id text primary key not null,
                revision integer not null,
                hash blob not null,
                data text not null
            )",
        )
    }

    fn insert(&self, db: &Connection) -> Result<usize, SqliteError> {
        db.execute_named(
            "insert into documents (id, revision, hash, data)
             values (:id, :revision, :hash, :data)",
            named_params!(
                ":id": &self.id,
                ":revision": self.revision,
                ":hash": &self.hash,
                ":data": &self.data,
            ),
        )
    }

    fn get_by_id(id: &str, db: &Connection) -> Result<Self, SqliteError> {
        db.query_row_named(
            "select id, revision, hash, data from documents where id=:id",
            named_params!(":id": id),
            Document::row_mapper,
        )
    }

    fn row_mapper(row: &Row) -> Result<Self, SqliteError> {
        Ok(Self {
            id: row.get(0)?,
            revision: row.get(1)?,
            hash: row.get(2)?,
            data: row.get(3)?,
        })
    }
}
```
`Row` and `SqliteError` are imported from rusqlite:
```rust
use rusqlite::{named_params, Connection, Error as SqliteError, Row};
```
Using the `Document` data type
The `main` function now reduces to:
```rust
fn main() {
    let db = Connection::open("database.sqlite").expect("Unable to open 'database.sqlite'.");
    Document::create_table(&db).expect("Unable to create documents table.");
    let document = Document {
        id: String::from("asdf"),
        revision: 0,
        hash: vec![0u8],
        data: String::from(r#"{ "a": 1, "b": 123 }"#),
    };
    document.insert(&db).expect("Unable to insert document.");
    let document_from_db =
        Document::get_by_id("asdf", &db).expect("Unable to get document with id 'asdf'");
    println!("data: {}", &document_from_db.data);
}
```
thread 'main' panicked
Running the code gives:
~/g/b/sakkosekk $ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.02s
Running `target/debug/sakkosekk`
thread 'main' panicked at 'Unable to create documents table.: SqliteFailure(Error { code: Unknown, extended_code: 1 }, Some("table documents already exists"))', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
~/g/b/sakkosekk [101] $
The code assumes an empty database and fails with exit code 101.
Removing the database before running works:
~/g/b/sakkosekk (master|✚2) [101] $ rm database.sqlite && cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.02s
Running `target/debug/sakkosekk`
data: { "a": 1, "b": 123 }
Next up: Fixing this error.
Create database only when missing
Currently, our application crashes when the database already exists. A helper function checks for existence and only creates the tables when the database file is missing:
```rust
fn get_db_create_if_missing(filename: &str) -> Connection {
    // Connection::open will create the file if missing, so check first.
    let exists = Path::new(filename).exists();
    let db = Connection::open(filename)
        .unwrap_or_else(|_| panic!("Unable to open database file {}", filename));
    if !exists {
        // create schema
        Document::create_table(&db).expect("Unable to create documents table.");
    }
    db
}
```
`Path` import:
```rust
use std::path::Path;
```
The `main` function simplifies to:
```rust
fn main() {
    let db = get_db_create_if_missing("database.sqlite");
    let document = Document {
        id: String::from("asdf"),
        revision: 0,
        hash: vec![0u8],
        data: String::from(r#"{ "a": 1, "b": 123 }"#),
    };
    document.insert(&db).expect("Unable to insert document.");
    let document_from_db =
        Document::get_by_id("asdf", &db).expect("Unable to get document with id 'asdf'");
    println!("data: {}", &document_from_db.data);
}
```
The application still crashes, but with a different error:
~/g/b/sakkosekk $ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/sakkosekk`
thread 'main' panicked at 'Unable to insert document.: SqliteFailure(Error { code: ConstraintViolation, extended_code: 1555 }, Some("UNIQUE constraint failed: documents.id"))', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Next up: Refactor the main function into tests.
Testing introduction
Cargo provides a test runner, `cargo test`, which runs functions annotated with `#[test]`. Let's create a test which checks that `get_db_create_if_missing` does not crash if called twice, src/tests.rs:
```rust
#[cfg(test)]
mod database {
    use crate::*;
    use std::fs::remove_file;

    #[test]
    fn creating_database_twice_should_not_fail() {
        get_db_create_if_missing("test.sqlite");
        get_db_create_if_missing("test.sqlite");
        remove_file("test.sqlite").unwrap();
    }
}
```
Here, `#[cfg(test)]` tells Rust that the module should only be compiled when compiling tests. `mod database` groups the database tests.
Running the test:
~/g/b/sakkosekk $ cargo test
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 0.73s
Running target/debug/deps/sakkosekk-5d572632b2e9bfcc
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running 0 tests? We need to add `mod tests` to src/main.rs, so that the `tests` module is found:
```rust
use rusqlite::{named_params, Connection, Error as SqliteError, Row};
use std::path::Path;

mod tests;

fn main() {
    let db = get_db_create_if_missing("database.sqlite");
    ...
```
Really run the test:
~/g/b/sakkosekk $ cargo test
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 1.85s
Running target/debug/deps/sakkosekk-5d572632b2e9bfcc
running 1 test
test tests::database::creating_database_twice_should_not_fail ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Insertion
Naively adding an insertion test:
```rust
#[cfg(test)]
mod database {
    use crate::*;
    use std::fs::remove_file;

    const TEST_DB_FILENAME: &str = "test.sqlite";

    #[test]
    fn creating_database_twice_should_not_fail() {
        get_db_create_if_missing(TEST_DB_FILENAME);
        get_db_create_if_missing(TEST_DB_FILENAME);
        clean_up();
    }

    #[test]
    fn insertion() {
        let db = get_db_create_if_missing(TEST_DB_FILENAME);
        let document = Document {
            id: String::from("asdf"),
            revision: 0,
            hash: vec![0u8],
            data: String::from(r#"{ "a": 1, "b": 123 }"#),
        };
        document.insert(&db).expect("Unable to insert document.");
        clean_up();
    }

    fn clean_up() {
        remove_file(TEST_DB_FILENAME).unwrap();
    }
}
```
This will fail:
~/g/b/sakkosekk (master|✚2…) $ cargo test
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 1.14s
Running target/debug/deps/sakkosekk-5d572632b2e9bfcc
running 2 tests
test tests::database::insertion ... ok
test tests::database::creating_database_twice_should_not_fail ... FAILED
failures:
---- tests::database::creating_database_twice_should_not_fail stdout ----
thread 'tests::database::creating_database_twice_should_not_fail' panicked at 'Unable to create documents table.: SqliteFailure(Error { code: Unknown, extended_code: 1 }, Some("table documents already exists"))', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
failures:
tests::database::creating_database_twice_should_not_fail
test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
error: test failed, to rerun pass '--bin sakkosekk'
The test `creating_database_twice_should_not_fail` fails because the tests run in parallel. We'll use a helper function `with` that:
- takes a filename and a test function,
- creates a database connection,
- runs the given test function with the created database connection,
- and removes the database file.
Using the function should look like:
```rust
with("filename.sqlite", |db| {
    // use db connection
});
// with will clean up / remove the database
```
`with` implementation:
```rust
fn with<F>(filename: &str, test: F)
where
    F: Fn(Connection) -> (),
{
    let db = get_db_create_if_missing(filename);
    test(db);
    remove_file(filename).unwrap();
}
```
Note: An alternative to `with` is to implement `Drop` for our own `Connection` type.
The tests rewritten to use `with`:
```rust
#[test]
fn creating_database_twice_should_not_fail() {
    with("creating_twice.sqlite", |_| {
        get_db_create_if_missing("creating_twice.sqlite");
    });
}

#[test]
fn insertion() {
    with("insertion.sqlite", |db| {
        let document = Document {
            id: String::from("asdf"),
            revision: 0,
            hash: vec![0u8],
            data: String::from(r#"{ "a": 1, "b": 123 }"#),
        };
        document.insert(&db).expect("Unable to insert document.");
    });
}
```
Note that the tests use different filenames for the database.
Running the tests does not fail:
~/g/b/sakkosekk $ cargo test
Finished dev [unoptimized + debuginfo] target(s) in 0.02s
Running target/debug/deps/sakkosekk-5d572632b2e9bfcc
running 2 tests
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::insertion ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Next up: Tests that are expected to fail, like inserting a document with the same identifier twice.
Tests that are expected to fail
We could make our tests panic and annotate them with `#[should_panic]`, but as our methods return `Result`, the `is_err` method lets us check that exactly the expected call failed.
Insertion failure
A double insertion of the same document should fail:
```rust
#[test]
fn double_insertion_should_fail() {
    with("double_insertion.sqlite", |db| {
        let document = Document {
            id: String::from("asdf"),
            revision: 0,
            hash: vec![0u8],
            data: String::from(r#"{ "a": 1, "b": 123 }"#),
        };
        document.insert(&db).expect("Unable to insert document.");
        let second_insert_result = document.insert(&db);
        assert!(second_insert_result.is_err());
    });
}
```
Actually, this test is wrong. In CouchDB a document can have many revisions, so the combination of `id`, `revision` and `hash` should be unique.
Insertion failure only when same revision
This fails as expected:
```rust
#[test]
fn insert_multiple_revisions() {
    with("insert_multiple_revisions.sqlite", |db| {
        let insert = |revision: i64| {
            let document = Document {
                id: String::from("asdf"),
                revision,
                hash: vec![0u8],
                data: String::from(r#"{ "a": 1, "b": 123 }"#),
            };
            document.insert(&db).expect("Unable to insert document.");
        };
        insert(0);
        insert(1);
    });
}
```
We'll fix it by removing `primary key` from `id`:
```rust
fn create_table(db: &Connection) -> Result<(), SqliteError> {
    db.execute_batch(
        "create table documents (
            id text not null,
            revision integer not null,
            hash blob not null,
            data text not null
        )",
    )
}
```
Running the tests again does not seem to have fixed `insert_multiple_revisions`, and now `double_insertion_should_fail` fails too:
running 4 tests
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::insert_multiple_revisions ... FAILED
test tests::database::double_insertion_should_fail ... FAILED
test tests::database::insertion ... ok
As the tests panicked, `remove_file` in `with` never runs. Fix it by removing the file before opening:
```rust
fn with<F>(filename: &str, test: F)
where
    F: Fn(Connection) -> (),
{
    remove_file(filename).unwrap_or(());
    let db = get_db_create_if_missing(filename);
    test(db);
    remove_file(filename).unwrap();
}
```
`unwrap_or(())` ignores any error from deleting the file.
Running tests again:
running 4 tests
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::double_insertion_should_fail ... FAILED
test tests::database::insert_multiple_revisions ... ok
test tests::database::insertion ... ok
`insert_multiple_revisions` is OK, but `double_insertion_should_fail` is still failing, as the `primary key` constraint was removed.
Adding a unique constraint over `id`, `revision` and `hash` fixes it:
```rust
fn create_table(db: &Connection) -> Result<(), SqliteError> {
    db.execute_batch(
        "create table documents (
            id text not null,
            revision integer not null,
            hash blob not null,
            data text not null,
            unique(id, revision, hash)
        );
        ",
    )
}
```
Get non-existent document
Getting a missing document should fail:
```rust
#[test]
fn get_by_missing_id_should_fail() {
    with("get_by_id_missing.sqlite", |db| {
        let result = Document::get_by_id("asdf", &db);
        assert!(result.is_err());
    });
}
```
Make sure all tests pass:
running 5 tests
test tests::database::get_by_missing_id_should_fail ... ok
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::double_insertion_should_fail ... ok
test tests::database::insert_multiple_revisions ... ok
test tests::database::insertion ... ok
Next up: Get by id should return all revisions of document.
Getting all revisions of document
In Persistent storage we naively used `id` to look up a single document, but a document can have multiple revisions. In other words, `Document::get_by_id` should return a list of documents.
Refactoring tests
First, the tests repeat themselves: currently `insertion`, `double_insertion_should_fail` and `insert_multiple_revisions` all repeat the declaration of a `document`.
Move document declaration into a function:
```rust
fn get_document(revision: i64) -> Document {
    Document {
        id: String::from("asdf"),
        revision,
        hash: vec![0u8],
        data: String::from(r#"{ "a": 1, "b": 123 }"#),
    }
}
```
Tests refactored to use `get_document`:
```rust
#[test]
fn insertion() {
    with("insertion.sqlite", |db| {
        get_document(0).insert(&db).expect("Unable to insert document.");
    });
}

#[test]
fn double_insertion_should_fail() {
    with("double_insertion.sqlite", |db| {
        get_document(0).insert(&db).expect("Unable to insert document.");
        let second_insert_result = get_document(0).insert(&db);
        assert!(second_insert_result.is_err());
    });
}

#[test]
fn insert_multiple_revisions() {
    with("insert_multiple_revisions.sqlite", |db| {
        get_document(0).insert(&db).expect("Unable to insert document.");
        get_document(1).insert(&db).expect("Unable to insert document.");
    });
}
```
Get document by identifier test
Now the test for `Document::get_by_id`, using `get_document`. The test should check that `Document::get_by_id` returns all documents:
```rust
#[test]
fn get_by_id() {
    with("get_by_id.sqlite", |db| {
        get_document(0)
            .insert(&db)
            .expect("Unable to insert document.");
        get_document(1)
            .insert(&db)
            .expect("Unable to insert document.");
        let documents_from_db = Document::get_by_id("asdf", &db);
        assert!(documents_from_db == Ok(vec![get_document(0), get_document(1)]));
    });
}
```
This fails to compile for two reasons. Let's tackle number one first: `Document` does not implement the `eq` method from the `PartialEq` trait.
Error message:
error[E0369]: binary operation `==` cannot be applied to type `std::result::Result<Document, rusqlite::error::Error>`
--> src/tests.rs:53:39
|
53 | assert!(documents_from_db == Ok([get_document(0), get_document(1)]));
| ----------------- ^^ -------------------------------------- std::result::Result<[Document; 2], _>
| |
| std::result::Result<Document, rusqlite::error::Error>
|
= note: an implementation of `std::cmp::PartialEq` might be missing for `std::result::Result<Document, rusqlite::error::Error>`
As all fields in `Document` implement `PartialEq`, we can derive `PartialEq`:
```rust
#[derive(PartialEq)]
struct Document {
    id: String,
    revision: i64,
    hash: Vec<u8>,
    data: String,
}
```
Now, `cargo test` yields the other error:
error[E0308]: mismatched types
--> src/tests.rs:53:45
|
53 | assert!(documents_from_db == Ok([get_document(0), get_document(1)]));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `Document`, found array of 2 elements
|
= note: expected type `Document`
found type `[Document; 2]`
`query_row_named` only returns a single row. `Statement` provides `query_map_named`, which yields more than one row, and `Connection::prepare` gives us a `Statement`.
`Document::get_by_id` rewritten to use a `Statement`:
```rust
fn get_by_id(id: &str, db: &Connection) -> Result<Vec<Self>, SqliteError> {
    db.prepare("select id, revision, hash, data from documents where id=:id")?
        .query_map_named(named_params!(":id": id), Document::row_mapper)?
        .collect()
}
```
`cargo test` gives a compilation error:
error[E0609]: no field `data` on type `std::vec::Vec<Document>`
--> src/main.rs:21:44
|
21 | println!("data: {}", &document_from_db.data);
| ^^^^ unknown field
`get_by_id` has a new return signature, and `Vec` does not have a `data` field. Fix it by removing the line.
Running the tests shows that `get_by_missing_id_should_fail` is failing:
running 5 tests
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::get_by_missing_id_should_fail ... FAILED
test tests::database::double_insertion_should_fail ... ok
test tests::database::insert_multiple_revisions ... ok
test tests::database::insertion ... ok
Fix the test by having it expect an empty vector:
```rust
#[test]
fn get_by_missing_id_should_give_no_results() {
    with("get_by_id_missing.sqlite", |db| {
        let documents = Document::get_by_id("asdf", &db).expect("Unable to get documents.");
        assert!(documents.is_empty());
    });
}
```
Understanding `get_by_id`
A lot happens in `get_by_id`. Looking at the types helps in understanding the code.
- `Result<T>` is here `rusqlite::Result<T>`, which is equivalent to `std::result::Result<T, rusqlite::Error>`.
- `prepare` returns `Result<Statement>`.
- The question mark `?` translates to "unwrap the result or exit early with the error". The first `?` unwraps the `Statement`.
- `query_map_named` returns `Result<MappedRows>`. The second `?` unwraps the `MappedRows`.
- `MappedRows` implements `IntoIterator`, giving us an iterator over `Result<Document>`.
- `Iterator::collect` uses the return signature, `Result<Vec<Self>, SqliteError>`, and unwraps the `Result<Document>`s one by one, exiting early if one of them fails.
Note on compiler warnings
You might have noticed compiler warnings like:
warning: unused variable: `db`
--> src/main.rs:7:9
|
7 | let db = get_db_create_if_missing("database.sqlite");
| ^^ help: consider prefixing with an underscore: `_db`
|
= note: #[warn(unused_variables)] on by default
We'll fix these warnings later when using the database interface in our actual application.
Next up: Benchmarking
Benchmarking
As you might have noticed, we did not add an index when removing the primary key. This may hurt performance when looking up entries by the `id` column.
The criterion crate gives some nice tools for statistics-driven benchmarking.
Splitting into library and binary
Criterion has some known limitations; one is that it cannot benchmark a binary crate. To overcome this limitation, we split our project into a library and a binary.
Move main.rs to lib.rs:
mv src/main.rs src/lib.rs
Remove the `main` function and make `get_db_create_if_missing`, `Document`, the `Document` fields and some of the `Document` methods public:
```rust
use rusqlite::{named_params, Connection, Error as SqliteError, Row};
use std::path::Path;

mod tests;

pub fn get_db_create_if_missing(filename: &str) -> Connection {
    ...
}

#[derive(PartialEq)]
pub struct Document {
    pub id: String,
    pub revision: i64,
    pub hash: Vec<u8>,
    pub data: String,
}

impl Document {
    ...
    pub fn insert(&self, db: &Connection) -> Result<usize, SqliteError> {
        ...
    }
    pub fn get_by_id(id: &str, db: &Connection) -> Result<Vec<Self>, SqliteError> {
        ...
    }
    ...
}
```
`...` marks omitted code that has not changed.
Create a new minimal `main` function in src/bin/sakkosekk.rs:
```rust
use sakkosekk::get_db_create_if_missing;

fn main() {
    let db = get_db_create_if_missing("database.sqlite");
    dbg!(db);
}
```
Check that both `cargo run` and `cargo test` work:
~/g/b/sakkosekk $ cargo run
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 11.00s
Running `target/debug/sakkosekk`
[src/bin/sakkosekk.rs:5] db = Connection {
path: Some(
"database.sqlite",
),
}
~/g/b/sakkosekk $ cargo test
Compiling sakkosekk v0.1.0 (/Users/arve/git/build-your-own-couchdb/sakkosekk)
Finished dev [unoptimized + debuginfo] target(s) in 2.38s
Running target/debug/deps/sakkosekk-c97044fdfdcd0074
running 6 tests
test tests::database::creating_database_twice_should_not_fail ... ok
test tests::database::get_by_missing_id_should_give_no_results ... ok
test tests::database::double_insertion_should_fail ... ok
test tests::database::get_by_id ... ok
test tests::database::insertion ... ok
test tests::database::insert_multiple_revisions ... ok
test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/debug/deps/sakkosekk-7673159c8931cebe
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests sakkosekk
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Creating our first benchmark
Add criterion as a development dependency and define a benchmark in Cargo.toml:
[dev-dependencies]
criterion = "0.2"
[[bench]]
name = "database"
harness = false
`name = "database"` must match the filename, so create the benchmark as the file benches/database.rs:
```rust
use criterion::{criterion_group, criterion_main, Criterion};
use rusqlite::Connection;
use sakkosekk::{get_db_create_if_missing, Document};
use std::fs::remove_file;

criterion_group!(benches, benchmark);
criterion_main!(benches);

fn benchmark(c: &mut Criterion) {
    c.bench_function("get documents by id", |b| {
        let db = BenchDatabase::new("bench_get_documents_by_id.sqlite");
        let doc = Document::get_by_id("bench100", &db.connection).unwrap();
        assert!(doc.len() == 1);
        b.iter(|| Document::get_by_id("bench100", &db.connection))
    });
}

struct BenchDatabase {
    filename: &'static str,
    connection: Connection,
}

impl BenchDatabase {
    fn new(filename: &'static str) -> Self {
        remove_file(filename).unwrap_or(());
        let mut connection = get_db_create_if_missing(filename);
        let transaction = connection.transaction().unwrap();
        for i in 0..1000 {
            let document = Document {
                id: format!("bench{}", i),
                revision: 0,
                hash: vec![],
                data: format!(r#"{{ "number": {} }}"#, i),
            };
            document.insert(&transaction).unwrap();
        }
        transaction.commit().unwrap();
        Self {
            filename,
            connection,
        }
    }
}
```
`c.bench_function` takes a setup function whose last statement is `b.iter`. Criterion calls the closure given to `b.iter` in a tight loop and collects timing statistics.
Run it with `cargo bench`:
get documents by id time: [25.567 us 25.989 us 26.485 us]
Let's try adding an index on `id` and review the benchmark results again. Adding the index on `id` in lib.rs:
```rust
fn create_table(db: &Connection) -> Result<(), SqliteError> {
    db.execute_batch(
        "create table documents (
            id text not null,
            revision integer not null,
            hash blob not null,
            data text not null,
            unique(id, revision, hash)
        );
        create index documents_id_idx on documents(id);
        ",
    )
}
```
Run `cargo bench` to review the benchmark:
get documents by id time: [26.169 us 26.702 us 27.417 us]
change: [-0.4154% +2.4533% +5.1353%] (p = 0.10 > 0.05)
No change in performance detected.
No change in performance detected. Let's investigate with the SQLite REPL:
~/g/b/sakkosekk $ sqlite3 bench_get_documents_by_id.sqlite
SQLite version 3.24.0 2018-06-04 14:10:15
Enter ".help" for usage hints.
sqlite> EXPLAIN QUERY PLAN select * from documents where id='bench100';
QUERY PLAN
`--SEARCH TABLE documents USING INDEX documents_id_idx (id=?)
sqlite> drop index documents_id_idx;
sqlite> EXPLAIN QUERY PLAN select * from documents where id='bench100';
QUERY PLAN
`--SEARCH TABLE documents USING INDEX sqlite_autoindex_documents_1 (id=?)
sqlite>
SQLite actually creates an automatic index, so we see no performance gain from creating the index ourselves. A search in the SQLite query optimizer documentation reveals some details:
In SQLite version 3.8.0 (2013-08-26) and later, an SQLITE_WARNING_AUTOINDEX message is sent to the error log every time a statement is prepared that uses an automatic index. Application developers can and should use these warnings to identify the need for new persistent indices in the schema.
Following the recommendation, I'll keep the explicit index definition.
Enabling write-ahead logging (WAL)
A common performance recommendation for SQLite is to enable write-ahead logging.
Enable WAL mode:
```rust
...
    if !exists {
        // create schema
        Document::create_table(&db).expect("Unable to create documents table.");
        enable_write_ahead_logging(&db);
    }
    db
}

fn enable_write_ahead_logging(db: &Connection) {
    // PRAGMA journal_mode=wal;
    let result: String = db
        .pragma_update_and_check(None, "journal_mode", &"wal", |row| row.get(0))
        .unwrap();
    assert!("wal" == &result);
}
```
As before, we use cargo bench
to assess performance:
get documents by id time: [13.426 us 13.583 us 13.759 us]
change: [-51.227% -49.928% -48.677%] (p = 0.00 < 0.05)
Performance has improved.
A 2x speedup! 14 microseconds gives a decent upper limit for fetches: 1 s / 14 µs ≈ 70k fetches per second on this hardware (a 2013 MacBook Air with a 250 GB SSD). I'm happy with that, so let's continue with adding an HTTP layer in front of the disk persistence.
Next up: Adding HTTP.