
Add a data_version to the db_write hook and implement optimistic locking #3358

Merged: 6 commits merged into ElementsProject:master on Jan 2, 2020

Conversation

@cdecker (Member) commented Dec 18, 2019

Currently the db_write hook is a simple sink to which we write, but a plugin
has no way of telling whether there were any gaps or whether the lightningd
database lost some updates. This PR introduces a numeric data_version that
is incremented on any dirty transaction, i.e., a DB transaction that
performed some changes on the underlying database.

The data_version is guaranteed to change between successive modifications;
however, it is not monotonic (i.e., it can wrap around at 2**32). Its goal is
to assert to a plugin that no changes to the DB have been performed by
c-lightning between successive calls to db_write. The data_version persists
across restarts, so a plugin can check whether it matches a backup after
startup.
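
On the plugin side this enables a cheap continuity check between successive
hook calls. A minimal sketch, assuming the plugin parses the payload's
data_version into a u32, that every dirty transaction bumps it by exactly
one, and with a made-up helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical plugin-side check: every dirty transaction bumps
 * data_version by one, so each hook call must carry the direct
 * successor of the value we saw last. Unsigned arithmetic handles
 * the wrap-around at 2**32 for free. */
static bool no_gap(uint32_t last_seen, uint32_t current)
{
	return current == (uint32_t)(last_seen + 1);
}
```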

In addition this PR implements an optimistic locking strategy. The
data_version is incremented only if it still matches the value we have been
tracking since startup. If it doesn't match we abort(), since there might be
another instance writing to the database. This is not really important for
sqlite3, which is already protected through the use of the pid file, but in
postgres we may not have such an alternative protection mechanism against
concurrent instances.

It is not meant as a graceful shutdown, but rather a last resort protection. I
expect that some other locking mechanism is implemented in a wrapper for
c-lightning that'd start the instance only after acquiring ownership. As a
standalone though it should be sufficient to protect against concurrent
instances by serializing all write operations, and abort()ing the loser of
the write race.
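
A sketch of what such a gated increment could look like. This is not the
literal c-lightning code: db_exec_count_changes is a stand-in for the
daemon's statement helpers, and the 'data_version' key name is an
assumption; the review discussion below only confirms that the counter
lives in the intval column of the vars table.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the daemon's statement helpers: runs the SQL with
 * `expected` bound to the placeholder and returns the number of rows
 * the statement changed. */
extern int db_exec_count_changes(const char *sql, uint32_t expected);

/* Bump the persisted counter, but only if nobody moved it since we
 * last read it. A concurrent writer makes the WHERE clause fail, the
 * update touches zero rows, and we abort() as a last resort. */
static void data_version_incr(uint32_t *in_memory)
{
	int changed = db_exec_count_changes(
	    "UPDATE vars SET intval = intval + 1"
	    " WHERE name = 'data_version' AND intval = ?",
	    *in_memory);

	if (changed != 1)
		abort(); /* another instance is writing to our DB */

	/* Keep the in-memory copy in lockstep with the DB. */
	(*in_memory)++;
}
```

A real implementation also has to let the counter wrap around at 2**32,
which this sketch glosses over.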

As a minor detail: given the order in which the increment and the reporting
are currently done, the plugin sees the version number as it will be after
the transaction is applied. That doesn't really matter, as long as the
values differ between calls.

The lock provides linearizability of DB modifications: if a database is
changed under the feet of a running process that process will abort(), which
from a global point of view is as if it had crashed right after the last
successful commit. Any process that also changed the DB must've started
between the last successful commit and the unsuccessful one since otherwise
its counters would not have matched (which would also have aborted that
transaction). So this reduces all the possible timelines to an equivalent
where the first process died, and the second process recovered from the DB.

@cdecker added this to the 0.8.1 milestone Dec 18, 2019
@cdecker self-assigned this Dec 18, 2019
@cdecker (Member, Author) commented Dec 18, 2019

Still checking rebaseability, will un-draft asap :-)

@ZmnSCPxj (Collaborator) commented:

When considering as well the possibility of chaining db_write hooks, perhaps we should consider a two-stage commit as well:

  • db_write hook, where plugins save the queries in some log, and return with "result" : "continue" if they are able to save the queries or "result" : "abort" if they fail.
  • db_commit notification, where if all the plugins hooked on db_write returned "result" : "continue", this notification is broadcast so that plugins can advance the state.

We may not have hook chaining yet, but it seems good to prepare the interface at this point as well, so that plugin authors for remote backup in particular can start adapting to the new interface; later we can support hook chaining and thereby allow multiple backup plugins to hook db_write and stop the daemon if any backup is having trouble.
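
A rough sketch of the first stage of that proposal from a plugin's point of
view. Only the db_write/db_commit names and the "continue"/"abort" results
come from the proposal above; the handler shape and the query log are
hypothetical:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical first stage: durably log the queries carried by a
 * db_write call before lightningd commits them, then report back.
 * The returned string would go into the hook response's "result". */
static const char *on_db_write(FILE *log, const char **queries, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (fprintf(log, "%s;\n", queries[i]) < 0)
			return "abort";

	/* The log must hit disk before lightningd may commit. */
	if (fflush(log) != 0 || fsync(fileno(log)) != 0)
		return "abort";

	return "continue";
}

/* Hypothetical second stage: a db_commit notification handler would
 * mark the logged queries as committed, letting the plugin advance
 * its own state. */
```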

@cdecker (Member, Author) commented Dec 19, 2019

> When considering as well the possibility of chaining db_write hooks, perhaps we should consider a two-stage commit as well:
>
>   • db_write hook, where plugins save the queries in some log, and return with "result" : "continue" if they are able to save the queries or "result" : "abort" if they fail.
>   • db_commit notification, where if all the plugins hooked on db_write returned "result" : "continue", this notification is broadcast so that plugins can advance the state.
>
> We may not have hook chaining yet, but it seems good to prepare the interface at this point as well, so that plugin authors for remote backup in particular can start adapting to the new interface, then later we can support hook chaining and thereby allow multiple backup plugins to hook the db_write hook and stop the daemon if any backup is having trouble.

Totally with you on the chainability of hooks; standardizing on {"result": "xyz"} to allow the plugin to abort or continue is important. It is possible today by having the hook return an error, but that looks a lot scarier to the user, who gets a backtrace with an inscrutable error message. So cleaning that up would be great.

I'm not convinced that a db_commit notification is something we should do. Plugins have to handle the cases with and without that notification anyway (we might exit in between calling the hook and committing), so adding the notification might lead to more complex logic, since two different entry points into the logic would then be possible. Worse, plugin developers may start relying on the notification, causing issues for committed but not notified transactions. These two arguments lead me to be cautious with this kind of apparent optimization.

@darosior (Collaborator) commented:

> Worse, plugin developers may start relying on the notification, causing issues for committed but not notified transactions.

FWIW we could send a notification at begin_transaction and one at commit_transaction, so that the plugin knows what's committed or not?

@cdecker (Member, Author) commented Dec 19, 2019

> > Worse, plugin developers may start relying on the notification, causing issues for committed but not notified transactions.
>
> FWIW we could send a notification at begin_transaction and one at commit_transaction, so that the plugin knows what's committed or not?

But the plugin doesn't know, as a matter of fact, since we might terminate in between calling the hook and issuing the notification, so the plugin must handle the case where it doesn't get a notification but the DB transaction is still committed. Once you already have to handle that, why add yet another possible entry point for the happy case, and not just always assume the bad case? No matter how many notifications we add, the plugin will have to be able to handle the case where a notification never comes :-)

I guess I'm worried about the overhead of notifications (it's small but not free) and about plugin devs relying too heavily on the helpful notification and not bothering with the bad case, which would then result in loads of subtly broken plugins.

@niftynei (Collaborator) left a comment:

nice, seems like a good thing to have


```c
/* The current DB version we expect to update if changes are
 * committed. */
u32 data_version;
```

Review comment on the diff (Collaborator):

u64 not u32?

@cdecker (Member, Author) replied:
Yes, intval in the vars table is defined as an INTEGER, which gets translated into a u32 in postgres, and the size has to match in non-sqlite3 databases. So I thought I shouldn't be too picky about this value.

Alternatives are 1) creation of a new table, or 2) adding a new field bigintval to the vars table. If this is a problem I'd prefer the second. But really all we're guaranteeing is that there is enough variety to identify if we skipped writes or have lost some. In particular we can't do much if we're off by more than 2: if the plugin missed an update we must take a snapshot; if the DB moved back to an earlier point in time we must abort. We just need to have the probability of accidentally using the same version again small enough (1/2**32 seems good enough for me 😉).
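
A sketch of what the resulting startup decision could look like on the
plugin side; the serial-number comparison in the style of RFC 1982 is my
assumption for coping with wrap-around, not something the PR specifies:

```c
#include <stdint.h>

enum startup_action {
	IN_SYNC,   /* versions match, incremental backup can resume */
	SNAPSHOT,  /* we missed updates, take a full snapshot */
	ABORT_RUN, /* the DB went backwards, refuse to continue */
};

/* Heuristic comparison of wrapping u32 counters, in the style of
 * RFC 1982 serial-number arithmetic (an assumption, not from the PR). */
static int serial_cmp(uint32_t a, uint32_t b)
{
	if (a == b)
		return 0;
	return (int32_t)(a - b) > 0 ? 1 : -1;
}

static enum startup_action check_data_version(uint32_t backup, uint32_t db)
{
	int c = serial_cmp(db, backup);
	if (c == 0)
		return IN_SYNC;   /* nothing happened while we were away */
	if (c > 0)
		return SNAPSHOT;  /* the DB saw writes we never received */
	return ABORT_RUN;         /* the DB was reset to an older state */
}
```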

Collaborator:

sweet, ok. that's a good thing to know about int -> u32 in postgres.

Commit messages:

We were passing them in separately, while we could just retrieve them from the
db instance instead.

We are about to do some more operations before committing, so moving this up
allows us to reuse the same transaction.

This counter is incremented on each dirty transaction.

This increments the `data_version` upon committing dirty transactions, reads
the last data_version upon startup, and tracks the number in memory in
parallel to the DB (see next commit for rationale).

Changelog-Changed: JSON-RPC: Added a `data_version` field to the `db_write` hook which returns a numeric transaction counter.
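
A minimal sketch of that startup read, with a hypothetical vars-table
lookup helper:

```c
#include <stdint.h>

/* Stand-in for a vars-table lookup returning the intval stored under
 * the given name. */
extern uint32_t db_get_intval(const char *name);

/* Read the persisted counter once at startup; the in-memory copy is
 * then advanced in lockstep with every dirty commit and serves as the
 * baseline for the optimistic lock described in the next commit. */
static uint32_t load_data_version(void)
{
	return db_get_intval("data_version");
}
```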
The optimistic lock prevents multiple instances of c-lightning making
concurrent modifications to the database. That would be unsafe as it messes up
the state in the DB. The optimistic lock is implemented by checking whether a
gated update on the previous value of the `data_version` actually results in
an update. If that's not the case the DB has been changed under our feet.

The lock provides linearizability of DB modifications: if a database is
changed under the feet of a running process that process will `abort()`, which
from a global point of view is as if it had crashed right after the last
successful commit. Any process that also changed the DB must've started
between the last successful commit and the unsuccessful one since otherwise
its counters would not have matched (which would also have aborted that
transaction). So this reduces all the possible timelines to an equivalent
where the first process died, and the second process recovered from the DB.

This is not that interesting for `sqlite3` where we are also protected via the
PID file, but when running on multiple hosts against the same DB, e.g., with
`postgres`, this protection becomes important.

Changelog-Added: DB: Optimistic locking prevents instances from running concurrently against the same database, providing linear consistency to changes.
@cdecker marked this pull request as ready for review December 27, 2019 14:43
@niftynei (Collaborator) left a comment:

ACK 8076248

@niftynei merged commit 3f3a48d into ElementsProject:master Jan 2, 2020