
Migrating Sharded Clusters

Leif Walsh edited this page Feb 12, 2014 · 1 revision

Migrating from MongoDB: Sharded Cluster

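When migrating a sharded cluster to TokuMX, there are three main strategies to consider: an online migration, an offline migration of each shard individually, and an offline migration of the entire data set at once.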

Just like for replica sets, sharded clusters can be migrated "offline" or "online". See Migrating from MongoDB: Replica set for the differences between offline and online migrations. They apply to sharded clusters as well.

When migrating a sharded cluster offline, it is possible to convert each shard individually, or to convert the entire data set at once.

Individual shards

When converting each shard individually, you will end up with the same number of shards and the same chunk distribution that you had in MongoDB. The benefits here are:

  • The migration process is faster because each shard can be migrated in parallel, and all shards can use the bulk loader, which imports data faster.
  • Each shard is backed up to disk separately, so the space used by the backup is spread over many disks.
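The parallel, per-shard dump can be sketched as follows. This is a minimal illustration rather than the official procedure: the shard names, hostnames, and backup directory are hypothetical, and it assumes each shard can be dumped with the standard `mongodump --host ... --out ...` invocation into its own directory.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard names and hosts; substitute your own topology.
SHARDS = {
    "shard0": "shard0.example.com:27018",
    "shard1": "shard1.example.com:27018",
}


def build_dump_command(host, out_dir):
    """Return the mongodump argument list for a single shard."""
    return ["mongodump", "--host", host, "--out", out_dir]


def dump_all_shards(shards, backup_root, run=subprocess.check_call):
    """Dump every shard concurrently, one output directory per shard.

    `run` is injectable so the commands can be inspected without
    actually invoking mongodump.
    """
    commands = {
        name: build_dump_command(host, f"{backup_root}/{name}")
        for name, host in shards.items()
    }
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {name: pool.submit(run, cmd) for name, cmd in commands.items()}
        # result() re-raises any exception from the corresponding dump.
        for future in futures.values():
            future.result()
    return commands
```

Calling `dump_all_shards(SHARDS, "/backups")` would start one dump per shard at the same time. Pointing each shard's output directory at a different disk spreads the backup space as described above.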

All data at once

When converting the entire data set at once, you have the opportunity to change the number of shards and the chunk distribution. However, this method does not use TokuMX’s fast bulk loader, so it is typically slower. It can also require a lot of extra disk space, because the entire data set must be dumped to disk at once.
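To make the disk space trade-off concrete, here is a rough back-of-the-envelope sketch. The shard sizes are hypothetical, and it assumes, as a simplification, that a dump takes about as much space as the data it contains. Real dump sizes will differ.

```python
def backup_space_needed(shard_sizes_gb):
    """Compare the backup space each offline strategy needs in one place.

    shard_sizes_gb maps a shard name to its approximate dump size in GB.
    """
    sizes = list(shard_sizes_gb.values())
    return {
        # Per-shard dumps can live on separate disks, so the largest
        # single shard determines how much any one disk must hold.
        "individual_shards_largest_disk_gb": max(sizes),
        # A whole-cluster dump lands in one backup, so it needs the sum.
        "all_at_once_gb": sum(sizes),
    }


# Hypothetical cluster with three shards.
estimate = backup_space_needed({"shard0": 400, "shard1": 350, "shard2": 450})
print(estimate)
```

With these made-up numbers, the per-shard approach needs at most 450 GB on any one disk, while dumping everything at once needs 1200 GB in a single location.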

When the entire data set is dumped to disk in one backup, it is also possible to import this backup into a single TokuMX replica set. Because TokuMX already has document-level concurrency, high write throughput, and compression, it is reasonable to expect a single TokuMX replica set to support the workload of a sharded cluster of MongoDB instances. Migrating down to a single replica set also reduces the operational work of maintaining the TokuMX database.
