Replica Set Elections
Replica sets use elections to determine which set member will become primary. Replica sets can trigger an election in response to a variety of events, such as:
- adding a new node to the replica set,
- initiating a replica set,
- performing replica set maintenance using methods such as rs.stepDown() or rs.reconfig(), and
- the secondary members losing connectivity to the primary for more than the configured timeout (10 seconds by default).
In the following diagram, the primary node is unavailable for longer than the configured timeout, which triggers the automatic failover process. One of the remaining secondaries calls for an election to select a new primary and automatically resume normal operations.
The replica set cannot process write operations until the election completes successfully. The replica set can continue to serve read queries if such queries are configured to run on secondaries.
The median time before a cluster elects a new primary should not typically exceed 12 seconds, assuming default replica configuration settings. This includes the time required to mark the primary as unavailable and to call and complete an election. You can tune this time period by modifying the settings.electionTimeoutMillis replication configuration option. Factors such as network latency may extend the time required for replica set elections to complete, which in turn affects the amount of time your cluster may operate without a primary. These factors are dependent on your particular cluster architecture.
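As a minimal sketch of the tuning step described above, the helper below returns a copy of a replica set configuration document with a different settings.electionTimeoutMillis value. The helper name withElectionTimeout, the sample configuration, and the 5000 ms value are illustrative assumptions, not recommendations. The helper only edits a plain object and does not contact a server.

```javascript
// Hypothetical helper (not part of MongoDB): returns a copy of a replica set
// configuration document with settings.electionTimeoutMillis replaced.
function withElectionTimeout(config, timeoutMillis) {
  if (!Number.isInteger(timeoutMillis) || timeoutMillis <= 0) {
    throw new RangeError("electionTimeoutMillis must be a positive integer");
  }
  return {
    ...config,
    // Preserve any other settings already present in the document.
    settings: { ...(config.settings || {}), electionTimeoutMillis: timeoutMillis },
  };
}

// Sample document standing in for the output of rs.conf() (assumed shape).
const exampleConfig = { _id: "rs0", settings: { electionTimeoutMillis: 10000 } };
const tunedConfig = withElectionTimeout(exampleConfig, 5000);
console.log(tunedConfig.settings.electionTimeoutMillis);
```

In the shell, the same idea would typically be applied by passing the edited document to rs.reconfig(), for example rs.reconfig(withElectionTimeout(rs.conf(), 5000)). A lower timeout may detect primary failure sooner, but it can also trigger elections on transient network delays.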
Your application connection logic should include tolerance for automatic failovers and the subsequent elections. Starting in MongoDB 3.6, MongoDB drivers can detect the loss of the primary and automatically retry certain write operations a single time, providing additional built-in handling of automatic failovers and elections:
- MongoDB 4.2-compatible drivers enable retryable writes by default.
- MongoDB 4.0 and 3.6-compatible drivers must explicitly enable retryable writes by including retryWrites=true in the connection string.
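For drivers that need the option set explicitly, the following sketch appends retryWrites=true to a connection string when it is not already present. The function name withRetryWrites and the host names are hypothetical, and the string handling is deliberately simple: it assumes a well-formed mongodb:// URI whose credentials, if any, are percent-encoded.

```javascript
// Hypothetical helper: ensures retryWrites=true appears in a MongoDB
// connection string. Leaves the URI unchanged if retryWrites is already set.
function withRetryWrites(uri) {
  if (/[?&]retryWrites=/.test(uri)) {
    return uri;
  }
  if (uri.includes("?")) {
    // An options section already exists, so append another option.
    return uri + "&retryWrites=true";
  }
  // Options must follow a "/" after the host list; add one if it is missing.
  const afterScheme = uri.indexOf("://") + 3;
  const hasPath = uri.indexOf("/", afterScheme) !== -1;
  return uri + (hasPath ? "" : "/") + "?retryWrites=true";
}

console.log(withRetryWrites("mongodb://a.example.net:27017,b.example.net:27017/?replicaSet=rs0"));
console.log(withRetryWrites("mongodb://a.example.net:27017"));
```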
Factors and Conditions that Affect Elections
Replication Election Protocol
Changed in version 4.0: MongoDB 4.0 removes the deprecated replication protocol version 0.
Replication protocolVersion: 1 reduces replica set failover time and accelerates the detection of multiple simultaneous primaries.
With protocolVersion 1, you can use catchUpTimeoutMillis to prioritize between faster failovers and preservation of w:1 writes.
For more information on pv1, see Replica Set Protocol Version.
Heartbeats
Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.
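The timeout rule above can be illustrated with a small model. This is a simplified sketch, not MongoDB's implementation: the function name inaccessibleMembers and the bookkeeping object that maps each host to the time of its last heartbeat response are assumptions made for the example.

```javascript
// Timeout taken from the text above: 10 seconds without a heartbeat response.
const HEARTBEAT_TIMEOUT_MS = 10000;

// Hypothetical model: lastResponseAt maps a host name to the timestamp (ms)
// at which that member last answered a heartbeat. Returns the hosts that
// have been silent for longer than the timeout as of `now`.
function inaccessibleMembers(lastResponseAt, now) {
  return Object.entries(lastResponseAt)
    .filter(([, respondedAt]) => now - respondedAt > HEARTBEAT_TIMEOUT_MS)
    .map(([host]) => host);
}

// Member "a" answered 5 seconds ago, member "b" answered 12 seconds ago.
const lastSeen = { "a.example.net:27017": 95000, "b.example.net:27017": 88000 };
console.log(inaccessibleMembers(lastSeen, 100000));
```

With these sample timestamps, only the member that has been silent for 12 seconds is reported.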
Member Priority
After a replica set has a stable primary, the election algorithm will make a “best-effort” attempt to have the secondary with the highest priority available call an election. Member priority affects both the timing and the outcome of elections; secondaries with higher priority call elections relatively sooner than secondaries with lower priority, and are also more likely to win. However, a lower priority instance can be elected as primary for brief periods, even if a higher priority secondary is available. Replica set members continue to call elections until the highest priority member available becomes primary.
Members with a priority value of 0 cannot become primary and do not seek election. For details, see Priority 0 Replica Set Members.
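To make the two priority rules concrete, the sketch below filters out priority 0 members and orders the rest from highest to lowest priority. It is only an illustration of which members may seek election and which one the set ultimately prefers; it does not model election timing or voting. The function name electionCandidates and the sample members are hypothetical, and a missing priority is treated as 1, the default.

```javascript
// Hypothetical helper: lists hosts that may seek election, highest priority
// first. Members with priority 0 are excluded because they never seek election.
function electionCandidates(members) {
  return members
    .map((m) => ({ host: m.host, priority: m.priority === undefined ? 1 : m.priority }))
    .filter((m) => m.priority > 0)
    .sort((x, y) => y.priority - x.priority)
    .map((m) => m.host);
}

const sampleMembers = [
  { host: "a.example.net:27017", priority: 1 },
  { host: "b.example.net:27017", priority: 0 },
  { host: "c.example.net:27017", priority: 2 },
];
console.log(electionCandidates(sampleMembers));
```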
Loss of a Data Center
With a distributed replica set, the loss of a data center may affect the ability of the remaining members in the other data center or data centers to elect a primary.
If possible, distribute the replica set members across data centers to maximize the likelihood that even with a loss of a data center, one of the remaining replica set members can become the new primary.
See also
Replica Sets Distributed Across Two or More Data Centers
Network Partition
A network partition may segregate a primary into a partition with a minority of nodes. When the primary detects that it can only see a minority of nodes in the replica set, the primary steps down as primary and becomes a secondary. Independently, a member in the partition that can communicate with a majority of the nodes (including itself) holds an election to become the new primary.
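The majority test behind this behavior can be written as a one-line check. The sketch below is illustrative only: the function name seesMajority is hypothetical, and it assumes the count is taken over voting members, with the member doing the counting included in the visible total.

```javascript
// Hypothetical check: true when the visible members (including the member
// doing the counting) form a strict majority of the voting members.
function seesMajority(visibleMembers, totalVotingMembers) {
  return visibleMembers > Math.floor(totalVotingMembers / 2);
}

// A five-member set split 2/3 by a network partition.
const minoritySide = seesMajority(2, 5); // the old primary's side
const majoritySide = seesMajority(3, 5); // the side that can hold an election
console.log(minoritySide, majoritySide);
```

Under this model, a four-member set split 2/2 leaves neither side with a majority, so neither side could elect a primary.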
Voting Members
The replica set member configuration setting members[n].votes and member state determine whether a member votes in an election.
- All replica set members that have their members[n].votes setting equal to 1 vote in elections. To exclude a member from voting in an election, change the value of the member’s members[n].votes configuration to 0.
  Changed in version 3.2: Non-voting members must have priority of 0.
Only voting members in the following states are eligible to vote:
Non-Voting Members
Although non-voting members do not vote in elections, these members hold copies of the replica set’s data and can accept read operations from client applications.
Because a replica set can have up to 50 members, but only 7 voting members, non-voting members allow a replica set to have more than seven members.
Non-voting members must have priority of 0.
For instance, the following nine-member replica set has seven voting members and two non-voting members.
A non-voting member has both votes and priority equal to 0:
    {
       "_id" : <num>,
       "host" : <hostname:port>,
       "arbiterOnly" : false,
       "buildIndexes" : true,
       "hidden" : false,
       "priority" : 0,
       "tags" : {

       },
       "slaveDelay" : NumberLong(0),
       "votes" : 0
    }
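The limits stated in this section (at most 50 members, at most 7 voting members, and priority 0 for every non-voting member) can be checked against a members array before a reconfiguration. The following is a minimal sketch under those stated rules only; the function name checkVotingRules and the generated host names are hypothetical, and missing votes or priority fields are treated as 1, the default.

```javascript
const MAX_MEMBERS = 50;
const MAX_VOTING_MEMBERS = 7;

// Hypothetical validator: returns a list of human-readable problems found in
// a replica set members array. An empty list means no rule was violated.
function checkVotingRules(members) {
  const problems = [];
  const votesOf = (m) => (m.votes === undefined ? 1 : m.votes);
  const priorityOf = (m) => (m.priority === undefined ? 1 : m.priority);

  if (members.length > MAX_MEMBERS) {
    problems.push("more than " + MAX_MEMBERS + " members");
  }
  const votingCount = members.filter((m) => votesOf(m) !== 0).length;
  if (votingCount > MAX_VOTING_MEMBERS) {
    problems.push("more than " + MAX_VOTING_MEMBERS + " voting members");
  }
  for (const m of members) {
    if (votesOf(m) === 0 && priorityOf(m) !== 0) {
      problems.push("non-voting member " + m.host + " must have priority 0");
    }
  }
  return problems;
}

// Nine members: seven voting members plus two non-voting, priority 0 members.
const nineMembers = Array.from({ length: 9 }, (_, i) => ({
  _id: i,
  host: "m" + i + ".example.net:27017",
  votes: i < 7 ? 1 : 0,
  priority: i < 7 ? 1 : 0,
}));
console.log(checkVotingRules(nineMembers));
```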
Important
Do not alter the number of votes to control which members will become primary. Instead, modify the members[n].priority option. Only alter the number of votes in exceptional cases, for example, to permit more than seven members.
To configure a non-voting member, see Configure Non-Voting Replica Set Member.