From Traditional Fault Tolerance to Blockchain. Wenbing Zhao. Читать онлайн. MREADZ.NET

From Traditional Fault Tolerance to Blockchain. Wenbing Zhao

Читать онлайн книгу.

В начало <16 17 18 19 20 21 22 23 24 25 >В конец

From Traditional Fault Tolerance to Blockchain - Wenbing Zhao

message from a channel when it is in the Checkpointing state, it adds the Marker message to the Marker Certificate and checks whether or not the Marker Certificate is complete. If the Marker Certificate is now complete, the process transits to the Normal state (and possibly reports the completion of the global checkpointing to some predefined server). Otherwise, the process will remain in the Checkpointing state.

◾ In either the Normal or Checkpointing state, the process may receive a regular message. The regular message is always executed immediately. This is drastically different from the Tamir and Sequin global checkpointing protocol. The regular message will be appended to the channel state from which it is received only when the process is in the Checkpointing state and it has not received the Marker message in this channel.

EXAMPLE 2.4

An example run of the distributed snapshot protocol in a three-process distributed system is shown in Figure 2.8. P₀ is the initiator of the round of the global checkpointing. P₀ takes a local checkpoint and sends a Marker message along each of its outing channels. Upon receiving the Marker message, P₁ immediately takes a local checkpoint and in turn sends a Marker message to each of its outgoing channels. Similarly, P₂ takes a local checkpoint when it receives the first Marker message (from P₁) and sends a Marker message to each of its outgoing channels connecting to P₀ and P₁, respectively.

Upon taking a local checkpoint, a process starts logging messages, if any, arrived at each incoming channel. The process stops logging messages for a channel as soon as it has received a Marker message from that channel. The messages logged will become the state for each channel. For P₀, the channel state consists of a message m₀. For P₁, the channel state consists of a message m₁. The channel state for P₂ is empty because it did not receive any message prior to the receipt of the Marker message from each of its incoming channels. Note that the regular message received (such as m₀ or m₁) is executed immediately, which is drastically different from the Tamir and Sequin global checkpointing protocol.

Schematic illustration of normal operation of the Chandy and Lamport global snapshot protocol in an example three-process distributed system.

Figure 2.8 Normal operation of the Chandy and Lamport global snapshot protocol in an example three-process distributed system.

2.2.4 Discussion

The two global checkpointing protocols introduced in this section share a number of similarities.

◾ Both rely on virtually the same system model, and use a special control message to propagate and coordinate the global checkpointing.

◾ They both recognize the need to capture the channel state to ensure the recoverability of the system.

◾ The mechanism to capture the channel state is virtually the same for both protocols, as shown in Figure 2.9.– In both protocols, a process starts logging messages (for the channel state) for each channel upon the initiation of the global checkpoint (at the initiator) or upon the receipt of the first control message (i.e., the Marker message in the Chandy and Lamport protocol and the CHECKPOINT message in the Tamir and Sequin protocol).– In both protocols, the process stops logging messages and conclude the channel state for each channel when it receives the control message in that channel.

◾ The communication overhead of the two protocols is identical (i.e., the same number of control messages is used to produce a global checkpoint).

Schematic illustration of a comparison of the channel state definition between (a) the Chandy and Lamport distributed snapshot protocol and (b) the Tamir and Sequin global checkpointing protocol.

Figure 2.9 A comparison of the channel state definition between (a) the Chandy and Lamport distributed snapshot protocol and (b) the Tamir and Sequin global checkpointing protocol.

The two protocols also differ in their strategies in producing a global checkpoint.

◾ The Tamir and Sequin protocol is more conservative in that a process suspends its normal execution as soon as it learns that a global checkpointing round has started. In light of the Chandy and Lamport protocol, the suspension of normal execution could have been avoided during a global checkpointing round.

◾ The reason for the blocking design in the Tamir and Sequin protocol is that a process captures the channel states prior to taking a local checkpoint. While capturing the channel state, a process cannot execute the regular messages received because doing so would alter the process state, thereby potentially rendering the global checkpoint inconsistent. On the other hand, in the Chandy and Lamport protocol, a process captures the channel state after it has taken a local checkpoint, thereby enabling the execution of regular messages without the risk of making the global checkpoint inconsistent.

◾ The Tamir and Sequin protocol is more complete and robust because it ensures the atomicity of the global checkpointing round. Should a failure occurs, the current round would be aborted. The Chandy and Lamport protocol does not define any mechanism to ensure such atomicity. Presumably, the mechanisms defined in the Tamir and Sequin protocol can be incorporated to improve the Chandy and Lamport protocol.

2.3 Log Based Protocols

Checkpoint-based protocols only ensure to recover the system up to the most recent consistent global state that has been recorded and all executions happened afterwards, if any, are lost. Logging can be used to recover the system to the state right before the failure, provided that the piecewise deterministic assumption is valid. In log based protocols, the execution of a process is modeled as consecutive state intervals [21]. Each state interval is initiated by a nondeterministic event (such as the receiving of a message) or the initialization of the process, and followed by a sequence of deterministic state changes. As long as the nondeterministic event is logged, the entire state interval can be replayed.

As an example, three state intervals are shown in Figure 2.10. The first state interval starts at the initialization of the process Pi and ends right before it executes the first message, m₁ received. Note that the sending of message m₀ is not considered a nondeterministic event. The second state interval is initiated by the receiving event of message m₁ and ends prior to the receipt of m₃. Similarly, the third state interval starts with the receiving event of m₃ and ends prior to the receipt of m₅.

In the remaining of this section, we assume that the only type of nondeterministic events is the receiving of application messages. Therefore, logging is synonymous with message logging.

Schematic illustration of an example state intervals.

Figure 2.10 Example state intervals.

For all practical purposes, logging is always used in conjunction with checkpointing to enjoy two benefits:

1 It limits the recovery time because to recover from a failure the process can be restarted from its last checkpoint (instead from its initial state) and its

Скачать книгу

В начало <16 17 18 19 20 21 22 23 24 25 >В конец