Byzantine Fault Tolerant protocols are complicated and hard to implement.Today’s software industry is reluctant to adopt these protocols because of thehigh overhead of message exchange in the agreement phase and the high resourceconsumption necessary to tolerate faults (as 3 f + 1 replicas are required totolerate f faults). Moreover, total ordering of messages is needed by mostclassical protocols to provide strong consistency in both agreement and executionphases. Research has improved throughput of the execution phase by introducingconcurrency using modern multicore infrastructures in recent years. However,improvements to the agreement phase remains an open area.
Byzantine Fault Tolerant systems use State Machine Replication to tolerate awide range of faults. The approach uses leader based consensus algorithms for thedeterministic execution of service on all replicas to make sure all correct replicasreach same state. For this purpose, several algorithms have been proposed toprovide total ordering of messages through an elected leader. Usually, a singleleader is considered to be a bottleneck as it cannot provide the desired throughputfor real-time software services. In order to achieve a higher throughput there is aneed for a solution which can execute multiple consensus rounds concurrently.
We present a solution that enables multiple consensus rounds in parallel bychoosing multiple leaders. By enabling concurrent consensus, our approach canexecute several requests in parallel. In our approach we incorporate applicationspecific knowledge to split the total order of events into multiple partial orderswhich are causally consistent in order to ensure safety. Furthermore, a dependencycheck is required for every client request before it is assigned to a particular leaderfor agreement. This methodology relies on optimistic prediction of dependenciesto provide higher throughput. We also propose a solution to correct the course ofexecution without rollbacking if dependencies were wrongly predicted.
Our evaluation shows that in normal cases this approach can achieve upto 100% higher throughput than conventional approaches for large numbers ofclients. We also show that this approach has the potential to perform better incomplex scenarios