The Spread Toolkit - Overview

The problem in building distributed systems comes from the need to communicate and synchronize the different components of the system using networks that are prone to faults. In every distributed system (e.g. replicated databases or application server clusters) there are inherent uncertainties about the current state of remote components. Due to the complexity of such systems, the construction of a reliable and efficient distributed system is very difficult.

Spread is a toolkit that provides a high performance messaging service that is resilient to faults across external or internal networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast and group communication support. Spread services range from reliable message passing to fully ordered messages with delivery guarantees, even in case of computer failures and network partitions.

Spread is designed to encapsulate the challenging aspects of asynchronous networks and enable the construction of scalable distributed applications, allowing application builders to focus on the differentiating components of their application.

Some of the most useful services Spread provides:

  • Efficient and reliable message bus.
  • Reliable multicast from any number of senders to lots of receivers.
  • Scalable group services that allow thousands of active groups.
  • Membership services that inform each component of an application of which other components are running, and enables easy recovery when some fail.
  • Agreed ordering of messages to the group - all receivers receive messages sent to the group in exactly the same order. For many document or message based applications this avoids a lot of work that used to be required for the application to order messages for usability or reliability reasons.
  • FIFO ordering of data streams allows applications to easily add networked streaming multimedia such as video or audio without having to make changes to existing codes or libraries that expect an in-order stream with no holes because of dropped packets on the network.

What kinds of applications can use Spread

Spread can be used in different types of distributed applications. For example:

  • Highly available services. Services that require maintaining a consistent shared state among numerous computers in order to guarantee high performance and availability. Example - replicated servers.
  • Generic message bus for clusters and distributed clusters. Example - A cluster with 30 machines, each of which has 20 distinct processes that need to communicate with each of the other processes reliably. Opening 600 TCP connections at each process is not practical. Our technology enables one connection per process to send to all other 599 processes.
  • Replicated databases. A number of instances of a database exist in several different locations. They must all be kept synchronized in such a way that a client can query or update any of them and the results will be the same as if only one copy existed.
  • Collaborative tools. Many different groups of participants each want to share data, video and audio conferencing.
  • N-way fail-over tool for clusters of servers. Ensures that there will always be a server to handle requests that arrive on any of the IP addresses that are publicly known for the cluster.
  • Distributed Shared Memory (DSM). Sending pages of memory to machines where they are needed using reliable multicast.
  • Spread can be used as a substitute for messaging and group communication tools such as TIB/Rendezvous and Ensemble.

Why use Spread

  • Powerful, but simple API. Only six basic calls are required to utilize Spread.
  • Spread is optimized and can handle over 8,000 1Kbytes messages a second in local area network settings.
  • Spread ensures reliability and availability in the presence of network partitions and failures of any component of the system, whether the environment is one cluster with a few machines, multiple clusters, or a multi-site wide area network with thousands of machines.
  • Enables the system to grow seamlessly without architectural changes.
  • Spread allows unicast, multicast, multi-group multicast, scatter-gather calls, and multiple query functions.
  • Flexible message semantics: Spread supports all levels of messaging from unreliable to agreed (also known as total order) to safe (which guarantees all daemons receive the message before any deliver it to applications), and provides complete extended virtual synchrony.
  • Spread handles network and machine partitions and re-merges safely and notifies applications about the current state of the group.
  • Cross Platform: Spread supports cross-platform operation between Unix (BSD, Linux, Solaris, Irix, AIX, Mac OS X, etc.) and Windows (2000/NT/98/95).
  • Spread currently has programming API's for C/C++, C#, Java, Perl, Python and Ruby.