Reliable Web-based Transactions

FARGOS/SolidState™ can provide an unprecedented level of resiliency to web-based information systems that are composed of four or more servers. One might not expect that such fault-tolerance is needed, but reports indicate that web site deployments are surprisingly fragile:

An Andersen Consulting study found that "One in Four Online Purchases [are] Thwarted".

Problems with "dropped shopping carts" are discussed in "Sites Strive to Hold On To Buyers". The article reports assertions by Datamonitor that 78% of transactions are never completed due to web site problems.

Preserve the Customer's Investment

Many web-based applications obtain information from a user only after that individual has expended a significant amount of time. Consider a typical shopping experience at an online retailer's site: the customer browses the catalog, selects items for purchase, and eventually checks out. The checkout process itself often involves several rounds of interaction in order to obtain the customer's identity, delivery address and payment mechanism (e.g., credit card number). A failure to correctly accept and record information provided by the customer will result in a failure to receive and process the order. Few customers will view such a failure as a positive experience. While it is possible that a customer will reenter all of his or her selections, the more probable result is an irritated individual who will give up and leave the site. Only a small class of web sites can afford to lose present and potential future orders as the result of the customer taking his or her business elsewhere. FARGOS/SolidState keeps a failure in a vendor's web server farm from being visible to the customer: the order gets placed and the business is retained.

High-Availability is not Fault-Tolerance

Web sites with multiple servers typically use a special kind of application-specific load-balancing router as a front-end to their server farm. Vendors of such products also promote them as providing high-availability services for the web site: if a given web server fails, the front-end router will stop sending requests to it and the failure will not affect future requests. Consequently, the owners of web sites that deployed these useful products may think that they are already protected against failures. They are not.

Vendors of many load-distribution products are correct to claim that they provide high-availability. What they do not provide, nor claim to provide, is fault-tolerance. If a transaction is in progress when the server fails, the transaction is lost. In the majority of cases, the transaction will be a request to obtain a page of HTML or a graphic image; these are read-only operations that can be successfully retried on a different server. These retry attempts will occur either because the user clicked the Refresh button in the browser or because the connection attempt is retried by the browser application or the underlying operating system kernel. When the retried request is received by the web site's load-balancing front-end, it will be transferred to a different, still operational server. Unfortunately, if the failure occurs during a transaction in which the user was providing data, this recovery will typically not work. One reason for this is the use of SSL-based sessions to securely transmit sensitive information.
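The difference between retrying a read-only request and retrying a data-bearing transaction can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Server and LoadBalancer classes, their method names, and the in-memory cart are hypothetical and do not describe any particular load-balancing product or FARGOS/SolidState itself.

```python
class Server:
    """Hypothetical web server that keeps shopping carts only in its own memory."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.carts = {}  # session id -> list of items (not replicated anywhere)

    def get_page(self, path):
        # Read-only operation: every server returns the same answer.
        return "<html>" + path + "</html>"

    def add_to_cart(self, session, item):
        # Stateful operation: the data lives on this server alone.
        self.carts.setdefault(session, []).append(item)

    def checkout(self, session):
        return list(self.carts.get(session, []))


class LoadBalancer:
    """Hypothetical front-end that routes each request to the first live server."""

    def __init__(self, servers):
        self.servers = servers

    def pick(self):
        for server in self.servers:
            if server.alive:
                return server
        raise RuntimeError("no servers available")


# Demonstration: server "a" fails after the customer has filled the cart.
a, b = Server("a"), Server("b")
front_end = LoadBalancer([a, b])
front_end.pick().add_to_cart("session-1", "book")
a.alive = False

page = front_end.pick().get_page("/catalog")    # the retry succeeds on server "b"
order = front_end.pick().checkout("session-1")  # the cart stayed behind on server "a"
```

In this simulation the catalog page is served normally after the failover, while the checkout sees an empty cart: the site remained available, but the customer's data did not survive.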

Pinned Down by SSL

To protect the confidentiality of data, many web sites use the Secure Sockets Layer (SSL) protocol to encrypt the information provided by a customer. Because the establishment of an SSL-based session is computationally expensive, SSL-based sessions normally remain open for the period in which confidential data is being requested from the customer. This, coupled with the fact that the successful decryption of newly transmitted data requires knowledge of the data that has previously been sent, results in special-case treatment for SSL-based connections by load-distribution systems that act as front ends for web sites. Essentially, once an SSL-based connection is established, the session is pinned to a given server. As a consequence, most backend applications take advantage of the fact that the connection is stuck on a single server and thus make no provision for replicating data to other server nodes. Statistically, the probability of encountering a failure increases the longer the session is open. A failure that causes the loss of the server also causes the loss of the SSL session and, ultimately, the transaction in progress. If, as is commonplace, the customer-provided data has not been replicated, it too is lost. When a site uses FARGOS/SolidState, the customer's data is replicated across multiple servers and the transaction's processing is checked for correctness. Unlike simple replication schemes, FARGOS/SolidState provides Byzantine fault-tolerance.
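The observation that longer sessions are more exposed to failure can be made concrete under a simplifying assumption that is not part of the text above: the pinned server fails at a constant average rate, so the time to failure is exponentially distributed. The function name and the rate used in the example are illustrative only.

```python
import math


def failure_probability(failures_per_hour, session_minutes):
    """Probability that the pinned server fails at least once during the session.

    Assumes a constant failure rate (exponential time to failure); this is a
    modeling assumption made for illustration, not a measured property.
    """
    hours = session_minutes / 60.0
    return 1.0 - math.exp(-failures_per_hour * hours)


# Compare a short session with a long checkout on the same hypothetical server.
short_session = failure_probability(0.001, 5)
long_session = failure_probability(0.001, 30)
```

Under this model the risk grows with session length, which is why data held only on the pinned server becomes more vulnerable the longer the checkout takes.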

Protection Against Data Corruption

Most systems that attempt to provide fault-tolerance assume that a server will fail cleanly: it either works 100% correctly or not at all. FARGOS/SolidState takes on the ultimate challenge and protects against failures in which a server provides incorrect results, perhaps due to a disk or memory failure. Rather than assume benign failure modes, FARGOS/SolidState protects against the worst-case scenarios, such as the complete takeover of a machine by a hostile third party.
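One common way to mask a server that returns incorrect results is to run the same request on several replicas and accept only an answer that enough of them agree on. The sketch below is an assumption-laden illustration, not a description of how FARGOS/SolidState works internally. It relies on the classical result that tolerating f arbitrarily faulty servers requires at least 3f + 1 servers, which is consistent with the four-server minimum mentioned earlier; the function names are hypothetical.

```python
from collections import Counter


def max_faulty(replica_count):
    """Largest f such that replica_count >= 3f + 1 (the classical Byzantine bound)."""
    return (replica_count - 1) // 3


def vote(replies):
    """Return the reply backed by at least f + 1 replicas, or raise if none is.

    With at most f faulty replicas, any value reported by f + 1 of them must
    have come from at least one correct replica.
    """
    f = max_faulty(len(replies))
    if f < 1:
        raise ValueError("at least four replicas are needed to tolerate one fault")
    value, count = Counter(replies).most_common(1)[0]
    if count < f + 1:
        raise RuntimeError("no reply has enough matching votes")
    return value


# Four replicas process the same order; one returns a corrupted result.
result = vote(["order accepted", "order accepted", "CORRUPT", "order accepted"])
```

The corrupted reply is outvoted here, so the caller sees only the answer that the correct replicas produced. A real protocol must also agree on the order of requests and authenticate the replies, which this sketch leaves out.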

Availability

The FARGOS/SolidState HTTP Server Adapter is available as a component for the FARGOS/VISTA™-based HTTP server on all hardware and operating system platforms supported by FARGOS/VISTA. A demonstration example of an HTTP-based e-commerce shopping cart is included in the FARGOS/VISTA Software Development Kit.