Simulating Multi-Tenant OLAP Database Clusters
Simulation of parallel database machines was used in many database research projects during the 1990ies. One of the main reasons why simulation approaches were popular in that time was the fact that clusters with hundreds of nodes were not as readily available for experimentation as it is the case today. At the same time, the simulation models underlying these systems were fairly complex since they needed to capture both queuing processes in hardware (e.g. CPU contention or disk I/O) and software (e.g. processing distributed joins). Todays trend towards more specialized database architectures removes large parts of this complexity from the modeling task. As the main contribution of this paper, we discuss how we developed a simple simulation model of such a specialized system: a multi-tenant OLAP cluster based on an in-memory column database. The original infrastructure and testbed was built using SAP TREX, an in-memory column database part of SAP's business warehouse accelerator, which we ported to run on the Amazon EC2 cloud. Although we employ a simple queuing model, we achieve good accuracy. Similar to some of the parallel systems of the 1990ies, we are interested in studying different replication and high-availability strategies with the help of simulation. In particular, we study the effects of mirrored vs. interleaved replication on throughput and load distribution in our cluster of multi-tenant databases. We show that the better load distribution inherent to the interleaved replication strategy is exhibited both on EC2 and in our simulation environment.
Full Text: PDF