Towards the end-to-end design for big data management in the cloud: why, how, and when?
With the wide-scale adoption of cloud computing and the explosion in the number of distributed applications and end-user devices, we are witnessing an insatiable desire to build ever-bigger systems that can serve hundreds of millions of end users, are highly automated, and can collect enormous amounts of data in short periods of time. Newer systems are often implemented by integrating existing sub-systems that are already in use. A consequence of such massive-scale integration is that it is very difficult to gain a complete understanding of the overall system design. In fact, recent examples indicate that the only way to debug and test newer modules is to put them into live deployments, which can sometimes lead to disastrous outcomes. In this talk, I will use some recent events in the context of Big Data and Cloud Computing as motivation to argue that we need better methodologies for end-to-end system design for big data management in the cloud. I will then explore some well-known abstractions from distributed computing and databases as a means towards such a design, and conclude with a contemplative question: can we achieve such a goal, or shall we leave it all to an automated, self-learning, and self-corrective oracle?