This page briefly introduces some of my ongoing and past projects. Technical details of those projects can be found in the research papers.
Database systems are becoming increasingly complex. Manually configuring a system for each environment is too time-consuming and, more often than not, does not even yield an optimal configuration. To build self-optimizing systems, I am investigating machine learning-based techniques that can continuously optimize the internals of data systems.
Despite 25 years of research in academia, approximate query processing (AQP) has had little industrial adoption, mostly because traditional vendors are reluctant to make radical changes to their engines and because existing AQP solutions are tightly integrated with specific platforms.
Our proposal, called VerdictDB, uses a middleware architecture that requires no changes to the backend database and thus can work with all off-the-shelf engines. Operating at the driver level, VerdictDB intercepts analytical queries issued to the database and rewrites each one into another query that, if executed by any standard relational engine, will yield sufficient information for computing an approximate answer. VerdictDB uses the returned result set to compute an approximate answer and error estimates, which are then passed on to the user or application. However, the lack of access to the query execution layer introduces significant challenges in terms of generality, correctness, and efficiency.
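To make the middleware idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not VerdictDB's actual rewriting logic or API. It assumes a pre-built Bernoulli sample table, and the `_sample` naming convention, the `orders` table, and the helper names are all hypothetical. The rewritten query asks the engine only for standard aggregates, and the scaling and error estimate are computed outside the database.

```python
import math
import random
import sqlite3

SAMPLE_SUFFIX = "_sample"  # hypothetical naming convention for sample tables


def build_demo_db(n_rows=20000, ratio=0.1, seed=0):
    """Create an in-memory table plus a Bernoulli sample of it (demo data only)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (price REAL)")
    conn.execute("CREATE TABLE orders_sample (price REAL)")
    for _ in range(n_rows):
        price = rng.uniform(0, 100)
        conn.execute("INSERT INTO orders VALUES (?)", (price,))
        # Each row enters the sample independently with probability `ratio`.
        if rng.random() < ratio:
            conn.execute("INSERT INTO orders_sample VALUES (?)", (price,))
    return conn, ratio


def rewrite_sum_query(table, column):
    """Rewrite 'SELECT SUM(column) FROM table' into a query over the sample.

    The rewritten query uses only standard SQL aggregates, so any relational
    engine can run it unchanged.
    """
    return (f"SELECT SUM({column}), SUM({column} * {column}) "
            f"FROM {table}{SAMPLE_SUFFIX}")


def approximate_sum(conn, table, column, ratio):
    """Return (estimate, standard_error) for SUM(column) from the sample."""
    row = conn.execute(rewrite_sum_query(table, column)).fetchone()
    sample_sum = row[0] or 0.0
    sample_sum_sq = row[1] or 0.0
    # Scale the sample total up by the inverse sampling probability.
    estimate = sample_sum / ratio
    # Estimated standard error of that scaled total under Bernoulli sampling.
    std_error = math.sqrt((1.0 - ratio) * sample_sum_sq) / ratio
    return estimate, std_error


if __name__ == "__main__":
    conn, ratio = build_demo_db()
    exact = conn.execute("SELECT SUM(price) FROM orders").fetchone()[0]
    estimate, std_error = approximate_sum(conn, "orders", "price", ratio)
    print(f"exact={exact:.1f} approx={estimate:.1f} +/- {1.96 * std_error:.1f}")
```

The sketch handles a single ungrouped SUM. Supporting joins, group-bys, and nested queries without access to the execution layer is where the generality and correctness challenges mentioned above come in.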
You have billions of data points, but are you frustrated because there are simply too many of them to visualize? This is a common problem when working with large databases: the dataset is too big to explore interactively and gain useful insights from.
Our proposed approach, Visualization-Aware Sampling (VAS), makes interactive visualization of big data possible by reducing the number of data points while minimizing the quality loss that stems from the data reduction process.
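As a rough illustration of why the choice of sample matters for a scatter plot, the sketch below uses a simple grid-based stand-in: it keeps at most a fixed number of points per grid cell, so sparse regions stay visible while dense regions are thinned. This is not the VAS algorithm described in the paper. The function name, the `cell_size` and `per_cell` parameters, and the demo data are assumptions made for this example.

```python
import math
import random


def cell_of(point, cell_size):
    """Map a 2-D point to the integer coordinates of its grid cell."""
    x, y = point
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def grid_sample(points, cell_size, per_cell=1, seed=0):
    """Keep at most `per_cell` randomly chosen points from each grid cell."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)  # random order, so the kept points are not biased by input order
    counts = {}
    kept = []
    for point in shuffled:
        cell = cell_of(point, cell_size)
        if counts.get(cell, 0) < per_cell:
            counts[cell] = counts.get(cell, 0) + 1
            kept.append(point)
    return kept


def make_demo_points(seed=1):
    """A dense cluster in [0, 1) x [0, 1) plus a few scattered outliers."""
    rng = random.Random(seed)
    dense = [(rng.random(), rng.random()) for _ in range(10000)]
    sparse = [(rng.uniform(1, 10), rng.uniform(1, 10)) for _ in range(20)]
    return dense + sparse


if __name__ == "__main__":
    points = make_demo_points()
    kept = grid_sample(points, cell_size=0.25)
    print(f"{len(points)} points reduced to {len(kept)}")
```

A uniform random sample of the same size would draw almost all of its points from the dense cluster and would likely drop most of the outliers, which is the kind of quality loss a visualization-aware sample tries to avoid.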