I am a postdoctoral Research Fellow in Computer Science and Engineering at the University of
Michigan, Ann Arbor. I am the lead of the Verdict project. My research interests are (1) building real-time big data analytics systems and (2) developing statistical and machine learning algorithms for smarter computational engines in those systems.
I earned my Ph.D. in Computer Science and Engineering at the University of Michigan, Ann Arbor in 2017, advised by Michael Cafarella and Barzan Mozafari. During my Ph.D. studies, I developed algorithms for various data analytics applications, including approximate query processing, data visualization, and search in high-dimensional spaces. I am part of the Database Group at the University of Michigan. My curriculum vitae is available here. My MS and Ph.D. studies were additionally supported by the Jeongsong Cultural Foundation and the Kwanjeong Educational Foundation, respectively. I am best reached at my email: email@example.com.
Database learning is our vision for building an intelligent database system that
makes use of its own knowledge or understanding of the underlying data stored in
the database to produce better answers to new queries. We have applied this general
vision to building an approximate query processing (AQP) system that becomes
smarter as it processes more queries. Our AQP system combines the (relatively less
accurate) approximate answer to a new query with its model of the underlying
data to produce a more accurate approximate answer, where the model is built
from the approximate answers to past queries. As a result, our system
greatly reduces both the expected and the actual errors of the
approximate answers generated by popular sampling-based AQP engines. This
error reduction also translates into query processing speedups when
targeting a given level of AQP accuracy.
North East Database Day 2016, Oral, MIT
Midwest Big Data Opportunities and Challenges Workshop 2016, Chicago
The biennial Conference on Innovative Data Systems Research (CIDR) 2017 Gongshow
ACM SIGMOD International Conference on Management of Data (SIGMOD) 2017
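To make the idea of combining a raw approximate answer with a model-based prediction concrete, here is a minimal Python sketch that fuses two independent, unbiased estimates by inverse-variance weighting. It is a simplified stand-in chosen for illustration, not Verdict's actual inference procedure; the function name `combine_estimates` and the assumption that both estimates are independent with known variances are hypothetical.

```python
def combine_estimates(raw_answer: float, raw_var: float,
                      model_answer: float, model_var: float) -> tuple[float, float]:
    """Fuse a sample-based answer with a model prediction (hypothetical sketch).

    Assumes both estimates are unbiased, independent, and have known variances.
    Each estimate is weighted by the inverse of its variance, so the more
    reliable source dominates, and the fused variance is never larger than
    the smaller of the two input variances.
    """
    w_raw = 1.0 / raw_var
    w_model = 1.0 / model_var
    combined = (w_raw * raw_answer + w_model * model_answer) / (w_raw + w_model)
    combined_var = 1.0 / (w_raw + w_model)
    return combined, combined_var
```

With a raw sample-based answer of 100 (variance 4) and a model prediction of 106 (variance 12), the fused answer is 101.5 with variance 3, smaller than either input variance. That shrinkage is the kind of error reduction the paragraph above describes, although the real system derives its model from past query answers rather than taking it as a given input.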
Our hashing-based search algorithm, Neighbor-Sensitive Hashing, significantly
improves the accuracy of approximate search and, as a result, the speed of the
search process for the same target search accuracy, compared to a decade of
research in hashing-based search algorithms starting from the famous
Locality-Sensitive Hashing. The performance improvements of our algorithm stem
from the fact that our hash functions are designed to capture information on
only a relatively small number of similar items, whereas existing work captures
the similarity (or distance) information on all pairs of data items. As a
result, our search algorithm excels when the number of similar items is
relatively small compared to the total number of items in a database, which is
mostly true for
International Conference on Very Large Data Bases (VLDB) 2016, Delhi, India
3rd Workshop on Web-scale Vision and Social Media at ICCV 2015, Santiago, Chile
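For context, the sketch below shows the classic random-hyperplane family from the Locality-Sensitive Hashing line of work that the paragraph above compares against (it targets angular, i.e. cosine, similarity). It is not Neighbor-Sensitive Hashing, whose hash functions are designed differently; the names `make_hyperplanes`, `hash_code`, `build_index`, and `query`, and the single-table design, are illustrative assumptions.

```python
import random
from collections import defaultdict

Vector = list[float]

def make_hyperplanes(dim: int, num_bits: int, seed: int = 0) -> list[Vector]:
    """Draw num_bits random hyperplane normals with i.i.d. Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_code(vec: Vector, planes: list[Vector]) -> int:
    """Bit i is set iff vec lies on the non-negative side of hyperplane i."""
    code = 0
    for i, plane in enumerate(planes):
        if sum(p * x for p, x in zip(plane, vec)) >= 0.0:
            code |= 1 << i
    return code

def build_index(data: list[Vector], planes: list[Vector]) -> dict[int, list[int]]:
    """Group item ids into buckets keyed by their hash code."""
    buckets: dict[int, list[int]] = defaultdict(list)
    for idx, vec in enumerate(data):
        buckets[hash_code(vec, planes)].append(idx)
    return buckets

def query(q: Vector, data: list[Vector], planes: list[Vector],
          buckets: dict[int, list[int]]) -> list[int]:
    """Rank only the items in q's bucket by exact squared Euclidean distance."""
    candidates = buckets.get(hash_code(q, planes), [])
    return sorted(candidates,
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(data[i], q)))
```

Only the items sharing the query's bucket are compared exactly, which is where the speedup over a full scan comes from. Practical LSH deployments use several tables and probe nearby buckets to raise recall; a single table keeps the sketch short.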
We investigate an optimal sampling (or, equivalently, data reduction)
method for visualizing large-scale databases using scatter plots (or similar plots).
This project starts from the observation that traditional sampling methods, such as
uniform random sampling and stratified sampling, show limited performance in
generating scatter plots that look visually similar to the original plot
(which is obtained by visualizing the entire database after hours of processing).
In contrast, our newly developed method is specialized in approximating the
visual quality of scatter plots and provides much higher quality
Our demo with 2 billion GPS records from the OpenStreetMap project is
Presentations: International Conference on Data Engineering (ICDE) 2016, Helsinki, Finland
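The contrast between uniform random sampling and a sampler that aims for visual coverage can be illustrated with a small Python sketch. The grid-based heuristic below (keep one point per occupied cell of a fixed grid) is a hypothetical stand-in chosen for brevity, not the method developed in this project; `uniform_sample`, `grid_sample`, and `cells_per_axis` are illustrative names.

```python
import random

Point = tuple[float, float]

def uniform_sample(points: list[Point], k: int, seed: int = 0) -> list[Point]:
    """Baseline: draw k points uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(points, min(k, len(points)))

def grid_sample(points: list[Point], cells_per_axis: int) -> list[Point]:
    """Hypothetical coverage heuristic: lay a cells_per_axis x cells_per_axis
    grid over the bounding box and keep the first point seen in each occupied
    cell, so sparse regions of the plot stay visible."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    # Guard against a zero-width range when all coordinates are equal.
    width = (max(xs) - x0) or 1.0
    height = (max(ys) - y0) or 1.0

    def cell(p: Point) -> tuple[int, int]:
        cx = min(int((p[0] - x0) / width * cells_per_axis), cells_per_axis - 1)
        cy = min(int((p[1] - y0) / height * cells_per_axis), cells_per_axis - 1)
        return cx, cy

    chosen: dict[tuple[int, int], Point] = {}
    for p in points:
        chosen.setdefault(cell(p), p)
    return list(chosen.values())
```

On a dense cluster plus a few far-away outliers, the grid sampler keeps every outlier while spending only one point on the cluster, whereas a uniform sample of the same size will usually contain only cluster points. The heuristic discards all density information inside a cell, so it is only a rough illustration of why visually aware sampling behaves differently from uniform sampling.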
This work identifies the need for a system dedicated to the feature
engineering process, which is essential for building high-quality machine learning
systems. This idea was further pursued by Mike
Anderson, resulting in a
for the feature engineering system.
Presentations: The biennial Conference on Innovative Data Systems Research (CIDR) 2013,