Research


Active Research Projects

Sufficient logging is crucial in software development: it helps isolate bugs, debug problems, and maintain sound security practices, while insufficient logging makes issues hard to spot during failures or debugging. However, current logging mechanisms in serverless platforms burden developers by requiring them to write log statements separately from the application logic. Likewise, logs auto-generated by cloud platforms lack insight into the application code, making them too generic and non-descriptive.

We propose LogLess, a new logging paradigm that automates the logging process: developers can publish logs for serverless applications without writing manual log statements in their code. LogLess uses programming-language decorators to parse a function's arguments, blocks, and lines of code. Each block or line is categorized by its role, such as variable assignment, API call, or external database connection; LogLess then assigns it to a corresponding “pattern group/sub-group,” fetches the underlying variable values, and generates appropriate logs. Our results show that LogLess incurs a performance overhead of 20% to 40% compared to manual logging, but it saves substantial development time, taking only seconds rather than hours.
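
To make the decorator mechanism concrete, here is a minimal, hypothetical sketch (not LogLess itself) of how a Python decorator can intercept a serverless-style handler, read its bound arguments, and emit structured logs automatically; the decorator name, log format, and handler are invented for illustration.

```python
import functools
import inspect
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("logless-sketch")

def auto_log(func):
    """Illustrative decorator: logs a handler's arguments, runtime, and result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Bind positional/keyword arguments to parameter names for readable logs.
        bound = inspect.signature(func).bind(*args, **kwargs)
        bound.apply_defaults()
        logger.info("CALL %s args=%s", func.__name__,
                    json.dumps(dict(bound.arguments), default=str))
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            logger.info("RETURN %s value=%s elapsed=%.3fms",
                        func.__name__, result, (time.perf_counter() - start) * 1000)
            return result
        except Exception as exc:
            logger.exception("ERROR %s raised %r", func.__name__, exc)
            raise
    return wrapper

@auto_log
def handler(event, context=None):
    # A toy serverless-style handler; the event schema is made up for this example.
    return {"status": 200, "items": len(event.get("records", []))}

if __name__ == "__main__":
    handler({"records": [1, 2, 3]})
```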

Waiting policies for cloud-enabled schedulers determine whether submitted jobs should queue to wait for reserved “fixed” resources or run on dynamically acquired on-demand resources. Prior work has found that forcing jobs with long runtimes and short expected waiting times to queue for fixed resources is an optimal waiting policy. Unfortunately, predicting a job’s runtime and expected waiting time at submission is challenging. To mitigate this, prior work found that a naïve “speculative execution” approach, in which every submitted job first runs on on-demand resources for a short period, yielded near-optimal performance. However, that approach was only evaluated on datasets in which the majority of jobs were short. We propose a custom approach that extends this work, combining ML runtime models, ML waiting-time models, and a modified form of speculative execution into an alternative waiting-policy implementation for workloads in which most jobs are not short. We evaluate our approach on two workload traces and find that it can simultaneously reduce both cost and waiting time by 4% compared to an existing technique.
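
As a rough illustration, the sketch below combines the ingredients described above, ML runtime and waiting-time predictions plus a short speculation window, into a toy routing decision. The window length, the 0.1 ratio threshold, and the Job fields are hypothetical placeholders rather than the evaluated policy.

```python
from dataclasses import dataclass

# Hypothetical tuning knob; the real policy's thresholds come from the workload traces.
SPECULATION_WINDOW_S = 300  # run every job on on-demand resources for this long first

@dataclass
class Job:
    job_id: str
    predicted_runtime_s: float   # from an ML runtime model (assumed available)
    predicted_wait_s: float      # from an ML waiting-time model (assumed available)

def waiting_policy(job: Job, elapsed_on_demand_s: float) -> str:
    """Sketch of a modified speculative-execution policy.

    Every job speculates on on-demand resources for a short window. Jobs that
    outlive the window are routed using the ML predictions: long jobs with short
    expected waits queue for fixed resources; everything else stays on demand.
    """
    if elapsed_on_demand_s < SPECULATION_WINDOW_S:
        return "on_demand"  # still speculating; short jobs finish here cheaply
    remaining = max(job.predicted_runtime_s - elapsed_on_demand_s, 0.0)
    # Queue for fixed resources only when the predicted wait is small relative
    # to the remaining work (0.1 is an illustrative ratio, not a tuned value).
    if job.predicted_wait_s < 0.1 * remaining:
        return "queue_for_fixed"
    return "on_demand"

if __name__ == "__main__":
    job = Job("j1", predicted_runtime_s=7200, predicted_wait_s=120)
    print(waiting_policy(job, elapsed_on_demand_s=400))  # -> queue_for_fixed
```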

Multimodal Travel Optimization: finding and optimizing multimodal travel options under given constraints, including time, budget, and destination features.
Utilizing NLP tools, including large language models (LLMs), to optimize text for live presentations and recordings.

Selected Past Research Projects

There has been considerable growth and interest in industrial applications of machine learning (ML) in recent years. ML engineers are consequently in high demand across the industry, yet improving their efficiency remains a fundamental challenge. Automated machine learning (AutoML) has emerged as a way to save time and effort on repetitive tasks in ML pipelines, such as data pre-processing, feature engineering, model selection, hyperparameter optimization, and prediction result analysis. In this paper, we investigate the current state of AutoML tools aiming to automate these tasks. We evaluate the tools on a range of datasets and data segments, examine their performance, and compare their advantages and disadvantages across different test cases.
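
For context, the sketch below spells out the kind of hand-written pipeline steps (pre-processing, model selection, hyperparameter optimization) that AutoML tools aim to automate, using plain scikit-learn as a stand-in rather than any specific tool from the study.

```python
# Illustrates the pipeline steps AutoML tools automate, written by hand with
# scikit-learn; an AutoML tool would search over these choices automatically.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-processing + model choice, fixed manually here.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(random_state=0)),
])

# Hyperparameter optimization via an explicit grid search.
search = GridSearchCV(
    pipeline,
    param_grid={"model__n_estimators": [50, 100], "model__max_depth": [3, None]},
    cv=5,
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```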

Streaming data processing has been gaining attention due to its applicability to a wide range of scenarios. To serve the booming demand for streaming data processing, many computation engines have been developed. However, there is still a lack of real-world benchmarks to guide the choice of the most appropriate platform for serving real-time streaming needs. To address this problem, we developed a streaming benchmark for three representative computation engines: Flink, Storm, and Spark Streaming. Instead of testing speed-of-light event processing, we construct a full data pipeline using Kafka and Redis to more closely mimic real-world production scenarios. Based on our experiments, we provide a performance comparison of the three engines in terms of 99th-percentile latency and throughput across various configurations.
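
The sketch below shows, in heavily simplified form, the shape of such a pipeline's sink side: consume events from Kafka, update Redis, and compute 99th-percentile latency. The topic name, event schema, and Redis keys are invented for illustration, and this is not the benchmark's actual code; it assumes the kafka-python, redis, and numpy packages and a locally running Kafka broker and Redis server.

```python
import json
import time

import numpy as np
import redis
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                                  # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
store = redis.Redis(host="localhost", port=6379)

latencies_ms = []
for message in consumer:
    event = message.value                      # e.g. {"campaign": "...", "event_time": ...}
    # "Processing": count events per campaign in Redis, mimicking the pipeline's sink.
    store.hincrby("campaign_counts", event.get("campaign", "unknown"), 1)
    # End-to-end latency from the event's creation timestamp (ms) to now.
    latencies_ms.append(time.time() * 1000 - event["event_time"])
    if len(latencies_ms) >= 10_000:
        break

print("p99 latency (ms):", np.percentile(latencies_ms, 99))
print("events processed:", len(latencies_ms))
```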

As the use of Apache Storm, a distributed real-time computation platform, becomes more widespread in the community, scalability of the platform has become a key challenge. Currently, Apache Storm leverages Apache ZooKeeper as a message broker for heartbeats and metrics. Using ZooKeeper in this manner may have been a good implementation choice at the inception of Apache Storm, since it reuses an existing, proven open-source technology instead of requiring a proprietary message broker for Storm. However, using ZooKeeper in this fashion has created one of the biggest bottlenecks to scaling Storm. In this paper, we first explore the key scalability issues that this architecture creates as cluster and workload sizes grow at Yahoo. Next, we discuss in detail a new heartbeat and metrics service, called PaceMaker, that addresses this scalability problem. Finally, we present experimental results demonstrating that PaceMaker greatly relieves the workload and resource pressure on the ZooKeeper nodes, which enables Storm to scale much further.
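
To illustrate the design intuition, the toy sketch below models heartbeats as overwrite-only, in-memory state with a staleness check, the kind of workload that does not need ZooKeeper's consensus and persistence; it is not PaceMaker's actual protocol or wire format.

```python
import threading
import time

class HeartbeatStore:
    """Toy in-memory heartbeat store: frequent, overwrite-only writes, no consensus."""

    def __init__(self):
        self._lock = threading.Lock()
        self._beats = {}                     # worker_id -> (payload, last_seen)

    def record(self, worker_id: str, payload: bytes) -> None:
        # Overwrite-only write path: no log, no quorum, just a dict update.
        with self._lock:
            self._beats[worker_id] = (payload, time.monotonic())

    def stale_workers(self, timeout_s: float = 30.0) -> list[str]:
        # A scheduler would poll this to find workers that stopped heartbeating.
        now = time.monotonic()
        with self._lock:
            return [w for w, (_, seen) in self._beats.items() if now - seen > timeout_s]

if __name__ == "__main__":
    store = HeartbeatStore()
    store.record("worker-1", b"metrics-blob")
    print(store.stale_workers(timeout_s=0.0))   # -> ['worker-1'] because the timeout is zero
```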

The advent of next-generation gene sequencing machines has led to computationally intensive alignment problems that can take many hours on a modern computer. Given the rapidly increasing rate at which new short sequences are produced, the large number of existing sequences, and inaccuracies in the sequencing machines, short-sequence alignment has become a major challenge in high-performance computing.

In practice, both gaps and mismatches occur in genomic sequences, resulting in an edit-distance problem. In this paper, we describe the design of a distributed filter, based on shifted masks, that quickly reduces the number of potential matches in the presence of gaps and mismatches. Furthermore, we present a hybrid dynamic-programming method, optimized for GPGPU targets, to process the filter outputs and find the exact number of insertions, deletions, and mismatches. Finally, we present results from experiments performed on an NCSA cluster of 128 GPU units using the Hadoop framework.
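
For reference, the sketch below shows the classic edit-distance recurrence that underlies the alignment step, counting insertions, deletions, and mismatches on the CPU; the paper's method is a GPGPU-optimized hybrid applied only to candidates that pass the mask-based filter.

```python
def edit_distance(read: str, reference: str) -> int:
    """Levenshtein-style alignment cost: insertions, deletions, and mismatches."""
    n, m = len(read), len(reference)
    # dp[i][j] = minimum edits to align read[:i] against reference[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                       # i deletions from the read
    for j in range(m + 1):
        dp[0][j] = j                       # j insertions into the read
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if read[i - 1] == reference[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,          # deletion (gap in the reference)
                dp[i][j - 1] + 1,          # insertion (gap in the read)
                dp[i - 1][j - 1] + mismatch,
            )
    return dp[n][m]

if __name__ == "__main__":
    print(edit_distance("ACGT", "AGGTT"))  # -> 2 (one mismatch, one insertion)
```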

Iterative-convergence algorithms are frequently used in a variety of domains to build models from large data sets. Cluster implementations of these algorithms are commonly realized using parallel programming models such as MapReduce. However, these implementations suffer from significant performance bottlenecks, especially due to the large volumes of network traffic generated by intermediate data and model updates during the iterations. To address these challenges, we propose partitioned iterative convergence (PIC), a new approach to programming and executing iterative-convergence algorithms on frameworks like MapReduce. In PIC, we execute the iterative-convergence computation in two phases: the best-effort phase, which quickly produces a good initial model, and the top-off phase, which further refines this model to produce the final solution. The best-effort phase iteratively performs the following steps: (a) partition the input data and the model to create several smaller model-building sub-problems, (b) independently solve these sub-problems using iterative-convergence computations, and (c) merge the solutions of the sub-problems to create the next version of the model. This partitioned, loosely coupled execution produces a model of good quality while drastically reducing the network traffic due to intermediate data and model updates. The top-off phase further refines this model by running the original iterative-convergence computation on the entire (un-partitioned) problem until convergence; because the number of top-off iterations is quite small, the overall result is a significant improvement in performance. We have implemented a library for PIC on top of the Hadoop MapReduce framework and evaluated it using five popular iterative-convergence algorithms (PageRank, K-Means clustering, neural network training, a linear equation solver, and image smoothing). Our evaluations on clusters ranging from 6 to 256 nodes demonstrate a 2.5X-4X speedup compared to conventional implementations using Hadoop.
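
As a single-machine illustration of the PIC structure, the sketch below applies the two phases to K-Means: solve the clustering sub-problems per partition (best-effort), merge the local centroids, then run a few top-off iterations on the full data. The partition count, iteration limits, and use of scikit-learn are illustrative assumptions; the paper's implementation runs on Hadoop MapReduce.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(6000, 2)) + rng.choice([-5, 0, 5], size=(6000, 1))
k, num_partitions = 3, 4

# Best-effort phase: solve K-Means independently on each data partition ...
partitions = np.array_split(X, num_partitions)
local_centroids = [
    KMeans(n_clusters=k, n_init=3, random_state=0).fit(part).cluster_centers_
    for part in partitions
]
# ... then merge the sub-problem solutions by clustering the local centroids.
merged = KMeans(n_clusters=k, n_init=3, random_state=0).fit(
    np.vstack(local_centroids)
).cluster_centers_

# Top-off phase: a few iterations of the original algorithm on the whole data set,
# started from the merged model, refine it into the final solution.
final = KMeans(n_clusters=k, init=merged, n_init=1, max_iter=5).fit(X)
print("final centroids:\n", final.cluster_centers_)
```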

