PhD thesis defense to be held on May 2, 2023, at 16:00 (NTUA Administration Building)

Picture Credit: Konstantina Koliogeorgi

Thesis title: Hardware acceleration techniques for Computation and Data Intensive Machine Learning and Bioinformatic Applications

Abstract: In this thesis, we focus on the hardware acceleration of two representative applications of modern healthcare: an ML-based prediction analysis and Read Alignment of genomic data. Both fields have experienced intense growth in recent decades and generate immense amounts of raw data.
Creating value and making decisions based on these data has proved to be a challenging task, as both the datasets and the computational intensity of the algorithms continue to grow. To cope with this issue, high-performance techniques such as hardware acceleration have been examined.
There is a surge of work that leverages different programming models and frameworks to develop efficient FPGA-based accelerators, thanks to the bit-level customization capabilities of these devices.
However, the frameworks available for programming such devices cannot always fully exploit the acceleration potential of the applications in a straightforward way.
Furthermore, for complex applications, existing solutions take a narrow view of real integration aspects, such as system-wide communication and accelerator call overheads. The core contribution of this research work is the delivery of efficient solutions through strategic exploration of the design space and the synergy of hardware and software code modifications.
The first application that this thesis examines is the efficient hardware acceleration of Support Vector Machine (SVM) classifiers. SVMs have played a crucial role in providing data fusion and high-accuracy classification solutions for various complex, non-linear problems. In this thesis, we explore an application in which SVM hardware co-processors perform classification for ECG signal arrhythmia detection. The proposed methodology for accelerating the SVM has been implemented as a framework on top of the state-of-the-art Vivado High-Level Synthesis (HLS) tool. We propose a systematic two-level approach for SVM acceleration. The first level optimizes the global structure of the original SVM's behavioral description to assist the tool in inferring the inherent data- and instruction-level parallelism of the algorithm. The second level further refines the design through a targeted design exploration that matches the accelerator's memory architecture to its computation and memory access patterns.
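For readers unfamiliar with SVM inference, the following is a minimal software sketch of the classification step that such an accelerator would speed up. It is an illustration only and not the thesis implementation: the RBF kernel, the function names and the parameter names (`gamma`, `dual_coefs`, `bias`) are assumptions, since the abstract does not specify the kernel or the data layout. The loop over support vectors consists of independent kernel evaluations followed by a sum, which is the kind of data-level parallelism the abstract refers to.

```python
import math

def rbf_kernel(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2); the kernel choice is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Decision value f(x) = bias + sum_i coef_i * K(sv_i, x).

    Each kernel evaluation is independent of the others, so the loop
    can be unrolled or pipelined when mapped to hardware.
    """
    acc = bias
    for sv, coef in zip(support_vectors, dual_coefs):
        acc += coef * rbf_kernel(sv, x, gamma)
    return acc

def svm_classify(x, support_vectors, dual_coefs, bias, gamma):
    """Binary label: +1 if the decision value is non-negative, else -1."""
    return 1 if svm_decision(x, support_vectors, dual_coefs, bias, gamma) >= 0 else -1

# Toy model with two hypothetical support vectors, one per class.
toy_svs = [[0.0, 0.0], [2.0, 2.0]]
toy_coefs = [1.0, -1.0]   # each entry is (alpha_i * y_i)
toy_bias = 0.0
toy_gamma = 0.5
```

With this toy model, a point near the first support vector is labelled +1 and a point near the second is labelled -1.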
In the second part of the thesis, we study the effect of acceleration techniques on one of the major bottlenecks of a typical genomic pipeline, which is short read alignment.
In our study, we perform extensive profiling on a popular aligner and identify the bottleneck within alignment as the Smith-Waterman string-matching algorithm. Our approach is to provide a dataflow implementation for this task that targets FPGA devices, taking into account the implications of integrating the accelerator into the original software tool. We therefore present GANDAFL, a novel genome alignment dataflow architecture for the Smith-Waterman Matrix-fill and Traceback stages that performs high-throughput short-read alignment on Next Generation Sequencing data. We then propose a radical software restructuring of the widely used Bowtie2 aligner that implements an aggregation-batching strategy and feeds the accelerator in a high-throughput streaming fashion with minimized transfer and call overheads.
The standalone solution delivers up to 116× and 2× speedup over state-of-the-art software and hardware accelerators, respectively, and the GANDAFL-enhanced Bowtie2 aligner delivers a 1.9× speedup.
We also examine an alternative approach to accelerating short read alignment.
We introduce a high-throughput alignment system that combines Banded Smith-Waterman accelerators with pre-filtering, following a profile-driven accelerator methodology.
Extensive profiling of genomic datasets reveals low edit thresholds that can be leveraged by a heuristic variant of Smith-Waterman, namely Banded Smith-Waterman, to create resource-efficient accelerators that are customized to the edit profile of the input.
We therefore design and deliver a highly optimized dataflow implementation for Banded Smith-Waterman seed-extension targeting FPGA devices, which is leveraged within a multi-dataflow accelerated system.
The multi-dataflow system covers the full range of edits and therefore achieves both high-throughput and high-accuracy alignment. The evaluation shows that the proposed Banded Smith-Waterman accelerator delivers a 34× speedup over state-of-the-art software aligners, and 1.53× and 3× over state-of-the-art dataflow and RTL Smith-Waterman accelerators, respectively.
The multi-dataflow system delivers an average speedup of 1.8× over state-of-the-art multi-accelerator FPGA solutions that employ generic and input-agnostic accelerators.
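As a rough software illustration of the banded heuristic described above, the sketch below fills only the Smith-Waterman cells that lie within a fixed distance `band` of the main diagonal and returns the best local alignment score. It is not the accelerator design from the thesis: the linear gap penalty, the scoring values and the function name are assumptions chosen for brevity (Traceback is omitted). The point it shows is that a narrow band, sufficient when reads contain few edits, reduces the work per read from roughly n·m cells to about n·(2·band + 1).

```python
def banded_smith_waterman(query, ref, band, match=2, mismatch=-1, gap=2):
    """Best local alignment score, computing only cells with |i - j| <= band.

    Scoring values and the linear gap penalty are illustrative assumptions.
    Cells outside the band keep their initial value of 0, which in local
    alignment with a positive gap penalty can never raise a neighbour's score.
    """
    n, m = len(query), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        lo = max(1, i - band)   # leftmost column inside the band
        hi = min(m, i + band)   # rightmost column inside the band
        for j in range(lo, hi + 1):
            s = match if query[i - 1] == ref[j - 1] else mismatch
            score = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # match or mismatch
                H[i - 1][j] - gap,    # gap in the reference
                H[i][j - 1] - gap,    # gap in the query
            )
            H[i][j] = score
            best = max(best, score)
    return best
```

In this sketch, an alignment whose offset from the diagonal exceeds `band` is simply not found, which is why a system built on banded accelerators needs a fallback for inputs with more edits.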

Supervisor: Professor D. Soudris

PhD Student: Konstantina Koliogeorgi