PhD thesis defense to be held on September 2, 2024, at 14:30 (Virtually)
Picture Credit: Dimitrios Danopoulos
Thesis title: Hardware-Software Co-Design of Deep Learning Accelerators: From Custom to Automated Design Methodologies
Abstract: In the past few years, Deep Learning (DL), a subfield of Artificial Intelligence (AI), has achieved remarkable success across a wide spectrum of applications, such as computer vision and autonomous systems. It has emerged as one of the most powerful and accurate techniques, often employing Deep Neural Networks (DNNs) that frequently surpass human performance. However, ongoing AI research relies heavily on high-performance systems to handle the vast amounts of computation and data involved. Advances in technology and architecture have led to the integration of co-processors such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). These hardware accelerators play a crucial role in speeding up AI workloads and have facilitated both the development and the deployment of increasingly sophisticated AI models. Meeting the substantial computational demands of AI algorithms with such specialized hardware, however, requires significant engineering effort to achieve optimal performance and efficiency. Moreover, the current state of the art leverages approximate computing, an approach that relaxes computational precision in exchange for additional efficiency gains. Evaluating the accuracy of approximate DNNs is cumbersome, though, due to the lack of adequate support for approximate arithmetic in Deep Learning frameworks such as PyTorch or TensorFlow. In this dissertation, we first employ FPGAs as accelerators for a wide range of AI applications and present various strategies and techniques to improve the efficiency and performance of these applications. We also examine automated frameworks that convert trained neural network models into optimized FPGA firmware, alleviating the hardware development challenges engineers face in the process. Additionally, we investigate the use of approximate computing to exploit the intrinsic error resilience of DNN models. We address the challenge of evaluating approximate DNNs by introducing two frameworks, AdaPT and TransAxx. These frameworks, built on PyTorch and accelerated on CPU and GPU hardware respectively, enable approximate inference and approximation-aware retraining for a variety of DNN models. Finally, we propose a hardware-driven Monte Carlo Tree Search (MCTS) algorithm that efficiently explores the space of possible approximate configurations for Vision Transformer (ViT) models, achieving significant trade-offs between accuracy and power.
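To give a flavor of what approximation-aware evaluation in PyTorch involves, the sketch below emulates an approximate multiplier inside a standard layer. This is only a minimal, hypothetical illustration: the names ApproxLinear and rel_error are invented here, and the dissertation's AdaPT and TransAxx frameworks use real approximate-arithmetic models with CPU/GPU acceleration rather than the noise-based emulation shown.

import torch
import torch.nn as nn

class ApproxLinear(nn.Module):
    """Hypothetical linear layer that emulates an approximate multiplier.

    AdaPT/TransAxx plug in actual approximate-arithmetic models; here the
    approximation is emulated as small multiplicative noise on the exact
    result, purely for illustration.
    """
    def __init__(self, in_features: int, out_features: int, rel_error: float = 0.01):
        super().__init__()
        self.exact = nn.Linear(in_features, out_features)
        self.rel_error = rel_error  # assumed relative error of the multiplier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.exact(x)  # exact computation
        # Inject multiplicative noise to mimic approximate products
        return y * (1.0 + self.rel_error * torch.randn_like(y))

# Usage: observe how the emulated approximation perturbs a model's output
layer = ApproxLinear(16, 4, rel_error=0.02)
x = torch.randn(8, 16)
with torch.no_grad():
    print(layer(x))

Because such a layer remains differentiable, the same idea supports approximation-aware retraining: the model can be fine-tuned while the (emulated) approximate arithmetic is active, recovering accuracy lost to the approximation.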
Supervisor: Professor Dimitrios Soudris
PhD Student: Dimitrios Danopoulos