[Read Paper] FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

Develop DNNs and the corresponding FPGA accelerators simultaneously. DNN designs should be FPGA-architecture driven, and FPGA accelerators should be DNN-aware.

Contributions

  • A simultaneous FPGA/DNN co-design methodology with:
    • hardware-oriented DNN model design following a bottom-up approach;
    • DNN-driven FPGA accelerator design following a top-down approach.
  • For DNN model design, we introduce a DNN template to guide DNN generation with predictable performance and resource utilization, which greatly reduces the co-design search space. Based on this template, an automatic DNN model search engine, Auto-DNN, is proposed to effectively explore the design space and generate DNN models that meet the desired quality of results (QoR). (A toy sketch of the template idea follows this list.)
  • For FPGA accelerator design, we introduce a fine-grained tile-based pipeline architecture, which supports arbitrary DNNs generated by Auto-DNN using a library of highly optimized HLS IPs. Based on this architecture, an automatic HLS generator, Auto-HLS, is proposed to directly generate synthesizable C code for the DNN models, conduct latency/resource estimation, and produce the FPGA accelerator.
  • We demonstrate our co-design approach on an object detection task targeting a PYNQ-Z1 embedded FPGA. DNN models are searched for and mapped to the board, achieving state-of-the-art accuracy, speed, and power efficiency.
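To make the template idea concrete, here is a minimal Python sketch (all names, numbers, and layer strings are hypothetical, not from the paper's code): a Bundle is a small, replicable block of layers whose latency and resource cost are pre-characterized, so a DNN stacked out of Bundles has predictable performance by construction.

```python
from dataclasses import dataclass

# Hypothetical sketch of the "Bundle" template idea: a fixed sequence of
# layer types whose latency/resource cost on the FPGA is pre-characterized,
# so any DNN built by stacking Bundles has predictable performance.

@dataclass
class Bundle:
    layers: tuple            # e.g. ("conv3x3", "bn", "relu")
    latency_per_call: float  # pre-characterized cycles per invocation
    dsp_cost: int            # DSPs consumed by the IPs this Bundle needs

def build_dnn(bundle, replications, downsample_after):
    """Stack `replications` copies of the Bundle, inserting 2x2 pooling
    after the positions listed in `downsample_after`."""
    dnn = []
    for i in range(replications):
        dnn.extend(bundle.layers)
        if i in downsample_after:
            dnn.append("pool2x2")
    return dnn

def estimate_latency(bundle, replications):
    # Every stage reuses the same pre-characterized Bundle, so total
    # latency (ignoring pooling) is predictable by construction.
    return bundle.latency_per_call * replications

conv_bundle = Bundle(("conv3x3", "bn", "relu"), latency_per_call=1.2e5, dsp_cost=96)
print(build_dnn(conv_bundle, replications=4, downsample_after={1, 3}))
print(estimate_latency(conv_bundle, replications=4))
```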

Some Knowledge About DNN

  • DNN design is conducted either manually by machine learning experts or automatically by Neural Architecture Search (NAS), e.g., using recurrent neural network (RNN) controllers and reinforcement learning.
  • Quantization and model compression are used to reduce DNN model size (a toy quantization sketch follows this list).
  • Latency-directed resource allocation and fine-grained pipeline architectures have been proposed to deliver low latency during DNN inference.
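As a toy illustration of the quantization point, here is a generic uniform int8 scheme (a common approach, not necessarily the exact scheme used in the paper):

```python
import numpy as np

def quantize_int8(w):
    """Uniform symmetric quantization: float32 -> int8 plus one scale,
    shrinking storage roughly 4x versus float32."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-8)
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
print("bytes: %d -> %d" % (w.nbytes, q.nbytes))                  # ~4x smaller
print("max abs error:", np.abs(dequantize(q, s) - w).max())
```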

FPGA/DNN Co-Design

Design Space

  • DNN design
    • the number and types of layers
    • the number of input/output channels
    • residual connections
    • concatenations
  • FPGA accelerator
    • IP instance categories
    • IP reuse strategies
    • quantization schemes
    • parallel factors
    • data transfer behaviors
    • buffer sizes

For the FPGA accelerator, an IP-based design strategy is used. Each IP implements a basic DNN layer type (e.g., Conv, Pooling) and must be instantiated and configured whenever the DNN model contains layers of that type.
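A minimal sketch of this IP-based strategy, with hypothetical class and field names: scan the DNN and instantiate one configurable IP per layer type that actually appears, reusing that instance across all layers of the type instead of building one circuit per layer.

```python
from dataclasses import dataclass

@dataclass
class IPInstance:
    layer_type: str       # the basic layer type this IP implements
    parallel_factor: int  # MACs working in parallel
    bitwidth: int         # quantization scheme this IP is built for

def instantiate_ip_pool(dnn_layers, parallel_factor=16, bitwidth=8):
    """One IP instance per layer *type* present in the DNN; every layer
    of that type reuses the same instance."""
    pool = {}
    for layer_type in dnn_layers:
        if layer_type not in pool:   # instantiate only types the DNN uses
            pool[layer_type] = IPInstance(layer_type, parallel_factor, bitwidth)
    return pool

dnn = ["conv3x3", "relu", "conv3x3", "pool2x2", "conv1x1", "relu"]
print(instantiate_ip_pool(dnn))
```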

Key Variables for FPGA/DNN Co-Design

The key variables are the DNN-side and accelerator-side choices listed above; together they form the joint search space that the co-design flow explores.

Co-Design Flow

  • Co-Design Step 1: Building block and DNN modeling. Build analytical models that capture the hardware latency and resource utilization of the DNN building blocks and of the hardware IP pool.

  • Co-Design Step 2: Building block selection.
    • Auto-DNN performs both coarse- and fine-grained evaluations of the building blocks regarding the three most important features: latency, resource utilization, and accuracy.
    • Based on the evaluation, building blocks on the Pareto curve will be selected for further DNN exploration.
  • Co-Design Step 3: Hardware-aware DNN search and update.
    • Given the selected building blocks, Auto-DNN explores DNNs under the given resource and latency constraints using stochastic coordinate descent (SCD).
    • DNNs output by SCD are passed to Auto-HLS to obtain more precise performance and resource results, which are fed back to SCD for the next update.
    • Generated DNNs that meet the performance and resource requirements are output for training and fine-tuning. (A toy sketch of the SCD loop follows this list.)
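A toy sketch of Step 3's search loop. The latency/resource models, budgets, and step sizes below are invented for illustration; in the real flow, Auto-HLS supplies the precise numbers that are fed back to SCD.

```python
import random

# Toy stochastic coordinate descent over a two-coordinate design vector
# (Bundle replications, channel width). All models/budgets are made up.

LAT_BUDGET, DSP_BUDGET = 5e6, 220  # hypothetical board limits

def latency(reps, ch):
    return reps * ch * 1e3         # toy latency model

def dsps(reps, ch):
    return ch // 2                 # toy resource model

def accuracy_proxy(reps, ch):
    return reps * ch               # stand-in for "bigger model, better QoR"

def scd_search(steps=200, seed=0):
    rng = random.Random(seed)
    design = best = (2, 32)        # initial feasible design
    for _ in range(steps):
        reps, ch = design
        # Perturb exactly ONE coordinate per step, keeping the other fixed.
        if rng.random() < 0.5:
            cand = (reps + rng.choice([-1, 1]), ch)
        else:
            cand = (reps, ch + 16 * rng.choice([-1, 1]))
        r, c = cand
        if r < 1 or c < 16:
            continue
        # Accept the move only if it stays within the latency/resource
        # budgets and does not hurt the (toy) accuracy objective.
        if latency(r, c) <= LAT_BUDGET and dsps(r, c) <= DSP_BUDGET \
                and accuracy_proxy(r, c) >= accuracy_proxy(*best):
            design = best = cand
    return best

print("selected (replications, channels):", scd_search())
```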

Four Components

The overall FPGA/DNN co-design flow consists of four components:

  • Bundle-Arch: a hardware-aware DNN template;
  • Tile-Arch: a low-latency accelerator template;
  • Auto-DNN: the DNN exploration engine, which works as the primary component and outputs DNN models;
  • Auto-HLS: the synthesizable-C-code generator, which outputs the corresponding FPGA implementations of the DNN models.