Posts By Tag
3D IC
- [Glean] 2.5D and 3D Interposer
Mar 23, 2021
Interposers are wide, extremely fast electrical signal conduits used between die in a 2.5D configuration.
3DConv
- [Glean] 3D Convolution: kernel will traverse in 3-D
Jun 12, 2020
This post introduces the 3D convolution.
3DIC
- [Read Paper] The N3XT Approach to Energy-Efficient Abundant-Data Computing
Jul 14, 2020
This paper introduces the framework of N3XT as well as its evaluation methodology. It also mentions the RRAM endurance resiliency.
ACA
- [Weekly Review] 2021/02/01-2021/02/07
Feb 07, 2021
The weekly review 2021/02/01-2021/02/07
- [Glean] Terms in the Intel Xeon E5-2600 V3 “Haswell-EP” Workstation Die Configurations
Feb 06, 2021
Some terms in Intel Xeon E5-2600 V3 Haswell-EP Workstation Die Configurations, including ACA, LLC, Cbo, SAD, QPI, IIO.
AMBA
- [Glean] AXI Bus Introduction
Apr 28, 2024
AXI Bus Introduction, including five channels: read address, read data, write address, write data, and write response, and VALID/READY Handshake.
AMD
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some Hot Chips 2019 slides and HPC materials.
ANSI
- [Glean] ANSI Escape Codes
Aug 06, 2020
ANSI escape sequences are a standard for in-band signaling to control the cursor location, color, and other options on video text terminals and terminal emulators.
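As a rough illustration of the in-band signaling described above, here is a minimal Python sketch that builds a few common sequences by hand. The helper names (`sgr`, `colorize`, `move_cursor`) are made up for this example; only the escape sequences themselves (`ESC[...m` for colors and attributes, `ESC[row;colH` for cursor position) come from the standard.

```python
ESC = "\x1b"  # the escape character that starts every sequence


def sgr(*codes: int) -> str:
    """Build a Select Graphic Rendition sequence, e.g. ESC[1;32m."""
    return f"{ESC}[{';'.join(str(c) for c in codes)}m"


def colorize(text: str, color_code: int) -> str:
    """Wrap text in a color code and reset attributes (code 0) afterwards."""
    return f"{sgr(color_code)}{text}{sgr(0)}"


def move_cursor(row: int, col: int) -> str:
    """Build the cursor-position sequence (1-based row and column)."""
    return f"{ESC}[{row};{col}H"
```

For example, `print(colorize("error", 31))` should show the word in red on a terminal that honors these codes; other terminals will print the raw characters.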
APR
- [Glean] Library Formats: CCS, ECSM, and NLDM
Apr 03, 2023
This article, generated by ChatGPT4, provides an overview of three common library formats used in the design and analysis of digital circuits: Composite Current Source (CCS), Effective Current Source Model (ECSM), and Non-Linear Delay Model (NLDM).
ASIC
- [Glean] Five Steps to Make an ASIC for Algorithm X
Apr 19, 2021
Five Steps to Make an ASIC for Algorithm X: Math first, Optimization Target, Hardware-Software Boundary, Building Blocks, Physical Implementation
AXI
- [Glean] AXI Bus Introduction
Apr 28, 2024
AXI Bus Introduction, including five channels: read address, read data, write address, write data, and write response, and VALID/READY Handshake.
Adapters
- [Emulate] Refinement of Computation and Communication
Jun 19, 2020
This post introduces Refinement of Computation and Communication in SystemC. It covers different kinds of communication refinement, such as channel refinement, module refinement, hw-hw refinement, sw-sw refinement, and hw-sw refinement, as well as the steps in communication refinement.
Agile
- [Glean] Computer Architectures for Next Generation Applications
Jan 18, 2021
This post is mainly translated from a Zhihu answer by Bao Yungang. It introduces three laws (Moore's law, Makimoto's wave, and Bell's law) along with design methods and optimizations for performance, power, and the fragmented requirements of the AIoT age. These methods include reducing data movement, reducing data precision, improving parallelism, and agile hardware development.
- [Glean] Impact Map
Sep 17, 2020
The introduction of impact map and how to create an impact map.
- [Weekly Review] 2020/04/20-26
Apr 26, 2020
This weekly review includes an introduction to SystemC, modeling, JVM memory, and Rocket Chip's interrupt handling with PLIC and CLINT. It also covers graphs from CS61B.
Algorithm
- [Read Paper] An efficient kernel transformation architecture for binary- and ternary-weight neural network inference
Mar 08, 2020
The initial kernels are transformed into much fewer and sparser ones, and the output feature maps are rebuilt from the immediate results. Overall, the number of total operations in convolution is reduced.
- [Read Paper] BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU
Feb 09, 2020
A gemm-operator-network three-level optimization framework for fully exploiting the computing power of BNNs on CPU.
- [Read Paper] A Survey on Methods and Theories of Quantized Neural Networks
Jan 05, 2020
Algorithms and parallel computing
- [Glean] Algorithm DG (Directed Graph)
Dec 14, 2020
Usually, an algorithm is graphically represented as a DG to illustrate the data dependencies among the algorithm tasks.
- [Glean] Classifying Algorithms Based On Task Dependences
Dec 14, 2020
Algorithms can be broadly classified based on task dependences: Serial algorithms/Parallel algorithms/Serial–parallel algorithms (SPAs)/Nonserial–parallel algorithms (NSPAs)/Regular iterative algorithms (RIAs).
AllReduce
- [Glean] All-Reduce Operations
Jun 28, 2020
All-reduce is one kind of collective operation in the NCCL and MPI libraries.
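To make the semantics concrete, here is a small single-process sketch of what an all-reduce with a sum operator computes. It is not the NCCL or MPI API: `all_reduce_sum` is a hypothetical helper, and each inner list stands in for the buffer held by one rank.

```python
from typing import List


def all_reduce_sum(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate all-reduce (sum): every rank ends up with the
    element-wise sum of the buffers contributed by all ranks."""
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    # Reduce step: combine the i-th element across all ranks.
    reduced = [sum(b[i] for b in buffers) for i in range(length)]
    # Broadcast step: every rank receives its own copy of the result.
    return [list(reduced) for _ in buffers]
```

Real libraries reach the same result without a central copy, for example with ring or tree schedules, but the values every rank sees at the end match this sketch.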
Atomic
- [Weekly Review] 2020/05/11-17
May 17, 2020
This weekly review includes cache atomic and false sharing.
Attention
- [Read Paper] Attention Is All You Need
Jan 07, 2021
This post combines two blog posts that introduce the paper Attention Is All You Need. Its shortcomings and one improvement are also shown.
AyarLabs
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some Hot Chips 2019 slides and HPC materials.
BF16
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
BNN
- [Read Paper] An efficient kernel transformation architecture for binary- and ternary-weight neural network inference
Mar 08, 2020
The initial kernels are transformed into much fewer and sparser ones, and the output feature maps are rebuilt from the immediate results. Overall, the number of total operations in convolution is reduced.
- [Read Paper] BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU
Feb 09, 2020
A gemm-operator-network three-level optimization framework for fully exploiting the computing power of BNNs on CPU.
Bell
- [Glean] Computer Architectures for Next Generation Applications
Jan 18, 2021
This post is mainly translated from a Zhihu answer by Bao Yungang. It introduces three laws (Moore's law, Makimoto's wave, and Bell's law) along with design methods and optimizations for performance, power, and the fragmented requirements of the AIoT age. These methods include reducing data movement, reducing data precision, improving parallelism, and agile hardware development.
Benchmark
- [Survey] MLforHPC Benchmarks
Feb 16, 2020
I attached my recent survey on ML4HPC benchmarks, including three papers 1) A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning; 2) HPC AI500: A Benchmark Suite for HPC AI Systems; 3) A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning; and some other presentation slides.
BlackBox
- [Tutorial] Chisel simulation with the hierarchical BlackBox Module
Jul 23, 2023
This article introduces how to do Chisel simulation with the hierarchical BlackBox Module.
BrainFloat16
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
C++
- [Tutorial] Build SystemC Environment
Jun 18, 2020
How to Build SystemC Environment in Windows and Linux Ubuntu.
CAM
- [Glean] Content-addressable memory
Aug 28, 2020
Content-addressable memories (CAMs) are hardware search engines that are much faster than algorithmic approaches for search-intensive applications. CAMs are composed of conventional semiconductor memory (usually SRAM) with added comparison circuitry that enables a search operation to complete in a single clock cycle. The two most common search-intensive tasks that use CAMs are packet forwarding and packet classification in Internet routers. This post introduces CAM architecture and circuits by first describing the application of address lookup in Internet routers, and then describes how to implement this lookup function with CAM.
CCS
- [Glean] Library Formats: CCS, ECSM, and NLDM
Apr 03, 2023
This article, generated by ChatGPT4, provides an overview of three common library formats used in the design and analysis of digital circuits: Composite Current Source (CCS), Effective Current Source Model (ECSM), and Non-Linear Delay Model (NLDM).
CI
- [Tutorial] Quick Debug and Run Test on Chisel Repos based on CI Flow Files
Feb 28, 2023
This tutorial introduces a quick way to debug code in Chisel environments such as Chisel3, playground, and Rocket Chip. The method can also be used for other repos.
- [Weekly Review] 2020/03/30-04/05
Apr 05, 2020
Contains BDD, TDD, CI and CS61B. Plus the ICR in Chisel.
CLINT
- [Weekly Review] 2020/04/20-26
Apr 26, 2020
This weekly review includes an introduction to SystemC, modeling, JVM memory, and Rocket Chip's interrupt handling with PLIC and CLINT. It also covers graphs from CS61B.
CLion
- [Tutorial] Build SystemC Environment
Jun 18, 2020
How to Build SystemC Environment in Windows and Linux Ubuntu.
CNN
- [Survey] GEMM, Strassen and Winograd Fast Convolution Algorithms
Aug 21, 2021
This blog surveys the papers that optimise the convolution by GEMM, Strassen and Winograd algorithms.
- [Read Paper] Fast Algorithms for Convolutional Neural Networks
Aug 19, 2021
Winograd's minimal filtering algorithms compute minimal complexity convolution over small tiles, which makes them fast with small filters and small batch sizes. However, this paper covers only stride 1.
- [Read Paper] Minimizing Computation in Convolutional Neural Networks
Aug 18, 2021
The Strassen algorithm can compute a 2x2 matrix multiplication using only 7 multiplications.
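The seven products can be written out directly. The sketch below uses the standard Strassen formulation for 2x2 inputs given as nested lists; the function name `strassen_2x2` is chosen for this example only.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices with 7 multiplications instead of 8."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven Strassen products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine the products using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]
```

The saving comes at the cost of extra additions (18 here versus 4 in the naive method), so the benefit shows up mainly when the entries are themselves large sub-matrices and the scheme is applied recursively.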
CPP
- [CodeStudy] Using C++ Lib Chrono to record execution time
Aug 26, 2020
Chrono is a flexible collection of types that track time with varying degrees of precision.
- [Glean] ANSI Escape Codes
Aug 06, 2020
ANSI escape sequences are a standard for in-band signaling to control the cursor location, color, and other options on video text terminals and terminal emulators.
- [CodeStudy] Undefined reference to one function in CPP
Aug 04, 2020
This post shows an error caused by separating the declaration and implementation of a template class.
CRV
- [Survey] Current Verification Methods And Their Limited Situations
Jan 11, 2021
This post introduces the current verification methods, their steps, and their limitations, including formal verification, constrained random verification (CRV), and hardware-software co-verification using a virtual platform with hardware emulation and acceleration.
CS61B
- [Weekly Review] 2020/04/27-05/03
May 03, 2020
This weekly review contains spanning trees, A*, Prim's algorithm, Kruskal's algorithm, MST, dynamic programming, and LIS. It also introduces some basic cache terms, such as offset, cache line, way, and cache thrashing.
- [Weekly Review] 2020/04/13-19
Apr 19, 2020
Contains CS61B binary search trees, red-black trees, hashing, and heaps; three methods to dump VCD files (waveforms) in Chisel Testers2; the first two generations of verification and the coming third generation; plus the definitions of simulator, emulation, and formal verification.
- [Weekly Review] 2020/04/06-12
Apr 12, 2020
Contains cutset retiming, the Zen of Python, and some knowledge from CS61B.
- [Weekly Review] 2020/03/30-04/05
Apr 05, 2020
Contains BDD, TDD, CI and CS61B. Plus the ICR in Chisel.
CSC
- [Read Paper] EIE: Efficient Inference Engine on Compressed Deep Neural Network
Jan 12, 2020
Han Song's paper: EIE using CSC data format
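For readers unfamiliar with the format, here is a minimal sketch of textbook compressed sparse column (CSC) storage. It is only an illustration of the general idea; EIE's actual encoding may differ in its details, and `dense_to_csc` is a hypothetical helper name.

```python
def dense_to_csc(matrix):
    """Convert a dense row-major matrix (list of rows) to CSC arrays.

    Returns (values, row_indices, col_ptr), where the non-zeros of
    column c are values[col_ptr[c]:col_ptr[c + 1]].
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    values, row_indices, col_ptr = [], [], [0]
    for c in range(n_cols):
        for r in range(n_rows):
            v = matrix[r][c]
            if v != 0:
                values.append(v)       # non-zero value
                row_indices.append(r)  # row where it lives
        col_ptr.append(len(values))    # end of column c in `values`
    return values, row_indices, col_ptr
```

Storing only non-zeros, column by column, lets a matrix-vector product skip both zero weights and, when an input activation is zero, the whole matching column.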
CUDA
- [Glean] CUDA Programming Model
Apr 02, 2024
This post introduces the CUDA programming model, including kernels, thread hierarchy, thread blocks, thread block clusters, and memory hierarchy.
- [Glean] CUDA C++ Maximize Memory Throughput
Apr 02, 2024
This post introduces how to maximize memory throughput in CUDA C++ programming.
- [Glean] CUDA C++ Function Execution Space
Apr 02, 2024
CUDA C++ Function Execution Space, including `__global__`, `__device__`, `__host__`, and `__host__ __device__`. A summary table is provided for comparison at the end.
CXL
- [Glean] Compute Express Link (CXL) Introduction
Jun 26, 2024
This post introduces Compute Express Link (CXL) and its benefits.
Cache
- [Glean] Content-addressable memory
Aug 28, 2020
Content-addressable memories (CAMs) are hardware search engines that are much faster than algorithmic approaches for search-intensive applications. CAMs are composed of conventional semiconductor memory (usually SRAM) with added comparison circuitry that enables a search operation to complete in a single clock cycle. The two most common search-intensive tasks that use CAMs are packet forwarding and packet classification in Internet routers. This post introduces CAM architecture and circuits by first describing the application of address lookup in Internet routers, and then describes how to implement this lookup function with CAM.
- [Weekly Review] 2020/05/04-10
May 10, 2020
This weekly review includes some knowledge related to cache indexing and tagging methods, the TLB, coherence between cache and DMA, coherence between iCache and dCache, and coherence between multiple processors.
- [Weekly Review] 2020/04/27-05/03
May 03, 2020
This weekly review contains spanning trees, A*, Prim's algorithm, Kruskal's algorithm, MST, dynamic programming, and LIS. It also introduces some basic cache terms, such as offset, cache line, way, and cache thrashing.
- [Weekly Review] 2019/12/09-15
Dec 15, 2019
This review contains some basic knowledge related to git, RISC-V, Chipyard, the RoCC interface, SHA3, and cache.
Cache Coherency
- [Glean] Compute Express Link (CXL) Introduction
Jun 26, 2024
This post introduces Compute Express Link (CXL) and its benefits.
Cadence
- [Glean] Backend Design Flow: SDC and Timing Constraints
Apr 22, 2023
- [Glean] Library Formats: CCS, ECSM, and NLDM
Apr 03, 2023
This article, generated by ChatGPT4, provides an overview of three common library formats used in the design and analysis of digital circuits: Composite Current Source (CCS), Effective Current Source Model (ECSM), and Non-Linear Delay Model (NLDM).
- [Tutorial] Obtain Objects in the collection in Genus Using Tcl
Mar 06, 2023
A collection is an extension provided by EDA vendors such as Synopsys to support lists of objects in their Tcl APIs. Most database query operations in Cadence and Synopsys tools return a collection object. Complex query operations with filters may be slow in large designs, so pre-storing the query results might reduce runtime when they are used in multiple places.
- [Tutorial] Background Execution of Reporting Commands in Cadence Genus
Mar 06, 2023
Cadence Genus supports generating reports in parallel and running them in the background. This tutorial introduces how to conditionally enable this feature using Tcl syntax.
- [Glean] Cadence Genus Synthesis Check List
Mar 06, 2023
This post lists several messages that should be checked in the Genus synthesis log file to make sure there are no errors and no mismatches between the simulation and synthesis results.
Category
- [Weekly Review] 2021/01/11-2021/01/17
Jan 17, 2021
The weekly review 2021/01/11-2021/01/17
- [Weekly Review] 2020/12/21-2021/01/10
Jan 10, 2021
The weekly review 2020/12/21-2021/01/10
- [Weekly Review] 2020/12/14-2020/12/20
Dec 20, 2020
The weekly review 2020/12/14-2020/12/20
- [Weekly Review] 2020/12/07-2020/12/13
Dec 13, 2020
The weekly review 2020/12/07-2020/12/13
- [Weekly Review] 2020/11/30-12/06
Dec 06, 2020
The weekly review 2020/11/30-12/06
- [Weekly Review] 2020/11/02-11/15
Nov 15, 2020
The weekly review from 2020/11/02 to 11/15
- [Weekly Review] 2020/10/26-11/01
Nov 01, 2020
The weekly review from 2020/10/26 to 11/01
- [Weekly Review] 2020/10/19-10/25
Oct 25, 2020
The weekly review from 2020/10/19 to 10/25
- [Weekly Review] 2020/10/12-10/18
Oct 18, 2020
The weekly review from 2020/10/12 to 10/18
- [Weekly Review] 2020/10/05-10/11
Oct 11, 2020
The weekly review from 2020/10/05 to 10/11
- [Weekly Review] 2020/09/21-10/04
Oct 04, 2020
The weekly review from 2020/09/21 to 10/04
- [Weekly Review] 2020/09/14-20
Sep 20, 2020
Last week, I finished one implementation of one circuit in Chisel. I also learned Impact Map and User Story Map during the week.
- [Weekly Review] 2020/08/31-09/13
Sep 13, 2020
Last week, I used Python to quickly verify my algorithm, and it was quite easy, especially reading from Excel and txt files. This week, I picked up Chisel and finished one submodule of the project. As this is heavy engineering work, I think I need to learn some agile methods to boost my efficiency and keep my progress on schedule.
- [Weekly Review] 2020/08/24-30
Aug 30, 2020
This week, the accelerator model successfully simulated the first layer of MobileNet V2. I also took almost two days to write my research proposal.
- [Weekly Review] 2020/08/17-23
Aug 23, 2020
This week, I still worked on the Loosely Timed TLM.
- [Weekly Review] 2020/08/10-16
Aug 16, 2020
This week, I still worked on the Loosely Timed TLM. I learned a little about the concepts of memory cell and memory structure, and spent a lot of time optimizing the memory structure. I also learned a little about the SystemC TLM quantum keeper, but didn't use it in my modelling as I didn't think I needed it to sync the time.
- [Weekly Review] 2020/08/03-09
Aug 09, 2020
This week, I still worked on the Loosely Timed TLM. This post contains some thinking about implementation and modelling, Chisel and SystemC.
- [Weekly Review] 2020/07/27-08/02
Aug 02, 2020
This week, I focused on the TLM of Arch v2 in SystemC. I only studied the Doulos tutorial and example. I also posted a note on a paper I read several weeks ago.
- [Weekly Review] 2020/07/20-26
Jul 26, 2020
This week, I read the paper related to VLIW and learned the TLM tutorial video by accellera.org. I watched one tinyML workshop video related to the optimization of computation and memory.
- [Weekly Review] 2020/07/13-19
Jul 19, 2020
This week, I read the paper related to N3XT technology and read the textbook SystemC From the Ground Up
- [Weekly Review] 2020/07/06-12
Jul 12, 2020
This week, I read several papers related to Systolic Array.
- [Weekly Review] 2020/06/29-07/05
Jul 04, 2020
This week, I read one paper and read the SystemC example project the Simple Bus.
- [Weekly Review] 2020/06/22-28
Jun 28, 2020
This week, I read two and a half papers, including two TPU papers. I also learned several terms like round-robin arbitration, unified power format, all-reduce operations.
- [Weekly Review] 2020/06/15-21
Jun 21, 2020
This week, I read two papers, one is Deep Learning Hardware: Past, Present, and Future.
- [Weekly Review] 2020/06/08-14
Jun 14, 2020
This week, I began to learn the methodology of SystemC and its syntax. Also, I read some articles related to 3D convolution.
- [Weekly Review] 2019/12/23-29
Dec 29, 2019
This review contains some problems I ran into while setting up the Chisel development environment.
Cbo
- [Weekly Review] 2021/02/01-2021/02/07
Feb 07, 2021
The weekly review 2021/02/01-2021/02/07
- [Glean] Terms in the Intel Xeon E5-2600 V3 “Haswell-EP” Workstation Die Configurations
Feb 06, 2021
Some terms in Intel Xeon E5-2600 V3 Haswell-EP Workstation Die Configurations, including ACA, LLC, Cbo, SAD, QPI, IIO.
CentOS
- [Tutorial] Updating CentOS 7 Mirror to Aliyun
Aug 20, 2024
Updating the CentOS 7 mirror to the Aliyun mirror, including updating the CentOS 7 mirror configuration file and updating the mirror cache.
- [Tutorial] Schedule Command for CentOS using `crontab`
Jul 31, 2024
Schedule command for CentOS using `crontab`, including crontab syntax, examples, output, and environment variables.
- [Tutorial] Config CentOS 8 Server in 2022
Nov 02, 2022
Config CentOS 8 Server in 2022
CentOS7
- [Tutorial] Updating CentOS 7 Mirror to Aliyun
Aug 20, 2024
Updating the CentOS 7 mirror to the Aliyun mirror, including updating the CentOS 7 mirror configuration file and updating the mirror cache.
CentOS8
- [Tutorial] Config CentOS 8 Server in 2022
Nov 02, 2022
Config CentOS 8 Server in 2022
Cerebras
- [Workshop] Hot Chips 2020 Cerebras WSE Programming
Sep 04, 2020
Hot Chips 2020, Cerebras WSE Programming
Channel
- [Emulate] Interface and Channel Design
Jun 16, 2020
This post introduces Interface and Channel Design in SystemC, including primitive and hierarchical channels.
ChatGPT
- [Tutorial] Obtain GPT Prompts.
Nov 12, 2023
This article introduces how to obtain the system prompts of the GPT, such as ChatGPT.
Chipyard
- [Tutorial] Establish Linux Environment for Chisel and Chipyard Developments
Jan 02, 2020
This tutorial will help you establish a Linux environment for Chisel and Chipyard development quickly and with few errors.
- [Weekly Review] 2019/12/09-15
Dec 15, 2019
This review contains some basic knowledge related to git, RISC-V, Chipyard, the RoCC interface, SHA3, and cache.
Chisel
- [Tutorial] Develop Chisel with Dev Container in Idea.
Oct 19, 2023
This article introduces how to develop Chisel with Dev Container in IntelliJ Idea.
- [Tutorial] Chisel simulation with the hierarchical BlackBox Module
Jul 23, 2023
This article introduces how to do Chisel simulation with the hierarchical BlackBox Module.
- [Tutorial] Quick Debug and Run Test on Chisel Repos based on CI Flow Files
Feb 28, 2023
This tutorial introduces a quick way to debug code in Chisel environments such as Chisel3, playground, and Rocket Chip. The method can also be used for other repos.
- [CodeStudy] Scala Excel Read: POI XSSF
Nov 11, 2020
In this article, I introduced how to read a workbook, a sheet, a row and a special cell. The methods to obtain the row number and column number are also given. One way to filter empty cells is introduced too.
- [CodeStudy] RocketChip Optional Bundle
Oct 08, 2020
Learned some Chisel tips via RocketChip. This post introduces how to make bundles optional.
- [CodeStudy] RocketChip MultiWidthFIFO
Sep 25, 2020
Learned some Chisel tips via RocketChip. This post covers the implementation of MultiWidthFIFO.
- [CodeStudy] Some Chisel details in the project RocketChip
Sep 24, 2020
Learned some Chisel tips via RocketChip. This post covers some implicit classes and one implementation of a Gray counter.
- [Weekly Review] 2020/04/13-19
Apr 19, 2020
Contains CS61B binary search trees, red-black trees, hashing, and heaps; three methods to dump VCD files (waveforms) in Chisel Testers2; the first two generations of verification and the coming third generation; plus the definitions of simulator, emulation, and formal verification.
- [Weekly Review] 2020/03/30-04/05
Apr 05, 2020
Contains BDD, TDD, CI and CS61B. Plus the ICR in Chisel.
- [Tutorial] Suggest Using private Before val
Apr 03, 2020
This tutorial suggests using private as a prefix of val when creating a wire or register, and mentions one possible problem when using private.
- [Weekly Review] 2020/03/23-29
Mar 29, 2020
This weekly review contains Scala intersection, union and complement, as well as ScalaDoc tags. It also introduces how to print colorful logs to the console, and an error that occurred when I used `RegInit` without giving a width to UInt.
- [CodeStudy] RocketChip Fuzzer
Mar 29, 2020
A study of the fuzzer code in RocketChip, including how to generate source IDs and how to send requests via TileLink.
- [Weekly Review] 2020/03/16-22
Mar 22, 2020
Includes Scala higher-order functions, Scala Regex, and Chisel forkwithRegion, as well as the definitions of `base address` and `offset`.
- [Tutorial] TileLink Spec
Mar 21, 2020
A study of SiFive TileLink, including TileLink buses, nodes, and their Chisel code in Chipyard.
- [Tutorial] TileLink RegMap
Mar 20, 2020
The study of TileLink TLRegMap
- [Weekly Review] 2020/03/09-15
Mar 15, 2020
Git commit types, Chisel `withRegion`, Scala `collect`, etc.
- [Weekly Review] 2020/03/02-08
Mar 08, 2020
This weekly review contains the usage of Linux `tree` and Chisel `<>` as well as `:=`. Also, DecoupledDriver.
- [Weekly Review] 2020/02/17-23
Feb 23, 2020
This week, I continued the survey of ML4HPC and found several papers from Indiana University that describe the definitions of ML4HPC and its subcategories. Also, I finished the draft implementation of the GLB cluster with some tests.
- [Weekly Review] 2020/02/10-16
Feb 16, 2020
This review contains one way to think about matrix multiplication, a Chisel class named DataMirror which can monitor the details of ports, and a discussion of how a RoCC accelerator can communicate with the L2 cache. Also, I continued my survey on AI for HPC.
- [Weekly Review] 2020/02/03-09
Feb 09, 2020
This review contains the usage of general data type in Chisel, the basic architecture of NN and the introductions of BNN and the BitFlow algorithm. Also, some materials related to HPC+ML.
- [Weekly Review] 2020/01/13-19
Jan 19, 2020
This weekly review contains Chisel syntax such as mem, Vec and mem test.
- [Weekly Review] 2020/01/06-12
Jan 12, 2020
This review contains some study notes on Chisel and Scala syntax.
Chrono
- [CodeStudy] Using C++ Lib Chrono to record execution time
Aug 26, 2020
Chrono is a flexible collection of types that track time with varying degrees of precision.
Co-Verification
- [Survey] Current Verification Methods And Their Limited Situations
Jan 11, 2021
This post introduces the current verification methods, their steps, and their limitations, including formal verification, constrained random verification (CRV), and hardware-software co-verification using a virtual platform with hardware emulation and acceleration.
CoDesign
- [Weekly Review] 2020/05/18-24
May 24, 2020
This weekly review contains some background on hardware-software co-design and ESL.
- [Read Paper] FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge
Feb 09, 2020
Develop DNNs and the corresponding FPGA accelerators simultaneously. DNN designs should be FPGA-architecture driven, and FPGA accelerators should be DNN-aware.
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some Hot Chips 2019 slides and HPC materials.
CoVerification
- [Weekly Review] 2020/05/18-24
May 24, 2020
This weekly review contains some background on hardware-software co-design and ESL.
CodeBloat
- [Glean] Code bloat
Aug 25, 2020
In computer programming, code bloat is the production of program code (source code or machine code) that is perceived as unnecessarily long, slow, or otherwise wasteful of resources. Code bloat can be caused by inadequacies in the programming language in which the code is written, the compiler used to compile it, or the programmer writing it. Thus, while code bloat generally refers to source code size (as produced by the programmer), it can be used to refer instead to the generated code size or even the binary file size.
Codesign
- [Glean] Hardware/Software Codesign
Aug 29, 2020
As the name implies, Hardware/Software Codesign (HSCD) denotes design methodologies for electronic systems that exploit the trade-offs and the synergy of Hardware (HW) and Software (SW).
Coherence
- [Weekly Review] 2020/05/04-10
May 10, 2020
This weekly review includes some knowledge related to cache indexing and tagging methods, the TLB, coherence between cache and DMA, coherence between iCache and dCache, and coherence between multiple processors.
Compression
- [Read Paper] Code compression for embedded VLIW processors using variable-to-fixed coding
Aug 26, 2020
This paper introduces a compression method that uses variable-to-fixed coding schemes, based on either Tunstall coding or arithmetic coding, to overcome the communication bottleneck between memory and CPU, especially for RISC or VLIW processors, which suffer from code size bloat compared to CISC processors.
Compute Express Link
- [Glean] Compute Express Link (CXL) Introduction
Jun 26, 2024
This post introduces Compute Express Link (CXL) and its benefits.
Computer Architecture
- [Weekly Review] 2021/01/18-2021/01/24
Jan 24, 2021
The weekly review 2021/01/18-2021/01/24
- [Glean] Turing Tax
Jan 24, 2021
Turing Tax is a term taught in the Advanced Computer Architecture course by Paul H J Kelly at Imperial College London. It describes the overhead (performance, cost, or energy) of universality in universal computing devices. It can be caused by instructions, data routing, register access, and configurable ALUs, each of which is a place where the Turing Tax can be reduced.
- [Glean] Tomasulo Algorithm
Jan 24, 2021
The Tomasulo algorithm eliminates three kinds of hazards (RAW, WAR, and WAW) by forwarding and renaming. The three stages of this algorithm are issue, execute, and write back.
Converters
- [Emulate] Refinement of Computation and Communication
Jun 19, 2020
This post introduces Refinement of Computation and Communication in SystemC. It covers different kinds of communication refinement, such as channel refinement, module refinement, hw-hw refinement, sw-sw refinement, and hw-sw refinement, as well as the steps in communication refinement.
Crontab
- [Tutorial] Schedule Command for CentOS using `crontab`
Jul 31, 2024
Schedule command for CentOS using `crontab`, including crontab syntax, examples, output, and environment variables.
DAL
- [Workshop] Hot Chips 2020 Marvell Details ThunderX3 CPUs
Sep 04, 2020
Hot Chips 2020, Marvell Details ThunderX3 CPUs
- [Workshop] Hot Chips 2020 Google TPUv2 and TPUv3
Sep 04, 2020
Hot Chips 2020, Google TPUv2 and TPUv3
DG
- [Glean] Algorithm DG (Directed Graph)
Dec 14, 2020
Usually, an algorithm is graphically represented as a DG to illustrate the data dependencies among the algorithm tasks.
DLA
- [Glean] ResNet-50 Architecture and # MACs
Apr 03, 2021
This post shows the basic architecture of ResNet-50 and its number of weights and MAC operations.
- [Workshop] Hot Chips 2020 Cerebras WSE Programming
Sep 04, 2020
Hot Chips 2020, Cerebras WSE Programming
- [Read Paper] C-LSTM Enabling Efficient LSTM using Structured Compression Techniques on FPGAs
Aug 11, 2020
This paper introduces some methods to accelerate LSTM on FPGAs. It mainly uses two methods: block-circulant matrices, and smaller coarse-grained pipelines with double buffers.
- [Read Paper] In-Datacenter Performance Analysis of a Tensor Processing Unit
Jul 31, 2020
This paper presents the first-generation TPU. It introduces the goals and architecture of the TPU and shows a performance comparison.
- [Read Paper] Deep Learning Hardware: Past, Present, and Future
Jul 05, 2020
This paper introduces the following aspects: 1) identifies trends in deep learning research that will influence hardware architectures and software platforms of the future; 2) Five DL use cases with different hardware requirements; 3) Present and Future Deep-Learning Architectures; 4) Requirements for Future DL Hardware and Software
- [Read Paper] Systolic Arrays for VLSI
Jul 04, 2020
This paper proposes new multiprocessor structures and parallel algorithms for processing some basic matrix computations which are capable of pipelining matrix computations with optimal speed-up.
- [Read Paper] A Domain-Specific Supercomputer for Training Deep Neural Networks
Jun 28, 2020
This paper introduces the TPU v2 and v3.
- [Read Paper] An efficient kernel transformation architecture for binary- and ternary-weight neural network inference
Mar 08, 2020
The initial kernels are transformed into much fewer and sparser ones, and the output feature maps are rebuilt from the immediate results. Overall, the number of total operations in convolution is reduced.
- [Read Paper] A Survey of Accelerator Architectures for Deep Neural Networks
Mar 01, 2020
A Survey of Accelerator Architectures for Deep Neural Networks.
- [Read Paper] FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge
Feb 09, 2020
Develop DNNs and the corresponding FPGA accelerators simultaneously. DNN designs should be FPGA-architecture driven, and FPGA accelerators should be DNN-aware.
- [Read Paper] BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU
Feb 09, 2020
A gemm-operator-network three-level optimization framework for fully exploiting the computing power of BNNs on CPU.
- [Read Paper] Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
Jan 19, 2020
Sze Vivienne's Paper. This article contains more details of `Eyeriss V2`.
- [Read Paper] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Jan 12, 2020
Han Song's Paper: ProxylessNAS. ProxylessNAS that can directly learn neural network architectures on the target task and target hardware without any proxy.
- [Read Paper] EIE: Efficient Inference Engine on Compressed Deep Neural Network
Jan 12, 2020
Han Song's paper: EIE using CSC data format
- [Read Paper] Design Automation for Efficient Deep Learning Computing
Jan 12, 2020
Han Song's Paper on design automation and NAS. It covers automatically designing specialized fast models, auto channel pruning, and auto mixed-precision quantization.
- [Read Paper] Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks
Dec 29, 2019
Sze Vivienne's Paper. This article describes a performance analysis framework named `Eyexam` with roofline models and a DNN accelerator named `Eyeriss v2` which uses a hierarchical NoC.
- [Read Paper] Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
Dec 29, 2019
Sze Vivienne's Paper. This article introduces the way to reduce the cost of data movement by exploiting data reuse in a multilevel memory hierarchy. Also, techniques such as compression and data-adaptive processing are introduced to save both memory bandwidth and processing power.
- [Read Paper] Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
Dec 29, 2019
Sze Vivienne's Paper. This article proposes the RS dataflow, which adapts to different CNN shape configurations and reduces all types of data movement by maximally utilizing the processing engine (PE) local storage, direct inter-PE communication and spatial parallelism.
- [Read Paper] Efficient Processing of Deep Neural Networks A Tutorial and Survey
Dec 29, 2019
This survey focuses on: processing of DNN inference, addressing the efficiency of the CONV layers.
DMA
- [Weekly Review] 2020/02/10-16
Feb 16, 2020
This review contains one way to think about matrix multiplication, a Chisel class named DataMirror that can inspect the details of ports, and a discussion of how a RoCC accelerator can communicate with the L2 cache. Also, I continued my survey on AI for HPC.
DNN
- [Glean] Grouped Convolution
May 27, 2021
Grouped convolution is a variant of convolution where the channels of the input feature map are grouped and convolution is performed independently for each group of channels. There are also visualised graphs showing both the spatial and channel domains of convolution, grouped convolution and other convolutions.
- [Read Paper] In-Datacenter Performance Analysis of a Tensor Processing Unit
Jul 31, 2020
This paper describes the first generation of the TPU. It introduces the goal and architecture of the TPU and shows a performance comparison.
- [Read Paper] TEA-DNN: the Quest for Time-Energy-Accuracy Co-optimized Deep Neural Networks
Jun 28, 2020
This paper introduces a method to optimize DNN with Pareto-optimal models and Bayesian optimization.
- [Read Paper] An efficient kernel transformation architecture for binary- and ternary-weight neural network inference
Mar 08, 2020
The initial kernels are transformed into far fewer and sparser ones, and the output feature maps are rebuilt from the intermediate results. Overall, the total number of operations in convolution is reduced.
- [Weekly Review] 2020/02/10-16
Feb 16, 2020
This review contains one way to think about matrix multiplication, a Chisel class named DataMirror that can inspect the details of ports, and a discussion of how a RoCC accelerator can communicate with the L2 cache. Also, I continued my survey on AI for HPC.
- [Read Paper] A Survey on Methods and Theories of Quantized Neural Networks
Jan 05, 2020
There is no excerpt to show~
- [Weekly Review] 2019/12/23-29
Dec 29, 2019
This review contains some problems I ran into while setting up the Chisel development environment.
- [Read Paper] Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks
Dec 29, 2019
Sze Vivienne's Paper. This article describes a performance analysis framework named `Eyexam` with roofline models and a DNN accelerator named `Eyeriss v2` which uses a hierarchical NoC.
- [Read Paper] Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
Dec 29, 2019
Sze Vivienne's Paper. This article introduces the way to reduce the cost of data movement by exploiting data reuse in a multilevel memory hierarchy. Also, techniques such as compression and data-adaptive processing are introduced to save both memory bandwidth and processing power.
- [Read Paper] Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
Dec 29, 2019
Sze Vivienne's Paper. This article proposes the RS dataflow, which adapts to different CNN shape configurations and reduces all types of data movement by maximally utilizing the processing engine (PE) local storage, direct inter-PE communication and spatial parallelism.
- [Read Paper] Efficient Processing of Deep Neural Networks A Tutorial and Survey
Dec 29, 2019
This survey focuses on: processing of DNN inference, addressing the efficiency of the CONV layers.
- [Weekly Review] 2019/12/16-22
Dec 22, 2019
This review contains some basic knowledge of Scala and a tutorial on deep learning accelerator design named 'Efficient Processing of Deep Neural Network: from Algorithms to Hardware Architectures'.
DSA
- [Glean] Computer Architectures for Next Generation Applications
Jan 18, 2021
This post is mainly translated from a Zhihu answer by Bao Yungang. It introduces three laws (Moore's law, Makimoto's wave and Bell's law), as well as design methods and optimizations for performance and power, and the fragmented requirements of the AIoT age. These methods include reducing data movement, reducing data precision, improving parallelism and agile hardware development.
- [Read Paper] Domain-Specific Hardware
Sep 24, 2020
Some tricks for designing domain-specific accelerators.
DSH
- [Weekly Review] 2019/12/23-29
Dec 29, 2019
This review contains some problems I ran into while setting up the Chisel development environment.
- [Weekly Review] 2019/12/16-22
Dec 22, 2019
This review contains some basic knowledge of Scala and a tutorial on deep learning accelerator design named 'Efficient Processing of Deep Neural Network: from Algorithms to Hardware Architectures'.
Data format
- [Weekly Review] 2021/01/18-2021/01/24
Jan 24, 2021
The weekly review 2021/01/18-2021/01/24
- [Glean] Posit: A Potential Replacement for IEEE 754
Jan 18, 2021
Introduces the type III Unum, Posit, including its four parts, the computation of the real value and the recommended exponent bits.
Debug
- [Weekly Review] 2020/12/21-2021/01/10
Jan 10, 2021
The weekly review 2020/12/21-2021/01/10
- [Weekly Review] 2020/12/14-2020/12/20
Dec 20, 2020
The weekly review 2020/12/14-2020/12/20
Dequantization
- [Workshop] tinyML Talks: AIML SoC for Ultra-Low-Power Mobile and IoT devices
Jul 22, 2020
This workshop introduces two computation optimization methods and three memory optimization methods. An address generation HW unit and a pipeline architecture help computation optimization. Dequantization, entropy compression and pooling on the fly benefit memory optimization.
Design Compiler
- [Glean] Design Compiler ddc file
Nov 24, 2020
In general, it is a binary file that contains both the Verilog gate-level description and the design constraints.
Design Pattern
- [Glean] GoF Design Pattern Overview
Jun 14, 2021
The Overview of GoF Design Patterns: Creational Patterns, Structural Patterns, Behaviour Patterns and J2EE Patterns.
DesignAutomation
- [Read Paper] Design Automation for Efficient Deep Learning Computing
Jan 12, 2020
Han Song's Paper on design automation and NAS. It covers automatically designing specialized fast models, auto channel pruning, and auto mixed-precision quantization.
Detector
- [Read Paper] MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors
Dec 04, 2020
MaxpoolNMS, a parallelizable alternative to the NMS algorithm, which is based on max-pooling classification score maps.
DianNao
- [Read Paper] A Survey of Accelerator Architectures for Deep Neural Networks
Mar 01, 2020
A Survey of Accelerator Architectures for Deep Neural Networks.
Docker
- [Tutorial] Develop Chisel with Dev Container in Idea.
Oct 19, 2023
This article introduces how to develop Chisel with Dev Container in IntelliJ IDEA.
E1400
- [Glean] What Is Memory-Hard
Apr 19, 2021
In cryptography, a memory-hard function (MHF) is a function that costs a significant amount of memory to evaluate. I also show the solution from Linzhi.
ECSM
- [Glean] Library Formats: CCS, ECSM, and NLDM
Apr 03, 2023
This article provides an overview of three common library formats used in the design and analysis of digital circuits: Composite Current Source (CCS), Effective Current Source Model (ECSM), and Non-Linear Delay Model (NLDM), which is generated by ChatGPT4.
EIE
- [Read Paper] EIE: Efficient Inference Engine on Compressed Deep Neural Network
Jan 12, 2020
Han Song's paper: EIE using CSC data format
ESL
- [Weekly Review] 2020/05/18-24
May 24, 2020
This weekly review contains some background on hardware-software co-design and ESL.
ESWEEK
- [Workshop] ESWEEK 2021 QuantumFlow
Oct 09, 2021
This workshop introduces the basics of quantum circuits and their applications in quantum deep learning neural networks.
Einstein Summation Convention
- [Glean] Einsum Introduction
Mar 06, 2024
This post introduces Einsum and its applications.
Einsum
- [Glean] Einsum Introduction
Mar 06, 2024
This post introduces Einsum and its applications.
Emulation
- [Weekly Review] 2020/04/13-19
Apr 19, 2020
Contains CS61B binary search trees, red-black trees, hashing and heaps; three methods to dump VCD files (waveforms) in Chisel Testers2; the first two generations of verification and the coming third generation, plus the definitions of simulator, emulation and formal verification.
Entropy
- [Workshop] tinyML Talks: AIML SoC for Ultra-Low-Power Mobile and IoT devices
Jul 22, 2020
This workshop introduces two computation optimization methods and three memory optimization methods. An address generation HW unit and a pipeline architecture help computation optimization. Dequantization, entropy compression and pooling on the fly benefit memory optimization.
- [Glean] Entropy Coding and Tunstall Coding
Jun 14, 2020
This post introduces the concepts of entropy coding and Tunstall coding.
Eyeriss
- [Read Paper] Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
Jan 19, 2020
Sze Vivienne's Paper. This article contains more details of `Eyeriss V2`.
- [Read Paper] Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
Dec 29, 2019
Sze Vivienne's Paper. This article introduces the way to reduce the cost of data movement by exploiting data reuse in a multilevel memory hierarchy. Also, techniques such as compression and data-adaptive processing are introduced to save both memory bandwidth and processing power.
- [Read Paper] Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
Dec 29, 2019
Sze Vivienne's Paper. This article proposes the RS dataflow, which adapts to different CNN shape configurations and reduces all types of data movement by maximally utilizing the processing engine (PE) local storage, direct inter-PE communication and spatial parallelism.
- [Weekly Review] 2019/12/16-22
Dec 22, 2019
This review contains some basic knowledge of Scala and a tutorial on deep learning accelerator design named 'Efficient Processing of Deep Neural Network: from Algorithms to Hardware Architectures'.
EyerissV2
- [Read Paper] Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
Jan 19, 2020
Sze Vivienne's Paper. This article contains more details of `Eyeriss V2`.
- [Read Paper] Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks
Dec 29, 2019
Sze Vivienne's Paper. This article describes a performance analysis framework named `Eyexam` with roofline models and a DNN accelerator named `Eyeriss v2` which uses a hierarchical NoC.
Eyexam
- [Read Paper] Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks
Dec 29, 2019
Sze Vivienne's Paper. This article describes a performance analysis framework named `Eyexam` with roofline models and a DNN accelerator named `Eyeriss v2` which uses a hierarchical NoC.
FFmpeg
- [Tutorial] Concatenate a List of `.flv` Files into a Single `.mp4` File
Aug 14, 2024
This tutorial introduces how to concatenate a list of `.flv` files into a single `.mp4` file using `ffmpeg`. And if the video and audio codecs are different, how to re-encode the video stream with the `libx264` codec.
FP16
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
FP32
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
FP8
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
FalseSharing
- [Weekly Review] 2020/05/11-17
May 17, 2020
This weekly review includes cache atomic and false sharing.
Find
- [Glean] Remove Empty File Folder
Jan 11, 2021
Introduces two Linux commands, find and xargs. By combining these two commands, you can easily remove empty directories and do more.
FloatingPoint
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
FogComputation
- [Weekly Review] 2019/12/30-2020/01/05
Jan 05, 2020
This weekly review contains two terms: `Stackelberg Game` and `Fog Computation`.
Folding
- [Weekly Review] 2020/04/13-19
Apr 19, 2020
Contains CS61B binary search trees, red-black trees, hashing and heaps; three methods to dump VCD files (waveforms) in Chisel Testers2; the first two generations of verification and the coming third generation, plus the definitions of simulator, emulation and formal verification.
Formal
- [Weekly Review] 2021/02/01-2021/02/07
Feb 07, 2021
The weekly review 2021/02/01-2021/02/07
- [Glean] VC Formal Apps
Feb 02, 2021
This post introduces the Apps of VC formal, including AEP, FCA, CC, SEQ, FRV, FXP, FPV, FTA, FSV, DPV, RMA, AIP and FuSa.
- [Glean] Static Sign-Off, Formal & Simulation
Feb 01, 2021
This post introduces the differences between Static Sign-Off, Formal and Simulation using three key functional verification metrics: analysis always finishes, all the violations flagged by the analysis, and 100% of the failures are found.
- [Weekly Review] 2021/01/25-2021/01/31
Jan 31, 2021
The weekly review 2021/01/25-2021/01/31
- [Glean] Formal Signoff
Jan 29, 2021
This post introduces VC Formal Apps and the reason for and goal of formal signoff. It then lists seven steps of formal signoff based on Synopsys.
- [Survey] Current Verification Methods And Their Limited Situations
Jan 11, 2021
This post introduces the current verification methods, steps and their limitations, including formal verification, constrained random verification (CRV) and hardware-software co-verification using virtual platform with hardware emulation and acceleration.
- [Weekly Review] 2020/04/13-19
Apr 19, 2020
Contains CS61B binary search trees, red-black trees, hashing and heaps; three methods to dump VCD files (waveforms) in Chisel Testers2; the first two generations of verification and the coming third generation, plus the definitions of simulator, emulation and formal verification.
Forwarding
- [Glean] Tomasulo Algorithm
Jan 24, 2021
The Tomasulo algorithm eliminates three kinds of hazards (RAW, WAR and WAW) by forwarding and renaming. The three stages of this algorithm are issue, execute and write back.
Fusion Compiler
- [Glean] Logic Synthesis, Physical Synthesis
Nov 24, 2021
The difference between Logic Synthesis and Physical Synthesis.
GEMM
- [Weekly Review] 2021/08/30-2021/09/06
Sep 06, 2021
The weekly review 2021/08/30-2021/09/06
- [Survey] GEMM, Strassen and Winograd Fast Convolution Algorithms
Aug 21, 2021
This blog surveys the papers that optimise the convolution by GEMM, Strassen and Winograd algorithms.
GPT
- [Tutorial] Obtain GPT Prompts.
Nov 12, 2023
This article introduces how to obtain the system prompts of the GPT, such as ChatGPT.
GTKWave
- [Tutorial] GTKWave and Verdi Enum tcl Commands
Nov 26, 2021
In this tutorial, the tcl commands of GTKWave and Verdi for displaying enum are introduced.
Genus
- [Glean] Backend Design Flow: SDC and Timing Constraints
Apr 22, 2023
- [Tutorial] Obtain Objects in the collection in Genus Using Tcl
Mar 06, 2023
A collection is an extension provided by EDA vendors like Synopsys to support a list of objects in their Tcl API. Most database query operations in Cadence and Synopsys tools return a collection object. Complex query operations with filters may be slow in large designs. Pre-storing the query results might reduce runtime when they are used in multiple places.
- [Tutorial] Background Execution of Reporting Commands in Cadence Genus
Mar 06, 2023
Cadence Genus supports generating reports in parallel and running them in the background. This tutorial introduces how to conditionally enable this feature using Tcl syntax.
- [Glean] Cadence Genus Synthesis Check List
Mar 06, 2023
This post lists several messages that should be checked in the Genus synthesis log file to make sure there are no errors and no mismatches between the simulation and synthesis results.
Git
- [Glean] Debugging Git Using Trace
Jan 08, 2021
Debugging Git using GIT_TRACE, and restarting the gpg-agent to solve the `gpg failed to sign the data` error.
- [Weekly Review] 2020/03/09-15
Mar 15, 2020
git commit types, Chisel `withRegion`, Scala `collect`, etc.
- [Weekly Review] 2019/12/09-15
Dec 15, 2019
This review contains some basic knowledge related to git, RISC-V, Chipyard, RoCC interface, SHA3 and cache.
GitHub
- [Glean] Branch Strategy
Jan 29, 2021
Version control strategies, like TBD, Git-Flow, GitLab-Flow.
GoF
- [Glean] GoF Design Pattern Overview
Jun 14, 2021
The Overview of GoF Design Patterns: Creational Patterns, Structural Patterns, Behaviour Patterns and J2EE Patterns.
Graph
- [Weekly Review] 2020/04/20-26
Apr 26, 2020
This weekly review includes an introduction to SystemC, modeling, JVM memory, and Rocket Chip's interrupt controllers PLIC and CLINT. It also includes CS61B's Graph.
Grouped Convolution
- [Glean] Grouped Convolution
May 27, 2021
Grouped convolution is a variant of convolution where the channels of the input feature map are grouped and convolution is performed independently for each group of channels. There are also visualised graphs showing both the spatial and channel domains of convolution, grouped convolution and other convolutions.
HLS
- [Weekly Review] 2020/05/18-24
May 24, 2020
This weekly review contains some background on hardware-software co-design and ESL.
HM-NoC
- [Read Paper] Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
Jan 19, 2020
Sze Vivienne's Paper. This article contains more details of `Eyeriss V2`.
HPC
- [Survey] MLforHPC Benchmarks
Feb 16, 2020
I attached my recent survey on ML4HPC benchmarks, including three papers 1) A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning; 2) HPC AI500: A Benchmark Suite for HPC AI Systems; 3) A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning; and some other presentation slides.
- [Weekly Review] 2020/02/03-09
Feb 09, 2020
This review contains the usage of general data type in Chisel, the basic architecture of NN and the introductions of BNN and the BitFlow algorithm. Also, some materials related to HPC+ML.
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some slides from hotchip19 and some HPC materials.
HPCforML
- [Survey] HPCforML and MLforHPC
Feb 23, 2020
This survey contains two papers 1) Understanding ML driven HPC: Applications and Infrastructure; 2) Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations.
HPML
- [Survey] HPCforML and MLforHPC
Feb 23, 2020
This survey contains two papers 1) Understanding ML driven HPC: Applications and Infrastructure; 2) Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations.
- [Survey] MLforHPC Benchmarks
Feb 16, 2020
I attached my recent survey on ML4HPC benchmarks, including three papers 1) A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning; 2) HPC AI500: A Benchmark Suite for HPC AI Systems; 3) A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning; and some other presentation slides.
- [Weekly Review] 2020/02/03-09
Feb 09, 2020
This review contains the usage of general data type in Chisel, the basic architecture of NN and the introductions of BNN and the BitFlow algorithm. Also, some materials related to HPC+ML.
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some slides from hotchip19 and some HPC materials.
HUAWEI
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some slides from hotchip19 and some HPC materials.
HWPE
- [CodeStudy] HWPE and Its Interfaces between Hardware and Software
Dec 26, 2020
This article introduces the MMIO register files of HWPE in the PULP SoC, with the related C code for simulation. It also gives hints on customizing the code to use more registers or more events. I think it can also help you to understand the interaction between hardware and software.
Habana
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some slides from hotchip19 and some HPC materials.
Hanguang
- [Workshop] Hot Chips 2020 Alibaba's Hanguang 800 NPU
Sep 04, 2020
Hot Chips 2020, Alibaba's Hanguang 800 NPU from Anandtech
Haswell
- [Weekly Review] 2021/02/01-2021/02/07
Feb 07, 2021
The weekly review 2021/02/01-2021/02/07
- [Glean] Terms in the Intel Xeon E5-2600 V3 “Haswell-EP” Workstation Die Configurations
Feb 06, 2021
Some terms in Intel Xeon E5-2600 V3 Haswell-EP Workstation Die Configurations, including ACA, LLC, Cbo, SAD, QPI, IIO.
HierarchicalNeuralNetwork
- [Workshop] tinyML Talks: Low-Power Computer Vision
Jun 28, 2020
By utilizing a hierarchical neural network, we can separate a big neural network into much smaller ones, hence reducing the training time and inference power consumption. However, it might increase the latency.
HotChips
- [Workshop] Hot Chips 2020 Marvell Details ThunderX3 CPUs
Sep 04, 2020
Hot Chips 2020, Marvell Details ThunderX3 CPUs
- [Workshop] Hot Chips 2020 Google TPUv2 and TPUv3
Sep 04, 2020
Hot Chips 2020, Google TPUv2 and TPUv3
- [Workshop] Hot Chips 2020 Cerebras WSE Programming
Sep 04, 2020
Hot Chips 2020, Cerebras WSE Programming
- [Workshop] Hot Chips 2020 Alibaba's Hanguang 800 NPU
Sep 04, 2020
Hot Chips 2020, Alibaba's Hanguang 800 NPU from Anandtech
- [Workshop] Hot Chips 2020 Highlights
Aug 31, 2020
Some highlights of Hot Chips 2020
Hotchip
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some slides from hotchip19 and some HPC materials.
Hwacha
- [Weekly Review] 2020/02/10-16
Feb 16, 2020
This review contains one way to think about matrix multiplication, a Chisel class named DataMirror that can inspect the details of ports, and a discussion of how a RoCC accelerator can communicate with the L2 cache. Also, I continued my survey on AI for HPC.
IIO
- [Weekly Review] 2021/02/01-2021/02/07
Feb 07, 2021
The weekly review 2021/02/01-2021/02/07
- [Glean] Terms in the Intel Xeon E5-2600 V3 “Haswell-EP” Workstation Die Configurations
Feb 06, 2021
Some terms in Intel Xeon E5-2600 V3 Haswell-EP Workstation Die Configurations, including ACA, LLC, Cbo, SAD, QPI, IIO.
INT16
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
INT8
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
Idea
- [Tutorial] Develop Chisel with Dev Container in Idea.
Oct 19, 2023
This article introduces how to develop Chisel with Dev Container in IntelliJ IDEA.
Img2Col
- [Survey] GEMM, Strassen and Winograd Fast Convolution Algorithms
Aug 21, 2021
This blog surveys the papers that optimise the convolution by GEMM, Strassen and Winograd algorithms.
- [Weekly Review] 2021/07/26-2021/08/01
Aug 01, 2021
The weekly review 2021/07/26-2021/08/01
Impact Map
- [Glean] Impact Map
Sep 17, 2020
The introduction of impact map and how to create an impact map.
Inference
- [Read Paper] In-Datacenter Performance Analysis of a Tensor Processing Unit
Jul 31, 2020
This paper describes the first generation of the TPU. It introduces the goal and architecture of the TPU and shows a performance comparison.
Inner Product
- [Weekly Review] 2021/08/30-2021/09/06
Sep 06, 2021
The weekly review 2021/08/30-2021/09/06
Innovus
- [Glean] Backend Design Flow: SDC and Timing Constraints
Apr 22, 2023
- [Glean] Library Formats: CCS, ECSM, and NLDM
Apr 03, 2023
This article provides an overview of three common library formats used in the design and analysis of digital circuits: Composite Current Source (CCS), Effective Current Source Model (ECSM), and Non-Linear Delay Model (NLDM), which is generated by ChatGPT4.
Integer
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
Interface
- [Emulate] Interface and Channel Design
Jun 16, 2020
This post introduces Interface and Channel Design in SystemC, including primitive and hierarchical channels.
Interposer
- [Glean] 2.5D and 3D Interposer
Mar 23, 2021
Interposers are wide, extremely fast electrical signal conduits used between die in a 2.5D configuration.
IoU
- [Glean] IoU and NMS
Dec 03, 2020
Intersection over Union (IoU) is an evaluation metric used to measure the accuracy of an object detector on a particular dataset. Non-maximum suppression (NMS) is a technique to remove duplicates and false positives in object detection.
JVM
- [Weekly Review] 2020/04/20-26
Apr 26, 2020
This weekly review includes an introduction to SystemC, modeling, JVM memory, and Rocket Chip's interrupt controllers PLIC and CLINT. It also includes CS61B's Graph.
L2DC
- [Read Paper] Learning to Design Circuits
Jan 12, 2020
Han Song's Paper: Learning to Design Circuits. Using ML to design analogue circuits.
LLC
- [Weekly Review] 2021/02/01-2021/02/07
Feb 07, 2021
The weekly review 2021/02/01-2021/02/07
- [Glean] Terms in the Intel Xeon E5-2600 V3 “Haswell-EP” Workstation Die Configurations
Feb 06, 2021
Some terms in Intel Xeon E5-2600 V3 Haswell-EP Workstation Die Configurations, including ACA, LLC, Cbo, SAD, QPI, IIO.
LSTM
- [Read Paper] C-LSTM Enabling Efficient LSTM using Structured Compression Techniques on FPGAs
Aug 11, 2020
This paper introduces some methods to accelerate LSTM on FPGAs. It mainly utilizes two methods: block-circulant matrices and smaller coarse-grained pipelines with double buffers.
Latex
- [Glean] Latex includegraphics decodearray
Oct 23, 2022
The use of the decodearray option to includegraphics allows the rendering colors to be changed.
- [Glean] Package hyperref: Token not allowed
May 15, 2022
Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding):(hyperref). Using texorpdfstring to solve this.
- [Glean] A better way to apply subfloat
May 15, 2022
Simply wrap the includegraphics with makebox to adjust the width of the caption and image separately.
- [Weekly Review] 2020/10/12-10/18
Oct 18, 2020
The weekly review from 2020/10/12 to 10/18
- [Tutorial] Layout Two Graphs at One Main Graph
Oct 17, 2020
Using column width to fit the image width
Linux
- [Tutorial] Updating CentOS 7 Mirror to Aliyun
Aug 20, 2024
Updating the CentOS 7 mirror to the Aliyun mirror, including updating the CentOS 7 mirror configuration file and updating the mirror cache.
- [Tutorial] Schedule Command for CentOS using `crontab`
Jul 31, 2024
Schedule command for CentOS using `crontab`, including crontab syntax, examples, output, and environment variables.
- [Tutorial] Config CentOS 8 Server in 2022
Nov 02, 2022
Config CentOS 8 Server in 2022
- [Tutorial] Linux aliases
Jan 30, 2021
Some useful Linux aliases.
- [Glean] Remove Empty File Folder
Jan 11, 2021
Introduces two Linux commands, find and xargs. By combining these two commands, you can easily remove empty directories and do more.
- [Weekly Review] 2020/03/02-08
Mar 08, 2020
This weekly review contains the usage of Linux `tree` and Chisel `<>` as well as `:=`. Also, DecoupledDriver.
- [Weekly Review] 2020/01/20-26
Jan 26, 2020
This weekly review contains the usage of `grep` as well as Scala pattern matching.
Linzhi
- [Glean] What Is Memory-Hard
Apr 19, 2021
In cryptography, a memory-hard function (MHF) is a function that costs a significant amount of memory to evaluate. I also show the solution from Linzhi.
- [Glean] Five Steps to Make an ASIC for Algorithm X
Apr 19, 2021
Five Steps to Make an ASIC for Algorithm X: Math first, Optimization Target, Hardware-Software Boundary, Building Blocks, Physical Implementation
Logic Synthesis
- [Glean] Logic Synthesis, Physical Synthesis
Nov 24, 2021
The difference between Logic Synthesis and Physical Synthesis.
Low Power
- [Workshop] Using UPF for Low Power Design and Verification
Nov 27, 2021
This workshop describes UPF in detail, including its definition, terminology, some Tcl commands, etc.
ML
- [Glean] Grouped Convolution
May 27, 2021
Grouped convolution is a variant of convolution where the channels of the input feature map are grouped and convolution is performed independently for each group of channels. There are also visualised graphs showing both the spatial and channel domains of convolution, grouped convolution and other convolutions.
- [Weekly Review] 2020/05/25-31
May 25, 2020
There is no excerpt to show~
- [Read Paper] Learning to Design Circuits
Jan 12, 2020
Han Song's Paper: Learning to Design Circuits. Using ML to design analogue circuits.
ML4HPC
- [Survey] HPCforML and MLforHPC
Feb 23, 2020
This survey contains two papers 1) Understanding ML driven HPC: Applications and Infrastructure; 2) Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations.
- [Survey] MLforHPC Benchmarks
Feb 16, 2020
I attached my recent survey on ML4HPC benchmarks, including three papers 1) A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning; 2) HPC AI500: A Benchmark Suite for HPC AI Systems; 3) A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning; and some other presentation slides.
- [Weekly Review] 2020/02/03-09
Feb 09, 2020
This review contains the usage of general data type in Chisel, the basic architecture of NN and the introductions of BNN and the BitFlow algorithm. Also, some materials related to HPC+ML.
MLPerf
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some slides from hotchip19 and some HPC materials.
MLforSystem
- [Survey] HPCforML and MLforHPC
Feb 23, 2020
This survey contains two papers 1) Understanding ML driven HPC: Applications and Infrastructure; 2) Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations.
MST
- [Weekly Review] 2020/04/27-05/03
May 03, 2020
This weekly review contains spanning trees, A*, Prim's algorithm, Kruskal's algorithm, MST, dynamic programming and LIS. It also introduces some basic cache terms, such as offset, cache line, way, cache thrashing, etc.
Makimoto
- [Glean] Computer Architectures for Next Generation Applications
Jan 18, 2021
This post is mainly translated from a Zhihu answer by Bao Yungang. It introduces three laws (Moore's law, Makimoto's wave and Bell's law), as well as design methods and optimizations for performance and power, and the fragmented requirements of the AIoT age. These methods include reducing data movement, reducing data precision, improving parallelism and agile hardware development.
Matplotlib
- [Weekly Review] 2021/01/18-2021/01/24
Jan 24, 2021
The weekly review 2021/01/18-2021/01/24
- [CodeStudy] Python Matplotlib
Jan 22, 2021
Using Python Matplotlib to draw graphs and plot them in a function.
Maxpool
- [Read Paper] MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors
Dec 04, 2020
MaxpoolNMS, a parallelizable alternative to the NMS algorithm, which is based on max-pooling classification score maps.
Memory Hard
- [Glean] What Is Memory-Hard
Apr 19, 2021
In cryptography, a memory-hard function (MHF) is a function that costs a significant amount of memory to evaluate. I also show the solution from Linzhi.
Memory Pooling
- [Glean] Compute Express Link (CXL) Introduction
Jun 26, 2024
This post introduces Compute Express Link (CXL) and its benefits.
Memory Profiler
- [CodeStudy] Python Performance Analysis
Mar 03, 2021
This blog introduces Python memory and execution time analysis tools Memory Profiler and cProfile.
Mesh
- [Glean] Network Topology
Jul 10, 2020
Network topology is the arrangement of the elements of a communication network. Topologies include point-to-point, bus, star, ring (circular), mesh, tree, hybrid, and daisy chain.
Mill
- [Weekly Review] 2020/02/24-03/01
Mar 01, 2020
This week I read a deep learning accelerator survey named 'A Survey of Accelerator Architectures for Deep Neural Networks'. Also, I tried to use a Scala library named `Breeze`.
- [Tutorial] Establish Linux Environment for Chisel and Chipyard Developments
Jan 02, 2020
This tutorial will help you establish a Linux environment for Chisel and Chipyard development quickly and with few errors.
Model
- [Weekly Review] 2021/01/25-2021/01/31
Jan 31, 2021
The weekly review 2021/01/25-2021/01/31
- [Tutorial] Python Hardware Behaviour Model
Jan 31, 2021
In this tutorial, I will try to help you understand how to write a Python hardware behaviour model.
Modelling
- [Weekly Review] 2020/08/24-30
Aug 30, 2020
This week, the accelerator model successfully simulated the first layer of MobileNet V2. I also took almost two days to write my research proposal.
- [Weekly Review] 2020/08/17-23
Aug 23, 2020
This week, I still worked on the Loosely Timed TLM.
- [Weekly Review] 2020/08/10-16
Aug 16, 2020
This week, I still worked on the Loosely Timed TLM. I learned a little about the concepts of memory cells and memory structure, and spent a lot of time optimizing the memory structure. I also learned a little about the SystemC TLM quantum keeper, but didn't use it in my modelling as I didn't think I needed it to sync the time.
- [Weekly Review] 2020/08/03-09
Aug 09, 2020
This week, I still worked on the Loosely Timed TLM. This post contains some thinking about implementation and modelling, Chisel and SystemC.
- [Emulate] Different Abstraction Models
Jun 11, 2020
Six different abstraction models by NCTU. The models include the specification model, component assembly model, bus arbitration model, cycle-accurate computation model, and RTL model.
Models
- [Weekly Review] 2020/04/20-26
Apr 26, 2020
This weekly review includes an introduction to SystemC, modeling, JVM memory, and Rocket Chip's interrupt handling (PLIC and CLINT), as well as CS61B's Graph.
Moore
- [Glean] Computer Architectures for Next Generation Applications
Jan 18, 2021
This post is mainly translated from a Zhihu answer by Bao Yungang. It introduces three laws (Moore's law, Makimoto's wave, and Bell's law) along with design methods and optimizations for performance and power, as well as for the fragmented requirements of the AIoT age. These methods include reducing data movement, reducing data precision, improving parallelism, and agile hardware development.
MultiWidthFIFO
- [CodeStudy] RocketChip MultiWidthFIFO
Sep 25, 2020
Learned some Chisel tips via RocketChip. This includes the implementation of Multi-Width-FIFO.
Multiprocessing
- [CodeStudy] Python Multiprocessing with Return Values Using Pool
Mar 05, 2021
This post introduces multiprocessing in Python with return values from the child processes using the Pool class.
N3XT
- [Read Paper] The N3XT Approach to Energy-Efficient Abundant-Data Computing
Jul 14, 2020
This paper introduces the framework of N3XT as well as its evaluation methodology. It also mentions the RRAM endurance resiliency.
NAS
- [Read Paper] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Jan 12, 2020
Han Song's Paper: ProxylessNAS. ProxylessNAS can directly learn neural network architectures on the target task and target hardware without any proxy.
- [Read Paper] Learning to Design Circuits
Jan 12, 2020
Han Song's Paper: Learning to Design Circuits. Using ML to design analogue circuits.
- [Read Paper] Design Automation for Efficient Deep Learning Computing
Jan 12, 2020
Han Song's Paper, Design Automation, NAS, contains automatically designing specialized fast models, auto channel pruning, and auto mixed-precision quantization.
NCCL
- [Glean] NVIDIA NCCL Introduction
Jun 26, 2024
This post introduces NVIDIA Collective Communication Library (NCCL).
NLDM
- [Glean] Library Formats: CCS, ECSM, and NLDM
Apr 03, 2023
This article provides an overview of three common library formats used in the design and analysis of digital circuits: Composite Current Source (CCS), Effective Current Source Model (ECSM), and Non-Linear Delay Model (NLDM), which is generated by ChatGPT4.
NLP
- [Read Paper] Attention Is All You Need
Jan 07, 2021
This post combines two blog posts that introduce the paper Attention Is All You Need. Its shortcomings and one improvement are also shown.
NMS
- [Read Paper] MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors
Dec 04, 2020
MaxpoolNMS, a parallelizable alternative to the NMS algorithm, which is based on max-pooling classification score maps.
- [Glean] IoU and NMS
Dec 03, 2020
Intersection over Union (IoU) is an evaluation metric used to measure the accuracy of an object detector on a particular dataset. Non-maximum suppression (NMS) is a technique to remove duplicates and false positives in object detection.
NPU
- [Read Paper] A Survey of Accelerator Architectures for Deep Neural Networks
Mar 01, 2020
A Survey of Accelerator Architectures for Deep Neural Networks.
NVIDIA
- [Glean] NVIDIA NCCL Introduction
Jun 26, 2024
This post introduces NVIDIA Collective Communication Library (NCCL).
- [Glean] CUDA Programming Model
Apr 02, 2024
This post introduces the CUDA programming model, including kernels, thread hierarchy, thread blocks, thread block clusters, and memory hierarchy.
- [Glean] CUDA C++ Maximize Memory Throughput
Apr 02, 2024
This post introduces how to maximize memory throughput in CUDA C++ programming.
- [Glean] CUDA C++ Function Execution Space
Apr 02, 2024
CUDA C++ Function Execution Space, including `__global__`, `__device__`, `__host__`, and `__host__ __device__`. A summary table is provided for comparison at the end.
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some Hot Chips 2019 slides and HPC materials.
Network
- [Glean] Network Topology
Jul 10, 2020
Network topology is the arrangement of the elements of a communication network. Topologies include point-to-point, bus, star, ring (circular), mesh, tree, hybrid, and daisy chain.
NumPy
- [Glean] Einsum Introduction
Mar 06, 2024
This post introduces Einsum and its applications.
OperatorFusion
- [Glean] Operator Fusion
Jun 28, 2020
There are many opportunities, where fused operators—in terms of fused chains of basic operators—can significantly improve performance.
Outer Product
- [Weekly Review] 2021/08/30-2021/09/06
Sep 06, 2021
The weekly review 2021/08/30-2021/09/06
Outstanding
- [Glean] Outstanding Transactions
Feb 01, 2024
This post explains the concept of outstanding transactions.
PDK
- [Tutorial] Install PDK
Nov 23, 2020
How to install the PDK and obtain the db files
PIPT
- [Weekly Review] 2020/05/04-10
May 10, 2020
This weekly review includes some knowledge related to cache indexing and tagging methods, the TLB, coherence between cache and DMA, coherence between iCache and dCache, and coherence between multiple processors.
PLIC
- [Weekly Review] 2020/04/20-26
Apr 26, 2020
This weekly review includes an introduction to SystemC, modeling, JVM memory, and Rocket Chip's interrupt handling (PLIC and CLINT), as well as CS61B's Graph.
POI
- [CodeStudy] Scala Excel Read: POI XSSF
Nov 11, 2020
In this article, I introduced how to read a workbook, a sheet, a row and a special cell. The methods to obtain the row number and column number are also given. One way to filter empty cells is introduced too.
Parallelism
- [Glean] Computer Architectures for Next Generation Applications
Jan 18, 2021
This post is mainly translated from a Zhihu answer by Bao Yungang. It introduces three laws (Moore's law, Makimoto's wave, and Bell's law) along with design methods and optimizations for performance and power, as well as for the fragmented requirements of the AIoT age. These methods include reducing data movement, reducing data precision, improving parallelism, and agile hardware development.
ParetoOptimal
- [Read Paper] TEA-DNN: the Quest for Time-Energy-Accuracy Co-optimized Deep Neural Networks
Jun 28, 2020
This paper introduces a method to optimize DNN with Pareto-optimal models and Bayesian optimization.
Physical Synthesis
- [Glean] Logic Synthesis, Physical Synthesis
Nov 24, 2021
The difference between Logic Synthesis and Physical Synthesis.
Posit
- [Glean] Posit: A Potential Replacement for IEEE 754
Jan 18, 2021
Introduces the type III unum, Posit, including its four parts, the computation of the real value, and the recommended exponent bits.
PowerWall
- [Weekly Review] 2020/05/25-31
May 25, 2020
Programming Model
- [Glean] CUDA Programming Model
Apr 02, 2024
This post introduces the CUDA programming model, including kernels, thread hierarchy, thread blocks, thread block clusters, and memory hierarchy.
Prompt
- [Tutorial] Obtain GPT Prompts.
Nov 12, 2023
This article introduces how to obtain the system prompts of the GPT, such as ChatGPT.
Protocol
- [Glean] Compute Express Link (CXL) Introduction
Jun 26, 2024
This post introduces Compute Express Link (CXL) and its benefits.
Pulp
- [Weekly Review] 2020/12/21-2021/01/10
Jan 10, 2021
The weekly review 2020/12/21-2021/01/10
- [CodeStudy] HWPE and Its Interfaces between Hardware and Software
Dec 26, 2020
This article introduces the MMIO register files of HWPE in the Pulp SoC, along with the related C code for simulation. It also gives hints on customizing the code to use more registers or more events. I think it can also help you understand the interaction between hardware and software.
- [Weekly Review] 2020/12/14-2020/12/20
Dec 20, 2020
The weekly review 2020/12/14-2020/12/20
- [Tutorial] Configure and Run PULPissimo
Dec 20, 2020
From installing GCC and the SDK to running simple runtime examples for PULPissimo.
Pulpissimo
- [Weekly Review] 2020/12/21-2021/01/10
Jan 10, 2021
The weekly review 2020/12/21-2021/01/10
- [Weekly Review] 2020/12/14-2020/12/20
Dec 20, 2020
The weekly review 2020/12/14-2020/12/20
- [Tutorial] Configure and Run PULPissimo
Dec 20, 2020
From installing GCC and the SDK to running simple runtime examples for PULPissimo.
Python
- [Tutorial] Understanding Python Variable Assignment and Copying Mechanisms
Jan 09, 2025
In Python, understanding how variables are assigned and how copying mechanisms work is crucial for writing robust and efficient code. This blog post aims to demystify these concepts and provide best practices for handling different variable types to avoid unintended side effects.
- [Glean] Einsum Introduction
Mar 06, 2024
This post introduces Einsum and its applications.
- [CodeStudy] Python Multiprocessing with Return Values Using Pool
Mar 05, 2021
This post introduces multiprocessing in Python with return values from the child processes using the Pool class.
- [CodeStudy] Python Performance Analysis
Mar 03, 2021
This blog introduces Python memory and execution time analysis tools Memory Profiler and cProfile.
- [Tutorial] Python Hardware Behaviour Model
Jan 31, 2021
In this tutorial, I will try to help you understand how to write a Python hardware behaviour model.
- [CodeStudy] Python Matplotlib
Jan 22, 2021
Using Python Matplotlib to draw a graph and plot it in a function.
QNN
- [Read Paper] A Survey on Methods and Theories of Quantized Neural Networks
Jan 05, 2020
QPI
- [Weekly Review] 2021/02/01-2021/02/07
Feb 07, 2021
The weekly review 2021/02/01-2021/02/07
- [Glean] Terms in the Intel Xeon E5-2600 V3 “Haswell-EP” Workstation Die Configurations
Feb 06, 2021
Some terms in Intel Xeon E5-2600 V3 Haswell-EP Workstation Die Configurations, including ACA, LLC, Cbo, SAD, QPI, IIO.
Qiskit
- [Workshop] ESWEEK 2021 QuantumFlow
Oct 09, 2021
This workshop introduces the basics of quantum circuits and their applications in quantum deep learning neural networks.
Quantum Circuit
- [Workshop] ESWEEK 2021 QuantumFlow
Oct 09, 2021
This workshop introduces the basics of quantum circuits and their applications in quantum deep learning neural networks.
Quantum Neural Networks
- [Workshop] ESWEEK 2021 QuantumFlow
Oct 09, 2021
This workshop introduces the basics of quantum circuits and their applications in quantum deep learning neural networks.
RISC
- [Read Paper] Code compression for embedded VLIW processors using variable-to-fixed coding
Aug 26, 2020
This paper introduces a compression method that uses variable-to-fixed coding schemes, based on either Tunstall coding or arithmetic coding, to overcome the communication bottleneck between memory and CPU, especially for RISC or VLIW processors, which suffer from code-size bloat compared to CISC processors.
RISC-V
- [Tutorial] Establish Linux Environment for Chisel and Chipyard Developments
Jan 02, 2020
This tutorial will help you establish a Linux environment for Chisel and Chipyard development quickly and with few errors.
- [Weekly Review] 2019/12/09-15
Dec 15, 2019
This review contains some basic knowledge related to git, RISC-V, Chipyard, the RoCC interface, SHA3, and cache.
RRAM
- [Read Paper] The N3XT Approach to Energy-Efficient Abundant-Data Computing
Jul 14, 2020
This paper introduces the framework of N3XT as well as its evaluation methodology. It also mentions the RRAM endurance resiliency.
Refinement
- [Emulate] Refinement of Computation and Communication
Jun 19, 2020
This post introduces the refinement of computation and communication in SystemC, including different kinds of communication refinement such as channel refinement, module refinement, hw-hw refinement, sw-sw refinement, and hw-sw refinement. It also introduces the steps in communication refinement.
RegMap
- [Tutorial] TileLink RegMap
Mar 20, 2020
The study of TileLink TLRegMap
Regex
- [Tutorial] Count the number of occurrences of a pattern in a file in Vim
Apr 18, 2024
Count the number of occurrences of a pattern in a file in Vim.
- [Tutorial] Remove the C/C++ comment block in Vim
Apr 17, 2024
This tutorial introduces how to remove the C/C++ comment blocks and line comments in Vim.
Renaming
- [Glean] Tomasulo Algorithm
Jan 24, 2021
The Tomasulo algorithm eliminates three kinds of hazards (RAW, WAR, and WAW) through forwarding and renaming. Its three stages are issue, execute, and write back.
ResNet-50
- [Glean] ResNet-50 Architecture and # MACs
Apr 03, 2021
This post shows the basic architecture of ResNet-50 and the number of weights as well as MAC operations.
Retiming
- [Weekly Review] 2020/04/06-12
Apr 12, 2020
Contains cutset retiming, the Zen of Python, and some knowledge from CS61B.
RoCC
- [Weekly Review] 2020/02/10-16
Feb 16, 2020
This review contains one way to think about matrix multiplication, a Chisel class named DataMirror that can monitor the details of ports, and a discussion of how a RoCC accelerator can communicate with the L2 cache. I also continued my survey on AI for HPC.
- [Weekly Review] 2019/12/09-15
Dec 15, 2019
This review contains some basic knowledge related to git, RISC-V, Chipyard, the RoCC interface, SHA3, and cache.
Rocket Chip
- [Tutorial] Quick Debug and Run Test on Chisel Repos based on CI Flow Files
Feb 28, 2023
This tutorial introduces a quick way to debug code in Chisel environments such as Chisel3, playground, and Rocket Chip. The method can also be used for other repos.
RocketChip
- [CodeStudy] RocketChip Optional Bundle
Oct 08, 2020
Learned some Chisel tips via RocketChip. This introduces how to make bundles optional.
RoeketChip
- [CodeStudy] RocketChip MultiWidthFIFO
Sep 25, 2020
Learned some Chisel tips via RocketChip. This includes the implementation of Multi-Width-FIFO.
- [CodeStudy] Some Chisel details in the project RocketChip
Sep 24, 2020
Learned some Chisel tips via RocketChip. This includes some implicit classes and one implementation of a Gray counter.
RoundRobin
- [Glean] Round-Robin Arbitration
Jun 25, 2020
Round-robin arbitration is a scheduling scheme that gives each requestor its share of a common resource for a limited time or a limited number of data elements.
SAD
- [Weekly Review] 2021/02/01-2021/02/07
Feb 07, 2021
The weekly review 2021/02/01-2021/02/07
- [Glean] Terms in the Intel Xeon E5-2600 V3 “Haswell-EP” Workstation Die Configurations
Feb 06, 2021
Some terms in Intel Xeon E5-2600 V3 Haswell-EP Workstation Die Configurations, including ACA, LLC, Cbo, SAD, QPI, IIO.
SDC
SHA3
- [Weekly Review] 2019/12/09-15
Dec 15, 2019
This review contains some basic knowledge related to git, RISC-V, Chipyard, the RoCC interface, SHA3, and cache.
SINT16
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
SINT8
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
Sbt
- [Tutorial] Establish Linux Environment for Chisel and Chipyard Developments
Jan 02, 2020
This tutorial will help you establish a Linux environment for Chisel and Chipyard development quickly and with few errors.
Scala
- [CodeStudy] Scala Excel Read: POI XSSF
Nov 11, 2020
In this article, I introduced how to read a workbook, a sheet, a row and a special cell. The methods to obtain the row number and column number are also given. One way to filter empty cells is introduced too.
- [Weekly Review] 2020/03/23-29
Mar 29, 2020
This weekly review covers Scala intersection, union, and complement, as well as ScalaDoc tags. It also introduces printing colorful logs to the console, and an error that occurred when I used `RegInit` without giving a width to UInt.
- [Weekly Review] 2020/03/16-22
Mar 22, 2020
Includes Scala higher-order functions, Scala Regex, and Chisel forkwithRegion, as well as the definitions of `base address` and `offset`.
- [Weekly Review] 2020/03/09-15
Mar 15, 2020
Git commit types, Chisel `withRegion`, Scala `collect`, etc.
- [Weekly Review] 2020/02/24-03/01
Mar 01, 2020
This week I read a deep learning accelerator survey named 'A Survey of Accelerator Architectures for Deep Neural Networks'. Also, I tried to use a Scala library named `Breeze`.
- [Weekly Review] 2020/02/17-23
Feb 23, 2020
This week, I continued the survey of ML4HPC and found several papers from Indiana University that describe the definitions of ML4HPC and its subcategories. I also finished the draft implementation of the GLB cluster with some tests.
- [Weekly Review] 2020/01/20-26
Jan 26, 2020
This weekly review covers the usage of `grep` as well as Scala pattern matching.
- [Weekly Review] 2020/01/06-12
Jan 12, 2020
This review contains some study notes on Chisel and Scala syntax.
- [Tutorial] Establish Linux Environment for Chisel and Chipyard Developments
Jan 02, 2020
This tutorial will help you establish a Linux environment for Chisel and Chipyard development quickly and with few errors.
- [Weekly Review] 2019/12/16-22
Dec 22, 2019
This review contains some basic knowledge of Scala, and the tutorial of deep learning accelerator designs named 'Efficient Processing of Deep Neural Network: from Algorithms to Hardware Architectures'.
ScalaDoc
- [Weekly Review] 2020/03/23-29
Mar 29, 2020
This weekly review covers Scala intersection, union, and complement, as well as ScalaDoc tags. It also introduces printing colorful logs to the console, and an error that occurred when I used `RegInit` without giving a width to UInt.
ScaledML
- [Workshop] ScaledML: Moore's Law in the age of AI Chips
Jul 12, 2020
A presentation by Jim Keller, Intel. It introduces Moore's law, complexity limits, and technology optimism for AI chips.
Scrum
- [Tutorial] Scrum Master Guide
Apr 07, 2023
Generated by ChatGPT4 -- This Scrum Master Guide provides an overview of the Scrum Master's role in five key Scrum meetings -- Sprint Planning, Daily Stand-up, Backlog Refinement, Sprint Review, and Sprint Retrospective. It also discusses the process of assisting in task breakdown and the importance of having a clear and concise Definition of Done. The guide is designed to be a helpful resource for Scrum Masters to facilitate team communication, collaboration, and continuous improvement.
Scrum Master
- [Tutorial] Scrum Master Guide
Apr 07, 2023
Generated by ChatGPT4 -- This Scrum Master Guide provides an overview of the Scrum Master's role in five key Scrum meetings -- Sprint Planning, Daily Stand-up, Backlog Refinement, Sprint Review, and Sprint Retrospective. It also discusses the process of assisting in task breakdown and the importance of having a clear and concise Definition of Done. The guide is designed to be a helpful resource for Scrum Masters to facilitate team communication, collaboration, and continuous improvement.
Signoff
- [Glean] Static Sign-Off, Formal & Simulation
Feb 01, 2021
This post introduces the differences among Static Sign-Off, Formal, and Simulation in terms of three key functional verification metrics: the analysis always finishes, all the violations flagged by the analysis, and 100% of the failures are found.
- [Glean] Formal Signoff
Jan 29, 2021
This post introduces VC Formal Apps and the reason for and goal of formal signoff. It then lists seven steps of formal signoff based on Synopsys.
Simulation
- [Glean] Static Sign-Off, Formal & Simulation
Feb 01, 2021
This post introduces the differences among Static Sign-Off, Formal, and Simulation in terms of three key functional verification metrics: the analysis always finishes, all the violations flagged by the analysis, and 100% of the failures are found.
Simulator
- [Weekly Review] 2020/04/13-19
Apr 19, 2020
Contains CS61B binary search trees, red-black trees, hashing, and heaps; three methods to dump VCD files (waveforms) in Chisel Testers2; and the first two generations of verification and the coming third generation, plus the definitions of simulator, emulation, and formal verification.
Software2
- [Weekly Review] 2020/05/25-31
May 25, 2020
SpanningTree
- [Weekly Review] 2020/04/27-05/03
May 03, 2020
This weekly review covers spanning trees, A*, Prim's algorithm, Kruskal's algorithm, MST, dynamic programming, and LIS. It also introduces some basic cache terms, such as offset, cache line, way, and cache thrashing.
Sparsity
- [Workshop] tinyML Talks: Saving 95% of Your Edge Power with Sparsity
Jun 28, 2020
It will explain these types of sparsity (time, space, connectivity, activation) in terms of edge processes, and how they affect computation on a practical level.
Strassen
- [Survey] GEMM, Strassen and Winograd Fast Convolution Algorithms
Aug 21, 2021
This blog surveys the papers that optimise the convolution by GEMM, Strassen and Winograd algorithms.
- [Read Paper] Minimizing Computation in Convolutional Neural Networks
Aug 18, 2021
The Strassen algorithm can compute a 2x2 matrix multiplication using only 7 multiplications.
Subfloat
- [Glean] A better way to apply subfloat
May 15, 2022
Simply wrap includegraphics in makebox to adjust the width of the caption and image separately.
Sublime
- [Weekly Review] 2020/12/21-2021/01/10
Jan 10, 2021
The weekly review 2020/12/21-2021/01/10
- [Weekly Review] 2020/12/14-2020/12/20
Dec 20, 2020
The weekly review 2020/12/14-2020/12/20
- [Tutorial] Configure Sublime Text for Verilog
Dec 20, 2020
Configure Sublime Text for Verilog and SystemVerilog
Survey
- [Read Paper] Deep Learning Hardware: Past, Present, and Future
Jul 05, 2020
This paper introduces the following aspects: 1) identifies trends in deep learning research that will influence hardware architectures and software platforms of the future; 2) Five DL use cases with different hardware requirements; 3) Present and Future Deep-Learning Architectures; 4) Requirements for Future DL Hardware and Software
- [Read Paper] Efficient Processing of Deep Neural Networks A Tutorial and Survey
Dec 29, 2019
This survey focuses on: processing of DNN inference, addressing the efficiency of the CONV layers.
Synthesis
- [Glean] Cadence Genus Synthesis Check List
Mar 06, 2023
This post lists several messages that should be checked in the Genus synthesis log file to make sure there are no errors and no mismatches between the simulation and synthesis results.
SystemC
- [Emulate] SystemC Communication Ports
Jul 18, 2020
SystemC communication ports, covering chapters 11 and 12 of the book SystemC: From the Ground Up. It introduces the communication ports in SystemC, port arrays, and exports, and shows all connectivity possibilities in SystemC as a handy reference.
- [Emulate] Syntaxes of SystemC
Jul 17, 2020
This blog introduces the different syntaxes of SystemC module and constructors.
- [Emulate] Structure Design Hierarchy
Jul 17, 2020
This post introduces six approaches to design hierarchy. It covers chapter 10 of the book SystemC: From the Ground Up.
- [Emulate] Overview of SystemC Components
Jul 17, 2020
Overview of SystemC components, the second chapter of SystemC: From the Ground Up. It introduces modules and hierarchy as well as the three stages of SystemC simulation.
- [Emulate] Typical Modeling Patterns with TLM API
Jul 06, 2020
This post introduces the typical modeling patterns with the TLM API, such as router, arbiter, and pipeline. It also shows how to explore the architecture with hub-and-spoke as well as crossbar switch designs.
- [Emulate] TLM API 1.0 in SystemC
Jul 03, 2020
TLM API 1.0 in SystemC. Including core interfaces and standard channels.
- [CodeStudy] The Implementation of TLM Simple Bus in SystemC
Jul 01, 2020
Study the code: the implementation of timed TLM in SystemC
- [Emulate] Transaction Level Modeling in SystemC
Jun 29, 2020
This post describes the timed TLM, including the three sets of master interfaces and two sets of slave interfaces.
- [Emulate] Refinement of Computation and Communication
Jun 19, 2020
This post introduces the refinement of computation and communication in SystemC, including different kinds of communication refinement such as channel refinement, module refinement, hw-hw refinement, sw-sw refinement, and hw-sw refinement. It also introduces the steps in communication refinement.
- [Tutorial] Build SystemC Environment
Jun 18, 2020
How to Build SystemC Environment in Windows and Linux Ubuntu.
- [Emulate] Interface and Channel Design
Jun 16, 2020
This post introduces Interface and Channel Design in SystemC. Including primitive and hierarchical channels.
- [Emulate] Untimed TLM in SystemC
Jun 14, 2020
This post introduces Untimed TLM in SystemC
- [Emulate] SystemC and Its Simulation Kernel
Jun 12, 2020
This post introduces the components of SystemC, including Modules, Interfaces, Ports, Channels, Process and Events.
- [Emulate] Different Abstraction Models
Jun 11, 2020
Six different abstraction models by NCTU. The models include the specification model, component assembly model, bus arbitration model, cycle-accurate computation model, and RTL model.
- [Weekly Review] 2020/04/20-26
Apr 26, 2020
This weekly review includes an introduction to SystemC, modeling, JVM memory, and Rocket Chip's interrupt handling (PLIC and CLINT), as well as CS61B's Graph.
SystemforML
- [Survey] HPCforML and MLforHPC
Feb 23, 2020
This survey contains two papers 1) Understanding ML driven HPC: Applications and Infrastructure; 2) Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations.
Systolic
- [Read Paper] Systolic Arrays for VLSI
Jul 04, 2020
This paper proposes new multiprocessor structures and parallel algorithms for processing some basic matrix computations which are capable of pipelining matrix computations with optimal speed-up.
- [Read Paper] A Domain-Specific Supercomputer for Training Deep Neural Networks
Jun 28, 2020
This paper introduces the TPU v2 and v3.
Systolic Array
- [Weekly Review] 2021/08/30-2021/09/06
Sep 06, 2021
The weekly review 2021/08/30-2021/09/06
TCM
- [Glean] Tightly Coupled Memory
Jul 20, 2020
The concept of Tightly Coupled Memory (TCM) and the difference between TCM and Cache.
TDD
- [Weekly Review] 2020/03/30-04/05
Apr 05, 2020
Contains BDD, TDD, CI and CS61B. Plus the ICR in Chisel.
TF32
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
TLB
- [Weekly Review] 2020/05/04-10
May 10, 2020
This weekly review includes some knowledge related to cache indexing and tagging methods, the TLB, coherence between cache and DMA, coherence between iCache and dCache, and coherence between multiple processors.
TLM
- [Emulate] Typical Modeling Patterns with TLM API
Jul 06, 2020
This post introduces the typical modeling patterns with the TLM API, such as router, arbiter, and pipeline. It also shows how to explore the architecture with hub-and-spoke as well as crossbar switch designs.
- [Emulate] TLM API 1.0 in SystemC
Jul 03, 2020
TLM API 1.0 in SystemC. Including core interfaces and standard channels.
- [CodeStudy] The Implementation of TLM Simple Bus in SystemC
Jul 01, 2020
Study the code: the implementation of timed TLM in SystemC
- [Emulate] Transaction Level Modeling in SystemC
Jun 29, 2020
This post describes the timed TLM, including the three sets of master interfaces and two sets of slave interfaces.
- [Emulate] Untimed TLM in SystemC
Jun 14, 2020
This post introduces Untimed TLM in SystemC
- [Emulate] Different Abstraction Models
Jun 11, 2020
Six different abstraction models by NCTU. The models include the specification model, component assembly model, bus arbitration model, cycle-accurate computation model, and RTL model.
TPU
- [Workshop] Hot Chips 2020 Marvell Details ThunderX3 CPUs
Sep 04, 2020
Hot Chips 2020, Marvell Details ThunderX3 CPUs
- [Workshop] Hot Chips 2020 Google TPUv2 and TPUv3
Sep 04, 2020
Hot Chips 2020, Google TPUv2 and TPUv3
- [Read Paper] In-Datacenter Performance Analysis of a Tensor Processing Unit
Jul 31, 2020
This paper covers the first-generation TPU. It introduces the goal and architecture of the TPU and shows a performance comparison.
- [Read Paper] A Domain-Specific Supercomputer for Training Deep Neural Networks
Jun 28, 2020
This paper introduces the TPU v2 and v3.
- [Read Paper] A Survey of Accelerator Architectures for Deep Neural Networks
Mar 01, 2020
A Survey of Accelerator Architectures for Deep Neural Networks.
TSMC
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some Hot Chips 2019 slides and HPC materials.
Tcl
- [Tutorial] Obtain Objects in the collection in Genus Using Tcl
Mar 06, 2023
A collection is an extension provided by EDA vendors like Synopsys to support a list of objects in their Tcl API. Most database query operations in Cadence and Synopsys tools return a collection object. Complex query operations with filters may be slow in large designs, so pre-storing the query results might reduce runtime when they are used in multiple places.
- [Tutorial] Background Execution of Reporting Commands in Cadence Genus
Mar 06, 2023
Cadence Genus supports generating reports in parallel and running them in the background. This tutorial introduces how to conditionally enable this feature using Tcl syntax.
TensorFloat32
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
TileLink
- [CodeStudy] RocketChip Fuzzer
Mar 29, 2020
Study the code: the fuzzer in RocketChip, including how to generate source IDs and how to send requests via TileLink.
- [Tutorial] TileLink Spec
Mar 21, 2020
A study of SiFive TileLink, including TileLink buses, nodes, and the related Chisel code in Chipyard.
- [Tutorial] TileLink RegMap
Mar 20, 2020
The study of TileLink TLRegMap
Timing
- [Glean] Two terms for timing analysis: WNS and TNS
Oct 04, 2022
WNS (worst negative slack) and TNS (total negative slack), including a summary table from ChatGPT4.
Tomasulo
- [Glean] Tomasulo Algorithm
Jan 24, 2021
The Tomasulo algorithm eliminates three kinds of hazards (RAW, WAR, and WAW) through forwarding and renaming. Its three stages are issue, execute, and write back.
Trace
- [Glean] Debugging Git Using Trace
Jan 08, 2021
Debugging Git using GIT_TRACE, and restarting the gpg-agent to resolve the "gpg failed to sign the data" error.
Transformer
- [Read Paper] Attention Is All You Need
Jan 07, 2021
This blog combines two blogs that introduce the paper Attention Is All You Need. Shortcomings and one improvement are shown, too.
Tunstall
- [Read Paper] Code compression for embedded VLIW processors using variable-to-fixed coding
Aug 26, 2020
This paper introduces a compression method that uses variable-to-fixed coding schemes based on either Tunstall coding or arithmetic coding to overcome the communication bottleneck between memory and CPU, especially for RISC or VLIW processors, which have a code size bloating problem compared to CISC processors.
- [Glean] Entropy Coding and Tunstall Coding
Jun 14, 2020
This post introduces the concepts of entropy coding and Tunstall coding.
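To make the variable-to-fixed idea concrete, here is a minimal sketch of the textbook Tunstall construction: start with one leaf per source symbol and repeatedly expand the most probable leaf while the dictionary still fits in `2**code_bits` entries. It assumes single-character symbols with independent probabilities. The names `tunstall_dictionary` and `encode`, and the two-symbol source, are illustrative and not taken from the linked posts.

```python
import heapq


def tunstall_dictionary(probs, code_bits):
    """Map variable-length source words to fixed-length codewords."""
    max_leaves = 2 ** code_bits
    if len(probs) > max_leaves:
        raise ValueError("code_bits is too small for this alphabet")
    # Max-heap of leaves, implemented with negated probabilities.
    heap = [(-p, sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    # Expanding one leaf replaces it with len(probs) children.
    while len(heap) + len(probs) - 1 <= max_leaves:
        neg_p, word = heapq.heappop(heap)
        for sym, p in probs.items():
            heapq.heappush(heap, (neg_p * p, word + sym))
    words = sorted(word for _, word in heap)
    return {w: format(i, f"0{code_bits}b") for i, w in enumerate(words)}


def encode(text, dictionary):
    """Greedily parse text into dictionary words and emit their codewords."""
    out, word = [], ""
    for ch in text:
        word += ch
        if word in dictionary:
            out.append(dictionary[word])
            word = ""
    if word:
        raise ValueError("trailing symbols do not form a dictionary word")
    return "".join(out)


if __name__ == "__main__":
    table = tunstall_dictionary({"a": 0.7, "b": 0.3}, code_bits=2)
    print(table)                    # {'aaa': '00', 'aab': '01', 'ab': '10', 'b': '11'}
    print(encode("aaabab", table))  # 001110
```

Because the dictionary words are the leaves of a full tree, no word is a prefix of another, so the greedy parse in `encode` is unambiguous.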
Turing Tax
- [Glean] Turing Tax
Jan 24, 2021
Turing Tax is a term taught in the Advanced Computer Architecture course by Paul H J Kelly at Imperial College London. It describes the overhead (performance, cost, or energy) of the universality of universal computing devices. It can be caused by instructions, data routing, register access, and configurable ALUs, which are the places where the Turing Tax can be reduced.
Tutorial
- [Read Paper] Efficient Processing of Deep Neural Networks A Tutorial and Survey
Dec 29, 2019
This survey focuses on: processing of DNN inference, addressing the efficiency of the CONV layers.
UINT16
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
UINT8
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
UPF
- [Workshop] Using UPF for Low Power Design and Verification
Nov 27, 2021
This workshop covers UPF in detail, including its definition, terminology, and some Tcl commands.
- [Glean] Unified Power Format
Jun 25, 2020
The Unified Power Format (UPF) is a published IEEE standard. It is intended to ease the job of specifying, simulating and verifying IC designs that have a number of power states and power islands.
Ubuntu
- [Tutorial] Build SystemC Environment
Jun 18, 2020
How to Build SystemC Environment in Windows and Linux Ubuntu.
Unfolding
- [Weekly Review] 2020/04/13-19
Apr 19, 2020
Contains CS61B binary search trees, red-black trees, hashing, and heaps; three methods to dump VCD files (waveforms) in Chisel Testers2; the first two generations of verification and the coming third generation, plus the definitions of simulator, emulation, and formal verification.
Upmem
- [Weekly Review] 2020/01/27-02/02
Feb 02, 2020
This review contains some hotchip19's slides and materials of HPC
V2F
- [Read Paper] Code compression for embedded VLIW processors using variable-to-fixed coding
Aug 26, 2020
This paper introduces a compression method that uses variable-to-fixed coding schemes based on either Tunstall coding or arithmetic coding to overcome the communication bottleneck between memory and CPU, especially for RISC or VLIW processors, which have a code size bloating problem compared to CISC processors.
VC Formal
- [Weekly Review] 2021/02/01-2021/02/07
Feb 07, 2021
The weekly review 2021/02/01-2021/02/07
- [Glean] VC Formal Apps
Feb 02, 2021
This post introduces the Apps of VC formal, including AEP, FCA, CC, SEQ, FRV, FXP, FPV, FTA, FSV, DPV, RMA, AIP and FuSa.
VCD
- [Weekly Review] 2020/04/13-19
Apr 19, 2020
Contains CS61B binary search trees, red-black trees, hashing, and heaps; three methods to dump VCD files (waveforms) in Chisel Testers2; the first two generations of verification and the coming third generation, plus the definitions of simulator, emulation, and formal verification.
VIPT
- [Weekly Review] 2020/05/04-10
May 10, 2020
This weekly review covers cache indexing and tagging methods, the TLB, coherence between the cache and DMA, coherence between iCache and dCache, and coherence between multiple processors.
VIVT
- [Weekly Review] 2020/05/04-10
May 10, 2020
This weekly review covers cache indexing and tagging methods, the TLB, coherence between the cache and DMA, coherence between iCache and dCache, and coherence between multiple processors.
VLIW
- [Read Paper] Code compression for embedded VLIW processors using variable-to-fixed coding
Aug 26, 2020
This paper introduces a compression method that uses variable-to-fixed coding schemes based on either Tunstall coding or arithmetic coding to overcome the communication bottleneck between memory and CPU, especially for RISC or VLIW processors, which have a code size bloating problem compared to CISC processors.
- [Read Paper] Very Long Instruction Word Architectures and the ELI-512
Jul 20, 2020
This paper introduces three problems that VLIW faces and the corresponding possible solutions. I also list the pros and cons of VLIW.
Verdi
- [Tutorial] GTKWave and Verdi Enum tcl Commands
Nov 26, 2021
In this tutorial, the tcl commands of GTKWave and Verdi for displaying enum are introduced.
Verification
- [Weekly Review] 2021/01/25-2021/01/31
Jan 31, 2021
The weekly review 2021/01/25-2021/01/31
- [Glean] Formal Signoff
Jan 29, 2021
This post introduces VC Formal Apps and the reason for and goal of formal signoff. It then lists seven steps of formal signoff based on Synopsys.
- [Survey] Current Verification Methods And Their Limited Situations
Jan 11, 2021
This post introduces the current verification methods, steps and their limitations, including formal verification, constrained random verification (CRV) and hardware-software co-verification using virtual platform with hardware emulation and acceleration.
Verilog
- [Weekly Review] 2020/12/21-2021/01/10
Jan 10, 2021
The weekly review 2020/12/21-2021/01/10
- [Weekly Review] 2020/12/14-2020/12/20
Dec 20, 2020
The weekly review 2020/12/14-2020/12/20
- [Tutorial] Configure Sublime Text for Verilog
Dec 20, 2020
Configure Sublime Text for Verilog and SystemVerilog
Version Control
- [Weekly Review] 2021/01/25-2021/01/31
Jan 31, 2021
The weekly review 2021/01/25-2021/01/31
- [Glean] Branch Strategy
Jan 29, 2021
Version control strategies, like TBD, Git-Flow, GitLab-Flow.
Vim
- [Tutorial] Remove Duplicated Lines in Vim
Jun 13, 2024
This tutorial introduces how to remove duplicated lines in Vim.
- [Tutorial] Count the number of occurrences of a pattern in a file in Vim
Apr 18, 2024
Count the number of occurrences of a pattern in a file in Vim.
- [Tutorial] Remove the C/C++ comment block in Vim
Apr 17, 2024
This tutorial introduces how to remove the C/C++ comment blocks and line comments in Vim.
VirtualPrototypes
- [Weekly Review] 2020/05/18-24
May 24, 2020
This weekly review contains some background on hardware-software co-design and ESL.
Windows
- [Tutorial] Build SystemC Environment
Jun 18, 2020
How to Build SystemC Environment in Windows and Linux Ubuntu.
Winograd
- [Survey] GEMM, Strassen and Winograd Fast Convolution Algorithms
Aug 21, 2021
This blog surveys the papers that optimise the convolution by GEMM, Strassen and Winograd algorithms.
- [Read Paper] Fast Algorithms for Convolutional Neural Networks
Aug 19, 2021
Winograd's minimal filtering algorithms compute minimal-complexity convolution over small tiles, which makes them fast with small filters and small batch sizes. However, this paper covers only stride 1.
- [Weekly Review] 2021/07/26-2021/08/01
Aug 01, 2021
The weekly review 2021/07/26-2021/08/01
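For the minimal filtering idea mentioned in the Winograd entries above, the sketch below shows the standard 1D case F(2,3): two outputs of a 3-tap filter with stride 1, computed with 4 multiplications instead of the 6 a direct computation needs. The function names and sample values are illustrative. In practice the filter-side sums would be precomputed once per filter, which this sketch does not do.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over 4 inputs d, using 4 multiplies."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference: the same two outputs with 6 multiplies (stride 1)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    d, g = [1, 2, 3, 4], [1, 2, 3]
    print(winograd_f23(d, g))  # [14.0, 20.0]
    print(direct_f23(d, g))    # [14, 20]
```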
Xargs
- [Glean] Remove Empty File Folder
Jan 11, 2021
Introduces two Linux commands, find and xargs. By combining these two commands, you can easily remove empty directories and do much more.
alias
- [Weekly Review] 2021/01/25-2021/01/31
Jan 31, 2021
The weekly review 2021/01/25-2021/01/31
- [Tutorial] Linux aliases
Jan 30, 2021
Some useful Linux aliases.
cProfile
- [CodeStudy] Python Performance Analysis
Mar 03, 2021
This blog introduces Python memory and execution time analysis tools Memory Profiler and cProfile.
data precision
- [Glean] Precision Format
Feb 21, 2024
Precision formats of floating-point and integer.
db
- [Tutorial] Install PDK
Nov 23, 2020
How to install the PDK and obtain the db files
ddc
- [Glean] Design Compiler ddc file
Nov 24, 2020
In general, it is a binary file that contains both the Verilog gate-level description and the design constraints.
formal
grey
- [CodeStudy] Some Chisel details in the project RocketChip
Sep 24, 2020
Learned some Chisel tips via RocketChip. This includes some implicit classes and one implementation of a Gray counter.
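As a language-neutral illustration of what a Gray counter produces (this is not the RocketChip Chisel implementation), the sketch below derives each Gray value from a plain binary count with the usual `value ^ (value >> 1)` conversion. The function names are hypothetical.

```python
def binary_to_gray(value: int) -> int:
    """Convert a binary count to its reflected Gray code."""
    return value ^ (value >> 1)


def gray_counter(width: int):
    """Yield the full Gray sequence for a counter of the given bit width."""
    for count in range(1 << width):
        yield binary_to_gray(count)


if __name__ == "__main__":
    print([format(g, "02b") for g in gray_counter(2)])  # ['00', '01', '11', '10']
```

Successive values differ in exactly one bit, which is why Gray counters are popular for pointers that cross clock domains.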
implicit class
- [CodeStudy] Some Chisel details in the project RocketChip
Sep 24, 2020
Learned some Chisel tips via RocketChip. This includes some implicit classes and one implementation of a Gray counter.
interface
- [CodeStudy] HWPE and Its Interfaces between Hardware and Software
Dec 26, 2020
This article introduces the MMIO register files of HWPE in the Pulp SoC with the related C code for simulation. It also gives hints on customizing the code to use more registers or more events. I think it can also help you understand the interaction between hardware and software.
offset
- [Weekly Review] 2020/04/27-05/03
May 03, 2020
This weekly review contains spanning trees, A*, Prim's algorithm, Kruskal's algorithm, MST, dynamic programming, and LIS. It also introduces some basic cache terms, such as offset, cache line, way, and cache thrashing.
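As a small illustration of the offset term, the sketch below splits an address into tag, set index, and byte offset. It assumes a cache with a power-of-two line size and number of sets. The function name `split_address` and the default geometry (64-byte lines, 64 sets) are hypothetical choices, not values from the linked review.

```python
def split_address(addr: int, line_bytes: int = 64, num_sets: int = 64):
    """Split an address into (tag, index, offset).

    Assumes line_bytes and num_sets are powers of two.
    """
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = num_sets.bit_length() - 1     # log2(num_sets)
    offset = addr & (line_bytes - 1)           # byte within the cache line
    index = (addr >> offset_bits) & (num_sets - 1)  # which set
    tag = addr >> (offset_bits + index_bits)   # remaining upper bits
    return tag, index, offset


if __name__ == "__main__":
    print(split_address(0x12345))  # (18, 13, 5)
```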
playground
- [Tutorial] Quick Debug and Run Test on Chisel Repos based on CI Flow Files
Feb 28, 2023
This tutorial introduces a quick way to debug code in Chisel environments, such as Chisel3, playground, Rocket Chip, etc. The method introduced in this tutorial can also be used for other repos.
tcl
- [Tutorial] GTKWave and Verdi Enum tcl Commands
Nov 26, 2021
In this tutorial, the tcl commands of GTKWave and Verdi for displaying enum are introduced.
tinyML
- [Workshop] tinyML Talks: AIML SoC for Ultra-Low-Power Mobile and IoT devices
Jul 22, 2020
This workshop introduces two computation optimization methods and three memory optimization methods. The Address Generation HW Unit and a pipeline architecture help computation optimization. Dequantization, entropy compression, and pooling on the fly benefit memory optimization.
- [Workshop] tinyML Talks: Saving 95% of Your Edge Power with Sparsity
Jun 28, 2020
This talk explains four types of sparsity (time, space, connectivity, activation) in terms of edge processes and how they affect computation on a practical level.
- [Workshop] tinyML Talks: Low-Power Computer Vision
Jun 28, 2020
By using a hierarchical neural network, we can split a big neural network into much smaller ones, reducing training time and inference power consumption. However, it might increase latency.
verification
- [Weekly Review] 2021/01/11-2021/01/17
Jan 17, 2021
The weekly review 2021/01/11-2021/01/17
- [Glean] Functional Verification Cycle and Challenges
Jan 14, 2021
This post introduces the four phases of the functional verification cycle and the four challenges of reducing time and improving robustness at each stage. The corresponding solutions are mentioned as well, which indicate the situations that suit different verification methodologies.
- [Glean] State Explosion Problem and Formal Verification
Jan 13, 2021
way
- [Weekly Review] 2020/04/27-05/03
May 03, 2020
This weekly review contains spanning trees, A*, Prim's algorithm, Kruskal's algorithm, MST, dynamic programming, and LIS. It also introduces some basic cache terms, such as offset, cache line, way, and cache thrashing.
xargs
- [Weekly Review] 2021/01/11-2021/01/17
Jan 17, 2021
The weekly review 2021/01/11-2021/01/17