Posts By Category

CodeStudy

Emulate

Glean

  • [Glean] NVIDIA NCCL Introduction Jun 26, 2024

    This post introduces NVIDIA Collective Communication Library (NCCL).

  • [Glean] Compute Express Link (CXL) Introduction Jun 26, 2024

    This post introduces Compute Express Link (CXL) and its benefits.

  • [Glean] AXI Bus Introduction Apr 28, 2024

    This post introduces the AXI bus, including its five channels (read address, read data, write address, write data, and write response) and the VALID/READY handshake.

  • [Glean] CUDA Programming Model Apr 02, 2024

    This post introduces the CUDA programming model, including kernels, thread hierarchy, thread blocks, thread block clusters, and memory hierarchy.

  • [Glean] CUDA C++ Maximize Memory Throughput Apr 02, 2024

    This post introduces how to maximize memory throughput in CUDA C++ programming.

  • [Glean] CUDA C++ Function Execution Space Apr 02, 2024

    CUDA C++ Function Execution Space, including `__global__`, `__device__`, `__host__`, and `__host__ __device__`. A summary table is provided for comparison at the end.

  • [Glean] Einsum Introduction Mar 06, 2024

    This post introduces Einsum and its applications.

  • [Glean] Precision Format Feb 21, 2024

    Precision formats of floating-point and integer.

  • [Glean] Outstanding Transactions Feb 01, 2024

    This post explains the concept of outstanding transactions.

  • [Glean] Backend Design Flow: SDC and Timing Constraints Apr 22, 2023

  • [Glean] Library Formats: CCS, ECSM, and NLDM Apr 03, 2023

    This article, generated by ChatGPT4, provides an overview of three common library formats used in the design and analysis of digital circuits: Composite Current Source (CCS), Effective Current Source Model (ECSM), and Non-Linear Delay Model (NLDM).

  • [Glean] Cadence Genus Synthesis Check List Mar 06, 2023

    This post lists several messages to check in the Genus synthesis log file to make sure there are no errors and no mismatches between the simulation and synthesis results.

  • [Glean] Latex includegraphics decodearray Oct 23, 2022

    The use of the decodearray option to includegraphics allows the rendering colors to be changed.

  • [Glean] Two terms for timing analysis: WNS and TNS Oct 04, 2022

    WNS (worst negative slack) and TNS (total negative slack), including a summary table from ChatGPT4.

  • [Glean] Package hyperref: Token not allowed May 15, 2022

    Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding):(hyperref). Use texorpdfstring to solve this.

  • [Glean] A better way to apply subfloat May 15, 2022

    Simply wrap the includegraphics in a makebox to adjust the widths of the caption and the image separately.

  • [Glean] Logic Synthesis, Physical Synthesis Nov 24, 2021

    The difference between Logic Synthesis and Physical Synthesis.

  • [Glean] GoF Design Pattern Overview Jun 14, 2021

    An overview of GoF design patterns: Creational Patterns, Structural Patterns, and Behavioral Patterns, along with J2EE Patterns.

  • [Glean] Grouped Convolution May 27, 2021

    Grouped convolution is a variant of convolution where the channels of the input feature map are split into groups and convolution is performed independently on each group of channels. The post also includes visualizations of both the spatial and channel domains of convolution, grouped convolution, and other convolution variants.

  • [Glean] What Is Memory-Hard Apr 19, 2021

    In cryptography, a memory-hard function (MHF) is a function that requires a significant amount of memory to evaluate. I also show the solution from Linzhi.

  • [Glean] Five Steps to Make an ASIC for Algorithm X Apr 19, 2021

    Five steps to make an ASIC for Algorithm X: math first, optimization target, hardware-software boundary, building blocks, and physical implementation.

  • [Glean] ResNet-50 Architecture and # MACs Apr 03, 2021

    This post shows the basic architecture of ResNet-50, along with its number of weights and MAC operations.

  • [Glean] 2.5D and 3D Interposer Mar 23, 2021

    Interposers are wide, extremely fast electrical signal conduits used between die in a 2.5D configuration.

  • [Glean] Terms in the Intel Xeon E5-2600 V3 “Haswell-EP” Workstation Die Configurations Feb 06, 2021

    Some terms in Intel Xeon E5-2600 V3 Haswell-EP Workstation Die Configurations, including ACA, LLC, Cbo, SAD, QPI, IIO.

  • [Glean] VC Formal Apps Feb 02, 2021

    This post introduces the VC Formal apps, including AEP, FCA, CC, SEQ, FRV, FXP, FPV, FTA, FSV, DPV, RMA, AIP and FuSa.

  • [Glean] Static Sign-Off, Formal & Simulation Feb 01, 2021

    This post compares Static Sign-Off, Formal, and Simulation using three key functional verification metrics: whether the analysis always finishes, the violations flagged by the analysis, and whether 100% of the failures are found.

  • [Glean] Formal Signoff Jan 29, 2021

    This post introduces the VC Formal apps and the reason for and goal of formal signoff, then lists seven steps of formal signoff based on Synopsys.

  • [Glean] Branch Strategy Jan 29, 2021

    Version control branching strategies, such as trunk-based development (TBD), Git-Flow, and GitLab-Flow.

  • [Glean] Turing Tax Jan 24, 2021

    Turing Tax is a term taught in the advanced computer architecture course by Paul H J Kelly at Imperial College London. It describes the overhead (in performance, cost, or energy) that comes from the universality of general-purpose computing devices. It can be caused by instructions, data routing, register access, and configurable ALUs, each of which is a place where the Turing Tax can be reduced.

  • [Glean] Tomasulo Algorithm Jan 24, 2021

    The Tomasulo algorithm eliminates three kinds of hazards (RAW, WAR, and WAW) through forwarding and renaming. The three stages of the algorithm are issue, execute, and write back.

  • [Glean] Computer Architectures for Next Generation Applications Jan 18, 2021

    This post is mainly translated from a Zhihu answer by Bao Yungang. It introduces three laws (Moore's law, Makimoto's wave, and Bell's law) along with design methods and optimizations for performance and power, as well as for the fragmented requirements of the AIoT age. These methods include reducing data movement, reducing data precision, improving parallelism, and agile hardware development.

  • [Glean] Posit: A Potential Replacement for IEEE 754 Jan 18, 2021

    This post introduces the Type III unum, the posit, including its four parts, the computation of its real value, and the recommended number of exponent bits.

  • [Glean] Functional Verification Cycle and Challenges Jan 14, 2021

    This post introduces the four phases of the functional verification cycle and the four challenges of reducing time and improving robustness at each stage. It also covers the corresponding solutions, which indicate the situations each verification methodology suits.

  • [Glean] State Explosion Problem and Formal Verification Jan 13, 2021

  • [Glean] Remove Empty File Folder Jan 11, 2021

    Introduces two Linux commands, `find` and `xargs`. By combining these two commands, you can easily remove empty directories and handle other tasks.

  • [Glean] Debugging Git Using Trace Jan 08, 2021

    Debug Git using GIT_TRACE, and restart gpg-agent to fix the "gpg failed to sign the data" error.

  • [Glean] Algorithm DG (Directed Graph) Dec 14, 2020

    Usually, an algorithm is graphically represented as a DG to illustrate the data dependencies among the algorithm tasks.

  • [Glean] Classifying Algorithms Based On Task Dependences Dec 14, 2020

    Algorithms can be broadly classified based on task dependences: serial algorithms, parallel algorithms, serial–parallel algorithms (SPAs), nonserial–parallel algorithms (NSPAs), and regular iterative algorithms (RIAs).

  • [Glean] IoU and NMS Dec 03, 2020

    Intersection over Union (IoU) is an evaluation metric used to measure the accuracy of an object detector on a particular dataset. Non-maximum suppression (NMS) is a technique to remove duplicates and false positives in object detection.

  • [Glean] Design Compiler ddc file Nov 24, 2020

    In general, a ddc file is a binary file that contains both the Verilog gate-level description and the design constraints.

  • [Glean] Impact Map Sep 17, 2020

    An introduction to impact maps and how to create one.

  • [Glean] Hardware/Software Codesign Aug 29, 2020

    As the name implies, Hardware/Software Codesign (HSCD) denotes design methodologies for electronic systems that exploit the trade-offs and the synergy of Hardware (HW) and Software (SW).

  • [Glean] Content-addressable memory Aug 28, 2020

    Content-addressable memories (CAMs) are hardware search engines that are much faster than algorithmic approaches for search-intensive applications. CAMs are composed of conventional semiconductor memory (usually SRAM) with added comparison circuitry that enables a search operation to complete in a single clock cycle. The two most common search-intensive tasks that use CAMs are packet forwarding and packet classification in Internet routers. I introduce CAM architecture and circuits by first describing the application of address lookup in Internet routers, and then describe how to implement this lookup function with a CAM.

  • [Glean] Code bloat Aug 25, 2020

    In computer programming, code bloat is the production of program code (source code or machine code) that is perceived as unnecessarily long, slow, or otherwise wasteful of resources. Code bloat can be caused by inadequacies in the programming language in which the code is written, the compiler used to compile it, or the programmer writing it. Thus, while code bloat generally refers to source code size (as produced by the programmer), it can be used to refer instead to the generated code size or even the binary file size.

  • [Glean] ANSI Escape Codes Aug 06, 2020

    ANSI escape sequences are a standard for in-band signaling to control the cursor location, color, and other options on video text terminals and terminal emulators.

  • [Glean] Tightly Coupled Memory Jul 20, 2020

    The concept of Tightly Coupled Memory (TCM) and the difference between TCM and Cache.

  • [Glean] Network Topology Jul 10, 2020

    Network topology is the arrangement of the elements of a communication network. Common topologies include point-to-point, bus, star, ring (circular), mesh, tree, hybrid, and daisy chain.

  • [Glean] Operator Fusion Jun 28, 2020

    There are many opportunities where fused operators (that is, fused chains of basic operators) can significantly improve performance.

  • [Glean] All-Reduce Operations Jun 28, 2020

    All-reduce is one kind of collective operation provided by NCCL and MPI libraries.

  • [Glean] Unified Power Format Jun 25, 2020

    The Unified Power Format (UPF) is a published IEEE standard. It is intended to ease the job of specifying, simulating and verifying IC designs that have a number of power states and power islands.

  • [Glean] Round-Robin Arbitration Jun 25, 2020

    Round-robin arbitration is a scheduling scheme that gives each requestor its share of a common resource for a limited time or a limited number of data elements.

  • [Glean] Entropy Coding and Tunstall Coding Jun 14, 2020

    This post introduces the concepts of entropy coding and Tunstall coding.

  • [Glean] 3D Convolution: kernel will traverse in 3-D Jun 12, 2020

    This post introduces the 3D convolution.

ReadPaper

Survey

  • [Survey] GEMM, Strassen and Winograd Fast Convolution Algorithms Aug 21, 2021

    This post surveys papers that optimise convolution using the GEMM, Strassen, and Winograd algorithms.

  • [Survey] Current Verification Methods And Their Limited Situations Jan 11, 2021

    This post introduces the current verification methods, their steps, and their limitations, including formal verification, constrained random verification (CRV), and hardware-software co-verification using a virtual platform with hardware emulation and acceleration.

  • [Survey] HPCforML and MLforHPC Feb 23, 2020

    This survey covers two papers: 1) Understanding ML driven HPC: Applications and Infrastructure; 2) Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations.

  • [Survey] MLforHPC Benchmarks Feb 16, 2020

    I attached my recent survey on ML4HPC benchmarks, including three papers 1) A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning; 2) HPC AI500: A Benchmark Suite for HPC AI Systems; 3) A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning; and some other presentation slides.

Tutorial

WeeklyReview

Workshop