Tulio Pereira Bitencourt

 

OpenROAD™ is increasingly used as the leading open-source EDA solution by a large number of users in industry and academia who are exploring and building ASIC designs for a range of today's mainstream applications. Video-on-demand (VoD) is a rapidly growing market that accounts for more than 80% of current internet traffic. Video streaming applications demand high performance to deliver real-time, high-quality video with low latency at lower design cost. AV1 supports higher video resolutions (e.g., 4K, 8K) and addresses the compression needs of larger video sizes as a new-generation coding standard, but it fails to meet real-time throughput.

The Problem

AV1, the video coding format developed by the Alliance for Open Media (AOMedia), delivers good compression rates, but because of its high complexity, software-only implementations do not meet real-time execution and throughput requirements.

To develop a next-generation encoder that meets ultra-high performance needs (8K@120fps) for MRTR (Maximum Real-Time Resolution), Tulio and his team at the Informatics Institute of the Federal University of Rio Grande do Sul sought zero-cost open-source EDA solutions to explore and design enhanced architectures that meet their design goals.
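To put the 8K@120fps target in perspective, the short sketch below computes the raw pixel rate it implies, assuming "8K" means the 7680×4320 UHD format. It is only an illustrative calculation: how many symbols the arithmetic encoder must process per pixel depends on the content and encoder settings and is not modeled here.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Return the number of pixels per second for a resolution and frame rate."""
    return width * height * fps


# Assumption: 8K UHD is 7680x4320; 120 frames per second as in the MRTR target.
rate = pixel_rate(7680, 4320, 120)
print(f"{rate:,} pixels/s")  # 3,981,312,000 pixels/s
```

Roughly four billion pixels per second must flow through every stage of the encoder, which is why throughput, not only compression ratio, drives the hardware design.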

OpenROAD for AE-AV1 Arithmetic Encoder Design

OpenROAD™ provides free, open access to RTL-to-GDSII flow tools and open PDKs, with run times within 24 hours. This allowed Tulio to explore multiple design architectures to meet his design goals, i.e., high performance and low cost (small die area), in the shortest possible time at multiple technology nodes.

AE-AV1 is an open, royalty-free hardware encoder for the arithmetic coding stage of AV1, the codec that improves upon its predecessor VP9 and competes with codecs such as HEVC and VVC. Arithmetic coding is a lossless data compression algorithm: it maintains two key variables that describe a numeric interval (Low and Range) and encodes incoming symbols into a reduced bitstream according to their probabilities of appearance.
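The interval narrowing described above can be sketched with exact fractions. This is a textbook arithmetic-coding model under simplifying assumptions (fixed probabilities, infinite precision, no output bits); AV1's real coder works on finite-precision integers with renormalization and adaptive probability tables, which this sketch omits. The function name `encode_interval` is hypothetical.

```python
from fractions import Fraction


def encode_interval(symbols, probs):
    """Narrow the interval [low, low + rng) once per symbol.

    symbols: sequence of symbol indices into the alphabet.
    probs: Fraction probabilities, one per alphabet symbol, summing to 1.
    Returns the final (low, rng); any number in the interval identifies the message.
    """
    # Cumulative probability of all symbols below each index (a simple CDF).
    cum = [Fraction(0)]
    for p in probs:
        cum.append(cum[-1] + p)
    if cum[-1] != 1:
        raise ValueError("probabilities must sum to 1")

    low, rng = Fraction(0), Fraction(1)
    for s in symbols:
        low += rng * cum[s]  # move Low to the start of the symbol's sub-interval
        rng *= probs[s]      # shrink Range in proportion to the symbol's probability
    return low, rng


probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
low, rng = encode_interval([0, 2, 1], probs)
print(low, rng)  # 7/16 1/32
```

The final Range equals the product of the encoded symbols' probabilities (1/2 × 1/4 × 1/4 = 1/32), so about log2(32) = 5 bits suffice to identify the message: frequent symbols shrink the interval less and therefore cost fewer bits.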

The original AV1 software did not allow hardware implementation results to be predicted, since it relied heavily on dynamic arrays for an unknown set of input symbols. These unique and stringent requirements made OpenROAD the only viable solution to design AE-AV1 with good confidence in manufacturability.

Design Architecture

The team first developed a baseline RTL design as a four-stage pipeline, shown in the figure:

               

Stage 1 receives the symbols, the number of symbols in the alphabet and their probabilities, and performs pre-calculations.

Stage 2 updates Range and is the critical path. It was optimized by moving part of its work into Stage 1 (pre-calculations) and Stage 3 (Low updating).

Stage 2 could not be accelerated further because Range feeds back into itself: each new Range value depends on the previous one.

Stage 3 updates Low.

Stage 4 is a hardware-friendly stage that implements carry propagation and stores the compressed stream in output registers.
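Carry propagation is needed because Low lives in a finite register: adding to Low can overflow into bits that have already been written out. A minimal sketch of the idea on a byte buffer follows; the helper name `propagate_carry` and the byte-oriented buffer are illustrative assumptions, not a description of the AE-AV1 RTL.

```python
def propagate_carry(out: bytearray) -> None:
    """Add a carry of 1 into already-emitted bytes (hypothetical helper).

    A trailing byte equal to 0xFF becomes 0x00 and passes the carry to the
    byte before it; the first byte below 0xFF absorbs the carry.
    """
    i = len(out) - 1
    while i >= 0 and out[i] == 0xFF:
        out[i] = 0x00
        i -= 1
    if i < 0:
        # A well-formed arithmetic coder never carries past its first byte.
        raise OverflowError("carry propagated past the first output byte")
    out[i] += 1


buf = bytearray([0x12, 0x7F, 0xFF, 0xFF])
propagate_carry(buf)
print(buf.hex())  # 12800000
```

Because a carry can ripple through an arbitrarily long run of 0xFF bytes, its cost is data-dependent, which makes it a natural candidate for a stage of its own rather than part of the Range or Low update.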

Range and Low are updated in two different stages to avoid lengthening the critical path and, hence, to avoid adding delay to AE-AV1.
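The dependency structure behind this split can be sketched in software. In the simplified model below (fixed probabilities and exact fractions, not AV1's integer arithmetic), Stage 2's loop-carried step is a single multiplication on Range, while Stage 3 needs only the Range value from before the current symbol's update, so in hardware it can run for symbol n while Stage 2 already processes symbol n+1. All function names are hypothetical.

```python
from fractions import Fraction


def stage1_precompute(symbol, probs):
    """Stage 1 (sketch): CDF lookup; depends on neither Range nor Low."""
    cum = sum(probs[:symbol], Fraction(0))
    return cum, probs[symbol]


def stage2_update_range(rng, pre):
    """Stage 2 (sketch): the self-feeding step, new Range = old Range * p."""
    _, p = pre
    return rng * p


def stage3_update_low(low, rng_before, pre):
    """Stage 3 (sketch): Low uses the Range value from before this symbol."""
    cum, _ = pre
    return low + rng_before * cum


def encode_staged(symbols, probs):
    low, rng = Fraction(0), Fraction(1)
    for s in symbols:
        pre = stage1_precompute(s, probs)
        new_rng = stage2_update_range(rng, pre)
        low = stage3_update_low(low, rng, pre)  # off the Range feedback loop
        rng = new_rng
    return low, rng


probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(encode_staged([0, 2, 1], probs))
```

Splitting the work this way yields the same interval as a single combined update while keeping only the Range multiplication inside the feedback loop.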

Ease of Use: Easy Installation and Configuration for Rapid Exploration

OpenROAD installation is fast and easy: the Docker-based installation encapsulates the complexity of the required packages and libraries. “It is fantastic how easy it is to just execute a command and have the entire toolset installed and configured all at once, without requiring any intermediary step,” says Tulio.

“The scripts used for running the entire OpenROAD flow are extremely easy to use and straightforward to configure. When one wants quick results, most of the work is just adding the targeted design to the OpenROAD ‘designs’ folder and editing the configuration file. Furthermore, after designing an architecture, any researcher would do well to use the open-source solutions developed by the OpenROAD team to find the best possible configuration for the new design, as well as to acquire results quickly to optimize parameters. OpenROAD goes from an RTL input, in my case a bunch of Verilog files, to GDSII without any extra step beyond triggering the flow.”
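The “add the design and edit the configuration file” step can be sketched as follows. This Python snippet renders a minimal OpenROAD-flow-scripts `config.mk` for each platform. `DESIGN_NAME`, `PLATFORM`, `VERILOG_FILES`, `SDC_FILE`, `CORE_UTILIZATION` and `$(DESIGN_HOME)` are standard ORFS variables, but the design name `ae_av1`, the Verilog file name and the utilization value are hypothetical placeholders, not the settings used for AE-AV1.

```python
# Hypothetical design directory name; ORFS expects DESIGN_NAME to match the top module.
DESIGN = "ae_av1"
# ORFS platform names for the PDKs mentioned in this article.
PLATFORMS = ["sky130hd", "sky130hs", "nangate45", "asap7"]


def render_config(design: str, platform: str, verilog_files: list[str],
                  core_utilization: int = 40) -> str:
    """Return the text of a minimal ORFS config.mk (sketch, placeholder values)."""
    sources = " ".join(f"$(DESIGN_HOME)/src/{design}/{v}" for v in verilog_files)
    lines = [
        f"export DESIGN_NAME = {design}",
        f"export PLATFORM = {platform}",
        f"export VERILOG_FILES = {sources}",
        f"export SDC_FILE = $(DESIGN_HOME)/{platform}/{design}/constraint.sdc",
        f"export CORE_UTILIZATION = {core_utilization}",
    ]
    return "\n".join(lines) + "\n"


for platform in PLATFORMS:
    print(f"# designs/{platform}/{DESIGN}/config.mk")
    print(render_config(DESIGN, platform, ["ae_av1.v"]))
```

Each generated file would then be run from the ORFS `flow` directory with `make DESIGN_CONFIG=./designs/<platform>/ae_av1/config.mk`.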

“The OpenROAD tools are extremely easy to use and take very little time to set up. If one considers that a conventional tool requires a lot of infrastructure just to handle licenses, and even more to process the different tasks it supports, it is easy to conclude that running state-of-the-art paid EDA tools on a normal laptop would be unbearable. When running the OpenROAD flow, I used an older-generation Dell Inspiron, which is not powerful and could barely handle the AV1 reference software (I had to boot my Linux OS without a GUI for that). For OpenROAD, however, I executed everything on the same computer using an external hard drive, which degrades performance even further. My computer did not struggle, and the analyses were completed in almost no time.”

To improve computational efficiency, OpenROAD leverages cloud resources to parallelize key stages of the design flow and distribute processes across multiple machines and CPUs.

Meeting Design Goals: High Throughput, High Performance, Low Area

Achieving high performance at the least cost was the design goal; power was not considered a key PPA metric for this version of the encoder. OpenLane was initially used to explore design configurations and the flow. However, Tulio chose OpenROAD-flow-scripts for its support of ASAP7, along with the other open PDKs (SKY130, Nangate45) needed for exploration across technology nodes. OpenROAD-flow-scripts delivers the complete RTL-to-GDSII flow, including Yosys for synthesis, OpenSTA for timing analysis (which drives timing optimization) and KLayout for DRC checking.

Rapid Design Exploration for Optimal Area and Performance

Tulio successfully ran several design experiments targeting multiple frequencies and process technologies, including SkyWater 130nm (HS, HD), Nangate 45nm and the ASAP7 predictive PDK.

OpenROAD supports design exploration through an OpenLane Python script that automatically runs multiple user-defined experiments based on different synthesis strategies to optimize area and performance. The table below shows a sample experiment with the results for gate count, area and worst path delay for a given process.
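Once an exploration run reports area and worst-path delay for each strategy, the useful candidates are those that no other run beats on both metrics. The sketch below computes that Pareto front; the `Run` type, strategy labels and numbers are hypothetical and are not taken from the AE-AV1 experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    strategy: str     # hypothetical strategy label
    area_um2: float   # cell area reported for the run
    delay_ns: float   # worst path delay reported for the run


def pareto_front(runs):
    """Return runs not dominated in (area, delay), lower being better for both."""
    front = []
    for r in runs:
        dominated = any(
            o.area_um2 <= r.area_um2 and o.delay_ns <= r.delay_ns
            and (o.area_um2 < r.area_um2 or o.delay_ns < r.delay_ns)
            for o in runs
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.area_um2)


# Made-up results for four synthesis strategies.
runs = [Run("s0", 100.0, 2.0), Run("s1", 120.0, 1.5),
        Run("s2", 130.0, 1.8), Run("s3", 90.0, 2.5)]
print([r.strategy for r in pareto_front(runs)])  # ['s3', 's0', 's1']
```

Here `s2` is dropped because `s1` is both smaller and faster; the remaining runs trade area against delay, and the designer picks among them according to the target frequency.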

                 

               

Results

The final implementation of AE-AV1 using ASAP7 shows a significant improvement in gate count and frequency over the baseline AV1 design, targeting the MRTR (Maximum Real-Time Resolution) goal of 8K@120fps for real-time processing at the maximum possible AV1 resolution. The table below shows the exploration and implementation results across multiple open PDKs.

  

   

ASAP7 delivered the best PPA, improving area by 24.48% and frequency by 82.8% compared with the Nangate 45nm PDK. Significant area and performance improvements were possible only at 45nm and lower nodes. For all technologies, the post-layout gate count (i.e., area) was calculated by dividing the actual area of each circuit by the area of the smallest two-input gate available in the PDK (commonly a NAND2 gate).
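The gate-count normalization described above is a single division, sketched below. The function name and the example areas are hypothetical; real values would come from the post-layout report and the PDK's NAND2 cell area.

```python
def gate_count(circuit_area_um2: float, nand2_area_um2: float) -> float:
    """Normalized gate count: circuit area divided by the smallest 2-input gate area."""
    if nand2_area_um2 <= 0:
        raise ValueError("NAND2 area must be positive")
    return circuit_area_um2 / nand2_area_um2


# Hypothetical numbers: a 1000 um^2 circuit in a PDK whose NAND2 cell is 0.5 um^2.
print(gate_count(1000.0, 0.5))  # 2000.0
```

Dividing by each PDK's own NAND2 area removes most of the raw scaling between nodes, so gate counts from SKY130, Nangate45 and ASAP7 can be compared on a common, technology-independent scale.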

The final routed design implementation on ASAP7 is shown below.

 

Figure: Final routed design in ASAP7 in the OpenROAD GUI

Tulio and his team successfully met their design goals using OpenROAD-based flows, open PDKs and the OpenROAD GUI to explore and enhance the AV1 RTL design architecture and to verify PPA at multiple technology nodes, all within a fully integrated, easy-to-use and open ecosystem. They achieved these results in significantly less time than conventional EDA tools would have required, at zero tool and PDK cost. They published their research in a paper (see the AE-AV1 publication under References).

“The OpenROAD toolset has a very well-structured flow, which can be easily configured by adding a design and editing the configuration file if one wants quick results, or by changing multiple parameters to achieve better results. As someone who was not familiar with the OpenROAD flow, I was very happy to find out that it was extremely straightforward to use and to run an RTL-to-GDSII flow. The way OpenROAD allows certain parameters to be kept at their defaults, or to be changed according to the needs of the user, is incredible and allows designers to reach impressive results with state-of-the-art PDKs,” concludes Tulio.

References

AE-AV1 publication: https://jics.org.br/ojs/index.php/JICS/article/view/564

Baseline AV1: https://ieeexplore.ieee.org/document/9800932