Q&A

What are the core files required to use HeteroSTA?

To compile and run an application using HeteroSTA, you must include the relevant C/C++ header files (heterosta.h and netlistdb.h) in your source code. During the linking phase, you need to link against the appropriate library file—libheterosta.so for general use, or libheterosta.cpu.so for non-CUDA environments.

What physical units does HeteroSTA use, and what are the key considerations for placement data?

HeteroSTA's internal STA engine operates with standard industry units: femtofarads (fF) for capacitance and kilo-ohms (kΩ) for resistance. With these units, an RC product comes out directly in picoseconds (1 fF × 1 kΩ = 1 ps).

When using heterosta_extract_rc_from_placement, there are two critical considerations:

Pin-level Coordinates: The function requires the exact coordinates of each pin, not the origin of its parent cell. Your application is responsible for calculating the absolute pin coordinates based on the cell's placement and the pin's relative position within the cell (often found in LEF files).
Unit Consistency: The absolute units of your input coordinates are user-defined (e.g., microns), but the per-unit-length RC parameters you provide (unit_cap and unit_res) must be scaled to match those coordinates so that the results land in the required internal units. As stated in the API reference, the following relationships must hold:
- unit_cap * distance = Capacitance in fF (1e-15 Farads)
- unit_res * distance = Resistance in kΩ (1e3 Ohms)
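
The two considerations above can be sketched on the application side as follows. This is a minimal illustration, not HeteroSTA code: the types Point and RcUnits, the helper names, and the per-micron wire values used in the note below are all hypothetical, and the pin-position helper assumes an unrotated, unmirrored cell.

```c
#include <assert.h>

/* Hypothetical application-side types; none of these come from heterosta.h. */
typedef struct { double x, y; } Point;

typedef struct {
    double unit_cap; /* fF per coordinate unit */
    double unit_res; /* kOhm per coordinate unit */
} RcUnits;

/* Absolute pin position = cell origin + pin offset inside the cell.
 * Assumption: the cell is placed unrotated and unmirrored. A real flow
 * must first transform the offset by the cell orientation. */
Point absolute_pin_position(Point cell_origin, Point pin_offset) {
    Point p = { cell_origin.x + pin_offset.x, cell_origin.y + pin_offset.y };
    return p;
}

/* Scale per-micron wire parameters to the coordinate unit in use.
 * coord_units_per_um is 1.0 for micron coordinates, or for example
 * 2000.0 when coordinates are database units at 2000 DBU per micron. */
RcUnits rc_units_for_coords(double cap_ff_per_um, double res_ohm_per_um,
                            double coord_units_per_um) {
    assert(coord_units_per_um > 0.0);
    RcUnits u;
    u.unit_cap = cap_ff_per_um / coord_units_per_um;
    /* Ohm -> kOhm, then per micron -> per coordinate unit. */
    u.unit_res = (res_ohm_per_um / 1000.0) / coord_units_per_um;
    return u;
}
```

As a sanity check with illustrative values of 0.2 fF/µm and 2 Ω/µm, a 100 µm wire expressed as 200000 database units should come out as 20 fF and 0.2 kΩ, which is what unit_cap * distance and unit_res * distance give after this scaling.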

Does HeteroSTA preserve the pin and cell order from my input files?

It depends on how the netlist is loaded. This is a critical distinction:

Using heterosta_read_netlist: The library will parse the Verilog file and create its own internal ordering. You should not assume this order matches your file. To find the internal ID for a given pin name, you must use the heterosta_lookup_pin function.
Using heterosta_set_netlistdb: This low-level approach gives you full control. The internal pin and cell IDs used by HeteroSTA will directly correspond to the indices you define when you construct and pass the NetlistDB object. This is the recommended method for tight integration with tools like placers, as it eliminates the need for ID remapping.

What are the key data formatting requirements when building a `NetlistDB` manually?

Based on the Netlist Loading Demonstration, two key requirements are:

IO Pins (Ports): The design's top-level input/output ports are not part of any instance. They should be associated with a special cell representing the top-level module, which by convention is the cell at index 0.
Instance Pin Naming: Pins belonging to a cell instance must use a hierarchical naming convention with a forward slash (/) as the separator, such as <instance_name>/<pin_name> (e.g., u1/a).
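
A small helper for producing names in the <instance_name>/<pin_name> form might look like the sketch below. The function name and its error convention are hypothetical and are not part of the HeteroSTA API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "<instance_name>/<pin_name>" into buf.
 * Returns 0 on success, or -1 if the name did not fit (the buffer then
 * holds a truncated, still null-terminated string when buf_size > 0). */
int format_instance_pin_name(char *buf, size_t buf_size,
                             const char *instance_name, const char *pin_name) {
    assert(buf != NULL && instance_name != NULL && pin_name != NULL);
    int n = snprintf(buf, buf_size, "%s/%s", instance_name, pin_name);
    return (n >= 0 && (size_t)n < buf_size) ? 0 : -1;
}
```

Calling it with the instance name u1 and the pin name a yields u1/a, matching the example above.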

What is the typical API calling sequence?

The API is designed to be called in a logical sequence. A typical workflow, detailed in the Get Started guide, can be broken down into three phases:

Initialization and Setup (Called once):
- heterosta_init_license(): Initialize and validate the HeteroSTA license.
- heterosta_init_logger(): Initialize the logger with the callback function.
- heterosta_new(): Create the STAHoldings environment.
- heterosta_set_delay_calculator_*(): Choose a delay model (e.g., Arnoldi).
- heterosta_read_liberty(): Load timing libraries for both EARLY and LATE corners.
- heterosta_read_netlist() or heterosta_set_netlistdb(): Load the design netlist.
- heterosta_flatten_all(): Finalize the design data into a high-performance format.
- heterosta_build_graph(): Construct the internal timing graph.
- heterosta_read_sdc(): Load the design constraints.
Timing Analysis (Called in a loop for optimization):
- heterosta_extract_rc_from_placement(): Update parasitics based on new pin locations. (Alternatively, heterosta_read_spef is used for one-shot analysis).
- heterosta_update_delay(): Recalculate cell and net delays.
- heterosta_update_arrivals(): Propagate arrival times and calculate slacks. This must be called after heterosta_update_delay.
Reporting and Cleanup (Called after analysis):
- heterosta_report_wns_tns() or heterosta_report_slacks_at_max(): Retrieve timing results.
- heterosta_free(): Release all memory associated with the STAHoldings environment.
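
One way to protect an optimization loop against ordering mistakes is to track the analysis state on the caller's side. The sketch below is plain bookkeeping, not a HeteroSTA feature. It makes no library calls, and every name in it is hypothetical. It encodes only the two rules stated above: arrivals are updated after delays, and reports are read after both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical caller-side state; not part of heterosta.h. */
typedef enum {
    FLOW_STALE,          /* parasitics changed: delays and arrivals are out of date */
    FLOW_DELAYS_READY,   /* delays recalculated, arrivals not yet propagated */
    FLOW_ARRIVALS_READY  /* arrivals propagated: reports are meaningful */
} FlowStage;

typedef struct { FlowStage stage; } FlowGuard;

/* Call after updating parasitics (placement-based extraction or SPEF). */
void flow_on_parasitics_changed(FlowGuard *g) {
    assert(g != NULL);
    g->stage = FLOW_STALE;
}

/* Call after the delay update step has completed. */
void flow_on_delay_updated(FlowGuard *g) {
    assert(g != NULL);
    g->stage = FLOW_DELAYS_READY;
}

/* Call after the arrival update step. Returns false if delays were stale. */
bool flow_on_arrivals_updated(FlowGuard *g) {
    assert(g != NULL);
    if (g->stage != FLOW_DELAYS_READY) return false;
    g->stage = FLOW_ARRIVALS_READY;
    return true;
}

/* True only when both updates have run since the last parasitics change. */
bool flow_can_report(const FlowGuard *g) {
    assert(g != NULL);
    return g->stage == FLOW_ARRIVALS_READY;
}
```

Checking flow_can_report before each reporting call turns a silent ordering bug into an explicit failure in the application.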

For timing-driven optimization, should I use setup slack or hold slack?

You should primarily use setup slack (heterosta_report_slacks_at_max). Setup time violations determine the maximum clock frequency of the design, making setup slack the most critical metric for performance optimization during physical design stages like placement.

When should I use `heterosta_extract_rc_from_placement` versus `heterosta_read_spef`?

Use heterosta_extract_rc_from_placement during iterative, timing-driven placement. In this flow, cell positions change in every iteration. This API allows you to dynamically re-estimate parasitics based on the latest layout to guide the placer.
Use heterosta_read_spef for sign-off or post-layout analysis. When you have a static layout and a detailed parasitic file generated by an extraction tool, this API provides the most accurate results.

How can I debug a CUDA `IllegalAddress` error?

This error almost always indicates a memory location mismatch. When you call an API with use_cuda=true, the library expects all array pointers (e.g., xs and ys coordinates) to point to memory previously allocated on the GPU device. If you pass a pointer to standard CPU host memory, the GPU kernel cannot access that address, resulting in an IllegalAddress crash. Ensure all necessary data has been correctly transferred to the GPU before the API call.

Can I mix GPU-accelerated API calls with CPU calls for reporting?

Yes, this is a supported and recommended workflow. You can perform the computationally intensive tasks (like heterosta_update_delay and heterosta_update_arrivals) on the GPU by setting use_cuda=true. Then, for reporting functions like heterosta_report_wns_tns, you can set use_cuda=false. The library will handle the internal data transfer from GPU to CPU to generate the results.

What is the purpose of `nets_zero_array` and `nets_one_array` when building a `NetlistDB`?

These arrays are important for an accurate analysis. They explicitly tell the STA engine which nets are tied to a constant logic '0' (ground) and logic '1' (power). This information is crucial for correct logic propagation, identifying constant pins, and preventing the analysis of false timing paths. The Netlist Loading Demonstration provides a clear example of how these are populated.

Does HeteroSTA have requirements for instance pin naming?

Yes. HeteroSTA expects a hierarchical naming scheme using a forward slash (/).

U1/a is supported.
U1:a is not supported.
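
If an upstream tool produces names such as U1:a, they can be rewritten before the NetlistDB is built. The sketch below is a hypothetical application-side helper, not a HeteroSTA function. It assumes that the last occurrence of the separator is the one between the instance name and the pin name, which may not hold for every naming scheme.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: rewrites the last occurrence of sep in name to '/',
 * in place. Returns 1 if a character was rewritten, 0 if sep was absent.
 * Assumption: the last separator divides instance name from pin name. */
int normalize_pin_separator(char *name, char sep) {
    assert(name != NULL);
    assert(sep != '\0'); /* strrchr would otherwise match the terminator */
    char *last = strrchr(name, sep);
    if (last == NULL) return 0;
    *last = '/';
    return 1;
}
```

Applied to U1:a with ':' as the separator, this produces U1/a. A name that already uses a forward slash is left unchanged.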

Should virtual placement blockages be passed to HeteroSTA as cells?

No. The timing engine is concerned only with real, physical circuit elements that are part of a timing path (standard cells, macros, IOs). Virtual elements like placement blockages or virtual IO pins do not have timing characteristics and should be filtered out by your application before you build the NetlistDB.
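
The filtering step can be sketched as an in-place compaction that also records how old indices map to new ones, so that pin-to-cell references can be remapped afterwards. The PlacerCell type, its is_virtual flag, and the function name are hypothetical stand-ins for whatever your placer's data model provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placer-side cell record; not a HeteroSTA type. */
typedef struct {
    const char *name;
    bool is_virtual; /* true for blockages, virtual IO pins, and the like */
} PlacerCell;

/* Removes virtual cells in place while keeping the order of the rest.
 * old_to_new[i] receives the new index of cell i, or -1 if it was dropped.
 * Returns the number of cells kept. */
size_t drop_virtual_cells(PlacerCell *cells, size_t count, long *old_to_new) {
    assert(cells != NULL || count == 0);
    assert(old_to_new != NULL || count == 0);
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (cells[i].is_virtual) {
            old_to_new[i] = -1;
        } else {
            old_to_new[i] = (long)kept;
            cells[kept++] = cells[i];
        }
    }
    return kept;
}
```

Because the relative order of the remaining cells is preserved, a top-level cell stored at index 0 stays at index 0 as long as it is not flagged as virtual.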

My timing report shows WNS and TNS as 0.0. What's the probable cause?

A WNS/TNS of 0.0, especially on a complex design, strongly suggests that the timing engine did not find any valid, constrained timing paths to analyze. Common causes include:

Incorrect NetlistDB construction: The cell connectivity, pin directions, or clock port identification might be wrong.
Missing or incorrect SDC constraints: A clock may not have been defined with create_clock, or I/O delays might be missing, leaving paths unconstrained.
API sequence error: Calling a reporting function like heterosta_report_wns_tns before successfully running heterosta_update_delay and heterosta_update_arrivals will result in a panic or incorrect zeroed values.
Incompatible Environment (GPU): When using use_cuda=true, an incompatible CUDA driver or toolkit version can sometimes cause silent failures in the GPU kernels, leading to zeroed results being returned.
