Seven Steps to an Accurate Worst-Case Power Analysis Using Xilinx Power Estimator (XPE)

By: Brian Philofsky

Power and cooling specifications for an FPGA design have to be determined early in the product’s design cycle, often even before the logic within the FPGA has been designed. An accurate worst-case power analysis early on helps you avoid the pitfalls of overdesigning or underdesigning your product’s power or cooling system.

The Xilinx Power Estimator (XPE) can perform a power analysis at any time during the design cycle. This white paper describes a seven-step procedure for analyzing your design’s power requirements using XPE.
Seven Steps to an Accurate Power Estimation

As a necessary step in any FPGA design, power and cooling specifications need to be properly set in order to create a functioning and reliable system. In most cases, these thermal and power specifications need to be set prior to PCB design and, due to the flexibility of FPGAs, often the FPGA design is not completed or sometimes even started prior to system design and/or PCB fabrication. This creates an interesting challenge for FPGA designers, since thermal and power characteristics can vary dramatically depending on the bitstream (design), clocking, and data put into the chip.

Underdesigning the power or thermal system can make the FPGA operate out of specification, which could result in the FPGA not operating at the expected performance or potentially other more serious consequences. Overdesigning the power system is generally less serious but still not desirable since it can add unnecessary cost and complexity to the overall FPGA design. The task of power estimation is not a trivial one prior to completing the design.

The purpose of this paper is to simplify this task by splitting it into seven steps in order to achieve an accurate power estimation prior to design completion. Note that this paper is primarily focused on power analysis. There are several techniques for power optimization that can be explored and applied during the analysis and can result in significant power savings. For more details on power optimization, see White Paper WP285, Virtex-5 FPGA System Power Design Considerations, before performing power analysis.

Step 1: Obtain the latest version of Xilinx Power Estimator for the selected target device.

It is important to make sure you are using the latest version of the Xilinx Power Estimator (XPE) tool because power information is updated periodically to reflect the latest power modeling and characterization data. The latest version of XPE can be obtained from the Xilinx web site at http://www.xilinx.com/power. It is also helpful to check this web site occasionally during the design process to determine whether a new version has become available. If a new version is available, you may import the data from a previous version into the updated version using the Import from XPE... button on the updated version’s Summary tab. Keeping the Xilinx Power Estimator up to date ensures the most current power information will be used in the power analysis at all times during the design cycle.
Step 2: Complete the Device information on the Summary tab.

Make sure each field in the Device section of the Summary tab is properly set since each can have a significant effect on the end power calculation, particularly in static and clocking power.

- **Part** – An improperly set part can lead to incorrect quiescent and dynamic power component estimations such as the dynamic power reported for clocks. An incorrect part setting will also result in improperly reported available resources.
- **Package** – The package selection can affect the device’s heat dissipation and thus affect the end junction temperature. An incorrect junction temperature can result in an incorrect quiescent power calculation.
- **Grade** – Select the appropriate grade for the device (typically **Commercial** or **Industrial**). Some devices may have different quiescent power specifications depending on this setting. Setting this properly will also allow for the proper display of junction temperature limits for the chosen device.

- **Process** – For the purposes of a worst-case analysis, the recommended process setting is **Maximum**. The default setting of **Typical** will give a closer picture to what would be measured statistically, but changing the setting to **Maximum** will modify the power specification to worst-case values.

- **Speed Grade** (if available) – Some FPGA families may have different power specifications for different speed grades. For instance, the Virtex®-5 family has lower quiescent typical and maximum power specifications for the slowest (-1) speed grade.

- **Stepping** (if available) – Different steppings represent different silicon revisions that may have different power characteristics.
Step 3: Complete the Thermal Information on the Summary tab.

Set the proper thermal conditions in the Thermal Information section of the Summary tab.

- **Ambient Temp (°C)** – Specify the maximum possible temperature expected inside the enclosure that will house the FPGA design. This, along with airflow and other thermal dissipation paths (for example, the heatsink), will allow an accurate calculation of Junction Temperature which in turn will allow a more accurate calculation of quiescent current.

- **Airflow (LFM)** – The airflow across the chip is measured in Linear Feet per Minute (LFM). LFM can be calculated by dividing the fan output in CFM (Cubic Feet per Minute) by the cross-sectional area, in square feet, through which the air passes. The specific placement of the FPGA and/or fan may affect the effective air movement across the FPGA and thus the thermal dissipation. Note that the default for this parameter is 250 LFM. If you plan to operate the FPGA without active airflow (still air operation), the 250 LFM default has to be changed to 0 LFM.

- **Heat Sink** (if available) – If a heatsink is used and more detailed thermal dissipation information is not available, choose an appropriate profile for the type of heatsink used. This, along with other entered parameters, will be used to help calculate an effective $\Theta_{JB}$, resulting in a more accurate junction temperature and quiescent power calculation. Note that some types of sockets may act as heatsinks, depending on the design and construction of the socket.

- **Board Selection** and **# of Board Layers** (if available) – Selecting an approximate size and stack of the board will help calculate the effective $\Theta_{JB}$ by taking into account the thermal conductivity of the board itself.

- **Custom $\Theta_{JB}$ (°C/W)** – If more accurate thermal modeling of the board and system is available, or if the Heatsink and Board parameters are unavailable in the version of XPE being used, Custom $\Theta_{JB}$ (printed circuit board thermal resistance) should be used to specify the amount of heat dissipation expected from the FPGA.

**Note:** In order to specify a Custom $\Theta_{JB}$, the Board Selection must be set to **Custom**. If you do specify a Custom $\Theta_{JB}$, you must also specify a Board Temperature for an accurate power calculation.

The more accurately Custom $\Theta_{JB}$ can be specified, the more accurate the estimated junction temperature will be, thus affecting quiescent power calculations.
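The airflow conversion and the role of $\Theta_{JB}$ described above can be sketched in a few lines of Python. This is only a first-order illustration, not the thermal model XPE uses internally: the function names and all example values are hypothetical, and the junction temperature formula assumes the usual definition of junction-to-board thermal resistance, $T_J = T_{board} + \Theta_{JB} \times P$.

```python
def cfm_to_lfm(fan_cfm: float, duct_area_sq_ft: float) -> float:
    """Convert fan output (CFM) to linear airflow (LFM).

    LFM = CFM / cross-sectional area (in square feet) the air passes through.
    """
    if duct_area_sq_ft <= 0:
        raise ValueError("cross-sectional area must be positive")
    return fan_cfm / duct_area_sq_ft


def junction_temp_c(board_temp_c: float, theta_jb_c_per_w: float,
                    power_w: float) -> float:
    """First-order junction temperature estimate (assumption, not XPE's model).

    Tj = Tboard + theta_JB * P
    """
    return board_temp_c + theta_jb_c_per_w * power_w


# Hypothetical example values, not taken from any datasheet.
lfm = cfm_to_lfm(fan_cfm=50.0, duct_area_sq_ft=0.25)        # 200.0 LFM
tj = junction_temp_c(board_temp_c=60.0, theta_jb_c_per_w=4.0,
                     power_w=5.0)                            # 80.0 degrees C
print(f"Airflow: {lfm} LFM, estimated junction temperature: {tj} C")
```

Because quiescent power itself rises with junction temperature, a hand calculation like this is only a sanity check on the values entered into the Summary tab; the tool's own result should be used for the actual estimate.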
Step 4: Set worst-case voltages for all supplies.

By default, each voltage rail for a particular device is set to its nominal value. In order to get an accurate power estimation, the worst-case or highest voltage value seen at the FPGA device needs to be specified. This can be generally calculated using the nominal output value and tolerances from supplies/regulators to each rail. If any significant IR (voltage) drop may be seen, particularly with supplies that are unregulated, the voltage drop should be accounted for in the maximum voltage calculation.
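As a rough sketch of that calculation, the worst-case rail voltage can be derived from the regulator's nominal output and tolerance. The helper name, the treatment of IR drop as a simple subtraction, and the example numbers are assumptions made for illustration; check the actual regulator and board characteristics before entering a value in XPE.

```python
def worst_case_voltage(nominal_v: float, tolerance_pct: float,
                       ir_drop_v: float = 0.0) -> float:
    """Highest voltage expected at the FPGA for one rail.

    nominal_v     -- regulator nominal output voltage
    tolerance_pct -- regulator tolerance in percent (e.g. 5 for +/-5%)
    ir_drop_v     -- assumed voltage lost between regulator and FPGA
                     (leave at 0 if the regulator senses at the device)
    """
    if tolerance_pct < 0 or ir_drop_v < 0:
        raise ValueError("tolerance and IR drop must be non-negative")
    return nominal_v * (1.0 + tolerance_pct / 100.0) - ir_drop_v


# Hypothetical rails: (nominal volts, tolerance in percent)
rails = {"VCCINT": (1.0, 5.0), "VCCAUX": (2.5, 5.0)}
for name, (nominal, tol) in rails.items():
    print(f"{name}: enter {worst_case_voltage(nominal, tol):.3f} V")
```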

If you are not using some of the $V_{CCO}$ or MGT voltage sources, leave the default values in the rows for those voltage sources.

**Note:** If you are targeting a Virtex-5 LXT, SXT or FXT FPGA, the default value for MGTAVCCPLL is the typical value for FXT (1.0V) but out of the 1.14V to 1.26V range for LXT or SXT. Make sure the proper voltage is set, particularly for this rail, when targeting those devices.

Figure 3: Voltage Source Summary - Summary Tab
Step 5: Enter clock and resource information.

If the design has already been run through the ISE tools, or a previous revision of the design has been run and that revision can be used as a good starting point for the analysis, the Map report (.mrp) from the design can be imported into XPE to help fill out clock and resource information. To do this, use the Import from ISE… button located on the Summary tab of XPE. Even if you do read in a Map report file, additional information and adjustments are often needed. It is best to treat the imported information as a good starting point for data entry, not necessarily a complete solution. In any event, for each of the resource tabs, fill out the expected resources to be used in the design.

- **Clock Tree Power** – Enter each clock, the expected frequency, and the expected clocking resource it will use. If you are not certain which clocking resource will be used, keep the default selection for Type as **Global** clock. At this point, don’t worry about fanout. That will be taken care of in Step 6.

**Figure 4: Clock Tree Power Tab**

- **Logic Power** – In the Logic Power tab, enter an estimate for the number of Slice resources. The LUTs column should represent the number of LUTs used for arithmetic or logic, Shift Registers are the number of LUTs configured as SRLs, and SelectRAMs are the number of LUTs configured as memory. FFs are the number of registers or latches configured in the design. Use the different rows to separate different logic functions and/or characteristics (i.e. clock speed, toggle rate, etc.).

**Note:** For Virtex-5 designs, the LUT number should refer to the number of 6-input LUTs configured in the design. For FPGA architectures prior to Virtex-5, the LUT column should be filled out with the number of 4-input LUTs.

In the early stages of the FPGA design, it can be difficult to get accurate numbers for such resources. A good approach is to work with large round numbers early, when the end resource count isn’t well known, and to update the values as the design progresses so that they better reflect the final implementation. If the design has been partially or fully implemented prior to estimation, Section 13 of the Map Report (.mrp file) can be useful in determining the implemented logic per hierarchy. This helps break the design entry into pieces that can be more easily adjusted, providing both a better understanding and an improved result.

One other good tip: when entering the clock frequency information, use Excel’s capabilities to relate that cell to the cell populated in the Clock Tree Power tab. To do this, select the desired Clock (MHz) cell in the logic view, type =, and select the cell in the Clock Tree Power tab corresponding to the clock source for that logic. This should populate that cell with the value in the Clock Tree Power tab. The primary benefit of this methodology is that if the clock frequency ever needs to change, either because of a specification change or while exploring power trade-offs vs. frequency, the value only needs to be updated in one place and the change is reflected throughout the analysis. This methodology can also reduce the chance of errors and inconsistencies during data entry.

**Figure 5: Logic Power Tab**

- **I/O Power** – It is important to fill out the I/O Power tab of XPE properly to get an accurate overall estimation of all rails of the chip. Depending on the selected I/O Standard and I/O circuitry, a significant amount of power may be consumed not only in the VCCO rail but also VCCINT and VCCAUX rails. Many times it is simplest to enter each device interface separately and also to break out the interface signals to the data, control and clock signals. This makes it easier to specify different I/O Standards as well as other I/O characteristics such as load and toggle rates. When targeting a Virtex-5 FPGA, it is important to properly indicate the use of IODELAYs and IDELAYCTRLs in the design. These can represent a measurable amount of power on VCCAUX, which in turn can affect VCCINT due to thermal differences created by that rail.

**Figure 6: I/O Power Tab**

For the I/O current calculations, the predicted power assumes standard board trace and termination is applied. For details on the expected connectivity for a given I/O Standard, refer to the following:

- **Virtex-5** – The “SelectIO Resources” chapter in the **Virtex-5 FPGA User Guide**.
- **Virtex-4** – The “SelectIO Resources” chapter in the **Virtex-4 FPGA User Guide**.
- **Spartan®-3 Generation** – The “Using IO Resources” chapter in the **Spartan-3 Generation FPGA User Guide**.

Note that when using differential I/O, each input and output should be specified as a pair. Do not specify two inputs in the spreadsheet to indicate a single differential input.

- **BRAM Power** – Enter the number and configurations of the BlockRAM intended to be used for the design. Make sure to adjust the Enable Rate to the percentage of time the ENA or ENB port will be enabled. The amount of time the RAM is enabled is directly proportional to the dynamic power it consumes, so entering the proper value for this parameter is important to an accurate BRAM power estimation.

**Figure 7: Block RAM Power Tab**

- **DSP/Multiplier Power** – Complete the DSP or Multiplier tab of XPE. Note that the DSP blocks can be used for purposes other than multipliers, such as counters, barrel shifters, MUXs and other common functions.

**Figure 8: DSP Power Tab**

- **DCM/PLL Power** – If a DCM and/or PLL is used in the design, specify the use and configuration of each.

**Figure 9: PLL Power Tab**

- **Other** – Complete any other tabs with the usage and configuration of any additional elements used in the design.
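For the BRAM Enable Rate mentioned in the list above, one way to arrive at a value is from simulation or design knowledge of how many clock cycles the enable port is active. The sketch below is illustrative only: the function names and figures are hypothetical, and the linear scaling simply restates the proportionality described for BRAM dynamic power rather than reproducing XPE's model.

```python
def enable_rate_pct(enabled_cycles: int, total_cycles: int) -> float:
    """Percentage of clock cycles in which the ENA/ENB port is asserted."""
    if total_cycles <= 0 or not 0 <= enabled_cycles <= total_cycles:
        raise ValueError("need 0 <= enabled_cycles <= total_cycles, total > 0")
    return 100.0 * enabled_cycles / total_cycles


def scaled_bram_dynamic_power_w(power_fully_enabled_w: float,
                                rate_pct: float) -> float:
    """Scale dynamic power linearly with the enable rate (assumed model)."""
    return power_fully_enabled_w * rate_pct / 100.0


# Hypothetical numbers: port enabled for 250 of every 1000 cycles.
rate = enable_rate_pct(enabled_cycles=250, total_cycles=1000)   # 25.0 percent
power = scaled_bram_dynamic_power_w(0.2, rate)                  # about 0.05 W
print(f"Enable Rate: {rate}%  ->  roughly {power:.3f} W dynamic")
```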

**Step 6: Set the toggle and connectivity parameters.**

For each tab of the tool containing a Toggle Rate, Average Fanout or Enable Rate, review the set value. For toggle and enable rates, in the absence of any other information or knowledge, we generally suggest leaving these settings at their defaults. However, if you determine that the default may not represent the characteristics of this design, make the necessary adjustments. For instance, if you know that a memory interface has a training pattern routine that exercises a sustained high toggle rate on that interface, the Toggle Rate may need to be raised to reflect this additional activity. Alternatively, if a portion of a circuit is clock enabled in a way that reduces the overall activity of the circuit, the toggle rate may need to be reduced. More information on methods to determine toggle rate can be found in UG440, Xilinx Power Estimator User Guide.

For clock fanout, the easiest way to specify this in XPE is to create an equation which will SUM all of the synchronous elements for any particular clock domain. For instance, in the Fanout field for a given clock, type =**SUM** ( and then select all of the cells which specify the number of synchronous elements sourced by that clock (that is, BRAMs, FFs, Shift Registers, Select RAMs, etc.). When completed, close the parenthesis and this will populate the Fanout cell with the appropriate number. This method of entering clock fanout not only is often the easiest, but also has the added advantage of automatically updating when adjustments are made to the spreadsheet resource counts. The resulting Excel equation would be similar to this:

```excel
=SUM(LOGIC!E3:G3,IO!E3:G3,BRAM!C4,MULT!C3)
```
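The same bookkeeping can be mirrored outside the spreadsheet, for example to cross-check the fanout value before entering it. The following Python sketch assumes a hypothetical record type with one entry per spreadsheet row; the field names and counts are invented for illustration and do not correspond to XPE cell names.

```python
from dataclasses import dataclass


@dataclass
class ClockedResources:
    """Synchronous elements entered on one row of a resource tab."""
    clock: str
    flip_flops: int = 0
    shift_registers: int = 0
    select_rams: int = 0
    block_rams: int = 0
    dsps: int = 0

    def total(self) -> int:
        return (self.flip_flops + self.shift_registers + self.select_rams
                + self.block_rams + self.dsps)


def clock_fanout(rows: list[ClockedResources], clock: str) -> int:
    """Sum every synchronous element sourced by the given clock."""
    return sum(row.total() for row in rows if row.clock == clock)


# Hypothetical design entry with two clock domains.
rows = [
    ClockedResources("clk_200", flip_flops=12000, shift_registers=500,
                     select_rams=200),
    ClockedResources("clk_200", block_rams=40, dsps=16),
    ClockedResources("clk_100", flip_flops=3000),
]
print(clock_fanout(rows, "clk_200"))  # 12756
```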

For logic fanout, the nature of the data and control paths needs to be thought out. In designs with well structured sequential data paths, such as DSP designs, fanouts generally tend to be lower than the set default. In designs with many data execution paths, such as in some embedded designs, higher fanouts may be seen. As with toggle rates, if this information is not known it is best to leave the setting at the default and adjust later if needed.

For I/O Output Load, enter a simple capacitive load for each design output. This will affect the dynamic power of the driven output. The Output Load value is primarily made up from the sum of the individual input capacitances of each device connected to that output. The input capacitance can generally be obtained from the datasheets of the devices to which the FPGA I/O is connected.
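A minimal sketch of that sum is shown below. The function name and capacitance values are hypothetical, and the optional trace term is an added assumption for boards where trace capacitance is known; the input capacitances themselves should come from the datasheets of the connected devices.

```python
def output_load_pf(input_capacitances_pf: list[float],
                   trace_capacitance_pf: float = 0.0) -> float:
    """Estimate the Output Load for one FPGA output, in picofarads.

    input_capacitances_pf -- input capacitance of each device on the net
    trace_capacitance_pf  -- optional board trace contribution (assumption)
    """
    if any(c < 0 for c in input_capacitances_pf) or trace_capacitance_pf < 0:
        raise ValueError("capacitances must be non-negative")
    return sum(input_capacitances_pf) + trace_capacitance_pf


# Hypothetical net driving three devices.
load = output_load_pf([4.0, 4.0, 5.0])
print(f"Output Load to enter: {load} pF")  # 13.0 pF
```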

**Step 7: Analyze the results.**

Before you analyze the results, revisit Steps 1 through 6 and update any entries, if necessary. Then analyze the results. Make sure the maximum junction temperature for the selected grade is not exceeded and the power drawn is within the desired budget for the project. If the thermal dissipation or power characteristics are not within targets, adjustments should be made to either the environmental characteristics (that is, more airflow, a heatsink, etc.) or the resource and power characteristics of the design until an acceptable result is reached. Many times, trade-offs can be made to derive the desired functionality with a tighter power budget, and the best time to explore these options is early in the design process. Once the data is completely entered and the part is operating within the thermal limits of the selected grade, the power reported by XPE can be used to specify the rails for the design. If your confidence in the data entered is not very high, additional padding may be added to the numbers to guard against underdesigning the power system for the FPGA. If, however, you are fairly certain of the data entered, no additional padding above the data reported by the tool is necessary.
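The checks in this step can be expressed as a short script applied to numbers copied from the XPE Summary tab. Everything here is a hypothetical sketch: the function, the rail values, the junction temperature limit, the budget, and the 10% padding are placeholders to be replaced with the figures for your device and project.

```python
def review_estimate(junction_c: float, max_junction_c: float,
                    rail_power_w: dict[str, float], budget_w: float,
                    padding_pct: float = 0.0) -> dict:
    """Check thermal and power targets, optionally padding each rail."""
    if padding_pct < 0:
        raise ValueError("padding must be non-negative")
    scale = 1.0 + padding_pct / 100.0
    padded = {rail: watts * scale for rail, watts in rail_power_w.items()}
    total = sum(padded.values())
    return {
        "thermal_ok": junction_c <= max_junction_c,
        "power_ok": total <= budget_w,
        "padded_rail_power_w": padded,
        "total_power_w": total,
    }


# Hypothetical values read from the Summary tab.
result = review_estimate(
    junction_c=78.0,
    max_junction_c=85.0,      # assumed limit for the selected grade
    rail_power_w={"VCCINT": 3.0, "VCCAUX": 0.5, "VCCO": 1.0},
    budget_w=5.0,
    padding_pct=10.0,         # extra margin for low-confidence inputs
)
print(result["thermal_ok"], result["power_ok"],
      round(result["total_power_w"], 2))
```

If either flag comes back false, the environmental or design inputs need to be revisited as described above before the rail requirements are finalized.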

As the design matures, continue to review and update the information in the spreadsheet to reflect the latest requirements and implementation details. This will present the most current picture of the power used in the design and could potentially allow early identification of adjustments to the power budgeting up or down depending on the current power trends of the design.

**Conclusion**

With accurate data entered into the Xilinx Power Estimator, accurate power estimations can be made. It can be difficult to determine the exact power requirements of an FPGA system prior to implementing it. However, with the seven steps laid out in this document, the problem can now be broken down into smaller, easier to define and understand phases which should allow for improved data entry and improved data accuracy. In order to help ensure that no step is missed, a checklist is attached to this document (Figure 10). Please feel free to use this checklist for your next power estimation task.

More detail on the Xilinx Power Estimator can be found in UG440, *Xilinx Power Estimator User Guide*. 
Figure 10: Xilinx Power Estimator (XPE) Checklist