[Design Application] Co-Verify To Optimize Your Embedded Design

When it’s part of an existing tool flow, performance analysis promises improved end products with little or no cost increase.

Jim Kenney, May 2004

As the capabilities of wireless networks improve and become more sophisticated, the expectations and desires of wireless-device users seem to grow exponentially. The result is an ever-increasing demand for better levels of service and performance from mobile devices. For true data and communications mobility, bandwidth must increase even as power requirements decrease and security improves. These pressures are driven by an accelerating, highly competitive wireless marketplace.

Now more than ever before, design engineers must confront an increasingly difficult set of tasks. The challenges of wireless systems design are many: integrating radio-frequency (RF), analog, and digital signals; reducing power requirements to help extend the battery life of mobile devices; and managing a growing set of functional demands that result in greater hardware and software complexity. More often than not, the growth of system-on-a-chip (SoC) and embedded-systems designs forces all of these competing priorities to be managed on a single piece of silicon (FIG. 1).

The consumer appetite for wireless voice and data transfer is driving the burgeoning feature sets of ever-evolving wireless laptops, smart phones, and entertainment devices. Yet concerns about design performance are being raised in tandem with demands for increased device functionality.

To respond to these challenges effectively, wireless systems designers must be able to explore alternative configurations early in the design cycle. They need the ability to easily make tradeoffs across mixed-signal boundaries. Designers also must be able to quickly and efficiently pose "what ifs" that shift functions between hardware and software. Having such capabilities in hand greatly enhances the chances of achieving the increased performance that balances the required size, function, and cost of embedded designs. The answer to providing these options lies in performance-analysis technology.

But embedded-systems design teams are already struggling to gain a high level of confidence in functional verification before tapeout. Who has any time to devote to performance analysis and tuning? Plus, performance analysis requires another toolset. With its associated learning curve and support burden, this toolset is difficult to wedge into the project schedule.

But what if performance analysis could be accomplished with minimal effort using a tool that's already in the flow? The market value of the end product could be enhanced substantially at little or no additional cost.

Correct functionality is vitally important to a design. If the design is not functionally correct, the speed at which the tasks are completed will be of little consequence. Success in the marketplace also is measured in terms of performance. A host of forgotten or unsuccessful products attests to the frailty of functional designs that failed to perform as expected. To meet the broader expectations of the wireless marketplace, performance verification, as measurable throughput of specific design architectures, must become a priority early in the design process. Once correct functionality has been confirmed, there is significant value in delivering a product that exceeds expectations and outperforms the competition.

Today's electronic-design-automation (EDA) and design communities place their emphasis on functional verification. So how can the performance optimization of embedded systems be elevated in the verification process? It's possible to add another discrete toolset and process step to focus on performance analysis and design tuning. Yet such a move is sure to meet resistance from design teams. They wouldn't welcome the idea of wedging another toolset into an already tight project schedule. Nor would they want to deplete scarce design resources and strain already constrained budgets with increased tool costs.

Ideally, performance analysis and optimization would be realized as a complementary augmentation of tools and steps that are already integrated into the flow. A tool is needed that can do performance analysis on data that's gathered in the simulation environment. To address the requirements of embedded-systems performance, that data must cover both hardware and software execution.

DATA-GATHERING CHOICES

In order for an environment to be considered for co-simulation of hardware and software, it must have sufficient visibility to collect the required performance data. Contending environments include: logic simulation that instantiates a full model of the processor; hardware emulation that incorporates the physical processor (or a model of it); and hardware/software co-verification.

Of these three environments, logic simulation is the least effective. Because it is difficult to integrate a software debugger with this type of model, the data for profiling software functions isn't available. In addition, the execution speed of a logic simulator is limited to under 20 instructions per second. As a result, not enough software can be run to provide meaningful results. Logic simulators can be configured to provide data about bus and memory transactions. Without the ability to correlate that data with software execution, however, they can at best derive hardware-only performance data. More robust, full-system analysis is beyond the reach of this approach.

As an environment for performance-data collection, hardware emulation offers a substantial improvement over logic simulation. By representing the processor with a physical device that interfaces with a symbolic debugger, it can quickly derive data. That data can then be used to profile code. But in cases where the processor is either an emulator primitive or is implemented in one, the opportunities for capturing symbolic data are constrained. Here, the emulator can report on memory transactions.
In addition, bus monitoring can be instantiated in the emulated design just as it is with logic simulation. Yet restricted symbolic data gathering does limit the robustness of the system-performance analysis that can be achieved.

The environment with the greatest potential to provide a rich set of performance data is hardware/software co-verification. Comprising both an instruction-set simulator (ISS) and a software debugger, co-verification processor models are able to provide data for code profiling as well as cache hits and misses. The co-verification kernel processes all memory transactions to the memory subsystem, which is modeled in the logic simulator. It therefore satisfies the data requirements for depicting memory activity graphically. Instantiating a bus monitor in the logic simulator provides data on bus loading and arbitration delay. Because hardware/software co-verification is the most promising environment for delivering a fully robust set of performance data, the discussion that follows will assume the use of a properly instrumented hardware/software co-verification tool.

PROFILING TRANSACTIONS

The significance of software-profiling data can be displayed with bar and Gantt charts. A bar chart can show the percentage of CPU resources that's consumed by each function (FIG. 2). A Gantt chart provides a sequential display of function execution, calls, and returns. It shows the time that is required to perform each step (FIG. 3).

By knowing exactly how long it takes a time-critical function to execute, one can prevent system errors like incomplete data transfers or dropped packets. For example, one design team was not confident that the RAM copy routine (part of software initialization) would complete within the required time. They considered adding hardware to perform the RAM copy separately from the processor. But this alternative was both expensive and time consuming.
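
The two profiling views described above, and a timing question like the one this team faced, can be derived from the same raw data. The following is only a minimal sketch in Python, assuming a hypothetical trace of function entry and exit events with timestamps. The trace format, the function names, and the 1,000-ns deadline are invented for illustration and do not come from any particular co-verification tool.

```python
from collections import defaultdict

# Hypothetical trace: (timestamp_ns, event, function_name).
# Format and names are illustrative assumptions, not a real tool's output.
TRACE = [
    (0, "enter", "init"),
    (100, "enter", "ram_copy"),
    (900, "exit", "ram_copy"),
    (950, "enter", "setup_irq"),
    (1000, "exit", "setup_irq"),
    (1200, "exit", "init"),
]


def profile(trace):
    """Return (self time per function, list of (name, start, end) calls)."""
    self_time = defaultdict(int)
    calls = []
    stack = []      # (function name, entry timestamp) of active calls
    last_ts = None
    for ts, event, name in trace:
        # Charge the elapsed time to the function currently executing.
        if stack:
            self_time[stack[-1][0]] += ts - last_ts
        if event == "enter":
            stack.append((name, ts))
        elif event == "exit":
            top_name, start = stack.pop()
            if top_name != name:
                raise ValueError(f"unbalanced trace at {ts}: {name}")
            calls.append((name, start, ts))
        last_ts = ts
    return dict(self_time), calls


def cpu_share(self_time):
    """Percentage of total time spent in each function (bar-chart data)."""
    total = sum(self_time.values())
    return {name: 100.0 * t / total for name, t in self_time.items()}


def meets_deadline(calls, name, deadline_ns):
    """True if every call to `name` finished within `deadline_ns`."""
    durations = [end - start for n, start, end in calls if n == name]
    return bool(durations) and max(durations) <= deadline_ns


self_time, calls = profile(TRACE)       # calls holds the Gantt-chart data
shares = cpu_share(self_time)
ram_copy_ok = meets_deadline(calls, "ram_copy", 1000)  # hypothetical limit
```
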
In contrast, software profiling based on the hardware/software co-verification model provided the team with explicit information about the duration of the RAM copy function. The team could then be confident that the routine ran within the specified window. They avoided the time, effort, and expense of developing RAM copy hardware.

Software profiling can draw attention to critical functions that don't execute within the required time frame. It allows designers to improve overall system performance by speeding up the execution of selected code. Functions can be rewritten to improve efficiency in a number of ways: by implementing them in assembly code rather than C, by changing the interrupt priorities while the function is executing, or by changing the implementation of the function from firmware to hardware.

MEMORY TRANSACTIONS

A graphical depiction of memory transactions over time can highlight peak memory utilization. Designers can then focus on the most critical demands for memory bandwidth. The peaks and valleys that are represented in the graph indicate opportunities to balance memory access. Less time-critical functions can be shifted to a point at which memory bandwidth is underutilized (FIG. 4).

Firmware or hardware calls are correlated with a particular point in time. This approach helps designers identify and correct memory bottlenecks. By clicking at any point along the memory-activity graph, one sees the function names displayed as visual keys to memory usage. This capability aids designers in the annotation of memory reads and writes to improve memory performance.

Efficient cache activity also is crucial to overall system performance. If instruction and data caches are used effectively, the CPU load on the main memory can be minimized. In addition, firmware execution speeds may be improved. Excessive cache misses can indicate poor data locality for a given function. They also may suggest that the cache size should be increased.

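
How miss counts respond to cache size can be illustrated with a toy model. The sketch below is a minimal direct-mapped cache simulation in Python, offered under stated assumptions rather than as the behavior of any co-verification tool: the address trace, the 16-byte line size, and the function name `simulate_direct_mapped` are all hypothetical. It replays a loop that alternates between two buffers placed 256 bytes apart and counts hits and misses for two cache sizes.

```python
def simulate_direct_mapped(addresses, num_lines, line_bytes=16):
    """Count (hits, misses) for a direct-mapped cache over an address trace."""
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing the address
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # identifies which block is resident
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # fill the line on a miss
    return hits, misses


# Hypothetical trace: a loop reading word i of buffer A, then word i of
# buffer B. The base addresses are illustrative assumptions.
BUF_A, BUF_B = 0x1000, 0x1100
trace = []
for offset in range(0, 64, 4):
    trace.append(BUF_A + offset)
    trace.append(BUF_B + offset)

small = simulate_direct_mapped(trace, num_lines=16)   # 256-byte cache
large = simulate_direct_mapped(trace, num_lines=32)   # 512-byte cache
print("16 lines:", small)   # (0, 32)
print("32 lines:", large)   # (24, 8)
```

With 16 lines, the two buffers map to the same cache lines and every access misses. Doubling to 32 lines separates them, so only the first touch of each line misses. In practice the trace would come from the simulation itself rather than from a hand-built list.
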
Once the nature of the problem has been identified, both cache size and configuration can be optimized to improve firmware execution and relieve memory loads.

Here, the "virtual world" that's invoked by a hardware/software co-verification tool delivers a significant benefit. Designers have the ability to quickly iterate on different cache configurations for optimal performance. Fast iterations speed the evaluation of proposed changes to cache size or algorithm. Those iterations are supported by a graphical software debugger, logic simulator, and the clear display of memory transactions. In contrast, attempting to optimize cache by manipulating a hardware prototype alone restricts a designer's options for change. It also provides only indirect feedback on efforts to improve efficiency.

MANAGING BUS UTILIZATION

If bus utilization maxes out at 100%, it indicates a bandwidth-limited function. Such a condition can reflect a DMA transfer, in which every available bus cycle is commonly in use. It also may reflect an unexpected peak in bus usage that requires further investigation. By reviewing the appropriate software-profile information, designers can determine which function calls are occurring during the bandwidth limitation and then implement changes. Identifying and eliminating bus-usage bottlenecks can dramatically improve system throughput.

The bus arbiter controls bus-master access to a particular bus. It's difficult to choose the most effective arbitration scheme and set correct function-call priorities in order to balance bus access. Adjustments are made to these parameters to ensure sufficient bus access for critical functions without ignoring lower-priority functions.

It can be challenging to correctly identify bus-arbitration problems. Buffers may back up and overflow, or data may be dropped entirely. Often, it's unclear that the problematic behavior that's being observed is rooted in arbitration.
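
One way to make arbitration effects visible is to measure, for every bus request, how long the master waited for its grant. The following minimal Python sketch assumes a hypothetical bus-monitor log of (master, request, grant, release) times in bus cycles. The log format, the master names, and the numbers are illustrative only, and bus tenures are assumed not to overlap. It reports the mean and worst-case grant delay per master, plus the bus utilization over a window.

```python
from collections import defaultdict

# Hypothetical bus-monitor log: (master, request, grant, release), in cycles.
# These values are invented for illustration, not captured from a real design.
LOG = [
    ("cpu", 0, 0, 4),
    ("dma", 1, 4, 12),
    ("cpu", 5, 12, 14),
    ("dma", 13, 14, 22),
    ("cpu", 15, 22, 24),
]


def arbitration_delays(log):
    """Mean and maximum request-to-grant delay for each bus master."""
    delays = defaultdict(list)
    for master, request, grant, _release in log:
        delays[master].append(grant - request)
    return {
        master: {"mean": sum(d) / len(d), "max": max(d)}
        for master, d in delays.items()
    }


def bus_utilization(log, window_start, window_end):
    """Percentage of cycles in [window_start, window_end) with the bus owned."""
    busy = 0
    for _master, _request, grant, release in log:
        # Count only the part of each tenure that falls inside the window.
        overlap = min(release, window_end) - max(grant, window_start)
        if overlap > 0:
            busy += overlap
    return 100.0 * busy / (window_end - window_start)


delays = arbitration_delays(LOG)
utilization = bus_utilization(LOG, 0, 24)
```

In this made-up log, the bus is busy for all 24 cycles. The CPU waits up to 7 cycles behind DMA bursts, while the DMA engine never waits more than 3.
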
By plotting the time that's required to grant bus access, one can gain insight into the source of arbitration problems. This process may help designers avoid what can otherwise be a slow and tedious task. Viewing arbitration delay makes balancing the bus access easier than monitoring changes in secondary effects.

Often, the task of balancing bus access requires multiple iterations. Improving access for one bus master can degrade access for others. To reduce the time that's required to complete each iteration, designers can review an arbitration-delay graph after changes are made to function-call priorities or the arbitration scheme. Faster iterations make it easier to reach an optimum bus-access balance. They also save critical design time.

VERIFICATION-TOOL PERFORMANCE

When a design's operating characteristics are presented in a clear, flexible, and easily accessible manner, substantial performance gains can be achieved. Such gains are possible even with a relatively small expenditure of development resources. The quick analysis of performance alternatives can result in designs that yield end products of superior value. These end products will be well poised for success in the demanding wireless marketplace.

Copyright © 2008 Penton Media, Inc. All rights reserved.