# An Adaptive Approach for Reducing Leakage Energy Consumption in Value Predictors

Juan M. Cebrián<sup>1</sup>, Juan L. Aragón<sup>1</sup>, José M. García<sup>1</sup> and Stefanos Kaxiras<sup>2</sup>

<sup>1</sup>Dept. of Computer Engineering University of Murcia, Murcia, 30100, Spain +34 968 367656 {jcebrian,jlaragon,jmgarcia}@ditec.um.es <sup>2</sup>Dept. of Electrical and Computer Engineering University of Patras Rio, 26500 Patras, Greece +30 2610 996441 kaxiras@ee.upatras.gr

# Abstract

Energy-efficient microprocessor design is one of the major concerns in both the high-performance and embedded processor domains. Furthermore, as process technology advances toward deep submicron, static power dissipation becomes a new challenge to address. Value prediction emerged as an effective way of increasing processor performance by overcoming data dependences. The more accurate the predictor is, the more performance is obtained, at the expense of becoming a source of power consumption and a thermal hot spot.

In this paper we propose the design of leakage-efficient value predictors by applying adaptive decay techniques to disable unused entries in the prediction tables of value predictors (Stride, DFCM and FCM). We study the tradeoffs for these prediction structures, which exhibit different access patterns than caches, in order to reduce their leakage energy efficiently while compromising neither VP accuracy nor the speedup provided. Results show average leakage energy reductions of 52%, 70% and 80% for 20 KB Stride, DFCM and FCM value predictors, respectively.

# **1. INTRODUCTION**

Reducing energy consumption and power dissipation is one of the main goals when designing a modern microprocessor in the high-performance domain and, more crucially, in the embedded microprocessor domain. There are two sources of power dissipation: *dynamic* power and *static* power (power dissipated regardless of activity). For several generations, static power (leakage) has been just a small fraction of the overall power consumption in microprocessors, and it was not considered a major concern [9]. However, as feature size shrinks to allow greater transistor density and higher performance, supply voltage must be lowered in order to restrain dynamic power consumption, since it is proportional to the square of the supply voltage. But using smaller geometries has the additional effect of increasing leakage loss exponentially, which leads static power to dominate the overall power consumption as process technology drops below 65 nm [5][9].

Several proposals can be found in the literature for managing leakage power, at both circuit and architecture level. Some proposals have focused on reducing the leakage power by switching off unused portions of large array structures, since they occupy a significant fraction of total die area, therefore, providing a great opportunity for leakage savings. Cache Decay [8] selectively turns individual data cache lines off if they have not been used for a long time, reducing leakage at the expense of losing the contents of the cache line. This *non-state preserving* technique has also been applied to branch predictors and *BTB* structures.

On the other hand, Value Prediction (VP) has been proposed as a very effective way of improving superscalar processor performance [6] [11] by overcoming data dependences, which are one of the major performance limitations in current high-performance processors. However, despite the speedup provided (15% on average, as reported in [2]), value prediction structures have not been widely adopted, mainly due to complexity-delay issues. Note, however, that access time is not as crucial for VPs as it is for other prediction structures such as branch predictors, where increased access time and complexity can significantly reduce their benefits because the next fetched instruction is needed as early as possible. First, the predicted value is not needed until the instruction has reached its issue stage. Second, current high-performance processors typically implement deeper pipelines (14 stages or more), which effectively hide the VP latency due to the increased front-end pipeline length. When an instruction reaches the end of the multi-stage front-end, the predicted value allows a speculative issue of the instruction if any register input is not ready, making traditional VP a very effective way of increasing processor performance.

However, the use of VP structures incurs additional dynamic and static power dissipation. The continuous access to the prediction tables in almost every clock cycle may result in a thermal hot spot, increasing the leakage power of the structure, as in the case of caches and branch predictors. In modern high-performance processors, due to high operating temperatures, it is necessary to reduce leakage in every possible structure. Although the VP is a small structure compared to an L2 cache, if we let it overheat (which is likely, as it is accessed frequently and resides quite close to the core) without any precaution to regulate its leakage, the negative effects can be quite serious. Small hot structures can leak more than larger but cooler ones. We cannot afford to ignore leakage even in the smallest structures.

In this paper we propose *Adaptive Value Prediction Decay (AVPD)*, a mechanism able to dramatically reduce the leakage energy of traditional Value Predictors with negligible impact on either prediction accuracy or processor performance by dynamically locating VP entries that have not been accessed for a noticeable amount of time. Once those entries have been identified, *AVPD* switches them off to prevent them from leaking. This makes Value Predictors complexity-effective structures (due to the minimal extra hardware required) when used in medium and long pipelines, as well as a power-performance efficient mechanism suitable for high-performance processor designs.

Previous proposals that applied *static decay* approaches to both caches and branch predictors needed to carefully choose a decay interval, which could even be tuned per application, in order to minimize the performance impact of leakage power reduction. However, even obtaining the best decay interval per application (by profiling techniques) does not guarantee the best energy savings, since the static decay approach cannot capture variations within an application. This is particularly important in the case of prediction structures, since correct and wrong predictions usually appear clustered.

The contribution of the present work is a novel adaptive decay scheme suited to the peculiarities of Value Predictors. The new AVPD extends the static approach [3] and is needed for two reasons. First, adapting the decay interval individually for the very small VP entries (as opposed to cache lines) would represent significant overhead, and thus we consider it impractical. Second, VPs are non-tagged structures and, therefore, it is not feasible to track the ideal miss rate vs. the induced miss rate. AVPD uses a global decay interval, requiring no additional hardware per entry. To adapt this global decay interval without tags, AVPD uses a time-based approach to judge whether or not the current decay interval causes an inordinate number of entries to be prematurely shut off.

The rest of the paper is organized as follows. Section 2 analyzes the utilization of the prediction tables. The proposed *AVPD* scheme is described in Section 3. Section 4 shows the experimental methodology and the leakage energy savings obtained. Section 5 provides some background. Finally, Section 6 summarizes the main conclusions of the work.

# 2. Problem Overview

## 2.1. Generational Behaviour in VPs

Power dissipation of value prediction structures is divided into dynamic and static power, as noted above. The dynamic component strongly depends on the utilization of the VP tables. Values can be predicted at different levels of demand: the most aggressive utilization predicts the output value for all instructions traversing the pipeline. Other approaches restrict the use of the value predictor to just a fraction of instructions, such as long-latency instructions, load instructions that miss in the L1 or L2 data cache, instructions that belong to a critical path, or just to predict the effective address for memory disambiguation. Therefore, restricting the VP utilization to just a fraction of selected instructions effectively reduces the dynamic power component of this structure. However, the static power component is still present, as the VP structure leaks regardless of utilization, with leakage loss increasing for finer process technologies. For this reason, this work focuses on reducing the VP's static component.

Figure 1. Fraction of time spent in dead state (SPECint2000).

The authors in [8] showed that, very frequently, cache lines have an initial active period (known as *live time*) followed by a period of no utilization (known as *dead time*) before they are eventually evicted. They proposed to break the stream of references to a particular cache line into generations. Each generation lasts until the cache line is evicted and replaced by a new one. This generational behaviour also appears in the VP structure, although with some particularities: as value predictors are implemented as direct-mapped tables with no tags, which allows destructive interference, in our proposal a generation ends when the VP entry is accessed by an instruction with a different PC. Its *live time* is the period of accesses with the same PC, and its *dead time* is the period between the last access with a specific PC and the first access with a different one.
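The generation definition above can be made concrete with a small trace-driven sketch. It is illustrative only: the `(cycle, pc)` trace format, the modulo indexing and the name `dead_time_fraction` are assumptions introduced here, not part of the original evaluation infrastructure. Only completed generations are counted, and the result is the ratio total dead time/(total live time + total dead time).

```python
def dead_time_fraction(trace, num_entries):
    """Fraction of time untagged, direct-mapped entries spend in the dead state.

    trace: iterable of (cycle, pc) pairs in increasing cycle order (assumed format).
    A generation of an entry ends when a different PC accesses that entry.
    """
    owner = {}        # entry index -> PC of the current generation
    gen_start = {}    # entry index -> cycle at which the current generation began
    last_access = {}  # entry index -> cycle of the most recent access
    live = dead = 0
    for cycle, pc in trace:
        idx = pc % num_entries  # hypothetical index function
        if idx in owner and owner[idx] != pc:
            # Generational change: close the previous generation.
            live += last_access[idx] - gen_start[idx]
            dead += cycle - last_access[idx]
            gen_start[idx] = cycle
        elif idx not in owner:
            gen_start[idx] = cycle
        owner[idx] = pc
        last_access[idx] = cycle
    total = live + dead
    return dead / total if total else 0.0


# PC 4 owns entry 0 from cycle 0 to 10 (live), then PC 8 replaces it at cycle 30.
trace = [(0, 4), (10, 4), (30, 8)]
fraction = dead_time_fraction(trace, num_entries=4)  # 10 live cycles, 20 dead cycles
```

A real predictor would typically index with selected PC bits or a hash of the value history rather than a plain modulo; the modulo is used here only to keep the sketch short.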

To better understand the generational behaviour in value predictors, Figure 1 shows the utilization of the VP entries by measuring the fraction of time each entry remains in a *dead* state<sup>1</sup> for the whole SPECint2000 benchmark suite as a function of VP size. It can be observed that the three evaluated value predictors (Stride, FCM and DFCM) present a similar utilization regardless of their size. For sizes around 20 KB, the average fraction of dead time is 43%, and for predictor sizes around 40 KB it is 47%. Therefore, if we were able to take advantage of these *dead times* by detecting them and shutting the entries off, we could reduce the leakage energy of the VP structure by nearly one half on average.

# 3. Adaptive Value Prediction Decay (AVPD)

Dynamically applying decay techniques to Value Predictors is not trivial, as we need to detect those VP entries that have been unused for a significant amount of time and switch them off to prevent them from leaking. *Adaptive Value Prediction Decay (AVPD)* is a time-based mechanism that analyzes each VP entry individually to detect how often that entry is accessed. If an entry is unused for a long period of time, it has probably entered a dead state, and we should proceed to turn it off.

The problem is to dynamically determine how long a decay interval (the time we wait before shutting an entry off) must be. If we turn VP entries off using decay intervals that are too long, the potential leakage energy savings will be reduced. Conversely, if the time-based policy chooses decay intervals that are too short, the VP accuracy might be reduced, thereby inducing a performance degradation. An advantage of AVPD over the original cache decay mechanism is that prematurely disabling a VP entry is not as harmful as disabling a cache line. Losing the contents of a cache line always leads to an extra access to the L2 cache or memory to retrieve the lost information, incurring extra execution cycles. Losing the contents of a VP entry, however, may or may not result in a value misprediction on the next access to that entry, and this is exactly what would happen if we had a real generational change (which is a very common situation and one of the major limitations in traditional non-tagged VPs, where the huge number of destructive interferences dramatically shortens the generational replacement).

<sup>1</sup> This fraction of time can be measured as the ratio total dead time/(total live time + total dead time).

Figure 2. AVPD mechanism.

Regarding the utilization of VPs, throughout the paper we predict the output values for *all* instructions traversing the pipeline. However, it is important to note that this aggressive prediction scheme does not benefit a decay mechanism, either static or adaptive, since such mechanisms are based on locating unused predictor entries. The more demanding the use of the VP structure, the fewer the opportunities to detect unused VP entries and the lower the leakage energy savings obtained from a decay mechanism.

The best decay interval depends on the application running on the processor, or even on the section of code being executed. During program execution there are sections of code where the VP mostly predicts correctly (or mostly mispredicts), since correct and wrong predictions appear clustered depending on the program phase. In other program sections the number of VP entries being accessed is low, or we can even identify instructions whose optimal decay interval differs from that of others. Therefore, if we are able to dynamically adapt the decay interval to the program's needs, higher leakage energy savings can be obtained than by setting it statically.

The decay interval is implemented by means of a hierarchical counter composed of a *global counter* and a two-bit saturating Gray-code counter for each individual value predictor entry<sup>2</sup> (*local counters*). To make the *AVPD* mechanism easier to implement, we use power-of-two decay intervals. VP entries are shut off, preventing them from leaking, by using *gated*-$V_{DD}$ transistors [10]. These "sleep" transistors are inserted between the ground (or supply) and the cells of each VP entry, reducing leakage by several orders of magnitude, to the point that it can be considered negligible. An alternative to *gated*-$V_{DD}$ transistors is to use quasi-static 4T cells, although similar leakage savings would be expected.
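The hierarchical counter can be pictured with the following minimal sketch. Several details are assumptions made for illustration and are not stated in the text: the global counter is taken to wrap every `decay_interval / 4` cycles, a local counter is modelled as a position in the two-bit Gray sequence, and a local counter that overflows wraps back to the first state so that the caller can keep using it. The class and method names are hypothetical.

```python
GRAY_STATES = (0b00, 0b01, 0b11, 0b10)  # two-bit Gray code: one bit flips per step


class HierarchicalDecayCounter:
    """Global cycle counter driving per-entry two-bit local counters (sketch)."""

    def __init__(self, num_entries, decay_interval):
        # Assumption: four local steps span one power-of-two decay interval.
        self.global_period = decay_interval // len(GRAY_STATES)
        self.global_count = 0
        self.local_pos = [0] * num_entries  # position within GRAY_STATES

    def local_bits(self, idx):
        """Return the Gray-coded value currently held by a local counter."""
        return GRAY_STATES[self.local_pos[idx]]

    def access(self, idx):
        """An access to an entry immediately resets its local counter."""
        self.local_pos[idx] = 0

    def tick(self):
        """Advance one cycle; return the entries whose local counter overflowed."""
        self.global_count += 1
        if self.global_count < self.global_period:
            return []
        self.global_count = 0  # global overflow: step every local counter
        overflowed = []
        for idx, pos in enumerate(self.local_pos):
            if pos == len(GRAY_STATES) - 1:
                overflowed.append(idx)
                self.local_pos[idx] = 0
            else:
                self.local_pos[idx] = pos + 1
        return overflowed
```

Because the local counters only change when the global counter overflows, they toggle once every `global_period` cycles rather than every cycle, which is the coarse-grained access that makes the hierarchical arrangement attractive.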

The AVPD mechanism considers that each VP entry can be in one of the following three states, as shown in Figure 2: *enabled* (both data and the local counter are enabled), *partially disabled* (data is shut off but the local counter is enabled) or *disabled* (both data and the local counter are shut off). AVPD uses two additional global counters that account for: a) the number of partially disabled entries (entries that change from the enabled state to the partially disabled state) within the previous decay interval; and b) the number of re-enabled entries (entries that change from the partially disabled state to the enabled state) within the current decay interval. After a number of cycles equal to the average live time<sup>3</sup>, a *re-activation ratio* is calculated as the number of re-enabled entries divided by the number of partially disabled entries.
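In symbols, the re-activation ratio can be written as follows. The notation is introduced here only for illustration and does not appear in the original formulation: $N_{\text{re}}$ is the value of the re-enabled counter and $N_{\text{pd}}$ the value of the partially-disabled counter.

$$
R = \frac{N_{\text{re}}}{N_{\text{pd}}}
$$

A value of $R$ close to 1 suggests that most of the entries that were shut off were needed again shortly afterwards, whereas a value close to 0 suggests that most of them were indeed dead.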

In addition, *AVPD* uses two pre-defined threshold values (*increasing threshold* and *decreasing threshold*) to determine whether the length of the current decay interval is appropriate, that is, whether the current decay interval causes VP entries to decay during their *live time* (prematurely) or during their *dead time*. If the *re-activation ratio* is higher than the *increasing threshold*, the current decay window is too short and it is doubled, since there are many entries being disabled prematurely. On the other hand, if the *re-activation ratio* is lower than the *decreasing threshold*, the current decay window is too long and it is halved, since we are shutting entries off too late, losing opportunities to reduce the VP leakage.

<sup>2</sup> Using a hierarchical counter is more power-efficient since it allows accessing the local counters at a much coarser level.

<sup>3</sup> As cited in section 2.2, the static decay experiments showed that the average live time is around 400 cycles for the three evaluated VPs.

#### XVIII Jornadas de Paralelismo, Zaragoza 2007

The *AVPD* mechanism works as follows (see Figure 2): each cycle the global decay counter is incremented by one and, when it overflows, the local counters of all VP entries in either the *enabled* or *partially disabled* state are incremented. However, an access to any VP entry results in an immediate reset of its local counter. In addition:

- For those entries in the *enabled* state (both VP data and the local counter are enabled): if the entry remains unused for a long time, its local counter will eventually overflow and the entry will change to the *partially disabled* state. The number of *partially disabled* entries is incremented.
- For those entries in the *partially disabled* state (VP data is shut off whereas the local counter is enabled): if the entry is not accessed within the average live time<sup>3</sup>, it is changed to the *disabled* state and the local counter is also shut off. However, an access to a *partially disabled* entry changes it to the *enabled* state, increasing the number of *re-enabled* entries.
- For those entries in the *disabled* state (both VP data and the local counter are shut off): an access to the entry will change it to the *enabled* state.
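The three cases above amount to a small per-entry state machine, which can be sketched as follows. This is a behavioural illustration under stated assumptions, not the hardware design: the names `State`, `AdaptCounters`, `on_access`, `on_decay` and `on_live_time_expired` are hypothetical, and the events are assumed to be delivered by some surrounding counter logic.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ENABLED = "enabled"                        # data and local counter powered
    PARTIALLY_DISABLED = "partially disabled"  # data off, local counter powered
    DISABLED = "disabled"                      # data and local counter off


@dataclass
class AdaptCounters:
    """The two extra global counters used to compute the re-activation ratio."""
    partially_disabled: int = 0
    re_enabled: int = 0


def on_access(state, counters):
    """Any access enables the entry; only partially disabled entries are counted."""
    if state is State.PARTIALLY_DISABLED:
        counters.re_enabled += 1
    return State.ENABLED


def on_decay(state, counters):
    """The local counter of an unused enabled entry has overflowed."""
    if state is State.ENABLED:
        counters.partially_disabled += 1
        return State.PARTIALLY_DISABLED
    return state


def on_live_time_expired(state):
    """A partially disabled entry was not accessed within the average live time."""
    if state is State.PARTIALLY_DISABLED:
        return State.DISABLED
    return state
```

Note that an access to a fully *disabled* entry re-enables it without touching the re-enabled counter, so only recent shut-offs influence the adaptation.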

Regarding the pre-defined values used for the *increasing* and *decreasing* thresholds, it is important to note that setting the *decreasing threshold* to small values makes *AVPD* ensure that there are few *re-enabled entries* before lowering the decay interval, resulting in a more conservative policy. On the other hand, setting the *decreasing threshold* to high values makes *AVPD* decrease the decay interval more frequently, resulting in a more aggressive policy.
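The role of the two thresholds can be illustrated with a short sketch of the interval update. Some choices below are assumptions rather than details given in the text: the comparisons are inclusive (so that a 0%/100% setting can still trigger a change), the interval is left unchanged when no entry was partially disabled, and `max_interval` is a hypothetical upper bound. The 256-cycle lower limit is the one mentioned in the evaluation.

```python
def adapt_decay_interval(interval, re_enabled, partially_disabled,
                         decreasing_threshold, increasing_threshold,
                         min_interval=256, max_interval=65536):
    """Return the next global decay interval (sketch, thresholds in [0, 1])."""
    if partially_disabled == 0:
        return interval  # assumption: no evidence, keep the current interval
    ratio = re_enabled / partially_disabled
    if ratio >= increasing_threshold:
        # Too many premature shut-offs: the interval is too short, double it.
        return min(interval * 2, max_interval)
    if ratio <= decreasing_threshold:
        # Few re-activations: entries are shut off too late, halve the interval.
        return max(interval // 2, min_interval)
    return interval


# Conservative 00/100 setting: change only on unanimous evidence.
conservative = adapt_decay_interval(512, re_enabled=5, partially_disabled=10,
                                    decreasing_threshold=0.0,
                                    increasing_threshold=1.0)
# A 70/100 setting shortens the interval for the same observation.
shortening = adapt_decay_interval(512, re_enabled=5, partially_disabled=10,
                                  decreasing_threshold=0.7,
                                  increasing_threshold=1.0)
```

With half of the shut-off entries re-activated, the conservative setting keeps the interval at 512 cycles, while the 70/100 setting halves it to 256 cycles.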

Table 1. Configuration of the simulated processor.

| Parameter           | Value                                                   |
|---------------------|---------------------------------------------------------|
| **Processor Core**  |                                                         |
| Process Technology  | 70 nm                                                   |
| Frequency           | 5600 MHz                                                |
| Instruction Window  | 128 RUU, 64 LSQ                                         |
| Decode Width        | 8 inst/cycle                                            |
| Issue Width         | 8 inst/cycle                                            |
| Functional Units    | 8 Int ALU; 2 Int Mult; 8 FP ALU; 2 FP Mult; 2 Memports  |
| Pipeline            | 22 stages                                               |
| **Memory Hierarchy**|                                                         |
| L1 Icache           | 64 KB, 2-way                                            |
| L1 Dcache           | 64 KB, 2-way                                            |
| L2 cache            | 2 MB, 4-way, unified                                    |

Finally, the power overhead associated with the *AVPD* mechanism can be divided into three main components. The first component is the dynamic and static power of the two-bit local counters inserted into every entry of the predictor (the same overhead as in the static decay scheme). The second component comes from the three global counters: one is part of the two-level decay interval counter (and also appears in the static decay scheme) and the other two are specific to the adaptive decay scheme. The third component is derived from the induced VP misses (when a VP entry is prematurely disabled), which increase program execution time. These extra execution cycles also lead to additional static and dynamic power dissipation. Note that this third component (which also appears in the static decay scheme) is highly destructive, since each extra cycle accounts for the overall processor dynamic and static power and can easily cancel whatever leakage energy savings *AVPD* provides.
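As a rough illustration of how the overhead components trade off against the savings, the accounting can be sketched as below. The function name, its parameters and the numbers in the example are hypothetical and chosen only to show the arithmetic; they are not measurements from this work.

```python
def net_leakage_energy_savings(baseline_vp_leakage, decayed_vp_leakage,
                               counter_energy, extra_cycles,
                               processor_energy_per_cycle):
    """Net energy saved by decay once its overheads are charged (same units throughout).

    counter_energy covers the local and global counters; the extra cycles caused
    by induced VP misses are charged at whole-processor energy per cycle.
    """
    gross_savings = baseline_vp_leakage - decayed_vp_leakage
    overhead = counter_energy + extra_cycles * processor_energy_per_cycle
    return gross_savings - overhead


# Hypothetical numbers: a handful of extra cycles leaves a net gain...
gain = net_leakage_energy_savings(100.0, 40.0, 5.0,
                                  extra_cycles=10, processor_energy_per_cycle=2.0)
# ...while a few more extra cycles turn the same gross savings into a net loss.
loss = net_leakage_energy_savings(100.0, 40.0, 5.0,
                                  extra_cycles=30, processor_energy_per_cycle=2.0)
```

Because every extra cycle is charged at the cost of the whole processor, a small loss in prediction accuracy can outweigh a large reduction in the leakage of the predictor itself.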

# 4. Experimental Results

## 4.1. Simulation Methodology

To evaluate the energy efficiency of *AVPD*, we have used the SPECint2000 benchmark suite. All benchmarks were compiled with maximum optimizations (-O4 -fast) and were run using a modified version of the *HotLeakage* power-performance simulator that includes the dynamic and static power models for the evaluated Value Predictors (Stride, FCM and DFCM), as well as the power overhead associated with *AVPD*. The VP access latency is 5 cycles.

Due to the large number of dynamic instructions in some benchmarks, we reduced the input data set while keeping a *complete* execution. Table 1 shows the configuration of the simulated architecture. Leakage related parameters have been taken from the Alpha 21264 processor, provided with the HotLeakage simulator suite, and using a process technology of 70 nanometers.

## 4.2. Leakage-efficiency of AVPD Mechanism

This section presents the leakage-efficiency evaluation of the proposed *AVPD* mechanism for the Stride, FCM and DFCM predictors. Each figure shows the VP *leakage energy* savings<sup>4</sup> with respect to not applying a decay scheme, for some representative configurations of the adaptive mechanism as well as for the best static decay configuration (512-cycle decay interval) for comparison purposes.

For the evaluation of *AVPD*, we carried out a comprehensive set of experiments for many configurations defined by using different *decreasing* and *increasing threshold* values. In this work we only present the most representative configurations:

- Configuration 00/100 (decreasing threshold set to 0% / increasing threshold set to 100%): this is the most conservative policy, since *AVPD* will try to decrease the decay interval only if none of the entries are re-activated, and it will only try to increase the decay interval when all the entries are re-activated. It works well for all studied predictors, as it does not take any risks when changing the decay interval.
- Configuration 50/50: this is the most aggressive configuration, as it changes the decay interval continuously, increasing or decreasing it according to the *re-activation ratio*. This configuration is so aggressive that, for many benchmarks, the overhead of the extra execution cycles caused by the constant changes to the decay interval neutralizes the VP energy savings.
- Configurations 40/60 and 70/100: these are the best ones we have found for the different predictors. The 40/60 configuration is quite aggressive but works well with the Stride predictor, as it balances long decay intervals with short ones. The 70/100 configuration tends to shorten the decay interval whenever possible, only raising it when all decayed entries are re-activated.

Figure 3. DFCM leakage energy savings (SPECint2000).

Figure 4. STP leakage energy savings (SPECint2000).

Figure 5. FCM leakage energy savings (SPECint2000).

<sup>4</sup> Total processor leakage-energy results are not presented due to limitations of HotLeakage, which only provides static-power models for regular array structures (caches, predictors and register file).

Figure 3 shows the average leakage energy savings for the DFCM predictor under the adaptive configurations described above, as well as for the best static decay interval (512 cycles). For this predictor, the best adaptive configuration is 70/100, which surpasses the best static decay scheme for all evaluated predictor sizes. For an average size of 10.5 KB, AVPD obtains 64% leakage energy savings versus 55% for the static scheme. For the smaller size of 5 KB, the difference between the adaptive and static schemes is even more evident: AVPD provides an additional 14 percentage points of leakage energy savings with respect to the static scheme (55% for AVPD versus just 41% for the static scheme). As size grows, the differences between the adaptive and static schemes disappear, with both obtaining 80% leakage energy savings for a size of 87 KB. In such large predictors there is no need for an adaptive scheme, as there are very few generational changes and they can be easily identified by the static scheme. The 70/100 configuration is the best one we have found, since it tends to reduce the decay interval towards its lower limit of 256 cycles. In general, we have seen that any configuration that tends to shorten the decay interval performs well with DFCM, but constant changes of the decay interval, as in the 50/50 configuration, result in a loss of net leakage energy savings.

Figure 4 shows the average leakage energy savings for the STP predictor. As described in Section 3, the AVPD mechanism tries to decrease the decay interval in order to reduce the leakage energy. The STP predictor is especially susceptible to these attempts, since a large interval reduction degrades the STP accuracy enough to make the power overhead of the induced extra cycles equal to the power savings provided by AVPD. As a result, the adaptive scheme behaves similarly to the static scheme. The STP predictor works better with configurations that change the decay interval quickly, such as 50/50 or 40/60, because configurations that tend to shorten the decay interval (such as 70/100) decrease the predictor's accuracy too much, making the overhead even greater than the energy savings provided.

Figure 5 shows the average leakage energy savings for the FCM predictor. This predictor behaves very similarly to DFCM, with the same best configuration (70/100), but obtains even greater leakage energy savings. In addition, the differences with respect to the best static decay scheme are also larger. For a predictor size of 4.6 KB, the static approach obtains 50% leakage energy savings whereas the adaptive scheme obtains 74% (an additional 24 percentage points). For larger sizes, the differences between the static and adaptive schemes keep shrinking until both converge to the same leakage energy savings for very large predictors (close to 90% for a size of 78 KB). If we focus on moderate FCM sizes (around 10 KB), the best static scheme obtains 64% leakage energy savings whereas AVPD obtains 77% (an additional 13 percentage points). Note that FCM, like DFCM, performs well with any configuration that tends to decrease the decay interval, due to the negligible impact on its accuracy.

# 5. Related Work

In order to reduce leakage power in processors, many proposals have focused on reducing the leakage power by switching off unused portions of large array structures. These techniques have been categorized into *state-preserving* and *non-state preserving* [1][7][12].

Studies by Powell et al. [10] proposed gated-$V_{DD}$ as a technique to limit static leakage power by banking and providing "sleep" transistors, which dramatically reduce leakage current by gating off the supply voltage. This technique, known as *decay*, reduces the leakage power drastically at the expense of losing the cell's contents, so it must be applied very carefully, since the loss of information can result in an increase of the dynamic power needed to retrieve it again. Kaxiras et al. [8] successfully applied decay techniques to individual cache lines in order to reduce leakage in cache structures (67% of static power consumption can be saved with minimal performance loss). This technique has also been applied to conditional branch predictors and BTB structures. On the other hand, drowsy techniques try to reduce leakage without losing the cell's information. Drowsy caches [4] use different supply voltages according to the state of each cache line. Lines in drowsy mode use a low voltage level that retains the data, but require a high voltage level to be accessed again.

Li *et al.* [7] evaluated the use of state-preserving and non-state-preserving techniques in caches. The authors showed that, for a fast L2 cache, decay techniques are superior to drowsy ones in terms of both performance loss and energy savings.

# 6. Conclusions

This paper proposes *Adaptive Value Prediction Decay* (*AVPD*), a mechanism able to reduce the leakage energy of traditional Value Predictors with negligible impact on either prediction accuracy or processor performance by dynamically locating VP entries that have not been accessed for a noticeable amount of time. Once those unused entries have been located, *AVPD* switches them off to prevent them from leaking. The proposed *AVPD* extends the static decay approach in order to better exploit program behaviour, as well as the differences between sections of code where the VP can be under-utilized.

The *AVPD* mechanism requires just slight modifications, with virtually no extra hardware overhead compared to the static decay scheme (just two additional global counters). In addition, in our scheme, the aggressiveness of the adaptation is easily controlled by two parameters (*increasing* and *decreasing thresholds*).

The average leakage energy savings for the best known configuration of the adaptive mechanism, for a moderate predictor size of around 10 KB, are 32%, 64% and 77% for the three evaluated predictors (Stride, DFCM and FCM, respectively). Compared to the best static decay scheme, *AVPD* provides *additional* average leakage energy savings (e.g., 14 percentage points for a 5 KB DFCM and 24 percentage points for a 5 KB FCM).

# 7. ACKNOWLEDGMENTS

This work has been supported by the Ministry of Education and Science of Spain under grants TIN2006-15516-C04-03 and CSD2006-00046.

# References

- [1] J.A. Butts and G. Sohi. "A static power model for architects". In Proc. of the 33rd Int. Symp. on Microarchitecture, 2000.
- [2] B. Calder, G. Reinman and D.M. Tullsen. "Selective Value Prediction". In Proc. of the 26th Int. Symp. on Comp. Arch., May 1999.
- [3] J.M. Cebrián, J.L. Aragón and J.M. García. "Leakage Energy Reduction in Value Predictors through Static Decay". In Proc. of the Int. Workshop on High-Performance, Power-Aware Computing HP-PAC'07 (in conjunction with IPDPS'07), March 2007.
- [4] K. Flautner et al. "Drowsy Caches: Simple Techniques for Reducing Leakage Power". In Proc. of the 29th Int. Symp. on Computer Architecture, 2002.
- [5] M.J. Flynn and P. Hung. "Microprocessor Design Issues: Thoughts on the Road Ahead". *IEEE Micro*, vol. 25, no. 3, pp. 16-31, May/Jun, 2005.
- [6] B. Goeman, H. Vandierendonck and K. de Bosschere. "Differential FCM: Increasing Value Prediction Accuracy by Improving Table Usage Efficiency". In Proc. of the 7th Int. Symp. on High-Performance Comp. Architecture, 2001.
- [7] Y. Li et al. "State-Preserving vs. Non-State-Preserving Leakage Control in Caches". In Proc. of the DATE Conference, Feb. 2004.
- [8] S. Kaxiras, Z. Hu and M. Martonosi. "Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power". In Proc. of the 28th Int. Symp. on Computer Architecture, 2001.
- [9] N.S. Kim, T. Austin *et al.* "Leakage Current: Moore's Law Meets Static Power". *IEEE Computer*, 2003.
- [10] M.D. Powell et al. "Gated-Vdd: A Circuit Technique to Reduce Leakage in Deep-Submicron Cache Memories". In Proc. of the Int. Symp. on Low Power Electronics and Design, 2000.
- [11] Y. Sazeides and J.E. Smith. "The predictability of data values". In Proc. of the 30th Annual Int. Symp. on Microarchitecture, Dec 1997.
- [12] S. Yang et al. "An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-Caches". In Proc. of the 7th Int. Symp. on High-Performance Comp. Architecture, 2001.