March 02, 2017

Simultaneous Setup-Hold Critical Node

I've gotten this question multiple times: how do we fix timing violations on paths that have at least one node which is both setup critical and hold critical simultaneously? To answer that question, one must realize that (generally speaking) for the same PVT and the same RC corner, there cannot be paths where all nodes are simultaneously setup and hold critical. 

Let's take an example:
Test Case

Now, if we buffer at node C, the path from B to C, which was already setup critical, will start violating.

Buffering at C

If we buffer at node A, the path from A to D, which was already setup critical, would start violating.
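The squeeze can be put to numbers. Below is a minimal sketch of the tradeoff; all the delay values are hypothetical, chosen only so that the hold-critical path A-to-C and the setup-critical path B-to-C share node C:

```python
# Hypothetical sketch of the tradeoff; the numbers are illustrative, not from the post.
# Path A -> C is hold critical; path B -> C is setup critical; both pass through C.

T_CLK = 1.0            # clock period (ns), assumed
T_SETUP, T_HOLD = 0.05, 0.05

def setup_slack(data_delay, period=T_CLK, tsu=T_SETUP):
    return period - data_delay - tsu

def hold_slack(data_delay, thold=T_HOLD):
    return data_delay - thold

# Before buffering: A->C barely violates hold, B->C barely meets setup.
a_to_c, b_to_c = 0.04, 0.94
print(hold_slack(a_to_c))    # negative: hold violation on A->C
print(setup_slack(b_to_c))   # barely positive: B->C is setup critical

# Insert a 0.1 ns buffer at node C: both paths through C gain the same delay.
buf = 0.10
print(hold_slack(a_to_c + buf))    # hold is now fixed ...
print(setup_slack(b_to_c + buf))   # ... but B->C now violates setup
```

The same delay element lands on both paths, which is exactly why buffering at the shared node cannot fix one check without hurting the other.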

What shall we do here now? Any suggestions? Thoughts? I'd like to hear from you, and I'll post the right answer (at least one of the right answers) soon! Just like always, looking forward to engaging in the comments section below. 

March 01, 2017

OCV v/s AOCV


When I started my career around 6 years back, we were introduced to a term called OCV. While the OCV concept was quite simple and fascinating, it didn't take me long to realize that OCV can be a nightmare for every STA engineer out there. I introduced OCV a long time back while explaining the difference between OCV v/s PVT. In this post, I intend to draw a distinction between OCV (On-Chip Variation) and AOCV (Advanced On-Chip Variation).

Before we discuss anything about OCVs, it would be prudent to talk about the sources and types of variations that any semiconductor chip may exhibit.

The semiconductor device manufacturing process exhibits two major types of variations:

  • Systematic Variations: As the name suggests, systematic variations are deterministic in nature, and these can usually be attributed to a particular manufacturing process parameter like the manufacturing equipment used, or perhaps even the manufacturing technique used. Systematic variations can be experimentally calibrated and modeled. They also exhibit spatial correlation, meaning two transistors close to each other would exhibit similar systematic variation, which makes them easier to gauge. An example would be inter-chip process variation between two different batches of manufactured chips.
    When a certain technology is in its nascent stage (let's say 10-nm technology), the process engineers would typically be more concerned about these variations and as the technology matures, process engineers are able to calibrate and tune their manufacturing process to reduce this variation component.
  • Random Variations: These are totally random, and therefore non-deterministic in nature. Random variations do not show spatial correlation and are therefore very difficult to gauge and predict. Unlike systematic variations, random variations usually have a cancelling effect owing to their random nature. An example is the subtle variation in transistor threshold voltage.
As the semiconductor node shrinks, the susceptibility to these variations increases, and their effect needs to be taken into account while doing timing analysis, or perhaps during the overall design planning to some extent. Shifting our focus back to OCV and AOCV: at this point, one may ask in what form these variations manifest themselves. Well, they can manifest as an increase or decrease in the threshold voltage of devices, a shift in the process of the manufactured devices, a variation in the oxide thickness, or a change in the doping concentration.
There might be infinite such manifestations and we engineers like to make our lives easier, don't we? ;)
Experienced folks must have guessed where I am headed. If you haven't guessed it yet, stay with me, take a step back, and ask: what do all these parameters have in common? What's the one quantifiable metric that all of them impact? The answer is the delay! OCV and AOCV are essentially models which guide us on how cell delay varies in light of the systematic and random variations.

On-Chip Variations (OCV): OCVs are a simplistic and (generally) pessimistic way of modelling process variations. Here we assume that the delay of all cells can show, let's say, X% variation. Now you would either model this variation as -X% to +X%, or perhaps -(X/2)% to +(X/2)%. Let's say we choose the former. Now we would model the delay of all cells and subject them to OCVs in a manner that our timing becomes pessimistic, and we can claim that in the worst case, as long as the process guys can ensure that the variation stays within the bracket of -X% to +X%, we'd be safe.

  • Setup Analysis under OCV: In order to make setup analysis immune to process variations on silicon, we need to model the OCVs such that the setup check becomes more pessimistic. That would be the case if we: increase the data path delay by X% (you can take a call on whether or not to apply a derate on the net delays; one can choose to apply a net derate based on the net length and the metal layer in which the net is routed, a separate discussion for a separate post! :)); increase the launch clock path delay by X%; and decrease the capture clock path delay by X%. Here you might want to check the post on Common Path Pessimism to see which clock path cells need to be exempted from OCVs.
Setup Analysis under OCV
  • Hold Analysis under OCV: The hold check would be the exact opposite of what we did for setup, namely: decrease the data path delay by X% (you can take a call on whether or not to apply a derate on the net delays; usually, we don't apply a derate on net delays); decrease the launch clock path delay by X%; and increase the capture clock path delay by X%.
Hold Analysis under OCV
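The two derating recipes above can be sketched in a few lines. This is a minimal model, assuming a flat X% derate on every segment and ignoring CPPR and uncertainty; all delay values are made-up:

```python
# Sketch of flat OCV derating for setup and hold checks; X and all delays
# are illustrative numbers, not from any real library.
X = 0.08  # +/-8% OCV derate (assumed)

def setup_slack_ocv(launch_clk, data, capture_clk, period, tsu, x=X):
    # Pessimistic setup: late launch clock + late data vs early capture clock.
    arrival  = launch_clk * (1 + x) + data * (1 + x)
    required = period + capture_clk * (1 - x) - tsu
    return required - arrival

def hold_slack_ocv(launch_clk, data, capture_clk, thold, x=X):
    # Pessimistic hold: early launch clock + early data vs late capture clock.
    arrival  = launch_clk * (1 - x) + data * (1 - x)
    required = capture_clk * (1 + x) + thold
    return arrival - required

print(setup_slack_ocv(launch_clk=0.5, data=0.8, capture_clk=0.5,
                      period=1.5, tsu=0.05))
print(hold_slack_ocv(launch_clk=0.5, data=0.2, capture_clk=0.5,
                     thold=0.05))
```

Note that the launch and capture clock paths get opposite derates for the same check, which is exactly where the Common Path Pessimism discussion comes in.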

We talked so much about spatial correlation and the inherent cancellation of random variations, but didn't use either of these concepts while explaining OCVs. This is precisely why OCVs tend to be pessimistic. And as we shrink the technology node, a need arises for an intelligent methodology to perform variation-aware timing analysis. The answer is AOCV.

Let's take a look at AOCV in detail:

Advanced On-Chip Variations (AOCV): AOCV methodology hinges on three major concepts:
  • Cell Type: Variations should take into account the cell type. Surely an AND gate and an OR gate can't exhibit the same variation pattern. Nor could an AND3X and an AND6X cell. The impact of variation should be calculated for each individual cell.
  • Distance: As the distance in x-y coordinates increases, the systematic variations would increase, and we might need to use a higher derate value to reflect the uncertainty in timing analysis and mitigate any surprises on silicon.
  • Path Depth: If, within a given distance, the path depth is higher, the impact of systematic variations would remain constant, but the random variations would tend to cancel each other. Therefore, as the path depth increases (within the same unit distance), the AOCV derates tend to decrease.
Bounding Box Creation for AOCV

While performing reg2reg timing analysis, the AOCV methodology finds the bounding box containing the sequentials, the clock buffers between the two sequentials, and all the data path cells. Now, within a unit distance, if the path depth increases, the AOCV derate decreases due to the cancelling of random variations. However, if the distance increases, the AOCV derate increases due to the increase in systematic variations. These derates are modeled in the form of a LUT (look-up table).

Sample AOCV Table for Setup Analysis
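A table lookup of this kind is easy to sketch. The derate values below are purely illustrative (real tables come from library characterization), and I've used a simple nearest-lower-bin lookup rather than the interpolation a signoff tool would do:

```python
# Hypothetical AOCV derate table for setup (data-path "late" derates):
# rows = distance bins (um), columns = logic depth bins. Values are illustrative.
import bisect

DEPTHS    = [1, 2, 4, 8, 16]
DISTANCES = [0, 500, 1000, 2000]
DERATE = [  # derate falls with depth (random cancels), rises with distance
    [1.12, 1.10, 1.08, 1.06, 1.05],
    [1.13, 1.11, 1.09, 1.07, 1.06],
    [1.14, 1.12, 1.10, 1.08, 1.07],
    [1.16, 1.14, 1.12, 1.10, 1.09],
]

def aocv_derate(depth, distance):
    """Nearest-lower-bin lookup, clamped to the table edges (no interpolation)."""
    i = max(bisect.bisect_right(DISTANCES, distance) - 1, 0)
    j = max(bisect.bisect_right(DEPTHS, depth) - 1, 0)
    return DERATE[i][j]

print(aocv_derate(depth=4, distance=600))    # deeper path, modest distance
print(aocv_derate(depth=1, distance=2500))   # shallow path, large bounding box
```

Reading along a row shows the depth-driven cancellation; reading down a column shows the distance-driven growth, which is the whole AOCV story in one table.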

Now some final comments for OCV vs AOCV. 

  • For small path depths, OCV tends to be more optimistic than AOCV. (AOCV is more accurate).
  • For higher path depths, OCV tends to be more pessimistic than AOCV. (AOCV is still more accurate).
I hope you were able to draw the above inference. If not, I'd be willing to engage in a discussion down in the comments section. See you all next time! :)

February 05, 2017

Power Domain Crossings

With all the fuss about low power designs, the implementation of multiple power domains has gained significant traction in the past decade, and it still plays a critical role in designing chips which are less power-hungry. The use-cases have become more complicated and, keeping up with the pace, the implementation techniques have become even more intricate! 

In this post I intend to talk about the motivation for power domains, draw a distinction between two commonly confused terms- the power domain and the voltage area, and some basics which can help the newbies lay a foundation of what to expect while implementing such designs.

As we've already discussed, the basic intention behind creating multiple power domains is to reduce power consumption. But how does it help? Let's take a look. Let's say your design has three IPs: Alpha, Beta and Gamma. The Alpha IP is the heart of the design and is basically the computation engine; think of something like the ALU or the Execution Unit. Beta IP does house-keeping jobs, mostly in a support role. While Gamma IP does all the "dirty" stuff; think of something like an analog IP such as the Power Management Unit. Now, one important thing to appreciate here is that these three IPs serve different purposes, at different times, with distinct levels of significance. 

Let's dig a little deeper into this. Alpha IP does the most important task; it is critical for performance and therefore should run at the highest possible clock frequency, and consequently burns a significant chunk of the overall power consumption. Beta IP is not needed that often: it neither does any critical work, nor works all the time! Gamma IP does the significant task of managing the power supplies and therefore needs to be always-on!

Having delineated the expectations for each IP, it's now time to delve into some technical details of particular interest to a physical design engineer, and to understand the rationale behind them. This time we shall start with Gamma IP. Gamma IP, being an analog IP (or at best a mixed-signal IP), usually operates on a high voltage because analog circuits are more susceptible to noise. Let's say Gamma IP operates on 3V. (An unrealistic number for modern VLSI design, but I intend to give you a relative context! :)) Beta IP is a digital IP and can therefore operate on a lower voltage, let's say 1V in our chip; it would also be a switchable IP, meaning that we can turn off its power when it's not in use, thereby saving some power. Alpha IP, being the heart of the chip, operates at the highest frequency and on an intermediate voltage, let's say 2V in our example. Remember, while this IP dissipates a lot of power, its performance is also critical for our design: if we lowered its operating voltage, the delays of the standard cells would increase, timing closure at such high frequencies would be difficult, and in the worst case, we wouldn't be able to meet the timing spec.

Also, there may be times when I don't need such high performance from the chip, and I may choose to either lower the frequency (Dynamic Frequency Scaling, DFS), lower the voltage (Dynamic Voltage Scaling, DVS), lower both the voltage and the frequency (Dynamic Voltage and Frequency Scaling, DVFS), or simply turn off the power (Power Gating).

Now that we have the design perspective in place, let's go back to power domains. How many power domains do we have here? And how do we decide? Any logical hierarchy (or a bunch of hierarchies) which shares the same power plane should comprise one power domain. Let's elaborate on that. Any two pieces of logic (let's say here two IPs) are said to be in the same power plane iff they operate on the same voltage and have the same switchable properties.

In our example, all three IPs operate on different voltages, and therefore naturally comprise three different power planes and hence three different power domains! I shall elaborate on this concept later via examples. 

Power Domain View

Having understood the concept of a power domain, let's try and understand what a voltage area is. A power domain, as the definition suggests, refers to the logical view, wherein we've classified every logical hierarchy and defined the properties (operating voltage and power state table) for all the standard cells that eventually get synthesized under that logical hierarchy. A voltage area is nothing but the physical view of a power domain, where we assign a physical area on the chip for the logic under a given power domain to sit. Confusing, I know, but read it again and you'll grasp the difference. A voltage area would always be characterized by the fact that you'd need to specify the "rectilinear coordinates" within your chip area to define one.

Voltage Area View

Let's also talk about what I believe would be the most interesting take-away from this blog: how do we handle signals crossing from one power domain to another? This leads us to the discussion of isolation cells and level shifters, and perhaps enable level-shifters, which are a combination of the two.
Any signal crossing from a power domain operating at, let's say, voltage V1 to a power domain operating at voltage V2 would need to be level-shifted from V1 to V2. The special cells which help us achieve this are called level shifters. Level shifters are basically buffers with two power supply nets: the input supply net is connected to the voltage supply of the driver domain, and the output supply net is connected to the voltage supply of the receiver domain. Where exactly this LS should be placed (whether in the driver domain or the receiver domain) is another question, which will be answered in a different post.


Any signal crossing from a switchable domain to an always-on domain needs to be isolated using an isolation cell to prevent any X's from being propagated. Consider a signal going from power domain A (switchable) to power domain B (always-on). Let's say at some point the supply to power domain A is turned off. All the outputs of power domain A would then be at an unknown state, referred to as "X". If these X's are not isolated, the always-on domain would be corrupted with X's. Most isolation cells are AND/OR gates where the other input is set to a controlling value (0 for an AND gate, 1 for an OR gate) to prevent X-propagation. We don't need any ISOLATION from the always-on domain to the switchable domain. (Ask yourself why?! :P)

Isolation Cells

On a side note: one physical voltage area may be mapped to many logical power domains provided they all have the same primary and secondary logical power supplies.
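The crossing rules above can be sketched in a few lines. The domain table mirrors the Alpha/Beta/Gamma example (voltages and switchability as assumed earlier in this post), and the rule set is the simplified one described here: isolation for any switchable-to-on crossing, level shifting for any voltage difference:

```python
# Sketch of the power-domain crossing rules from this post; the domain
# properties follow the hypothetical Alpha/Beta/Gamma example.
DOMAINS = {
    "Alpha": {"voltage": 2.0, "switchable": False},
    "Beta":  {"voltage": 1.0, "switchable": True},
    "Gamma": {"voltage": 3.0, "switchable": False},
}

def crossing_cells(src, dst, domains=DOMAINS):
    """Return the special cells needed on a signal going from src to dst."""
    cells = []
    if domains[src]["switchable"] and not domains[dst]["switchable"]:
        cells.append("isolation")          # block X's when src is powered down
    if domains[src]["voltage"] != domains[dst]["voltage"]:
        cells.append("level_shifter")      # shift from src voltage to dst voltage
    return cells or ["none"]

for s, d in [("Beta", "Alpha"), ("Alpha", "Gamma"), ("Gamma", "Beta")]:
    print(s, "->", d, ":", crossing_cells(s, d))
```

A crossing that needs both cells (like Beta to Alpha here) is exactly where an enable level-shifter, the combined cell mentioned above, would be used.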

Now I'd like you to work out what all we would need in our case with the Alpha, Beta and Gamma IPs to make sure signals cross from one power domain to another seamlessly (assume all permutations and combinations of crossings).

Concepts to be discussed in a future post:
  • Where exactly should this LS be placed: whether in the driver domain or the receiver domain.
  • The concept of primary v/s secondary power supply.

September 18, 2016

Register Banking

Register Banking, also referred to as Multi-Bit Register Banking, is a physical implementation technique of merging two or more flip-flops into a single multi-bit register. Let's first look at which flip-flops are potential candidates for implementing register banks.

Technically speaking, any two flip-flops which share the same clock, the same asynchronous control pins (e.g. the reset or preset pins), and the same scan enable pins are potential candidates for register banking. Before we delve any further, let's talk about the incentives for designers to use register banking in their designs. 

Advantages of register banking:

  • Illustrating with the example of a 2-bit register bank, one can easily see that the overall pin density of the 2-bit MBFF is significantly reduced compared to using 2 standalone flip-flops. By pin density, I refer to the number of pins per unit area of silicon. While the number of pins per standard cell has certainly increased, the overall pin density would be lower. High pin density is a major cause of shorts in SoCs. Reducing the pin density can therefore not only mitigate the short count, but also reduce the DRC count post-routing. 

    By sharing the scan enable, clock, reset, and scan input pins, one can reduce the total pins from 12 to 7 just by using a 2-bit register bank. Imagine the benefits when one goes for higher-order register banks! I have used up to 8-bit register banks and now I can appreciate the reasons better! :)
  • While reducing pin density is indirectly useful, there's a more tangible gain: the area! Circuit designers can do a better optimization of transistors when they have to fit two flip-flops into a single standard cell versus using 2 standalone flip-flops. Hence, the overall area of an MBFF (Multi-Bit Flip-Flop) will always be smaller.

  • The third advantage would be the optimal use of routing resources. Imagine routing signals like scan enable, reset and clock to 8 standalone sequentials instead of just one cell! While the benefits won't scale by the same 8:1 ratio, using an MBFF consumes fewer routing resources than using standalone FFs.

  • The biggest and foremost reason behind using register banking is something different. I'm sure you must have guessed it by now: the dynamic power! Fewer clock sinks means routing the clock to fewer sequentials. This directly translates into fewer clock buffers, and hence less DYNAMIC POWER! This indeed is the motivation behind using register banking. Moreover, for FinFETs, owing to their 3-dimensional structure, the pin capacitance is significantly higher than that of their planar CMOS counterparts. Higher capacitance directly means higher dynamic power dissipation. The register banking technique helps offset some of that extra pin capacitance and reduces the overall dynamic power.
  • Ancillary benefits of register banking include needing fewer hold buffers and even a reduction in the length of the scan chain.
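The pin arithmetic above scales nicely with bank width. Here's a back-of-the-envelope sketch; the pin inventory I've assumed (per-bit D and Q, shared SI/SE/CLK/RST with the scan chained internally) is illustrative and gives 8 pins for a 2-bit bank, while the post's figure arrives at 7, so the exact saving depends on which pins the actual library cell exposes:

```python
# Back-of-the-envelope pin arithmetic for register banking. The pin
# inventory is an assumption; real counts depend on the library cell.
PER_BIT = ["D", "Q"]                  # remain one-per-bit in the bank
SHARED  = ["SI", "SE", "CLK", "RST"]  # shared across the bank (scan chained inside)

def standalone_pins(n):
    # Each standalone flop carries its own copy of every pin.
    return n * (len(PER_BIT) + len(SHARED))

def banked_pins(n):
    # The bank keeps per-bit D/Q but shares the rest.
    return n * len(PER_BIT) + len(SHARED)

for n in (2, 4, 8):
    print(n, "bits:", standalone_pins(n), "->", banked_pins(n), "pins")
```

Notice how the shared-pin saving grows linearly with bank width, which is why an 8-bit bank is so much more attractive than a 2-bit one.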

The biggest headache of the register banking technique is Logic Equivalence Checking (LEC), because mapping the register banks to the corresponding sequentials in the RTL becomes quite difficult. LEC, among many things, checks the names of the sequentials while mapping and checking equivalence between the golden and revised sides. The instance name of a register bank is usually a combination of the instance names of all the standalone flip-flops comprising the multi-bit register, hence LEC may have a tough time establishing the equivalences. 

While register banking sounds simple in theory, the designers, or alternatively the design tools, should exercise care in choosing the standalone sequentials for banking. In addition to ensuring that the scan enable, the clock signal and the asynchronous control signals are the same, it is also desirable that the depths of the combinational clouds feeding the individual data pins of the multi-bit flop be almost the same. If, let's say, one input has a higher combinational depth than the others, the clock to the multi-bit sequential might need to be "pushed" to meet timing, thereby offsetting the benefits of register banking. Excess use of clock buffers might even lead to congestion issues and would significantly eat up the routing resources.

August 09, 2016

IR Drop Analysis

Just yesterday, I got a question from one of our readers Lakshman Yandapalli. I thought it would be nice to write a blog post for you all!

Let's start with some background as to what indeed is the IR drop analysis.

When we talk about standard cells, we usually talk about the logical pins, let's say, A and B for the inputs and Z for the output. What we do miss stating explicitly are the power/ground pins: the VDD and the VSS. These connections are usually implicit from the context (unless of course you have a multi-voltage design! Let's save that story for some other post).

IR drop is the voltage drop in the metal wires constituting the power grid before the supply reaches the VDD pins of the standard cells. Why do we bother about the voltage? Because the speed of the standard cell (the propagation delay) is directly related to the VDD value. Higher VDD means a faster cell, or lower propagation delay.

Now imagine that your SoC has a nominal voltage of 1V, and you closed your setup timing assuming the ideal 1V libraries. However, an IR drop of 40mV came into the picture after you built the power grid, and the voltage is no longer 1V; let's say it is 0.96V. Now, with VDD = 0.96V, the delays of standard cells would be higher, and you might see an increase in your setup-time violations!

Let's look into the factors that cause this IR drop, how we can mitigate them, and what our sign-off corners should be to make sure there are no failures post-silicon!

While considering IR drop, you'd be concerned with two factors:

1. Static IR Drop: Dependent on the RC of the power grid connecting the power supply to the respective standard cells.

It is ALWAYS desirable to create the POWER GRID in the higher metal layers. Higher metal layers mean wider wires, and hence lower resistance. Lower resistance means the IR drop would be lower, and hence less impact on setup timing. 

The capacitance of a metal wire is the combination of the ground and coupling capacitances. If for some reason you feel that the capacitance is too large, and it is indeed the reason for the IR drop, it could be because of: 
  • Long wire length: Resulting in higher wire cap. 
  • High fan-out of the net: Resulting in higher load-cap, or perhaps 
  • High routing congestion in a particular area resulting in high coupling capacitance with the neighboring nets.

Now, how do we mitigate the problem? You can try splitting the net so that the fan-out gets distributed (pretty much similar to building a clock tree); you can split the long wire by placing appropriate power bumps; or you can analyze the congestion and space the wires apart to reduce the coupling capacitance!

A simple equation representing the static IR drop would be the following:

Vstatic_drop = Iavg x Rwire
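Putting this equation to numbers makes the "wider wire, lower drop" argument concrete. The sheet resistance and current below are made-up illustrative values (a real number comes from the technology file):

```python
# Static IR drop per V = I_avg * R_wire, with R from sheet resistance.
# R_SHEET and the currents are made-up illustrative numbers.
R_SHEET = 0.01   # ohms/square, assumed for an upper metal layer

def wire_resistance(length_um, width_um, r_sheet=R_SHEET):
    return r_sheet * (length_um / width_um)   # R = Rs * (L / W)

def static_ir_drop(i_avg_mA, length_um, width_um):
    return (i_avg_mA * 1e-3) * wire_resistance(length_um, width_um)  # volts

# A wider wire (as found in higher metal layers) cuts the drop for the same current:
print(static_ir_drop(i_avg_mA=10, length_um=1000, width_um=2.0))  # narrow rail
print(static_ir_drop(i_avg_mA=10, length_um=1000, width_um=8.0))  # wide rail
```

With these assumed values, the narrow rail drops about 50mV, right in the range that would eat into the setup margin described above, while the 4x wider rail drops only a quarter of that.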
2. Dynamic IR Drop: Dependent on the switching activity of the standard cells themselves.

The switching activity of standard cells also contributes significantly to the IR drop; this component is known as the dynamic IR drop. The higher the switching activity in a given region, the greater the demand for current from the power supply. More current means more IR drop (which is essentially the current times the wire resistance!).
If you ever come across such a use case, you might want to space the standard cells apart, so that the burden on any given bump of feeding many standard cells with high switching activity is mitigated. 

Dynamic IR Drop is also sometimes referred to by the term of Voltage "Droop".

Update: Dynamic IR drop is contingent upon the current drawn by the standard cells, which brings a time-dependent variation of current into the picture. The inductive component of the dynamic drop is represented by the equation:

Vdynamic_drop = L x (di/dt)

where L is the effective inductance of the package and power delivery network.
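The L x (di/dt) relation is easy to put to numbers too. The inductance and current ramp below are assumed, illustrative values only:

```python
# Inductive droop per V = L * (di/dt); L and the current step are assumed.
L_GRID = 0.5e-9   # 0.5 nH effective package/grid inductance (illustrative)

def dynamic_droop(delta_i_A, delta_t_s, inductance=L_GRID):
    return inductance * (delta_i_A / delta_t_s)   # volts

# 100 mA of extra switching current ramping up over 1 ns:
print(dynamic_droop(delta_i_A=0.1, delta_t_s=1e-9))
```

Even a modest current step over a short ramp produces tens of millivolts of droop here, which is why simultaneous switching in one region is such a concern.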

Now that we have a fair understanding of IR drop analysis, let's talk about the PVT/RC corners at which one should analyze IR drop in their design.
Let's start with the RC corner.

1. RC Corner: The RC corner where the physical design engineers should analyze for IR drop would be the case when the RC product is worst. And that would indeed be the (RC)max corner, also referred to as the RCWorst corner.

2. PVT Conditions: PVT conditions typically impact the standard cells. For IR drop analysis we are concerned with the case where we expect the highest switching activity from the standard cells. That would be the FF (fast-fast) corner, high voltage, and high temperature.
High temperature might seem an anomaly, but higher temperature also means higher wire resistance, and hence a higher drop!

One last comment about IR drop analysis: it also makes sense to run IR drop analysis for the worst-case setup timing check, because IR drop would most probably impact only setup timing. So designers may want to run the IR drop analysis at RCWorst, high temperature, SS (slow-slow) process, and low voltage. But typically this is not done, because the low-voltage corner is usually already guard-banded to account for the IR drop. Running IR drop analysis on the low-voltage corner would therefore be overly pessimistic! 

June 17, 2016

Self Gated Flip-Flop

Hey folks!

Just yesterday, I was wondering if it's possible to come up with a self-gated flip-flop architecture which could be used for extreme low-power applications. As soon as I designed the flip-flop and satisfied myself that it seemed to be working well on paper, I was ecstatic! However, that was short-lived, because a prior art search revealed that someone had already designed a pretty similar structure 2 years earlier!

But since I found it cool, I'm tempted to share it with the readers here. Let's start with the motivation for such a flip-flop.

There may be applications in which certain flip-flops of the design toggle states quite infrequently. Now, it's a well known fact that even when a flip-flop is not switching states, it will continue to dissipate dynamic power internally as long as the clock keeps switching. And there's also a well-known, exalted solution: clock gating! But clock gating is not always viable. Let's look at when and why clock gating may not be a viable solution:

Clock Gating Integrated Cell
  • Clock gating is usually performed using a clock gating integrated cell, which essentially comprises a latch and an AND gate. The latch itself is a sequential element, logically half of a flip-flop, and physically takes up around 60-65% of the flip-flop area. Coupled with the AND gate, the internal switching activity of the clock gating cell results in significant standby power dissipation.
  • Adding a clock gating cell makes sense only if there is a bunch of flip-flops to be clock gated, basically to offset the extra overhead of power dissipation within the clock gating cell itself.
  • The clock gating cell will also have additional logic to control its enable signal, leading to more power dissipation; however, this component is not really significant in most cases.

All the above reasons point to the need for an effective, fine-grained clock gating strategy that doesn't incur such additional overheads. That's where a self-gated flip-flop comes to our rescue, helping save those extra milli- or perhaps micro-watts of power! Pretty cool, no? ;)

What I thought was: a flip-flop has a state, either 0 or 1, and these are the only two states one ever needs to worry about. And this is best accomplished by a toggle flop.

Toggle Flop

Now, let's say the flip-flop was initially reset to 0 and D was 0. The flop should be self-gated. As soon as D goes to 1, the flip-flop should TOGGLE, and stay at 1 as long as D stays at 1. So we need the following components: a toggle flop; an XOR between D and Q; and clock gating logic (either an AND or an OR gate).

Connections are pretty intuitive as shown as follows:

Implementing the XOR is simple, and can be accomplished using 8 transistors including the inverters. The NOR gate needs 4 additional transistors. So, using just 12 extra transistors on top of the existing flip-flop circuit, you get a self-gated flip-flop with minimal dynamic power! How's that for a circuit?!

Self-Gated FF
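A quick behavioral sketch (not a circuit-level model) shows the idea: the XOR of D and the current state gates the clock, so the toggle flop only sees a clock edge when the state actually has to change. The stimulus sequence here is made up:

```python
# Behavioral sketch of the self-gated flop described above: a toggle flop
# whose clock is gated by XOR(D, Q).

class SelfGatedFF:
    def __init__(self):
        self.q = 0
        self.clock_pulses_seen = 0   # pulses that actually reach the toggle flop

    def clock_edge(self, d):
        enable = d ^ self.q          # XOR of D and current state gates the clock
        if enable:
            self.clock_pulses_seen += 1
            self.q ^= 1              # toggle flop flips on every gated clock edge
        return self.q

ff = SelfGatedFF()
stimulus = [0, 0, 1, 1, 1, 0, 0, 1]
outputs = [ff.clock_edge(d) for d in stimulus]
print(outputs)                # tracks D exactly like a normal DFF would
print(ff.clock_pulses_seen)   # only 3 internal clock events over 8 cycles
```

Functionally it behaves like an ordinary D flip-flop, but the internal node connected to the clock toggles only when the state changes, which is where the dynamic power saving comes from.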

It would be prudent to add the name of the patent/publication that I eventually found in the references, so that nobody accuses me of plagiarism! :D


  • Low Power Toggle latch-based flip flop including integrated clock gating circuit: US 20150200652A1.

April 24, 2016

Hold Time Violations

How often has someone asked you how to fix setup time violations?! And how often have you replied with the many techniques, ranging from cell upsizing to logical retiming, from Vt swapping to utilizing useful clock skew, or perhaps even reducing the clock frequency?

And how often has someone trapped you by asking about the impact of clock frequency on hold time?!

Let's imagine a scenario. You designed a chip, and it's been manufactured. You discover that there's one hold time violation, and let's say the slack is -10ps. Well, the logical answer would be to throw that chip away, since a hold violation cannot be fixed by tweaking the clock frequency. But if it were that simple, I wouldn't have asked this question! :P
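To see why the clock frequency can't help, compare the two checks side by side. This sketch uses the simple same-edge slack equations with made-up delay numbers; note which one contains the clock period:

```python
# Why lowering the frequency can't fix hold: the hold check has no period term.
# All delay values are illustrative.

def setup_slack(period, clk2q, data, tsu):
    # Period appears here, so a slower clock relaxes setup.
    return period - (clk2q + data) - tsu

def hold_slack(clk2q, data, thold, skew=0.0):
    # No period term anywhere: hold compares same-edge arrivals.
    return (clk2q + data) - thold - skew

for period in (1.0, 2.0, 10.0):
    print(period,
          setup_slack(period, 0.1, 0.7, 0.05),   # improves as period grows
          hold_slack(0.05, 0.0, 0.06))           # stuck at -0.01 regardless
```

The hold slack stays pinned at its small negative value no matter how slow you clock the chip, which is exactly the -10ps predicament above.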

Now, think a little, and tell me what "engineering tweaks" you can do in order to make the chip work, or should I say, to try and make the chip work?

I expect a healthy discourse on this question, and I'm sure even I would end up learning a few things which I might not have appreciated till now. I request you to enlighten me with your thoughts.

Thanks! :)