July 28, 2012

Design for Testability: The Need for Modern VLSI Design

DFT stands for Design for Testability. DFT is an important branch of VLSI design and, in crude terms, it involves adding test structures on the chip itself that later help in testing the device for various defects before the part is shipped to the customer.

Have you ever wondered how the size of electronic devices keeps shrinking? Mobile phones used to be big and heavy, with minimal features, back in the 90s. But nowadays we have sleek, lightweight phones with all sorts of features: camera, Bluetooth, music player and, not to forget, faster processors. All of that is possible because of the scaling of technology nodes. A technology node refers to the channel length of the transistors which form the constituents of your device, and we keep moving to smaller channel lengths; some companies are working on technology nodes as small as 18 nm. The smaller the channel length, the more difficult it is for the foundries to manufacture the device, and the higher the chances of manufacturing faults.

The basic manufacturing faults are opens and shorts.
The figure shows two metal lines, one of which got "open" while the other got "shorted". As we move to lower technology nodes, not only does the device size shrink, but we can also pack more transistors on the same chip, so density keeps increasing. Manufacturing faults have therefore become inevitable. DFT techniques enable us to test for these faults (and other kinds as well).

July 27, 2012

Electrostatic Discharge vs Electromigration

Electrostatic Discharge and Electromigration might sound similar, but they refer to different physical phenomena. Let me try to explain the difference between the two.

Electrostatic Discharge (ESD) is the large flow of current between two points when a large (usually momentary) potential difference is applied across them. In semiconductor terms, suppose that by some means a large potential is applied to the gate of a MOS device. A large current then tends to flow through the gate, and this may damage the silicon dioxide of the transistor. As you are aware, this silicon dioxide controls important parameters like the threshold voltage (Vt) of the transistor, so any physical damage would render the functionality of the entire device capricious.

To give you a general perspective: 
  • The semiconductor industry incurs losses worth millions of dollars due to ESD alone, and therefore, while shipping the parts, each and every IC is packed with utmost care and insulated from the outside world. 
  • Also, while working in the labs of research centers, universities or companies, care is taken to prevent any excess potential from accumulating on any lab material. There's a separate ground for every device, which may be as small as a metallic needle. Even back in my college days, our professor used to admonish us for touching the pins of any IC with bare hands, because sufficient potential can accumulate on our body, especially our extremities.
Note that ESD is a one-time event. It can occur while the part is being shipped, when you are beginning to use the device, or while you are using it.

Electromigration (EM): Suppose a device operates over a long period of time, and there are certain regions in the device where the current density is quite high. The electrons there have the propensity to displace the metal atoms through momentum transfer, which may create voids in some regions and hillocks in others.

July 26, 2012

Sample Problem on Setup and Hold

In the post Timing: Basics, we discussed the basics of setup and hold times: why it is necessary to meet the setup and hold timing requirements, and how frequency affects setup but does not affect hold.

Let us understand the concept with an example:


I hope the above waveforms are self explanatory.
Setup Slack in the above case (as inferred from the waveforms as well) is:

Setup Slack = Tclk - T(clk-to-q,FF1) - Tdata - T(su,FF2)

If this setup slack is positive, we say that setup time constraint is met. Note that setup slack depends upon the clock period and hence in turn frequency at which your design is clocked.

Let us consider hold timing:
Hold Slack = Tdata + T(clk-to-q,FF1) - T(ho,FF2)

As evident from the above equation, hold slack is independent of the frequency of the design.

Note:
  • Setup is a next-cycle check, so we take the setup time T(su,FF2) of FF2 into account while finding the setup slack at the input pin of FF2.
  • Hold is a same-cycle check, so we take the hold time T(ho,FF2) of FF2 into account while computing the hold slack at the input pin of FF2.
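To make the two checks concrete, here is a quick numeric sketch in Python of the slack equations above. All the delay values are hypothetical, chosen only for illustration:

```python
# Numeric check of the setup/hold slack equations above.
# All delay values (in ns) are hypothetical, for illustration only.
T_clk = 10.0      # clock period
T_c2q = 1.0       # clock-to-q delay of the launching flop FF1
T_data = 6.5      # combinational (data path) delay
T_su = 0.5        # setup time of the capturing flop FF2
T_ho = 0.3        # hold time of the capturing flop FF2

setup_slack = T_clk - T_c2q - T_data - T_su   # next-cycle check
hold_slack = T_data + T_c2q - T_ho            # same-cycle check

print(setup_slack)           # 2.0 -> positive, setup is met
print(round(hold_slack, 2))  # 7.2 -> positive, hold is met
```

Both slacks come out positive here, so this hypothetical path meets timing.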
Try and grasp this example. I shall introduce the concept of clock skew next.

July 14, 2012

Clock Gating

The clock is the highest-frequency toggling signal in any SoC. As we discussed in the post Need for Low-Power Design Methodology, the capacitive load component of the dynamic power is directly proportional to the switching frequency of the devices. This implies that the clock path cells contribute the most to the dynamic power consumption in the SoC. 

Power consumption in the clock paths alone contributes more than 50% of the total dynamic power consumed within modern SoCs. Power being a very critical aspect of the design, one needs to make prudent efforts to reduce it. Clock Gating is one such method. 

Let's try and build further on this perspective.
The clock feeds the CLOCK pins of all the flip-flops in the design. The clock tree itself comprises clock tree buffers, which are needed to maintain a sharp (numerically small) slew in the clock path. Refer to the post Clock Transition for details. 


Consider the above figure. It is not necessary that the output of the flip-flop be switching at all times. Modern devices support various low-power modes in which only a certain part of your SoC is working. This may include some key features pertaining to security or some critical functional aspects of your device. Apart from this, there are some configuration registers in your device which need to be programmed either once or very seldom. So, let's say the above FF will not be switching states for a considerable period of time. If it is used the way it is, what's the problem? Power! The clock is switching incessantly. The clock tree buffers are switching states and hence consuming power. So are the FFs. Remember that a FF is itself made up of latches, so even though the input and output of the FF are not switching, some part of the latch is switching and consuming power.

What could be done to alleviate the above problem? Clock Gating is one such solution. Here's how it'll help.


If you place an AND gate in the clock path, then whenever you know that a certain part of your device does not need to receive the clock, you can drive a logic '0' on the ENABLE pin. This ensures that all the clock tree buffers and the sink pin of the FF are held at a constant value (0 in this case). Hence these cells do not contribute to dynamic power dissipation. However, they still consume leakage power.

Similarly, you can place an OR gate and drive one of its inputs to logic '1'. Again, you would save on dynamic power.

However, a word of caution: the output of the AND gate feeding the entire clock path might be glitchy. See the following figure:

Solution: The output won't be glitchy if the ENABLE signal changes only when the CLOCK signal is low. So, all you have to make sure is that ENABLE is generated by a negative-edge-triggered FF. This ensures that the signal changes only after the falling edge of the CLOCK signal.

Similarly, while using an OR gate, a glitch could be propagated onto the clock path if the ENABLE signal changes while the CLOCK is low. Make sure that it is generated by a positive-edge-triggered FF, so that it changes only after the rising edge of the CLOCK, in order to avoid any glitch being passed on to the FFs. 
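As a toy illustration of why the ENABLE timing matters, the following Python sketch models AND-based gating at a granularity of ten time steps per clock period. The tick numbers and the waveform model are my own simplification, not a circuit simulation:

```python
# Toy discrete-time sketch of AND-based clock gating (illustrative only).
# One clock period = 10 ticks: clk is high for ticks 0-4, low for ticks 5-9.

def gated_wave(enable_change_tick, n_ticks=40):
    """ENABLE goes 0 -> 1 at enable_change_tick; return gated clock samples."""
    wave = []
    for t in range(n_ticks):
        clk = 1 if (t % 10) < 5 else 0
        en = 1 if t >= enable_change_tick else 0
        wave.append(clk & en)
    return wave

def pulse_widths(wave):
    """Widths of each high pulse seen on the gated clock."""
    widths, w = [], 0
    for v in wave + [0]:
        if v:
            w += 1
        elif w:
            widths.append(w)
            w = 0
    return widths

# ENABLE changes mid-way through a high phase -> a truncated (glitchy) pulse:
print(pulse_widths(gated_wave(13)))   # [2, 5, 5] -> first pulse only 2 ticks
# ENABLE updated while clk is low (as a negedge FF would do) -> clean pulses:
print(pulse_widths(gated_wave(15)))   # [5, 5] -> every pulse is full width
```

The truncated 2-tick pulse in the first run is exactly the kind of runt pulse that can violate a flop's pulse-width requirement.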


Why would a glitch be detrimental anyway? The answer is:
Glitches constitute an edge! The FFs might sample a value on it because they are edge-triggered. But the bigger problem is that all FFs have a certain minimum pulse-width requirement (also called the pulse-width check), which needs to be fulfilled to ensure that they don't go into METASTABILITY. And if an unknown state, X, is propagated in a design, the entire functionality of the chip can go haywire!

Some terminologies: 
  • AND/NAND gate based clock gating is referred to as Active-High Clock Gating.
  • OR/NOR gate based clock gating is referred to as Active-Low Clock Gating.
NAND and NOR clock gates work similar to AND and OR respectively.

So, Clock Gating is an efficient solution to save dynamic power in the design. Modern SoCs have many IPs integrated together. Placing clock gates and enabling them in various possible combinations is what gives rise to the different low-power modes of the device.

July 10, 2012

Puzzle: Finite State Machine

I loved solving problems on Finite State Machines back in my college days. Recently, I came across a good problem and thought it would be worthwhile to share it with you as well!

Q. The ACME Company has recently received an order from a Mr. Wiley E. Coyote for their all-digital Perfectly Perplexing Padlock. The P3 has two buttons ("0" and "1") that when pressed cause the FSM controlling the lock to advance to a new state. In addition to advancing the FSM, each button press is encoded on the B signal (B=0 for button "0", B=1 for button "1"). The padlock unlocks when the FSM sets the UNLOCK output signal to 1, which it does whenever the last N button presses correspond to the N-digit combination.
  1. Unfortunately, the design notes for the P3 are incomplete. Using the specification above and clues gleaned from the partially completed diagrams below, fill in the information that is missing from the state transition diagram and its accompanying truth table. When done:
    • Each state in the transition diagram should be assigned a 2-bit state name S1S0 (note that in this design the state name is not derived from the combination that opens the lock),
    • The arcs leaving each state should be mutually exclusive and collectively exhaustive,
    • The value for UNLOCK should be specified for each state, and
    • The truth table should be completed.
    • What is the combination for the lock?


     
    Source: MIT OpenCourseWare 
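If you want to play with the lock's behaviour before solving the puzzle, here is a small Python sketch of such an FSM, modelled as a shift register over the last N presses. The 4-bit combination used below is hypothetical and is not the puzzle's answer:

```python
# Sketch of a P3-style combination lock: UNLOCK goes high whenever the
# last N button presses match the N-digit combination.
# The combination below is made up, not the puzzle's actual answer.
COMBINATION = [1, 0, 1, 1]

def run_lock(presses, combo=COMBINATION):
    """Feed button presses (B values) one at a time; return UNLOCK after each."""
    history, unlock = [], []
    for b in presses:
        history = (history + [b])[-len(combo):]   # keep only the last N presses
        unlock.append(1 if history == combo else 0)
    return unlock

print(run_lock([1, 0, 1, 1, 1]))  # [0, 0, 0, 1, 0]
```

Note how UNLOCK drops back to 0 on the fifth press: the window of the last four presses no longer matches, which is exactly the sliding-window behaviour the FSM has to encode in its states.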

July 07, 2012

Timing: Basics

In a few earlier posts, we have already mentioned timing. It's time to discuss it formally.
Timing is a constraint that must be met so that the design functions the way it was meant to.

  • What will happen if the timing constraints are met?
    You can be pretty sure that the device will function correctly at the frequency that was intended.
  • What will happen if the timing constraints are not met?
    Device will not function correctly at the intended frequency. And it might or might not function at a slower frequency.
Pretty confusing? Don't worry. Read on.

Consider the following digital circuit: two rising-edge-triggered flops, a and b, fed by a clock signal CLK, talking to each other. The output of flop a, after being processed by the combinatorial logic Comb, reaches the input of flop b.

How does the above circuit work? Consider the two waveforms, which are the clock signals at flops a and b respectively. Flop a samples the input data IN at rising clock edge 1a, and this data is captured by flop b at clock edge 2b. Similarly, data sampled and launched by flop a at clock edge 2a is captured by flop b at 3b. 

As long as this launch-and-capture relationship is maintained, our timing constraint is met and the device functions perfectly fine! But the question remains: what actually is this timing constraint?

The data launched at edge 1a has to undergo the following delays before it reaches the input of flop b:
the clock-to-q delay of flop a and the delay of the combinatorial logic Comb. 
It should reach the input of flop b at least some time before edge 2b reaches the clock pin of flop b. This time is called the Setup Time. 
Also, we have to make sure that the data launched by flop a at clock edge 1a is not captured by flop b at clock edge 1b (it needs to be captured at 2b). So, the data must reach flop b at least some time after clock edge 1b reaches flop b. This time is called the Hold Time.


Read the above two lines again. 
The same relationship holds for the other edges. Setup checks: 2a-3b; 3a-4b. Hold checks: 2a-2b; 3a-3b, and so on.
Setup and hold are the bread and butter of every backend design engineer. But why should the data reach some time before or after a clock edge? Where do these times come from? What exactly is the origin of setup and hold times? I mean no disrespect, but the answer to this question can puzzle even an experienced design engineer, and I assure you that we will take this up in detail very soon.

For now, convince yourself that:
  • Setup is a next-cycle check while hold is a same-cycle check.
  • Setup depends on the period (and hence the frequency) at which your flip-flops are clocked, while hold checks are frequency-independent.
A direct ramification of the above statement is that setup violations can be fixed by lowering the operating frequency of the design. But hold violations cannot be fixed that way! I shall explain the Origin of Setup and Hold times soon. Also, I would like to take up some examples that would corroborate the concepts that I explained in this post.
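These two bullets can be demonstrated with a small Python sketch; the path delays below are hypothetical:

```python
# Hypothetical path delays (ns), chosen only to illustrate the two bullets.
T_c2q, T_data, T_su, T_ho = 1.0, 7.0, 0.5, 0.3

def setup_slack(T_clk):
    """Setup slack depends on the clock period T_clk (next-cycle check)."""
    return T_clk - T_c2q - T_data - T_su

def hold_slack():
    """Hold slack has no T_clk term at all: it is frequency-independent."""
    return T_data + T_c2q - T_ho

print(setup_slack(8.0))        # -0.5 -> setup violated at a 8 ns period
print(setup_slack(10.0))       # 1.5  -> fixed by slowing the clock down
print(round(hold_slack(), 2))  # 7.7  -> unchanged either way
```

Slowing the clock from an 8 ns to a 10 ns period turns the setup slack positive, while the hold slack does not move at all, which is exactly why hold violations cannot be fixed by lowering the frequency.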


July 06, 2012

Need for Low-Power Design Methodology

Low power is today's need in VLSI. Why? Well, ask yourself! You go to a gadget shop looking for a new cell-phone. Apart from the price, what are the qualitative things you would be most concerned about? 
  • Features including the speed of the processor.
  • Battery back-up
  • Operating System
A good operating system can make efficient use of the system's hardware resources, but it is driven more by the software applications you wish to run. The first two, however, are directly influenced by the design methodology and the technology node that go into designing your device.

You would love to buy a cell-phone with a faster processor, so that your applications run fast and your computations finish quicker. Also, you wouldn't want to charge your cell-phone every hour, or for that matter every day! This translates into a design challenge: make the device consume the least power possible.

Frequency and power go hand-in-hand. You cannot just go on increasing the frequency (assuming that timing is met!), without expecting any hit on power. 

Power itself has many components. To give you a glimpse, we'll talk about them in brief.
Dissipated power has two components: dynamic and static.

 
Dynamic power constitutes that component of the total power which comes into the picture when the devices (the individual transistors) switch their values from 0 to 1 or vice versa. Dynamic power itself has two components: 
  • Capacitive Load Power: depends on the output load of each switching transistor. The classic expression is P = α · C_L · V_DD² · f, where α is the switching activity, C_L the load capacitance, V_DD the supply voltage and f the clock frequency.
  • Short Circuit Power: depends on the input transition, since a slow transition keeps both the pull-up and pull-down networks momentarily on.



 Static power is the component dissipated when the device is not switching, i.e. when it is in standby mode, and it mainly consists of leakage power.

We talked about the fact that power and the speed of the device go hand-in-hand, and it is pretty much evident from the capacitive load power expression: as you increase the frequency of your design (again emphasizing that timing must be met!), the switching rate of the devices increases and hence the capacitive load component of the dynamic power increases.

One workaround to reduce power is to reduce the supply voltage at which your devices work. But this, in turn, reduces the signal swing available for the devices to cross the threshold voltage (Vt) and hence engenders myriad design challenges.

Before I conclude this post, I would like to make one last point. Device complexity is increasing every day while device size is shrinking. This ensures that your latest cell-phone is sleek in its looks, but again, the hit is on power!


 The above table shows the trend of ever-increasing power dissipation as technology nodes scale down. This has forced designers to come up with innovative design solutions to deliver the best to you.
In upcoming posts, we will discuss these design-for-low-power solutions in detail.




Puzzle: Clock Transition

In the post Factors Affecting Delays of Standard Cells, we talked about the clock transition and the way it impacts setup and hold times.

While building our clock tree we ensure that clock transition is as low as possible. 

If the clock transition, i.e. the slew at the clock tree buffers, were bad, then apart from the penalty on hold time, what other deteriorating impact would it have on the design?

July 03, 2012

Factors Affecting Delays of Standard Cells

In this post, we will talk about the factors that affect the delays of standard cells. Before starting the discussion, it would be prudent to explain what is meant by timing arcs.

Timing Arcs: A timing arc represents the direction of signal flow, usually from an input to an output. Arcs may be combinational or sequential. Combinational arcs represent the signal flow in combinatorial cells like AND, NAND and OR gates. Sequential arcs represent the signal flow in flip-flops, and they usually have a control signal like CLOCK associated with them. A third type, closely related to sequential arcs, are the setup and hold arcs. They represent the setup and hold requirements and, in general, do not represent any signal flow. 


The information about these timing arcs comes from the timing library (.lib) files.


Let's turn our attention back to delays.

Consider an AND gate. As discussed above, A to Z is a combinational timing arc. The delay of this arc is picked up from the .lib, which is then read by the timing tools and reflected in the timing reports.

This delay depends primarily on two factors:
1. The input slew or the transition at A pin.
2. The output load or the capacitance at the Z pin.

Note that the output load is the sum of the input capacitances of all the cells connected to node Z, plus the capacitance of the nets connected to that node.

Output Load = Input Cap of all cells at the fan-out of Z + Total net capacitance of the nets connected to node Z.


Delay increases with both the input transition and the output load.
1. The larger the output capacitance, the more time the cell requires to charge/discharge it, and hence the larger the delay.
2. The larger the input transition, the more time the cell requires to change its output after processing the input value.

You would note that the explanation behind delays just boils down to the charging/discharging of capacitors! Once you befriend them, you will be able to deduce half the concepts intuitively. 
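As a rough sketch of how tools consume this, think of the .lib arc delay as a 2-D table indexed by input slew and output load, with interpolation between the characterized points. The table values below are made up for illustration, not taken from any real library:

```python
# Sketch of a .lib-style delay lookup: delay indexed by (input slew,
# output load), with bilinear interpolation between characterized points.
# All table values below are made up for illustration.
from bisect import bisect_right

SLEWS = [0.1, 0.5, 1.0]          # input transition index, ns
LOADS = [0.01, 0.05, 0.10]       # output load index, pF
DELAY = [                        # DELAY[i][j] for (SLEWS[i], LOADS[j]), ns
    [0.10, 0.18, 0.30],
    [0.15, 0.25, 0.40],
    [0.22, 0.35, 0.55],
]

def interp_delay(slew, load):
    """Bilinear interpolation inside the characterized table."""
    i = min(max(bisect_right(SLEWS, slew) - 1, 0), len(SLEWS) - 2)
    j = min(max(bisect_right(LOADS, load) - 1, 0), len(LOADS) - 2)
    tx = (slew - SLEWS[i]) / (SLEWS[i + 1] - SLEWS[i])
    ty = (load - LOADS[j]) / (LOADS[j + 1] - LOADS[j])
    d00, d01 = DELAY[i][j], DELAY[i][j + 1]
    d10, d11 = DELAY[i + 1][j], DELAY[i + 1][j + 1]
    return (d00 * (1 - tx) * (1 - ty) + d01 * (1 - tx) * ty
            + d10 * tx * (1 - ty) + d11 * tx * ty)

print(interp_delay(0.1, 0.01))  # 0.1 -> a table corner comes back exactly
print(interp_delay(0.3, 0.03))  # mid-table: larger slew/load, larger delay
```

Note that the delay grows monotonically toward the bottom-right of the table, which is the "more load, more transition, more delay" behavior described above.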

We are now set to discuss the delays of timing arcs of a flip-flop.

1. Clock-to-q delay: As expected, it depends upon the clock transition and the load at the output Q. It may sound surprising, but the clock-to-q delay does not depend upon the transition at the D input.
2. Setup and hold times: These depend upon the transition at the clock pin and the transition at the D pin. They do not depend on the output load.

Some surprises might be yet to unfold. Read on.
1. Clock-to-q delay increases with the clock transition and the output cap at Q.
2. Setup time increases with the input transition at D and decreases with the clock transition. Recall the definition of setup time: the larger the clock transition time, the more time you are allowing for the input at D to settle before the clock edge.
3. Hold time decreases with the input transition at D and increases with the clock transition. Again, recall the definition of hold time: the larger the clock transition time, the greater the possibility that the D input might change within the hold window after the clock edge.
I hope I was able to explain this stuff clearly. In case of any doubts, please feel free to post them here.


July 01, 2012

Reading from and writing to a file in Tcl

File handling operations, like reading from and writing to a file, are among the most commonly used operations in any programming language.

Writing to a file in Tcl is straightforward. 
In this post, we will discuss two ways to read a file in Tcl:

Note that in the snippets below, the Tcl commands come first on each line, the file and variable names are user-defined, and the comments follow the ## markers.

  1. set in_file [open palindrome_in.csv r]    ## Open palindrome_in.csv in read mode
    set out_file [open palindrome_out.csv w]   ## Open palindrome_out.csv in write mode
    set data [read $in_file]                   ## "data" now holds the contents of the input file
    set lines [split $data "\n"]               ## "lines" now holds the collection of lines
    foreach line $lines {                      ## Read each "line" from the collection of "lines"
        <body of your proc>                    ## Body of the proc
    }
    puts $out_file "xyz"                       ## Print the desired output to palindrome_out.csv
    close $out_file                            ## Close the output file
    close $in_file                             ## Close the input file


    Note: Closing both the input and output files is important. If you don't, your Tcl shell might return a "too many open files" error. Or, even worse, the output in the output file might get terminated prematurely.
  2. set in_file [open palindrome_in.csv r]    ## Open palindrome_in.csv in read mode
    set out_file [open palindrome_out.csv w]   ## Open palindrome_out.csv in write mode
    while { [gets $in_file line] >= 0 } {      ## Note that here "line" is not a user-defined variable
        <body of your proc>                    ## Body of the proc
    }
    puts $out_file "xyz"                       ## Print the desired output to palindrome_out.csv
    close $out_file                            ## Close the output file
    close $in_file                             ## Close the input file
    What's the difference between the two? Well, not much, if your input file is small. However, if it is a big file (for example, SDF files can be as big as 1 GB!), you might prefer the second method.

    In the first method, the user-defined variable lines contains all the lines of the file as one collection. If the file is too big, this collection is too big, and one variable has to hold this fairly large data for as long as your script is running. This might exhaust the available memory.
    This problem is alleviated in the second method, where you read one line at a time.