|Suppression of gate-induced drain leakage by optimization of junction profiles in 22 nm and 32 nm SOI nFETs|
Jul 02, 2013
|Fault Tolerant Design for Low Power Hierarchical Search Motion Estimation Algorithms|
Jul 02, 2013
|A CNFET-based Characterization Framework for Digital Circuits|
This paper introduces a framework to develop and characterize digital circuits using Carbon Nanotube Field Effect Transistors (CNFET). We define a 4-step process that involves design capture, pre-processing, circuit simulation, and results extraction and interpretation. The initial work leading to this framework involves the selection of an appropriate CNFET model and model parameters, and the determination of an optimized substrate voltage. Through a set of custom-designed automated scripts, various logic gates were simulated, data were compiled, and characterization results were obtained. A complete approximate squarer circuit was also designed, implemented, and characterized using the framework. To demonstrate the power of carbon nanotube (CNT) technology, the same circuit was also implemented in 16 nm CMOS technology for comparison. An improvement by a factor of 17× in power-delay product (PDP) was achieved with CNT.
Nov 26, 2012
|A two-dimensional logarithmic number system (2DLNS)-based Finite Impulse Response (FIR) filter design|
The ever-increasing demand for low-power DSP applications has directed researchers to contemplate a variety of potential approaches in different contexts. In this regard, the use of alternative number systems, which are inherently capable of reducing hardware complexity, has been proposed. In this work, a 2DLNS-based platform for multiplication-intensive DSP applications is presented. Implementing an FIR filter structure on this basis shows a clear advantage over its binary counterpart in terms of VLSI area and power consumption.
Nov 26, 2012
|A Directional Gamma Ray Detector Using a Single Chip Computational Sensor|
This paper presents the design and test results of a computational radiation sensor system based on a single-chip solution that can determine the direction of gamma rays emitted from a radiation source. The overall system is formed by merging a sensor section with a compact, low-power computational radiation sensor section. The sensor section houses three NaI gamma ray detectors arranged in a spatial configuration that allows for direction finding. The computational sensor is based on a single-chip solution developed by the authors that houses multiple low-power sensor front ends, event-driven analog-to-digital converters, and a dedicated microcontroller on the same die. The presented system is capable of gathering the pulse height spectra from the gamma isotope data received from the three separate NaI detectors. Further processing of the data is possible by executing software algorithms using the computation resources available on chip. To that end, a compact fixed-point program was developed to perform on-chip real-time gamma ray collection and direction estimation. The single-chip solution was fabricated in a 0.18 μm CMOS technology, with field tests demonstrating the validity of the approaches taken. The total computational sensor system power consumption is less than 20 μW, excluding the detector power consumption. The gamma isotope direction finding program executes in less than 1 ms with 5° accuracy.
Nov 26, 2012
|Improving Transition Delay Test Using a Hybrid Method|
This transition-fault-testing technique combines the launch-off-shift method and an enhanced launch-off-capture method for scan-based designs. The technique improves fault coverage and reduces pattern count and scan-enable design effort. It is practice oriented, suitable for low-cost testers, and implementable with commercial ATPG tools.
Nov 26, 2012
|A Physics-Based Three-Dimensional Analytical Model for RDF-Induced Threshold Voltage Variations|
In this paper, a 3-D analytical model is proposed to capture the threshold voltage, surface potential, and electric field variations induced by random dopant fluctuations in the channel region of metal-oxide-semiconductor field-effect transistors. The 3-D model treats the effect of each dopant separately and is based on fundamental laws of physics. The proposed approach enables determination of transistor threshold voltage variations with both very low computational cost and high accuracy. Using the developed model, we performed statistical analysis, simulating more than 100 000 transistor samples. Interestingly, the results showed that, although the distribution of the threshold voltage for large-channel transistors is Gaussian, for scaled transistors, it is non-Gaussian. Furthermore, the proposed model predicts known formulas, which are proven for 1-D analysis and large transistors, simply by setting the appropriate transistor size. As a consequence, this model is a logical extension of the theory of large transistors to nanoscaled devices.
Oct 24, 2012
|Cryogenic Operation of Junctionless Nanowire Transistors|
This letter presents the properties of nMOS junctionless nanowire transistors (JNTs) under cryogenic operation. Experimental results of drain current, subthreshold slope, maximum transconductance at low electric field, and threshold voltage, as well as its variation with temperature, are presented. Unlike in classical devices, the drain current of JNTs decreases when temperature is lowered, although the maximum transconductance increases when the temperature is lowered down to 125 K. An analytical model for the threshold voltage is proposed to explain the influence of nanowire width and doping concentration on its variation with temperature. It is shown that the wider the nanowire or the lower the doping concentration, the higher the threshold voltage variation with temperature.
Oct 24, 2012
|Silicon-die Thermal Monitoring Using Embedded Sensor Cells Unit|
Thermal monitoring is essential in integrated circuits (ICs) and VLSI chips, which are multilayer structures built from a stack of different materials. An increase in the internal temperature of VLSI circuits can lead to serious thermal and thermo-mechanical problems. Due to aggressive technology scaling, VLSI integration density and power density increase drastically. Research on thermal phenomena at the micro-scale level is essential for SoC and MEMS-based applications. However, various measurement techniques are needed to understand the thermal behavior of a VLSI chip. In particular, measurement techniques for surface temperature distributions of large VLSI systems are a highly challenging research topic. This paper presents an algorithm and experimental results for a silicon-die thermal monitoring method using an embedded sensor cells unit. Sensor implementation results and analysis are also presented.
Oct 24, 2012
|Low-Power Functionality Enhanced Computation Architecture Using Spin-Based Devices|
Power consumption in CMOS integrated circuits increases with every technology generation due to increased subthreshold and gate leakage currents. To cope with this problem, researchers have started looking at the possibility of logic devices based on electron spin, as an alternative to charge-based CMOS, for realizing low-power integrated circuits with low active power dissipation and zero standby leakage. In this paper, we investigate spin-based logic devices that employ a low-power spin-torque switching mechanism for circuit operation. We have developed a Functionality Enhanced All Spin Logic (FEASL) architecture and a synthesis framework using Logically Passively Self Dual (LPSD) formulation. This methodology enables the design of large functional logic blocks, especially low-power adders and multipliers, which constitute the building blocks of all arithmetic logic units (ALU). In addition, we have investigated three different variants of ASL (low-power, medium-power/medium-performance, and high-performance) and we analyze their merits and drawbacks at the circuit/architecture level. We synthesized a Discrete Cosine Transform (DCT) algorithm using adders and multipliers to show the efficacy of the proposed FEASL approach in designing digital signal processing (DSP) systems. Compared to a 15 nm CMOS implementation, the FEASL-based DCT shows 88% improvement in power and 83% in PDP with 43% degradation in performance.
Sep 26, 2012
|Allocator Implementations for Network-on-Chip Routers|
The present contribution explores the design space for virtual channel (VC) and switch allocators in network-on-chip (NoC) routers. Based on detailed RTL-level implementations, we evaluate representative allocator architectures in terms of matching quality, delay, area and power and investigate the sensitivity of these properties to key network parameters. We introduce a scheme for sparse VC allocation that limits transitions between groups of VCs based on the function they perform, and reduces the VC allocator's delay, area and power by up to 41%, 90% and 83%, respectively. Furthermore, we propose a pessimistic mechanism for speculative switch allocation that reduces switch allocator delay by up to 23% compared to a conventional implementation without increasing the router's zero-load latency. Finally, we quantify the effects of the various design choices discussed in the paper on overall network performance by presenting simulation results for two exemplary 64-node NoC topologies.
Sep 26, 2012
|Effect of Nonlinear Summation of Synaptic Currents on the Input-Output Properties of Spinal Motoneurons|
A single spinal motoneuron receives tens of thousands of synapses. The neurotransmitters released by many of these synapses act on ionotropic receptors and alter the driving potential of neighboring synapses. This interaction introduces an intrinsic nonlinearity in motoneuron input–output properties where the response to two simultaneous inputs is less than the linear sum of the responses to each input alone. Our goal was to determine the impact of this nonlinearity on the current delivered to the soma during activation of predetermined numbers and distributions of excitatory and inhibitory synapses. To accomplish this goal we constructed compartmental models constrained by detailed measurements of the geometry of the dendritic trees of three feline motoneurons. The current “lost” as a result of local changes in driving potential was substantial and resulted in a highly nonlinear relationship between the number of active synapses and the current reaching the soma. Background synaptic activity consisting of a balanced activation of excitatory and inhibitory synapses further decreased the current delivered to the soma, but reduced the nonlinearity with respect to the total number of active excitatory synapses. Unexpectedly, simulations that mimicked experimental measures of nonlinear summation, activation of two sets of excitatory synapses, resulted in nearly linear summation. This result suggests that nonlinear summation can be difficult to detect, despite the substantial “loss” of current arising from nonlinear summation. The magnitude of this “loss” appears to limit motoneuron activity, based solely on activation of ionotropic receptors, to levels that are inadequate to generate functionally meaningful muscle forces.
Sep 26, 2012
|AXR-CMP: Architecture Support in Accelerator-Rich CMPs|
To improve performance/power efficiency, we expect that future CMPs may use special-purpose accelerators extensively. This work discusses hardware architectural support for accelerator-rich CMPs. First, we introduce an efficient cache management scheme for accelerators to mitigate memory latency by overlapping data transfer with computation. Second, we present a hardware resource management scheme for accelerator sharing. This scheme supports sharing and arbitration of multiple cores for a common set of accelerators, and it uses a software-based priority mechanism to provide feedback to cores that indicates the wait time before acquiring a particular resource. Finally, we propose architectural support that allows us to compose a larger virtual accelerator out of multiple smaller accelerators, and chain multiple accelerators together with minimal intervention of the requesting core. Experimental results show significant performance and energy improvement compared to approaches that use OS-based accelerator management, and achieve on average 9X in performance (up to 40.17X) and 32X in energy efficiency (up to 90X) over a software implementation, with minimal hardware overhead.
Sep 26, 2012
|Combined Loop Transformation and Hierarchy Allocation for Data Reuse Optimization Design Compiler|
External memory bandwidth is a crucial bottleneck in the majority of computation-intensive applications for both performance and power consumption. Data reuse is an important technique for reducing external memory access by utilizing the memory hierarchy. Loop transformation for data locality and memory hierarchy allocation are two major steps in the data reuse optimization flow. However, they have been carried out independently. This paper presents a combined approach which optimizes loop transformation and memory hierarchy allocation simultaneously to achieve globally optimal results on external memory bandwidth and on-chip data reuse buffer size. We develop an efficient and optimal solution to the combined problem by decomposing the solution space into two subspaces with linear and nonlinear constraints respectively. We show that we can significantly prune the solution space without losing its optimality. Experimental results show that our scheme can save up to 31% of on-chip memory size compared to the separated two-step method when the memory hierarchy allocation problem is not trivial. Also, run-time complexity is acceptable for practical cases.
Aug 27, 2012
|Energy-Efficient Pipeline Templates for High-Performance Asynchronous Circuits|
We present two novel energy-efficient pipeline templates for high-throughput asynchronous circuits. The proposed templates, called N-P and N-Inverter pipelines, use a single-track handshake protocol. There are multiple stages of logic within each pipeline. The proposed techniques minimize handshake overheads associated with input tokens and intermediate logic nodes within a pipeline template. Each template can pack a significant amount of logic in a single stage, while still maintaining a fast cycle time of only 18 transitions. Noise and timing robustness constraints of our pipelined circuits are quantified across all process corners. A completion detection scheme based on wide NOR gates is presented, which results in significant latency and energy savings, especially as the number of outputs increases. To fully quantify all design trade-offs, three separate pipeline implementations of an 8x8-bit Booth-encoded array multiplier are presented. Compared to a standard QDI pipeline implementation, the N-Inverter and N-P pipeline implementations reduced the energy-delay product by 38.5% and 44% respectively. The overall multiplier latency was reduced by 20.2% and 18.7%, while the total transistor width was reduced by 35.6% and 46% with N-Inverter and N-P pipeline templates respectively.
Aug 27, 2012
|Core Cannibalization Architecture: Improving Lifetime Chip Performance for Multicore Processors in the Presence of Hard Faults|
To improve the lifetime performance of a multicore chip with simple cores, we propose the Core Cannibalization Architecture (CCA). A chip with CCA provisions a fraction of the cores as cannibalizable cores (CCs). In the absence of hard faults, the CCs function just like normal cores. In the presence of hard faults, the CCs can be cannibalized for spare parts at the granularity of pipeline stages. We have designed and laid out CCA chips composed of multiple OpenRISC 1200 cores. Our results show that CCA improves the chips’ lifetime performance compared to chips without CCA.
Aug 27, 2012
|Comparison of the Inhibition of Renshaw Cells During Subthreshold and Suprathreshold Conditions Using Anatomically and Physiologically Realistic Models|
Inhibitory synaptic inputs to Renshaw cells are concentrated on the soma and the juxtasomatic dendrites. In the present study, we investigated whether this proximal bias leads to more effective inhibition under different neuronal operating conditions. Using compartmental models based on detailed anatomical measurements of intracellularly stained Renshaw cells, we compared the inhibition produced by glycine/γ-aminobutyric acid-A (GABAA) synapses when distributed with a proximal bias to the inhibition produced when the same synapses were distributed uniformly (i.e., with no regional bias). The comparison was conducted in subthreshold and suprathreshold conditions. The latter were mimicked by voltage clamping the soma to −55 mV. The voltage clamp reduces nonlinear interactions between excitatory and inhibitory synapses. We hypothesized that for electrotonically compact cells such as Renshaw cells, the strength of the inhibition would become much less dependent on synaptic location in suprathreshold conditions. This hypothesis was not confirmed. The inhibition produced when inhibitory inputs were proximally distributed was always stronger than when the same inputs were uniformly distributed. In fact, the relative effectiveness of proximally distributed inhibitory inputs over uniformly distributed synapses was greater in suprathreshold conditions than that in subthreshold conditions. The somatic voltage clamp minimized saturation of inhibitory driving potentials. Because this effect was greatest near the soma, the current produced by more distal synapses suffered a greater loss because of saturation. Conversely, in subthreshold conditions, the effectiveness of proximal synapses was substantially reduced at high levels of background synaptic activity because of saturation. Our results suggest glycine/GABAA synapses on Renshaw cells are strategically distributed to block the powerful excitatory drive produced by recurrent collaterals from motoneurons.
Aug 27, 2012
|Application Exploration for 3-D Integrated Circuits: TCAM, FIFO, and FFT Case Studies|
3-D stacking and integration can provide system advantages. This paper explores application drivers and computer-aided design (CAD) for 3-D integrated circuits (ICs). Interconnect-rich applications especially benefit, sometimes up to the equivalent of two technology nodes. This paper presents physical-design case studies of ternary content-addressable memories (TCAMs), first-in first-out (FIFO) memories, and an 8192-point fast Fourier transform (FFT) processor in order to quantify the benefit of the through-silicon vias in an available 180-nm 3-D process. The TCAM shows a 23% power reduction and the FFT shows a 22% reduction in cycle-time, coupled with an 18% reduction in energy per transform.
Jul 25, 2012
|Argus: Low-Cost, Comprehensive Error Detection in Simple Cores|
Argus, a novel approach for detecting errors in simple processor cores, dynamically verifies the correctness of the four tasks performed by a von Neumann core: control flow, data flow, computation, and memory access. Argus detects transient and permanent errors, with far lower impact on performance and chip area than previous techniques.
Jul 25, 2012
|Design of a Link-Controller architecture for Multiple Serial Link Protocols|
This paper introduces a novel Multi-mode Serial Link Controller (MMSLC) for the logical physical layer (PHY) and data link layer (DLL) of USB 3.0, PCIe 2.0 and SATA 3.0. Functions defined in these protocols are grouped based on qualifying similarities and workload. The framework consists of a configurable circuit, a programmable accelerator, and an event processor for flexible implementation. This MMSLC can essentially substitute for three individual link controllers across protocols, thus achieving area reduction. An RTL implementation is completed, and the synthesis results are shown at the end of this paper.
Jul 25, 2012
|A Two-Step Readout CMOS Image Sensor Active Pixel Architecture|
In this paper, we introduce a 5-transistor (5T) active pixel sensor (APS) structure and a specialized oscillator readout circuit. The pixel keeps a reasonable fill factor of 43% using an n-well/p-sub photodiode with an area of 5 µm × 5 µm and generates a two-step signal response to the illumination. The pixel successfully extends output swing to 0.72 V. Measured pixel random noise is 2.5 mV, achieving 51 dB signal-to-noise ratio (SNR). A readout circuit is also implemented using a ring oscillator to replace the traditional design with analog-to-digital converter (ADC) circuitry. It generates a frequency output that is recorded by counters to perform signal digitization. The design is implemented with an array of 32 × 92 pixels in a 0.13 µm digital CMOS process and tested with a 1.25 V supply voltage.
Jul 25, 2012
|Asymmetric Drain Spacer Extension (ADSE) FinFETs for Low-Power and Robust SRAMs|
In this paper, we analyze and optimize FinFETs with asymmetric drain spacer extension (ADSE) that introduces a gate underlap only on the drain side. We present a physics-based discussion of current–voltage relationships, short-channel effects, and leakage, and show the application of ADSE FinFETs in a 6T static random access memory (SRAM) bit cell. By exploiting asymmetry in current, we show that it is possible to achieve improvement in both read and write stability for the 6T SRAM bit cell, along with a reduction in cell leakage, at the cost of a negligible increase in access time and area. We also propose a general circuit-aware device optimization methodology for SRAM design. We use this methodology to optimize the underlap in ADSE FinFETs. Compared to conventional FinFETs, we achieve 57% reduction in leakage, 11% improvement in read static-noise margin, and 6% improvement in write margin, with 7% increase in access time and cell area.
Jun 27, 2012
|Stage Number Optimization for Switched Capacitor Power Converters in Micro-Scale Energy Harvesting|
Micro-scale energy harvesting has become an increasingly viable and promising option for powering ultra-low power systems. A power converter is a key component in micro-scale energy harvesting systems. Various design parameters of the power converter, most notably the number of stages in a multi-stage power converter, play a crucial role in determining the amount of electrical power that can be extracted from a micro-scale energy transducer such as a miniature solar cell. Existing stage number optimization techniques for switched capacitor power converters, when used for energy harvesting systems, result in a substantial degradation in the amount of harvested electrical power. To address this problem, this paper proposes a new stage number optimization technique for switched capacitor power converters that maximizes the net harvested power in micro-scale energy harvesting systems. The proposed technique is based on a new figure-of-merit that is well suited for energy-harvesting systems. We have validated the proposed technique through circuit simulations using IBM 65 nm technology. Our simulation results demonstrate that the proposed stage number optimization technique results in an increase of 60% to 290% in net harvested power, compared to existing stage number optimization techniques.
Jun 27, 2012
|Accurate and Scalable IO Buffer Macromodel Based on Surrogate Modeling|
In this paper, a new method is proposed to generate accurate and scalable macromodels for input/output buffers. The method characterizes the physically based model elements with adaptive multivariate surrogate modeling techniques in order to achieve high fidelity and process–voltage–temperature scalability. Both single-ended and differential output buffer circuit examples demonstrate that the proposed modeling method offers good accuracy and flexible scalability to facilitate signal integrity analysis.
Jun 27, 2012