Services: Design and Verification

We have a talented pool of VLSI design and verification engineers who deliver a wide range of RTL/IP/FPGA-based designs and SoC/UVM/system-level verification. In the semiconductor domain, we can take on any challenge.

Functional Verification

Functional verification involves constrained randomization, coverage-driven verification, assertions, and verification using SystemVerilog and methodologies such as UVM. We provide verification services for both IP development and SoC verification. Our core areas of expertise are:

SOC Design and Verification (chip level)
IP Design and Verification (block level)
System Level Verification
Proof of concepts and Architecture Designing and Verification

Our functional verification services draw on expertise in the following languages and technologies:

SystemVerilog
VHDL
Verilog
Constrained Randomization
Assertions based on SVA/PSL
Coverage Driven Verification
Technologies like UVM

RTL Design

We have experience handling RTL design activities such as:

Understanding protocol specifications
Prototyping and developing a micro-architecture
Developing the RTL design for that micro-architecture
Low Power, Minimum Area, Speed (Timing) Analysis and Optimizations
Synthesis and timing clean RTL
Developing Block Level Test-Bench and Block Level Verification Environment
Coverage and Assertion based Analysis

IP Design and Verification (RTL and FPGA based)

Prototyping
FPGA based Architecture
RTL designing for speed and low power
FPGA constraints development
Synthesis and timing clean RTL
Block level Test-Bench development
FPGA Verification and Validation
SystemVerilog, UVM
TCL, Perl
Random Constraints
Assertions
C, C++, System Level Verification
Coverage Driven Verification

Virtual Modeling Expertise (System Level Verification)

C, C++, SystemC, TLM 2.0
ARM based verification model creation
Virtual Platform Development
Synthesizable IP modeling

Quality Assurance, Test-Case development and Automation

We offer Quality Assurance, Test-Case development and Automation services

Insight into some of the projects that we have worked on

AMBA AHB Protocol

AHB is a new generation of AMBA bus intended to address the requirements of high-performance synthesizable designs. It is a high-performance system bus that supports multiple bus masters and provides high-bandwidth operation. The AMBA AHB features required for high-performance, high clock frequency systems include:

Burst transfers
Single-cycle bus master handover
Single-clock edge operation
Non-tristate implementation
Bridging between this higher level of bus and the current ASB/APB can be done efficiently to ensure that any existing designs can be easily integrated
Fully pipelined structure is implemented for high performance

Implementation of AMBA AHB design

AHB master

A bus master is capable of initiating read and write operations by providing address and control information
Only one bus master is allowed to actively use the bus at any given point in time
Incrementing four, eight and sixteen-beat bursts as well as undefined-length bursts (less than 1 KB) and single transfers
Idle, busy, sequential, non-sequential transfer types
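
As an illustration only, the incrementing bursts listed above can be sketched in Python. The function name, the alignment check and the way the 1 KB limit is tested are assumptions made for this sketch; they are not taken from the Verilog design.

```python
def incr_burst_addresses(start_addr, beats, size_bytes):
    """Return the address of every beat in an incrementing burst.

    start_addr : byte address of the first beat
    beats      : number of beats (e.g. 4, 8 or 16)
    size_bytes : bytes transferred per beat (e.g. 4 for a 32-bit bus)
    """
    if start_addr % size_bytes != 0:
        raise ValueError("start address must be aligned to the transfer size")
    addrs = [start_addr + i * size_bytes for i in range(beats)]
    # Assumption for this sketch: a burst must stay inside one 1 KB region.
    if addrs[0] // 1024 != addrs[-1] // 1024:
        raise ValueError("burst crosses a 1 KB address boundary")
    return addrs


# Four-beat incrementing burst of 32-bit words starting at 0x20.
print([hex(a) for a in incr_burst_addresses(0x20, 4, 4)])
```

A burst that would run past the end of a 1 KB region raises an error in this sketch; a real master would instead split it into two bursts.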

AHB arbiter

The bus arbiter ensures that only one bus master at a time is allowed to initiate data transfers
The arbitration algorithm is a modified fuzzy-logic-based arbiter architecture for system-on-chip designs
It minimizes the time spent on request handling, arbitration and addressing, so that most bus cycles are used for useful data transfer operations
The modification adds a dynamic lottery scheme with a gated clock to the fuzzy-logic arbitration scheme
This modified arbiter avoids starvation among the masters and overcomes the contention problem

Encoding

A Hamming-distance-based multi-coding scheme is used on the write bus
It reduces the transition activity on the bus, which lowers dynamic power dissipation
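
The exact multi-coding scheme is not described here, so the sketch below substitutes plain bus-invert coding, a simple and well-known Hamming-distance-based technique, purely to show the idea. The function names, the 8-bit width and the all-zero initial bus state are assumptions.

```python
def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Return (driven_value, invert_flag) pairs for each word.

    A word is sent inverted when that toggles fewer bus lines
    than sending it unchanged.
    """
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the bus lines
    encoded = []
    for w in words:
        w &= mask
        if hamming_distance(prev, w) > width // 2:
            driven, inv = (~w) & mask, 1
        else:
            driven, inv = w, 0
        encoded.append((driven, inv))
        prev = driven
    return encoded


def bus_invert_decode(encoded, width=8):
    """Undo bus_invert_encode using the invert flag."""
    mask = (1 << width) - 1
    return [(~d) & mask if inv else d for d, inv in encoded]


def transitions(values):
    """Total number of line toggles when values are driven in order."""
    prev, total = 0, 0
    for v in values:
        total += hamming_distance(prev, v)
        prev = v
    return total


words = [0x00, 0xFF, 0x00, 0xFE]
enc = bus_invert_encode(words)
print(transitions(words), transitions([d for d, _ in enc]))
```

The invert flag costs one extra line, so the saving only pays off when consecutive words differ in more than half of their bits.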

AHB slave

A bus slave responds to a read or write operation within a given address-space range
The bus slave signals back to the active master the success, failure or waiting of the data transfer
Single cycle OKAY response
A slave can insert wait states into a transfer to gain extra time to complete it
A matching decoding scheme is used here to recover the original data sent by the master

Tools and language

All source code is written in Verilog HDL
Xilinx ISE 14.7 is used for synthesis and simulation
Xilinx power analyser is used for estimating power for encoding scheme

AMBA ASB Protocol

Features

Advanced System Bus (ASB)	Advanced Peripheral Bus (APB)
- High performance	- low power
- Pipelined operation	- latched address and control
- Burst transfer	- simple interface
- Multiple bus masters	- suitable for many peripherals

ASB master

A bus master is capable of initiating read and write operations by providing an address and control information. Only one bus master is allowed to actively use the bus at any one time.

ASB-APB Bridge

The APB Bridge appears as a slave module which handles the bus handshake and control signal retiming on behalf of the local peripheral bus. By defining the APB interface from the starting point of the system bus, the benefits of the system diagnostics and test methodology can be exploited

ASB Slave

A bus slave responds to a read or write operation within a given address space range. The bus slave signals back to the active master the success, failure or waiting of the data transfer

ASB Decoder

The bus decoder performs the decoding of the transfer addresses and selects slaves appropriately. The bus decoder also ensures that the bus remains operational when no bus transfers are required. A single centralized decoder is required in all ASB implementations.

ASB Arbiter

The bus arbiter ensures that only one bus master at a time is allowed to initiate data transfers. Even though the arbitration protocol is fixed, any arbitration algorithm, such as highest priority or fair access, can be implemented depending on the application requirements.

Implementation of AMBA ASB design

The implementation uses an arbiter that takes both timing and priority into consideration
Timing and priority based arbiter vs. priority based arbiter:
- A purely priority-based arbiter has a drawback: whenever the highest-priority request signal AREQ is asserted, that request is granted
- The next time the requests are tested, if the highest-priority master is still requesting the bus, it is granted again. Lower-priority masters therefore never get the bus in this case
- This can be avoided with a timing and priority based arbiter
- The arbiter FSM developed for this case checks the previously granted priority in each state: if the highest-priority AREQ was granted the bus last time, it is not granted this time, and the bus goes to the highest-priority master among the remaining requesters. This avoids granting the bus to the highest-priority master every time when it is always requesting.
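
The difference between the two arbiters can be sketched in a few lines of Python. This is a behavioural illustration under stated assumptions, not the FSM itself: masters are indexed with 0 as the highest priority, only the most recent grant is remembered, and a master that is the sole requester is granted again. The function names are hypothetical.

```python
def priority_arbiter(requests):
    """Fixed priority: index 0 is highest. Returns granted index or None."""
    for i, req in enumerate(requests):
        if req:
            return i
    return None


def timing_priority_arbiter(requests, last_grant):
    """Grant the highest-priority requester other than the previous winner."""
    for i, req in enumerate(requests):
        if req and i != last_grant:
            return i
    # Assumption: if only the previous winner is requesting, grant it again.
    if last_grant is not None and requests[last_grant]:
        return last_grant
    return None


# Masters 0 and 1 request the bus continuously; master 2 is idle.
requests = [True, True, False]
fixed = [priority_arbiter(requests) for _ in range(4)]

timed, last = [], None
for _ in range(4):
    last = timing_priority_arbiter(requests, last)
    timed.append(last)

print(fixed)  # master 0 wins every time
print(timed)  # masters 0 and 1 alternate
```

Because this sketch remembers only the last grant, three or more permanently requesting masters would still see the top two alternate; a longer grant history would be needed for strict fairness.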

DDR3 SDRAM Controller

The DDR3 Synchronous Dynamic Random Access Memory controller is designed for a RAM of size 1 Gb
The design architecture can work as a controller for DDR3 RAM as well as DDR2 RAM
The controller reads or writes a 32-bit word at each location
The design has a CAS latency of 1 cycle
The controller takes 5 states to complete a read or write operation on the RAM
The clock multiplier doubles or triples the clock frequency as required for the DDR2 or DDR3 controller
The controller is designed for a RAM organized as 4 (banks) x 16384 x 4096 x 32 bits
The design contains a “Data Path” module that receives data from or transmits data to the RAM on the high-speed clock generated by the clock multiplier
The design contains a high-speed modified dynamic-lottery-based arbiter that grants access to the controller for reading and writing data on the RAM

Implementation of DDR3 SDRAM Controller design

The Architecture contains 6 main modules for proper functioning of controller.

Address mapping

This module assigns the row and column addresses to the RAM for the data to be written or read. It decides which address is driven on the row address lines and the column address lines in each state. It also decodes the encoded bits to select the required bank for a read or write operation.
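
A minimal sketch of such an address split is shown below. The field widths follow from the organization quoted earlier (4 banks need 2 bits, 16384 needs 14 bits, 4096 needs 12 bits); treating 16384 as the row count and 4096 as the column count, and packing the fields as bank, row, column from the most significant end, are assumptions made for illustration.

```python
BANK_BITS = 2   # 4 banks
ROW_BITS = 14   # 16384 rows (assumed to be the row dimension)
COL_BITS = 12   # 4096 columns (assumed to be the column dimension)


def split_address(addr):
    """Split a flat word address into (bank, row, column)."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col


def join_address(bank, row, col):
    """Inverse of split_address."""
    return (bank << (COL_BITS + ROW_BITS)) | (row << COL_BITS) | col


addr = join_address(bank=2, row=5, col=7)
print(split_address(addr))
```

In the controller, the row field would be driven during the activate state and the column field during the read or write state, with the bank field selecting the bank.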

Controller

This is the heart of the controller; it follows an FSM to carry out read/write operations correctly and efficiently.

Arbiter between master request and refresh

This arbiter arbitrates between refreshing the RAM and read/write operations on the RAM. It gives refresh a higher priority than read/write requests from the different masters.

Data Path

This path forms a bidirectional bus over which the host reads data from the RAM or writes data to it. Data is transferred at DDR2/DDR3 speed on the faster clock generated by the clock multiplier.

Low Power pipelined 32-Bit RISC Processor

A low-power 32-bit pipelined RISC processor using a clock-gating scheme and efficient adders and multipliers
The design uses a Harvard architecture, in which the processor can access data and instructions at the same time because the design has separate data and instruction memories
The design requires only 1 cycle to execute an instruction
The architecture uses a 4-stage pipeline: load instruction, decode instruction, fetch operand, execute instruction, write back result into memory
The architecture uses a clock multiplier to double the clock rate
The stages do their work on the positive and negative edges of the clock from the clock multiplier
The design includes three main execution modules: ALU, barrel shifter and universal shift register
The design saves power by switching off the other two modules whenever one of the ALU, barrel shifter and universal shift register is working
The design includes an efficient carry-select adder (CSA) and subtractor (CSS), a multiplier using a shifter and the CSA, a divider and modulus unit using a shifter and the CSS, a universal shift register and a barrel shifter

Design and Power Analysis of IBM’s On Chip Peripheral Bus (OPB)

Developed an arbiter based on IEEE papers on the Dynamic Lottery Bus Arbiter for Shared Bus System on Chip and on the Priority Based Arbiter Scheme; combining both arbitration schemes produced “A Modified Dynamic Lottery Bus Architecture for Shared Bus”
This arbiter scheme improves the bus latency to 40 ns with a clock period of 10 ns for 4 masters, whereas the previous Dynamic Lottery Bus architecture offers a latency of 150 ns at the same clock period and number of masters
The On-Chip Peripheral bus of IBM has been designed and implemented using Dynamic lottery bus Arbitration
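
The basic idea behind lottery arbitration can be sketched as follows. This shows only a plain lottery draw among requesting masters, not the modified scheme; the function name and the ticket values are assumptions, and a software random generator stands in for whatever random source the hardware uses.

```python
import random


def lottery_grant(requests, tickets, rng):
    """Pick one requesting master with probability proportional to its tickets.

    requests : list of booleans, one per master
    tickets  : list of positive integers; in a dynamic lottery these
               may be changed between arbitration rounds
    rng      : a random.Random instance
    """
    pending = [m for m, req in enumerate(requests) if req]
    if not pending:
        return None
    total = sum(tickets[m] for m in pending)
    draw = rng.randrange(total)
    for m in pending:
        if draw < tickets[m]:
            return m
        draw -= tickets[m]


rng = random.Random(0)
wins = [0, 0, 0, 0]
for _ in range(4000):
    wins[lottery_grant([True] * 4, [1, 2, 3, 4], rng)] += 1
print(wins)
```

Every master holding at least one ticket eventually wins, which is how a lottery avoids the starvation seen with fixed priorities, while the ticket counts still let some masters win more often.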

Low Power Reconfigurable Routers (Network On Chip (NOC))

There are four different channels in this router from where data can enter and leave. It can reconfigure itself depending on the amount of data.
If the data exceeds the FIFO depth of one channel, the rest of the data is routed to a neighbouring channel, depending on the availability of free blocks there.
Clock gating, switching off one channel and a partial crossbar are used to reduce power consumption.
Simulated in ModelSim, synthesized in Xilinx ISE and implemented on a Spartan 6 FPGA (Xilinx).
Power analysed with XPower Analyser (Xilinx).

Low Power USB 2.0

This project transmits 8-bit binary data serially using the USB 2.0 protocol.
The protocol layer and physical layer of both the transmitter and the receiver are designed in Verilog HDL.
The protocol layer of both the receiver and the transmitter is implemented as an FSM.
Transmission starts whenever the TXstart input is pressed
As soon as TXstart is pressed, the protocol layer transmits the OUT TOKEN and waits for an acknowledgement; it sends no further data unless a valid acknowledgement is received. After the acknowledgement, the DATA PID is transmitted to tell the receiver that the upcoming data is valid, and the transmitter again waits for an acknowledgement. Once that acknowledgement is received, the input data is transmitted.
Data packets cannot be transmitted directly from the protocol layer, so each packet first passes to the transmitter physical layer module. This module converts the parallel 8-bit data into serial data. Before the data is sent, a special pattern known as the SYNC pattern (10101010) is transmitted to indicate the start of the packet, and after the data another special pattern known as the EOP pattern (01111111) is transmitted to terminate the transmission.
The physical layer of the transmitter includes an NRZI encoder and a BIT STUFFER.
The NRZI encoder encodes the data so that proper transitions are maintained on the transmitted bits.
The NRZI encoder is designed so that when a 0 is received in the serial data it is transmitted directly, but when a 1 is received it is negated and then transmitted. All data bits except the SYNC and EOP patterns are NRZI encoded.
The BIT STUFFER module ensures that six consecutive 1’s are never transmitted: after five consecutive 1’s, a ‘0’ is inserted.
On the receiver side, reception starts after the physical layer of the receiver module detects the SYNC pattern.
After valid SYNC and EOP patterns are detected, the received 8-bit data is transferred to the receiver protocol layer.
After the protocol layer detects the OUT TOKEN, the receiver gets ready to receive data
After a valid OUT TOKEN, DATA PID and DATA packet are detected, an acknowledgement is sent to the transmitter to indicate that the received data is valid.
To recover the transmitted data correctly, the physical layer of the receiver includes an NRZI decoder, which decodes the NRZI-encoded data, and a BIT UNSTUFFER, which removes the extra 0 inserted by the BIT STUFFER on the transmitter side.
All this functionality is written in Verilog and implemented on a BASYS2 FPGA kit.
The 8-bit value to be transmitted is given from the 8 UP DOWN buttons, and the output can be seen on the 7-segment display.
To display the 8-bit value on the seven-segment display, the 8-bit binary value is first converted into BCD using the Double Dabble algorithm. The BCD value so obtained is then decoded into seven-segment form.
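
The bit stuffing and NRZI steps described above can be sketched in Python as follows. Two points are assumptions rather than a copy of the design: the run length is a parameter (the text above inserts a 0 after five 1’s, whereas the USB 2.0 specification does so after six), and the NRZI functions follow the usual USB convention in which a 0 toggles the line level and a 1 holds it, which may differ in detail from the encoder described here. All function names are hypothetical.

```python
def bit_stuff(bits, run=5):
    """Insert a 0 after every `run` consecutive 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == run:
            out.append(0)  # stuffed bit
            ones = 0
    return out


def bit_unstuff(bits, run=5):
    """Remove the 0 that follows every `run` consecutive 1s."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:           # this bit is the stuffed 0, drop it
            skip, ones = False, 0
            continue
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == run:
            skip = True
    return out


def nrzi_encode(bits, level=1):
    """A 0 toggles the line level, a 1 keeps it (USB convention)."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, level=1):
    """No change in level decodes to 1, a change decodes to 0."""
    out = []
    for cur in levels:
        out.append(1 if cur == level else 0)
        level = cur
    return out


data = [1, 1, 1, 1, 1, 1, 0, 1]
line = nrzi_encode(bit_stuff(data))
assert bit_unstuff(nrzi_decode(line)) == data
```

Stuffing before NRZI encoding guarantees a line transition at least once per run, which is what lets the receiver stay synchronized.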

I2C Protocol implementation

I2C is a serial protocol for a two-wire interface. The I2C-bus supports any IC fabrication process (NMOS, CMOS, bipolar). Two wires, serial data (SDA) and serial clock (SCL), carry information between the devices connected to the bus.

Each device is recognized by a unique address
Devices can operate as either a transmitter or receiver
A master is the device which initiates a data transfer on the bus and generates the clock signals to permit that transfer
At that time, any device addressed is considered a slave

Features of I2C design and implementation

Dual master dual slave configuration
Bidirectional data transfer
Half-duplex mode
Synchronous serial communication
Master-slave system
Software programmable SCL clock frequency
Clock stretching and wait state generation
Interrupt flag generation
Arbitration lost interrupt with automatic transfer cancellation
Bus busy detection
Supports 7-bit and 10-bit addressing modes
Supports 100 kHz and 400 kHz modes
Supports a 3.4 MHz clock
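
To illustrate the 7-bit and 10-bit addressing modes, the sketch below forms the address bytes a master places on the bus after a START condition. The function names are hypothetical, and only the byte layout is shown; START/STOP generation, acknowledge handling and the repeated START used for 10-bit reads are not modelled.

```python
def addr7_byte(addr, read):
    """7-bit address followed by the R/W bit (1 = read, 0 = write)."""
    if not 0 <= addr < 0x80:
        raise ValueError("7-bit address out of range")
    return (addr << 1) | (1 if read else 0)


def addr10_bytes(addr):
    """Two address bytes for a 10-bit write header.

    First byte : 1 1 1 1 0 A9 A8 0   (R/W = 0, write)
    Second byte: A7 .. A0
    """
    if not 0 <= addr < 0x400:
        raise ValueError("10-bit address out of range")
    first = 0b11110000 | ((addr >> 8) << 1)
    second = addr & 0xFF
    return first, second


print(hex(addr7_byte(0x50, read=False)))
print([hex(b) for b in addr10_bytes(0x250)])
```

For a 10-bit read, the master sends this write header first and then repeats the first byte with the R/W bit set after a repeated START.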

Implementation of I2C design

Top Level Module

This connects all the functional blocks together
Generates byte-wide data and acknowledgement

Internal registers module - holds the registers needed for protocol operation
- Prescale Register
- Transmit Register
- Receive Register
- Status Register
- Command Register
Byte command controller module
- Includes a state machine to handle normal I2C transfer sequences
- Contains a shift register which is used for both READ and WRITE cycles
Bit command controller module
- Controls the I2C bus lines scl and sda
- Generates the correct sequences for START, STOP, Repeated START, READ, and WRITE commands
- Ensures that the logical relationship between the scl and sda lines meets the I2C requirement for these critical commands

Low Power USB 3.0

The physical layer of USB 3.0 is implemented in Verilog HDL in a way that reduces dynamic power consumption
The inputs of the top module are the 8-bit TX data input and the 1-bit TxDataK
TxDataK indicates whether the input is data or a command
The physical layer receives 8-bit data from the link layer and scrambles the data to reduce EMI
It then encodes the scrambled 8-bit data into 10-bit symbols for transmission over the physical connection
The receiver recovers the bit stream from the differential link, assembles it into 10-bit symbols, then decodes and descrambles them, producing 8-bit data that is sent to the link layer for further processing
The main module of the transmitter physical layer has an 8B10B encoder and a data scrambler
The scrambling function is implemented using a free-running Linear Feedback Shift Register (LFSR). On the transmit side, scrambling is applied to characters prior to the 8b/10b encoding
Scrambling is not performed when TxDataK is high, i.e. when the input is a command
The scrambled 8-bit data is then mapped to 10 bits using 8B10B encoding
The encoder block is divided into two sub-blocks, a 3b4b encoder and a 5b6b encoder
To reduce dynamic power consumption, a clock generator module is placed before the encoder
This clock generator supplies a clock to the encoder only when the input data is valid
On the receiver side, the reverse operation is performed: it includes a 10b8b decoder, which decodes the 10-bit encoded data into the original 8-bit data, and a data descrambler, which descrambles the scrambled data
The receiver gets serial data in 8b10b-encoded format and decodes it; the resulting 8-bit data is then passed through the descrambler module to obtain the originally transmitted data
As in the transmitter, the same type of clock generator module is placed before the decoder to reduce dynamic power consumption
USB 3.0 has 4 main layers: physical, data link, protocol and application (device/host)
A Scrambler is part of the physical layer
The purpose of the scrambler is to reduce repeated patterns and to prevent concentration of emitted energy at only few frequencies
It supports the analytical assumption that all symbols are equally likely to be transmitted
The scrambler is implemented using the linear feedback shift registers (LFSR), as shown in the above figure
The input sequence is XORed with a PN (pseudorandom) sequence to produce a seemingly random sequence
In the receiver, XORing the output with the same PN sequence recovers the data
The 8B/10B encoder is a form of line code, or baseband modulation
Its main purpose is to maintain DC balance
8 bits (a byte) are mapped to 10 bits
Using a lookup table, the 3 MSBs are mapped to 4 bits and the 5 LSBs are mapped to 6 bits
The mapping depends on the running difference between the number of 1’s and the number of 0’s sent so far (the running disparity)
With 8B/10B encoding, the difference between the number of 0’s and 1’s is always kept below 2 at symbol boundaries
In addition, no more than five 1’s or 0’s appear in a row, which makes clock recovery easier

Parallel to serial conversion

Up to this point, the data is stored in the microprocessor (in parallel)
USB 3.0 transmits data serially
Each symbol (coded 10 bits) is transmitted LSB first

Clock and data recovery

No separate clock signal is sent
Therefore, the receiver must generate a clock based on the received data
The generated clock is aligned with the data by a phase-locked loop (PLL)

Elasticity Buffer

The recovered clock might differ slightly from the actual clock
This results in timing jitter (see figure)
Elasticity buffer reduces timing jitter
Data is temporarily stored in the elasticity buffer, and retrieved at a rate based on the average rate of the incoming data stream

8B/10B Decoding

The inverse of 8B/10B encoding
10 bits are converted back to a byte
The 8B/10B Decoder has a very similar structure compared to the encoder (see figure below receiver section of Physical Layer)

Descrambler

Feed the output of the 8b/10b decoder to the same Linear Feedback Shift Register (LFSR) to recover the data sequence
Performing XOR twice results in the original sequence
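
The "XOR twice" property can be demonstrated with a small LFSR-based scrambler. This is a sketch under stated assumptions: the 16-bit polynomial x^16 + x^5 + x^4 + x^3 + 1 and the FFFFh seed are chosen for illustration, the bit ordering is simplified, and the TxDataK bypass and LFSR reset rules of the USB 3.0 specification are not reproduced. The function names are hypothetical.

```python
def lfsr_bytes(count, seed=0xFFFF):
    """Yield `count` pseudorandom bytes from a 16-bit Galois LFSR."""
    state = seed
    for _ in range(count):
        byte = 0
        for bit in range(8):
            msb = (state >> 15) & 1
            byte |= msb << bit
            state = (state << 1) & 0xFFFF
            if msb:
                state ^= 0x0039  # taps for x^5 + x^4 + x^3 + 1
        yield byte


def scramble(data, seed=0xFFFF):
    """XOR each data byte with the LFSR output.

    The same function descrambles, because (d ^ k) ^ k == d.
    """
    return bytes(d ^ k for d, k in zip(data, lfsr_bytes(len(data), seed)))


payload = bytes(range(16))
on_wire = scramble(payload)
assert on_wire != payload
assert scramble(on_wire) == payload
```

Transmitter and receiver must start their LFSRs from the same seed and stay in step; otherwise the second XOR uses a different sequence and the data is not recovered.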

UART (Universal Asynchronous Receiver and Transmitter)

UART sends data serially at the clock frequency defined by the baud rate
The baud rate defines how many bits are transferred per second, e.g. a baud rate of 9600 means 9600 bits are transferred in 1 second.
Before the 8-bit data, the transmitter sends a start bit; after the 8-bit data, it sends the parity bit of that data.
The parity bit appended after the data bits lets the receiver check the correctness of the data.
At the end a STOP bit, i.e. 1, is sent to terminate the transmission of data.
All the above operations are controlled by an FSM.
To send the different types of bits (start bit, data bit, parity bit, stop bit), a 4X1 mux is implemented whose select line is controlled by the FSM.
When idle, the FSM is in the IDLE state, where the select line is 2’b11, i.e. the MUX output is the STOP bit.
Whenever the TXstart button is pressed, the FSM jumps from the IDLE state to the START BIT state, where the select line is 2’b00, i.e. the MUX output is the start bit. In this state the 8-bit input to be transmitted is also loaded into the temp register.
After 1 clock cycle the FSM moves to the DATA STATE and remains there for 8 clock cycles. In this state the select line is 2’b01, and one bit of the 8-bit data is transmitted serially on each clock cycle.
After the 8 clock cycles the FSM moves to the PARITY STATE, where the select line is 2’b10, i.e. the parity of the 8-bit transmitted data is sent.
At the next clock pulse the FSM moves to the STOP STATE, where 1, i.e. the stop bit, is transmitted.
On the receiver side, the reverse operation is performed.
The receive line idles high; reception starts when it goes from high to low, i.e. when a start bit is detected.
After a valid start bit is detected, the receiver compares the parity of the 8 received data bits with the bit received after them.
After a valid parity bit is detected, the stop bit is checked. If either the stop bit or the parity bit is invalid, the receiver does not generate any output.
All the UART functionality is written in Verilog and implemented on a BASYS2 FPGA kit.
The 8-bit value to be transmitted is given from the 8 UP DOWN buttons, and the output can be seen on the 7-segment display.
To display the 8-bit value on the seven-segment display, the 8-bit binary value is first converted into BCD using the Double Dabble algorithm. The BCD value so obtained is then decoded into seven-segment form.
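
The frame produced by the transmitter FSM, and the checks made by the receiver, can be sketched in Python as follows. The parity type and bit order are not stated above, so even parity and LSB-first transmission are assumptions; the function names are hypothetical.

```python
def uart_frame(byte):
    """Build the 11-bit frame: start, 8 data bits, parity, stop."""
    data = [(byte >> i) & 1 for i in range(8)]  # LSB first (assumption)
    parity = sum(data) % 2                      # even parity (assumption)
    return [0] + data + [parity] + [1]          # start bit = 0, stop bit = 1


def uart_receive(frame):
    """Return the received byte, or None if a check fails."""
    if len(frame) != 11 or frame[0] != 0 or frame[10] != 1:
        return None                             # bad start or stop bit
    data, parity = frame[1:9], frame[9]
    if sum(data) % 2 != parity:
        return None                             # parity mismatch
    return sum(bit << i for i, bit in enumerate(data))


frame = uart_frame(0xA5)
print(frame)
print(hex(uart_receive(frame)))

corrupted = list(frame)
corrupted[3] ^= 1                               # flip one data bit
print(uart_receive(corrupted))                  # None: parity check fails
```

The four sections of the frame correspond to the four mux inputs selected by the FSM: start (2’b00), data (2’b01), parity (2’b10) and stop (2’b11).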

Digital Alarm Clock

Digital Alarm Clock is designed using Verilog HDL
The top module of this project has 3 main inputs:
- Show alarm - shows the alarm time
- Increment Min - increments the minute digit
- Increment Sec - increments the second digit
Increment Min and Increment Sec work in both alarm mode and time mode
To keep time, a time counter module is implemented which increments the time every second
To store the alarm value, an alarm register module is implemented
To increment the time or alarm data, a time set counter module is implemented; depending on the state, it increments the time value or the alarm value by 1 at every press of the Increment Min or Increment Sec button
To control the operation of all modules, a controller module containing an FSM is implemented. It generates all the necessary control signals in each state, i.e. whether to show the time or the alarm on the seven-segment display, and which value is incremented when an increment button is pressed
The seven-segment display can show the time value, the alarm value or the incremented value. To implement this, a display driver module containing a set of muxes outputs either the alarm data or the time data
All this functionality is implemented on a BASYS2 FPGA kit
Seven-segment displays are used as the display device, and push buttons provide the inputs to the alarm clock
An LED glows whenever the time reaches the value at which the alarm is set

Camera Serial Interface 2 (CSI-2)

The MIPI Alliance standard for CSI-2 (Camera Serial Interface 2) gives image sensors a standard, low-power, high-speed and low-cost interface that supports an extensive range of imaging solutions for mobile phones, PC cameras and vehicle video recorders.
CSI-2 has a layered architecture:
- The PHY Layer specifies the transmission medium (electrical conductors), the input/output circuitry and the clocking mechanism that captures “ones” and “zeroes” from the serial bit stream. This part of the specification documents the characteristics of the transmission medium, electrical parameters for signalling and the timing relationship between clock and data Lanes. This layer is implemented by the MIPI D-PHY protocol.
- Pixel/Byte Packing/Unpacking Layer: CSI-2 supports image applications with varying pixel formats from six to twenty-four bits per pixel. In the transmitter, this layer packs pixels from the Application layer into bytes before sending the data to the Low-Level Protocol layer. In the receiver, this layer unpacks bytes from the Low-Level Protocol layer into pixels before sending the data to the Application layer. Eight bits per pixel data is transferred unchanged by this layer.
- Low Level Protocol: The Low-Level Protocol (LLP) includes the means of establishing bit level and byte-level synchronization for serial data transferred between SoT (Start of Transmission) and EoT (End of Transmission) events and for passing data to the next layer. The minimum data granularity of the LLP is one byte. The LLP also includes assignment of bit-value interpretation within the byte, i.e. the “Endian” assignment.
- Lane Management: CSI-2 is Lane-scalable for increased performance. The number of data Lanes may be one, two, three or four depending on the bandwidth requirements of the application. The transmitting side of the interface distributes (“distributor” function) the outgoing data stream to one or more Lanes. On the receiving side, the interface collects bytes from the Lanes and merges (“merger” function) them together into a recombined data stream that restores the original stream sequence.
- Application Layer: This layer describes higher-level encoding and interpretation of the data contained in the data stream. The CSI-2 specification describes the mapping of pixel values to bytes.
Tx and Rx are implemented as per the specification.
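
The distributor and merger functions of the Lane Management layer can be sketched as below. The sketch assumes bytes are dealt to the lanes in round-robin order (byte 0 to lane 0, byte 1 to lane 1, and so on); the function names are hypothetical and no PHY-level behaviour is modelled.

```python
def distribute(data, lanes):
    """Deal the byte stream across 1 to 4 lanes in round-robin order."""
    if not 1 <= lanes <= 4:
        raise ValueError("CSI-2 lane count must be 1 to 4")
    return [data[i::lanes] for i in range(lanes)]


def merge(lane_data):
    """Recombine per-lane bytes into the original stream order."""
    lanes = len(lane_data)
    total = sum(len(lane) for lane in lane_data)
    out = bytearray(total)
    for i, lane in enumerate(lane_data):
        out[i::lanes] = lane  # lane i supplies bytes i, i+lanes, ...
    return bytes(out)


stream = bytes(range(10))
per_lane = distribute(stream, 4)
print([list(lane) for lane in per_lane])
assert merge(per_lane) == stream
```

When the stream length is not a multiple of the lane count, the later lanes simply carry one byte fewer, as the ten-byte example shows.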

MIPI D-PHY

D-PHY describes a source synchronous, high speed, low power, low cost PHY, especially suited for mobile applications. This D-PHY specification has been written primarily for the connection of camera and display applications to a host processor. Nevertheless, it can be applied to many other applications. It is envisioned that the same type of PHY will also be used in a dual-simplex configuration for interconnections in a more generic communication network.
Clock Lane: A Clock Lane is similar to a Unidirectional Data Lane. However, there are some timing differences and a Clock Lane transmits a High-Speed DDR clock signal instead of data bits. Furthermore, the Low-Power mode functionality is defined differently for a Clock Lane than a Data Lane. A Clock Lane shall be unidirectional and shall not include regular Escape mode functionality.
Data Lane: All Data Lanes shall support High-Speed transmission and Escape mode in the Forward direction. There are two main types of Data Lanes:
- Bi-directional (featuring Turnaround and some Reverse communication functionality)
- Unidirectional (without Turnaround or any kind of Reverse communication functionality)

TECHNOLOGIES

VHDL
Verilog
SystemVerilog
UVM
Randomization
Coverage
Assertions
C/C++
TCL
Perl
SystemC, TLM, ARM
FPGA
STA, Low Power

PROJECTS

CSI/MIPI
USB 2.0/3.0
DDR3 SDRAM
32-bit RISC Processor
OPB
NOC
AMBA, AHB, ASB, APB
I2C, UART, AXI
PCI Express
AES
DMA
ARM
Bluetooth Wireless