Services: Design and Verification

We have a talented pool of VLSI design and verification engineers who deliver a wide range of RTL, IP and FPGA-based designs as well as SoC, UVM and system-level verification. In the semiconductor domain, we are ready to take on any challenge.

Functional Verification

Functional verification involves constrained-random stimulus, coverage-driven verification, assertions, and verification in SystemVerilog using methodologies such as UVM. We provide verification services for both IP development and SoC verification. Our core areas of expertise are:

  • SoC design and verification (chip level)
  • IP design and verification (block level)
  • System-level verification
  • Proof of concept, architecture design and verification

Our functional verification services draw on expertise in the following languages and technologies:

  • SystemVerilog
  • VHDL
  • Verilog
  • Constrained randomization
  • Assertions based on SVA/PSL
  • Coverage-driven verification
  • UVM

RTL Design

We have experience in RTL design activities such as:

  • Understanding protocol specifications
  • Prototyping and developing the micro-architecture
  • Developing the RTL design from that micro-architecture
  • Power, area and speed (timing) analysis and optimization
  • Synthesis-clean and timing-clean RTL
  • Developing block-level testbenches and block-level verification environments
  • Coverage- and assertion-based analysis

IP Design and Verification (RTL and FPGA Based)

  • Prototyping
  • FPGA-based architecture
  • RTL design for speed and low power
  • FPGA constraints development
  • Synthesis-clean and timing-clean RTL
  • Block-level testbench development
  • FPGA verification and validation
  • SystemVerilog, UVM
  • TCL, Perl
  • Constrained randomization
  • Assertions
  • C, C++, system-level verification
  • Coverage-driven verification

Virtual Modeling Expertise (System Level Verification)

  • C, C++, SystemC, TLM 2.0
  • ARM-based verification model creation
  • Virtual platform development
  • Synthesizable IP modeling

Quality Assurance, Test-Case development and Automation

We offer quality assurance, test-case development and automation services.

Insight into some of the projects that we have worked on

AMBA AHB is a newer generation of the AMBA bus, intended to address the requirements of high-performance synthesizable designs. It is a high-performance system bus that supports multiple bus masters and provides high-bandwidth operation. The AHB features required for high-performance, high-clock-frequency systems include:

  • Burst transfers
  • Single-cycle bus master handover
  • Single-clock-edge operation
  • Non-tristate implementation
  • Efficient bridging between AHB and the existing ASB/APB buses, so that existing designs can be integrated easily
  • A fully pipelined structure for high performance

Implementation of AMBA AHB design

  • AHB master
    • A bus master initiates read and write operations by providing address and control information
    • Only one bus master is allowed to actively use the bus at any given time
    • Supports single transfers, incrementing four-, eight- and sixteen-beat bursts, and undefined-length bursts (less than 1 KB)
    • Supports IDLE, BUSY, SEQUENTIAL and NON-SEQUENTIAL transfer types
  • AHB arbiter
    • The bus arbiter ensures that only one bus master at a time is allowed to initiate data transfers
    • The arbitration algorithm is a modified arbiter architecture for system-on-chip designs, based on fuzzy logic
    • It minimizes the time spent on request handling, arbitration and addressing, so that most bus cycles are used for useful data transfers
    • The modification adds a dynamic lottery scheme, with a gated clock, to the fuzzy-logic arbitration scheme
    • This modified arbiter avoids starvation among the masters and also overcomes the contention problem
    • A Hamming-distance-based multi-coding scheme is used on the write bus
    • It reduces the transition activity on the bus, which lowers dynamic power dissipation
  • AHB slave
    • A bus slave responds to read or write operations within a given address-space range
    • The bus slave signals back to the active master the success, failure or waiting status of the data transfer
    • Single-cycle OKAY response
    • A slave can insert wait states to gain extra time to complete a transfer
    • A decoding scheme ensures that the slave receives the right data from the master
  • Tools and language
    • All source code is written in Verilog HDL
    • Xilinx ISE 14.7 is used for synthesis and simulation
    • The Xilinx power analyser is used to estimate the power consumption of the encoding scheme

  • Advanced System Bus (ASB) vs. Advanced Peripheral Bus (APB)
    - ASB: high performance; APB: low power
    - ASB: pipelined operation; APB: latched address and control
    - ASB: burst transfer; APB: simple interface
    - ASB: multiple bus masters; APB: suitable for many peripherals
  • ASB master
    • A bus master initiates read and write operations by providing an address and control information. Only one bus master is allowed to actively use the bus at any one time
  • ASB-APB bridge
    • The APB bridge appears as a slave module that handles the bus handshake and control-signal retiming on behalf of the local peripheral bus. By defining the APB interface from the starting point of the system bus, the benefits of the system diagnostics and test methodology can be exploited
  • ASB slave
    • A bus slave responds to read or write operations within a given address-space range. The bus slave signals back to the active master the success, failure or waiting status of the data transfer
  • ASB decoder
    • The bus decoder decodes the transfer addresses and selects the appropriate slave. It also ensures that the bus remains operational when no bus transfers are required. A single centralized decoder is required in all ASB implementations
  • ASB arbiter
    • The bus arbiter ensures that only one bus master at a time is allowed to initiate data transfers. Although the arbitration protocol is fixed, any arbitration algorithm, such as highest priority or fair access, can be implemented depending on the application requirements
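The AHB arbiter above, and the modified arbiter project described later on this page, build on dynamic lottery arbitration. The basic idea is simple. Each master holds a number of tickets. Each cycle, a random draw is made over the tickets of the masters that are currently requesting. High-ticket masters win more often, but every requesting master keeps a nonzero chance of being granted. The sketch below is a minimal Python behavioural model of that plain lottery idea only. It is not the fuzzy-logic or modified scheme, and several details are assumptions: the ticket counts, the `lottery_grant` name, and the use of `random.Random` in place of a hardware random source such as an LFSR.

```python
import random
from typing import Optional, Sequence


def lottery_grant(requests: Sequence[bool], tickets: Sequence[int],
                  rng: random.Random) -> Optional[int]:
    """Grant the bus to one requesting master by drawing a lottery ticket.

    Only masters that request in this cycle (and hold tickets) take part,
    so the ticket range is recomputed every arbitration cycle ("dynamic").
    """
    contenders = [(i, t) for i, (req, t) in enumerate(zip(requests, tickets))
                  if req and t > 0]
    if not contenders:
        return None  # no requests: bus stays with the default/idle owner
    draw = rng.randrange(sum(t for _, t in contenders))  # winning ticket
    # Walk the cumulative ticket ranges until the draw falls inside one.
    for index, count in contenders:
        if draw < count:
            return index
        draw -= count
    raise AssertionError("unreachable: the draw is below the ticket total")


if __name__ == "__main__":
    rng = random.Random(1)
    tickets = [4, 3, 2, 1]  # hypothetical ticket assignment for four masters
    wins = [0, 0, 0, 0]
    for _ in range(10_000):
        wins[lottery_grant([True] * 4, tickets, rng)] += 1
    print(wins)  # roughly in the ratio 4:3:2:1; no master is starved
```

Starvation freedom here is probabilistic: a low-ticket master can lose many draws in a row, but not forever. This is why published variants tune the ticket assignment or combine it with priority.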

Implementation of AMBA ASB design

  • The implementation uses an arbiter that takes both timing and priority into consideration
  • Timing-and-priority-based arbiter vs. priority-based arbiter
    • A purely priority-based arbiter has a drawback: whenever the highest-priority request signal AREQ is asserted, that request is granted
    • If the highest-priority master is still requesting the bus the next time requests are evaluated, it is granted again, so lower-priority masters may never be granted the bus
    • A timing-and-priority-based arbiter avoids this
    • The arbiter FSM checks, in each state, which master was granted the bus previously: if the highest-priority AREQ was granted the bus last time, it is not granted this time, and the bus goes to the highest-priority master among the remaining requesters. This prevents the highest-priority master from being granted the bus every time when it requests continuously.

  • A DDR3 synchronous dynamic random-access memory (SDRAM) controller is designed for a 1 GB (8 Gb) RAM
  • The architecture can act as a controller for both DDR3 and DDR2 RAM
  • The controller reads or writes a 32-bit word at each location
  • The design has a CAS latency of one cycle
  • The controller takes five FSM states to complete a read or write operation
  • A clock multiplier doubles or triples the clock frequency, as required for the DDR2 or DDR3 controller respectively
  • The controller is designed for a RAM organized as 4 banks × 16384 × 4096 × 32 bits (2^33 bits, i.e. 8 Gb or 1 GB)
  • The design contains a "Data Path" module that receives data from or transmits data to the RAM at the high-speed clock generated by the clock multiplier
  • The design contains a high-speed modified dynamic-lottery arbiter that grants masters access to the controller for reading and writing RAM data
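The timing-and-priority rule described for the AMBA ASB arbiter can be modelled in a few lines of Python. This is a sketch under stated assumptions. Master 0 has the highest priority. `timed_priority_grant` and `run` are hypothetical names. When the previously granted master is the only one requesting, it is granted again, because the text does not say what happens in that case.

```python
from typing import List, Optional, Sequence


def timed_priority_grant(requests: Sequence[bool],
                         last_grant: Optional[int]) -> Optional[int]:
    """Fixed-priority arbitration (index 0 = highest priority) that skips
    the master granted in the previous cycle whenever another master waits."""
    active = [i for i, req in enumerate(requests) if req]
    if not active:
        return None
    # Prefer requesters other than the previous winner, in priority order.
    others = [i for i in active if i != last_grant]
    # Assumption: a previous winner that is the only requester keeps the bus.
    return others[0] if others else active[0]


def run(request_trace: Sequence[Sequence[bool]]) -> List[Optional[int]]:
    """Apply the arbiter cycle by cycle and return the grant for each cycle."""
    grants: List[Optional[int]] = []
    last: Optional[int] = None
    for requests in request_trace:
        last = timed_priority_grant(requests, last)
        grants.append(last)
    return grants


if __name__ == "__main__":
    # Masters 0 and 2 request continuously. A pure priority arbiter would
    # always pick master 0; this one alternates between them.
    print(run([[True, False, True]] * 4))  # [0, 2, 0, 2]
```

The rule only excludes the previous winner. With three masters requesting continuously, this sketch alternates between masters 0 and 1 (`[0, 1, 0, 1]`), so master 2 still waits. That matches the stated goal of stopping the highest-priority master from holding the bus, but it is not full round-robin fairness.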

Implementation of DDR3 SDRAM Controller design

The architecture contains six main modules for proper functioning of the controller.

  • Address mapping
    • This module assigns the row address and column address to the RAM for the data to be written or read. It decides which address is driven on the row-address lines and which on the column-address lines in each state. It also decodes the encoded bank bits to select the required bank for read and write operations
  • Controller
    • This is the heart of the controller; it follows an FSM for correct and efficient read/write operation
  • Arbiter between master requests and refresh
    • This arbiter arbitrates between refreshing the RAM and read/write operations on the RAM, giving refresh higher priority than the read/write requests from the different masters
  • Data path
    • This forms a bidirectional bus through which the host reads data from the RAM or writes data to it. Data is read and written at DDR2/DDR3 speed using the faster clock generated by the clock multiplier

  • A low-power 32-bit pipelined RISC processor using clock gating and efficient adders and multipliers
  • The design uses a Harvard architecture: because it has separate data and instruction memories, the processor can access data and instructions at the same time
  • The design requires only one cycle to execute an instruction
  • The pipeline stages are load instruction, decode instruction, fetch operand, execute instruction, and write the result back into memory
  • The architecture uses a clock multiplier to double the clock rate
  • The pipeline stages operate on both the rising and falling edges of the multiplied clock
  • The design includes three main execution modules: the ALU, a barrel shifter and a universal shift register
  • To save power, whenever one of these three modules is working, the other two are switched off
  • The design includes an efficient carry-select adder and subtractor, a multiplier built from a shifter and CSA, a divider and modulus unit built from a shifter and CSS, a universal shift register and a barrel shifter

  • Developed an arbiter based on IEEE papers on the Dynamic Lottery Bus Arbiter for Shared Bus System on Chip and on a priority-based arbiter scheme; combining both arbitration schemes produced "A Modified Dynamic Lottery Bus Architecture for Shared Bus"
  • This scheme reduced the bus latency to 40 ns (four cycles at a 10 ns clock period) for four masters, whereas the earlier Dynamic Lottery Bus architecture has a latency of 150 ns at the same clock period and number of masters
  • IBM's On-Chip Peripheral Bus (OPB) has been designed and implemented using dynamic lottery bus arbitration

  • This router has four channels through which data can enter and leave. It can reconfigure itself depending on the amount of data.
  • If the data exceeds the FIFO depth of one channel, the rest of the data is routed to a neighbouring channel, depending on the space available there.
  • Clock gating, switching off one channel, and a partial crossbar are used to reduce power consumption.
  • Simulated in ModelSim, synthesized with Xilinx ISE and implemented on a Xilinx Spartan-6 FPGA.
  • Power analysed with Xilinx XPower Analyzer.

  • This project transmits 8-bit binary data serially using the USB 2.0 protocol.
  • The protocol and physical layers of both the transmitter and the receiver are designed in Verilog HDL.
  • The protocol layer of both the receiver and the transmitter is implemented as an FSM.
  • Transmission starts whenever the TXstart input is pressed.
  • As soon as TXstart is pressed, the protocol layer transmits an OUT token and waits for an acknowledgement; it sends no further data until a valid acknowledgement is received. After the acknowledgement, the DATA PID is transmitted to tell the receiver that valid data follows, and the protocol layer again waits for an acknowledgement. Once that is received, the input data is transmitted.
  • Data packets from the protocol layer are not transmitted directly: they first go to the transmitter physical-layer module, which converts the parallel 8-bit data into serial data. Before the data, a special SYNC pattern (10101010) is transmitted to indicate the start of the packet, and after the data, a special EOP pattern (01111111) is transmitted to terminate the transmission.
  • The physical layer of the transmitter includes an NRZI encoder and a bit stuffer.
  • The NRZI encoder represents the data as transitions on the line, so that the receiver sees enough level changes to stay synchronized.
  • As in USB, the NRZI encoder signals a 0 by changing the line level and a 1 by keeping the level unchanged. All data bits except the SYNC and EOP patterns are NRZI encoded.
  • The bit stuffer ensures that six consecutive 1s are never transmitted: after five consecutive 1s, a 0 is inserted.
  • On the receiver side, reception starts when the receiver physical layer detects the SYNC pattern.
  • After valid SYNC and EOP patterns are detected, the received 8-bit data is passed to the receiver protocol layer.
  • When the protocol layer detects an OUT token, the receiver gets ready to receive data.
  • After a valid OUT token, DATA PID and DATA packet are detected, an acknowledgement is sent to the transmitter to indicate that the received data is valid.
  • To recover the transmitted data, the receiver physical layer includes an NRZI decoder, which decodes the NRZI-encoded data, and a bit unstuffer, which removes the extra 0s inserted by the bit stuffer on the transmitter side.
  • All of this functionality is written in Verilog and implemented on a BASYS2 FPGA board.
  • The 8-bit value to be transmitted is entered with the eight UP/DOWN buttons, and the output is shown on the 7-segment display.
  • To display the 8-bit value on the seven-segment display, it is first converted to BCD using the double dabble algorithm; the resulting BCD digits are then decoded into seven-segment form.
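The transmitter and receiver physical-layer steps above (bit stuffing, then NRZI encoding, and the reverse at the receiver) can be sketched as a Python behavioural model. It follows the USB 2.0 convention: a 0 toggles the line level, a 1 holds it, and stuffing is applied before NRZI encoding. The USB 2.0 specification inserts a 0 after six consecutive 1s, so `max_ones=6` is the default. Passing `max_ones=5` models the five-bit variant described above. The function names and the idle line level of 1 are assumptions, and SYNC/EOP framing is not modelled.

```python
from typing import List, Sequence


def bit_stuff(bits: Sequence[int], max_ones: int = 6) -> List[int]:
    """Insert a 0 after every run of `max_ones` consecutive 1s."""
    out: List[int] = []
    run = 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == max_ones:
            out.append(0)  # stuffed bit forces a transition after NRZI
            run = 0
    return out


def bit_unstuff(bits: Sequence[int], max_ones: int = 6) -> List[int]:
    """Remove the 0 that follows every run of `max_ones` consecutive 1s."""
    out: List[int] = []
    run, skip = 0, False
    for b in bits:
        if skip:
            skip, run = False, 0  # drop the stuffed bit (a 1 here would be an error)
            continue
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == max_ones:
            skip = True
    return out


def nrzi_encode(bits: Sequence[int], initial_level: int = 1) -> List[int]:
    """USB-style NRZI: a 0 changes the line level, a 1 keeps it."""
    level, out = initial_level, []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels: Sequence[int], initial_level: int = 1) -> List[int]:
    """No change in level decodes to 1, a change decodes to 0."""
    prev, out = initial_level, []
    for level in levels:
        out.append(1 if level == prev else 0)
        prev = level
    return out


if __name__ == "__main__":
    data = [1, 1, 1, 1, 1, 1, 1, 0, 1]      # seven 1s in a row
    stuffed = bit_stuff(data)               # [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
    line = nrzi_encode(stuffed)
    assert bit_unstuff(nrzi_decode(line)) == data
```

The stuffed 0 guarantees a level change on the line within every seven bit times. The receiver relies on those changes to keep its sampling clock aligned.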
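The double dabble (shift-and-add-3) conversion used for the seven-segment display works as follows. Before each left shift, every BCD digit that is 5 or more has 3 added, so that doubling it carries correctly into the next decimal digit. Below is a minimal Python sketch. The function name and the three-digit output for an 8-bit input are illustrative choices, not taken from the project. A hardware version could unroll the same steps into combinational add-3 stages or a small FSM.

```python
from typing import List


def double_dabble(value: int, width: int = 8, digits: int = 3) -> List[int]:
    """Convert an unsigned binary value to BCD digits, most significant first."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    bcd = 0
    for i in range(width - 1, -1, -1):
        # Add 3 to every BCD nibble that is 5 or more before shifting.
        for d in range(digits):
            if ((bcd >> (4 * d)) & 0xF) >= 5:
                bcd += 3 << (4 * d)
        # Shift left by one and bring in the next binary bit, MSB first.
        bcd = (bcd << 1) | ((value >> i) & 1)
    return [(bcd >> (4 * d)) & 0xF for d in reversed(range(digits))]


if __name__ == "__main__":
    print(double_dabble(10))  # [0, 1, 0]
```

Each returned digit would then go through a 4-bit to seven-segment decoder.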

I2C is a serial protocol for a two-wire interface. The I2C bus supports any IC fabrication process (NMOS, CMOS, bipolar). Two wires, serial data (SDA) and serial clock (SCL), carry information between the devices connected to the bus.

  • Each device is recognized by a unique address 
  • Devices can operate as either a transmitter or receiver
  • A master is the device which initiates a data transfer on the bus and generates the clock signals to permit that transfer
  • At that time, any device addressed is considered a slave

Features of I2C design and implementation

  • Dual-master, dual-slave configuration
  • Bidirectional data transfer
  • Half-duplex mode
  • Synchronous serial communication
  • Master-slave system
  • Software-programmable SCL clock frequency
  • Clock stretching and wait-state generation
  • Interrupt flag generation
  • Arbitration-lost interrupt with automatic transfer cancellation
  • Bus-busy detection
  • Supports 7-bit and 10-bit addressing modes
  • Supports 100 kHz and 400 kHz modes
  • Supports a 3.4 MHz clock (high-speed mode)
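The 7-bit and 10-bit addressing modes listed above differ in the bytes the master sends after START. In 7-bit mode, the master sends one byte: the address followed by the R/W bit. In 10-bit mode, the first byte is `11110` followed by address bits A9 and A8 and the R/W bit, and the second byte carries A7 to A0. A 10-bit read addresses the slave as a write first, then issues a repeated START with R/W = 1. The Python sketch below only builds those byte sequences. The function name is hypothetical, and ACK handling and the bit-level controller are not modelled.

```python
from typing import List


def i2c_address_phases(addr: int, read: bool, ten_bit: bool = False) -> List[List[int]]:
    """Bytes the master sends to address a slave.

    Each inner list follows one START; a second list, if present, follows
    a repeated START.
    """
    rw = 1 if read else 0
    if not ten_bit:
        if not 0 <= addr < (1 << 7):
            raise ValueError("7-bit address out of range")
        return [[(addr << 1) | rw]]
    if not 0 <= addr < (1 << 10):
        raise ValueError("10-bit address out of range")
    # First byte: 1 1 1 1 0 A9 A8 R/W; second byte: A7..A0.
    header = 0b11110000 | (((addr >> 8) & 0b11) << 1)
    low = addr & 0xFF
    if not read:
        return [[header, low]]
    # 10-bit read: full address with R/W = 0, then repeated START + header with R/W = 1.
    return [[header, low], [header | 1]]


if __name__ == "__main__":
    print([hex(b) for b in i2c_address_phases(0x50, read=True)[0]])  # ['0xa1']
```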

Implementation of I2C design

  • Top-level module
    • Connects all the functional blocks together
    • Generates byte-wide data and acknowledgements
  • Internal registers module: holds the registers used for protocol operation
    • Prescale register
    • Transmit register
    • Receive register
    • Status register
    • Command register
  • Byte command controller module
    • Includes a state machine that handles normal I2C transfer sequences
    • Contains a shift register used for both READ and WRITE cycles
  • Bit command controller module
    • Controls the I2C bus lines, SCL and SDA
    • Generates the correct sequences for START, STOP, repeated START, READ and WRITE commands
    • Ensures that the logical relationship between SCL and SDA meets the I2C requirements for these critical commands

  • The USB 3.0 physical layer is implemented in Verilog HDL so as to reduce dynamic power consumption
  • The inputs of the top module are the 8-bit TX data and the 1-bit TxDataK signal
  • TxDataK indicates whether the input is data or a command (control character)
  • The physical layer receives 8-bit data from the link layer and scrambles it to reduce EMI
  • It then encodes the scrambled 8-bit data into 10-bit symbols for transmission over the physical link
  • The receiver recovers the bit stream from the differential link, assembles it into 10-bit symbols, and decodes and descrambles them, producing 8-bit data that is then sent to the link layer for further processing
  • The main transmitter physical-layer module contains the 8b/10b encoder and the data scrambler
  • The scrambling function is implemented with a free-running linear feedback shift register (LFSR). On the transmit side, scrambling is applied to each character before 8b/10b encoding
  • Scrambling is bypassed when TxDataK is high, i.e. when the input is a command
  • The scrambled 8-bit data is then mapped to a 10-bit symbol using 8b/10b encoding
  • The encoder block is divided into two sub-blocks: a 3b/4b encoder and a 5b/6b encoder
  • To reduce dynamic power consumption, a clock generator module is placed before the encoder
  • This clock generator clocks the encoder only when the input data is valid
  • The receiver performs the reverse operations: a 10b/8b decoder decodes each 10-bit symbol back into 8 bits, and a data descrambler descrambles the result
  • The receiver gets the serial 8b/10b-encoded bit stream, decodes it, and then passes the resulting 8-bit data through the descrambler module to recover the original transmitted data
  • As in the transmitter, the same kind of clock generator module is placed before the decoder to reduce dynamic power consumption
  • USB 3.0 has four layers: physical, link, protocol and application (device/host)
  • The scrambler is part of the physical layer
  • The purpose of the scrambler is to break up repeated patterns and prevent the emitted energy from concentrating at a few frequencies
  • It makes the transmitted symbols close to equiprobable, which justifies the analytical assumption that all symbols are equally likely to be transmitted
  • The scrambler is implemented using a linear feedback shift register (LFSR), as shown in the above figure
  • The input sequence is XORed with a pseudo-noise (PN) sequence to produce a seemingly random sequence
  • In the receiver, XORing the output with the same PN sequence recovers the data
  • The 8b/10b encoder is a form of line code, or baseband modulation
  • Its main purpose is to maintain DC balance
  • Each 8-bit byte is mapped to a 10-bit symbol
  • Using lookup tables, the 3 most significant bits are mapped to 4 bits and the 5 least significant bits are mapped to 6 bits
  • The choice of code depends on the running disparity, i.e. the difference between the number of 1s and the number of 0s transmitted so far
  • With 8b/10b encoding, each 10-bit symbol contains either equal numbers of 1s and 0s or a difference of exactly two, and the encoder alternates the sign of unbalanced symbols, so the running disparity stays bounded
  • In addition, no more than five identical bits (1s or 0s) appear in a row, which makes clock recovery easier
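The scrambling steps above can be sketched in Python. The sketch covers the free-running LFSR, the XOR with a PN sequence, the bypass for control characters when TxDataK is high, and the identical LFSR at the receiver. It uses the polynomial x^16 + x^5 + x^4 + x^3 + 1 with an all-ones seed, as specified for the USB 3.0 Gen 1 scrambler. However, the Galois LFSR form, the LSB-first mask ordering and the omission of LFSR resets and SKP handling are simplifications, so this is not a bit-exact model of the specification. The `Scrambler` class name is hypothetical.

```python
class Scrambler:
    """Free-running 16-bit Galois LFSR scrambler; the same class descrambles."""

    TAPS = 0x0039  # x^5 + x^4 + x^3 + 1 (the x^16 term is the shifted-out bit)

    def __init__(self, seed: int = 0xFFFF) -> None:
        self.state = seed

    def _next_bit(self) -> int:
        out = (self.state >> 15) & 1
        self.state = (self.state << 1) & 0xFFFF
        if out:
            self.state ^= self.TAPS
        return out

    def process(self, byte: int, is_k: bool = False) -> int:
        """Scramble (or descramble) one byte; control (K) bytes pass through.

        The LFSR advances for every byte, including K bytes, so transmitter
        and receiver stay in step as long as both see the same byte sequence.
        """
        mask = 0
        for i in range(8):
            mask |= self._next_bit() << i  # PN bits applied LSB first
        return byte if is_k else byte ^ mask


if __name__ == "__main__":
    data = [0x00, 0x12, 0xBC, 0xFF]
    k_flags = [False, False, True, False]  # third byte is a control character
    tx, rx = Scrambler(), Scrambler()
    sent = [tx.process(b, k) for b, k in zip(data, k_flags)]
    assert [rx.process(b, k) for b, k in zip(sent, k_flags)] == data
```

Descrambling is the same operation as scrambling because XORing twice with the same PN bit restores the original bit.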

Parallel to serial conversion

  • Up to this point, the data is handled in parallel (as in a microprocessor)
  • USB 3.0 transmits data serially
  • Each coded 10-bit symbol is transmitted LSB first
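LSB-first serialization can be illustrated with a short Python sketch that flattens 10-bit symbols into a bit list and regroups them. The sketch assumes the receiver already knows where each symbol starts. A real 8b/10b receiver must first find symbol alignment, typically by detecting comma characters. The function names are hypothetical.

```python
from typing import List, Sequence


def serialize(symbols: Sequence[int], width: int = 10) -> List[int]:
    """Flatten coded symbols into a serial bit stream, LSB of each symbol first."""
    bits: List[int] = []
    for sym in symbols:
        if not 0 <= sym < (1 << width):
            raise ValueError("symbol does not fit in the given width")
        bits.extend((sym >> i) & 1 for i in range(width))
    return bits


def deserialize(bits: Sequence[int], width: int = 10) -> List[int]:
    """Regroup a serial stream into symbols (symbol alignment assumed known)."""
    if len(bits) % width:
        raise ValueError("bit count is not a multiple of the symbol width")
    return [
        sum(bit << i for i, bit in enumerate(bits[k:k + width]))
        for k in range(0, len(bits), width)
    ]


if __name__ == "__main__":
    print(serialize([0b0000000011]))  # [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```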

Clock and data recovery

  • No separate clock signal is transmitted
  • Therefore, the receiver must generate a clock from the received data
  • A phase-locked loop (PLL) aligns the generated clock with the data

Elasticity Buffer

  • The recovered clock may differ slightly from the receiver's local clock
  • This results in timing jitter and drift between the two clock domains (see figure)
  • The elasticity buffer absorbs these timing differences
  • Data is temporarily stored in the elasticity buffer and retrieved at a rate based on the average rate of the incoming data stream

8B/10B Decoding

  • The inverse of 8b/10b encoding
  • Each 10-bit symbol is converted back to a byte
  • The 8b/10b decoder has a structure very similar to that of the encoder (see the receiver section of the physical layer in the figure below)
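The 8b/10b properties listed earlier are easy to check on a received bit stream. Every symbol has a disparity of 0 or ±2, unbalanced symbols alternate in sign, and no more than five identical bits appear in a row. The Python sketch below checks these necessary conditions only. It does not decode symbols or validate them against the code tables. The function names are hypothetical, and the starting running disparity of -1 is an assumption. The example symbols are K28.5 in its RD- form (001111 1010) and RD+ form (110000 0101).

```python
from typing import Sequence


def disparity(bits: Sequence[int]) -> int:
    """Number of 1s minus number of 0s in a group of bits."""
    return sum(1 if b else -1 for b in bits)


def max_run(bits: Sequence[int]) -> int:
    """Length of the longest run of identical bits."""
    longest, run, prev = 0, 0, None
    for b in bits:
        run = run + 1 if b == prev else 1
        prev = b
        longest = max(longest, run)
    return longest


def check_8b10b_stream(stream: Sequence[int], initial_rd: int = -1) -> bool:
    """Check necessary 8b/10b properties of a stream in transmission order."""
    if len(stream) % 10:
        return False
    rd = initial_rd
    for k in range(0, len(stream), 10):
        d = disparity(stream[k:k + 10])
        if d == 0:
            continue          # balanced symbol: running disparity unchanged
        if d == 2 and rd == -1:
            rd = 1            # +2 symbols are only legal when RD is negative
        elif d == -2 and rd == 1:
            rd = -1           # -2 symbols are only legal when RD is positive
        else:
            return False
    return max_run(stream) <= 5


if __name__ == "__main__":
    k28_5_neg = [0, 0, 1, 1, 1, 1, 1, 0, 1, 0]
    k28_5_pos = [1, 1, 0, 0, 0, 0, 0, 1, 0, 1]
    print(check_8b10b_stream(k28_5_neg + k28_5_pos))  # True
    print(check_8b10b_stream(k28_5_neg + k28_5_neg))  # False: two +2 symbols in a row
```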

Descrambling

  • The output of the 8b/10b decoder is XORed with the output of an identical linear feedback shift register (LFSR) to recover the data sequence
  • XORing twice with the same PN sequence yields the original sequence

  • The UART sends data serially at the bit rate defined by the baud rate
  • The baud rate defines how many bits are transferred per second; e.g. 9600 baud means 9600 bits are transferred in one second.
  • The transmitter sends a start bit before the 8 data bits and a parity bit after them.
  • The parity bit, computed over the 8 transmitted data bits, lets the receiver check the correctness of the data.
  • Finally, a stop bit (1) is sent to terminate the transmission.
  • All of the above operations are controlled by an FSM.
  • To send the different types of bits (start, data, parity and stop), a 4:1 multiplexer is used whose select lines are controlled by the FSM.
  • By default the FSM is in the IDLE state, where the select lines are 2'b11, i.e. the mux outputs the stop bit.
  • When the TXstart button is pressed, the FSM moves from IDLE to the START BIT state, where the select lines are 2'b00, i.e. the mux outputs the start bit; in this state the 8-bit input to be transmitted is also loaded into a temporary register.
  • After one clock cycle the FSM moves to the DATA state and stays there for 8 clock cycles; the select lines are 2'b01 and one data bit is transmitted serially on each clock cycle.
  • After 8 clock cycles the FSM moves to the PARITY state, where the select lines are 2'b10 and the parity of the 8 transmitted data bits is sent.
  • On the next clock pulse the FSM moves to the STOP state and transmits the stop bit (1).
  • The receiver performs the reverse operation.
  • The receive line idles high; reception starts when it goes from high to low, i.e. when a start bit is detected.
  • After a valid start bit, the receiver compares the parity of the 8 received data bits with the bit received after them.
  • If the parity is valid, the stop bit is checked; if either the parity bit or the stop bit is invalid, the receiver generates no output.
  • All of the UART functionality is written in Verilog and implemented on a BASYS2 FPGA board.
  • The 8-bit value to be transmitted is entered with the eight UP/DOWN buttons, and the output is shown on the 7-segment display.
  • To display the 8-bit value on the seven-segment display, it is first converted to BCD using the double dabble algorithm; the resulting BCD digits are then decoded into seven-segment form.

  • A digital alarm clock is designed in Verilog HDL
  • The top module has three main inputs:
    • Show alarm: shows the alarm time
    • Increment min: increments the minutes digit
    • Increment sec: increments the seconds
  • Increment min and increment sec work in both alarm mode and time mode
  • A time counter module keeps time, incrementing it every second
  • An alarm register module stores the alarm value
  • A time-set counter module increments the time value or the alarm value, depending on the state, by 1 on each press of the increment-minute or increment-second button
  • A controller module containing an FSM controls the operation of all the other modules. It generates the necessary control signals in each state, e.g. whether the time or the alarm is shown on the seven-segment display and which value is incremented when an increment button is pressed
  • The seven-segment display can show the time value, the alarm value or the value being incremented; a display driver module, built from a set of multiplexers, selects whether alarm data or time data is output
  • All of this functionality is implemented on a BASYS2 FPGA board
  • Seven-segment displays are used as the display device, and push buttons provide the inputs to the alarm clock
  • An LED lights up when the time reaches the alarm setting

  • The MIPI Alliance CSI-2 (Camera Serial Interface 2) specification provides image sensors with a standardized, low-power, high-speed and low-cost interface that supports a wide range of imaging solutions for mobile phones, PC cameras and vehicle video recorders.
  • CSI-2 has a layered architecture:
    • The PHY Layer specifies the transmission medium (electrical conductors), the input/output circuitry and the clocking mechanism that captures “ones” and “zeroes” from the serial bit stream. This part of the specification documents the characteristics of the transmission medium, electrical parameters for signalling and the timing relationship between clock and data Lanes. This layer is implemented by the MIPI D-PHY.
    • Pixel/Byte Packing/Unpacking Layer: The CSI-2 supports image applications with varying pixel formats from six to twenty-four bits per pixel. In the transmitter, this layer packs pixels from the Application layer into bytes before sending the data to the Low-Level Protocol layer. In the receiver, this layer unpacks bytes from the Low-Level Protocol layer into pixels before sending the data to the Application layer. Eight bits per pixel data is transferred unchanged by this layer.
    • Low Level Protocol: The Low-Level Protocol (LLP) includes the means of establishing bit level and byte-level synchronization for serial data transferred between SoT (Start of Transmission) and EoT (End of Transmission) events and for passing data to the next layer. The minimum data granularity of the LLP is one byte. The LLP also includes assignment of bit-value interpretation within the byte, i.e. the “Endian” assignment.
    • Lane Management: CSI-2 is Lane-scalable for increased performance. The number of data Lanes may be one, two, three or four depending on the bandwidth requirements of the application. The transmitting side of the interface distributes (“distributor” function) the outgoing data stream to one or more Lanes. On the receiving side, the interface collects bytes from the Lanes and merges (“merger” function) them together into a recombined data stream that restores the original stream sequence.
    • Application Layer: This layer describes higher-level encoding and interpretation of the data contained in the data stream. The CSI-2 specification describes the mapping of pixel values to bytes. We implemented the Tx and Rx as per the specification.
  • D-PHY describes a source synchronous, high speed, low power, low cost PHY, especially suited for mobile applications. This D-PHY specification has been written primarily for the connection of camera and display applications to a host processor. Nevertheless, it can be applied to many other applications. It is envisioned that the same type of PHY will also be used in a dual-simplex configuration for interconnections in a more generic communication network.
  • Clock Lane: A Clock Lane is similar to a unidirectional Data Lane. However, there are some timing differences, and a Clock Lane transmits a High-Speed DDR clock signal instead of data bits. Furthermore, Low-Power mode functionality is defined differently for a Clock Lane than for a Data Lane. A Clock Lane shall be unidirectional and shall not include regular Escape mode functionality.
  • Data Lane: All Data Lanes shall support High-Speed transmission and Escape mode in the Forward direction. There are two main types of Data Lanes:
    • Bi-directional (featuring Turnaround and some Reverse communication functionality)
    • Unidirectional (without Turnaround or any kind of Reverse communication functionality)
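The UART framing described earlier can be sketched as a Python behavioural model. A frame is a start bit, 8 data bits, a parity bit and a stop bit, and the receiver rejects frames with a bad start, parity or stop bit. The text does not state the data bit order or the parity sense, so this sketch assumes LSB-first data and even parity, which are common UART defaults. Odd parity is available through a flag. The function names are hypothetical.

```python
from typing import List, Optional, Sequence

START, STOP = 0, 1  # the line idles high; a 0 marks the start of a frame


def uart_frame(byte: int, even_parity: bool = True) -> List[int]:
    """Build one frame: start bit, 8 data bits (LSB first), parity bit, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a UART frame carries one byte")
    data = [(byte >> i) & 1 for i in range(8)]  # assumption: LSB first
    parity = sum(data) % 2                      # even parity: total 1s becomes even
    if not even_parity:
        parity ^= 1                             # odd-parity variant
    # Same order as the mux select sequence: 2'b00, 2'b01 (x8), 2'b10, 2'b11
    return [START] + data + [parity, STOP]


def uart_receive(frame: Sequence[int], even_parity: bool = True) -> Optional[int]:
    """Return the received byte, or None if the start, parity or stop bit is wrong."""
    if len(frame) != 11 or frame[0] != START:
        return None
    data, parity, stop = list(frame[1:9]), frame[9], frame[10]
    expected = (sum(data) % 2) ^ (0 if even_parity else 1)
    if parity != expected or stop != STOP:
        return None  # the receiver generates no output
    return sum(bit << i for i, bit in enumerate(data))


if __name__ == "__main__":
    frame = uart_frame(0x41)
    print(frame)                # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
    print(uart_receive(frame))  # 65
```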
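The CSI-2 lane management described above splits one byte stream across one to four lanes (the "distributor") and reassembles it at the receiver (the "merger"). The Python sketch below assumes simple round-robin byte distribution: byte i goes to lane i mod N. It ignores packet boundaries and the handling of lanes that finish early at end of transmission, so it illustrates the idea rather than modelling the specification. The function names are hypothetical.

```python
from typing import List, Sequence


def distribute(payload: bytes, lanes: int) -> List[bytes]:
    """Distributor: send byte i of the stream on lane i mod N."""
    if lanes not in (1, 2, 3, 4):
        raise ValueError("CSI-2 uses one to four data lanes")
    return [payload[lane::lanes] for lane in range(lanes)]


def merge(lane_data: Sequence[bytes]) -> bytes:
    """Merger: interleave the lanes back into the original byte order."""
    n = len(lane_data)
    total = sum(len(d) for d in lane_data)
    # Byte i of the original stream sits on lane i % n at position i // n.
    return bytes(lane_data[i % n][i // n] for i in range(total))


if __name__ == "__main__":
    lanes = distribute(b"ABCDEFG", 3)
    print(lanes)         # [b'ADG', b'BE', b'CF']
    print(merge(lanes))  # b'ABCDEFG'
```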



  • VHDL
  • Verilog
  • SystemVerilog
  • UVM
  • Randomization
  • Coverage
  • Assertions
  • C/C++
  • TCL
  • Perl
  • SystemC, TLM, ARM
  • FPGA
  • STA, Low Power


  • USB 2.0/3.0
  • 32-bit RISC Processor
  • OPB
  • NOC
  • PCI Express
  • AES
  • DMA
  • ARM
  • Bluetooth Wireless

Contact Us

3ST Technologies Pvt. Ltd.
1899A, First Floor,
Kotla Mubarakpur,
New Delhi - 110003,
Bengali Sweet Centre Road,
Opposite B-50 (Hotel Blue Moon),
South Ex Part 1,
South Ex Metro Station Gate 2
P: +91 11 43065187
    +91 9899026065
    +91 7042595864 (whats app)
    +91 7042595865