

## Design of an Approximate Multiplier with Novel Dual-Stage 5:2 Compressors

Endla Upender<sup>1</sup>, V. Vijayabhasker<sup>2</sup>, G. Sreenivas<sup>3</sup>, M. Ranjith Reddy<sup>4</sup>

<sup>1</sup> Research Scholar, Siddhartha Institute of Technology and Sciences Koremulla Road, Narapally, R.R. District, Telangana.
<sup>2</sup>Associate Professor, Siddhartha Institute of Technology and Sciences Koremulla Road, Narapally, R.R. District, Telangana.
<sup>3,4</sup>Associate Professor, Siddhartha Institute of Technology and Sciences Koremulla Road, Narapally, R.R. District, Telangana.

### ABSTRACT

High-speed multimedia applications have ushered in a new era of approximation-based error-tolerant circuits. These circuits provide high performance at the expense of precision. In addition, such approaches reduce system complexity, latency, and power consumption. This research presents the design and analysis of two approximate compressors that are smaller, faster, and consume less power than existing designs while maintaining the same accuracy. The suggested designs have been evaluated on several metrics, including delay and area. The approximate multiplier using the proposed approximate 5:2 compressor shows reduced area and latency compared with the approximate multiplier using a 4:2 compressor. 16-bit Dadda multipliers are built with the given compressors. In terms of accuracy, these multipliers match the best approximate multipliers available today. The research is extended to examine how the suggested architecture can be used in error-tolerant applications such as image smoothing.

Index Terms: Approximate 5:2 compressors, approximate multipliers, and image processing.

### I. INTRODUCTION

A wide range of sophisticated applications need power efficiency. These applications are also embedded and battery powered; Internet of Things (IoT) devices are one example. Such applications, including image processing, detection, recognition, and AI, are inherently error-tolerant. Because perfect outcomes are rarely required, nearly precise results are frequently sufficient. Approximate computing [1] is therefore one of the most promising techniques for addressing the need for low power consumption in such applications. With this approach, power can be traded for precision.

Multiplication is a crucial operation in many applications, including the ones mentioned earlier. As a result, lowering the cost of multiplication benefits this whole group of applications. This work is based on an approximate multiplier. While several approximate multipliers have been realized [1, 2, 3, 4, 5], their reach is limited because the majority of the earlier designs lack accuracy configurability [2, 3]. As a result, dynamic configurability is critical. Multipliers play a significant role in today's digital signal processing and other applications. It is now possible to construct multipliers that meet both of the following design objectives: high speed with low power consumption, and a regular layout and hence less area. Both can even be combined in a single multiplier, making it suitable for a range of high-speed, low-power, and compact VLSI implementations.

As more people use electronic devices, VLSI designs need better power-delay characteristics, which calls for systems that work quickly and efficiently. Processors multiply often, and the speed of the multiplier is a decisive factor in how quickly they can work. Multiplication is a three-step process. In large multipliers the second stage contributes the most power, area, and delay, so improving this stage is considered essential.

• Compressors decrease the number of operands in partial-product addition and multiplication.

• As a consequence, compressor circuit trees can efficiently use the stage for decreasing partial products.

• The compressor tree's foundations are the 4:2 and 5:2 compressors.

• The multiplier's performance will be improved by building a more efficient compressor.

• Compressors such as 3:2, 4:2, 4:3, 5:2, 5:3, 6:3, 7:3, and so on were designed by previous designers.

• A full adder, or 3:2 compressor, adds 3 bits at a time, whereas a 4:2 compressor adds 4 bits at a time and a 5:2 compressor adds 5 bits at a time. As a result, a 5:2 compressor improves operating speed.

Approximation can be used to reduce the load on a processor's compute units, allowing for higher performance and efficiency. The system's latency is inversely related to the speed of operation, so high speed otherwise requires massively parallel processing, which costs a lot of hardware and energy.
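To make the compressor idea in the list above concrete, the following minimal Python sketch models an exact 3:2 compressor (a full adder). The function names `compress_3_2` and `check_all` are illustrative choices, not names used in this paper. The check confirms that the two output bits, weighted 1 and 2, always encode the number of ones among the three inputs.

```python
from itertools import product

def compress_3_2(a, b, c):
    """Exact 3:2 compressor (full adder): returns (sum_bit, carry_bit)."""
    sum_bit = a ^ b ^ c                      # weight 1: parity of the inputs
    carry_bit = (a & b) | (b & c) | (a & c)  # weight 2: majority of the inputs
    return sum_bit, carry_bit

def check_all():
    """Verify sum + 2*carry equals the count of ones for all 8 input patterns."""
    for a, b, c in product((0, 1), repeat=3):
        s, cy = compress_3_2(a, b, c)
        assert s + 2 * cy == a + b + c
    return True

if __name__ == "__main__":
    print(check_all())
```

Larger compressors follow the same principle: they take more bits of equal weight and return fewer bits of higher weight, which is what shortens the partial-product columns.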

Relaxing the system's precision and dependability can lead to more energy- and area-efficient alternatives. Approximate computing balances delay, area, and power. Approximation in arithmetic reduces design complexity and power consumption. The trade-off is a drop in accuracy, which machine learning and multimedia applications can tolerate.

These applications take advantage of the human eye's inability to identify small variations in images and videos. In fields such as Artificial Intelligence (AI) and Digital Signal Processing (DSP), this level of error tolerance is exploited by approximate arithmetic circuits. Approximate arithmetic units have been the subject of extensive investigation. In multiplication, partial product summation causes the largest power consumption and system delay.

According to research, compressors have been shown to reduce the delay of partial product summation. Compressors use half adders or full adders to count the number of logic 1s in the input. 7:3, 5:2, 4:2, and 3:2 are some of the most popular compressor configurations. Because it cascades easily, the 4:2 compressor is favored over other architectures, and Dadda multipliers are often constructed with it. In this paper, we introduce 5:2 compressors for designing a 16-bit approximate multiplier.

### II. LITERATURE REVIEW

Jiangmin et al. presented a low-power XOR-XNOR based 4:2 compressor for tree-structured fast multipliers. Chang et al. demonstrated 4:2 and 5:2 compressors operating at 0.6 volts. Moment et al. designed a delay- and power-optimized approximate compressor. Akbari and coworkers proposed an approximate compressor that can be reconfigured to switch between approximate and exact operation depending on the situation, which would minimize the compressor's error profile.

A reduction in the number of columns is achieved by multiplying by two (beginning from the right in the whole partial product array). Only the remaining columns are compressed. Guo et al. propose an approximate compressor based on probability. The authors also propose an approximate multiplier with a top-down structure, based on the partial product count, that dynamically allocates among the approximate 8:2, 6:2, and 4:2 compressors.

A grouped error recovery approach is also available as a way to improve the multiplier's accuracy. To improve on the approximate multiplier proposed by Alouani and coworkers, the researchers proposed an approximate multiplier that was more homogeneous. A genetic algorithm-based approximate adder is used to accomplish this. An XOR-less (AND-OR based) compressor was presented by Esposito et al. to lower the average error and the error rate. To reduce energy consumption, Chang et al. developed a 4:2 compressor with a 25% error rate.

Gorantla and Deepa suggested a 4:2 compressor design with a 12.5 percent error rate, achieved by relaxing the area, delay, and power constraints. The literature also investigates optimized designs employing transmission gates, because of the large reduction in latency compared to traditional CMOS-based logic. The variability in rise and fall times for different inputs, on the other hand, is a significant disadvantage. In this study, two innovative 5:2 compressor designs are offered.

### III. DESIGN OF AN APPROXIMATE MULTIPLIER WITH NOVEL DUAL-STAGE 5:2 COMPRESSORS

In multipliers with more than two stages of cascaded compressors, a novel dual-stage compressor design saves area, delay, and energy.

#### ACCURATE 5:2 COMPRESSOR

Figure 1 shows the overall block diagram of the exact 5:2 compressor. It has five inputs, three outputs, and two cascaded adders. The exact 5:2 compressor takes the inputs X1, X2, X3, X4, X5, and the carry input Cin, and produces the outputs COUT, CARRY, and SUM, which are defined as follows.

$$Cout = C1 (X1 \oplus X2 \oplus X3)(X4 \oplus X5 \oplus Cin) + C2 (X1 \oplus X2 \oplus X3)(X4 \oplus X5 \oplus Cin) + C1 C2 \quad (1)$$

$$Carry = ((X1 \oplus X2 \oplus X3)(X4 \oplus X5 \oplus Cin)) \oplus C1 \oplus C2 \quad (2)$$

$$SUM = Cin \oplus X1 \oplus X2 \oplus X3 \oplus X4 \oplus X5 \quad (3)$$
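The following Python sketch gives one possible reading of the COUT, CARRY, and SUM expressions, offered as an illustration rather than as the authors' implementation. It rests on assumptions that the text does not state: C1 and C2 are taken to be the carry outputs of two full adders over (X1, X2, X3) and (X4, X5, Cin), CARRY is taken to be the XOR of the product term with C1 and C2, and SUM is taken to be the parity of all six inputs. Under these assumptions the three outputs, weighted 1, 2, and 4, reproduce the number of ones at the inputs; that weighting is a property of this sketch and may differ from the cell intended in Figure 1.

```python
from itertools import product

def full_adder(a, b, c):
    """Exact full adder: returns (sum_bit, carry_bit)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)

def exact_5_2(x1, x2, x3, x4, x5, cin):
    """One reading of the COUT, CARRY, and SUM expressions.

    ASSUMPTION: c1 and c2 are full-adder carries; the text leaves them undefined.
    """
    s1, c1 = full_adder(x1, x2, x3)   # s1 = X1 xor X2 xor X3
    s2, c2 = full_adder(x4, x5, cin)  # s2 = X4 xor X5 xor Cin
    p = s1 & s2                       # product term shared by COUT and CARRY
    cout = (c1 & p) | (c2 & p) | (c1 & c2)
    carry = p ^ c1 ^ c2               # ASSUMPTION: XOR with both carries
    sum_bit = s1 ^ s2                 # parity of all six inputs
    return sum_bit, carry, cout

def check_all():
    """Verify SUM + 2*CARRY + 4*COUT equals the count of ones for all 64 patterns."""
    for bits in product((0, 1), repeat=6):
        s, cy, co = exact_5_2(*bits)
        assert s + 2 * cy + 4 * co == sum(bits)
    return True

if __name__ == "__main__":
    print(check_all())
```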

Figure 2 depicts a compressor chain. Cin is the input carry from the preceding 5:2 compressor, which handles the less significant bits. CARRY and COUT are outputs of order '1', one order of significance higher than the input Cin.

e-ISSN: 2348-6848 p-ISSN: 2348-795X Vol. 9 Issue 07 July 2022



Fig 1. Exact 5:2 compressor.



Fig 2. 5:2 compressor chain.

#### AREA-EFFICIENT APPROXIMATE 5:2 COMPRESSOR

Figure 3 shows the high-speed, area-efficient approximate 5:2 compressor. It receives A1, A2, A3, A4, and A5 and produces CARRY and SUM. The design of SUM is inspired by a multiplexer (MUX), whose select line is the output of an XOR gate. If the select line is high, (A3A4A5) is selected; otherwise, (A3+A4+A5) is selected. By introducing an error with an error distance of 1 into the truth table of the exact compressor, the suggested 5:2 compressor can reduce the carry-generating logic to an OR gate. SUM and CARRY are expressed logically as follows.

$$SUM = (A1 \oplus A2) A3 A4 A5 + \overline{(A1 \oplus A2)} (A3 + A4 + A5) \quad (4)$$

$$CARRY = A1 + A2 \quad (5)$$
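As a hedged illustration, the SUM and CARRY expressions of the approximate compressor can be modelled in a few lines of Python. The sketch assumes, following the MUX description, that the (A3+A4+A5) term is selected when A1 ⊕ A2 is low, and that SUM and CARRY carry weights 1 and 2. The function names are illustrative. Rather than assuming a particular error bound, the helper tabulates the error distance against the exact count of ones for every input pattern, so the behaviour can be inspected directly.

```python
from itertools import product

def approx_5_2(a1, a2, a3, a4, a5):
    """Approximate 5:2 compressor: returns (sum_bit, carry_bit)."""
    sel = a1 ^ a2                  # XOR output drives the MUX select line
    and3 = a3 & a4 & a5            # chosen when the select line is high
    or3 = a3 | a4 | a5             # chosen when the select line is low
    sum_bit = and3 if sel else or3
    carry = a1 | a2                # carry logic reduced to a single OR gate
    return sum_bit, carry

def error_distances():
    """Map each 5-bit input pattern to |exact count - approximate value|.

    ASSUMPTION: the approximate value is sum_bit + 2 * carry.
    """
    table = {}
    for bits in product((0, 1), repeat=5):
        s, cy = approx_5_2(*bits)
        table[bits] = abs(sum(bits) - (s + 2 * cy))
    return table

if __name__ == "__main__":
    for bits, ed in error_distances().items():
        print(bits, ed)
```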



Fig 3. Area-efficient 5:2 compressor.

#### DUAL-STAGE APPROXIMATE 5:2 COMPRESSOR

This research presents an alternative design for multipliers with more than three cascaded compressors. The high-speed area-efficient compressor design requires one XOR, one AND, and two OR gates (as illustrated in Figure 3). Its gates are OR and AND gates, while Figure 4 shows a design with NAND and NOR gates. Even though the SUM and CARRY of the updated design are not identical to those of the 5:2 compressor architecture, cascading the compressor in multiples of 2 eliminates the inaccuracy.



Fig 4. Basic building block of the suggested improved dual-stage 5:2 compressor.

#### Modified 16×16 Dadda Multiplier Using Proposed 5:2 Compressors

Multipliers have three parts. First, AND gates generate the partial products. Second, compressors reduce the maximum height of the partial product matrix (PPM). Third, a carry propagation adder produces the final result. The PPM reduction circuitry (the second part) is largely responsible for the multiplier's design complexity, so multiplier design focuses on optimizing the PPM reduction circuits. This section presents a 16×16 multiplier design. Figure 5 shows the architecture of our PPM reduction circuitry.
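The three parts described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the proposed design: it reduces the columns with exact full adders only (standing in for the compressor tree), it does not follow a Dadda schedule, and the function names are hypothetical. It shows how AND-generated partial products are arranged by weight, reduced until every column holds at most two bits, and then summed.

```python
def partial_products(a, b, n=16):
    """Part 1: AND gates. cols[w] holds the bits a_i & b_j with i + j == w."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append((a >> i) & (b >> j) & 1)
    return cols

def reduce_once(cols):
    """Part 2 (one stage): compress each group of three bits with a full adder."""
    nxt = [[] for _ in range(len(cols) + 1)]
    for w, col in enumerate(cols):
        k = 0
        while len(col) - k >= 3:
            x, y, z = col[k:k + 3]
            nxt[w].append(x ^ y ^ z)                          # sum stays in column w
            nxt[w + 1].append((x & y) | (y & z) | (x & z))    # carry moves to w + 1
            k += 3
        nxt[w].extend(col[k:])                                # leftover bits pass through
    return nxt

def multiply(a, b, n=16):
    """Run the three parts; returns (product, number_of_reduction_stages)."""
    cols = partial_products(a, b, n)
    stages = 0
    while max(len(c) for c in cols) > 2:
        cols = reduce_once(cols)
        stages += 1
    # Part 3: carry propagation addition of the remaining (at most two) rows.
    result = sum(bit << w for w, col in enumerate(cols) for bit in col)
    return result, stages

if __name__ == "__main__":
    print(multiply(12345, 54321))
```

Replacing the exact full adders in `reduce_once` with approximate cells in selected columns is where an approximate multiplier departs from this exact model.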

The weights are categorized by significance as higher, medium, or lower. Designers may adjust the number of higher, medium, and lower significance weights to achieve a balance between power consumption and precision.

Our PPM reduction circuitry reduces power consumption through significance-driven logic compression. Higher weights employ exact 5:2 compressors, medium weights use approximate dual-stage 5:2 compressors, and lower weights use approximate area-efficient 5:2 compressors (OR-tree based approximation). There are three stages in our PPM reduction circuitry. All of the weights are processed in the first stage. Only the higher significance weights are considered in the second and third stages.

Each weight has at most two product terms when the second and third stages are finished. As a consequence, the final output can be produced using a carry propagation adder. The circuit diagram of the suggested 16-bit approximate multiplier is shown in Figure 5.



Fig 5. Proposed 5:2 compressors used in the 16×16 multiplier.

### IV. RESULTS

**RTL Schematic:** The RTL (register-transfer level) schematic is the blueprint of the architecture. It is used to compare the implemented design against the intended architecture. An HDL such as Verilog or VHDL turns the description of an architecture into a working design. The RTL schematic shows the internal connections between blocks for closer inspection. The RTL schematic of the proposed design is shown below.



Fig 6. RTL schematic of the proposed design

**Technology Schematic**: The technology schematic represents the architecture in LUT format. The LUT count is the area parameter used in VLSI to estimate the size of the design. Each LUT is treated as a square unit, and the memory allocation of the code is represented by the LUTs in the FPGA.



Fig 7. Technology schematic of the proposed design

**Simulation:** Simulation is used to verify system functioning, whereas schematics examine connections and blocks. The simulation window is launched by choosing "simulation" from the tool's home screen drop-down menu. It provides several radix number systems.



Fig 8: Simulation waveforms of the proposed approximate multiplier

**Parameters:** Area, delay, and power are the main VLSI design considerations, and designs are compared using these criteria. The HDL used is Verilog, and the parameters are obtained with Xilinx 14.7.

| Parameter      | Approximate multiplier using 4:2 compressor | Approximate multiplier using 5:2 compressor |
|----------------|---------------------------------------------|---------------------------------------------|
| Number of LUTs | 581                                         | 548                                         |
| Delay (ns)     | 39.790                                      | 37.628                                      |

**Table 1: Parameters Table** 
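The relative improvement implied by Table 1 can be worked out with a short calculation. The Python snippet below is a simple arithmetic aid using only the figures from the table; the helper name `percent_reduction` is an illustrative choice.

```python
def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

# Values taken from Table 1 (4:2-based multiplier vs. 5:2-based multiplier).
lut_reduction = percent_reduction(581, 548)
delay_reduction = percent_reduction(39.790, 37.628)

print(f"LUT reduction:   {lut_reduction:.2f}%")    # 5.68%
print(f"Delay reduction: {delay_reduction:.2f}%")  # 5.43%
```

Both metrics improve by roughly five and a half percent when the 4:2-based multiplier is replaced by the 5:2-based one.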



Fig 9: LUT comparison bar graph





### V. CONCLUSION

This work presents approximate 5:2 compressor topologies for use in an approximate multiplier. First, a high-speed, area-efficient compressor architecture is described. Compared to earlier state-of-the-art compressor designs, this architecture saves considerable area, delay, and energy while remaining reasonably accurate.

As a result, the suggested design decreases the area, power, and latency. The research also presented a novel dual-stage compressor architecture that improved area, latency, and power without compromising accuracy. The architecture was employed in a 16-bit Dadda multiplier and used for image processing applications such as image multiplication and smoothing.

### REFERENCES

[1] S. Ghosh, D. Mohapatra, G. Karakonstantis, and K. Roy, "Voltage scalable high-speed robust hybrid arithmetic units using adaptive clocking," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 9, pp. 1301–1309, Sep. 2010.

[2] D. Baran, M. Aktan, and V. G. Oklobdzija, "Multiplier structures for low power applications in deep-CMOS," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Rio de Janeiro, Brazil, May 2011, pp. 1061–1064.

[3] S. Mittal, "A survey of techniques for approximate computing," ACM Comput. Surv., vol. 48, no. 4, pp. 1–33, Mar. 2016.

[4] H. Jiang, C. Liu, L. Liu, F. Lombardi, and J. Han, "A review classification and comparative evaluation of approximate arithmetic circuits," ACM J. Emerg. Tech. Comput. Syst., vol. 13, no. 4, p. 60, 2017.

[5] J. Liang, J. Han, and F. Lombardi, "New metrics for the reliability of approximate and probabilistic adders," IEEE Trans. Comput., vol. 62, no. 9, pp. 1760–1771, Sep. 2013.

[6] R. Zendegani, M. Kamal, M. Bahadori, A. Afzali-Kusha, and M. Pedram, "RoBA multiplier: A rounding-based approximate multiplier for high speed yet energy-efficient digital signal processing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 2, pp. 393–401, Feb. 2017.

[7] H. Jiang, J. Han, F. Qiao, and F. Lombardi, "Approximate Radix-8 booth multipliers for low-power and high-performance operation," IEEE Trans. Comput., vol. 65, no. 8, pp. 2638–2644, Aug. 2016.

[8] S. Hashemi, R. I. Bahar, and S. Reda, "DRUM: A dynamic range unbiased multiplier for approximate applications," in Proc. IEEE/ACM Int. Conf. Computer-Aided Des. (ICCAD), Austin, TX, USA, Nov. 2015, pp. 418–425.

[9] S. Narayanamoorthy, H. A. Moghaddam, Z. Liu, T. Park, and N. S. Kim, "Energy-efficient approximate multiplication for digital signal processing and classification applications," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 6, pp. 1180–1184, Jun. 2015.

[10] G. Zervakis, S. Xydis, K. Tsoumanis, D. Soudris, and K. Pekmestzi, "Hybrid approximate multiplier architectures for improved power accuracy trade-offs," in Proc. IEEE/ACM Int. Symp. Low Power Electron. Des. (ISLPED), Rome, Italy, Jul. 2015, pp. 79–84.