

# A Study on the VLSI Implementation of Fingerprint Thinning Processors Using Hybrid GDI Technique

Seungmin Jung\*

Division of Software Convergence, Hanshin University, Osan, Korea jasmin@hs.ac.kr

## Abstract

Although the gate-diffusion input (GDI) technique offers lower power and smaller area than conventional CMOS standard cells, it is difficult to apply to larger integrated circuits for several reasons. In particular, it is difficult to apply the general RTL design flow of building a GDI standard cell library and designing a chip through logic synthesis. In this paper, we propose a hybrid GDI technique with a new structure. We analyzed the problems of the GDI technique by extracting the circuit characteristics of conventional CMOS and GDI cells and identified their cause. We also propose a synthesis algorithm for hybrid GDI design. The performance of hybrid GDI was compared and analyzed using a synthesized thinning image processor on a 1.8-V, 180-nm CMOS process. The results show that the proposed hybrid GDI technique can be applied to system-on-a-chip (SoC) design with low power and small cell area.

Category: Embedded Systems

Keywords: Hybrid GDI circuit; Single-well; Standard cell library; RTL; SoC; VLSI

## **I. INTRODUCTION**

The 3-nm process has entered the mass production stage in semiconductor foundry technology. Nevertheless, 90–350 nm processes are still used in various applications. The fingerprint recognition system-on-a-chip (SoC) is one example [1, 2]. Such chips do not need to be smaller in area, so it is much more efficient to apply a low-cost process. Many very large scale integrated (VLSI) circuit chips are put into mass production using a low-cost process. Power and area reduction are important parameters that determine the efficiency of VLSI design. Power consumption is the primary factor in high-performance computing, portable, and wireless applications. Since the area of a specific block, such as a compiled memory, is large in a low-cost process, the area and power consumption of standard cells become more important parameters. In this sense, the gate-diffusion input (GDI) technique is expected to enable low power and a small cell layout [3-8]. Fig. 1 shows the GDI base cell. It looks like a normal complementary metal oxide semiconductor (CMOS) inverter. However, it has three inputs: G, P, and N. P and N are the source terminals of the CMOS inverter but are used as input terminals in the GDI circuit. Different logic functions (AND, OR, MUX, and NOT) can be implemented using this single GDI cell, as shown in Table 1 [8].
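The input configurations of Table 1 can be checked with a small idealized model. The sketch below assumes full-swing logic values and ignores the threshold-voltage drop discussed later; the function names are illustrative and do not come from the paper.

```python
def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized GDI base cell: the output follows P when G=0 and N when G=1."""
    return n if g else p


# Input configurations taken from Table 1 (G, P, N).
def gdi_or(a: int, b: int) -> int:
    return gdi_cell(a, b, 1)      # G=A, P=B, N='1'


def gdi_and(a: int, b: int) -> int:
    return gdi_cell(a, 0, b)      # G=A, P='0', N=B


def gdi_mux(a: int, b: int, c: int) -> int:
    return gdi_cell(a, b, c)      # G=A, P=B, N=C  ->  (not A)*B + A*C


def gdi_not(a: int) -> int:
    return gdi_cell(a, 1, 0)      # G=A, P='1', N='0'


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "OR:", gdi_or(a, b), "AND:", gdi_and(a, b))
```

This model captures only the Boolean function of each configuration; the electrical behavior of the real cell is the subject of Section II.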

In previous studies, the GDI circuit was applied in a full-custom design method only to small circuits such as combinational adders or multipliers [9-13]. VLSI circuits are typically developed with a register transfer level (RTL) design flow. Synthetic GDI libraries are required for logic

#### Open Access http://dx.doi.org/10.5626/JCSE.2023.17.1.20

#### http://jcse.kiise.org

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received 09 October 2022; Accepted 26 February 2023 \*Corresponding Author



Fig. 1. GDI basic cell.

Table 1. GDI function for input configuration

| G | P   | N   | Output               | Function |
|---|-----|-----|----------------------|----------|
| A | B   | '1' | A+B                  | OR       |
| A | '0' | B   | A·B                  | AND      |
| A | B   | C   | $\overline{A}B + AC$ | MUX      |
| A | '1' | '0' | $\overline{A}$       | NOT      |

circuit synthesis tools such as Synopsys Design Compiler (DC). An extension of the GDI technique was mentioned in a previous study [8]. However, no further studies have attempted a GDI-based SoC using an RTL design flow. In this paper, we propose a hybrid GDI technique with a new structure for RTL design. We analyze the problems of the GDI technique and propose a new GDI structure with advantageous electrical characteristics. The characteristics of hybrid GDI are compared with a CMOS library on a 1.8-V, 180-nm p-sub, n-well CMOS process. In addition, we propose a circuit synthesis algorithm for the hybrid GDI technique. A sample circuit was implemented and simulated to evaluate the performance of the proposed hybrid GDI method.

Section II presents a detailed electrical characteristics



Fig. 2. Modified GDI base cell.

analysis of GDI and various results, including cell layout compared to CMOS. Section III describes the problems of the traditional GDI circuit and proposes a new hybrid GDI method to address them. A sample RTL circuit was designed using the proposed algorithm, and its performance is compared with the CMOS results. Conclusions and future work are discussed in Section IV.

## **II. ANALYSIS OF PRIMITIVE GDI CIRCUITS**

## A. Modified GDI Base Cell

Since the bulk of each MOS transistor is tied to its source terminal in the conventional GDI cell, as shown in Fig. 1, a twin-well or deep-well structure is required. Because the cell pattern area then has to be significantly extended, the area benefit of GDI cannot be achieved. Fig. 2 shows the modified GDI base cell. This structure applies the same substrate voltages as the general CMOS process using p-sub and n-well, so there is no loss of layout area. Fig. 3 shows primitive GDI combinational circuits, and Fig. 4 shows the GDI circuits of OR and AND cells with N inputs. The GDI cell becomes more advantageous than CMOS as the number of input pins increases. Fig. 5 shows the HSPICE simulation



Fig. 3. GDI combinational circuit: (a) OR, (b) AND, (c) MUX, and (d) XOR.



Fig. 4. N-input GDI combinational circuit: (a) multiple input OR and (b) multiple input AND.



Fig. 5. HSPICE simulation result of GDI 2-input AND and OR cell (NMOS width of 600 nm, PMOS width of 900 nm, length of 180 nm): (a) input A, (b) input B, (c) AND output, and (d) OR output.

results of the 2-input GDI AND and OR cells. The simulation was performed using the 1.8-V, 180-nm CMOS process parameters. The NMOS width was 600 nm and the PMOS width was 900 nm, both with a length of 180 nm. The transistor size was the same for all primitive GDI combinational cells. Fig. 5(a) and 5(b) are the input signals, Fig. 5(c) is the output of the AND cell, and Fig. 5(d) is the output of the OR cell. An output above VDD/2 is recognized as logical 1, and an output below VDD/2 as logical 0. The output waveforms show that, unlike a traditional CMOS cell, the GDI gate does not have a full swing. The output voltage is degraded by the MOS transistor threshold voltage (VTP, VTN) drop in AND and OR operations. From the structure of the GDI basic cell in Fig. 1, the possible output voltages are 0, VTN, VDD-VTP, and VDD, depending on the input stimulus. Here, VTP and VTN represent the threshold voltages of PMOS and NMOS, respectively.

## B. Characterization of Cell

In this research, the characteristics of the primitive GDI cells were extracted and compared with those of CMOS cells. The simulations were performed in HSPICE with 180-nm standard CMOS process parameters at a 1.8 V operating voltage. The rise and fall times of the input signals were 100 ps, and the load capacitance (CL) of the output pin was set to $\times 1$, $\times 3$, and $\times 10$ of the unit fan-out. The unit fan-out is the input capacitance of the CMOS 2-input NAND cell. The MOS length was 180 nm and the widths were the same as in the CMOS inverter. The simulation period was 20 ns, corresponding to an operating frequency of 50 MHz for the power calculation. Power consumption is expressed in nW/MHz. Tables 2 and 3 compare the number of transistors, power consumption, and operating speed of the GDI and CMOS basic cells. On average, GDI improved on CMOS by more than 60% in terms of the number

 Table 2. Performance comparison of GDI and CMOS primitive cells (180n 1.8 V CMOS process)

| Gate type | GDI # of TRs | GDI power (nW/MHz) | GDI delay ×1 (s) | GDI delay ×3 (s) | GDI delay ×10 (s) | CMOS # of TRs | CMOS power (nW/MHz) | CMOS delay ×1 (s) | CMOS delay ×3 (s) | CMOS delay ×10 (s) |
|-----------|--------------|--------------------|------------------|------------------|-------------------|---------------|---------------------|-------------------|-------------------|--------------------|
| OR2       | 2            | 10.4               | 1.79E-10         | 3.65E-10         | 1.02E-09          | 6             | 45.9                | 1.46E-10          | 1.76E-10          | 2.57E-10           |
| OR3       | 3            | 11.7               | 1.24E-10         | 2.18E-10         | 5.45E-10          | 8             | 51.1                | 1.97E-10          | 2.32E-10          | 3.23E-10           |
| OR4       | 4            | 10.3               | 1.05E-10         | 1.66E-10         | 3.74E-10          | 10            | 57.9                | 2.45E-10          | 2.84E-10          | 3.85E-10           |
| AND2      | 2            | 7.9                | 1.26E-10         | 2.65E-10         | 7.01E-10          | 6             | 35.4                | 9.19E-11          | 1.31E-10          | 2.64E-10           |
| AND3      | 3            | 8.6                | 1.30E-10         | 2.76E-10         | 7.29E-10          | 8             | 38.3                | 1.05E-10          | 1.46E-10          | 2.79E-10           |
| AND4      | 4            | 9.1                | 1.30E-10         | 2.82E-10         | 7.48E-10          | 10            | 42.2                | 1.22E-10          | 1.66E-10          | 3.01E-10           |
| MUX       | 2            | 16.5               | 1.79E-10         | 3.65E-10         | 1.02E-09          | 12            | 81.9                | 1.38E-10          | 1.77E-10          | 3.09E-10           |
| XOR2      | 4            | 42.2               | 4.63E-11         | 7.53E-11         | 1.82E-10          | 12            | 105.5               | 1.12E-10          | 1.44E-10          | 2.27E-10           |
| XOR3      | 8            | 83.4               | 1.36E-10         | 2.03E-10         | 4.34E-10          | 20            | 455.4               | 1.51E-10          | 1.74E-10          | 2.42E-10           |
| DFF       | 12           | 316.8              | 1.42E-10         | 2.02E-10         | 3.92E-10          | 24            | 176.3               | 1.76E-10          | 2.15E-10          | 3.49E-10           |

 Table 3. Improvement summary of GDI cells (unit: %)

| Gate type       | # of TRs | Power | Delay ×1 | Delay ×3 | Delay ×10 |
|-----------------|----------|-------|----------|----------|-----------|
| OR2             | 67       | 77    | -22      | -107     | -297      |
| OR3             | 63       | 77    | 37       | 6        | -68       |
| OR4             | 60       | 82    | 57       | 42       | 3         |
| AND2            | 67       | 78    | -37      | -102     | -166      |
| AND3            | 63       | 78    | -24      | -89      | -161      |
| AND4            | 60       | 78    | -7       | -70      | -148      |
| MUX             | 83       | 80    | -30      | -106     | -230      |
| XOR2            | 67       | 60    | 59       | 48       | 20        |
| XOR3            | 60       | 82    | 10       | -17      | -80       |
| DFF             | 50       | -80   | 20       | 6        | -12       |
| Avg improvement | 64       | 61    | 6        | -39      | -114      |

Improvement = (CMOS–GDI)/CMOS (%).

of transistors and power consumption. The delay time, however, increased sharply compared to CMOS as CL increased. Therefore, in a GDI circuit the load on the output pin should be kept below a fan-out of ×3 as far as possible during synthesis or design.
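The improvement figures in Table 3 follow from the Table 2 values through the formula Improvement = (CMOS–GDI)/CMOS. The sketch below applies it to two rows as a worked example; the dictionary layout and function name are illustrative, and individual results may differ from the printed table by one point because the Table 2 inputs are themselves rounded.

```python
def improvement(cmos: float, gdi: float) -> int:
    """Improvement = (CMOS - GDI) / CMOS, in percent, rounded to an integer."""
    return round((cmos - gdi) / cmos * 100)


# Values transcribed from Table 2 (power in nW/MHz, delay in seconds at x10 load).
TABLE2 = {
    "OR2": {"gdi": {"trs": 2, "power": 10.4, "delay_x10": 1.02e-09},
            "cmos": {"trs": 6, "power": 45.9, "delay_x10": 2.57e-10}},
    "DFF": {"gdi": {"trs": 12, "power": 316.8, "delay_x10": 3.92e-10},
            "cmos": {"trs": 24, "power": 176.3, "delay_x10": 3.49e-10}},
}


def improvement_row(gate: str) -> dict:
    """Compute the improvement of every metric for one gate type."""
    gdi, cmos = TABLE2[gate]["gdi"], TABLE2[gate]["cmos"]
    return {metric: improvement(cmos[metric], gdi[metric]) for metric in gdi}


if __name__ == "__main__":
    for gate in TABLE2:
        print(gate, improvement_row(gate))
```

A negative value means the GDI cell is worse than the CMOS cell for that metric, as with the OR2 delay at the ×10 load and the DFF power.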

The basic sequential cell in the standard cell library is the D-type flip-flop (DFF). Fig. 6 shows a typical CMOS DFF with master and slave stages, which requires 24 transistors (TRs). GDI DFF cells have been introduced in previous studies [14, 15], which reported that 18 or more transistors were required to implement a GDI DFF. The present paper proposes the modified



Fig. 6. Typical CMOS DFF with master and slave stage.



Fig. 7. Proposed GDI DFF with master and slave stage.

GDI DFF cell shown in Fig. 7. It performs the same operation with only 12 transistors. The circuit consists of two GDI MUXs and CMOS INVs. As shown in Tables 2 and 3, the proposed GDI DFF halves the transistor count and is faster than the CMOS DFF at the ×1 and ×3 loads, although its power consumption is higher. Unlike combinational GDI circuits, the output stage of the GDI DFF always has a CMOS-level voltage. This is advantageous in GDI netlist synthesis.

## C. Cell Layout

Chip layout was performed through an automatic placement and routing (P&R) process by an electronic design automation (EDA) tool using the standard cell library. Fig. 8 shows a layout example of the CMOS INV and the horizontal tiling rule of cells. The yellow box of the CMOS INV indicates the cell boundary. The P&R tool maintains a fixed cell height and arranges the cells horizontally by abutting the cell boundaries according to the connections. The layout width increases only in multiples of the horizontal I/O pin grid "H", as shown in Fig. 8. The layout width of every primitive cell must therefore be an integer multiple of H, and "k" is this normalized width of each cell layout. The red box in Fig. 8 is the layout of the GDI 2-input AND and OR cells. These two GDI cells have half the k value of the corresponding CMOS cells because each can be implemented with a single GDI base cell, which corresponds to a 50% reduction in layout area. In this paper, the layout of



Fig. 8. Cell-based CMOS and GDI cell layout.

Table 4. k-value comparison of GDI versus CMOS (180n CMOS process)

| Cell type | GDI cell width (µm) | GDI k | CMOS cell width (µm) | CMOS k |
|-----------|---------------------|-------|----------------------|--------|
| OR2       | 1.4                 | 2     | 2.8                  | 4      |
| OR3       | 2.1                 | 3     | 4.2                  | 6      |
| OR4       | 2.1                 | 3     | 4.2                  | 6      |
| AND2      | 1.4                 | 2     | 2.8                  | 4      |
| AND3      | 2.1                 | 3     | 3.5                  | 5      |
| AND4      | 2.1                 | 3     | 4.2                  | 6      |
| MUX       | 1.4                 | 2     | 5.6                  | 8      |
| XOR2      | 2.1                 | 3     | 5.6                  | 8      |
| XOR3      | 4.2                 | 6     | 12.6                 | 18     |
| D-F/F     | 8.4                 | 12    | 11.9                 | 17     |

the primitive GDI cells was implemented and registered as a P&R library on the 180-nm CMOS process. All cells were normalized with k values to compare the layout sizes of GDI and CMOS, as shown in Table 4. Most GDI cells showed an area reduction of 50% or more compared to CMOS, similar to the transistor-count reduction in Table 3.
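The k normalization and the resulting area reduction can be reproduced from the cell widths in Table 4. The sketch below assumes a pin grid of H = 0.7 µm; this value is not stated in the text and is inferred here from the width-to-k ratio in the table, and the names are illustrative.

```python
H_UM = 0.7  # ASSUMPTION: pin-grid pitch inferred from Table 4 (width / k), not stated in the text

# (GDI width, CMOS width) in micrometres, transcribed from Table 4.
WIDTHS_UM = {
    "OR2": (1.4, 2.8), "OR3": (2.1, 4.2), "OR4": (2.1, 4.2),
    "AND2": (1.4, 2.8), "AND3": (2.1, 3.5), "AND4": (2.1, 4.2),
    "MUX": (1.4, 5.6), "XOR2": (2.1, 5.6), "XOR3": (4.2, 12.6),
    "D-F/F": (8.4, 11.9),
}


def k_value(width_um: float) -> int:
    """Normalized cell width: the number of H grid steps the cell occupies."""
    return round(width_um / H_UM)


def area_reduction(cell: str) -> float:
    """Percent reduction in normalized width of the GDI cell versus CMOS."""
    gdi_w, cmos_w = WIDTHS_UM[cell]
    return round((1 - k_value(gdi_w) / k_value(cmos_w)) * 100, 1)


if __name__ == "__main__":
    for cell in WIDTHS_UM:
        gdi_w, cmos_w = WIDTHS_UM[cell]
        print(cell, k_value(gdi_w), k_value(cmos_w), area_reduction(cell))
```

Because all cells share the same height, the ratio of k values is also the ratio of cell areas.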

## **III. NOVEL HYBRID GDI CIRCUIT**

## A. Cascade Design of GDI Cell

A cascade design is a series-connected circuit in which the output of the current cell is connected to the input of the next cell. Cascade design using GDI cells becomes possible when a standard cell library is used in a cell-based RTL design flow. If the output of a GDI cell appears at an intermediate voltage between 0 and VDD, it is necessary to analyze whether the next cell operates correctly. GDI technology enables area- and power-efficient logic cells using fewer transistors. However, the output voltage drop is a major drawback of GDI technology and limits its widespread application. The output voltage is degraded by the MOS transistor threshold voltage drop in the GDI cell. According to the structure of the GDI basic cell, the output voltage can be 0, VTN, VDD-VTP, or VDD, depending on the input stimulus. When a circuit is built only from GDI cells, its output characteristics must be analyzed. In particular, since the bulk bias VBS is not 0, the output voltage characteristics are complicated. The behavior must be analyzed when the intermediate levels VDD-VTP and VTN are input to the GDI cell of the

Table 5. Function simulation result of cascaded GDI logic

| A  | B  | GDI OR | GDI AND | CMOS OR | CMOS AND | Function error (OR) | Function error (AND) |
|----|----|--------|---------|---------|----------|---------------------|----------------------|
| L  | WL | 0      | 0       | 0       | 0        | o                   | o                    |
| L  | WH | 1      | 0       | 1       | 0        | o                   | o                    |
| WL | L  | 1      | 0       | 0       | 0        | x                   | o                    |
| WL | WL | 1      | 0       | 0       | 0        | x                   | o                    |
| WL | WH | 1      | 0       | 1       | 0        | o                   | o                    |
| WL | H  | 1      | 0       | 1       | 0        | o                   | o                    |
| WH | L  | 1      | 0       | 1       | 0        | o                   | o                    |
| WH | WL | 1      | 0       | 1       | 0        | o                   | o                    |
| WH | WH | 1      | 0       | 1       | 1        | o                   | x                    |
| WH | H  | 1      | 0       | 1       | 1        | o                   | x                    |
| H  | WL | 1      | 0       | 1       | 0        | o                   | o                    |
| H  | WH | 1      | 1       | 1       | 1        | o                   | o                    |

following stage. There are 16 possible combinations of the four voltage levels for a GDI cell with two inputs. In this paper, OR and AND cell operations were simulated in HSPICE for the 12 input stimuli that include an intermediate voltage. Here, VTN is denoted as weak low (WL) and VDD-VTP as weak high (WH) in the truth table. Table 5 shows the simulation results for the GDI OR and AND cells. In Table 5, "x" marks a functional error of the GDI cascade circuit compared to CMOS. The OR cell failed when input A was WL and input B was L or WL, and the AND cell failed when input A was WH and input B was WH or H.
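The count of 12 stimuli and the error entries of Table 5 can be cross-checked with a short script. The GDI outputs below are transcribed from Table 5; the CMOS reference is modeled by treating WL as logical 0 and WH as logical 1, which is an assumption that happens to reproduce the CMOS columns of the table. All names are illustrative.

```python
from itertools import product

LEVELS = ["L", "WL", "WH", "H"]
LOGIC = {"L": 0, "WL": 0, "WH": 1, "H": 1}   # ASSUMPTION: levels split at VDD/2
WEAK = {"WL", "WH"}

# 16 combinations of four levels; keep those with at least one intermediate level.
STIMULI = [(a, b) for a, b in product(LEVELS, repeat=2) if a in WEAK or b in WEAK]

# (OR, AND) outputs of the cascaded GDI cell, transcribed from Table 5.
GDI_TABLE5 = {
    ("L", "WL"): (0, 0), ("L", "WH"): (1, 0),
    ("WL", "L"): (1, 0), ("WL", "WL"): (1, 0), ("WL", "WH"): (1, 0), ("WL", "H"): (1, 0),
    ("WH", "L"): (1, 0), ("WH", "WL"): (1, 0), ("WH", "WH"): (1, 0), ("WH", "H"): (1, 0),
    ("H", "WL"): (1, 0), ("H", "WH"): (1, 1),
}


def function_errors():
    """Return the stimuli where the GDI OR / AND output differs from the CMOS reference."""
    or_err, and_err = [], []
    for a, b in STIMULI:
        gdi_or, gdi_and = GDI_TABLE5[(a, b)]
        if gdi_or != (LOGIC[a] | LOGIC[b]):
            or_err.append((a, b))
        if gdi_and != (LOGIC[a] & LOGIC[b]):
            and_err.append((a, b))
    return or_err, and_err


if __name__ == "__main__":
    print(len(STIMULI), "stimuli with an intermediate level")
    print("OR errors, AND errors:", function_errors())
```

The script only compares tabulated values; it does not simulate the transistor-level behavior that produces them.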

This problem is difficult to solve because it stems from using the diffusion terminals as inputs, which is inherent to the GDI technique and the principle of MOS operation. The cascade circuit errors must be resolved before chips can be developed through RTL design with a GDI cell library. The first possible solution is to optimize the width and length of the MOS transistors for the input stimulus. However, correct operation is difficult to guarantee under process variation. The second is to change the circuit structure so that the intermediate voltages WL and WH at a GDI input or output are converted to the CMOS levels L and H. In circuit terms, a CMOS buffer can be attached to the input or output terminal of the GDI cell, as shown in Fig. 9. With this structure, the intermediate voltage is restored to full swing, so a cascaded GDI circuit operates without error. However, this structure significantly reduces the small-area advantage. If the four transistors of a buffer are added to each GDI cell, there is no area difference between GDI and CMOS cells except for a few cells. A new solution is therefore needed for cascaded GDI design.

## B. Proposed GDI Technique

Fig. 10 shows the novel hybrid GDI structure, which resolves the operating errors and enables low-power, small-area GDI



Fig. 9. GDI technique with input or output CMOS buffer.



Fig. 10. Proposed hybrid GDI technique.

chip design. In the proposed hybrid GDI technique, the cascade circuit is a mixture of CMOS and GDI cells. The rules of circuit configuration are as follows.

- (1) A cell must be a CMOS cell if at least one of its input pins is connected to the output of a GDI cell.
- (2) At the top level of the chip, cells connected to input pins are GDI cells, and cells driving output pins are CMOS cells.
- (3) Inverting gates such as INV, NOR, and NAND gain no circuit advantage from GDI, so CMOS cells are used for them.
- (4) The input terminal of a DFF is driven by a CMOS cell, and its output terminal drives a GDI cell.

Fig. 11 shows the hybrid GDI logic synthesis algorithm. The logic netlist created by a synthesis tool such as



Fig. 11. Hybrid GDI logic synthesis algorithm.

Synopsys DC must be modified. The algorithm starts from one of the input pins of the synthesized logic netlist. A cell connected to the input of a DFF is marked as a CMOS cell, while a cell connected to the output of a DFF is marked as a GDI cell. Inverting cells such as NAND, NOR, and XNOR use CMOS circuits. We assume that the top netlist of the chip has N input pins, all of which carry CMOS-level signals, that is, signals that take only the values 0 and VDD. If all inputs of the cell connected to pin 1 are CMOS-level input signals or CMOS cell outputs, the cell is changed to a GDI cell. This process is repeated for all N pins. A cell connected to an output signal of the top netlist is changed to a CMOS cell. If a cell type cannot yet be determined, it is determined again later once its input signals are known.
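One possible reading of these rules is sketched below as a small netlist pass. It is not the flow of Fig. 11 itself: the netlist representation, the function names, and the example circuit are all hypothetical, each cell's output net is simply named after the cell, and the netlist is assumed to have no combinational loops.

```python
from typing import Dict, List, Set, Tuple

INVERTING = {"INV", "NAND", "NOR", "XNOR"}   # rule (3): kept as CMOS


def is_inverting(cell_type: str) -> bool:
    """Strip the input-count suffix (e.g. NAND2 -> NAND) and test for an inverting gate."""
    return cell_type.rstrip("0123456789") in INVERTING


def assign_styles(cells: Dict[str, Tuple[str, List[str]]],
                  primary_inputs: Set[str],
                  primary_outputs: Set[str]) -> Dict[str, str]:
    """cells maps a cell name to (cell type, list of driver names)."""
    # Rules (2) and (4): cells driving a top-level output or a DFF input are CMOS.
    must_be_cmos = set(primary_outputs)
    for cell_type, drivers in cells.values():
        if cell_type == "DFF":
            must_be_cmos.update(d for d in drivers if d in cells)

    style: Dict[str, str] = {}

    def full_swing(net: str) -> bool:
        # Top-level inputs and DFF outputs are CMOS-level; otherwise ask the driver.
        if net in primary_inputs or cells[net][0] == "DFF":
            return True
        return resolve(net) == "CMOS"

    def resolve(name: str) -> str:
        if name not in style:
            cell_type, drivers = cells[name]
            if cell_type == "DFF":
                style[name] = "DFF"
            elif is_inverting(cell_type) or name in must_be_cmos:
                style[name] = "CMOS"
            elif all(full_swing(d) for d in drivers):
                style[name] = "GDI"      # every input is full swing
            else:
                style[name] = "CMOS"     # rule (1): fed by a GDI output
        return style[name]

    for name in cells:
        resolve(name)
    return style


# Hypothetical example netlist.
EXAMPLE = {
    "u1": ("AND2", ["a", "b"]),
    "u2": ("OR2", ["u1", "c"]),
    "u3": ("NAND2", ["a", "c"]),
    "ff": ("DFF", ["u2", "clk"]),
    "u4": ("OR2", ["ff", "u3"]),
    "u5": ("AND2", ["u4", "a"]),
}

if __name__ == "__main__":
    print(assign_styles(EXAMPLE, {"a", "b", "c", "clk"}, {"u5"}))
```

In this example `u1` and `u4` become GDI cells because all of their inputs are full swing, while `u2` and `u5` stay CMOS because they are fed by GDI outputs.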

## C. Performance Evaluation

Libraries for both logic synthesis and simulation are required to evaluate the performance of a hybrid GDI design. Developing a Verilog-HDL library and a Synopsys DC library for GDI cells is a future goal of our research. In this paper, the design was verified using Nanosim, which approaches the accuracy of HSPICE and can rapidly simulate large circuits. Nanosim simulated the hybrid GDI netlist in mixed mode using HSPICE model files. Transistor-level simulation predicts the performance improvement of a hybrid GDI design more accurately than typical logic simulation.

The benchmark circuit was the thinning processor designed for the fingerprint sensor SoC [1]. Because the sensor area must match the size of a human fingerprint, many SoC devices, including fingerprint sensors, use a 180-nm to 350-nm process rather than an advanced process. The thinning processor was implemented using the Zhang-Suen (ZS) algorithm. Fig. 12 describes each step of the ZS algorithm. Thinning is achieved on the binarized image by morphological operations. The ZS parallel thinning algorithm performs two sub-iteration steps on a $3\times3$ image pixel window. Pi and P1–P8 are the binary pixel values in Fig. 12(a). The image value 1 represents black and 0



Fig. 12. Pixel index and example of N(Pi): (a)  $3\times 3$  window index and (b) number of black pixel neighbor P(i).

means white in this study. Thinning is performed only when the central pixel Pi is black. In the first step, pixels that satisfy all of the following conditions are erased.

1. $2 \le N(Pi) \le 6$
2. $S(Pi) = 1$
3. $P2*P6*P8 = 0$
4. $P4*P6*P8 = 0$

Here, N(Pi) is the number of 1-valued pixels among the 8 neighbors, as shown in Fig. 12(b), and is expressed by Eq. (1).

$$N(Pi) = P1 + P2 + \dots + P8 \tag{1}$$

As shown in Fig. 13, S(Pi) is the number of 1-to-0 $(1\rightarrow 0)$ transitions in the 8-neighbor pixels.

In the second step, conditions 3 and 4 of the first step are replaced with the following conditions.

3'. $P2*P4*P8 = 0$

4'. $P2*P4*P6 = 0$
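The deletion test for one pixel can be sketched as follows. Since Fig. 12(a) is not reproduced here, the sketch assumes that P1 to P8 are listed in circular order around Pi and that the centre pixel is already known to be black; the function names are illustrative.

```python
from typing import Sequence


def n_count(p: Sequence[int]) -> int:
    """N(Pi): number of black (1) pixels among the 8 neighbours, Eq. (1)."""
    return sum(p)


def s_count(p: Sequence[int]) -> int:
    """S(Pi): number of 1 -> 0 transitions going once around the neighbours.

    ASSUMPTION: p = [P1, ..., P8] is in circular order around Pi.
    """
    return sum(1 for i in range(8) if p[i] == 1 and p[(i + 1) % 8] == 0)


def should_erase(p: Sequence[int], step: int) -> bool:
    """Apply conditions 1-4 (step 1) or 1, 2, 3', 4' (step 2) to a black centre pixel."""
    P = {i + 1: p[i] for i in range(8)}   # 1-based access, matching the text
    if not 2 <= n_count(p) <= 6:
        return False
    if s_count(p) != 1:
        return False
    if step == 1:
        return P[2] * P[6] * P[8] == 0 and P[4] * P[6] * P[8] == 0
    return P[2] * P[4] * P[8] == 0 and P[2] * P[4] * P[6] == 0


if __name__ == "__main__":
    window = [0, 0, 0, 1, 1, 1, 1, 1]
    print(n_count(window), s_count(window),
          should_erase(window, 1), should_erase(window, 2))
```

For the example window, the pixel is kept in the first step because P4, P6, and P8 are all black, and erased in the second step.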

Fig. 14 shows the design flow of the hybrid GDI thinning processor. The yellow box represents the EDA tool and the blue box represents the final hybrid GDI netlist. The synthesized CMOS netlist was converted into the hybrid GDI circuit by applying the algorithm, and then the power and the maximum operating frequency were measured using Nanosim. Because no GDI library for logic synthesis or simulation was available in our design process, high-speed circuit-level simulation was used instead. Verification using Nanosim is more accurate than logic simulation. The thinning algorithm was designed at the behavioral level using Verilog-HDL and synthesized by Synopsys DC using the 180-nm CMOS library. The synthesized netlist used 33 cell types, and the total cell area was 10,318 $\mu$m<sup>2</sup> excluding the routing area. The total number of synthesized cells was 395 and the number of nets was 441. The ratio between the areas of the combinational and sequential circuits was approximately 6:4. After the auto P&R step, the total cell area was 13,586 $\mu$m<sup>2</sup> with the conventional CMOS library, as shown in Fig. 15(a). Layout was also performed for the hybrid GDI netlist using the auto P&R tool. The layouts of the 33 GDI cells were designed in full custom and individually registered as a GDI P&R library. Fig. 15(b) shows the layout result of the hybrid GDI thinning processor.



Fig. 13. Number of black to white change neighbor P(i).



Fig. 14. Design flow of hybrid GDI thinning processor.



Fig. 15. Result of 96×96 pixel thinning process layout (@180n 2-poly 6-metal CMOS process): (a) CMOS chip (118.5  $\mu$ m × 114.5  $\mu$ m) and (b) hybrid GDI chip (96.4  $\mu$ m × 91.9  $\mu$ m).

Fig. 16 shows the thinning result obtained from the Nanosim simulation of the hybrid GDI circuit, which operated correctly in the same way



Fig. 16. Image result of thinning processing (@96 $\times$ 96 sample image): (a) original image, (b) binary image, and (c) thinning image.

as the CMOS result. Fig. 16(a) shows the original 96×96 fingerprint image, Fig. 16(b) shows the binarized image, and Fig. 16(c) shows the thinning result. The hybrid GDI design had an area of 8,859 $\mu$m<sup>2</sup> and a gate count of 837. Table 6 compares the area, power consumption, and operating frequency of the CMOS and hybrid GDI designs. As a result of the measurement, it was found

Table 6. Performance comparison of cascaded GDI logic

|                               | Hybrid GDI | CMOS   | Improvement (%) | Remarks            |
|-------------------------------|------------|--------|-----------------|--------------------|
| Number of synthesized cells   | 395        | 395    | -               | -                  |
| Area (µm <sup>2</sup> )       | 8,859      | 13,586 | 34.8            | 180n process       |
| Gate count                    | 837        | 1,284  | 34.8            | 180n process       |
| Power consumption (mW)        | 0.79       | 0.95   | 17              | 20 MHz 1.8 V       |
| Maximum operating speed (MHz) | 25         | 26     | -3.8            | 96×96 pixels image |

that the hybrid GDI thinning processor reduced chip area by 34.8% and power consumption by 17% compared to the CMOS-only circuit. The maximum operating speed decreased by about 3.8%. At the individual cell level, GDI showed great advantages in area and power consumption over CMOS, but these advantages decreased somewhat when hybrid GDI was applied. This is believed to be because an RTL design using only GDI circuits was not viable, so CMOS cells still made up about 50% of the design. In terms of power consumption, the intermediate voltage of GDI outputs is considered to induce static power in the CMOS cells they drive. In terms of operating speed, there is no significant change compared to CMOS, which may be a limitation of hybrid GDI. The operating speed depends somewhat on the application circuit: when the fan-out of a GDI output stage increases, the delay time becomes longer.
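The percentages in Table 6 can be recomputed directly from the reported values. The sketch below is a plain arithmetic check; the function names are illustrative, and the speed figure is expressed as a relative change so that a lower frequency yields a negative number.

```python
def reduction_pct(cmos: float, hybrid: float) -> float:
    """Percent reduction of a cost metric (area, gate count, power) versus CMOS."""
    return (cmos - hybrid) / cmos * 100


def change_pct(cmos: float, hybrid: float) -> float:
    """Percent change of a benefit metric (maximum frequency) versus CMOS."""
    return (hybrid - cmos) / cmos * 100


# Values transcribed from Table 6.
AREA_UM2 = {"hybrid": 8859, "cmos": 13586}
GATES = {"hybrid": 837, "cmos": 1284}
POWER_MW = {"hybrid": 0.79, "cmos": 0.95}
FMAX_MHZ = {"hybrid": 25, "cmos": 26}

if __name__ == "__main__":
    print("area  %.1f%%" % reduction_pct(AREA_UM2["cmos"], AREA_UM2["hybrid"]))
    print("gates %.1f%%" % reduction_pct(GATES["cmos"], GATES["hybrid"]))
    print("power %.0f%%" % reduction_pct(POWER_MW["cmos"], POWER_MW["hybrid"]))
    print("fmax  %.1f%%" % change_pct(FMAX_MHZ["cmos"], FMAX_MHZ["hybrid"]))
```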

In the future, the technique for optimizing the output-stage load during synthesis needs to be strengthened. Future research will also explore whether an RTL design using only GDI circuits is possible.

## **IV. CONCLUSION**

The GDI technique offers low power and small cell area compared to traditional CMOS gates. However, the intermediate level of its output voltage limits its extension to standard cell library-based RTL design. We proposed a hybrid GDI technique with a new structure. The problems of the GDI technique were assessed, and a new GDI structure with advantageous electrical characteristics was proposed. The characteristics of hybrid GDI-based combinational and sequential basic gates were compared with those of CMOS gates on a 1.8-V, 180-nm CMOS process. In addition, we proposed a circuit synthesis algorithm. The algorithm replaced a circuit consisting only of CMOS cells with a hybrid GDI circuit containing more than 50% GDI cells. The verification results show that the proposed hybrid GDI technique can be applied to SoC design with low power and small cell area. The thinning algorithm was designed at the behavioral level using Verilog-HDL and synthesized by the Synopsys DC tool using the 180-nm library. The synthesized netlist used a total of 33 types of cells, and the total cell area excluding the routing area was 10,318 $\mu$m<sup>2</sup>. The total number of synthesized cells was 395 and the number of nets was 441. After auto P&R, the total cell area was 13,586 $\mu$m<sup>2</sup> with the conventional CMOS library. The synthesized CMOS netlist was converted into a hybrid GDI circuit by applying the algorithm, and then the power and the maximum operating frequency were measured using Nanosim. Layout was performed for the hybrid GDI netlist using the auto P&R tool. A total of 33 GDI layouts were designed in full custom and individually registered as a library of the P&R tool. The measurements showed that the hybrid GDI circuit reduced chip area by 34.8% and power consumption by 17% compared to the CMOS circuit, while the maximum operating speed decreased by about 3.8%. Our future goal is to develop an accurate GDI library for synthesis in the RTL design flow.

## ACKNOWLEDGEMENTS

This work was supported by Hanshin University Research Grant.

#### Conflict of Interest (COI)

The authors have declared that no competing interests exist.

## REFERENCES

- 1. S. Jung, "Image processor and RISC MCU embedded single chip fingerprint sensor," *Journal of Sensor and Actuator Networks*, vol. 9, no. 4, article no. 51, 2020. https://doi.org/10.3390/jsan9040051
- 2. H. Yeo, "Touch fingerprint sensor based on sensor cell isolation technique with pseudo direct signaling," *International Journal on Smart Sensing and Intelligent Systems*, vol. 12, no. 1, pp. 1-9, 2019.
- 3. N. Kandasamy, C. Sanjeevaiah, N. Telagam, and R. Merisala, "Hybrid 4:16 decoder using variable bias GDI technique," in *Advances in Electrical and Computer Technologies*. Singapore: Springer, 2021, pp. 637-647.
- 4. M. Hasan, H. U. Zaman, M. Hossain, P. Biswas, and S. Islam, "Gate diffusion input technique based full swing and scalable 1-bit hybrid full adder for high performance applications," *Engineering Science and Technology, an International Journal*, vol. 23, no. 6, pp. 1364-1373, 2020.
- 5. G. Nayan, R. K. Prasad, Y. G. Praveen Kumar, and M. Z. Kurian, "A review on modified gate diffusion input logic: an approach for area and power efficient digital system design," in *Proceedings of the 2nd International Conference on Emerging Trends in Science & Technologies for Engineering Systems (ICETSE)*, Chickballapur, India, 2019.
- 6. G. R. Mahendra Babu and S. Bhavani, "Primitive cells using gate diffusion input technique: a low power approach," *International Journal of Recent Technology and Engineering*, vol. 8, no. 1S5, pp. 257-260, 2019.
- 7. E. Abiri, A. Darabi, and S. Salem, "Design of multiple-valued logic gates using gate-diffusion input for image processing applications," *Computers & Electrical Engineering*, vol. 69, pp. 142-157, 2018.
- 8. A. Morgenshtein, A. Fish, and I. A. Wagner, "Gate-diffusion input (GDI): a power-efficient method for digital combinatorial circuits," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 10, no. 5, pp. 566-581, 2002.
- 9. M. Hasan, U. K. Saha, A. Sorwar, M. A. Z. Dipto, M. S. Hossain, and H. U. Zaman, "A novel hybrid full adder based on gate diffusion input technique, transmission gate and static CMOS logic," in *Proceedings of the 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT)*, Kanpur, India, 2019, pp. 1-6.
- 10. N. Kandasamy, N. M. Kumar, N. Telagam, F. Ahmad, and G. Mishra, "Analysis of self checking and self resetting logic in CLA and CSA circuits using gate diffusion input technique," in *Proceedings of the 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT)*, Tirunelveli, India, 2019, pp. 1-6.
- 11. M. Shoba and R. Nakkeeran, "Energy and area efficient hierarchy multiplier architecture based on Vedic mathematics and GDI logic," *Engineering Science and Technology, an International Journal*, vol. 20, no. 1, pp. 321-331, 2017.
- 12. A. Garg and G. Joshi, "Gate diffusion input based 4-bit Vedic multiplier design," *IET Circuits, Devices & Systems*, vol. 12, no. 6, pp. 764-770, 2018.
- 13. A. Morgenshtein, V. Yuzhaninov, A. Kovshilovsky, and A. Fish, "Full-swing gate diffusion input logic: case-study of low-power CLA adder design," *Integration*, vol. 47, no. 1, pp. 62-70, 2014.
- 14. S. S. Venkatachalam, S. S. Arumugam, and S. Sivasubramaniyam, "Design of low power flip flop based on modified GDI primitive cells and its implementation in sequential circuits," *International Journal of Advances in Computer and Electronics Engineering*, vol. 2, no. 5, pp. 22-32, 2017.
- 15. A. Morgenshtein, A. Fish, and I. A. Wagner, "An efficient implementation of D-Flip-Flop using the GDI technique," in *Proceedings of the 2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No. 04CH37512)*, Vancouver, Canada, 2004.



## **Seungmin Jung**

Seungmin Jung received the B.S., M.S., and Ph.D. degrees from the Department of Electronic Engineering, Yonsei University, Seoul, Korea, in 1990, 1992, and 2006, respectively. From 1992 to 1997, he was a senior engineer in the ASIC Division at Samsung Electronics Co. Ltd., where he worked on the design of compiled synchronous and asynchronous memory circuits and on the design of full-custom and semi-custom VLSI. From 1998 to 2006, he was on the faculty of Yong-In Arts & Science University, Yongin, Korea, where he was an assistant professor in the Information and Communication Department. In 2006, he joined the faculty of Hanshin University, Osan, Korea, where he is currently a professor in the Division of Software Convergence. His research interests include biometric CMOS sensors, system-on-a-chip (SoC) design, and mixed-mode VLSI circuit design.