Electrical Engineering News and Products
What is ‘compute-in-memory’ and why is it important for AI?

October 14, 2025 By Rakesh Kumar, PhD

If you are running AI workloads, here is something that might surprise you. Your processors are wasting more energy shuffling data around than actually doing the calculations you care about. This inefficiency is becoming a serious limit for the next generation of artificial intelligence systems. As neural networks grow to billions of parameters, traditional von Neumann architectures are hitting physical barriers.

This article explains what compute-in-memory (CIM) technology is and how it works. We will examine how current implementations are already delivering significant efficiency improvements over conventional processors, and explore why this approach could reshape AI computing.

Challenges with traditional computers

Traditional computers keep computational units and memory systems separate, constantly exchanging data through energy-intensive transfers. Early in-memory proposals such as Terasys, IRAM, and FlexRAM emerged in the 1990s, but these initial attempts had major limitations: the CMOS technology of the time wasn't advanced enough, and application demands were different.

The traditional von Neumann architecture (Figure 1a) maintains a strict separation between the central processing unit and memory. This approach requires constant data transfers across a bandwidth-limited bus. This separation creates the “memory wall” problem, which particularly hurts AI workloads.
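A rough back-of-the-envelope estimate shows why the memory wall dominates. The per-operation energies below are representative published figures for a roughly 45 nm process and are approximate, for illustration only; they are not measurements from any specific chip:

```python
# Illustrative "memory wall" arithmetic: representative ~45 nm energy
# figures (approximate values, for illustration only).
E_DRAM_READ_32B = 640.0   # pJ per 32-bit off-chip DRAM access
E_FP32_MAC = 4.6          # pJ per 32-bit multiply-accumulate

def energy_breakdown(num_macs, dram_fetches_per_mac=1.0):
    """Return (compute_pJ, memory_pJ, memory_fraction) for a workload."""
    compute = num_macs * E_FP32_MAC
    memory = num_macs * dram_fetches_per_mac * E_DRAM_READ_32B
    return compute, memory, memory / (compute + memory)

# A layer with one million MACs where every weight comes from DRAM:
compute, memory, frac = energy_breakdown(1_000_000)
print(f"compute: {compute/1e6:.1f} uJ, memory: {memory/1e6:.1f} uJ, "
      f"memory share: {frac:.0%}")
```

Under these assumptions, well over 99% of the energy goes to moving data rather than computing with it, which is precisely the imbalance CIM attacks.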

Figure 1. Evolution of computing architectures from (a) traditional von Neumann with separated CPU and memory, through (b) near-memory computing, to true compute-in-memory approaches using (c) SRAM-based and (d) eNVM-based implementations. (Image: IEEE)

Understanding compute-in-memory

CIM, also known as processing-in-memory, is very different from the traditional von Neumann architecture that has dominated computing for decades. It performs computations directly within or very close to where the data is stored.

Near-memory computing (Figure 1b) brings memory closer to processing units. True in-memory computing approaches (Figures 1c and 1d) work differently. They embed computational capabilities directly within memory arrays. This integration of storage and logic units reduces data movement. This decreases both latency and energy consumption, which are the two major bottlenecks in modern AI applications.

The rapid growth of big data and machine learning applications has driven the rise of CIM. These applications demand high computational efficiency.

Technical implementation approaches

CIM can be implemented using various memory technologies, each offering distinct advantages for different AI workloads.

Static Random-Access Memory (SRAM) has emerged as the most popular choice for CIM implementations. Its speed, robustness, and compatibility with existing fabrication processes make it ideal for AI accelerators. Researchers have developed modified SRAM bitcell structures, including 8T, 9T, and 10T configurations, along with auxiliary peripheral circuits to enhance performance.

The comprehensive nature of SRAM-based CIM development is illustrated in Figure 2. The figure shows how circuit-level innovations enable sophisticated computing functions and real-world AI applications. At the circuit level (Figure 2a), SRAM-based CIM requires specialized bitcell structures and peripheral circuits. These include analog-to-digital converters, time control systems, and redundant reference columns. These circuit innovations enable a range of functional capabilities (Figure 2b).

Figure 2. Complete framework of SRAM-based compute-in-memory showing the progression from (a) circuit-level implementations with bitcell structures and peripheral circuits, through (b) functional capabilities including digital and mixed-signal operations, to (c) real-world AI applications like CNN, AES encryption, and classification algorithms. (Image: Researching)

Digital operations include Boolean logic and content-addressable memory. Mixed-signal operations support the multiply-accumulate and sum-of-absolute-differences (SAD) computations that are fundamental to neural networks.
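These two kernels are easy to state in software. The plain-NumPy sketch below is purely illustrative, showing the multiply-accumulate and sum-of-absolute-differences operations that CIM arrays accelerate in hardware:

```python
import numpy as np

def mac(weights, activations):
    # Multiply-accumulate: the core of a neural-network dot product.
    return float(np.sum(weights * activations))

def sad(a, b):
    # Sum of absolute differences, common in pattern matching.
    return float(np.sum(np.abs(a - b)))

w = np.array([1.0, -2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])
print(mac(w, x))   # 1*4 + (-2)*5 + 3*6 = 12.0
print(sad(w, x))   # |1-4| + |-2-5| + |3-6| = 13.0
```

A CIM array evaluates many such reductions in parallel inside the memory itself, instead of streaming `w` and `x` through a separate ALU.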

As demonstrated in the application layer (Figure 2c), these technical capabilities translate into accelerated AI algorithms. These include convolutional neural networks for image classification, AES encryption for security applications, and k-nearest neighbor algorithms for pattern recognition. However, SRAM faces challenges, including low density and high leakage current, that limit its scalability for large AI processors.

Dynamic Random-Access Memory (DRAM), while less common for direct in-memory computation due to its refresh requirements, plays a central role in near-memory processing architectures. Technologies such as High Bandwidth Memory (HBM) and Hybrid Memory Cube (HMC) use 3D stacking to reduce the physical distance between computation and memory.

Resistive Random-Access Memory (ReRAM) is among the most promising emerging technologies for CIM. This non-volatile memory offers high density, compatibility with back-end-of-line fabrication processes, and natural suitability for the matrix-vector multiplication operations that are fundamental to neural networks.

CIM implementations also vary in their computational domains. Analog CIM exploits the physical properties of memory cells, performing operations through current summation and charge collection; it offers higher weight density but is susceptible to noise. Digital CIM provides high accuracy at the cost of one device per bit. Mixed-signal approaches aim to balance the benefits of both analog and digital methods.
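A minimal numerical sketch makes the analog case concrete. In a resistive crossbar, weights are stored as conductances G and inputs applied as voltages V; by Ohm's and Kirchhoff's laws, each bitline current is the dot product of a voltage vector and a conductance column, so the array computes a matrix-vector product in one analog step. The conductance range, noise level, and ADC resolution below are hypothetical placeholders, not values from any real device:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x3 crossbar: weights as conductances (siemens),
# inputs as voltages. Bitline current I_j = sum_i V_i * G[i, j].
G = rng.uniform(1e-6, 1e-4, size=(4, 3))
V = np.array([0.1, 0.2, 0.05, 0.15])

I_ideal = V @ G   # the matrix-vector product the crossbar performs

# Analog non-idealities: multiplicative device noise, then an 8-bit
# ADC quantizing the bitline currents back into the digital domain.
I_noisy = I_ideal * (1 + 0.02 * rng.standard_normal(I_ideal.shape))
lsb = I_noisy.max() / (2**8 - 1)
I_digital = np.round(I_noisy / lsb) * lsb

print(np.max(np.abs(I_digital - I_ideal) / I_ideal))  # relative error
```

The residual error illustrates the analog trade-off mentioned above: one physical step replaces many digital MACs, at the cost of noise and ADC precision.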

Transformative benefits for AI applications

The practical benefits of CIM for AI are both measurable and compelling, as demonstrated in Figure 3. The energy-efficiency comparison reveals the advantages of CIM architectures across different technology nodes. While traditional CPUs achieve only 0.01-0.1 TOPS/W (tera operations per second per watt), digital in-memory architectures deliver 1-100 TOPS/W, representing 100 to 1,000 times better energy efficiency. Advanced CIM approaches such as silicon photonics and optical systems push efficiency even higher.

Figure 3. Energy efficiency comparison across technology nodes (left) and energy consumption breakdown (right) for different processor types. (Image: ResearchGate)

The energy breakdown analysis (Figure 3, right) reveals why CIM is effective. Traditional CPUs are dominated by memory access energy (blue bars), while CIM architectures reduce this bottleneck by performing computation directly in memory. This fundamental advantage translates to measurable performance improvements across AI applications.
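One convenient way to read TOPS/W figures: 1 TOPS/W equals 10^12 operations per joule, which is exactly 1 pJ per operation, so energy per operation is simply the reciprocal of the TOPS/W rating. A quick sanity check on the comparison above (the specific operating points chosen here are illustrative):

```python
def picojoules_per_op(tops_per_watt):
    # 1 TOPS/W = 1e12 ops/J = 1 pJ/op, so pJ/op is the reciprocal.
    return 1.0 / tops_per_watt

cpu = picojoules_per_op(0.05)   # a CPU in the 0.01-0.1 TOPS/W range
cim = picojoules_per_op(10.0)   # digital CIM at 10 TOPS/W
print(cpu, cim, cpu / cim)      # 20 pJ/op vs 0.1 pJ/op: 200x lower
```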

The real-world impact of CIM on transformer and LLM acceleration is demonstrated by recent implementations shown in Table 1. Various CIM architectures have achieved performance improvements with speedups ranging from 2.3x to 200x compared to NVIDIA GPUs. Energy efficiency gains reach up to 1894x. These results span multiple transformer models, including BERT, GPT, and RoBERTa, demonstrating CIM’s broad applicability to modern language models.

Table 1. Comparison of various CIM architectures for transformer and LLM benchmarks, showing substantial speedup and efficiency improvements over NVIDIA GPUs across different models and memory technologies. (Image: arXiv)

Summary

As we enter the post-Moore’s Law era, CIM represents a significant architectural shift that addresses key challenges in AI computing. The technology is advancing rapidly, with SRAM-based solutions approaching commercial viability and emerging non-volatile memory solutions showing potential for future applications. As AI continues to expand across technology applications, CIM could become an important enabling technology for more efficient AI deployment.

References

Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference, arXiv
Energy-efficient computing-in-memory architecture for AI processor – device, circuit, architecture perspective, Science China
A review on SRAM-based computing in-memory: Circuits, functions, and applications, Researching
An Overview of Processing-in-Memory Circuits for Artificial Intelligence and Machine Learning, IEEE
Analog, In-memory Compute Architectures for Artificial Intelligence, ResearchGate
In-Memory Computing for Machine Learning and Deep Learning, IEEE
Emerging In-memory Computing for Neural Networks, Fraunhofer
