A 334 μW 0.158 mm² ASIC for Post-Quantum Key-Encapsulation Mechanism Saber With Low-Latency Striding Toom–Cook Multiplication

Source: IEEE Journal of Solid-State Circuits 2023

Authors: Archisman Ghosh, Jose Maria Bermudo Mera (PQShield), Angshuman Karmakar, Debayan Das, Santosh Ghosh, Ingrid Verbauwhede, Shreyas Sen

Abstract

Lattice-based cryptography is a novel approach to public key cryptography (PKC), of which the mathematical investigation (so far) resists attacks from quantum computers. By choosing a module learning with errors (MLWE) algorithm as the next standard, the National Institute of Standards and Technology (NIST) follows this approach. The multiplication of polynomials is the central bottleneck in the computation of lattice-based cryptography. Because PKC is mostly used to establish common secret keys, the focus is on compact area, power, and energy budget and, to a lesser extent, on throughput or latency. While most other work focuses on optimizing number theoretic transform (NTT)-based multiplications, in this article, we highly optimize a Toom–Cook-based multiplier. We demonstrate that a memory-efficient striding Toom–Cook with lazy interpolation results in a highly compact, low-power implementation, which, on top, enables a very regular memory access scheme. To demonstrate the efficiency, we integrate this multiplier into a Saber post-quantum accelerator, one of the four NIST finalists. Algorithmic innovation to reduce active memory, timely clock gating, and shift-add multiplier has helped to achieve 38% less power than state-of-the-art post-quantum cryptography (PQC) core, 4 × less memory, 36.8% reduction in multiplier energy, and 118 × reduction in active power with respect to state-of-the-art Saber accelerator (not silicon verified). This accelerator consumes 0.158- mm2 active area, which is the lowest reported to date despite the process disadvantages of the state-of-the-art designs.

Preprint

Products

Markets

A 334 μW 0.158 mm² ASIC for Post-Quantum Key-Encapsulation Mechanism Saber With Low-Latency Striding Toom–Cook Multiplication

Abstract