Abstract
Lattice-based cryptography is a novel approach to public key cryptography (PKC), of which the mathematical investigation (so far) resists attacks from quantum computers. By choosing a module learning with errors (MLWE) algorithm as the next standard, the National Institute of Standards and Technology (NIST) follows this approach. The multiplication of polynomials is the central bottleneck in the computation of lattice-based cryptography. Because PKC is mostly used to establish common secret keys, the focus is on compact area, power, and energy budget and, to a lesser extent, on throughput or latency. While most other work focuses on optimizing number theoretic transform (NTT)-based multiplications, in this article, we highly optimize a Toom–Cook-based multiplier. We demonstrate that a memory-efficient striding Toom–Cook with lazy interpolation results in a highly compact, low-power implementation, which, on top, enables a very regular memory access scheme. To demonstrate the efficiency, we integrate this multiplier into a Saber post-quantum accelerator, one of the four NIST finalists. Algorithmic innovation to reduce active memory, timely clock gating, and shift-add multiplier has helped to achieve 38% less power than state-of-the-art post-quantum cryptography (PQC) core, 4 × less memory, 36.8% reduction in multiplier energy, and 118 × reduction in active power with respect to state-of-the-art Saber accelerator (not silicon verified). This accelerator consumes 0.158- mm2 active area, which is the lowest reported to date despite the process disadvantages of the state-of-the-art designs.