Pyjamask – An authenticated encryption scheme

This blog note is an introduction to Pyjamask, a scheme for authenticated encryption with associated data (AEAD). Pyjamask is a joint work between researchers from PQShield, ANSSI, CyberCrypt, Nanyang Technological University, CryptoExperts, and NTT Secure Platform Laboratories. It is one of the 32 remaining candidates in the second round of NIST’s standardisation process for lightweight cryptography.

LIGHTWEIGHT CRYPTOGRAPHY

Modern computing has drastically evolved in recent years due to the increase and widespread use of the Internet, e-commerce, and cloud services. These systems are continually changing due to requirements from specialised platforms, such as those used by the Internet of things (IoT). From smart cities to connected shoes, more and more small devices are in need of cryptography. For practicality or cost issues, such devices have very limited resources in both computational power or data storage. Standard cryptographic primitives such as the Advanced Encryption Standard (AES) are hence often hard to be deployed without adding extra cost, either by increasing the price of the involved composants or increasing the size of the products. The main problem for cryptography in such a context is to produce efficient solutions that use limited resources without compromising the underlying security. This trending branch in cryptography is called lightweight cryptography.

Alice ecrypts a message using the symmetric key to get a ciphertext, that she sends to Bob. Bob can recover the original message by using the symmetric key.

In particular, one of the two primitives required for lightweight devices is authenticated encryption, a powerful tool that allows communication to be confidential and authenticated while providing integrity of the manipulated data. To produce such a primitive, one of the possible options to use as the main building block is an encryption scheme, more precisely a block cipher.

A block cipher is a function allowing encryption of bulk data in the form of blocks of fixed length. One of the most famous block ciphers is AES, and is used in most of the real world cryptographic solutions. In order to guarantee the security of the secret key used to transform/recover the message, most of the block ciphers rely on two properties formalised by Shannon in 1945: confusion and diffusion. Confusion ensures that the secret key and the encrypted message share no obvious connection, while diffusion ensures that the correlation between the input message and the encrypted one should be as small as possible.

SUBSTITUTION AND PERMUTATION NETWORK

One of the simplest solutions to provide these properties is called the substitution-permutation network (SPN). With an SPN structure, the substitution layer (also called non-linear layer) provides the confusion, while the permutation layer (also called linear layer) will provide the diffusion. Then, each of these layers are repeated a certain number of times (called rounds), until a certain security guarantee is achieved. Our proposal, Pyjamask, is based on this SPN structure.

THE SIDE-CHANNEL THREAT

In the late 1980’s, Kocher introduced the first physical attack against cryptographic hardware such as embedded systems (e.g your bank card). This kind of attack differs from the one in the classical model, namely the black-box model, where the adversary can only know the input and output of the crypto-system to recover its secret key. Through physical access to the device, the adversary can now obtain information on intermediate computations that directly depend on the secret key, and can hence break the underlying crypto-system. In particular, the adversary can exploit side channels of the hardware such as the computation time, the power consumption of the device, or its electro-magnetic emanations during the computation.

When an embedded device such as a smart card or an IoT device performs cryptographic operations, the underlying hardware will consume resources such as power, time, or emit electromagnetic radiations. Since the computations rely on the secret key of the device’s owner, each of these leakages will directly depend on the secret key, allowing a malicious attacker to exploit them and discover the secret information.

These attacks are particularly efficient and disastrous if one can easily gain access to the target device and utilise those side-channels. Famous target devices on which side-channel attacks were successful include credit cards, electronic ID, smartphones, or even famous car providers using wireless keys.

Amongst the existing countermeasures against physical attacks, one of the most widely used is high-order masking. It consists of splitting every sensitive variable in the cryptographic computation into several random values, denoted as masks, so that at the end of the computation the sum of all the masks is equal to the expected result. This is a sound countermeasure as it has been formally proven that the higher the number of masks, the more difficult it is for the attacker to recover the secret key. However, the use of this type of protection entails a considerable efficiency loss, making it often unusable for industrial solutions.

An attacker cannot recover the secret value if he discovers any subset of masked values, except all of them. The cryptographic operations are now evaluated on each of the masked values such that at the end of the computation, the original output is recovered by recombining each output.

Recent works, such as the ones I produced during my PhD, showed that with the use of some implementation strategies and verification tools, one can achieve significant performance gains allowing to reduce the gap between sound theoretical solutions and efficient secure implementations that are deployable on embedded systems.

PYJAMASK, THE BLOCK CIPHER

Pyjamask aims to provide symmetric (authenticated) encryption enjoying fast software implementations with high levels of security against side-channel attacks. To achieve this goal, Pyjamask has been designed to be as lightweight as possible in the presence of high-order masking in software, while still enjoying unmasked and/or hardware implementations with satisfying performances.

In the past years, it was shown that the best performance for high-order masked implementations are obtained through the use of bitslicing. This implementation strategy allows cryptographic operations to be evaluated with bitwise operations. In particular, the non-linear operations (which are the bottleneck in the presence of high-order masking) can be evaluated very efficently thanks to the high level of parallelism induced by this strategy. For this reason, the Pyjamask design team opted for a design that enjoys fast bitslice implementations in the presence of high-order masking. Specifically, we have favoured:

  • a minimal number of 32-AND operations for efficient implementation on 32-bit platforms,
  • a parallelisation degree to address 64-bit platforms and/or processor with vector instructions,
  • a design with reasonable performances for unmasked and/or hardware implementations,
  • and a design that relies on the well-studied SPN architecture (Sbox layer, linear diffusion layer, and bitwise key addition).

Pyjamask comes with two versions: the first is Pyjamask-96, which manipulates blocks of data that are 96 bits in length, and the second, Pyjamask-128, which manipulates blocks of data that are 128 bits in length. Both versions are based on the same building blocks. The following part will go into the more technical details of our proposal. If this is too technical and you are more interested in the results and connection with post-quantum cryptography, you can directly skip to the performance section.

MAIN FUNCTIONS

Substitution Layer

The nonlinear layer is composed of 32 parallel applications of a small substitution-box (Sbox), either a 3-bit or a 4-bit Sbox, which yield two instances of the cipher with either a 96-bit state or a 128-bit state. For each instance, the Sbox has the minimal cost in terms of AND gates, i.e., 3 and 4 respectively. This makes a nonlinear layer that can be evaluated with 3 or 4 bitwise AND operations in total.

S3 = [1, 3, 6, 5, 2, 4, 7, 0] and S4 = [2, 13, 3, 9, 7, 11, 10, 6, 14, 0, 15, 4, 8, 5, 1, 12].

Diffusion Layer

Since linear parts are virtually free in the masking world, the linear layer of the Pyjamask block cipher has been conceived to provide high diffusion by means of 32×32 binary matrices. Different matrices are used for the different 32-bit slices in order to avoid too much regularity. On the other hand, we chose to use circulant matrices to obtain acceptable performances for unmasked and/or hardware implementations.

M0 = cir ( [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0] )
M1 = cir ( [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1] )
M2 = cir ( [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1] )
M3 = cir ( [0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1] )

Key Addition

Bitwise addition of the key state (defined below) to the internal state. For Pyjamask-128, the full key state is XORed to the internal state. For Pyjamask-96, the 3 (out of 4) rows of the key state are XORed to the internal state.

KEY SCHEDULE

The two instances Pyjamask-96 and Pyjamask-128 share the same key schedule: the only difference is the size of the subkeys extracted from key state that are injected into the internal state during the AddRoundKey operations.

In both instances, the secret key consists of 128 bits. It is initially loaded into the 128-bit key state in the same ordering as the internal state. The first round key is obtained from the initial key state. Then each round key is obtained by applying three elementary transformations: MixColumns, MixAndRotateRows, and AddConstant.

PERFORMANCE

One of the most popular architecture for small embedded devices is the ARM architecture (for instance it is present in most of the smart phones). We decided to implement both Pyjamask 96 and 128 on a ARM Cortex M4 for this reason, for which we measured some performance results depending on the side-channel security requirements.

Number of Masks 4 8 16 32 64 128
Pyjamask-96 41 95 249 736 2419 8253
Pyjamask-128 51 119 316 945 3145 10858

Timings in kilocycles on ARM Cortex-M4 to encrypt (same result for decryption).

Number of Masks 4 8 16 32 64 128
Pyjamask-96 1.2 2.2 4.1 8.2 16.2 32.3
Pyjamask-128 1.2 2.2 4.1 8.2 16.2 32.9

RAM in kilobytes on ARM Cortex-M4 to encrypt (same result for decryption).

Number of Masks 4 8 16 32 64 128
Pyjamask-96 3712 5296 5320 2892 2896 2920
Pyjamask-128 4070 5776 5686 3158 3198 3198

Code size in bytes on ARM Cortex-M4 to encrypt (same result for decryption).

With such interesting performance and reasonable security levels, lightweight cryptography might as well be considered to speed up some of the Post-Quantum schemes. In fact, recent works have shown that for some of the candidates in NIST Post-Quantum competition, the efficiency bottleneck is in the symmetric operations. Some of our own researchers at PQShield, Dr. Howe and Dr. Saarinen, showed that by replacing standard primitives with lightweight primitives, post-quantum candidates could enjoy a significant speed-up. When considering the side-channel threat for those post-quantum candidates, using an efficient lightweight primitive such as Pyjamask could have even more impact on those post-quantum primitives.

Additional resources about Pyjamask, including its full specification, the reference implementation, etc., can be found on its dedicated website:

https://pyjamask-cipher.github.io/