Post-quantum RSA讲解、R编程设计调试、辅导public key、R设计讲解 解析C/C++编程|辅导留学生 Statisti
- 首页 >> OS编程 Post-quantum RSA
This submission is from the following team, listed in alphabetical order:
Auxiliary submitters: There are no auxiliary submitters. The principal submitter is the
team listed above.
Inventors/developers: The inventors/developers of this submission are the same as the
principal submitter. Relevant prior work is credited below where appropriate.
Owner: Same as submitter.
Signature: ×. See also printed version of “Statement by Each Submitter”.
Document generated with the help of pqskeleton version 20171123.
1Contents
1 Introduction 5
2 General algorithm specification (part of 2.B.1) 5
2.1 Parameter space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Secret key and public key . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 KEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5 Public-key encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 List of parameter sets (part of 2.B.1) 9
3.1 Parameter set encrypt/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Parameter set encrypt/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Parameter set encrypt/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.4 Parameter set encrypt/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.5 Parameter set kem/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.6 Parameter set kem/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.7 Parameter set kem/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.8 Parameter set kem/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.9 Parameter set sign/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.10 Parameter set sign/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.11 Parameter set sign/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.12 Parameter set sign/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4 Design rationale (part of 2.B.1) 11
5 Detailed performance analysis (2.B.2) 11
5.1 Description of platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.2 Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.3 Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
25.4 How parameters affect performance . . . . . . . . . . . . . . . . . . . . . . . 13
5.5 Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
6 Expected strength (2.B.4) in general 14
6.1 Security definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
6.2 Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
7 Expected strength (2.B.4) for each parameter set 14
7.1 Parameter set encrypt/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.2 Parameter set encrypt/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.3 Parameter set encrypt/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.4 Parameter set encrypt/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.5 Parameter set kem/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.6 Parameter set kem/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.7 Parameter set kem/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.8 Parameter set kem/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.9 Parameter set sign/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.10 Parameter set sign/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.11 Parameter set sign/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.12 Parameter set sign/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
8 Analysis of known attacks (2.B.5) 15
8.1 Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
8.2 Factorization when factors are small . . . . . . . . . . . . . . . . . . . . . . . 18
8.3 Attacks without factorization . . . . . . . . . . . . . . . . . . . . . . . . . . 19
9 Advantages and limitations (2.B.6) 20
References 20
A Statements 21
A.1 Statement by Each Submitter . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3A.2 Statement by Patent (and Patent Application) Owner(s) . . . . . . . . . . . 25
A.3 Statement by Reference/Optimized Implementations’ Owner(s) . . . . . . . 26
41 Introduction
Join us as we dance madly on the lip of the volcano!
—John Oliver hypothesizing Apple’s view
of the difficulty of securing the iPhone
https://www.youtube.com/watch?v=zsjZ2r9Ygzw
Integer factorization is a security disaster. There is a long list of proposed RSA key sizes that
have been shown vulnerable to attack. And yet RSA remains standardized and remarkably
popular. Users switch to larger RSA key sizes and believe that they will be safe.
Is it clear that quantum attacks should be handled in a different way? More importantly, is
it clear that quantum attacks will be handled in a different way?
Post-quantum RSA (pqRSA) uses extremely large RSA keys to stop Shor’s algorithm. It
uses many relatively small secret primes, and small encryption/verification exponents, so
that computations with such large keys are affordable for the legitimate users. It still uses
secret primes large enough to resist ECM and quantum versions of ECM.
Perhaps there are better quantum algorithms for factorization, especially when the factors
are relatively small. This has not been adequately studied—but how many post-quantum
proposals have been adequately studied? Many other post-quantum proposals are more
efficient than pqRSA—but is efficiency a sign of strength, or a sign of weakness?
It is not difficult to envision many RSA users gradually slouching their way into becoming
pqRSA users. The cryptographic community should not ignore this possibility: it should
figure out whether the possibility is secure.
2 General algorithm specification (part of 2.B.1)
2.1 Parameter space
This pqRSA submission provides signatures, key encapsulation, and public-key encryption.
Each operation has two parameters: K, a power of 2; and B, a positive multiple of 8.
There are actually two different options for public-key encryption:
Use a generic conversion from the key-encapsulation mechanism into a public-key encryption
mechanism. For example, use the KEM to send a 32-byte session key, and
then use the session key with AES-256-GCM to encrypt and authenticate the message.
NIST has indicated that it will apply this conversion automatically.
Use the public-key encryption mechanism specified below. This is more complicated
but saves space.
52.2 Secret key and public key
Alice’s secret key consists of K distinct primes. Each prime is between 2B?1 and 2B, and is
congruent to 5 modulo 6. Specifically, Alice accumulates a list of K primes by repeating the
following steps:
Generate a B-bit integer as a (B/8)-byte random string interpreted in little-endian
form.
Set bits 0 and B ? 1, obtaining an odd integer p between 2B?1 and 2B.
If p mod 3 = 2, p is prime, and p is not in the list, then add p to the list.
If K is a significant fraction of the number of primes congruent to 5 modulo 6 between 2B?1
and 2B, then the “not in the list” condition significantly slows down this procedure. If K is
larger than the number of primes then the procedure does not terminate. For the parameter
sets in this submission, the “in the list” condition has negligible chance of occurring, and
the test can safely be skipped.
There is an extensive literature on primality testing, with various tradeoffs between simplicity,
speed, conjectured error probability, and provability. Users are expected to follow NIST
standards on this topic.
Our reference implementation simply uses GMP’s mpz_probab_prime_p function. It is easy
to artificially construct non-primes p that have a noticeable chance of passing this test, but
a random non-prime p becomes increasingly unlikely to pass this test as B grows. All of the
parameter sets below have B ≥ 512, and we conjecture that the chance of Alice’s key being
invalid is below 2128.
Alice multiplies these K primes to obtain her public key N, an integer between 2K(B1) and
2KB. The number of bytes needed for Alice’s N is called A; i.e., A is the unique integer such
that 256A1 ≤ N < 256A. Note that K(B 1)/8 < A ≤ KB/8.
The secret key is then encoded as a 3K(B/8)-byte string, namely the concatenation of the
following strings:
For each p: B/8 bytes representing p in little-endian form.
For each p: B/8 bytes representing (N/p)?1 mod p in little-endian form. (This is used
inside various standard methods to compute cube roots.)
K(B/8) bytes representing N in little-endian form.
The public key is also encoded as K(B/8) bytes representing N in little-endian form.
62.3 Signatures
Alice signs a message M as follows:
Generate a random 32-byte string R.
Compute Y = HA1(R, M). Here the hash function HA1 is A 1 bytes of output of
SHAKE256. Recall that A is the number of bytes in N.
Compute the integer Y represented by Y in little-endian form.
Compute the cube root X of Y modulo N.
Encode X in little-endian form as a K(B/8)-byte string X.
The signature is X followed by R. The signed message is the signature followed by M.
Bob verifies an alleged signature of a message M0 as follows:
Parse the alleged signature as a K(B/8)-byte string X0 followed by a 32-byte string Compute Y 0 = HA1(R0, M0).
Compute the integer X0 represented by X0 in little-endian form. Fail if X0 ≥ N.
Compute the integer Y 0 represented by Y 0 in little-endian form.
Check whether Y 0 = (X0)3 mod N.
2.4 KEM
Bob exchanges a secret session key with Alice as follows, given Alice’s public key N:
Generate a string of K(B/8) uniform random bytes.
Clear the last K(B/8) ? (A ? 1) bytes, obtaining a string r. Recall that A is the
number of bytes in N.
Compute the session key H32(r), where H32 means 32 bytes of output of SHAKE256.
Compute the integer r represented by r in little-endian form.
Compute C = r3 mod N.
Encode C in little-endian form as a K(B/8)-byte string C.
Send C as a ciphertext.
7Alice decapsulates C0 as follows:
Fail if C0 does not have length K(B/8).
Compute the integer C0 represented by C0 in little-endian form. Fail if C0 ≥ N.
Compute the cube root r0 of C0 modulo N.
Encode r0 in little-endian form as a K(B/8)-byte string r0
Compute the session key H32(r0).
2.5 Public-key encryption
The following encryption mechanism assumes that K(B ? 1) ≥ 776. This implies N ≥ 2776,
so A ≥ 98; recall that A is the number of bytes in N. Define α = A ? 65; then α ≥ 33.
Bob sends a secret `-byte message m to Alice as follows, given Alice’s public key N:
If ` < α: Define x0 as the α-byte string (m, 1, 0, . . . , 0). There are α ? 1 ? copies of
byte 0.
If ` ≥ α: Define m0 as the first α ? 33 bytes of m, and define m1 as the remaining
` (α 33) bytes of m. Generate a uniform random 32-byte string k, and define x0
as the α-byte string (m0, k, 2).
Generate a uniform random 32-byte string r.
Compute the α-byte string x1 = x0⊕Hα(r, 0). Here ⊕ means xor; r, 0 means r followed
by byte 0; and Hα means α bytes of output of SHAKE256.
Compute the 32-byte string x2 = H32(x0, r, 1). Here H32 means 32 bytes of output of
SHAKE256.
Compute the 32-byte string x3 = r ⊕ H32(x1, x2, 2).
Compute the (A ? 1)-byte string x = (x1, x2, x3).
Compute the integer x represented by x in little-endian form.
Compute C = x3 mod N.
Encode C in little-endian form as a K(B/8)-byte string C.
If ` < α: The ciphertext is C.
If ` ≥ α: The ciphertext is C followed by the AES-256-GCM ciphertext for m1 under
key k with nonce 0.
8Alice decrypts by reversing the above steps:
Define C0 as the first K(B/8) bytes of ciphertext. Fail if the ciphertext has fewer than
K(B/8) bytes.
Define C0 as the corresponding integer. Fail if C0 ≥ N.
Compute the cube root x0 of C0 modulo N. Fail if x0 ≥ 256A?1.
Encode x0 in little-endian form as an (A ? 1)-byte string x.
Parse x as (x1, x2, x 00 03) where x1, x2, x 00 0 have α, 32, 32 bytes respectively.
Define r0 as the 32-byte string x03 ⊕ H32(x1, x2, 2). 00
Define x0
0 as the α-byte string x01 ⊕ Hα(r0, 0).
Compute the 32-byte string H32(x0, r 0 0, 1). Fail if this string is not x0
2. If the ciphertext has exactly K(B/8) bytes: Parse x0
0 as a plaintext m0 followed by
byte 1 and some number of copies of byte 0. Fail if this parsing fails.
If the ciphertext has more than K(B/8) bytes: Parse x0, 2) where k0 has 32
bytes and m0
0 has α 33 bytes. Fail if this parsing fails. Use AES-256-GCM to verify
and decrypt the remaining bytes of ciphertext, obtaining m0
1; fail if this fails. Define a
plaintext m0 as (m00 , m0 ). 1
Note that, beyond the usual importance of constant-time computations for security, it is
particularly important to hide the differences between an x0 ≥ 2A?2 failure and an
H32(1, r0 0) failure. 0 , x
3 List of parameter sets (part of 2.B.1)
3.1 Parameter set encrypt/pqrsa15
PKE with K = 512 and B = 512.
3.2 Parameter set encrypt/pqrsa20
PKE with K = 16384 and B = 512.
3.3 Parameter set encrypt/pqrsa25
PKE with K = 262144 and B = 1024.
93.4 Parameter set encrypt/pqrsa30
PKE with K = 8388608 and B = 1024.
3.5 Parameter set kem/pqrsa15
KEM with K = 512 and B = 512.
3.6 Parameter set kem/pqrsa20
KEM with K = 16384 and B = 512.
3.7 Parameter set kem/pqrsa25
KEM with K = 262144 and B = 1024.
3.8 Parameter set kem/pqrsa30
KEM with K = 8388608 and B = 1024.
3.9 Parameter set sign/pqrsa15
Signatures with K = 512 and B = 512.
3.10 Parameter set sign/pqrsa20
Signatures with K = 16384 and B = 512.
3.11 Parameter set sign/pqrsa25
Signatures with K = 262144 and B = 1024.
3.12 Parameter set sign/pqrsa30
Signatures with K = 8388608 and B = 1024.
104 Design rationale (part of 2.B.1)
Shoup’s “Simple RSA”, also known as “RSA-KEM”, takes r as a uniform random integer
modulo N. We instead take uniform random r from a power-of-2 range, simplifying the
generation process, and more specifically a power-of-256 range, further simplifying the generation
process. The size of the range is at least N/256, so an algorithm to compute our r
given rE mod N has probability at least 1/256 of computing Shoup’s r given rE mod N.
We reuse the same range for x in public-key encryption, and for Y in signatures.
In the original RSA paper [9], the encryption/verification exponent E was a random number
with as many bits as N. Rabin in [8] suggested instead using a small constant E, and said
that E = 2 is “several hundred times faster.” A complication of E = 2 is that each square
has 2K different square roots mod N; E = 3 is about twice as slow for encryption but avoids
this complication. The subsequent literature has focused mainly on E = 3 and E = 65537.
There are attacks that compute various types of structured Eth roots more quickly than
factoring. Some of these attacks are specific to very small E, and historically this has led
to some preference for E = 65537 over E = 3. We instead treat the attacks as a reason to
never take Eth powers of structured inputs. There is then no known problem taking E = 3.
Shoup has also pointed out that the connection between “RSA-OAEP+” and computing
Eth roots is very tight for E = 3, but becomes looser as E grows. See [11]. This leads to
the following conclusions:
If computing 3rd roots is harder than computing 65537th roots, then breaking RSAOAEP+
for E = 3 is harder than breaking RSA-OAEP+ for E = 65537.
If computing 3rd roots is the same hardness as computing 65537th roots, then breaking
RSA-OAEP+ for E = 3 is at least as hard as breaking RSA-OAEP+ for E = 65537.
If computing 3rd roots is easier than computing 65537th roots, then breaking RSAOAEP+
for E = 3 could still be harder than breaking RSA-OAEP+ for E = 65537.
The public-key encryption system described above is intended to be an example of RSAOAEP+,
although the details need to be checked carefully.
5 Detailed performance analysis (2.B.2)
5.1 Description of platform
The following measurements were collected on one (otherwise idle) core of a computer named
samba. The CPU on samba is an Intel Xeon E3-1220 v5 (Skylake) running at 3 GHz. Turbo
Boost is disabled. samba has 64GB of RAM and runs Ubuntu 16.04, with gcc 5.4.0.
11NIST says that the “NIST PQC Reference Platform” is “an Intel x64 running Windows
or Linux and supporting the GCC compiler.” samba is an Intel x64 running Linux and
supporting the GCC compiler. Beware, however, that different Intel CPUs have different
cycle counts.
5.2 Time
The following measurements are for kem. encrypt and sign have essentially the same performance
as kem. There is a slight slowdown for the extra hashing in encrypt, and a measurable
slowdown for long messages.
Measurements were collected by the program shown in Figure 1, compiled with
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv. Various measurements
were also checked (with no obvious discrepancies) against results of
./do-part from supercop-20170904, with the compiler list reduced to just
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv, with reduced
values of LOOPS and TIMINGS, and with timeouts (killafter) extended.
pqrsa15: About 3.5 billion cycles (3483292516) for key generation; 17 million cycles
(17492210 17410534 17382984 17361047 17358893) for encapsulation; and 122 million cycles
(122127482 122462936 122079316 122135561 122018320) for decapsulation.
Compared to the 32768-byte key size, these are around 110000 cycles per byte, 530 cycles
per byte, and 3700 cycles per byte respectively. These figures are useful in understanding
how well pqRSA scales to larger key sizes.
pqrsa20: About 120 billion cycles (119047642299) for key generation; 1.1 billion cycles
(1071561548 1077606577 1076427117 1076293353 1076391885) for encapsulation; and 6.1
billion cycles (6123286512 6116529230 6119549109 6118574953 6118670863) for decapsulation.
Compared to the 1048576-byte key size, these are around 110000 cycles per byte, 1000 cycles
per byte, and 5800 cycles per byte respectively.
pqrsa25: About 18 trillion cycles (18177137014865) for key generation; 46 billion cycles
(46248174238 46222427697 46191747583 46242537978 46191225425) for encapsulation; and
520 billion cycles (519581069107 519569878218 519608774814 519612644254 519558611912)
for decapsulation.
Compared to the 33554432-byte key size, these are around 540000 cycles per byte, 1400
cycles per byte, and 15000 cycles per byte respectively. Note that primes are bigger here,
1024 bits instead of 512 bits.
pqrsa30: About 590 quadrillion cycles (586593568853135) for key generation; 1.8 quadrillion
cycles (1821539719905 1822281625179 1822120731196 1819092856762 1821360421361) for
encapsulation; and 22 quadrillion cycles (22294539298463 22296758019988 22296538833629
22300640944170 22296717126117) for decapsulation.
12Compared to the 1073741824-byte key size, these are around 550000 cycles per byte, 1700
cycles per byte, and 21000 cycles per byte respectively.
5.3 Space
Sizes are straightforwardly calculated from parameters (and confirmed in various experiments).
Specifically, keys are 215 bytes for pqrsa15, 220 bytes for pqrsa20, 225 bytes for
pqrsa25, and 230 bytes for pqrsa30. KEM ciphertexts have the same size as keys. PKE ciphertexts
have the same size as keys if the transmitted messages are short enough. Signatures
are 32 bytes longer.
5.4 How parameters affect performance
Encryption and signature verification involve a small number of modular multiplications.
Decryption and signature generation are slower, and key generation is even slower, but if B
is chosen sensibly then these slowdowns are by factors logarithmic in the number of bits in
the modulus.
For comparison, Shor’s algorithm involves a quantum modular exponentiation, which is a
large number of modular multiplications. See Section 8. The gap in costs grows with the
number of bits in the modulus. The growth is essentially linear, giving the legitimate user a
rapidly increasing advantage over the attacker as the user’s computer power increases.
5.5 Optimizations
Compared to the KEM, the PKE is more complicated but allows compression of encrypted
messages. If messages are close to the key size, or longer, then almost the entire traffic is
used for message contents.
Similarly, a slightly more complicated scheme for public-key signatures allows pqRSA signed
messages to be compressed. This submission skips this option for simplicity.
Checking primality of many independent uniform random integers is faster than checking
primality of each integer separately. This speedup is not in the reference software that we
are submitting, but it is already implemented in our experimental software.
See [3] for further discussion of various speedup techniques.
136 Expected strength (2.B.4) in general
6.1 Security definitions
The KEM and PKE are designed for IND-CCA2 security. The signature system is designed
for EUF-CMA security. See Section 7 for quantitative estimates of the security of specific
parameter sets.
6.2 Rationale
See Section 8 for an analysis of known attacks. This analysis also presents the rationale for
these security estimates.
7 Expected strength (2.B.4) for each parameter set
7.1 Parameter set encrypt/pqrsa15
Scaled-down version provided as a target for cryptanalysis.
7.2 Parameter set encrypt/pqrsa20
Scaled-down version provided as a target for cryptanalysis.
7.3 Parameter set encrypt/pqrsa25
Scaled-down version provided as a target for cryptanalysis.
7.4 Parameter set encrypt/pqrsa30
Category 2, assuming depth limit 264.
7.5 Parameter set kem/pqrsa15
Scaled-down version provided as a target for cryptanalysis.
147.6 Parameter set kem/pqrsa20
Scaled-down version provided as a target for cryptanalysis.
7.7 Parameter set kem/pqrsa25
Scaled-down version provided as a target for cryptanalysis.
7.8 Parameter set kem/pqrsa30
Category 2, assuming depth limit 264.
7.9 Parameter set sign/pqrsa15
Scaled-down version provided as a target for cryptanalysis.
7.10 Parameter set sign/pqrsa20
Scaled-down version provided as a target for cryptanalysis.
7.11 Parameter set sign/pqrsa25
Scaled-down version provided as a target for cryptanalysis.
7.12 Parameter set sign/pqrsa30
Category 2, assuming depth limit 264.
8 Analysis of known attacks (2.B.5)
8.1 Factorization
2048-bit RSA keys are widely standardized and deployed. This key size is typically estimated
to provide more than 2100 security against the number-field sieve, and even higher security
against other known non-quantum methods for integer factorization. However, (1) precise
15estimates vary; (2) it is not clear whether the security level is maintained against multiuser
attacks; and, most importantly, (3) post-quantum cryptography also considers quantum
algorithms. In particular, Shor’s quantum algorithm is believed to pose a serious threat to
2048-bit RSA keys.
pqRSA uses much larger RSA keys, slowing down Shor’s algorithm so as to reach any desired
security level. The number-field sieve scales much more poorly than Shor’s algorithm, so we
focus on the performance of Shor’s algorithm.
Naive analysis. The main bottleneck in Shor’s algorithm is computing an n-bit quantity
ae mod N. Here N is the public key; n is the number of bits in N; a is an integer, which
can safely be taken to be small; and e is a superposition of 2n-bit integers.
e Shor computes a mod N as follows: compute a, a2 mod N, a4 mod N, a8 mod N, etc.;
multiply these consecutively into a superposition of products, conditioned upon the bits
of e. For reversibility, each multiplication is followed by a corresponding multiplication
by 1/a mod N, 1/a2 mod N, 1/a4 mod N, 1/a8 mod N, etc. Overall there are about 8n
multiplications here, half of which are in superposition.
Each step, multiplying two n-bit integers modulo N, becomes increasingly expensive as n
grows. H¨aner–Roetteler–Svore [7] report approximately 32n2 lg n Toffoli gates (not counting
CNOT gates) for a reversible n-bit modular multiplication, and thus 64n3 lg n Toffoli gates
overall for Shor’s algorithm, using a total of 2n+2 qubits. For n = 233 this is approximately
2110 Toffoli gates using approximately 234 qubits.
If each Toffoli gate has comparable cost to 236 non-quantum gates then the total cost is
comparable to 2146 non-quantum gates, i.e., the cost of finding a SHA3-256 collision, the
definition of NIST’s “Category 2”.
Higher security: communication costs. The naive analysis above counts only the cost
of computation and not the cost of communication. This is important because the algorithm
is constantly communicating data across large distances.
Communication of a qubit—or merely a bit—costs energy proportional to the distance communicated.
Concretely, Intel states that at 22nm the energy cost of simply moving 8 bytes
of data is 11.20 pJ “per 5 mm” moved, and that this is “more difficult to scale down” than
computation cost; see [6, page 9]. Replacing wires with a different technology might save a
constant factor but does not eliminate the scaling difficulties: e.g., lasers spread out linearly
over distance. Even with quantum teleportation, there is a cost of the initial setup, namely
pushing EPR pairs from one place to another; the cost per bit transmitted again increases
linearly with the distance.
1/2+o(1) Storing 2n qubits, or merely 2n bits, involves distances at least n on any realistic
two-dimensional architecture. Architectures are two-dimensional because this allows energy
to arrive (and depart) through the third dimension:
16 Billions of transistors are spread across a two-dimensional chip, with only a few layers
in the third dimension, because otherwise nobody knows how to get the energy in and
out.
At a larger scale, nodes in a computer cluster are spread across two dimensions, with
only a few layers in the third dimension, because otherwise nobody knows how to get
the energy in and out.
There is a massive literature on real two-dimensional computations, and the costs of ac- 1/2 cessing n bits of memory are consistently at least n (times the feature sizes etc., which
are independent of n). The occasional papers on three-dimensional computations (e.g.,
[10]) do not seriously address the energy issues, and we have not found literature report- 1/2 ing experiments that can be plausibly interpreted as beating n . Similarly, every serious
quantum-computing proposal is limited to two dimensions.
Higher security: limits on latency. The naive analysis also makes the absurd assumption
that attackers can wait for roughly 2100 serial qubit operations.
NIST sensibly suggests that submissions consider attacks “restricted to a fixed running time,
or circuit depth.” NIST observes that 240 sequential qubit operations is “the approximate
number of gates that presently envisioned quantum computing architectures are expected
to serially perform in a year”, and that 264 sequential bit operations is “the approximate
number of gates that current classical computing architectures can perform serially in a
decade”.
Integer multiplication might seem trivial to parallelize with enough hardware: all n bits of
one input can be multiplied by all n bits of the other input in parallel; adding the resulting
n2 bits also allows massive parallelism; final carries can also be done in parallel, or skipped
with redundant representations of integers. However, this parallelism severely increases
communication costs. Specifically, handling n2 bits in parallel means that distances increase
1/2 1/2 to n, losing another factor n in communication costs beyond the factor n discussed
above.
Furthermore, the higher-level loop in Shor’s algorithm is quite difficult to parallelize. One
can use many more qubits to parallelize the multiplications by a, a2 mod N, a4 mod N,
a8 mod N, etc., but this does nothing to parallelize the initial computation of a, a2 mod N,
a4 mod N, a8 mod N, etc.
Knowing the factors of N allows parallel exponentiation, as pointed out by von zur Gathen
[13] and much later Zeugmann [14], but the problem facing the attacker is to figure out
those factors in the first place. Adleman and Kompella [1] suggested a parallel modularexponentiation
algorithm that is essentially a sieving-type discrete-logarithm algorithm run
in parallel, but this involves an intolerable amount of computation once n is moderately large.
Bernstein and Sorenson [4] slightly reduced the latency of modular exponentiation using a
polynomial number of processors in a simplified model of computation, but this incurs huge
costs in memory consumption (and in the total number of operations), presumably increasing
17communication latency in realistic models of computation.
For comparison, NIST appears to estimate that checking a guess for an AES-128 key takes
about 215 bit operations. These operations allow considerable parallelization, so a keysearch
core carrying out a sequence of 248 key guesses will fit comfortably within the 264
latency limit. A cluster of 280 such cores will find the AES-128 key within the same latency
limit. Distributing the target to 280 cores in the first place is a nontrivial communication
problem but will still fit within the same latency limit. Similar comments apply to NIST’s
“Category 2”, finding a SHA-256 collision: parallel collision search [12] involves negligible
communication costs even with massive parallelization.
The same reasonable latency limit does not appear to allow a search for an AES-256 key
with noticeable success probability: for high success probability one needs more than 2200
cores, but there is not enough time to communicate the target to so many cores. Does
NIST’s “Category 5” implicitly assume a higher latency limit? Or does it implicitly rely
upon unrealistic assumptions about communication costs? The lack of clear definitions
of NIST’s model of computation makes it difficult to evaluate whether pqRSA fits
Categories 3–5 with gigabyte keys. The security estimates above have thus been limited to
Category 2.
Lower security: improved algorithms. Attackers will take every possible opportunity
to save logarithmic factors and constant factors: for example, there are various techniques to
use somewhat shorter exponents in Shor’s algorithm. More importantly, the fastest known
n-bit multiplication methods take only n(log n)1+o(1) bit operations.
On the other hand, all integer-multiplication methods on realistic architectures are constrained
by the Brent–Kung area-time theorem [5], which states that a chip
√
containing A
parallel cores cannot compute n-bit integer multiplication in time below Θ(n/ A). Asymptotically,
all known factorization algorithms cost energy at least n2.5+o(1) and have latency
1.5+o(1) at least n .
Concretely, can an attacker break a gigabyte key within a reasonable latency limit (say a
year), while at the same time having the energy costs of non-quantum computation, nonquantum
communication, quantum computation, and quantum communication all below the
energy cost of finding collisions in SHA3-2
This submission is from the following team, listed in alphabetical order:
Auxiliary submitters: There are no auxiliary submitters. The principal submitter is the
team listed above.
Inventors/developers: The inventors/developers of this submission are the same as the
principal submitter. Relevant prior work is credited below where appropriate.
Owner: Same as submitter.
Signature: ×. See also printed version of “Statement by Each Submitter”.
Document generated with the help of pqskeleton version 20171123.
1Contents
1 Introduction 5
2 General algorithm specification (part of 2.B.1) 5
2.1 Parameter space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Secret key and public key . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 KEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5 Public-key encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 List of parameter sets (part of 2.B.1) 9
3.1 Parameter set encrypt/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Parameter set encrypt/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Parameter set encrypt/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.4 Parameter set encrypt/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.5 Parameter set kem/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.6 Parameter set kem/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.7 Parameter set kem/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.8 Parameter set kem/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.9 Parameter set sign/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.10 Parameter set sign/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.11 Parameter set sign/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.12 Parameter set sign/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4 Design rationale (part of 2.B.1) 11
5 Detailed performance analysis (2.B.2) 11
5.1 Description of platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.2 Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.3 Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
25.4 How parameters affect performance . . . . . . . . . . . . . . . . . . . . . . . 13
5.5 Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
6 Expected strength (2.B.4) in general 14
6.1 Security definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
6.2 Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
7 Expected strength (2.B.4) for each parameter set 14
7.1 Parameter set encrypt/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.2 Parameter set encrypt/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.3 Parameter set encrypt/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.4 Parameter set encrypt/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.5 Parameter set kem/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.6 Parameter set kem/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.7 Parameter set kem/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.8 Parameter set kem/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.9 Parameter set sign/pqrsa15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.10 Parameter set sign/pqrsa20 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.11 Parameter set sign/pqrsa25 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.12 Parameter set sign/pqrsa30 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
8 Analysis of known attacks (2.B.5) 15
8.1 Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
8.2 Factorization when factors are small . . . . . . . . . . . . . . . . . . . . . . . 18
8.3 Attacks without factorization . . . . . . . . . . . . . . . . . . . . . . . . . . 19
9 Advantages and limitations (2.B.6) 20
References 20
A Statements 21
A.1 Statement by Each Submitter . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3A.2 Statement by Patent (and Patent Application) Owner(s) . . . . . . . . . . . 25
A.3 Statement by Reference/Optimized Implementations’ Owner(s) . . . . . . . 26
41 Introduction
Join us as we dance madly on the lip of the volcano!
—John Oliver hypothesizing Apple’s view
of the difficulty of securing the iPhone
https://www.youtube.com/watch?v=zsjZ2r9Ygzw
Integer factorization is a security disaster. There is a long list of proposed RSA key sizes that
have been shown vulnerable to attack. And yet RSA remains standardized and remarkably
popular. Users switch to larger RSA key sizes and believe that they will be safe.
Is it clear that quantum attacks should be handled in a different way? More importantly, is
it clear that quantum attacks will be handled in a different way?
Post-quantum RSA (pqRSA) uses extremely large RSA keys to stop Shor’s algorithm. It
uses many relatively small secret primes, and small encryption/verification exponents, so
that computations with such large keys are affordable for the legitimate users. It still uses
secret primes large enough to resist ECM and quantum versions of ECM.
Perhaps there are better quantum algorithms for factorization, especially when the factors
are relatively small. This has not been adequately studied—but how many post-quantum
proposals have been adequately studied? Many other post-quantum proposals are more
efficient than pqRSA—but is efficiency a sign of strength, or a sign of weakness?
It is not difficult to envision many RSA users gradually slouching their way into becoming
pqRSA users. The cryptographic community should not ignore this possibility: it should
figure out whether the possibility is secure.
2 General algorithm specification (part of 2.B.1)
2.1 Parameter space
This pqRSA submission provides signatures, key encapsulation, and public-key encryption.
Each operation has two parameters: K, a power of 2; and B, a positive multiple of 8.
There are actually two different options for public-key encryption:
Use a generic conversion from the key-encapsulation mechanism into a public-key encryption
mechanism. For example, use the KEM to send a 32-byte session key, and
then use the session key with AES-256-GCM to encrypt and authenticate the message.
NIST has indicated that it will apply this conversion automatically.
Use the public-key encryption mechanism specified below. This is more complicated
but saves space.
52.2 Secret key and public key
Alice’s secret key consists of K distinct primes. Each prime is between 2B?1 and 2B, and is
congruent to 5 modulo 6. Specifically, Alice accumulates a list of K primes by repeating the
following steps:
Generate a B-bit integer as a (B/8)-byte random string interpreted in little-endian
form.
Set bits 0 and B ? 1, obtaining an odd integer p between 2B?1 and 2B.
If p mod 3 = 2, p is prime, and p is not in the list, then add p to the list.
If K is a significant fraction of the number of primes congruent to 5 modulo 6 between 2B?1
and 2B, then the “not in the list” condition significantly slows down this procedure. If K is
larger than the number of primes then the procedure does not terminate. For the parameter
sets in this submission, the “in the list” condition has negligible chance of occurring, and
the test can safely be skipped.
There is an extensive literature on primality testing, with various tradeoffs between simplicity,
speed, conjectured error probability, and provability. Users are expected to follow NIST
standards on this topic.
Our reference implementation simply uses GMP’s mpz_probab_prime_p function. It is easy
to artificially construct non-primes p that have a noticeable chance of passing this test, but
a random non-prime p becomes increasingly unlikely to pass this test as B grows. All of the
parameter sets below have B ≥ 512, and we conjecture that the chance of Alice’s key being
invalid is below 2128.
Alice multiplies these K primes to obtain her public key N, an integer between 2K(B1) and
2KB. The number of bytes needed for Alice’s N is called A; i.e., A is the unique integer such
that 256A1 ≤ N < 256A. Note that K(B 1)/8 < A ≤ KB/8.
The secret key is then encoded as a 3K(B/8)-byte string, namely the concatenation of the
following strings:
For each p: B/8 bytes representing p in little-endian form.
For each p: B/8 bytes representing (N/p)?1 mod p in little-endian form. (This is used
inside various standard methods to compute cube roots.)
K(B/8) bytes representing N in little-endian form.
The public key is also encoded as K(B/8) bytes representing N in little-endian form.
62.3 Signatures
Alice signs a message M as follows:
Generate a random 32-byte string R.
Compute Y = HA1(R, M). Here the hash function HA1 is A 1 bytes of output of
SHAKE256. Recall that A is the number of bytes in N.
Compute the integer Y represented by Y in little-endian form.
Compute the cube root X of Y modulo N.
Encode X in little-endian form as a K(B/8)-byte string X.
The signature is X followed by R. The signed message is the signature followed by M.
Bob verifies an alleged signature of a message M0 as follows:
Parse the alleged signature as a K(B/8)-byte string X0 followed by a 32-byte string Compute Y 0 = HA1(R0, M0).
Compute the integer X0 represented by X0 in little-endian form. Fail if X0 ≥ N.
Compute the integer Y 0 represented by Y 0 in little-endian form.
Check whether Y 0 = (X0)3 mod N.
2.4 KEM
Bob exchanges a secret session key with Alice as follows, given Alice’s public key N:
Generate a string of K(B/8) uniform random bytes.
Clear the last K(B/8) ? (A ? 1) bytes, obtaining a string r. Recall that A is the
number of bytes in N.
Compute the session key H32(r), where H32 means 32 bytes of output of SHAKE256.
Compute the integer r represented by r in little-endian form.
Compute C = r3 mod N.
Encode C in little-endian form as a K(B/8)-byte string C.
Send C as a ciphertext.
7Alice decapsulates C0 as follows:
Fail if C0 does not have length K(B/8).
Compute the integer C0 represented by C0 in little-endian form. Fail if C0 ≥ N.
Compute the cube root r0 of C0 modulo N.
Encode r0 in little-endian form as a K(B/8)-byte string r0
Compute the session key H32(r0).
2.5 Public-key encryption
The following encryption mechanism assumes that K(B ? 1) ≥ 776. This implies N ≥ 2776,
so A ≥ 98; recall that A is the number of bytes in N. Define α = A ? 65; then α ≥ 33.
Bob sends a secret `-byte message m to Alice as follows, given Alice’s public key N:
If ` < α: Define x0 as the α-byte string (m, 1, 0, . . . , 0). There are α ? 1 ? copies of
byte 0.
If ` ≥ α: Define m0 as the first α ? 33 bytes of m, and define m1 as the remaining
` (α 33) bytes of m. Generate a uniform random 32-byte string k, and define x0
as the α-byte string (m0, k, 2).
Generate a uniform random 32-byte string r.
Compute the α-byte string x1 = x0⊕Hα(r, 0). Here ⊕ means xor; r, 0 means r followed
by byte 0; and Hα means α bytes of output of SHAKE256.
Compute the 32-byte string x2 = H32(x0, r, 1). Here H32 means 32 bytes of output of
SHAKE256.
Compute the 32-byte string x3 = r ⊕ H32(x1, x2, 2).
Compute the (A ? 1)-byte string x = (x1, x2, x3).
Compute the integer x represented by x in little-endian form.
Compute C = x3 mod N.
Encode C in little-endian form as a K(B/8)-byte string C.
If ` < α: The ciphertext is C.
If ` ≥ α: The ciphertext is C followed by the AES-256-GCM ciphertext for m1 under
key k with nonce 0.
8Alice decrypts by reversing the above steps:
Define C0 as the first K(B/8) bytes of ciphertext. Fail if the ciphertext has fewer than
K(B/8) bytes.
Define C0 as the corresponding integer. Fail if C0 ≥ N.
Compute the cube root x0 of C0 modulo N. Fail if x0 ≥ 256A?1.
Encode x0 in little-endian form as an (A ? 1)-byte string x.
Parse x as (x1, x2, x 00 03) where x1, x2, x 00 0 have α, 32, 32 bytes respectively.
Define r0 as the 32-byte string x03 ⊕ H32(x1, x2, 2). 00
Define x0
0 as the α-byte string x01 ⊕ Hα(r0, 0).
Compute the 32-byte string H32(x0, r 0 0, 1). Fail if this string is not x0
2. If the ciphertext has exactly K(B/8) bytes: Parse x0
0 as a plaintext m0 followed by
byte 1 and some number of copies of byte 0. Fail if this parsing fails.
If the ciphertext has more than K(B/8) bytes: Parse x0, 2) where k0 has 32
bytes and m0
0 has α 33 bytes. Fail if this parsing fails. Use AES-256-GCM to verify
and decrypt the remaining bytes of ciphertext, obtaining m0
1; fail if this fails. Define a
plaintext m0 as (m00 , m0 ). 1
Note that, beyond the usual importance of constant-time computations for security, it is
particularly important to hide the differences between an x0 ≥ 2A?2 failure and an
H32(1, r0 0) failure. 0 , x
3 List of parameter sets (part of 2.B.1)
3.1 Parameter set encrypt/pqrsa15
PKE with K = 512 and B = 512.
3.2 Parameter set encrypt/pqrsa20
PKE with K = 16384 and B = 512.
3.3 Parameter set encrypt/pqrsa25
PKE with K = 262144 and B = 1024.
93.4 Parameter set encrypt/pqrsa30
PKE with K = 8388608 and B = 1024.
3.5 Parameter set kem/pqrsa15
KEM with K = 512 and B = 512.
3.6 Parameter set kem/pqrsa20
KEM with K = 16384 and B = 512.
3.7 Parameter set kem/pqrsa25
KEM with K = 262144 and B = 1024.
3.8 Parameter set kem/pqrsa30
KEM with K = 8388608 and B = 1024.
3.9 Parameter set sign/pqrsa15
Signatures with K = 512 and B = 512.
3.10 Parameter set sign/pqrsa20
Signatures with K = 16384 and B = 512.
3.11 Parameter set sign/pqrsa25
Signatures with K = 262144 and B = 1024.
3.12 Parameter set sign/pqrsa30
Signatures with K = 8388608 and B = 1024.
104 Design rationale (part of 2.B.1)
Shoup’s “Simple RSA”, also known as “RSA-KEM”, takes r as a uniform random integer
modulo N. We instead take uniform random r from a power-of-2 range, simplifying the
generation process, and more specifically a power-of-256 range, further simplifying the generation
process. The size of the range is at least N/256, so an algorithm to compute our r
given rE mod N has probability at least 1/256 of computing Shoup’s r given rE mod N.
We reuse the same range for x in public-key encryption, and for Y in signatures.
In the original RSA paper [9], the encryption/verification exponent E was a random number
with as many bits as N. Rabin in [8] suggested instead using a small constant E, and said
that E = 2 is “several hundred times faster.” A complication of E = 2 is that each square
has 2K different square roots mod N; E = 3 is about twice as slow for encryption but avoids
this complication. The subsequent literature has focused mainly on E = 3 and E = 65537.
There are attacks that compute various types of structured Eth roots more quickly than
factoring. Some of these attacks are specific to very small E, and historically this has led
to some preference for E = 65537 over E = 3. We instead treat the attacks as a reason to
never take Eth powers of structured inputs. There is then no known problem taking E = 3.
Shoup has also pointed out that the connection between “RSA-OAEP+” and computing
Eth roots is very tight for E = 3, but becomes looser as E grows. See [11]. This leads to
the following conclusions:
If computing 3rd roots is harder than computing 65537th roots, then breaking RSAOAEP+
for E = 3 is harder than breaking RSA-OAEP+ for E = 65537.
If computing 3rd roots is the same hardness as computing 65537th roots, then breaking
RSA-OAEP+ for E = 3 is at least as hard as breaking RSA-OAEP+ for E = 65537.
If computing 3rd roots is easier than computing 65537th roots, then breaking RSAOAEP+
for E = 3 could still be harder than breaking RSA-OAEP+ for E = 65537.
The public-key encryption system described above is intended to be an example of RSAOAEP+,
although the details need to be checked carefully.
5 Detailed performance analysis (2.B.2)
5.1 Description of platform
The following measurements were collected on one (otherwise idle) core of a computer named
samba. The CPU on samba is an Intel Xeon E3-1220 v5 (Skylake) running at 3 GHz. Turbo
Boost is disabled. samba has 64GB of RAM and runs Ubuntu 16.04, with gcc 5.4.0.
11NIST says that the “NIST PQC Reference Platform” is “an Intel x64 running Windows
or Linux and supporting the GCC compiler.” samba is an Intel x64 running Linux and
supporting the GCC compiler. Beware, however, that different Intel CPUs have different
cycle counts.
5.2 Time
The following measurements are for kem. encrypt and sign have essentially the same performance
as kem. There is a slight slowdown for the extra hashing in encrypt, and a measurable
slowdown for long messages.
Measurements were collected by the program shown in Figure 1, compiled with
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv. Various measurements
were also checked (with no obvious discrepancies) against results of
./do-part from supercop-20170904, with the compiler list reduced to just
gcc -march=native -mtune=native -O3 -fomit-frame-pointer -fwrapv, with reduced
values of LOOPS and TIMINGS, and with timeouts (killafter) extended.
pqrsa15: About 3.5 billion cycles (3483292516) for key generation; 17 million cycles
(17492210 17410534 17382984 17361047 17358893) for encapsulation; and 122 million cycles
(122127482 122462936 122079316 122135561 122018320) for decapsulation.
Compared to the 32768-byte key size, these are around 110000 cycles per byte, 530 cycles
per byte, and 3700 cycles per byte respectively. These figures are useful in understanding
how well pqRSA scales to larger key sizes.
pqrsa20: About 120 billion cycles (119047642299) for key generation; 1.1 billion cycles
(1071561548 1077606577 1076427117 1076293353 1076391885) for encapsulation; and 6.1
billion cycles (6123286512 6116529230 6119549109 6118574953 6118670863) for decapsulation.
Compared to the 1048576-byte key size, these are around 110000 cycles per byte, 1000 cycles
per byte, and 5800 cycles per byte respectively.
pqrsa25: About 18 trillion cycles (18177137014865) for key generation; 46 billion cycles
(46248174238 46222427697 46191747583 46242537978 46191225425) for encapsulation; and
520 billion cycles (519581069107 519569878218 519608774814 519612644254 519558611912)
for decapsulation.
Compared to the 33554432-byte key size, these are around 540000 cycles per byte, 1400
cycles per byte, and 15000 cycles per byte respectively. Note that primes are bigger here,
1024 bits instead of 512 bits.
pqrsa30: About 590 quadrillion cycles (586593568853135) for key generation; 1.8 quadrillion
cycles (1821539719905 1822281625179 1822120731196 1819092856762 1821360421361) for
encapsulation; and 22 quadrillion cycles (22294539298463 22296758019988 22296538833629
22300640944170 22296717126117) for decapsulation.
12Compared to the 1073741824-byte key size, these are around 550000 cycles per byte, 1700
cycles per byte, and 21000 cycles per byte respectively.
5.3 Space
Sizes are straightforwardly calculated from parameters (and confirmed in various experiments).
Specifically, keys are 215 bytes for pqrsa15, 220 bytes for pqrsa20, 225 bytes for
pqrsa25, and 230 bytes for pqrsa30. KEM ciphertexts have the same size as keys. PKE ciphertexts
have the same size as keys if the transmitted messages are short enough. Signatures
are 32 bytes longer.
5.4 How parameters affect performance
Encryption and signature verification involve a small number of modular multiplications.
Decryption and signature generation are slower, and key generation is even slower, but if B
is chosen sensibly then these slowdowns are by factors logarithmic in the number of bits in
the modulus.
For comparison, Shor’s algorithm involves a quantum modular exponentiation, which is a
large number of modular multiplications. See Section 8. The gap in costs grows with the
number of bits in the modulus. The growth is essentially linear, giving the legitimate user a
rapidly increasing advantage over the attacker as the user’s computer power increases.
5.5 Optimizations
Compared to the KEM, the PKE is more complicated but allows compression of encrypted
messages. If messages are close to the key size, or longer, then almost the entire traffic is
used for message contents.
Similarly, a slightly more complicated scheme for public-key signatures allows pqRSA signed
messages to be compressed. This submission skips this option for simplicity.
Checking primality of many independent uniform random integers is faster than checking
primality of each integer separately. This speedup is not in the reference software that we
are submitting, but it is already implemented in our experimental software.
See [3] for further discussion of various speedup techniques.
136 Expected strength (2.B.4) in general
6.1 Security definitions
The KEM and PKE are designed for IND-CCA2 security. The signature system is designed
for EUF-CMA security. See Section 7 for quantitative estimates of the security of specific
parameter sets.
6.2 Rationale
See Section 8 for an analysis of known attacks. This analysis also presents the rationale for
these security estimates.
7 Expected strength (2.B.4) for each parameter set
7.1 Parameter set encrypt/pqrsa15
Scaled-down version provided as a target for cryptanalysis.
7.2 Parameter set encrypt/pqrsa20
Scaled-down version provided as a target for cryptanalysis.
7.3 Parameter set encrypt/pqrsa25
Scaled-down version provided as a target for cryptanalysis.
7.4 Parameter set encrypt/pqrsa30
Category 2, assuming depth limit 264.
7.5 Parameter set kem/pqrsa15
Scaled-down version provided as a target for cryptanalysis.
147.6 Parameter set kem/pqrsa20
Scaled-down version provided as a target for cryptanalysis.
7.7 Parameter set kem/pqrsa25
Scaled-down version provided as a target for cryptanalysis.
7.8 Parameter set kem/pqrsa30
Category 2, assuming depth limit 264.
7.9 Parameter set sign/pqrsa15
Scaled-down version provided as a target for cryptanalysis.
7.10 Parameter set sign/pqrsa20
Scaled-down version provided as a target for cryptanalysis.
7.11 Parameter set sign/pqrsa25
Scaled-down version provided as a target for cryptanalysis.
7.12 Parameter set sign/pqrsa30
Category 2, assuming depth limit 264.
8 Analysis of known attacks (2.B.5)
8.1 Factorization
2048-bit RSA keys are widely standardized and deployed. This key size is typically estimated
to provide more than 2100 security against the number-field sieve, and even higher security
against other known non-quantum methods for integer factorization. However, (1) precise
15estimates vary; (2) it is not clear whether the security level is maintained against multiuser
attacks; and, most importantly, (3) post-quantum cryptography also considers quantum
algorithms. In particular, Shor’s quantum algorithm is believed to pose a serious threat to
2048-bit RSA keys.
pqRSA uses much larger RSA keys, slowing down Shor’s algorithm so as to reach any desired
security level. The number-field sieve scales much more poorly than Shor’s algorithm, so we
focus on the performance of Shor’s algorithm.
Naive analysis. The main bottleneck in Shor’s algorithm is computing an n-bit quantity
ae mod N. Here N is the public key; n is the number of bits in N; a is an integer, which
can safely be taken to be small; and e is a superposition of 2n-bit integers.
e Shor computes a mod N as follows: compute a, a2 mod N, a4 mod N, a8 mod N, etc.;
multiply these consecutively into a superposition of products, conditioned upon the bits
of e. For reversibility, each multiplication is followed by a corresponding multiplication
by 1/a mod N, 1/a2 mod N, 1/a4 mod N, 1/a8 mod N, etc. Overall there are about 8n
multiplications here, half of which are in superposition.
Each step, multiplying two n-bit integers modulo N, becomes increasingly expensive as n
grows. H¨aner–Roetteler–Svore [7] report approximately 32n2 lg n Toffoli gates (not counting
CNOT gates) for a reversible n-bit modular multiplication, and thus 64n3 lg n Toffoli gates
overall for Shor’s algorithm, using a total of 2n+2 qubits. For n = 233 this is approximately
2110 Toffoli gates using approximately 234 qubits.
If each Toffoli gate has comparable cost to 236 non-quantum gates then the total cost is
comparable to 2146 non-quantum gates, i.e., the cost of finding a SHA3-256 collision, the
definition of NIST’s “Category 2”.
Higher security: communication costs. The naive analysis above counts only the cost
of computation and not the cost of communication. This is important because the algorithm
is constantly communicating data across large distances.
Communication of a qubit—or merely a bit—costs energy proportional to the distance communicated.
Concretely, Intel states that at 22nm the energy cost of simply moving 8 bytes
of data is 11.20 pJ “per 5 mm” moved, and that this is “more difficult to scale down” than
computation cost; see [6, page 9]. Replacing wires with a different technology might save a
constant factor but does not eliminate the scaling difficulties: e.g., lasers spread out linearly
over distance. Even with quantum teleportation, there is a cost of the initial setup, namely
pushing EPR pairs from one place to another; the cost per bit transmitted again increases
linearly with the distance.
1/2+o(1) Storing 2n qubits, or merely 2n bits, involves distances at least n on any realistic
two-dimensional architecture. Architectures are two-dimensional because this allows energy
to arrive (and depart) through the third dimension:
16 Billions of transistors are spread across a two-dimensional chip, with only a few layers
in the third dimension, because otherwise nobody knows how to get the energy in and
out.
At a larger scale, nodes in a computer cluster are spread across two dimensions, with
only a few layers in the third dimension, because otherwise nobody knows how to get
the energy in and out.
There is a massive literature on real two-dimensional computations, and the costs of ac- 1/2 cessing n bits of memory are consistently at least n (times the feature sizes etc., which
are independent of n). The occasional papers on three-dimensional computations (e.g.,
[10]) do not seriously address the energy issues, and we have not found literature report- 1/2 ing experiments that can be plausibly interpreted as beating n . Similarly, every serious
quantum-computing proposal is limited to two dimensions.
Higher security: limits on latency. The naive analysis also makes the absurd assumption
that attackers can wait for roughly 2100 serial qubit operations.
NIST sensibly suggests that submissions consider attacks “restricted to a fixed running time,
or circuit depth.” NIST observes that 240 sequential qubit operations is “the approximate
number of gates that presently envisioned quantum computing architectures are expected
to serially perform in a year”, and that 264 sequential bit operations is “the approximate
number of gates that current classical computing architectures can perform serially in a
decade”.
Integer multiplication might seem trivial to parallelize with enough hardware: all n bits of
one input can be multiplied by all n bits of the other input in parallel; adding the resulting
n2 bits also allows massive parallelism; final carries can also be done in parallel, or skipped
with redundant representations of integers. However, this parallelism severely increases
communication costs. Specifically, handling n2 bits in parallel means that distances increase
1/2 1/2 to n, losing another factor n in communication costs beyond the factor n discussed
above.
Furthermore, the higher-level loop in Shor’s algorithm is quite difficult to parallelize. One
can use many more qubits to parallelize the multiplications by a, a2 mod N, a4 mod N,
a8 mod N, etc., but this does nothing to parallelize the initial computation of a, a2 mod N,
a4 mod N, a8 mod N, etc.
Knowing the factors of N allows parallel exponentiation, as pointed out by von zur Gathen
[13] and much later Zeugmann [14], but the problem facing the attacker is to figure out
those factors in the first place. Adleman and Kompella [1] suggested a parallel modularexponentiation
algorithm that is essentially a sieving-type discrete-logarithm algorithm run
in parallel, but this involves an intolerable amount of computation once n is moderately large.
Bernstein and Sorenson [4] slightly reduced the latency of modular exponentiation using a
polynomial number of processors in a simplified model of computation, but this incurs huge
costs in memory consumption (and in the total number of operations), presumably increasing
17communication latency in realistic models of computation.
For comparison, NIST appears to estimate that checking a guess for an AES-128 key takes
about 215 bit operations. These operations allow considerable parallelization, so a keysearch
core carrying out a sequence of 248 key guesses will fit comfortably within the 264
latency limit. A cluster of 280 such cores will find the AES-128 key within the same latency
limit. Distributing the target to 280 cores in the first place is a nontrivial communication
problem but will still fit within the same latency limit. Similar comments apply to NIST’s
“Category 2”, finding a SHA-256 collision: parallel collision search [12] involves negligible
communication costs even with massive parallelization.
The same reasonable latency limit does not appear to allow a search for an AES-256 key
with noticeable success probability: for high success probability one needs more than 2200
cores, but there is not enough time to communicate the target to so many cores. Does
NIST’s “Category 5” implicitly assume a higher latency limit? Or does it implicitly rely
upon unrealistic assumptions about communication costs? The lack of clear definitions
of NIST’s model of computation makes it difficult to evaluate whether pqRSA fits
Categories 3–5 with gigabyte keys. The security estimates above have thus been limited to
Category 2.
Lower security: improved algorithms. Attackers will take every possible opportunity
to save logarithmic factors and constant factors: for example, there are various techniques to
use somewhat shorter exponents in Shor’s algorithm. More importantly, the fastest known
n-bit multiplication methods take only n(log n)1+o(1) bit operations.
On the other hand, all integer-multiplication methods on realistic architectures are constrained
by the Brent–Kung area-time theorem [5], which states that a chip
√
containing A
parallel cores cannot compute n-bit integer multiplication in time below Θ(n/ A). Asymptotically,
all known factorization algorithms cost energy at least n2.5+o(1) and have latency
1.5+o(1) at least n .
Concretely, can an attacker break a gigabyte key within a reasonable latency limit (say a
year), while at the same time having the energy costs of non-quantum computation, nonquantum
communication, quantum computation, and quantum communication all below the
energy cost of finding collisions in SHA3-2