Skip to content
BY 4.0 license Open Access Published by De Gruyter July 1, 2022

A deterministic algorithm for the discrete logarithm problem in a semigroup

  • Simran Tinani EMAIL logo and Joachim Rosenthal

Abstract

The discrete logarithm problem (DLP) in a finite group is the basis for many protocols in cryptography. The best general algorithms which solve this problem have a time complexity of O ( N log N ) and a space complexity of O ( N ) , where N is the order of the group. (If N is unknown, a simple modification would achieve a time complexity of O ( N ( log N ) 2 ) .) These algorithms require the inversion of some group elements or rely on finding collisions and the existence of inverses, and thus do not adapt to work in the general semigroup setting. For semigroups, probabilistic algorithms with similar time complexity have been proposed. The main result of this article is a deterministic algorithm for solving the DLP in a semigroup. Specifically, let x be an element in a semigroup having finite order N x . The article provides an algorithm, which, given any element y x , provides all natural numbers m with x m = y , and has time complexity O ( N x ( log N x ) 2 ) steps. The article also gives an analysis of the success rates of the existing probabilistic algorithms, which were so far only conjectured or stated loosely.

MSC 2010: 20M13; 68Q25; 94A60

1 Introduction

Let G be a group and assume x , y G are two elements of the group. We refer to x as the base element. The discrete logarithm problem (DLP) asks for the computation of an integer m Z (assuming such integers exist) such that x m = y . The DLP plays an important role in a multitude of algebraic and number theoretic cryptographic systems. Its use was introduced in the Diffie–Hellman protocol for public key exchange [1] and has since seen a tremendous amount of development, generalizations, and extensions [2]. Many modern-day systems for public key exchange use the DLP in a suitable group. The most commonly used groups have been the multiplicative group of finite fields and the group of points on an elliptic curve. The DLP in Jacobians of hyperelliptic curves and more general abelian varieties has also been studied extensively [3].

In this article, we compute complexities using group multiplications as one fundamental step. Thus, an exponentiation x e is performed in O ( log e ) steps. We will use the fact that for two lists of length n in which a match exists, a match can be found in O ( n log n ) steps using standard sorting and searching algorithms (for details, the interested reader may refer to ref. [4]). For a general finite group of order N , there exist algorithms that solve the DLP in O ( N log N ) steps. Such algorithms are said to produce a square root attack. The most well-known examples are Shank’s baby step-giant step algorithm [5] and the Pollard-Rho algorithm [6]. Note that Shank’s algorithm is a deterministic algorithm having time complexity O ( N log N ) and space complexity O ( N ) . In contrast, Pollard’s algorithm is a probabilistic algorithm having time complexity O ( N log N ) group multiplications and space complexity O ( 1 ) . If N is unknown, a simple modification of these algorithms would achieve a time complexity of O ( N ( log N ) 2 ) .

Elliptic curve groups have been widely implemented in practice since for a carefully selected elliptic curve group, the best known classical algorithm for solving DLP has running time O ( N log N ) , where N is the group order. This is in contrast to many other finite groups such as the multiplicative group of a finite field and the group of invertible matrices over a finite field where algorithms with subexponential running time are known [7].

In cryptography the Diffie–Hellman protocol using a finite group has been generalized to situations where the underlying problem is a DLP in a semigroup or even to situations where a semigroup acts on a set [8,9]. The interested reader will find more material in a recent survey by Goel et al. [10].

It is naturally interesting to ask whether the DLP also has a square root attack in more generalized structures such as semigroups. Here, we define a semigroup as any set of elements with an associative binary operation. Since the best algorithms for the DLP all make use of the existence of inverses, it is unclear whether they can be generalized to a semigroup. However, when a special type of semigroup element, called a torsion element, is used as the base, it turns out that the DLP is reducible in polynomial time to the DLP in a finite group. A torsion element is one whose powers eventually repeat to form a cycle, and will be defined more precisely in Section 2. This section also elaborates more on why the standard collision-based algorithms are not directly adaptable to the semigroup case. A semigroup in which every element is torsion is called a torsion semigroup.

The DLP in semigroups with a torsion base element, in a classical setting, was first discussed by Monico [11] in 2002, and later in a article by Banin and Tsaban [12] in 2016. While the discussion in the present article is entirely on classical algorithms, it is also worth mentioning the paper [13], where the authors independently provide a quantum algorithm that solves the DLP in a torsion semigroup.

Both the algorithm of Monico and the one of Banin and Tsaban are probabilistic and might fail with low probability. Furthermore, some of their methods are heuristic, dependent on an oracle or some additional assumption, and their success rates and expected number of steps are either conjectured or stated loosely. It is therefore of interest to come up with an algorithm which deterministically computes the discrete logarithm in a semigroup. In this regard, we like to make some analogy to the problem of determining if an integer is a prime number, a problem of great importance in cryptography. Nowadays, in practice the algorithm of Miller [14] and Rabin [15] has been implemented for many years. Still it was a great result when Agrawal et al. [16] came up with a deterministic polynomial time algorithm to achieve this goal.

A key step in finding the discrete logarithm in a semigroup is computing the cycle length of an element. Both the algorithms of refs [12] and [11] rely on computing some multiple of the cycle length, and then removing “extra” factors by taking greatest common divisors (gcd’s) until the cycle length is obtained. Once the cycle length value is obtained, the discrete logarithm may easily be computed with a few more simple steps. While Monico does not provide further elaboration on how this is done, the paper by Banin and Tsaban bridges this knowledge gap by showing how the problem is reduced to a DLP in a group once the cycle length and start values are known. Denote by N x the order of x (formally defined in Definition 4). The complexity of the algorithm in ref. [12] is O ( N x ( log N x ) 2 log log N x ) , and that of the one in ref. [11] is O ( N x ( log N x ) 2 ) . While both of the existing methods seem to succeed with high probability for practical values, we show that the process of taking successive gcd’s/factors is unnecessary, and that one can deterministically find the cycle length. The main contribution of this article will be a deterministic algorithm for computing the discrete logarithm of an element y in some semigroup S with respect to some torsion base element x S . The time complexity of our algorithm is O ( N x ( log N x ) 2 ) .

The article is structured as follows: After providing preliminaries and basic definitions in Section 2, we will analyse in Section 3 the success rates and expected number of steps involved in the probabilistic algorithms for cycle length by Banin and Tsaban (Algorithm 1) and Monico (Algorithm 3.1).

In Section 4, which is the main section of this article, we provide a deterministic algorithm to calculate the cycle length L x of a torsion element x of a semigroup and thus to also solve the DLP, without the use of an oracle. This algorithm has complexity O ( N x ( log N x ) 2 ) . For completeness, we will also demonstrate the use of Pohlig–Hellman algorithm [17] for a semigroup.

2 Preliminaries

A semigroup S is a set together with an associative binary operation. Like in group theory where a torsion group consists of elements of finite order only we define:

Definition 1

(Torsion element). Let S be a semigroup. An element x S is called a torsion element if the sub-semigroup x { x k k N } generated by x is finite. S is called a torsion semigroup if every x S is a torsion element.

Throughout the article the following definitions will be assumed:

Definition 2

(Cycle start). Let x S . The cycle start s x of x is defined as the smallest positive integer such that x s x = x b for some b N , b > s x .

Definition 3

(Cycle length). Let x S . The cycle length L x of x is defined as the smallest positive integer such that x s x + L x = x s x .

Definition 4

(Element order) Let x S . With notation as above, we define the order N x of x as the cardinality of the sub-semigroup x . Note that N x = s x + L x 1 .

Definition 5

(Semigroup DLP). Let S be a semigroup and x S . The semigroup DLP is defined as follows. Given y x { x k k N } , find all m N such that x m = y .

We state below a key result first proved in ref. [12].

Lemma 1

[12] Let S be a semigroup and x S be an element with cycle start s x . The set of powers G x = { x s x + k k 0 } of x forms a finite cyclic group. The identity element of G x is given by x t L x , where t is the minimum positive integer such that x t L x G x .

The following result is stated in ref. [11] in a slightly different formulation. We provide an equivalent proof based on the group structure of G x .

Lemma 2

[11] Let x S have cycle start s x and cycle length L x . For all integers n , m s x , we have x m = x n n m mod L x .

Proof

We can assume without loss of generality that n m , and so we can write n = m + k L x + u , with k 0 and 0 u < L x . First suppose that n m mod L x , i.e. u = 0 . Since m , n s x , we have x n = x m + k L x = x m .

Conversely, if x n = x m , write n 1 = n s x 0 , and m 1 = m s x 0 . We have

x s x + m 1 = x s x + n 1 = x s x + m 1 + k L x + u = x s x + m 1 + u .

Now, without loss of generality, m 1 s x , because if not, one can always increment m 1 and n 1 by multiples of L x until this happens. So, we can assume that x m 1 lies in G x and is thus invertible. We multiply by the inverse on both sides to finally obtain

x s x = x s x + u .

Thus, we must have u = 0 or n m mod L x , as required.□

Remark 1

It becomes clear from the above discussion that the standard collision-based algorithms for order and discrete log computations in a group do not adapt directly to a general semigroup. Collision-based algorithms for the computation of the order N of a group element x (for instance, see ref. [18]) are based on the principle that whenever N can be expressed in the form N = A B for non-negative integers A and B , the collision x A = x B always occurs. However, this principle does not work in a semigroup, where there are two independent components of the order. More specifically, for a semigroup element x with cycle length L x and cycle start s x , whenever L x may be expressed in the form A B for non-negative integers A and B , the equality x A = x B holds if and only if A , B s x . As an example, consider a semigroup element x with cycle length L x = 12 and cycle start s x = 5 . Then, L x = 15 3 , but x 15 x 3 . Thus without prior knowledge of the cycle start, the semigroup order N x or cycle length L x cannot directly be found using the same collision-based algorithms for groups.

Similarly, collision-based algorithms fail for discrete log computations in a semigroup. As an example, consider a semigroup element x with cycle length L x = 15 and cycle start s x = 10 , and suppose that the discrete log of y = x 5 is to be found. Then y x 6 = x 11 = x 26 is obtained as a collision. However, unlike in the group case, the conclusion y = x 26 6 = x 20 is wrong since x 5 x 20 . This happens because even though x is torsion and forms a cycle of powers, it is not invertible.

This concludes the prerequisite knowledge on torsion elements in semigroups. In the next section, we study the existing probabilistic algorithms for cycle lengths and analyse their assumptions, working, and complexities.

3 Existing probabilistic algorithms

3.1 Banin and Tsaban’s algorithm

In this section, we study the probabilistic algorithm described in ref. [12] for computing the cycle length of a torsion element in a semigroup. While the authors of the original article describe their theory only for torsion semigroups, it will become clear that the same discussion holds true for any semigroup when the base element chosen is torsion.

Let S be a semigroup and x be a torsion element of S . Let s x denote the cycle start of x and L x its cycle length. Then, recall from Lemma 1 that G x { x s x , x s x + 1 , , x s x + L x 1 } is a cyclic group, and that it has order L x . The authors of ref. [12] assume the availability of a “Discrete Logarithm Oracle” for the group G x , which returns values log x h for h G x . They state that these values need not be smaller than the group order but are polynomial in the size of G x and the element x . The representation of the identity in G x is unknown, and a method to compute inverses is not available.

The authors claim that the well-known algorithms for discrete logarithm computations in groups do not explicitly require inverses, or can easily be modified to work without the use of inverses. While it is true that these algorithms make use of mainly the existence of inverses rather than their explicit computation, we believe that the fact that easy modification is possible is not immediate without some justification. In fact, it will become clear in the later sections that the modified baby-step-giant-step algorithm devised by Monico [11] (and also the deterministic algorithm presented in Section 4) is a crucial and non-trivial part of any such modification.

We make the following observation from the proof of Lemma 1 found in ref. [12]. For any k 0 , denote by v k the smallest positive integer such that

v k L x 2 s x + k .

We then have x v k L x s x k G x and

(1) x s x + k x v k L x s x k = x v k L x = x t L x ,

so the inverse of the element x s x + k of G x is given by x v k ( L x ) s x k . In particular, the computation of inverses requires prior knowledge of the cycle start. As will be explained below, the cycle start may be computed only once the value of the cycle length is known, using a binary search. This explains why the authors insist that their discrete logarithm oracle does not need to use the computation of inverses.

Below, we describe Algorithm 1, which is the algorithm suggested in ref. [12] to compute the order of the group G x , i.e. the cycle length L x of x .

Algorithm 1: Banin–Tsaban algorithm for cycle length
Input A semigroup S and a torsion element x S ; a DLP oracle for groups
Output The cycle length L x of x
1: Initialize i , j , g , L x 1 , N s x + L x . Fix bounds r > 1 , s > 1 .
2: while j < s
1. Fix a random z { M / 2 , , M } and set h = x z .
2. while i < r
(a) Choose a random number k i > 0 .
(b) Use the DLP oracle to compute k i = log h ( h k i ) .
(c) Set g gcd j i ( k j k j ) = gcd ( gcd j < i ( k j k j ) , k i k i ) .
(d) Set i i + 1 .
3. end while
4. Set L x lcm ( L x , g ) , j j + 1 .
6: Return L x .

We first note that the authors state complexities in terms of L x , which are valid when the bound N for N x is known. If the algorithm fails for a value of N , the authors suggest to double N and try again. In this case, which we will assume from now on, we assert that the complexities need to be taken in terms of N x instead of L x . The oracle may be assumed to have the standard complexity of O ( N x log N x ) steps for discrete logarithm calculations. Step (2.2(c)) takes O ( log ( max j i ( k j k i ) ) = O ( log N x ) integer operations by the assumption on the oracle, which does not contribute to the total complexity. Thus, the total complexity of Step (2.2) comes from the oracle alone and is O ( N x log N x ) . Now, the authors of ref. [12] remark that r and s can be taken to satisfy r = O ( 1 ) and s = O ( log log N x ) . Thus, the total complexity is O ( log N x ) times the complexity of Algorithm 1, and thus O ( log log ( N x ) log N x ) times the complexity of Step (2.2). Therefore, we obtain the total complexity of O ( log log N x ( log N x ) 2 N x ) .

Finally, in Algorithm 2, we present the application of the binary search method to find the cycle start once L x is known. This algorithm is formulated as below for this purpose in ref. [12], though the idea to use a binary search is also originally mentioned in ref. [11].

Algorithm 2: Calculating cycle start (binary search)
Input A semigroup element x with cycle length L x
Output Cycle start s x of x
1: Initialize s x 1
2: while x s x + L x x s x do
s x 2 s x
3: end while
4: Set a s x / 2
5: while a s x 2
c ( a + s x ) / 2
if x c + L x x c then
a c
else
s x c
6: end while

Lemma 3

Let N x be the order of the element x . Then Algorithm 2 requires

O ( ( log N x ) 2 )

steps.

Proof

Each of Steps (2) and (5) of Algorithm 2 involves O ( log N x ) iterations. Each iteration, in turn, requires O ( log N x ) semigroup multiplications and one comparison. The total complexity is thus O ( ( log N x ) 2 ) .□

3.2 Monico’s algorithm

In his PhD thesis [11], Chris Monico provides a probabilistic algorithm (described below as Algorithm 3.1) that calculates the cycle length of an element in a finite ring of order N . This algorithm makes use of the multiplicative semigroup structure of the finite ring and of the availability of the explicit bound N for every cycle length, and is in fact applicable to any semigroup where such a bound N is available. In this subsection, we analyse this algorithm, provide a more concrete bound on its success rate, and compute its complexity in terms of N . We will discuss this algorithm in terms of torsion semigroups, as opposed to finite rings.

We first note that if L x > m and the table in Step (2) has repeated entries x q + i 1 m = x q + i 2 m , then numbers b 1 and b 2 may not exist below m . In this case, the algorithm needs to be modified to take g ( i 1 i 2 ) m . However, whenever this case does not arise, it can be shown that Steps (3) and (4) are always successful in finding a collision.

We further remark that in Step (6), the list of divisors of g is kept fixed, while g is updated to g / d whenever the condition is satisfied. In the subsequent steps, non-divisors of g / d can be immediately discarded. However, the end result depends on the order in which divisors are tested, which the algorithm does not mention explicitly. However, we note that it is, in fact, possible to restrict the testing to only the prime power divisors of g below B , and with this setting, the optimal performance is obtained by taking divisors in decreasing order. We will assume this set-up for the rest of the analysis. For completeness, we restate Algorithm 3.1 with the above clarifications in Algorithm 3.2.

Algorithm 3.1: Monico’s baby step-giant step for cycle length
Input A finite semigroup S with S = N and an element x S
Output The cycle length L x of x
1: Set m = N . Choose a prime q > N .
2: For 0 i m , compute and store in a table the pairs ( i ; x q + i m ) .
Sort the table by the second component.
3: Find the least positive integer b 1 such that x q + b 1 is in the table: x q + b 1 = x q + a 1 m . (Note: 0 < b 1 < m .)
4: Find the least positive integer b 2 such that x 2 q + b 2 is in the table: x 2 q + b 2 = x q + a 2 m . (Again, 0 < b 2 < m .)
5: Compute g = gcd ( a 1 m b 1 , a 2 m b 2 q ) .
6: For each divisor d of g below some bound B , do the following.
If x N + g / d = x N :
set g g / d ;
7: Output L x = g and stop.
Algorithm 3.2: Restated: Monico’s baby step-giant step for cycle length
Input A finite semigroup S with S = N and an element x S
Output The cycle length L x of x
1: Set m = N . Choose a prime q > N .
2: For 0 i m , compute and store in a table the pairs ( i ; x q + i m ) .
Sort the table by the second component. If a collision x q + i 1 m = x q + i 2 m occurs, set g = ( i 1 i 2 ) m and go to Step (6).
3: Find the least positive integer b 1 such that x q + b 1 is in the table: x q + b 1 = x q + a 1 m . (Note: 0 < b 1 < m .)
4: Find the least positive integer b 2 such that x 2 q + b 2 is in the table: x 2 q + b 2 = x q + a 2 m . (Again, 0 < b 2 < m .)
5: Compute g = gcd ( a 1 m b 1 , a 2 m b 2 q ) .
6: Fix a bound B and compute all the divisors of g below B . Denote these by d 1 > d 2 > > d r .
7: For i = 1 , , r , do the following.
If x N + g / d i = x N :
set g g / d i .
8: Output L x = g and stop.

Note that Step (2) involves O ( log N ) steps to compute x q and x m and another O ( N ) multiplications to compute x q , x q x m , x q x 2 m , , x q x m 2 . Step (3) involves at most m multiplications x q + 1 = x q x , x q + 1 x , , x q + m 1 , with complexity O ( N ) , and match-finding with the first list, with complexity O ( N log N ) with standard sorting and search algorithms. The same is true for Step (4). Step (5) has complexity O ( log max ( a 1 m b 1 , a 2 m b 2 q ) ) = O ( log N ) and so does not contribute to the overall complexity. Step (6) involves B iterations of a multiplication and an exponentiation x g / d , and thus has a time complexity of O ( B ( log g + 1 ) ) = O ( B log N ) multiplications.

In the original work, Monico states that the bound B of Algorithm 3.1 can always be chosen so that B < a 1 m b 1 . We remark that this claim does not hold in the current setting of the algorithm. For example, with a cycle length value of 4, and a 1 m b 1 = 104 , a 2 m b 2 q = 52 , we obtain g = 52 . If B < a 1 m b 1 = 104 < 11 , then we would only test divisors d below 11, and would never factor out 13 to obtain the true cycle length. For such a bound to work, one needs to modify the algorithm to test both divisors d and g / d in Step (6). However, we will show in Lemma 4 that it is almost always sufficient to take B to be a reasonably large fixed constant, thus the complexity of Step (6) can be counted as O ( log N ) , and does not contribute to the overall complexity. Thus, the overall time complexity is O ( N log N ) . If N is unavailable, the algorithm can also be modified to update the value of N step-by-step until a large enough value is found. In this case, Algorithm 3.1 has a total complexity of O ( N x ( log N x ) 2 ) .

Furthermore, Monico suggests a modification to the above algorithm, viz. to find several such a i and b i and compute all the gcd’s. It is clear that this suggestion is exactly the method used in Banin and Tsaban’s algorithm as discussed in Section 3.1.

We now analyse the probability of success. The algorithm first looks for collisions of the form x q + a 1 m = x q + b 1 . The working principle is that in this case, the cycle length L x divides a 1 m b 1 . Similarly, if also x q + a 2 m = x 2 q + b 2 then g = gcd ( a 1 m b 1 , a 2 m b 2 q ) is a multiple of L x .

So far, the process is essentially the same in both Algorithms 1 and 3.1: while the former uses a discrete logarithm oracle to obtain multiples of the cycle length, the latter directly finds these multiples by finding collisions. However, in Algorithm 3.1, we do not proceed with computing multiple factors of L x , but work with the fixed multiple g of L x , whereas in Algorithm 1 this multiple shrinks several times.

Algorithm 3.1 then proceeds by fixing a bound B and iterating over every number d below B to check if d g . If yes, it executes the next part, i.e. checks if x N + D / d = x N , and if this holds, it sets D D / d . Note that if the factorization of the number g is known (or if g can be factored in time negligible compared to O ( N ) , then we do not need this fixed bound B , and can instead iterate over every prime factor d of g . It is well-known that the number of prime factors of g counted with multiplicity is O ( log g ) , so Step (5) of the algorithm can find L x in O ( log N ) steps. However, in general, factoring g may be difficult, so we assume from here on that the algorithm proceeds by fixing a bound B for the divisors of g . Below we analyse the probability of the algorithm succeeding in terms of B and g .

Lemma 4

The probability that Algorithm 3.2 succeeds is bounded below by 1 1 B log g .

Proof

We write g = L x F for some number F and suppose that the algorithm fails. This means that there is a divisor, and hence also a prime power divisor of F , which the algorithm fails to factor out. Let p be a prime dividing F , α p denote its largest power dividing F , and β p be its largest power below the fixed bound B . So, we have p α p F , p α p + 1 F , p β p < B , p β p + 1 > B .

Note that the number of times the algorithm divides g by p is

i = 1 β p i = β p ( β p + 1 ) / 2 .

Since divisors are taken in decreasing order, we must have β p ( β p + 1 ) / 2 < α p if the algorithm fails. So, the algorithm succeeds as long as β p ( β p + 1 ) / 2 α p for every prime divisor p of F . Thus, the probability of success for the algorithm can be bounded below by

p g Prob β p ( β p + 1 ) 2 α p .

Write v p = β p ( β p + 1 ) 2 for simplicity. We may assume that g is a random multiple of L x below the bound B , so F is a random number in 1 , , B L x . We have,

Prob ( α p v p ) = 1 Prob ( p v p + 1 F ) = 1 B / L x p v p + 1 ( B / L x ) = 1 1 / p v p + 1 = 1 1 p β p ( β p + 1 ) 2 + 1 .

Hence, a lower bound for the probability of the algorithm’s success is

p F 1 1 p β p ( β p + 1 ) 2 + 1 .

Now, we have

p β p + 1 > B 1 p β p + 1 < 1 B 1 1 p β p ( β p + 1 ) 2 + 1 > 1 1 B β p 2 + 1 > 1 1 B .

We further make the following observation. Let ω ( n ) denote the number of distinct prime divisors of integer n (note, however, that the same statement also holds if counted with multiplicity). Then clearly, 2 ω ( n ) n , and so, taking logarithms, ω ( n ) log 2 n .

Collecting all the above results, we conclude that the probability of success Prob (success) of Algorithm 3.1 is bounded below as follows:

Prob (success) p F 1 1 B = 1 1 B ω ( F ) 1 1 B log F 1 1 B log g .

Note that this bound shows that Algorithm 3.1 is indeed successful with overwhelming probability, as conjectured by the author. For example, with B = 1 0 6 , even when g is several orders of magnitude larger than B , say g = 2 4,000 , the probability of success is greater than 99.6%, by the bound derived in Lemma 4.

4 Deterministic solution of the DLP

The solution of the DLP in a semigroup involves two parts: the calculation of the cycle length and start of the base element x , and the use of this value to find the discrete log.

4.1 Deterministic algorithm for cycle length computation

We now present our deterministic algorithm for the computation of the cycle length. It works by finding a suitable collision, and also guarantees finding the actual cycle length rather than just a multiple of it, in a fixed number of steps.

Algorithm 4: Deterministic algorithm for cycle length
Input A semigroup S and a torsion element x S . Assume N x is the order of x .
Output Cycle length L x of x
1: Initialize N 1 .
2: Set q N .
3: Compute, one by one, x N , x N + 1 , , x N + q and check for the equality x N = x N + j at each step j 1 . Store these values in a table as pairs ( N + j , x N + j ) , 0 j < q . If x N = x N + j for any j < q , then set L x j and end the process. If not, proceed to the next step.
4: For 0 i q , compute, one by one, the values x N + q , x N + 2 q , , x N + i q and at each step i , look for a match in the table of values calculated in Step (3).
5: Suppose that a match x N + i q = x N + j is found, and i is the smallest integer such that this happens. Set L x i q j and end the process.
6: If no match is found in Steps (3) or (5), set N 2 N and go back to Step (2).

Theorem 1

Let S be a semigroup and x S a torsion element with order N x . If an upper bound on N x is known, Algorithm 4 returns the correct value of the cycle length L x with

O ( N x ( log N x ) 2 )

steps. The total space complexity is O ( N x ) semigroup elements.

Proof

We first assume N max ( L x , s x ) and show that Steps (1)–(5) succeed in finding L x . We have q = N . If L x < q , then the equality x N = x N + L x is found in the first step and the statement of the theorem follows. Else if L x q , we can write uniquely

L x = i q j ,

for some positive integers i > 0 , 0 j < q . Now, we must have i q , because otherwise if i q + 1 , we would have

L x ( q + 1 ) q j > q 2 + q q = q 2 N ,

a contradiction.

We have

L x = i q j , 0 < i q , 0 j < q N + j + L x = N + i q x N + j = x N + j + L x = x N + i q ,

where the last step follows because N > s x by assumption. So, such a collision always occurs between elements of the two lists in the algorithm.

We now claim that for the smallest such integer i computed in Step (5) of Algorithm 4, L x = i q j .

To see this, let i be the smallest positive integer such that

x N + j = x N + i q .

Also let L x = i q j , 0 < i q , 0 j < q . We have already shown above that such integers i and j exist for our choice of N . By the definition of L x , we must have L x i q j . Now suppose that i > i . Then,

i q j ( i + 1 ) q j = i q + ( q j ) > i q i q j .

But, L x = i q j i q j , so we must have i q j = i q j . Since i > i , this means that

q ( i i ) q = ( j j ) < j ,

which is a contradiction because 0 j < q . So, we must have i = i , j = j . This proves the claim.

We have shown above that the algorithm finds the correct cycle length when N > max ( s x , L x ) . Since the algorithm doubles the value of N until a match is found, it always terminates and outputs the correct cycle length. We now look at the time complexity.

For a given N , Step (2) involves one exponentiation, or O ( log N ) multiplications to find x N and then at most another q = O ( N ) multiplications and equality checks for x N x , x N x 2 , , x N x q . This step also needs a storage space of at most q = O ( N ) elements. Step 5 needs one exponentiation or O ( log N ) multiplications to find x q , and then another q = O ( N ) multiplications to find x N + q x q , x N + q x 2 q , x N + q 2 . Finding matches in Steps (3) and (5) can be done in O ( q log q ) = O ( N log N ) comparisons with the use of sorting and efficient look-up methods. Thus, clearly, Steps (1)–(5) in Algorithm 4 have a total complexity of O ( N log N ) .

Moreover, the algorithm starts at N = 1 and doubles N until the cycle length is found, i.e. until N > max ( s x , L x ) . Thus, the number of times Steps (2)–(5) are performed is

log ( max ( L x , s x ) ) = O ( max ( log ( L x ) , log ( s x ) ) ) = O ( log N x )

Thus, the total number of steps involved is

O ( N x ( log N x ) 2 ) .

Clearly, Step (3) involves the storage of q = N = O ( max ( s x , L x ) ) = O ( N x ) elements, so this value gives the total space complexity. This completes the proof.□

Remark 2

If a bound N on the order N x is known a priori, then Algorithm 4 can clearly be completed in a single round, with time complexity O ( N ( log N ) ) .

Remark 3

For the case of a group, there exist better algorithms for the computation of the order of an element even when the total group order is unbounded. For instance, Algorithm 3.3 in ref. [18] uses a growth function d ( t ) , which generalizes the square root function used above, to compute the order N of a group element x , and achieves time and space complexities of O ( N ) , thus eliminating the additional log N multiplier introduced by the method in Algorithm 4.

However, this method fails when used for a general semigroup due to the presence of two independent unknown components of the order. To see this, note that the algorithm would need to be modified for a semigroup as follows. At stage t , one has g ( t 1 ) N x < g ( t ) . On the completion of the baby steps, one has a table with the powers x g ( t ) , x g ( t ) + 1 , , x g ( t ) + b ( t ) (the addition of g ( t ) is necessary in the semigroup case to ensure that the loop is entered). The giant steps compute x g ( t ) + g ( t 1 ) + b ( t ) , x g ( t ) + g ( t 1 ) + 2 b ( t ) , , x g ( t ) + g ( t 1 ) + d ( t ) b ( t ) = x 2 g ( t ) . Now, while N x is guaranteed to have a unique expression as g ( t 1 ) + i b ( t ) j with 0 < i d ( t ) and 0 j b ( t ) , this does not necessarily lead to a collision. In fact, if b ( t ) < L x < g ( t 1 ) and 2 L x > g ( t ) = g ( t 1 ) + d ( t ) b ( t ) , then neither the baby steps nor the giant steps leads to a collision, and the cycle length is never found (note that this can happen only if L x > s x ). Moreover, if a collision x g ( t ) + g ( t 1 ) + i b ( t ) = x g ( t ) + j is obtained in the giant step phase, the only conclusion that can be drawn is that L x g ( t 1 ) + i b ( t ) j . If instead we forced the condition g ( t 1 ) N x < g ( t ) , a collision again may never occur because there is no control on the cycle start (For instance, in matrix semigroups over finite simple semirings, the cycle start is often found to be much larger than the cycle length. In such cases, adapting group-based algorithms would fail.) See Remark 1 for further details.

4.1.1 Experimental results for cycle length computations

We used Algorithm 4 to compute cycle length values in several common semigroups, such as matrix semigroups over finite fields, matrix semigroups over the finite simple semiring S 20 (see ref. [19] for a construction and ref. [9] for the addition and multiplication tables), and the symmetric and alternating groups (where the cycle length is precisely the order of the element). We further used the obtained cycle lengths to compute the cycle start values using Algorithm 2. The working code may be found at https://github.com/simran-tinani/semigroup-cycle-length.

4.2 Solving the DLP once the cycle length is known

In this section, we demonstrate the solution of the DLP for a torsion element x in the semigroup S once the cycle length is known. As before let N x be the order of the sub-semigroup x , let L x be the cycle length of the torsion element x (which we assume is already computed), and let y x be an element.

In ref. [12], the authors demonstrate the next steps in solving for log x ( y ) , via a reduction to a DLP in the group G x , once L x and s x are known. The procedure is described in Algorithm 5, which has been adapted from the original formulation in ref. [12].

Algorithm 5: Algorithm for discrete logarithm
Input A semigroup S , a torsion element x S , with cycle length L x and cycle start s x , and y S with y = x m
Output The discrete logarithm m of y with base x
1: Compute t = s x L x and define x = x t L x + 1 G x .
2: Find the minimum number 0 b t such that y = y x b L x G x using binary search.
3: Use Shank’s baby step–giant step algorithm for the group x G x to compute m { 0 , 1 , , L x 1 } such that ( x ) m = y .
4: Find the maximum number c 0 such that x ( t L x + 1 ) m c L x G x using binary search.
5: Return m = m ( t L x + 1 ) ( b + c ) L x .

Since authors of ref. [12] do not give an explicit proof of correctness Step (5) in Algorithm 5, we provide it in Theorem 2. Before this, we will need the following technical result.

Lemma 5

Let L x be the cycle length of x S , and n , a , and a be fixed positive integers. Suppose that x b L x + n = x a G x , where b is the minimum number such that x b L x + n G x , and x n c L x = x a G x , where c the maximum number such that x n c L x G x . Then

b L x + n a and n c L x a .

Proof

First let x b L x + n = x a with b minimal such that x b L x + n G x . Suppose, to the contrary, that b L x + n > a . We must have, by the minimality of b , x ( b 1 ) L x + n G x , so ( b 1 ) L x + n < a .

But , x b L x + n = x a G x b L x + n a = k L x , k 1 ( b k ) L x + n = a x ( b k ) L x + n = x a G x , k 1 .

This is a contradiction to the minimality of b . So, b L x + n a . Now suppose that x x c L x = x a G x , with c maximal, and suppose that n c L x > a . We argue as above:

L x n c L x a n ( k + c ) L x = a , for some k 1 x n ( k + c ) L x = x a G x ,

which is a contradiction to the maximality of c . Thus, n c L x a .□

Theorem 2

Let S be a semigroup, x S a torsion element, and y x any element. Assume the cycle length L x and cycle start s x of x are known. Then Algorithm 5 returns the correct values of the discrete logarithm m = log x ( y ) in O ( L x + ( log N x ) 2 ) semigroup multiplications, with a required storage of O ( L x ) semigroup elements.

Proof

We use the notations of Algorithm 5 and also write n = log x y . We will show that the output m is equal to the correct discrete logarithm value n . Recall that we have a group G x , generated by x x t L x + 1 , and with identity x t L x . The parameter t is given by the formula t = s x L x . Inverses in G x can be computed in polynomial time using formula (1). Note that membership in G x can be tested with one equality check: y G x y x L x = y . There are now two cases:

  1. When y G x , we have b = 0 . Here, it is possible to use Shank’s baby step–giant step algorithm [5], which is a deterministic algorithm and requires O ( L x ) semigroup multiplications and storage space O ( L x ) , in order to compute log x ( y ) . This is done in Step (3). From this value, n = log x ( y ) is readily computed, as shown below. Note that in this case, log x ( y ) is determined modulo L x .

  2. When y G x , Algorithm 5 first computes, using binary search, the smallest power b of x L x such that the product y x b L x lies in the group G x , and then proceeds as in case 1 via the baby step–giant step algorithm to find the discrete logarithm m of y x b L x with base x (i.e. ( x ) m = y x b L x ). Note that in this case, the value of log x ( y ) is less than s x , and is thus determined uniquely in N . Again, the time and space complexity are both O ( L x ) .

In both aforementioned cases, we have the maximal value c such that x m ( t L x + 1 ) c L x G x , and so c L x + s x + 1 = N x + 1 , since m L x and t L x L x + s x . We also clearly have b t N x . Since the computations of both b and c are done via binary searches, they contribute O ( ( log N x ) 2 ) steps to the overall time complexity. Now,

x m ( t L x + 1 ) c L x = x m ( t L x + 1 ) = ( x ) m = x b L x + n .

Applying Lemma 5 to the above equation, we must have

m ( t L x + 1 ) c L x b L x + n , and b L x + n m ( t L x + 1 ) c L x .

Therefore, b L x + n = m ( t L x + 1 ) c L x , or n = m ( t L x + 1 ) ( b + c ) L x , which is precisely equal to m , the value returned by Algorithm 5. Thus, m = n . This completes the proof.□

Combining Theorem 1, Lemma 3, and Theorem 2 we arrive at the main proposition of the article:

Proposition 1

Let S be a semigroup, x S a torsion element, and y x any element. The discrete logarithm m = log x ( y ) can be computed deterministically in

O ( N x ( log N x ) 2 )

steps, with a required storage of O ( N x ) semigroup elements.

Proof

For the solution, one begins by finding L x . This can be done using Algorithm 4 and according to Theorem 1 this requires O ( N x ( log N x ) 2 ) steps and the storage of O ( N x ) elements.

By Lemma 3 the computation of the cycle start s x is achieved in O ( ( log N x ) 2 ) semigroup multiplications, which does not contribute to the overall cost of the algorithm.

By Theorem 2, the discrete logarithm m can then be retrieved using Algorithm 5, in O ( ( log N x ) 2 + L x ) steps, with a required storage of O ( L x ) semigroup elements.

As L x N x , the overall complexity is dominated by the computation of the cycle length, and the proof of the result is now clear.□

4.3 Solving the DLP once the factorization of the cycle length is known

We mentioned in the introduction that for a general group of order N the best general known algorithms for solving the DLP have complexity O ( N ) operations.

In case the order N has a prime factorization into small primes there is the famous Pohlig–Hellman algorithm [17] for solving the DLP whose complexity is dominated by the largest prime factor in the integer factorization of N .

In case that we have available the integer factorization of the cycle length L x we can adapt the Pohlig–Hellman algorithm for groups to a Pohlig–Hellman algorithm for solving the DLP in a semigroup. Algorithm 6 represents this adapted Pohlig–Hellman algorithm.

Algorithm 6: Pohlig–Hellman algorithm for solving the DLP in a semigroup
Input A semigroup S , a torsion element x S , with cycle length L x = i = 1 r p i e i and cycle start s x , and y S with y = x m
Output The discrete logarithm m of y with base x
1: Compute t = s x L x and define x = x t L x + 1 G x .
2: Find the minimum number 0 b t such that y = y x b L x G x using binary search.
3: for i { 1 , , r }
1. Compute the values x i = ( x ) L x / p i e i , y i = ( y ) L x / p i e i , and γ i ( x i ) p e i 1 .
2. Calculate the inverse z i of x i in G x using (1).
3. Set k 0 and n 0 0 .
4. while k < e i do
(a) Compute y k = ( y i z i n k ) p e i 1 k γ i .
(b) Use Shank’s baby step-giant step algorithm for the group γ i G x to compute d k { 0 , 1 , , p i 1 } such that γ i d k = y k .
(c) Set n k + 1 n k + p i k d k , and k k + 1 .
5. end while
6. Set m i n e i .
4: end for
5: Use the Chinese Remainder Theorem to solve the congruence equations
m m i ( mod p i e i ) , i { 1 , , r }
uniquely for m mod L x . This gives the discrete logarithm of y with respect to the base x in the group G x .
6: Find the maximum number c 0 such that x ( t L x + 1 ) m c L x G x using binary search.
7: Return m = m ( t L x + 1 ) ( b + c ) L x .

Theorem 3

Let S be a semigroup, x S a torsion element, and y x any element. Assume the cycle start s x of x is known and assume the integer factorization of the cycle length L x is known to be L x = i = 1 r p i e i . Then Algorithm 6 computes the discrete logarithm log x y requiring O i = 1 r e i ( log L x + p i ) + ( log N x ) 2 steps. The space complexity of the algorithm consists in O i = 1 r e i p i semigroup elements.

Proof

Steps (1) and (2) are in analogy to the corresponding steps of Algorithm 5. Steps (3)–(5) represent the Pohlig–Hellman algorithm for groups with the implied complexity dominated by the largest prime factor p i of the integer factorization of L x (for a reference on Pohlig–Hellman in groups, see in [20, Theorem 2.32]). It follows that the running time of the algorithm is O i = 1 r e i ( log L x + p i ) steps. The computation of b and c requires in addition ( log N x ) 2 steps. The total space complexity is O i = 1 r e i p i semigroup elements and that completes the proof.□

5 Conclusion

The DLP in a finite group has noteworthy significance for cryptography, and so an extension of existing solutions to other algebraic structures like semigroups, where inverses may not be available, is of natural interest. In particular, the DLP in a semigroup has been discussed before in two places, namely refs [12] and [11]. Both these authors provide probabilistic generalizations of existing collision-based methods for the case of a semigroup. The time complexity of the algorithm in ref. [12] is O ( N x ( log N x ) 2 log log N x ) , and the one in ref. [11] is O ( N x ( log N x ) 2 ) . Both these methods rely on computing a multiple of the cycle length and then taking gcd’s or factors and could fail with a small probability that depends on the parameters chosen. In this article, we provided a deterministic solution of the semigroup DLP, which computes the cycle length directly and does not rely on finding a factor of it. The time complexity of our algorithm is O ( N x ( log N x ) 2 ) . We further demonstrated the application of the Pohlig–Hellman algorithm to semigroups. A direct consequence of our findings is that for cryptographic purposes, generalizing the type of algebraic structure for the DLP offers no additional advantage, at least in the torsion case, both in a classical and a quantum setting.

  1. Funding information: This research was supported by armasuisse Science and Technology. Joachim Rosenthal is also supported by Swiss National Science Foundation grant no. 188430.

  2. Conflict of interest: The authors state no conflict of interest.

References

[1] Diffie W, Hellman ME. New directions in cryptography. IEEE Trans Inform Theory 1976;IT-22(6):644–54. 10.1109/TIT.1976.1055638Search in Google Scholar

[2] Menezes AJ, van Oorschot PC, Vanstone SA. Handbook of applied cryptography. CRC Press Series on Discrete Mathematics and its Applications. Boca Raton, FL: CRC Press; 1997. Search in Google Scholar

[3] Cohen H, Frey G, Avanzi R, Doche C, Lange T, Nguyen K, et al., editors. Handbook of elliptic and hyperelliptic curve cryptography. In: Discrete Mathematics and its Applications (Boca Raton). Boca Raton, FL: Chapman and Hall/CRC; 2006. 10.1201/9781420034981Search in Google Scholar

[4] Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to algorithms. Cambridge, MA: MIT Press; 2009. Search in Google Scholar

[5] Shanks D. Class number, a theory of factorization, and genera. In: Proc. of Symp. Math. Soc., 1971. vol. 20, 1971. pp. 41–440. 10.1090/pspum/020/0316385Search in Google Scholar

[6] Pollard JM. Monte Carlo methods for index computation. Math Comput. 1978;32(143):918–24. Search in Google Scholar

[7] Adleman LM, DeMarrais J. A subexponential algorithm for discrete logarithms over all finite fields. Math Comp. 1993;61(203):1–15. 10.1007/3-540-48329-2_13Search in Google Scholar

[8] Kahrobaei D, Koupparis C, Shpilrain V. Public key exchange using matrices over group rings. Groups Complex Cryptol. 2013;5(1):97–115. 10.1515/gcc-2013-0007Search in Google Scholar

[9] Maze G, Monico C, Rosenthal J. Public key cryptography based on semigroup actions. Adv Math Commun. 2007;1(4):489–507. 10.3934/amc.2007.1.489Search in Google Scholar

[10] Goel N, Gupta I, Dass BK. Survey on SAP and its application in public-key cryptography. J Math Cryptol. 2020;14(1):144–52. 10.1515/jmc-2016-0004Search in Google Scholar

[11] Monico C. Semirings and semigroup actions in public-key cryptography. PhD thesis. University of Notre Dame Notre Dame; 2002. Search in Google Scholar

[12] Banin M, Tsaban B. A reduction of semigroup DLP to classic DLP. Des Codes Cryptograph. October 2016;81(1):75–82. 10.1007/s10623-015-0130-2Search in Google Scholar

[13] Childs AM, Ivanyos G. Quantum computation of discrete logarithms in semigroups. J Math Cryptol. 01 Dec 2014;8(4):405–16. 10.1515/jmc-2013-0038Search in Google Scholar

[14] Miller GL. Riemannas hypothesis and tests for primality. J Comput Sys Sci. 1976;13(3):300–17. 10.1016/S0022-0000(76)80043-8Search in Google Scholar

[15] Rabin MO. Probabilistic algorithm for testing primality. J Number Theory. 1980;12(1):128–38. 10.1016/0022-314X(80)90084-0Search in Google Scholar

[16] Agrawal M, Kayal N, Saxena N. PRIMES is in P. Ann Math (2) 2004;160(2):781–93. 10.4007/annals.2004.160.781Search in Google Scholar

[17] Pohlig S, Hellman M. An improved algorithm for computing logarithms over GF(p) and its cryptographic significance (corresp.). IEEE Trans Inform Theory. 1978;24(1):106–10. 10.1109/TIT.1978.1055817Search in Google Scholar

[18] Sutherland AV. Order computations in generic groups. PhD Thesis. Massachusetts Institute of Technology, 2007. Search in Google Scholar

[19] Zumbrägel J. Classification of finite congruence-simple semirings with zero. J Algebra Appl. 2008;7(3):363–77. 10.1142/S0219498808002862Search in Google Scholar

[20] Hoffstein J, Pipher J, Silverman JH. An introduction to mathematical cryptography. vol. 1. New York, NY: Springer; 2008. Search in Google Scholar

Received: 2021-06-13
Revised: 2022-06-08
Accepted: 2022-06-08
Published Online: 2022-07-01

© 2022 Simran Tinani and Joachim Rosenthal, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 20.9.2024 from https://www.degruyter.com/document/doi/10.1515/jmc-2021-0022/html
Scroll to top button