This section discusses factoring polynomials using arithmetic modulo prime numbers. Information was used from D. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, and J.H. Davenport et al., Computer Algebra: Systems and Algorithms for Algebraic Computation.
A simple factorization algorithm is developed for univariate polynomials. This algorithm is implemented as the function BinaryFactors. The algorithm was named the binary factoring algorithm since it determines factors of a polynomial modulo 2^n for successive values of n, effectively adding one binary digit to the solution in each iteration. No reference to this algorithm has been found so far in the literature.
Berlekamp showed that polynomials can be efficiently factored when arithmetic is done modulo a prime. The Berlekamp algorithm is only efficient for small primes; Hensel lifting can then be used to determine the factors modulo larger numbers.
The algorithm presented here is similar in approach to applying the Berlekamp algorithm to factor modulo a small prime, and then factoring modulo powers of this prime (using the solutions found modulo the small prime by the Berlekamp algorithm) by applying Hensel lifting. However, it is simpler in its setup. It factors modulo 2 by trying all possible factors modulo 2 (two possibilities, if the polynomial is monic). This performs the same action usually left to the Berlekamp step. After that, given a solution f _i modulo 2^n, it tests whether f _i or f _i+2^n is a solution modulo 2^(n+1).
This scheme raises the precision of the solution by one binary digit per iteration. This is similar to the linear Hensel lifting algorithm, which factors modulo p^n for some prime p, where n increases by one after each iteration. There is also a quadratic version of Hensel lifting which factors modulo p^(2^n), in effect doubling the number of digits (in the p-adic expansion) of the solution after each iteration. However, according to Davenport, the quadratic algorithm is not necessarily faster.
The algorithm presented here should thus be equivalent in complexity to the linear version of Hensel lifting. This has not been verified yet.
Arithmetic modulo an integer p requires performing the arithmetic operation and afterwards reducing the result modulo p. A number x can be written as
When Mod(x,p)=Mod(y,p), the notation Mod(x=y,p) is used; all arithmetic calculations are then done modulo the integer p.
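As a small numeric illustration (a Python sketch, not part of the Yacas implementation), the decomposition of x into a multiple of p plus the remainder Mod(x,p) can be checked as follows:

    p = 7
    x = 23
    n, r = divmod(x, p)     # x can be written as n*p + Mod(x,p)
    assert x == n * p + r
    assert r == x % p       # r is Mod(x,p)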
For calculations modulo some p the following rules hold:
For polynomials v _1(x) and v _2(x) it further holds that
An interesting corollary to this is that, for some prime integer p:
into a form
Where f _i(x) are irreducible polynomials of the form:
The part that could not be factorized is returned as g(x), with a possible constant factor C.
The factors f _i(x) and g(x) are determined uniquely by requiring them to be monic. The constant C accounts for a common factor.
The c _i constants in the resulting solutions f _i(x) can be rational numbers (or even complex numbers, if Gaussian integers are used).
The polynomial now only has integer coefficients.
The polynomial is now a monic polynomial in y.
After factoring, the irreducible factors of p(x) can be obtained by multiplying C by 1/a _n^(n-1), and replacing y with a _n*x. The irreducible solutions a _n*x+c _i can be replaced by x+c _i/a _n after multiplying C by a _n, converting the factors to monic factors.
After the steps described here the polynomial is now monic with integer coefficients, and the factorization of this polynomial can be used to determine the factors of the original polynomial p(x).
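As an illustration of this transformation, here is a minimal Python sketch (the helper name to_monic_integer is chosen for this example only); it applies the substitution y = a _n*x and multiplies through by a _n^(n-1):

    def to_monic_integer(coeffs):
        # coeffs[i] is the x^i coefficient of p(x); a_n = coeffs[-1].
        # Returns the coefficients of a_n^(n-1)*p(y/a_n), which is monic
        # in y = a_n*x and has integer coefficients.
        n = len(coeffs) - 1
        a_n = coeffs[-1]
        new = [coeffs[i] * a_n**(n - 1 - i) for i in range(n)]
        new.append(1)   # the leading term a_n*x^n becomes y^n
        return new

    # Example: p(x) = 2*x^2 + 3*x + 1 becomes y^2 + 3*y + 2
    print(to_monic_integer([1, 3, 2]))   # prints [2, 3, 1]

Factoring y^2+3*y+2=(y+1)*(y+2) and substituting y=2*x back gives (2*x+1)*(2*x+2); after dividing by a _n^(n-1)=2 this recovers p(x)=(2*x+1)*(x+1).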
for some polynomial d(x) to be divided by, modulo some integer p. d(x) is said to divide p(x) (modulo p) if r(x) is zero; it is then a factor modulo p.
For the binary factoring algorithm it is important that if some monic d(x) divides p(x), then it also divides p(x) modulo some integer p.
Define deg(f(x)) to be the degree of f(x) and lc(f(x)) to be the leading coefficient of f(x). Then, if deg(p(x))>=deg(d(x)), one can compute an integer s such that
If p is prime, then
Because Mod(a^(p-1)=1,p) for any a not divisible by p (Fermat's little theorem). If p is not prime but d(x) is monic (and thus lc(d(x))=1),
This identity can also be used when dividing in general (not modulo some integer), since the divisor is monic.
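As a numeric illustration of the prime case (a hedged Python sketch; the names lc_r and lc_d stand for the leading coefficients of r(x) and d(x)):

    # For prime p the inverse of lc(d(x)) modulo p follows from Fermat's
    # little theorem: a^(p-1) = 1 (mod p), so a^(p-2) is the inverse of a.
    p = 13
    lc_r, lc_d = 9, 5
    s = lc_r * pow(lc_d, p - 2, p) % p
    assert (s * lc_d - lc_r) % p == 0   # s*lc(d(x)) = lc(r(x)) (mod p)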
The quotient can then be updated by adding a term:
term=s*x^(deg(p(x))-deg(d(x)))
and updating the polynomial to be divided, p(x), by subtracting d(x)*term. The resulting polynomial to be divided has a degree at least one smaller than before.
When the degree of p(x) is less than the degree of d(x) it is returned as the remainder.
A full division algorithm for arbitrary integer p>1 with lc(d(x))=1 would thus look like:
    divide(p(x),d(x),p)
       q(x) = 0
       r(x) = p(x)
       while (deg(r(x)) >= deg(d(x)))
          s = lc(r(x))
          term = s*x^(deg(r(x))-deg(d(x)))
          q(x) = q(x) + term
          r(x) = r(x) - term*d(x) mod p
       return {q(x),r(x)}
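The same algorithm as a runnable Python sketch (the helper name divide_mod and the coefficient-list representation are choices made for this illustration, not part of the Yacas implementation):

    def divide_mod(pc, dc, p):
        # Divide p(x) by a monic d(x) with coefficient arithmetic modulo p.
        # Coefficient lists are little-endian: pc[i] is the x^i coefficient.
        def trim(v):                  # drop leading zero coefficients
            while len(v) > 1 and v[-1] == 0:
                v.pop()
            return v
        q = [0] * max(len(pc) - len(dc) + 1, 1)
        r = trim([c % p for c in pc])
        while len(r) >= len(dc) and r != [0]:
            s = r[-1]                 # s = lc(r(x)); no inverse needed, d is monic
            k = len(r) - len(dc)      # deg(r(x)) - deg(d(x))
            q[k] = s
            for i, c in enumerate(dc):
                r[i + k] = (r[i + k] - s * c) % p   # r(x) = r(x) - s*x^k*d(x) mod p
            trim(r)
        return q, r

For instance, divide_mod([2, 3, 1], [1, 1], 8) divides x^2+3*x+2 by x+1 modulo 8, returning the quotient x+2 as [2, 1] and a zero remainder.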
The reason we can get away with factoring modulo 2^n as opposed to factoring modulo some prime p in later sections is that the divisor d(x) is monic. Its leading coefficient is one and thus q(x) and r(x) can be uniquely determined. If p is not prime and lc(d(x)) is not equal to one, there might be multiple combinations for which p(x)=q(x)*d(x)+r(x), and we are interested in the combinations where r(x) is zero. This can be costly to determine unless q(x),r(x) is unique. This is the case here because we are factoring a monic polynomial, and are thus only interested in cases where lc(d(x))=1.
It will be factored into a form:
where all factors f _i(x) are monic also.
The algorithm starts by setting up a test polynomial, p _test(x) which divides p(x), but has the property that
Such a polynomial is said to be square-free. It has the same factors as the original polynomial, but while the original might contain each factor multiple times, p _test(x) contains each factor only once.
The square-free part of a polynomial can be obtained as follows:
It can be seen by simply writing this out that p(x) and D(x)p(x) will have factors f _i(x)^(p _i-1) in common. These can thus be divided out.
It is not a requirement of the algorithm that the polynomial being worked with is square-free, but it speeds up computations to work with the square-free part of the polynomial if only the set of factors is sought. The multiplicity of the factors can be determined using the original p(x).
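The computation of the square-free part can be sketched with sympy (an illustration in Python, not the Yacas code):

    from sympy import symbols, gcd, div, expand

    x = symbols('x')
    p = expand((x + 1)**4 * (x - 3)**2)
    # dividing out gcd(p, p') removes the common factors f_i(x)^(p_i - 1)
    p_test, _ = div(p, gcd(p, p.diff(x)), x)
    print(p_test.factor())   # (x - 3)*(x + 1): each factor exactly once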
Binary factoring then proceeds by trying to find potential solutions modulo p=2 first. There can only be two such solutions: x+0 and x+1.
A list L of potential solutions is set up.
If an element in L divides p _test(x), p _test(x) is divided by it, and a loop is entered to test how often it divides p(x), to determine the multiplicity p _i of the factor. The found factor f _i(x)=x+c _i is added as a combination (x+c _i, p _i), and p(x) is divided by f _i(x)^p _i.
At this point there is a list L of factors that divide p _test(x) modulo 2^n. This implies that for each of the elements u in L, either u or u+2^n should divide p _test(x) modulo 2^(n+1). The following step is thus to set up a new list with new elements that divide p _test(x) modulo 2^(n+1).
The loop is re-entered, this time doing the calculation modulo 2^(n+1) instead of modulo 2^n.
The loop is terminated if the number of factors found equals deg(p _test(x)), if 2^n becomes larger than the smallest non-zero coefficient of p _test(x) (as this smallest non-zero coefficient is the product of the smallest non-zero coefficients of the factors), or if the list of potential factors becomes empty.
The remaining polynomial p(x) cannot be factored any further, and is added as a factor (p(x), 1).
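The lifting loop for linear factors can be sketched in Python as follows (the helper lift_roots is hypothetical; it tracks candidate roots r, corresponding to factors x-r, and uses a fixed number of bits instead of the termination criteria above):

    def lift_roots(coeffs, bits):
        # coeffs[i] is the x^i coefficient of p(x); returns candidate
        # roots modulo 2^bits, adding one binary digit per iteration.
        def eval_mod(r, m):            # p(r) modulo m, by Horner's rule
            acc = 0
            for c in reversed(coeffs):
                acc = (acc * r + c) % m
            return acc
        # the two possible solutions modulo 2
        roots = [r for r in (0, 1) if eval_mod(r, 2) == 0]
        for n in range(1, bits):
            m = 1 << (n + 1)           # 2^(n+1)
            # a root r modulo 2^n lifts to r or r + 2^n modulo 2^(n+1)
            roots = [s for r in roots for s in (r, r + (1 << n))
                     if eval_mod(s, m) == 0]
        return roots

For p(x)=(x-3)*(x+1)=x^2-2*x-3, lift_roots([-3, -2, 1], 3) returns [3, 7], the roots 3 and -1 modulo 2^3.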
The function BinaryFactors, when implemented, yields the following interaction in Yacas:
    In> BinaryFactors((x+1)^4*(x-3)^2)
    Out> {{x-3,2},{x+1,4}}
    In> BinaryFactors((x-1/5)*(2*x+1/3))
    Out> {{2,1},{x-1/5,1},{x+1/6,1}}
    In> BinaryFactors((x-1123125)*(2*x+123233))
    Out> {{2,1},{x-1123125,1},{x+123233/2,1}}
The binary factoring algorithm starts with a factorization modulo 2, and then each time tries to guess the next bit of the solution, maintaining a list of potential solutions. This list can grow exponentially in certain instances. For instance, factoring (x-a)*(x-2*a)*(x-3*a)*... implies that the roots have common factors. There are inputs where the number of potential solutions (almost) doubles with each iteration, and for these inputs the algorithm becomes exponential. The worst-case performance is therefore exponential. The list of potential solutions while iterating will contain a lot of false roots in that case.
For the initial solutions modulo 2, the possible solutions are x and x-1. For p=0, rem(0)=a _0; for p=1, rem(1)=Sum(i,0,n,a _i).
Given a solution x-p modulo q=2^n, we consider as possible solutions Mod(x-p,2^(n+1)) and Mod(x-(p+2^n),2^(n+1)).
x-p is a possible solution if Mod(rem(p),2^(n+1))=0.
x-(p+q) is a possible solution if Mod(rem(p+q),2^(n+1))=0. Expanding Mod(rem(p+q),2*q) yields:
When expanding this expression, some terms grouped under extra(p,q) have factors like 2*q or q^2. Since q=2^n, these terms vanish if the calculation is done modulo 2^(n+1).
The expression for extra(p,q) then becomes
An efficient approach to determining if x-p or x-(p+q) divides p(x) modulo 2^(n+1) is then to first calculate Mod(rem(p),2*q). If this is zero, x-p divides p(x). In addition, if Mod(rem(p)+extra(p,q),2*q) is zero, x-(p+q) is a potential candidate.
Other efficiencies are derived from the fact that the operations are done in binary. E.g., if q=2^n, then q _next=2^(n+1)=2*q=q<<1 is used in the next iteration. Also, calculations modulo 2^n are equivalent to performing a bitwise AND with 2^n-1. These operations can in general be performed efficiently on today's hardware, which is based on binary representations.
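These bit-level identities are easy to check in Python:

    n = 5
    q = 1 << n                    # q = 2^n
    assert q << 1 == 2 * q        # q_next = 2^(n+1) = q shifted left by one bit
    x = 1234567
    assert x % q == x & (q - 1)   # reduction modulo 2^n as a bitwise AND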
For this to work, the division algorithm would have to be extended to handle complex numbers a+b*I with integer a and b, modulo some integer, and the initial setup of the potential solutions would have to be extended to also try x+1+I and x+I. The step where new potential solutions modulo 2^(n+1) are determined should then also test for x+I*2^n and x+2^n+I*2^n.
The same extension could be made for multivariate polynomials, although setting up the initial irreducible polynomials that divide p _test(x) modulo 2 might become expensive if done on a polynomial with many variables (2^(2^m-1) trials for m variables).
Lastly, polynomials with real-valued coefficients could be factored, if the coefficients were first converted to rational numbers. However, for real-valued coefficients there exist other methods (Sturm sequences).
Newton iteration is based on the following idea: one takes the Taylor series expansion of a function:
Newton iteration then proceeds by taking only the first two terms in this series: the constant term plus the term linear in dx. Given some good initial value x _0 close to a root, the function is assumed to be almost linear near x _0, hence this approximation. Under these assumptions, if we want f(x _0+dx) to be zero,
This yields:
And thus a next, better approximation for the root is x[1]:=x[0]-f(x[0])/(D(x)f(x[0])), or more generally:
If the root has multiplicity one, a Newton iteration converges quadratically, meaning that the number of correct decimal digits doubles with each iteration.
As an example, we can try to find a root of Sin(x) near 3, which should converge to Pi.
Setting precision to 30 digits,
    In> Precision(30)
    Out> True;
We first set up a function dx(x):
    In> dx(x):=Eval(-Sin(x)/(D(x)Sin(x)))
    Out> True;
And we start with a good initial approximation to Pi, namely 3. Note that we should set x after we define dx(x), as the right-hand side of the function definition is evaluated. We could also have used a different parameter name for the definition of the function dx(x).
    In> x:=3
    Out> 3;
We can now start the iteration:
    In> x:=N(x+dx(x))
    Out> 3.142546543074277805295635410534;
    In> x:=N(x+dx(x))
    Out> 3.14159265330047681544988577172;
    In> x:=N(x+dx(x))
    Out> 3.141592653589793238462643383287;
    In> x:=N(x+dx(x))
    Out> 3.14159265358979323846264338328;
    In> x:=N(x+dx(x))
    Out> 3.14159265358979323846264338328;
As shown, in this example the iteration converges quite quickly.
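For comparison, the same iteration in double-precision Python (a sketch, not the Yacas session above):

    from math import sin, cos

    # Newton step for f(x) = Sin(x): x -> x - sin(x)/cos(x), starting near 3
    x = 3.0
    for _ in range(4):
        x = x - sin(x) / cos(x)
        print(x)   # 3.142546..., 3.141592653300..., then pi to machine precision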
Given N functions in N variables, we want to solve
for i=1..N. If we denote by X the vector $$ X := x[1],x[2],...,x[N] $$
and by dX the delta vector, then one can write
Setting f _i(X+dX) to zero, one obtains
where
and
So the generalization is to first initialize X to a good initial value, calculate the matrix elements a[i][j] and the vector b[i], and then to proceed to calculate dX by solving the matrix equation, and calculating
In the case of one function of one variable, the summation reduces to one term and the set of linear equations is trivial. In the general case, this set of linear equations has to be solved in each iteration.
As an example, suppose we want to find the zeroes for the following two functions:
and
It is clear that the solution to this is a=2 and x=N*Pi/2 for any integer value N.
We will do calculations with precision 30:
    In> Precision(30)
    Out> True;
And set up a vector of functions $f_1(X),f_2(X)$ where $X:=a,x$
    In> f(a,x):={Sin(a*x),a-2}
    Out> True;
Now we set up a function matrix(a,x) which returns the matrix a[i][j]:
    In> matrix(a,x):=Eval({D(a)f(a,x),D(x)f(a,x)})
    Out> True;
We now set up some initial values:
    In> {a,x}:={1.5,1.5}
    Out> {1.5,1.5};
The iteration converges much more slowly for this example, so we will loop 100 times:
    In> For(ii:=1,ii<100,ii++)[{a,x}:={a,x}+\
        N(SolveMatrix(matrix(a,x),-f(a,x)));]
    Out> True;
    In> {a,x}
    Out> {2.,0.059667311457823162437151576236};
The value for a has already been found. Iterating a few more times:
    In> For(ii:=1,ii<100,ii++)[{a,x}:={a,x}+\
        N(SolveMatrix(matrix(a,x),-f(a,x)));]
    Out> True;
    In> {a,x}
    Out> {2.,-0.042792753588155918852832259721};
    In> For(ii:=1,ii<100,ii++)[{a,x}:={a,x}+\
        N(SolveMatrix(matrix(a,x),-f(a,x)));]
    Out> True;
    In> {a,x}
    Out> {2.,0.035119151349413516969586788023};
The value for x converges much more slowly this time, and to the uninteresting value of zero (a rather trivial zero of this set of functions). In fact, for all integer values N the value N*Pi/2 is a solution. Trying various initial values will find them.
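For comparison, the iteration for this example can be sketched with numpy (an illustration only, not the Yacas implementation; the Jacobian matrix is entered by hand):

    import numpy as np

    # one generalized Newton step: solve J(X)*dX = -f(X), then X := X + dX
    def newton_step(X):
        a, x = X
        f = np.array([np.sin(a * x), a - 2.0])
        J = np.array([[x * np.cos(a * x), a * np.cos(a * x)],   # df1/da, df1/dx
                      [1.0,               0.0]])                # df2/da, df2/dx
        return X + np.linalg.solve(J, -f)

    X = np.array([1.5, 1.5])
    for _ in range(100):
        X = newton_step(X)
    print(X)   # a converges to 2; x drifts slowly toward a multiple of Pi/2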
Applying a Newton iteration step g[i+1]=g[i]-h(g[i])/(D(g)h(g[i])) to this expression yields:
von zur Gathen then proves by induction that for f(x) with f(0)=1, given the initial value g _0(x)=1,
Example:
suppose we want to find the polynomial g(x) up to the 7th degree for which Mod(f(x)*g(x)=1,x^8), for the function
First we define the function f:
    In> f:=1+x+x^2/2+x^3/6+x^4/24
    Out> x+x^2/2+x^3/6+x^4/24+1;
And initialize g and i.
    In> g:=1
    Out> 1;
    In> i:=0
    Out> 0;
Now we iterate, increasing i, and replacing g with the new value for g:
    In> [i++;g:=BigOh(2*g-f*g^2,x,2^i);]
    Out> 1-x;
    In> [i++;g:=BigOh(2*g-f*g^2,x,2^i);]
    Out> x^2/2-x^3/6-x+1;
    In> [i++;g:=BigOh(2*g-f*g^2,x,2^i);]
    Out> x^7/72-x^6/72+x^4/24-x^3/6+x^2/2-x+1;
The resulting expression must thus be:
We can easily verify this:
    In> Expand(f*g)
    Out> x^11/1728+x^10/576+x^9/216+(5*x^8)/576+1;
This expression is 1 modulo x^8, as can easily be shown:
    In> BigOh(%,x,8)
    Out> 1;
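For comparison, the same iteration can be sketched with sympy (an illustration only; series(...).removeO() plays the role of BigOh here):

    from sympy import symbols, expand

    x = symbols('x')
    f = 1 + x + x**2/2 + x**3/6 + x**4/24

    g = 1
    for i in range(1, 4):
        # one Newton step for the inverse, truncated modulo x^(2^i)
        g = expand(2*g - f*g**2).series(x, 0, 2**i).removeO()
    print(expand(f*g).series(x, 0, 8).removeO())   # prints 1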