I have measured the time for the algorithm wrongly.
For the arguments (a=0.5,b=1.0,z=0.5) for the Incomplete Beta function the
algorithm uses a symmetry relation which involes three times the calculation of
the Gamma function. The Gamma function is the bottle neck, not the algorithm for
Incomplete Beta function. I get the following time for 4096 digits in bigfloat
precision:
8 seconds with a=0.5,b=1.0 and z=0.15 (no calculation of Gamma)
55 seconds with a=0.5,b=1.0 and z=0.50 (the symmetry relation is used)
So most of the time the algorithm need to calculate the Gamma function.
Then I have tested again the algorithm which use only add, mul, ...
Now I get:
6 seconds with a=0.5,b=1.0 and z=0.15
80 seconds with a=0.5,b=1.0 and z=0.50
The old algorithm with add, mul, ... seems to be a bit faster. So the
implementation of the low level floating point arithmetic does not help to
optimize the algorithm. It seems to me that the add, mul,... implementation is
as good as the more directly implementation. The simplifier seems to do the job
very well. Only the part using the symmetry relation is 30% faster. But I do not
know why.
It is more important to have a look at the algorithm of the Gamma function. Here
the calculation of the Bernoulli numbers within the used algorithm might be the
problem.
The effect to calculate the Bernoulli numbers is very dramtic for the first call
to bffac. With a precision of 4096 digits we have to wait about 244 seconds
after the first call. The function $bern calculate 1680 Bernoulli numbers which
are quite great and store them in an array. The second call to bffac is done in
about 8 seconds.
Dieter Kaiser