Coding style
Introduction
This chapter intends to describe the coding style and conventions
applied in Yacas in order to make sure the engine always returns
the correct result. This is an attempt at fending off such errors
by combining rule-based programming with a clear coding style
which should make help avoid these mistakes.
Interactions of rules and types
One unfortunate disadvantage of rule-based programming is that rules
can sometimes cooperate in unwanted ways.
One example of how rules can produce unwanted results is the rule a*0 <-- 0.
This would always seem to be true. However, when a is a vector, e.g.
a:={b,c,d}, then a*0 should actually return {0,0,0}, that is, a zero
vector. The rule a*0 <-- 0 actually changes the type of the expression from a
vector to an integer! This can have severe consequences when other functions
using this expressions as an argument expect a vector, or even worse, have a
definition of how to work on vectors, and a different one for working on
numbers.
When writing rules for an operator, it is assumed that the operator working on
arguments, e.g. Cos or *, will always have the same properties regardless
of the arguments. The Taylor series expansion of Cos(a) is the same
regardless of whether a is a real number, complex number or even a matrix.
Certain trigonometric identities should hold for the Cos function, regardless
of the type of its argument.
If a function is defined which does not adhere to these rules when applied
to another type, a different function name should be used, to avoid confusion.
By default, if a variable has not been bound yet, it is assumed to
be a number. If it is in fact a more complex object, e.g. a vector,
then you can declare it to be an "incomplete type" vector, using
Object("IsVector",x) instead of x. This expression will evaluate to x if and
only if x is a vector at that moment of evaluation. Otherwise
it returns unevaluated, and thus stays an incomplete type.
So this means the type of a variable is numeric unless otherwise
stated by the user, using the "Object" command. No rules should
ever work on incomplete types. It is just meant for delayed
simplification.
The topic of implicit type of an object is important, since many rules
need to assume something about their argument types.
Ordering of rules
The implementor of a rule set can specify the order in which rules should be
tried. This can be used to let the engine try more specific rules (those
involving more elements in the pattern) before trying less specific rules.
Ordering of rules can be also explicitly given by precedence numbers. The Yacas
engine will split the expression into subexpressions, and will try to apply all
matching rules to a given subexpression in order of precedence.
A rule with precedence 100 is defined by the syntax such as
100 # f(_x + _y) <-- f(x) + f(y);
|
The problem mentioned above with a rule for vectors and scalars could be solved by making two rules:
- a*b (if b is a vector and a is a number) <-- return vector of each component multiplied by a.
- a*0 <-- 0
So vector multiplication would be tried first.
The ordering of the precedence of the rules in the standard math
scripts is currently:
- 50-60: Args are numbers: directly calculate. These are put in the beginning, so they are tried first. This is useful for quickly obtaining numeric results if all the arguments are numeric already, and symbolic transformations are not necessary.
- 100-199: tautologies. Transformations that do not change the type of the argument, and are always true.
- 200-399: type-specific transformations. Transformations for specific types of objects.
- 400-599: transformations on scalars (variables are assumed to be scalars). Meaning transformations that can potentially change the type of an argument.
Writing new library functions
When you implement new library functions, you need to make your new code compatible and consistent with the rest of the library. Here are some relevant considerations.
To evaluate or not to evaluate
Currently, a general policy in the library is that functions do nothing
unless their arguments actually allow something to be evaluated. For
instance, if the function expects a variable name but instead gets a
list, or expects a list but instead gets a string, in most cases it
seems to be a good idea to do nothing and return unevaluated. The
unevaluated expression will propagate and will be easy to spot. Most
functions can accomplish this by using type-checking predicates such as
IsInteger in rules.
When dealing with numbers, Yacas tries to maintain exact answers as much as
possible and evaluate to floating-point only when explicitly told so (using
N()). The general evaluation strategy for numerical functions such as Sin or Gamma
is the following:
- If Numeric=True and the arguments are numbers (perhaps complex
numbers), the function should evaluate its result in floating-point to current precision.
- Otherwise, if the arguments are such that the result can be calculated exactly, it should be
evaluated and returned. E.g. Sin(Pi/2) returns 1.
- Otherwise the function should return unevaluated (but usually with its arguments evaluated).
Here are some examples of this behavior:
In> Sin(3)
Out> Sin(3);
In> Gamma(8)
Out> 5040;
In> Gamma(-11/2)
Out> (64*Sqrt(Pi))/10395;
In> Gamma(8/7)
Out> Gamma(8/7);
In> N(Gamma(8/7))
Out> 0.9354375629;
In> N(Gamma(8/7+x))
Out> Gamma(x+1.1428571428);
In> Gamma(12/6+x)
Out> Gamma(x+2);
|
To implement this behavior, Gamma and other mathematical functions usually
have two variants: the "symbolic" one and the "numerical" one. For instance,
there are Sin and MathSin, Ln and LnNum, Gamma and GammaNum. (Here
MathSin happens to be a core function but it is not essential.) The "numerical"
functions always evaluate to floating-point results. The "symbolic" function
serves as a front-end; it evaluates when the result can be expressed exactly,
or calls the "numerical" function if Numeric=True, and otherwise returns
unevaluated.
The "symbolic" function usually has multiple rules while the
"numerical" function is usually just one large block of
number-crunching code.
Using N() and Numeric in scripts
As a rule, N() should be avoided in code that implements basic
numerical algorithms. This is because N() itself is implemented in
the library and it may need to use some of these algorithms.
Arbitrary-precision math can be handled by core functions such as
MathDivide, MathSin and so on, without using N(). For example, if
your code needs to evaluate Sqrt(Pi) to many digits as an
intermediate result, it is better to write MathSqrt(Pi()) than
N(Sqrt(Pi)) because it makes for faster, more reliable code.
Alternatively, your code may just explicitly declare Numeric=True
rather than use N() many times.
Using Precision()
The usual assumption is that numerical functions will evaluate
floating-point results to the currently set precision. For intermediate
calculations, a higher working precision is sometimes needed. In this
case, your function should set the precision back to the original value
at the end of the calculation and round off the result.
Using Verbose
For routines using complicated algorithms, or when evaluation takes a
long time, it is usually helpful to print some diagnostic information,
so that the user can at least watch some progress. The current
convention is that if the Verbose flag is set, functions may print
diagnostic information. (But do not print too much!)
Procedural programming or rule-based programming?
Two considerations are relevant to this decision. First, whether to use
multiple rules with predicates or one rule with multiple If()s.
Consider the following sample code for the "double factorial"
function n!! :=n*(n-2)*... written using predicates and rules:
1# 0 !! <-- 1;
1# 1 !! <-- 1;
2# (n_IsEven) !! <-- 2^(n/2)*n!;
3# (n_IsOdd) !! <-- n*(n-2)!!;
|
and an equivalent code with one rule:
n!! := If(n=0 Or n=1, 1,
If(IsEven(n), 2^(n/2)*n!,
If(IsOdd(n), n*(n-2)!!, Hold(n!!)))
);
|
(Note: This is not the way n!! is implemented in the library.) The first version is a lot more clear. Yacas is very quick in rule matching and evaluation of predicates, so the first version is (marginally) faster. So it seems better to write a few rules with predicates than one rule with multiple If() statements.
The second question is whether to use recursion or loops. Recursion
makes code more elegant but it is slower and limited in depth.
Currently the default recursion depth of 1000 is enough for most
casual calculations and yet catches infinite recursion errors
relatively quickly. Because of clearer code, it seems better to use
recursion in situations where the number of list elements will never
become large. In numerical applications, such as evaluation of Taylor series, recursion usually does not pay off.
Reporting errors
Errors occurring because of invalid argument types should be reported only if absolutely necessary. (In the future there may be a static type checker implemented that will make explicit checking unnecessary.)
Errors of invalid values, e.g. a negative argument of real logarithm
function, or a malformed list, mean that a human has probably made a
mistake, so the errors need to be reported. "Internal errors", i.e.
program bugs, certainly need to be reported.
There are currently two facilities for reporting errors: a "hard" one and a "soft" one.
The "hard" error reporting facility is the function Check. For example, if x=-1, then
will immediately halt the execution of a Yacas script and print the
error messsage. This is implemented as a C++ exception. A drawback of
this mechanism is that the Yacas stack unwinding is not performed by
the Yacas interpreter, so global variables such as Numeric,
Verbose, Precision() may keep the intermediate values they had been
assigned just before the error occurred. Also, sometimes it is better
for the program to be able to catch the error and continue.
The "soft" error reporting is provided by the functions Assert and IsError, e.g.
Assert("domain", x) x>0;
If(IsError("domain"), ...);
|
The error will be reported but execution will continue normally until
some other function "handles" the error (prints the error message or
does something else appropriate for that error). Here the string
"domain" is the "error type" and x will be the information object
for this error. The error object can be any expression, but it is
probably a good idea to choose a short and descriptive string for the
error type.
The global variable ErrorTableau is an associative list that
accumulates all reported error objects. When errors are "handled",
their objects should be removed from the list. The utility function
DumpErrors() is a simple error handler that prints all errors and
clears the list.
Other handlers are GetError and ClearError. These functions may be used to handle errors when it is safe to do so.
The "soft" error reporting facility is safer and more flexible than the
"hard" facility. However, the disadvantage is that errors are not
reported right away and pointless calculations may continue for a
while until an error is handled.