Coding style


Introduction

This chapter intends to describe the coding style and conventions applied in Yacas in order to make sure the engine always returns the correct result. This is an attempt at fending off such errors by combining rule-based programming with a clear coding style which should make help avoid these mistakes.


Interactions of rules and types

One unfortunate disadvantage of rule-based programming is that rules can sometimes cooperate in unwanted ways.

One example of how rules can produce unwanted results is the rule a*0 <-- 0. This would always seem to be true. However, when a is a vector, e.g. a:={b,c,d}, then a*0 should actually return {0,0,0}, that is, a zero vector. The rule a*0 <-- 0 actually changes the type of the expression from a vector to an integer! This can have severe consequences when other functions using this expressions as an argument expect a vector, or even worse, have a definition of how to work on vectors, and a different one for working on numbers.

When writing rules for an operator, it is assumed that the operator working on arguments, e.g. Cos or *, will always have the same properties regardless of the arguments. The Taylor series expansion of Cos(a) is the same regardless of whether a is a real number, complex number or even a matrix. Certain trigonometric identities should hold for the Cos function, regardless of the type of its argument.

If a function is defined which does not adhere to these rules when applied to another type, a different function name should be used, to avoid confusion.

By default, if a variable has not been bound yet, it is assumed to be a number. If it is in fact a more complex object, e.g. a vector, then you can declare it to be an "incomplete type" vector, using Object("IsVector",x) instead of x. This expression will evaluate to x if and only if x is a vector at that moment of evaluation. Otherwise it returns unevaluated, and thus stays an incomplete type.

So this means the type of a variable is numeric unless otherwise stated by the user, using the "Object" command. No rules should ever work on incomplete types. It is just meant for delayed simplification.

The topic of implicit type of an object is important, since many rules need to assume something about their argument types.


Ordering of rules

The implementor of a rule set can specify the order in which rules should be tried. This can be used to let the engine try more specific rules (those involving more elements in the pattern) before trying less specific rules. Ordering of rules can be also explicitly given by precedence numbers. The Yacas engine will split the expression into subexpressions, and will try to apply all matching rules to a given subexpression in order of precedence.

A rule with precedence 100 is defined by the syntax such as

100 # f(_x + _y) <-- f(x) + f(y);

The problem mentioned above with a rule for vectors and scalars could be solved by making two rules:

So vector multiplication would be tried first.

The ordering of the precedence of the rules in the standard math scripts is currently:


Writing new library functions

When you implement new library functions, you need to make your new code compatible and consistent with the rest of the library. Here are some relevant considerations.


To evaluate or not to evaluate

Currently, a general policy in the library is that functions do nothing unless their arguments actually allow something to be evaluated. For instance, if the function expects a variable name but instead gets a list, or expects a list but instead gets a string, in most cases it seems to be a good idea to do nothing and return unevaluated. The unevaluated expression will propagate and will be easy to spot. Most functions can accomplish this by using type-checking predicates such as IsInteger in rules.

When dealing with numbers, Yacas tries to maintain exact answers as much as possible and evaluate to floating-point only when explicitly told so (using N()). The general evaluation strategy for numerical functions such as Sin or Gamma is the following:

Here are some examples of this behavior:

In> Sin(3)
Out> Sin(3);
In> Gamma(8)
Out> 5040;
In> Gamma(-11/2)
Out> (64*Sqrt(Pi))/10395;
In> Gamma(8/7)
Out> Gamma(8/7);
In> N(Gamma(8/7))
Out> 0.9354375629;
In> N(Gamma(8/7+x))
Out> Gamma(x+1.1428571428);
In> Gamma(12/6+x)
Out> Gamma(x+2);

To implement this behavior, Gamma and other mathematical functions usually have two variants: the "symbolic" one and the "numerical" one. For instance, there are Sin and MathSin, Ln and LnNum, Gamma and GammaNum. (Here MathSin happens to be a core function but it is not essential.) The "numerical" functions always evaluate to floating-point results. The "symbolic" function serves as a front-end; it evaluates when the result can be expressed exactly, or calls the "numerical" function if Numeric=True, and otherwise returns unevaluated.

The "symbolic" function usually has multiple rules while the "numerical" function is usually just one large block of number-crunching code.


Using N() and Numeric in scripts

As a rule, N() should be avoided in code that implements basic numerical algorithms. This is because N() itself is implemented in the library and it may need to use some of these algorithms. Arbitrary-precision math can be handled by core functions such as MathDivide, MathSin and so on, without using N(). For example, if your code needs to evaluate Sqrt(Pi) to many digits as an intermediate result, it is better to write MathSqrt(Pi()) than N(Sqrt(Pi)) because it makes for faster, more reliable code. Alternatively, your code may just explicitly declare Numeric=True rather than use N() many times.


Using Precision()

The usual assumption is that numerical functions will evaluate floating-point results to the currently set precision. For intermediate calculations, a higher working precision is sometimes needed. In this case, your function should set the precision back to the original value at the end of the calculation and round off the result.


Using Verbose

For routines using complicated algorithms, or when evaluation takes a long time, it is usually helpful to print some diagnostic information, so that the user can at least watch some progress. The current convention is that if the Verbose flag is set, functions may print diagnostic information. (But do not print too much!)


Procedural programming or rule-based programming?

Two considerations are relevant to this decision. First, whether to use multiple rules with predicates or one rule with multiple If()s. Consider the following sample code for the "double factorial" function n!! :=n*(n-2)*... written using predicates and rules:
1# 0 !! <-- 1;
1# 1 !! <-- 1;
2# (n_IsEven) !! <-- 2^(n/2)*n!;
3# (n_IsOdd) !! <-- n*(n-2)!!;
and an equivalent code with one rule:
n!! := If(n=0 Or n=1, 1,
  If(IsEven(n), 2^(n/2)*n!,
  If(IsOdd(n), n*(n-2)!!, Hold(n!!)))
);
(Note: This is not the way n!! is implemented in the library.) The first version is a lot more clear. Yacas is very quick in rule matching and evaluation of predicates, so the first version is (marginally) faster. So it seems better to write a few rules with predicates than one rule with multiple If() statements.

The second question is whether to use recursion or loops. Recursion makes code more elegant but it is slower and limited in depth. Currently the default recursion depth of 1000 is enough for most casual calculations and yet catches infinite recursion errors relatively quickly. Because of clearer code, it seems better to use recursion in situations where the number of list elements will never become large. In numerical applications, such as evaluation of Taylor series, recursion usually does not pay off.


Reporting errors

Errors occurring because of invalid argument types should be reported only if absolutely necessary. (In the future there may be a static type checker implemented that will make explicit checking unnecessary.)

Errors of invalid values, e.g. a negative argument of real logarithm function, or a malformed list, mean that a human has probably made a mistake, so the errors need to be reported. "Internal errors", i.e. program bugs, certainly need to be reported.

There are currently two facilities for reporting errors: a "hard" one and a "soft" one.

The "hard" error reporting facility is the function Check. For example, if x=-1, then
Check(x>0,"bad x");
will immediately halt the execution of a Yacas script and print the error messsage. This is implemented as a C++ exception. A drawback of this mechanism is that the Yacas stack unwinding is not performed by the Yacas interpreter, so global variables such as Numeric, Verbose, Precision() may keep the intermediate values they had been assigned just before the error occurred. Also, sometimes it is better for the program to be able to catch the error and continue.

The "soft" error reporting is provided by the functions Assert and IsError, e.g.
Assert("domain", x) x>0;
If(IsError("domain"), ...);
The error will be reported but execution will continue normally until some other function "handles" the error (prints the error message or does something else appropriate for that error). Here the string "domain" is the "error type" and x will be the information object for this error. The error object can be any expression, but it is probably a good idea to choose a short and descriptive string for the error type.

The global variable ErrorTableau is an associative list that accumulates all reported error objects. When errors are "handled", their objects should be removed from the list. The utility function DumpErrors() is a simple error handler that prints all errors and clears the list. Other handlers are GetError and ClearError. These functions may be used to handle errors when it is safe to do so.

The "soft" error reporting facility is safer and more flexible than the "hard" facility. However, the disadvantage is that errors are not reported right away and pointless calculations may continue for a while until an error is handled.