Yacas is primarily intended to be a research tool for the easy exploration and prototyping of algorithms of symbolic computation. The main advantage of Yacas is its rich and flexible scripting language. The language is closely related to LISP [WH89], but it has a recursive-descent infix grammar parser [ASU86] which supports defining infix operators at run time, similarly to Prolog [B86], and it includes expression transformation (term rewriting) as a basic feature of the language.
The Yacas language interpreter comes with a library of scripts that implement a set of computer algebra features. The Yacas script library is in active development and at the present stage does not offer the rich functionality of industrial-strength systems such as Mathematica or Maple. Extensive implementation of algorithms of symbolic computation is one of the future development goals.
Yacas handles input and output in plain ASCII, either interactively or in batch mode. (A graphical interface is under development.) There is also an optional plugin mechanism whereby external libraries can be linked into the system to provide extra functionality. Basic facilities are in place to compile Yacas scripts to C++ so they can be compiled into plugins.
The Yacas engine has been implemented in a subset of C++ which is supported by almost all C++ compilers. The design goals for the Yacas core engine are: portability, self-containment (no dependence on extra libraries or packages), ease of implementing algorithms, code transparency, and flexibility. The Yacas system as a whole falls into the "prototype/hacker" rather than the "axiom/algebraic" category, in the terminology of Fateman [F90]. There are relatively few design decisions specific to mathematics; instead, the emphasis is on extensibility.
The kernel offers sufficiently rich but basic functionality through a limited number of core functions. This core functionality includes substitutions and rewriting of symbolic expression trees, an infix syntax parser, and arbitrary-precision numerics. The kernel does not contain any definitions of symbolic mathematical operations and tries to be as general as possible, free of predefined notions or policies in the domain of symbolic computation.
The plugin inter-operability mechanism allows extension of the Yacas kernel and the use of external libraries, e.g. GUI toolkits or implementations of special-purpose algorithms. A simple C++ API is provided for writing ``stubs'' that make external functions appear in Yacas as new core functions. Plugins are on the same footing as the Yacas kernel and can in principle manipulate all Yacas internal structures. Plugins can be compiled either statically or dynamically as shared libraries to be loaded at runtime from Yacas scripts. In addition, Yacas scripts can be compiled to C++ code for further compilation into a plugin. Systems that don't support plugins can then link these modules in statically. The system can also be run without the plugins, for debugging and development purposes. The scripts will be interpreted in that case.
The script library contains declarations of transformation rules and of function syntax (prefix, infix etc.). The intention is that all symbolic manipulation algorithms, definitions of mathematical functions etc. should be held in the script library and not in the kernel. The only exception so far is for a very small number of mathematical or utility functions that are frequently used; they are compiled into the core for speed.
The primary development platform is GNU/Linux. Currently Yacas runs under various Unix variants, Windows environments, Psion organizers (EPOC32), iPAQ PDAs, BeOS, and Apple iMacs. Creating an executable for another platform (including embedded platforms) should not be difficult.
Self-containment is a requirement if the program is to be easy to port. A dependency on libraries that might not be available on other platforms would reduce portability. On the other hand, Yacas can be compiled with a complement of external libraries on "production" platforms.
One major advantage of Yacas is the flexibility of its syntax. Although Yacas works internally as a LISP-style interpreter, all user interaction is through the Yacas script language which has a flexible infix grammar. Infix operators are defined by the user and may contain non-alphabetic characters such as "=" or "#". This means that the user interacts with Yacas using a comfortable and adjustable infix syntax, rather than a LISP-style syntax. The user can introduce such syntactic conventions as are most convenient for a given problem.
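For instance, a new infix operator can be declared at run time. Here is a minimal sketch (the operator name <+> and its precedence are arbitrary choices made for this illustration):

In> Infix("<+>", 30);
Out> True;
In> u <+> v;
Out> u<+>v;

Since no transformation rules have been defined for <+>, the expression is simply returned unevaluated, but already in the new syntax.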
The Yacas script library itself defines the standard infix operators "+", "*" and so on with conventional precedence, so that an algebraic expression can be entered in the familiar infix form such as
(x+1)^2 - (y-2*z)/(y+3) + Sin(x*Pi/2)
Once such infix operators are defined, it is possible to describe new transformation rules directly using the new syntax. This makes it easy to develop simplification or evaluation procedures adapted to a particular problem.
Suppose the user needs to reorder expressions containing non-commutative creation and annihilation operators of quantum field theory. It takes about 20 lines of Yacas script code to define an infix operation "**" to express non-commutative multiplication with the appropriate commutation relations and to automatically normal-order all expressions involving these symbols and other (commutative) factors. Once the operator ** is defined (with precedence 40),
Infix("**", 40); |
15 # (_x + _y) ** _z <-- x ** z + y ** z;
15 # _z ** (_x + _y) <-- z ** x + z ** y;
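With just the two distribution rules shown above (a fragment of the full example; the remaining rules for normal-ordering are not reproduced here), a session might look like this:

In> (A+B) ** C;
Out> A ** C + B ** C;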
Rule-based programming is not the only method that can be used in Yacas scripts; there are alternatives that may be more useful in some situations. For example, the familiar if/else constructs and the For and ForEach loops are defined in the script library for the convenience of users.
Standard patterns of procedural programming, such as subroutines that return values, with code blocks and temporary local variables, are also available. (A "subroutine" is implemented as a new "ground term" with a single rule defined for it.) Users may freely combine rules with C-like procedures or LISP-like list processing primitives such as Head(), Tail().
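As an illustration of this mixed style, here is a sketch of a procedural function that reverses a list using the list primitives (the name MyReverse is made up for this example; the script library may already provide equivalent functionality):

MyReverse(lst_IsList) <--
[
  Local(result);
  result := {};
  While (lst != {})
  [
    Push(result, Head(lst));  /* prepend the first element */
    lst := Tail(lst);         /* and drop it from the input */
  ];
  result;
];

Calling MyReverse({1,2,3}) would then return {3,2,1}.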
Since the script library is interpreted and its design favors flexibility over raw speed, special-purpose systems designed for specific types of calculations, as well as heavily optimized industrial-strength computer algebra systems, will outperform Yacas. However, special-purpose or optimized external libraries can be dynamically linked into Yacas using the plugin mechanism.
As explained above, all symbolic manipulation algorithms and definitions of mathematical functions are kept in the script library rather than in the kernel.
For example, the mathematical operator "+" is an infix operator defined in the library scripts. To the kernel, this operator is on the same footing as any other function defined by the user and can be redefined. The Yacas kernel itself does not store any properties for this operator. Instead it relies entirely on the script library to provide transformation rules for manipulating expressions involving the operator "+". In this way, the kernel does not need to anticipate all possible meanings of the operator "+" that users might need in their calculations.
This policy-free scheme means that Yacas is highly configurable through its scripting language. It is possible to create an entirely different symbolic manipulation engine based on the same C++ kernel, with different syntax and different naming conventions, by simply using another script library instead of the current library scripts. An example of the flexibility of the Yacas system is a sample script wordproblems.ys that comes with the distribution. It contains a set of rule definitions that make Yacas recognize simple English sentences, such as "Tom has 3 apples" or "Jane gave an apple to Tom", as valid Yacas expressions. Yacas can then "evaluate" these sentences to True or False according to the semantics of the current situation described in them.
The "policy-free" concept extends to typing: strong typing is not required by the kernel, but can be easily enforced by the scripts if needed for a particular problem. The language offers features, but does not enforce their use. Here is an example of a policy implemented in the script library:
61 # x_IsPositiveNumber ^ y_IsPositiveNumber <-- MathPower(x,y);
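With a rule like this, powers of numbers are computed while symbolic powers stay unevaluated:

In> 2^3;
Out> 8;
In> x^3;
Out> x^3;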
The kernel provides three basic data types: numbers, strings, and atoms, and two container types: list and static array (for speed). Atoms are implemented as strings that can be assigned values and evaluated. Boolean values are simply atoms True and False. Numbers are represented by objects on which arithmetic can be performed immediately. Expression trees, association (hash) tables, stacks, and closures (pure functions) are all implemented using nested lists. In addition, more data types can be provided by plugins. Kernel primitives are available for arbitrary-precision arithmetic, string manipulation, array and list access and manipulation, for basic control flow, for assigning variables (atoms) and for defining rules for functions (atoms with a function syntax).
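A short session illustrating these basic types (a sketch; the exact output formatting may differ between Yacas versions):

In> a;
Out> a;
In> lst := {1, "two", three};
Out> {1,"two",three};
In> Head(lst);
Out> 1;
In> IsString(lst[2]);
Out> True;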
The interpreter engine recursively evaluates expression trees according to user-defined transformation rules from the script library. Evaluation proceeds bottom-up, that is, for each function term, the arguments are evaluated first and then the function is applied to these values.
A HoldArg() primitive is provided to not evaluate certain arguments of certain functions before passing them on as parameters to these functions. The Hold() and Eval() primitives, similarly to LISP's QUOTE and EVAL, can be used to stop the recursive application of rules at a certain point and obtain an unevaluated expression, or to initiate evaluation of an expression which was previously held unevaluated.
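For example, Hold() returns its argument unevaluated, and Eval() can force evaluation later (here % refers to the previous result):

In> Hold(1+2);
Out> 1+2;
In> Eval(%);
Out> 3;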
When an expression cannot be transformed any further, that is, when no more rules apply to it, the expression is returned unevaluated. For instance, a variable that is not assigned a value will return unevaluated. This is desired behavior in a symbolic manipulation system. Evaluation is treated as a form of "simplification", in that evaluating an expression returns a simplified form of the input expression.
Rules are matched by a pattern expression which can contain pattern variables, i.e. atoms marked by the "_" operator. During matching, each pattern variable atom becomes a local variable and is tentatively assigned the subexpression being matched. For example, the pattern _x + _y can match an expression a*x+b and then the pattern variable x will be assigned the value a*x (unevaluated) and the variable y will have the value b.
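One can watch this matching at work by defining a throwaway rule interactively (a sketch; the function name f is arbitrary):

In> f(_x + _y) <-- {x, y};
Out> True;
In> f(a*x + b);
Out> {a*x,b};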
This type of semantic matching has frequently been implemented before in various term rewriting systems (see, e.g., [C86]). However, the Yacas language offers its users the ability to create a much more flexible and powerful term rewriting system than one based on a fixed set of rules. Here are some of the features:
First, transformation rules in Yacas have predicates that control whether a rule should be applied to an expression. Predicates can be any Yacas expressions that evaluate to the atoms True or False and are typically functions of pattern variables. A predicate could check the types or values of certain subexpressions of the matching context (see the _x ^ _y example in the previous subsection).
Second, rules are assigned a precedence value (a positive integer) that controls the order in which rules are attempted. Thus Yacas provides somewhat better control over the automatic recursion than the pattern-matching system of Mathematica, which does not allow for rule precedence. The interpreter will apply the rule that matches the argument pattern, for which the predicate returns True, and which has the lowest precedence value.
Third, new rules can be defined dynamically as a side-effect of evaluation. This means that there is no predefined "ranking alphabet" of "ground terms" (in the terminology of [TATA99]), in other words, no fixed set of functions with predefined arities. It is also possible to define a "rule closure" that defines rules depending on its arguments, or to erase rules. Thus, a Yacas script library (although it is read-only) does not represent a fixed tree rewriting automaton. An implementation of machine learning is possible in Yacas (among other things). For example, when the module wordproblems.ys (mentioned in the previous subsection) "learns" from the user input that apple is a countable object, it defines a new postfix operator apples and a rule for its evaluation, so the expression 3 apples is later parsed as a function apples(3) and evaluated according to the rule.
Fourth, Yacas expressions can be "tagged" (assigned a "property object" a la LISP) and tags can be checked by predicates in rules or used in the evaluation.
Fifth, the scope of variables can be controlled. In addition to having its own local variables, a function can be allowed to access local variables of its calling environment (the UnFence() primitive). It is also possible to encapsulate a group of variables and functions into a "module", making some of them inaccessible from the outside (the LocalSymbols() primitive). The scoping of variables is a "policy decision", to be enforced by the script which defines the function. This flexibility is by design and makes it easy to modify the behavior of the interpreter, effectively changing the language as needed.
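As a sketch of the module idiom (the names count and NextLabel are made up for illustration), LocalSymbols() can hide a counter variable that only the enclosed function can reach:

LocalSymbols(count)
[
  count := 0;                           // private variable, invisible outside
  NextLabel() := (count := count + 1);  // returns 1, 2, 3, ... on successive calls
];

Successive calls to NextLabel() would then return increasing integers, while count itself is not accessible from the outside.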
Yacas supports "function overloading": it allows a user to declare functions f(x) and f(x,y), each having their own set of transformation rules. Of course, different rules can be defined for the same function name with the same number of arguments (arity) but with different argument patterns or different predicates.
Simple transformations on expressions can be performed using rules. For instance, if we need to expand the natural logarithm in an expression, we could use the following rules:
log(_x * _y) <-- log(x) + log(y);
log(_x ^ _n) <-- n * log(x);
After entering these two rules, the following interactive session is possible:
In> log(a*x^2)
log( a ) + 2 * log( x )
Integration of the new function log can be defined by adding a rule for the AntiDeriv function atom,
AntiDeriv(_x,log(_x)) <-- x*log(x)-x;
In> Integrate(x)log(a*x^n)
log( a ) * x + n * ( x * log( x ) - x ) + C18
In> Integrate(x,B,C)log(a*x^n)
log( a ) * C + n * ( C * log( C ) - C ) -
  ( log( a ) * B + n * ( B * log( B ) - B ) )
Rules are applied when their associated patterns match and when their predicates return True. Rules also have precedence, an integer value to indicate which rules need to be applied first. Using these features, a recursive implementation of the integer factorial function may look like this in Yacas script,
10 # Factorial(_n) _ (n=0) <-- 1;
20 # Factorial(n_IsInteger) _ (n>0) <-- n*Factorial(n-1);
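After these two rules are entered, the function can be used right away:

In> Factorial(10);
Out> 3628800;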
Rule-based programming can be freely combined with procedural programming when the latter is a more appropriate method. For example, here is a function that computes Mod(x^n, m) efficiently, using binary exponentiation with modular reduction at every step:
powermod(x_IsPositiveInteger, n_IsPositiveInteger, m_IsPositiveInteger) <--
[
  Local(result);
  result := 1;
  x := Mod(x, m);
  While (n != 0)
  [
    if ((n&1) = 1)
    [
      result := Mod(result*x, m);
    ];
    x := Mod(x*x, m);
    n := n >> 1;
  ];
  result;
];
Interaction with the function powermod(x,n,m) would then look like this:
In> powermod(2,10,100)
Out> 24;
In> Mod(2^10,100)
Out> 24;
In> powermod(23234234,2342424234,232423424)
Out> 210599936;
A base of mathematical capabilities has already been implemented in the script library (the primary sources of inspiration were the books [K98], [GG99] and [B86]). The script library is currently under active development. The following section demonstrates a few facilities already offered in the current system.
Basic operations of elementary calculus have been implemented:
In> Limit(n,Infinity)(1+(1/n))^n
Exp( 1 )
In> Limit(h,0) (Sin(x+h)-Sin(x))/h
Cos( x )
In> Taylor(x,0,5)ArcSin(x)
     3        5
    x    3 * x
x + -- + ------
    6      40
In> InverseTaylor(x,0,5)Sin(x)
     5    3
3 * x    x
------ + -- + x
  40     6
In> Integrate(x,a,b)Ln(x)+x
                   2   /                    2 \
                  b    |                   a  |
b * Ln( b ) - b + -- - | a * Ln( a ) - a + -- |
                  2    \                    2 /
In> Integrate(x)1/(x^2-1)
Ln( 2 * ( x - 1 ) )   Ln( 2 * ( x + 1 ) )
------------------- - ------------------- + C38
         2                     2
In> Integrate(x)Sin(a*x)^2*Cos(b*x)
Sin( b * x )   Sin( -2 * x * a + b * x )
------------ - ------------------------- -
   2 * b           4 * ( -2 * a + b )

Sin( -2 * x * a - b * x )
------------------------- + C39
    4 * ( -2 * a - b )
In> OdeSolve(y''==4*y)
C193 * Exp( -2 * x ) + C195 * Exp( 2 * x )
Solving systems of equations has been implemented using a generalized Gaussian elimination scheme:
In> Solve( {x+y+z==6, 2*x+y+2*z==10, \
In>   x+3*y+z==10}, \
In>   {x,y,z} )
Out> {4-z,2,z};
A small theorem prover [B86] using a resolution principle is offered:
In> CanProve(P Or (Not P And Not Q))
Not( Q ) Or P
In> CanProve(a > 3 And a < 2)
False
Various exact and arbitrary-precision numerical algorithms have been implemented:
In> N(1/7,40);    // evaluate to 40 digits
Out> 0.1428571428571428571428571428571428571428;
In> Decimal(1/7);    // obtain decimal period
Out> {0,{1,4,2,8,5,7}};
In> N(LnGamma(1.234+2.345*I));    // gamma-function
Out> Complex(-2.13255691127918,0.70978922847121);
Various domain-specific expression simplifiers are available:
In> RadSimp(Sqrt(9+4*Sqrt(2)))
Sqrt( 8 ) + 1
In> TrigSimpCombine(Sin(x)^2+Cos(x)^2)
1
In> TrigSimpCombine(Cos(x/2)^2-Sin(x/2)^2)
Cos( x )
In> GcdReduce((x^2+2*x+1)/(x^2-1),x)
x + 1
-----
x - 1
Univariate polynomials are supported in a dense representation, and multivariate polynomials in a sparse representation:
In> Factor(x^6+9*x^5+21*x^4-5*x^3-54*x^2-12*x+40)
         3            2
( x + 2 )  * ( x - 1 )  * ( x + 5 )
In> Apart(1/(x^2-x-2))
      1               1
------------- - -------------
3 * ( x - 2 )   3 * ( x + 1 )
In> Together(%)
         9
-------------------
     2
9 * x  - 9 * x - 18
In> Simplify(%)
    1
----------
 2
x  - x - 2
Various "syntactic sugar" functions are defined to more easily enter expressions:
In> Ln(x*y) /: { Ln(_a*_b) <- Ln(a) + Ln(b) }
Ln( x ) + Ln( y )
In> Add(x^(1 .. 5))
     2    3    4    5
x + x  + x  + x  + x
In> Select("IsPrime", 1 .. 15)
Out> {2,3,5,7,11,13};
Groebner bases can be computed:

In> Groebner({x*(y-1),y*(x-1)})
/            \
| x * y - x  |
|            |
| x * y - y  |
|            |
| y - x      |
|            |
|  2         |
| y  - y     |
\            /
Symbolic inverses of matrices:
In> Inverse({{a,b},{c,d}})
/                                        \
| /       d       \   /    -( b )      \ |
| | ------------- |   | -------------- | |
| \ a * d - b * c /   \ a * d - b * c  / |
|                                        |
| /     -( c )    \   /       a        \ |
| | ------------- |   | -------------- | |
| \ a * d - b * c /   \ a * d - b * c  / |
\                                        /
This list of features is not exhaustive.
Debugging facilities are implemented: one can trace the execution of a function, trace the application of a given rule pattern, examine the stack when a recursion does not terminate, and use an online debugger from the command line. An experimental debug version of the Yacas executable that provides more detailed information can be compiled.
Yacas currently comes with its own document formatting module that allows maintenance of documentation in a special plain text format with a minimal markup. This text format is automatically converted to HTML, LaTeX, PostScript and PDF formats. The HTML version of the documentation is hyperlinked and is used as online help available from the Yacas prompt.
The plugin facility will be extended in the future, so that a rich set of additional libraries (especially free software libraries), system-specific as well as mathematics-oriented, should be loadable from the Yacas system. The issue of speed is also continuously being addressed.
[B86] I. Bratko, Prolog (Programming for Artificial Intelligence), Addison-Wesley, 1986.
[BN98] F. Baader and T. Nipkow, Term rewriting and all that, Cambridge University Press, 1998.
[C86] G. Cooperman, A semantic matcher for computer algebra, in Proceedings of the symposium on symbolic and algebraic computation (1986), Waterloo, Ontario, Canada (ACM Press, NY).
[F90] R. Fateman, On the design and construction of algebraic manipulation systems, also published as: ACM Proceedings of the ISSAC-90, Tokyo, Japan.
[GG99] J. von zur Gathen and J. Gerhard, Modern Computer Algebra, Cambridge University Press, 1999.
[K98] D. Knuth, The Art of Computer Programming (Volume 2, Seminumerical Algorithms), Addison-Wesley, 1998.
[TATA99] H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi, Tree Automata Techniques and Applications, 1999, online book: http://www.grappa.univ-lille3.fr/tata
[W96] S. Wolfram, The Mathematica book, Wolfram Media, Champaign, 1996.
[WH89] P. Winston and B. Horn, LISP, Addison-Wesley, 1989.
Below is the list of Wester's problems with the corresponding Yacas code. "OK" means a satisfactory solution, "BUG" means that Yacas gives a wrong solution or breaks down, "NO" means that the relevant functionality is not yet implemented.
Yacas version: 1.0.57
Verify(25!, 15511210043330985984000000);
Verify(50!, (26***50)*25!);

Verify(Factors(50!), {{2,47},{3,22},{5,12},
  {7,8},{11,4},{13,3},{17,2},{19,2},{23,2},
  {29,1},{31,1},{37,1},{41,1},{43,1},{47,1}});

Verify(Sum(n,2,10,1/n), 4861/2520);

Verify(N(1000000000000*(-262537412640768744
  + Exp(Pi*Sqrt(163))), 50) > -0.75, True);

NumericEqual(N(BesselJ(2, 1+I)),
  0.4157988694e-1+I*0.2473976415, GetPrecision());

Verify(Decimal(1/7), {0,{1,4,2,8,5,7}});

Verify([Local(p,r); p:=GetPrecision(); Precision(12);
  r:=ContFracList(3.1415926535, 6); Precision(p); r;],
  {3,7,15,1,292,1});

Verify(RadSimp(Sqrt(2*Sqrt(3)+4)), 1+Sqrt(3));

Verify(RadSimp(Sqrt(14+3*Sqrt(3+2*Sqrt(5-12
  *Sqrt(3-2*Sqrt(2)))))), 3+Sqrt(2));

Verify(2*Infinity-3, Infinity);

Verify(GcdReduce((x^2-4)/(x^2+4*x+4),x), (x-2)/(x+2));

Factor(D(x) Expand((1+x)^20));

Factor(x^100-1);

Apart((x^2+2*x+3)/(x^3+4*x^2+5*x+2), x);

Cos(3*_x)/Cos(_x) <-- Cos(x)^2-3*Sin(x)^2;

Verify(RadSimp(Sqrt(997)-(997^3)^(1/6)), 0);

Verify(RadSimp(Sqrt(99983)-(99983^3)^(1/6)), 0);

Verify(RadSimp((2^(1/3)+4^(1/3))^3
  -6*(2^(1/3)+4^(1/3))-6), 0);

Ln(Tan(x/2+Pi/4))-ArcSinh(Tan(x));
D(x)(Ln(Tan(x/2+Pi/4))-ArcSinh(Tan(x)));

Verify( Hold({ {x}, {Re(x), Im(x)}}) @ Ln(3+4*I),
  {Ln(5),ArcTan(4/3)});

Hold({ {x}, {Re(x), Im(x)}}) @ Tan(x+I*y);

Verify(Simplify(Ln(Exp(z))), z);

Verify(Solve(Exp(x)==1,x), {x==0});

Verify(Solve(Tan(x)==1,x), {x==Pi/4});

Verify(Solve({x+y+z==6, 2*x+y+2*z==10, x+3*y+z==10},
  {x,y,z}), {{x==4-z,y==2,z==z}});

Verify(Simplify(Inverse({{a,b},{1,a*b}})),
  {{a/(a^2-1), -1/(a^2-1)},
   {-1/(b*(a^2-1)), a/(b*(a^2-1))}});

Factor(Determinant(VandermondeMatrix({a,b,c,d})));

Verify(EigenValues({{5,-3,-7},{-2,1,2},{2,-3,-4}}),
  {1,3,-2});

Verify(Limit(x,Infinity) (1+1/x)^x, Exp(1));
Verify(Limit(x,0) (1-Cos(x))/x^2, 1/2);

Verify(D(x) Abs(x), Sign(x));

Verify(Simplify(Integrate(x) Abs(x)), Abs(x)*x/2);

Verify(D(x)if(x<0) (-x) else x,
  Simplify(if(x<0) -1 else 1));

Verify(Simplify(Integrate(x) if(x<0) (-x) else x),
  Simplify(if(x<0) (-x^2/2) else x^2/2));

S := Taylor(v,0,4) 1/Sqrt(1-v^2/c^2);
TestYacas(S, 1+v^2/(2*c^2)+3/8*v^4/c^4);

TestYacas(Taylor(v,0,4) 1/S^2, 1-v^2/c^2);

TestYacas(Taylor(x,0,5)(Taylor(x,0,5)Sin(x))/
  (Taylor(x,0,5)Cos(x)), Taylor(x,0,5)Tan(x));

//Taylor(x,1,3)(Ln(x))^a*Exp(-b*x);

//Taylor(x,0,5) Ln(Sin(x)/x);

TestYacas(InverseTaylor(y,0,4) Sin(y)+Cos(y),
  (y-1)+(y-1)^2/2+2*(y-1)^3/3+(y-1)^4);

P(n,x) := Simplify( 1/(2*n)!! * Deriv(x,n) (x^2-1)^n );
TestYacas(P(4,x), (35*x^4)/8+(-15*x^2)/4+3/8);

Verify(OrthoP(4,x), 3/8+((35*x^2)/8-15/4)*x^2);

Verify(OrthoP(4,1), 1);

p := Sum(i,1,5,a[i]*x^i);
Verify(p, a[1]*x+a[2]*x^2+a[3]*x^3+a[4]*x^4+a[5]*x^5);

Verify(Horner(p, x),
  ((((a[5]*x+a[4])*x+a[3])*x+a[2])*x+a[1])*x);

CForm(Horner(p, x));

Verify(True And False, False);

Verify(CanProve(x Or Not x), True);

Verify(CanProve(x Or y Or x And y => x Or y), True);
Well-formed expressions of the ABIN language consist of the symbols A, B, I, N and are either: (1) the symbol "B" followed by zero or more symbols "I"; or (2) the symbol "N" followed by a well-formed expression; or (3) the symbol "A" followed by two well-formed expressions.
This defines a certain set of well-formed expressions (statements of the ABIN language); for example, NBII is a statement of the language but AB is not. The truth/falsehood interpretation of the ABIN language is the following. All well-formed expressions starting with NB are interpreted as true statements (they are "axioms" of the system). In addition, there is one deduction rule allowing one to prove "theorems": if "Nx" is an axiom or a previously derived theorem and "y" is any well-formed expression, then "NAxy" is provable.
Thus, NABIBI can be proved starting from the axiom NBI, but NANBB cannot be proved. The task at hand is to decide whether a given sequence of symbols is a provable statement of the ABIN language.
(The idea behind this interpretation is to assume that all B, BI etc. are some false statements that one could denote "B0", "B1" according to the number of "I" symbols; "N" is the logical Not and "A" is the logical And. Then the statement NABIB would mean "it is false that both B0 and B1 are true" and NANBB would mean "it is false that both B0 and negation of B0 are true". The NANBB statement is true in this interpretation but the deductive system of ABIN is too weak to obtain its proof.)
IsExpr(x_IsList) <-- IsBExpr(x) Or IsNExpr(x) Or IsAExpr(x);
IsProvable(x_IsList) <-- IsAxiom(x) Or IsTheorem(x);
IsAxiom(x_IsList) <-- IsNExpr(x) And IsBExpr(Tail(x));
The definitions of IsBExpr(x) and IsNExpr(x) are simple recursions that express rules 1 and 2 of the ABIN grammar. Note the use of Take to create a copy of a list (we had better not modify the value of x in the body of the rule).
10 # IsBExpr({}) <-- False;
10 # IsBExpr({"B"}) <-- True;
20 # IsBExpr(x_IsList) <-- x[Length(x)]="I"
     And IsBExpr(Take(x, {1, Length(x)-1}));
10 # IsNExpr({}) <-- False;
20 # IsNExpr(x_IsList) <-- x[1] = "N"
     And IsExpr(Tail(x));
The predicate IsAExpr(x) is a little more complicated because rule 3 requires finding two well-formed expressions that follow A. Also, for proving theorems we need to be able to extract the first of these expressions. With this in mind, we define another auxiliary function, FindTwoExprs(x), that returns the result of searching for two well-formed expressions in the list x. The return value of this function is a pair such as {True, 3}, indicating that two well-formed expressions were found, the first expression being of length 3. We shall use a For loop for this function:
FindTwoExprs(x_IsList) <--
[
  Local(iter, result);
  For( [ iter:=1; result:=False; ],
       iter < Length(x) And Not result,
       iter:=iter+1 )
  [
    result := IsExpr(Take(x, iter)) And
      IsExpr(Take(x, {iter+1, Length(x)}));
  ];
  {result, iter-1};
];
Now we can define the remaining predicates:
10 # IsAExpr(x_IsList)_(Length(x) <= 1) <-- False;
20 # IsAExpr(x_IsList) <-- x[1] = "A"
     And FindTwoExprs(Tail(x))[1];
IsTheorem(x_IsList) <-- IsNExpr(x) And IsAExpr(Tail(x))
  And IsProvable(Concat({"N"},
    Take(Tail(Tail(x)), FindTwoExprs(Tail(Tail(x)))[2])));
The ABIN language is now complete. Let us try some simple examples:
In> IsExpr({"A","B"}); Out> False; In> IsExpr({"N","B","I"}); Out> True; In> IsAxiom({"N","B","I"}); Out> True; In> IsTheorem({"N","B","I"}); Out> False; In> IsProvable({"N","B","I"}); Out> True; In> IsProvable({"N","A","B","I","B"}); Out> True; |
It is somewhat inconvenient to type long lists of characters. So we can create an interface function to convert atomic arguments to lists of characters, e.g. AtomToCharList(BII) will return {"B","I","I"} (provided that the symbol BII has not been given a definition). Then we define a function ABIN(x) to replace IsProvable.
AtomToCharList(x_IsAtom) <--
[
  Local(index, result);
  For( [ index:=Length(String(x)); result:={}; ],
       index > 0,
       index:=index-1 )
    Push(result, StringMid(index, 1, String(x)));
  result;
];
HoldArg(AtomToCharList, 1);
ABIN(x) := IsProvable(AtomToCharList(x));
In> AtomToCharList(NBII);
Out> {"N","B","I","I"};
In> ABIN(NANBB);
Out> False;
It is easy to modify the predicates IsTheorem() and IsAxiom() so that they print the sequence of intermediate theorems and axioms used for deriving a particular theorem. The final version of the code is in the file examples/ABIN.ys. Now we can try to check a "complicated" theorem and see an outline of its proof:
In> ABIN(NAAABIIBIBNB);
Axiom {"NBII"}
Theorem NABIIBI derived
Theorem NAABIIBIB derived
Theorem NAAABIIBIBNB derived
Out> True;
Suppose we want to build a system that understands a simple set of English sentences and will be able to answer questions. For example, we would like to say "Tom had an apple and Jane gave 3 apples to Tom"; the system should understand that Tom has 4 apples now. In the usual LISP-based treatments of artificial intelligence, this problem would be illustrated with a cumbersome list syntax such as (had (Tom apple 1)) but we would like to use the power of the Yacas syntax and use plain English.
We shall create a set of rules that will "simplify" sentences to atoms such as True or False. As a side effect, these "simplifications" will maintain a "knowledge base" of information about all existing persons and objects.
Unix> yacas
[editvi.ys] [gnuplot.ys] [unix.ys] True;
Numeric mode: "Internal"
To exit Yacas, enter Exit(); or quit or Ctrl-c.
Type ?? for help. Or type ?function for help on a function.
Type 'restart' to restart Yacas.
To see example commands, keep typing Example();
In> Load("wordproblems.ys")
Out> True;
In> Jitse and Ayal are persons;
OK, Jitse is a person.
OK, Ayal is a person.
Out> {True,True};
In> apple is an object;
OK, apple is an object.
Out> True;
In> there are many apples and pears;
Note: we already know that apple is an object
OK, we assume that the plural of " apple " is " apples ".
OK, pear is an object.
OK, we assume that the plural of " pear " is " pears ".
Out> {True,True};
In> Serge had an apple;
OK, Serge is a person.
OK, Serge has 1 apples now.
Out> True;
In> Jitse had (10!) pears;
OK, Jitse has 3628800 pears now.
Out> True;
In> Ayal had (2+3) apples and Serge had \
  2 pears;
OK, Ayal has 5 apples now.
OK, Serge has 2 pears now.
Out> {True,True};
In> Serge ate the apple;
OK, Serge has no apples now.
Out> True;
In> Ayal ate a pear;  // this should fail
Error: Ayal does not have enough pears at this time.
Out> False;
In> Ayal gave an apple to Serge and \
  Serge gave a pear to Ayal;
OK, Ayal has 4 apples now.
OK, Serge has 1 apples now.
OK, Serge has 1 pears now.
OK, Ayal has 1 pears now.
Out> {True,True};
In> Ayal ate a pear;
OK, Ayal has no pears now.
Out> True;
In> soup is an object and Ayal had \
  some soup;
OK, soup is an object.
OK, Ayal has some soup now.
Out> {True,True};
In> Ayal gave soup to Serge and Serge \
  ate the soup;
OK, Ayal has no soup now.
OK, Serge has some soup now.
OK, Serge has no soup now.
Out> {True,True};
In> Serge has soup
Out> no;
In> Serge has apples
Out> 1;
In> Ayal has apples
Out> 4;
In> Serge has some soup
Out> False;
In> Serge has some apples
Out> True;
In> Ayal has some pears
Out> False;
In> Knowledge();
OK, this is what we know:
Persons: Jitse, Ayal, Serge
Object names: soup, pear, apple
Countable objects: pears, apples
Jitse has:
3628800 pears
Ayal has:
4 apples
no pears
no soup
Serge has:
1 apples
1 pears
no soup
Out> True;
In> software is an object
OK, software is an object.
Out> True;
In> Ayal made some software
OK, Ayal has some software now.
Out> True;
In> Ayal gave some software to everyone
OK, everyone is a person.
OK, Ayal still has some software
OK, everyone has some software now.
Out> True;
In> Ayal gave some software to Serge
OK, Ayal still has some software
OK, Serge has some software now.
Out> True;
In> Serge ate the software
OK, Serge has no software now.
Out> True;
The string "OK" is printed when there is no error, "Note" when there is a warning, and "Error" on any inconsistencies in the described events. The special function Knowledge() prints everything the system currently knows.
Now we shall see how this system can be implemented in Yacas with very little difficulty.
It is logical to declare "had" as an infix operator and "a" as a prefix operator (a quantifier acting on the noun that follows it). In other words, "Mary had a lamb" should be parsed into had(Mary, a(lamb)). This is how we can do it:
In> [ Infix("had", 20); Prefix("a", 10); ] Out> True; In> FullForm(Mary had a lamb) (had Mary (a lamb )) Out> Mary had a lamb; |
Note that we declared the precedence of the prefix operator "a" to be 10. We have in mind declaring another infix operator "and" and we would like quantifiers such as "a", "an", "the" to bind more tightly than other words.
Clearly, we need to plan the structure of all admissible sentences and declare all necessary auxiliary words as prefix, infix, or postfix operators. Here are the patterns for our admissible sentences:
"X is a person" -- this declares a person. Parsed: is(X, a(person))
"X and Y are persons" -- shorthand for the above. Parsed: are(and(X, Y), persons). "person" and "persons" are unevaluated atoms.
"A is an object" -- this tells the system that "A" can be manipulated. Parsed: is(A, an(object))
"there are many As" -- this tells the system that "A" can be counted (by default, objects are not considered countable entities, e.g. "milk" or "soup"). Parsed: are(there, many(As)). Here "As" is a single atom which will have to be stripped of the ending "s" to obtain its singular form.
"X ate N1 As", for example, Tom ate 3 apples -- parsed as ate(Tom, apples(3)). Since we cannot make the number 3 into an infix operator, we have to make apples into a postfix operator that will act on 3.
"X gave N As to Y" -- Here "N" is a number and "A" is the name of an object. Parsed as: gave(X, to(As(N), Y)). So to and gave are infix operators and to binds tighter than gave.
Sentences can be joined by "and", for example: "Tom gave Jane an apple and Jane ate 3 pears". This will be parsed as the infix operator "and" acting on both sentences which are parsed as above. So we need to make "and" of higher precedence than other operators, or else it would bind (apple and Jane) together.
"X made some A" -- note that if "A" is not countable, we cannot put a number so we need to write some which is again a prefix operator. made is an infix operator.
"X ate some A" -- the interpretation is that some A is still left after this, as opposed to "X ate the A" or "X ate A".
"X gave some A to Y" -- similarly, X still has some A left after this.
After each sentence, the system should know who has what at that time. Each sentence is parsed separately and should be completely interpreted, or "simplified".
All knowledge is maintained in the variable Knowledge which is an associative list with three entries:
Knowledge := {
  {"objects", {} },
  {"countable objects", {} },
  {"persons", {} }
};
Infix("is", 20); Prefix("a", 10); _x is a person <-- DeclarePerson(x); |
The operator "and" will group its operands into a list:
Infix("and", 30); 10 # x_IsList and _y <-- Concat(x, {y}); 15 # _x and _y <-- Concat({x}, {y}); |
10 # there are many xs_IsList <--
     MapSingle("DeclareCountable", xs);
20 # there are many _xs <-- DeclareCountable(xs);
However, in the present system we cannot simultaneously parse "there are many apples and ideas" and "Isaac had an apple and an idea" because we have chosen had to bind tighter than and. We could in principle choose another set of precedences for these operators; this would allow some new sentences but at the same time disallow some sentences that are admissible with the current choice. Our purpose, however, is not to build a comprehensive system for parsing English sentences, but to illustrate the usage of syntax in Yacas.
Declaring objects is a little more tricky (the function DeclareCountable). For each countable object (introduced by the phrase "there are many ...s") we need to introduce a new postfix operator with a given name. This postfix operator will have to operate on a preceding number, so that a sentence such as "Mary had 3 lambs" will parse correctly.
If x were an unevaluated atom such as "lambs" which is passed to a function, how can we declare lambs to be a postfix operator within that function? The string representation of the new operator is String(x). But we cannot call Postfix(String(x)) because Postfix() does not evaluate its arguments (as of Yacas 1.0.49). Instead, we use the function UnList to build the expression Postfix(String(x)) with String(x) evaluated from a list {Postfix, String(x)}, and we use the function Eval() to evaluate the resulting expression (which would actually call Postfix()):
Eval(UnList({Postfix, String(x)}));
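The UnList()/Eval() combination can be seen in isolation with any ordinary function; a small sketch:

In> UnList({Sin, 0});
Out> Sin(0);
In> Eval(%);
Out> 0;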
A rule base for the new operator must also be declared; since the operator's name is computed at run time, we use MacroRuleBase(), which evaluates its arguments:

MacroRuleBase(String(x), {n});
Finally, we would need to define a rule for "had" which can match expressions such as
_Person had n_IsNumber _Objects
Incidentally, the Yacas parser does not allow an unevaluated atom to have the same name as a declared prefix operator, but it is okay for an atom to share its name with an infix or postfix operator.
A rule that we need for an operator named String(x) can be defined using MacroRule:
MacroRule(String(x), 1, 1, True) {x, n};
In> 5 lambs;
Out> {lambs, 5};
In> grilled lambs
Out> {lambs, grilled};
But what about the expression "many lambs"? In it, many is a prefix operator and lambs is a postfix operator. It turns out that for Yacas it is the prefix operator that is parsed first (and remember, we cannot have unevaluated atoms with the same name as a prefix operator!) so "many lambs" will be transformed into many(lambs) and not into an illegal expression {lambs, many}.
For example, the sentence

Ayal gave soup to Serge and Serge ate the soup

is parsed into

{gave(Ayal, to(soup, Serge)), ate(Serge, {soup, 1})}
Here is the simplest rule: "giving" is implemented as a sequence of "eating" and "making".
10 # _x gave _obj to _y <--
[
  x ate obj;
  y made obj;
];
One more subtlety connected with the notion of "countable" vs. "uncountable" objects is that there are two different actions one can perform on an "uncountable" object such as "soup": one can eat (or give away) all of it or only some of it. This is implemented using the keyword "some" which is a prefix operator that turns its argument into a list,
some _obj <-- {obj, True};
while the article "the" produces a list with a definite count:

the _x <-- {x, 1};
To implement this, we have made a special rule for the pattern

_x had {obj, True} <-- ...

in addition to the rule for a definite number of objects,

_x had {obj, n_IsNumber} <-- ...

and a rule that converts a bare object into a counted one:

(_x had _obj)_(Not IsList(obj)) <-- x had {obj, 1};
Admittedly, the module wordproblems.ys does not have very much practical use, but it is fun to play with, and it illustrates the power of syntactic constructions in Yacas from an unexpected angle.
The libltdl library, part of the libtool package and distributed together with Yacas, is used to load the plugins. This means that plugins will only function on platforms supported by libltdl. This includes Linux, Mac OS X, Solaris, and HP-UX.
Plugins currently have to be written in C++ and require certain include files from the Yacas source tree. A plugin may be compiled as a shared object after Yacas itself has been compiled and installed successfully. As a simple example, suppose we want to make the following C function available in Yacas:
// FILE: func1.h
double func1(double x, double y);
// FILE: func1.cc
#include "stubs.h"   // required
#include "func1.h"   // our exported API
#include <math.h>    // we need math.h for sin()

double func1(double x, double y)
{
  return sin(x/y);
}
For our simple plugin, we shall only need a few lines of code in the Yacas stub:
/* FILE: func1_api.stub */
Use("cstubgen.rep/code.ys");

/* Start generating a C++ stub file */
StubApiCStart("Func1");

/* Write some documentation */
StubApiCRemark("This function computes beautiful waves.");

/* Define a plugin-specific include file */
StubApiCInclude("\"func1.h\"");

/* Declare a plugin-specific function */
StubApiCFunction("double", "func1", "Func1",
  { {"double","x"}, {"double","y"} });

/* Generate a C++ stub file "func1_api.cc" */
StubApiCFile("libfunc1", "func1_api");
Another example of a Yacas stub is found in the file plugins/example/barepluginapi.stub of the source tree.
The C++ stub file func1_api.cc is generated by running Yacas on the stub script:

yacas -pc func1_api.stub

The two source files are then compiled, and the objects are linked into a shared library, using libtool:

libtool c++ -I/usr/local/include/yacas/ \
  -I/usr/local/include/yacas/plat/linux32/ -c func1.cc
libtool c++ -I/usr/local/include/yacas/ \
  -I/usr/local/include/yacas/plat/linux32/ -c func1_api.cc
libtool c++ -module -avoid-version -no-undefined \
  -rpath `pwd` -o libfunc1.la func1.lo func1_api.lo
If compilation succeeds, the dynamic library file libfunc1.la is created. This is our plugin; now it could be installed into the Yacas plugin path (usually /usr/local/lib/yacas/) with
libtool cp libfunc1.la /usr/local/lib/yacas/libfunc1.la
But we can also keep it in the working directory.
In> DllLoad("libfunc1"); Out> True; |
If the plugin is not installed in the Yacas plugin path, we have to specify the path to the plugin, e.g. DllLoad("./libfunc1").
The new core function Func1 is then available:

In> Func1(2,3);
Out> 0.61837;
In> DllUnload("libfunc1"); Out> True; |
Suppose we had a numerical function on real numbers, e.g.
In> f(x,y):=Sin(2*x*Pi)+Sqrt(2)*Cos(y*Pi);
Out> True;
We can generate some C++ code that calculates this function for given floating-point arguments (not for multiple-precision numbers):
In> CForm(f(x,y));
Out> "sin(2 * x * Pi) + sqrt(2) * cos(y * Pi)";
Now it is clear that all the steps needed to compile, link, and load a plugin that implements f(x,y) can be performed automatically by a Yacas script. (You would need a C++ compiler on your system, of course.)
This is implemented in the function MakeFunctionPlugin() (see file unix.ys in the addons/ subdirectory). To use it, we might first define a function such as f(x,y) above and then call
In> MakeFunctionPlugin("fFast", f(x,y));
Function fFast(x,y) loaded from
  ./plugins.tmp/libfFast_plugin_cc.so
Out> True;
In> fFast(2,3);
Out> -1.41421;
Now we can use the function fFast(x,y), which is implemented via an external plugin library. The function MakeFunctionPlugin() assumes that all arguments and return values of functions are real floating-point numbers. The plugin libraries it creates are always placed in the plugins.tmp/ subdirectory of the current working directory and are named like libNNN_plugin_cc.so.
If MakeFunctionPlugin() is called again to create a plugin function with the same name (but a different body), the DLL will be unloaded and reloaded as necessary.
The Yacas header files are installed in /usr/local/share/yacas/include, and the libraries in /usr/local/share/yacas/lib. The main entry library is the cyacas library, which offers an interface that can be reached from a normal C or C++ program. The API it offers is the following:
void yacas_init();                  /* initialize the Yacas engine */
void yacas_eval(char* expression);  /* evaluate an expression */
char* yacas_error();                /* error message; non-NULL if an error occurred */
char* yacas_result();               /* result of the most recent evaluation */
char* yacas_output();               /* side-effect output of the most recent evaluation */
void yacas_exit();                  /* shut down the engine */
void yacas_interrupt();             /* interrupt the current evaluation */
The directory embed/ contains some examples demonstrating calling Yacas from your own programs. Use the makefile.examples makefile to compile the examples (it is in a separate makefile because it is not yet guaranteed to work on all platforms, so it might otherwise break builds on some platforms).
The simplest "hello world"-type example is example2, which reads:
#include <stdio.h>
#include "cyacas.h"

int main(int argc, char** argv)
{
  yacas_init();
  yacas_eval("D(x)Sin(x)");
  if (!yacas_error())
    printf("%s\n", yacas_result());
  yacas_exit();
  return 0;
}
And yields on the command line:
$ ./example2
Cos(x);
The example example1 allows you to pass expressions on the command line:
$ ./example1 "D(x)Sin(x)" "Taylor(x,0,5)Sin(x)"
Input> D(x)Sin(x)
Output> Cos(x);
Input> Taylor(x,0,5)Sin(x)
Output> x-x^3/6+x^5/120;
The file example3 shows a piece of code accepting an expression in Lisp syntax, the result being printed to the console using PrettyForm.
The interface described above will probably not change very much in the future, but the way to compile and link might. In an ideal world, there would be one library libcyacas.a and one header file cyacas.h, which get installed in a place where all programs can find it. Perhaps a dynamic link library is even better.
The answer to the question of whether a single all-purpose command can simplify or solve everything is, at least according to the author, a resounding no. It is doubtful such a program will ever exist, and it is not even certain that such a program would be desirable.
Humans have a long history of making tools to make their lives easier. One important property of a tool is that it is clear conceptually to the user of that tool what that tool does. A tool should not be clever. The user of the tool can be clever about using the tool, or combining it with other tools. A 'clever' tool often results in a tool that is not useful. It is hard to understand what a clever tool does, or why it does what it does. In short: its behavior will be unpredictable.
This is a passionate plea against generic commands like Simplify and Solve.
Consider this bit of interaction in Yacas:
In> a:= -x^(-1)
Out> -x^(-1);
In> b:= -1/x
Out> (-1)/x;
In> a = b
Out> False;
Now, that cannot be right, can it? Clearly, these are the same? No, they are not. They have a slightly different form and are thus represented differently internally. The = sign compares the internal representations of two expressions, so a = b returns False because the internal representations of the expressions that a and b are bound to are different. Note that this behavior is simple to explain. The = operator is a 'tool': it is simple, and it does one thing and does it well. It is easy to use, an important property of a tool.
To drive home this point further, suppose we did modify the = operator to detect that a and b are indeed equal. Great! Wonderful! a=b now returns True. But consider
In> c:=1+r+r^2
Out> r+r^2+1;
In> d:=(1-r^3)/(1-r)
Out> (1-r^3)/(1-r);
In> c=d
Out> False;
c and d are equal for all values of r with r != 1, and at r = 1 the limit can be taken:
In> Limit(r,1)d
Out> 3;
In> Limit(r,1)c
Out> 3;
Now, we have to modify the = tool to also detect that these are the same. Actually, this will have to be done for all known identities! Or we shall have to explain for which expressions it can determine equality. This will be a complex story, and it will be hard to explain. It will be a complex tool to use. And, more practically, it will be a slow tool.
So, how do we go about verifying that a and b are the same? Or that c and d are the same?
The solution lies in devising new tools.
A representation is called a normal representation if zero has only one representation. Thus, for a normal form nf, the check nf(a-b)=0 should return True exactly when a and b are mathematically the same.
Consider a normal form defined on rational functions, where MM(x) converts an expression x into an internal multivariate-polynomial representation:
In> MM(a)
Out> MultiNomial({x},{{-1,-1}});
In> MM(b)
Out> MultiNomial({x},{{0,-1}})/
  MultiNomial({x},{{1,1}});
However:
In> MM(a-b)
Out> MultiNomial({x},{{0,0}})/
  MultiNomial({x},{{1,1}});
In> NormalForm(%)
Out> 0;
So here we have found a combination of tools that together allow us to decide that the a and b defined at the beginning of this section are the same: convert a-b to a normal form, and verify with the = tool that the result is zero:
In> NormalForm(MM(a-b)) = 0
Out> True;
Consider now the c and d defined above; both are functions of r only, c=c(r) and d=d(r). Let us define a function f(r)=c(r)-d(r):
In> f(r):=Eval(c-d)
Out> True;
In> f(r)
Out> r+r^2+1-(1-r^3)/(1-r);
It is not quite clear yet that this is zero. But we can decide that this is zero (and thus c(r)=d(r)) by first noting that f(r) is zero for some r, and then that the first derivative of f(r) with respect to r is zero, independent of r:
In> f(0)
Out> 0;
In> D(r)f(r)
Out> 2*r+1-(-((1-r)*3*r^2+r^3-1))/(1-r)^2;
In> NormalForm(MM(%))
Out> 0;
So here we have avoided bringing c and d to canonical forms (by, for example, first discovering that c is a geometric series), and have gone straight to detecting that c-d is in fact zero, and thus c(r)=d(r).
Here again we have combined tools that are simple, do one thing but do it well, and for which it is easy to understand for human beings what they do.
The source tarball yacas-*.tar.gz is uploaded to the backup repository at http://www.xs4all.nl/~apinkus/backups/ .

Adding new files to Makefile.am ensures that they will be included in this tarball. This has the nice side effect that you can have local files which do not automatically get added to the distribution (by not adding them to the Makefile.am file). Note also that files not listed in Makefile.am may not be built or installed automatically. To make sure that the tar.gz distribution is complete, you can run the command
make distcheck
If you want to do more complicated things (like adding files which are not Yacas script or test files, or files which should be compiled or installed only conditionally), or if you are simply curious, you can read more in the chapter entitled "The Yacas build system".
CVS uses a diff-like scheme for merging differences: it looks at two text files, determines the differing lines, and merges accordingly. It discovers the changes you made by comparing the version you checked out last with the version you have now (it maintains an automatic version number for each file).
If your version of a file and the version in the cvs repository have a line that has been changed both by you and by someone else, the cvs repository obviously will not know what to do with that, and it will signal a 'collision', which you will have to resolve by hand (don't worry, this rarely happens). More on that later.
In short, the cvs commands needed in practice are few; they are described below.
To check out Yacas as an anonymous user, type:
cvs -d:pserver:anonymous@cvs.yacas.sourceforge.net:/cvsroot/yacas login
cvs -z3 -d:pserver:anonymous@cvs.yacas.sourceforge.net:/cvsroot/yacas co yacas
To check out as a maintainer, type:
export CVS_RSH=ssh1
This will tell CVS to use ssh1 for communication. Then, in order to download the yacas source tree, type
cvs -d:ext:loginname@cvs.yacas.sourceforge.net:/cvsroot/yacas co yacas
Those lines typed above are long and obscure, but it is also the last time you need to type them. From now on, if you want to do anything with cvs, just go into the yacas/ directory you checked out and type the cvs command without the -d:... flag. This flag just tells cvs where to find the repository; future cvs commands will know where to find it, which is why you don't need the flag again.
To update your working copy to the latest version later, type

cvs update -d
on the command line in the yacas directory, and that should essentially download the latest version for you in that directory (just the changes). The -d option here states that you are also interested in new directories that were added to the repository. Oddly enough, cvs will only get you changed and added files, not added directories, by default.
A command
cvs -q update -d
will print messages only about changed files.
First, you should test the new Yacas system:
make test
Now you can start entering your changes to the CVS. If you created some new files, you need to tell CVS to add them to the source tree:
cvs add [list of file names of ASCII text files]

This adds ASCII text files. If you added binary files (GIF images in the documentation directory, or something like that), you can add them to the CVS with

cvs add -kb [list of file names of binary files]
Note that, when adding files to the CVS, you should normally also add them to the Yacas tar.gz distribution. This is done by adding the file name to the EXTRA_DIST variable in the file Makefile.am in the directory where you were adding the file.
In case files need to be removed, delete the local copy first and then use cvs remove. There seems to be no easy way to rename or move files; you would have to remove them at their old location and add them at the new location.
Now, when finished with that, you might want to 'commit' all changes with
cvs commit
If the commit succeeds, an email is sent out to the maintainers, who can then scan the diff files for changes, to see if they agree with the changes, and perhaps fix mistakes made (if any).
If there is a collision, the commit fails (cvs will tell you so). This might happen because someone else also edited the same place in a file, so that their changes cannot be automatically merged with yours. In case of a collision, you need to invoke cvs update twice. The cvs update command outputs a list of file names, each with a character in front of it. The important ones are the files marked with a 'C': these have a collision. You can go into such a file and see the collision, which the cvs system conveniently marks as:
<<<<<<<
old version
=======
new version
>>>>>>>
You can edit the file by merging the two versions by hand. This happens very rarely, but it can happen. Use cvs commit afterwards to commit.
The commit and update commands can be performed in specific directories and on specific files, if necessary, by stating them on the command line. Or you can go into a subdirectory and do a cvs commit or cvs update there, if you are confident that this is the only place that changed or whose changes you are interested in.
That is basically it: a quick crash course in cvs. It is actually very convenient, in that usually all that is needed is a cvs commit to fix small bugs. You type that in, and your version gets merged with the changes others made, they get your changes, and you have backed up your changes at the same time (all with that little command!).
You can find more information about cvs at http://cvsbook.red-bean.com/ .
The "source" form of all documentation is maintained in a special plain text format. The format is such that it is clearly readable without any processing and is easy to edit. To compile the documents, the system processes the plain text docs with a script to prepare Yacas-language files and then runs other scripts to produce the final documentation in HTML and other formats.
The source text must be formatted in a certain fashion to delimit sections, code examples, and so on, but the format is easy enough to enter in a plain text editor. Text is marked up mostly by TAB characters, spaces, and asterisks "*" at the beginning of a line. The format is easy enough so that, for example, the text of the GNU GPL (the file COPYING) can be used as a documentation source file without changes. The script txt2yacasdoc.pl converts this markup into Yacas code for further processing.
Currently, the Yacas documentation consists of four "core" books (the introductory tutorial, the programming tutorial, the user's reference manual, and the programmer's reference manual), and three "extra" books (algorithms, Lisp programming, essays).
All Yacas documentation books are distributed under the GNU Free Documentation License (FDL). If you add a new documentation book to Yacas, please include the file FDL.chapt.
The documentation books are meant to be stand-alone texts, except the two "reference manual" books which are meant to be used together because they share a common hyperlinked table of contents. The Yacas Help() command will show a reference article from either of the two reference books.
Stand-alone books are free-form, but reference books must be written with a certain template that allows online hyperlinking. The file manmake/dummies is an example template for a reference manual section.
Books are divided into "chapters", "sections" and "subsections". The reference manuals contain descriptions of Yacas commands and functions, and each function is given in a separate (unnumbered) section which is marked by a special *CMD label (see below).
At the beginning of each book there must be a book title and a short book description (labeled *BLURB). The short description appears in the printed version as the subtitle on the title page. It also goes into the HTML top-level book index as the book description.
At the beginning of each chapter there may be a "chapter introduction" labeled *INTRO which is also a short description of the contents of that chapter. It may be one paragraph only. The "chapter intro" feature is only used in the HTML reference manual because it is the text that appears at the very top of a reference manual section. As you can see in the HTML version of reference docs, each chapter contains a list of all functions described in it. This list goes right after the first paragraph of "chapter intro". In the printed (PS/PDF) documentation, the *INTRO label is ignored and the "chapter intro" paragraph is shown exactly like any other text paragraph.
Each printed book contains a table of contents at the beginning and an index at the end. The index should help a reader to quickly find a particular concept. For example, all documented Yacas commands are automatically entered into the index in the reference manual. Additional index entries should be inserted by hand using the *A label (see below).
Translators should try to prepare translated documentation in the same plain text format as the original documentation. Any problems with conversion to HTML and PS/PDF formats should be easy to solve (at least for European languages).
The most important documentation books to translate are the reference manual and the tutorials. There is substantial documentation aimed at developers, for instance the algorithms book or the essays book. It is probably not as important to translate such books, since Yacas developers have to speak English to communicate.
You may want to examine the source of this file (manmake/YacasDocs.chapt.txt) to see how various features of the markup are used. Currently the following markup is implemented:
In> 1+2;
Out> 3; |
While(x<0) [ x:=x+1; Write(x); ]; |
* Item
* Another item |
* 0. First item
* 0. Second item |
Note that the text of an item continues until the next itemized line is given or until end of paragraph (does not have to be all in one line).
Nesting of enumerated or itemized environments is not supported, except for fringe cases of nesting just one itemized list at the very end of an enumerated list or vice versa.
The enumerated environment is currently only implemented in LaTeX docs; HTML docs render them as itemized.
<*http://host.net/file.html#anchor*> |
<*click here|somewebpage.html*> |
Note: Currently, only the HTML documentation is hyperlinked, while the printed PS/PDF documentation contains only text.
<*yacasdoc://essays/5/2/*> |
<*this section|yacasdoc://essays/5/2/*> |
<*yacasdoc://#documentation!hyperlinks*> |
<*yacasdoc://Algo/3/#adaptive plotting*> |
<*yacasdoc://Algo/3/1/*> |
There is a special feature for displayed equations: any punctuation immediately following the second pair of dollar signs will be displayed on the same line as the equation. (This is used to get around the limitation that mathematical expressions cannot end with a comma, a period, or another punctuation mark.) For example, the formula "$$x^2/2$$," will produce the displayed equation x^2/2 with the trailing comma set on the same line.
As special exceptions, one can enter the symbols TeX and LaTeX as if they were Yacas expressions, i.e. "$TeX$" produces the TeX logo. One can also create a capitalized form of the name Yacas as "{Yacas}".
Please note: Mathematical expressions must be valid Yacas expressions, with no unbalanced parentheses, no undefined infix operators, no hanging periods and so on, or else the Yacas script that formats the docs will fail! (This limits the scope of mathematical formulae but is hopefully not critical.)
Also, please avoid putting equations into documentation as plain text. Expressions such as a>0 are not correctly typeset by TeX if included into the plain text without the dollar signs: "a>0". (The HTML documentation is currently not affected.)
Currently, when creating online HTML documentation, all mathematics is kept in Yacas notation and set in boldface font. (This may change in the future.) Of course, LaTeX typesets the documentation with correct mathematical symbols.
Another feature of the LaTeX exporter is that it will first try to represent all functions and infix operators according to their mathematical meaning, and if no such meaning is defined in Yacas, then it will show them exactly as they are written in Yacas. For infix operators to work, they have to be declared in the standard library, or else an error will occur when processing the manual.
For example, the Yacas operators = and == are represented in LaTeX by an equals sign (=), the operator := becomes the "identically equal" sign, and the cosmetic operators <> and <=> become the corresponding mathematical symbols. But you cannot use an infix operator such as ":=*" because it is not defined in the standard library. A Yacas function which is not defined in the standard library, for example "func(x)", will appear just like that: func(x).
*INCLUDE ../essays/howto.chapt |
Note that the included document is the file howto.chapt, not howto.chapt.txt, because it must be a Yacas-language file, not a .txt file. (The IncludeFile() call, an alias to Load(), will be used to execute the specified file.)
*REM this is a comment (documentation text continues) |
For example, the text
*FOOT This is an example footnote |
For example,
*EVAL "Yacas version: " : Version() |
A typical reference manual subsection documenting a certain function may look like this in plain text:
*CMD PrintList --- print list with padding
*STD
*CALL
	{PrintList}(list)
	{PrintList}(list, padding);

*PARMS

{list} -- a list to be printed
{padding} -- (optional) a string

*DESC

Prints {list} and inserts the {padding} ...

*E.G.

	In> PrintList({a,b,{c, d}}, " .. ")
	Out> " a .. b .. { c .. d}";

*SEE Write, WriteString |
Notes:
*EG notest // some advanced tests |
*CMD Sin, Cos, Tan --- Trigonometric ... |
In addition to the above labels, there are the following tags:
*PARMS |
*HEAD Parameters: |
Usage of the *A label currently does not directly affect the appearance of the docs. In the HTML docs, it inserts the invisible anchor tags <a></a>. In the printed LaTeX docs, the *A label adds an index entry. The *CMD tag generates all necessary HTML anchors and index entries for commands in the reference manual, so only non-command index entries need to be entered manually using *A.
The *INTRO and *BLURB tags only work for one paragraph. There must be no empty line between *INTRO/*BLURB and that paragraph. Also, there must be no empty line between the "blurb" and the book title (for technical reasons). There must be one and only one "blurb" paragraph in a "book" and no more than one "chapter intro" paragraph per chapter.
This markup should be sufficient for creating reference documentation in plain text.
Currently the following facilities are provided for indexing:
After LaTeX generates a "raw" index file *.idx, the makeindex utility is used to post-process and sort the index into the .ind file. If you do not have makeindex on your system, the book indices will not be generated.
Note that makeindex is not always friendly to special (non-alphanumeric) characters. For example, it uses the symbol ! to separate index topics, which may conflict with Yacas commands. In other words, the document index must be tested and sometimes debugged.
In the HTML docs, the index is currently not generated on a separate page, although HTML anchors are inserted in the text. The reference manual uses the HTML anchors to provide online help through the ? command.
An index entry may be a "topic" with "subtopics", which usually appears in book indices like this:
gnus, 51
    tame, 51
    wild, 52-341 |
*A gnus
*A gnus!tame
*A gnus!wild |
Currently, it is possible to include a command or an equation into an index entry, for example,
*A {InterestingCommand}
*A calculation of $Sqrt(x)$ |
Labels with an argument on the same line (affect only the current line):
Labels that affect the rest of the line and the subsequent paragraph:
Special labels for the reference manual that accept several arguments on the same line:
Special labels without arguments that generate headings for the reference manual:
Other special markup:
In case of a math syntax error, the documentation exporter cannot print the paragraph where the error occurred, but it usually prints the preceding paragraph. Currently, the easiest way to locate the error is to generate the .tex output and look at it, e.g.:
make ref.book.tex; less ref.book.tex |
If the last line is \end{document} but latex does not finish, you will have to run latex by hand, e.g.
latex ref.book.tex |
perl txt2yacasdoc.pl < file.txt > file.chapt |
In this example, file.txt contains some formatted plain text (source text) and the resulting file file.chapt will be produced in Yacas-language documentation format.
There is a single option for txt2yacasdoc:
perl txt2yacasdoc.pl -debug < file.txt \
    > file.chapt |
book2TeX.sh intro.book intro.book.tex |
latex intro.book.tex
dvips -o intro.book.ps intro.book.dvi |
pdflatex intro.book.tex |
To generate printed docs, it is necessary to run latex (at least) three times in a row. At first latex does not know how much space will be taken by the table of contents and the index, so the page numbers are all off by a few pages. Only on the second run does latex generate correct page numbers for the TOC (the .aux file) and for the index (the .idx file). After this, the index file has to be processed by the makeindex utility to sort it, and the third latex run is needed to actually insert the correct TOC and the processed index into the final document.
The shell commands in book2TeX.sh execute the following Yacas commands:
Use("book2TeX.ys"); ToFile("file.chapt.tex") Load("file.chapt"); |
book2TeX.sh -run "yacas-dir/src/yacas --rootdir yacas-dir/scripts" \
    file.book file.book.tex |
For example, the symbols %, { }, < >, #, \, _ and & are special to TeX. They should not normally be used in plain text; it is okay to use them in "typewriter" text (within braces {}) or code samples -- but not in section or chapter heads, because it makes it difficult to export to TeX correctly. TeX commands may be entered but will not be correctly rendered in HTML online documentation.
Sometimes fixed-font text will hang over the right edge of the printed page. A workaround is to break the fixed-font text into shorter fragments or to rephrase the text.
Another concern is that code examples (TAB-indented blocks) are typeset in a fixed-width font and may not fit into the width of the page. To avoid this, the lines in the code examples should not be longer than about 50 characters.
The current implementation uses a "report" style which allows chapters, sections, subsections and includes an automatically generated table of contents and an index. The standard 10 point font and two-column format are used to save space (and trees).
The script txt2yacasdoc.pl attempts to convert straight double quotes ("...") into properly directed quotation marks. However, this automatic conversion sometimes fails and produces wrongly-directed quotes. One such case is when the quotes are on the same line as a TAB character (e.g. in an itemized environment). This problem can be circumvented by putting the quoted words on a different line.
Some complicated mathematical expressions may not correctly render in TeX. This is because Yacas uses its library function TeXForm() to transform Yacas expressions to TeX. Mathematical expressions are entered in the plain text documentation source using Yacas syntax, then transformed to a special non-evaluating call TeXMath() in the Yacas-language documentation, which formats into HTML using a Write() call or into TeX using a TeXForm() call, as necessary. Testing should be performed on documentation before releasing it. The most stringent limitation is that the expression between dollar signs should evaluate in Yacas (preferably to itself) and not cause syntax errors. In case of doubt, check that the expression evaluates without errors and then try to use TeXForm on that expression and see if that evaluates without errors as well. For example, expressions such as x=+1 will cause a syntax error and this will break the compilation of the manual (both HTML and TeX).
Usage is similar to book2TeX.sh. For example, the benchmarking test code wester.yts can be automatically extracted from the corresponding essay chapter by the command
sh ../manmake/book2ys.sh wester-1994.chapt \
    wester.yts |
To prepare a documentation chapter in such a way that code extraction is possible, one needs to make sure that all code examples in the chapter taken together will become a correct sequence of Yacas expressions when cut out and written sequentially into a file. So, for instance, semicolons at the end of each statement are required. The script book2ys will not export example Yacas session code with "In>" and "Out>" prompts but it will export all other example code.
The example code with "In>" and "Out>" prompts will become comments in the exported Yacas file. It is possible to suppress this comment generation by the -strip option to the book2ys.sh script.
If the output file name is not given, the name will be the same as the Yacas book source name but with the .ys extension.
See the source file wester-1994.chapt.txt to get a feeling of how the source documentation is formatted to allow completely automatic code extraction. Note that the printed documentation may be in two-column format, and therefore it is necessary to split lines that are too long.
Using the script book2ys.sh, one can write documentation and code together, a la "literate programming". The main idea of literate programming is that the program code and the documentation should be written as one large book explaining to humans how the program is organized and how it works. From this book, a printable copy is generated and all code is automatically extracted for compilation.
Literate programming may require that code be split between many source files, and yet it may be convenient to keep all descriptions in one book. The special document formatting label *YSFILE can be used to redirect output to different Yacas source files.
By default, the output goes to the file specified on the book2ys.sh command line. This default can be restored by the directive "*YSFILE -" at any time. Otherwise, all code output will be printed to the file specified after the *YSFILE label.
Note that the multiple file support is somewhat restrictive:
Here is an example of using multiple files. Note how documentation text is interspersed with TAB-indented code.
Text that does not appear in the code.
	// code into default file
	// more code
*YSFILE script1.ys
This will not appear in the file.
	x:=1; // some code for the file script1.ys
*YSFILE script2.ys
	// some code for the file script2.ys
End of the example, need to reset *YSFILE.
*YSFILE - |
The converse approach is to write primarily code and embed some documentation as comments in the code files. This approach is implemented by the script ys2book.pl.
This script takes a Yacas script file and extracts Yacas comments from it into another file. Usage may look like this:
perl ys2book.pl < file.ys > file.chapt.txt
perl ys2book.pl -strip < file.ys > file.chapt.txt |
Not all comments may be desirable as documentation. Formatting of comments is implemented as follows:
a:=1; // initialize a.
///// This function needs
/// an initializer.
b:=1; /// also initialize b. |
This function needs an initializer. |
/** documentation text */
/** continues here*/ |
documentation text continues here |
/** This starts
 * a multiline
 * comment.
 */ |
All exported text is printed to the output file as is, without any additional reformatting. The only change to the text is stripping of initial spaces.
Any leading spaces after the beginning of the comment sign are removed. For example,
/* text */ |
/* start of comment */ |
Empty lines can be introduced into the documentation either as part of a multiline comment block, or as standalone empty comments such as
///
//////// |
With these features it is easy to prepare the embedded documentation in the Yacas plaintext documentation format. This format requires space- and TAB-based formatting, which is mostly preserved by the script ys2book.pl.
This is a standalone script; it is installed by default into /usr/local/bin and requires the Yacas executable also on the path, as well as the script files book2TeX.* and txt2yacasdoc.pl in the /manmake subdirectory of the Yacas installation tree /usr/local/share/yacas/. The script also requires perl and the Unix shell sh.
Limitations of this script are:
These limitations may be easily overcome by editing the resulting TeX file (but you need to know at least some TeX to do that).
The general usage pattern is
ytxt2tex [-o outputfile] file1.txt [file2.txt] ... |
To illustrate the usage of the script ytxt2tex, consider two examples.
The first example is just one plaintext file example1.txt. This file will have to be a "book" in itself, i.e. it will have to include a book title indented by four TAB symbols. For example:
*REM file: example1.txt

				Example Document Title

*REM this is a section title:
		Numbers and letters

*REM here are some index entries:
*A numbers
*REM simple index entries like this are OK
*A letters

Numbers and letters are very important, etc. |
This file example1.txt can be converted to a LaTeX file example1.tex by the following simple command:
ytxt2tex example1.txt |
ytxt2tex -o output1.tex example1.txt |
The second example is a longer "book" consisting of several plaintext files. One of these files is a "master file" and it should include all other files using the *INCLUDE label. The *INCLUDE label should contain file names without the .txt extension.
Suppose we have prepared the files book1.txt, chapter1.txt, and chapter2.txt containing the preamble text and two chapters. For example:
*REM this is the file "book1.txt" *BLURB or, The Multitudinous Attempts to Avoid Counterproductivity Relationships of Entities *INCLUDE chapter1 *INCLUDE chapter2 |
The chapter files might be:
*REM this is the file "chapter1.txt" Entities and Epiphenomena The history of the ambiguous question of epiphenomenological discourse can be traced to the pre-postmodern period... |
*REM this is the file "chapter2.txt" Substrates and Superficiality In the preceding chapter, we have thoroughly investigated the metaphilosophical aspects of the trans-homocentric considerations... |
The command to create the final LaTeX file book1.tex is
ytxt2tex book1.txt chapter1.txt chapter2.txt |
ytxt2tex -o MyBook.tex book1.txt chapter*.txt |
By default, both table of contents and the index are generated. The commands to create a PostScript file out of the LaTeX file might be:
latex MyBook.tex
latex MyBook.tex
makeindex MyBook.idx -o MyBook.ind
latex MyBook.tex |
Note that the resulting LaTeX file needs to be processed three times if the table of contents or index are to be used. Without a table of contents and index, it is enough to process the file with LaTeX twice.
Currently, most but not all of the Yacas documentation markup functionality is implemented in the simple plaintext filter; also, documentation includes some extra HTML files. However, almost all of the reasonable markup needed to write documentation is present. Therefore it is possible to maintain most of the documentation in the plain text format described above. To convert existing Yacas documentation back to the plain text format, a script book2txt.ys/book2txt.sh can be used.
By using a command such as
book2txt.sh file.chapt |
12:51pm scriabin> book2txt.sh intro.book
[editvi.ys] [gnuplot.ys]
True;
Out> True;
Quitting...
File 'intro.book.txt' was created.
12:51pm scriabin> |
In the above example, the shell commands in book2txt.sh executed the following Yacas commands,
Use("book2txt.ys"); ToFile("file.chapt.txt") Load("file.chapt"); |
Of course, it is possible that some features of Yacas documentation were not implemented in the script and in that case the resulting file must be edited by hand. But the purpose of the book2txt script is exactly this: to make a plain text source file to be edited and maintained.
Several files can be converted at once, for example:
book2txt.sh f1.chapt f2.chapt file3.book |
As the Yacas build system is built on the GNU autotools suite, which contains both automake and autoconf, we will start with a short description of this package. Then we will turn to the various components of Yacas: the program itself, the script files, the documentation, the test suite, and so on.
As explained in the INSTALL file, building Yacas requires the usual steps: run the configure script, then run make, and finally run make install.
Both configure and make accept many options. Some of them are explained below.
The autotools suite consists of a number of utilities. These are developed separately, but are designed to be used together. They are automake, autoconf, and libtool.
The users do not need to run automake and autoconf (here, "users" refers to the people who do not want to make any changes to Yacas and includes those who just want to compile Yacas from a tar.gz source archive). They do need the libtool script. But the libtool package is included in the Yacas distribution, so the users do not need to install libtool themselves.
Developers do need to install autoconf and automake on their systems. But they usually do not need to run these tools directly, as the Makefiles contain the necessary commands. When the Makefiles are not present, which occurs for instance when installing afresh from the CVS repository, the makemake script in the root of the Yacas source tree can (and probably should) be used to invoke automake and autoconf in the right order and with the correct flags.
In the following three sections, these utilities are briefly explained. In all cases, the reader is referred to the documentation included in the autotools suite for more information. Another useful source of information is GNU Autoconf, Automake, and Libtool by Gary V. Vaughan, Ben Elliston, Tom Tromey and Ian Lance Taylor, which is published by New Riders. An online version is available from http://sources.redhat.com/autobook .
The Makefile.am file contains the definition of certain macros that are used by automake. The rest of the Makefile.am file is copied verbatim to the generated Makefile.in file.
The most important macros which are used by automake are the so-called primaries. These list the files that make up the Yacas package. For instance, in the src directory, the file Makefile.am contains the following line
bin_PROGRAMS = yacas |
The bin prefix in the example above says that yacas should be installed in the binary directory, as determined by the configure script. By default, this is the directory /usr/local/bin. There are also prefixes for the other directories, as well as some prefixes with different meanings: the noinst prefix says that the specified file need not be installed, and the check prefix says that the file is only needed when testing.
There are also so-called associated variables. The same Makefile.am contains the following variables associated to the Yacas executable:
yacas_SOURCES = yacasmain.cpp commandline.cpp \
    unixcommandline.cpp stdcommandline.cpp
yacas_LDADD = libyacas.a libyacasplatform.a \
    @NUMBERS_LIB@ @NUMBERS_LDFLAGS@ |
From the information contained in these lines, automake can construct the necessary commands to go in the final Makefile. This Makefile not only supports building, testing, and installing the package, but also rolling the tar-ball for release (use make dist for this, as explained in the section "Targets for make" below). In the above example, automake can figure out that yacasmain.cpp should be included in the distribution.
Unfortunately, not everything is supported that well. For instance, Yacas comes with its own documentation system, which of course is not supported by automake, so we need to tell automake how to handle these files. To specify which files should be included in the distribution, the EXTRA_DIST variable can be used: the developer should list there all files to be included in the distribution that automake does not know about. If we want to run some commands at build or installation time, we can specify them by Makefile rules in the traditional way: just write the rules in Makefile.am and they will be copied verbatim into the generated Makefile. Please keep in mind that the rules should work on a wide variety of platforms in order to retain portability.
We currently assume automake version 1.4 or later. Note that version 1.5 breaks backward compatibility and should be avoided. Version 1.6 contains some useful additions, like the nobase prefix and the possibility to define new prefixes, so at a certain point we may require version 1.6.
For more information about automake, the reader is referred to the documentation that comes with the package.
The configure.in file consists of standard shell code, interspersed with special macros defined by the autoconf package. These can be recognized by the AC_ prefix.
As the configure.in file only rarely needs to be changed, we will only describe the autoconf tool by one example. As explained in the previous section, "The automake tool" , the symbols @NUMBERS_LIB@ and @NUMBERS_LDFLAGS@ are used in the Makefile.in to link a library with basic numerical routines into the Yacas executable. This gives the user the option to choose between two libraries: the GNU multi-precision arithmetic (GNU MP) library and a library provided by the Yacas team.
This effect is achieved by the following fragment of configure.in (for clarity, a simplified version is presented).
AC_ARG_WITH(numlib, [  --with-numlib=LIB ... ], \
    with_numlib=$withval, \
    with_numlib="native")
case $with_numlib in
native)
    NUMBERS_LIB="libyacasnumbers.la"
    NUMBERS_LDFLAGS="-lm"
    ;;
gmp)
    AC_CHECK_LIB(gmp, __gmpz_init, \
        have_gmp=yes, have_gmp=no)
    if test "$have_gmp" = "no" ; then
        AC_MSG_ERROR([GNU MP library not found])
    fi
    NUMBERS_LIB="libgmpnumbers.la"
    NUMBERS_LDFLAGS="-lgmp -lm"
    ;;
esac
AC_SUBST(NUMBERS_LIB)
AC_SUBST(NUMBERS_LDFLAGS) |
The first line tells the configure script to accept an extra option, --with-numlib=LIB. The shell variable with_numlib is set to the value LIB given by the user, or to native if the user does not specify this option on the command line.
If the shell variable with_numlib has the value native (which means that the user has either given the --with-numlib=native option to configure, or not used the option at all), then the NUMBERS_LIB and NUMBERS_LDFLAGS shell variables are set to libyacasnumbers.la and -lm respectively.
If on the other hand the --with-numlib=gmp option is passed to the configure script, then first it is checked that the GNU MP library is available -- if this is not the case, the configuration terminates with an error. Then the NUMBERS_LIB and NUMBERS_LDFLAGS shell variables are set to suitable values.
The last two lines say that the value of the NUMBERS_LIB and NUMBERS_LDFLAGS shell variables should be substituted for the @NUMBERS_LIB@ and @NUMBERS_LDFLAGS@ symbols respectively, in the Makefile.in.
This ends the brief description of autoconf. For more information, the reader is referred to the documentation that comes with the package.
We currently assume autoconf version 2.13 or later.
For the people that want to know everything, we will now explain the idea of libtool in a nutshell by a simple example: suppose that we want to build and install a library from the source file hello.c. The following commands do the trick, on any system:
libtool gcc -c hello.c
libtool gcc -rpath /usr/local/lib -o \
    libhello.la hello.lo
libtool cp libhello.la /usr/local/lib |
The first command builds a libtool object with the name hello.lo. This file encapsulates the hello.o object file. Subsequently, the libtool library with the name libhello.la is built, which encapsulates both the static and the shared library. The last command installs both the static and the shared library in the /usr/local/lib directory. Note that the GNU compiler gcc need not be installed on the system; libtool will call the native compiler with the correct switches if necessary.
The libtool distribution includes libltdl, the LibTool Dynamic Loading library. This library is used to support Yacas plugins.
A nice feature is the possibility to build in a different directory than the source directory by simply running configure from that directory. This not only prevents the source directory from being cluttered up by object files and so on, but it also enables the user to build for different architectures in different directories or to have the source in a read-only directory. For example, suppose that the Yacas source is installed under /mnt/cdrom/src/yacas and that you want to build Yacas under /tmp/build-yacas. This is achieved by the following commands
mkdir /tmp/build-yacas
cd /tmp/build-yacas
/mnt/cdrom/src/yacas/configure
make |
A list of options accepted by the configure script can be retrieved by invoking it with the --help option:
./configure --help |
./configure --prefix=/usr |
We will not describe the common configure options which are shared by all packages, but will restrict ourselves to the options exclusively used by Yacas.
The following three options pertain to the extensive documentation that comes with Yacas. By default, only HTML documentation is generated.
Then there are three options describing where to install various parts of the Yacas package.
Furthermore, the opposites of the above options (e.g. --disable-server) are also recognized.
Conceptually, the build process consists of the following parts:
At installation time, not only the yacas executable is installed but also the static libraries and the header files. This enables developers to embed Yacas in their own programs.
All script files and .def files are listed in Makefile.am. This definition is used to generate the packages.ys file, which contains a list of all .def files. The corefunctions.ys file is also automatically generated. This is done by running the gencorefunctions program, which resides in the src/ directory.
At installation time, all the script files and .def files that are listed in Makefile.am are copied. During this installation, the directory structure should be preserved.
The check target checks that all the script files and .def files listed in the Makefile.am are indeed present, and vice versa, that all files present in the scripts/ hierarchy are indeed listed.
The documentation is generated from plain text source files with the txt extension in the manmake directory. The source files are converted to Yacas code by the txt2yacasdoc.pl script. The result is a set of Yacas source files containing the text of the documentation. These Yacas source files are not well suited for human reading and must be further processed to obtain the books in HTML and PS/PDF formats.
To generate the books in HTML format, the Yacas source files are processed by the Yacas interpreter using the scripts manualmaker and indexmaker. The script indexmaker creates the file books.html with the top-level index to all HTML documentation books. The script manualmaker creates the HTML files with the book text (both the framed and the non-framed versions are written out). It is also possible to generate more than one HTML book at once. The two reference manuals are generated together to produce the full hyperlinked index to all commands.
To generate the books in PostScript and PDF format, the book2TeX.sh script is used. This script converts the Yacas source files to TeX files. They can then be processed by the standard TeX tools.
By default, only the HTML-formatted documents are generated. To change this, use the options --disable-html-doc, --enable-ps-doc and --enable-pdf-doc of the configure script. The --with-html-dir and --with-ps-dir options can be used to tell where the documentation should be installed.
At the moment, the Makefile.am for the documentation is rather messy. For this reason, we do not describe it in detail. Instead, we just point out some special features:
The yts files in the tests directory contain the tests for different subsystems of Yacas. For instance, the file arithmetic.yts contains tests for the basic arithmetic functions; the first test is to check that 3+2 evaluates to 5. All the test files are listed in the TESTFILES variable, which can be overridden on the make command line, for example:
make TESTFILES='comments.yts complex.yts' check |
The shell script test-yacas does the actual testing. It runs the scripts listed in the TESTFILES variable through the Yacas executable. The tests that take a long time are at the end of the list, so that they are performed last. The output of the tests is collected in the file testresult.txt. A test is considered failed if the exit code is nonzero (which happens, for instance, if Yacas crashes) or if either of the strings ****** (six stars) or Error appears in the output.
A special case is the last test, which goes through the problems put forward by Michael Wester (see the section "M. Wester's CAS benchmark and Yacas"). The commands for this test are not in a .yts file in the tests directory, but are extracted from the documentation (see the immediately preceding section).
Support for the library archive resides in the src/ directory. It contains the code for the compressor program. This program generates the library archive.
The archive is only built if the --enable-archive option is passed to the configure script. In that case, the compressor program is built and run. At installation time, the generated archive is installed in the library directory (/usr/local/lib by default). There is also a test to check that the generated archive in fact works.
Plugins are almost the same as shared libraries, except that they are not linked when the Yacas executable is started, but at the user's request while Yacas is being run. The steps for building plugins are as follows. First we need to run Yacas on the .stub file. Then the generated C++ file is compiled, together with the files which contain the actual code for the functionality to be provided by the plugin. Finally, the resulting object files are linked. We need to add the linker flags -module -avoid-version -no-undefined in order to get a plugin. If the plugin uses any functions from other libraries, these libraries should be specified with the -L option. The plugin is installed in pkglib (which defaults to /usr/local/lib/yacas) when the user gives the make install command.
The following plugins are included with Yacas:
The executable is called proteusworksheet. It is only built if the build-proteus option was given to the configure script, and if the fltk development libraries are present. If this is the case, the executable is built by linking three libraries from the src directory with the object files in the proteus directory.
The install target installs the executable and some auxiliary data files that Proteus needs to function.
The build system does not perform any actions for the following components of Yacas.
One hallmark of a mature programming language is that it supports modules, and a way to define its interface while hiding the internals of the module. This section describes the mechanisms for doing so in the Yacas scripting language.
The following piece of code is a little example that demonstrates the problem:
SetExpand(fn_IsString) <-- [expand:=fn;];
ram(x_IsList)_(expand != "") <-- ramlocal(x);
expand:="";
ramlocal(x) := Map(expand,{x}); |
This little bit of code defines a function ram that calls the function Map, passing along the argument given to it, if that argument is a list and if the function to be mapped was set with the SetExpand function. The code has several flaws, demonstrated in the interaction below: the helper function ramlocal and the global variable expand are exposed to the calling environment, and the parameter name x can clash with variables in user expressions.
The above code can be entered into a file and loaded from the command line at leisure. Now, consider the following command line interaction after loading the file with the above code in it:
In> ramlocal(a)
In function "Length" :
bad argument number 1 (counting from 1)
Argument matrix[1] evaluated to a
In function call Length(a)
CommandLine(1) : Argument is not a list |
We called ramlocal here, which should not have been allowed.
In> ram(a)
Out> ram(a); |
The function ram checks that the correct arguments are passed in and that SetExpand was called, so it will not evaluate if these requirements are not met.
Here are some lines showing the functionality of this code as it was intended to be used:
In> SetExpand("Sin") Out> "Sin"; In> ram({1,2,3}) Out> {Sin(1),Sin(2),Sin(3)}; |
The following piece of code forces the functionality to break by passing in an expression containing the variable x, which is also used as a parameter name to ramlocal.
In> ram({a,b,c})
Out> {Sin(a),Sin(b),Sin(c)};
In> ram({x,y,z})
Out> {{Sin(x),Sin(y),Sin(z)},Sin(y),Sin(z)}; |
This result is obviously wrong, as a comparison with the preceding call shows. The following shows that the global variable expand is exposed to its environment:
In> expand
Out> "Sin"; |
LocalSymbols(x,expand,ramlocal) [
  SetExpand(fn_IsString) <-- [expand:=fn;];
  ram(x_IsList)_(expand != "") <-- ramlocal(x);
  expand:="";
  ramlocal(x) := Map(expand,{x});
]; |
This version of the same code declares the symbols x, expand and ramlocal to be local to this module.
With this the interaction becomes a little bit more predictable:
In> ramlocal(a)
Out> ramlocal(a);
In> ram(a)
Out> ram(a);
In> SetExpand("Sin")
Out> "Sin";
In> ram({1,2,3})
Out> {Sin(1),Sin(2),Sin(3)};
In> ram({a,b,c})
Out> {Sin(a),Sin(b),Sin(c)};
In> ram({x,y,z})
Out> {Sin(x),Sin(y),Sin(z)};
In> expand
Out> expand; |
A rigorous solution would be to make all parameters to functions and all global variables local symbols by default, but this might cause problems in situations where such behaviour is not required or even wanted.
The system will never be able to second-guess which function calls can be exposed to the outside world, and which ones should stay local to the system. It also goes against a design rule of Yacas: everything is possible, but not obligatory. This is important at moments when functionality is not wanted, as it can be hard to disable functionality when the system does it automatically.
There are more caveats: if a local variable is made unique with LocalSymbols, other routines cannot reach it by using the UnFence construct. This means that LocalSymbols is not always wanted.
Also, the entire expression on which the LocalSymbols command works is copied and modified before being evaluated, making loading time a little slower. This is not a big problem, because the speed hit is usually during calculations, not during loading, but it is best to keep this in mind and keep the code passed to LocalSymbols concise.
This part describes how the arithmetic library is embedded into Yacas and how to embed other arithmetic libraries.
The basic layout is as follows: there is one class BigNumber that offers basic numerical functions, arithmetic operations such as addition and multiplication, through a set of class methods. Integers and floating-point numbers are handled by the same class.
The BigNumber class delegates the actual arithmetic operations to the auxiliary classes BigInt and BigFloat. These two classes are direct wrappers of an underlying arithmetic library. The library implements a particular internal representation of numbers.
The responsibility of the class BigNumber is to perform precision tracking, floating-point formatting, error reporting, type checking and so on, while BigInt and BigFloat only concern themselves with low-level arithmetic operations on integer and floating-point numbers respectively. In this way Yacas isolates higher-level features like precision tracking from the lower-level arithmetic operations. The number objects in a library should only be able to convert themselves to/from a string and perform basic arithmetic. It should be easy to wrap a generic arithmetic library into a BigNumber implementation.
It is impossible to have several alternative number libraries operating at the same time. [In principle, one might write the classes BigInt and BigFloat as wrappers of two different arithmetic libraries, one for integers and the other for floats, but at any rate one cannot have two different libraries for integers at the same time.] Having several libraries in the same Yacas session does not seem to be very useful; it would also incur a lot of overhead because one would have to convert the numbers from one internal library representation to another. For performance benchmarking or for testing purposes one can compile separate versions of Yacas configured with different arithmetic libraries.
To embed an arbitrary-precision arithmetic library into Yacas, one needs to write two wrapper classes, BigInt and BigFloat. (Alternatively, one could write a full BigNumber wrapper class but that would result in code duplication unless the library happens to implement a large portion of the BigNumber API. There is already a reference implementation of BigNumber through BigInt and BigFloat in the file numbers.cpp.) The required API for the BigNumber class is described below.
// Calculate z=x+y where x=10 and y=15
BigNumber x("10",100,10);
BigNumber y("15",100,10);
BigNumber z;
z.Add(x,y,10);
// cast the result to a string
LispString str;
z.ToString(str,10); |
A calculation might modify one of its arguments. This can happen when an argument passed in is the very object that is to receive the result of the calculation, as in the call
x.Add(x,y); |
Therefore, all class methods of BigNumber that allow a BigNumber object as an argument should behave correctly when called destructively on the same BigNumber object. The result must be exactly the same as if all arguments were copied to temporary locations before performing tasks on them, with no other side-effects. For instance, if the specific object representing the number inside the numeric class is shared with other objects, it should not allow the destructive operation, as then other objects might start behaving differently.
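To illustrate, here is a minimal C++ sketch (not the actual Yacas implementation; the class and member names are invented) of one way to make a destructive method behave correctly under such aliasing:

// Placeholder number class; "rep" stands for the real internal
// representation (BigInt/BigFloat in Yacas).
class Num {
public:
    // Computes *this = x + y; must be correct even if &x or &y == this.
    void Add(const Num& x, const Num& y)
    {
        long sum = x.rep + y.rep; // read both operands first...
        rep = sum;                // ...only then overwrite the result object
    }
private:
    long rep;
}; |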
The basic arithmetic class BigNumber defines some simple arithmetic operations, through which other more elaborate functions can be built. Particular implementations of the multiple-precision library are wrapped by the BigNumber class, and the rest of the Yacas core should only use the BigNumber API.
This API will not be completely exposed to Yacas scripts, because some of these functions are too low-level. Among the low-level functions, only those that are very useful for optimization will be available to the Yacas scripts. (For the functions that seem to be useful for Yacas, suggested Yacas bindings are given below.) But the full API will be available to C++ plugins, so that multiple-precision algorithms could be efficiently implemented when performance is critical. Intermediate-level arithmetic functions such as MathAdd, MathDiv, MathMod and so on could be implemented either in the Yacas core or in plugins, through this low-level API. The library scripts will be able to transform numerical expressions such as x:=y+z into calls of these intermediate-level functions.
Here we list the basic arithmetic operations that need to be implemented by a multiple-precision class BigNumber. The operations are divided into several categories for convenience. Equivalent Yacas script code is given, as well as examples of C++ usage.
x.SetTo("2.e-19", 100, 10); |
x.SetTo("2a8c.e2", 100, 16); |
x.SetTo(12345); y.SetTo(-0.001); |
x.ToString(buffer, 200, 16); // hexadecimal
x.ToString(buffer, 40, 10); // decimal |
double a=x.Double(); |
x.SetTo(y); |
x.Equals(y)==true; |
MathEquals(x,y) |
y.Equals(x) |
x.IsInt()==true; |
x.IsIntValue()==true; |
FloatIsInt(x) |
x.BecomeInt(); x.BecomeFloat(); x.BecomeFloat(100); |
x.LessThan(y)==true; |
LessThan(x,y) |
x.Floor(y); |
MathFloor(x) |
prec=x.GetExactBits(); |
GetExactBits(x) |
x.SetExactBits(300); |
SetExactBits(x, 300) |
x.Add(y,z, 300); |
MathAdd(x,y) |
x.Negate(y); |
MathNegate(x) |
x.Multiply(y,z, 300); |
MathMultiply(x,y) |
x.Divide(y,z, 300); |
MathDivide(x,y) |
x.IsSmall()==true; |
MathIsSmall(x) |
x.MultiplyAdd(y,z, 300); |
MathMultiplyAdd(x,y,z) |
x.Mod(y,n); |
MathMod(x,n) |
int sign_of_x = x.Sign(); |
MathSign(x) |
For floating-point numbers, BitCount should return the binary exponent of the number (with sign), like the integer output of the standard C function frexp. More formally: if n=BitCount(x) and x!=0, then 1/2<=Abs(x)*2^(-n)<1. The bit count of an integer or a floating 0 is arbitrarily defined to be 1. (BitCount is, in effect, an optimized version of the integer part of the binary logarithm.) C++:
x.BitCount(); |
MathBitCount(x) |
x.ShiftLeft(y, n); x.ShiftRight(y, n); |
ShiftLeft(x,n); ShiftRight(x,n); |
x.BitAnd(y,z); x.BitOr(y,z); x.BitXor(y,z); x.BitNot(y); |
BitAnd(x,y); BitOr(y,z); BitXor(y,z); BitNot(y); |
The API includes only the most basic operations. All other mathematical functions such as GCD, power, logarithm, cosine and so on, can be efficiently implemented using this basic interface.
Note that generally the arithmetic functions will set the type of the resulting object to the type of the result of the operation. For example, operations that only apply to integers (Mod, BitAnd, etc.) will set the type of the resulting object to integer even if it was previously a float. The results of these operations on non-integer arguments are undefined.
In some arithmetic operations (add, multiply, divide) the working precision is given explicitly. For example, the call below computes x=y+z to 100 bits of working precision:
x.Add(y,z,100) |
For the Yacas arithmetic library, a "poor man's interval arithmetic" is proposed where the precision is represented by the "number of correct bits". The precision is not tracked exactly but almost always adequately. The purpose of this kind of rough precision tracking is to catch a critical roundoff error or to indicate an unexpected loss of precision in numerical calculations.
Suppose we have two floating-point numbers x and y and we know that they have certain numbers of correct mantissa bits, say m and n. In other words, x is an approximation to an unknown real number x'=x*(1+delta) and we know that Abs(delta)<2^(-m); and similarly y'=y*(1+epsilon) with Abs(epsilon)<2^(-n). Here delta and epsilon are the relative errors for x and y. Typically delta and epsilon are much smaller than 1.
Suppose that every floating-point number knows the number of its correct digits. We can symbolically represent such numbers as pairs {x,m} or {y,n}. When we perform an arithmetic operation on numbers, we need to update the precision component as well.
Now we shall consider the basic arithmetic operations to see how the precision is updated.
More formally, we have the estimates Abs(delta)<2^(-m), Abs(epsilon)<2^(-n) and we need a similar estimate Abs(r)<2^(-p) for r=delta+epsilon.
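Writing this estimate out explicitly for the product (the derivation is implicit in the text; LaTeX notation):

$$ x' y' = x y (1+\delta)(1+\epsilon) \approx x y (1+r), \qquad r = \delta + \epsilon , $$
$$ |r| \le |\delta| + |\epsilon| < 2^{-m} + 2^{-n} \le 2\cdot 2^{-\min(m,n)} = 2^{-(\min(m,n)-1)} , $$

so at worst one correct bit is lost in a multiplication.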
If the two numbers x and y have the same number of correct bits, we should double the error (i.e. decrease the number of significant bits by 1). But if they don't have the same number of bits, we cannot really estimate the error very well. To be on the safe side, we might double the error if the numbers x and y have almost the same number of significant bits, and leave the error constant if the numbers of significant bits of x and y are very different.
The answer expressed as a formula is p=Min(m,n) if Abs(m-n)>=D and p=Min(m,n)-1 otherwise. Here D is a constant that expresses our tolerance for error. In the current implementation, D=1.
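A minimal C++ sketch of this update rule (an illustration only; the function name is invented):

#include <algorithm>
#include <cstdlib>

// Precision update for the product of {x,m} and {y,n}:
// returns the number of correct bits p of the result.
int MultiplyPrecision(int m, int n)
{
    const int D = 1;            // tolerance constant from the text
    int p = std::min(m, n);
    if (std::abs(m - n) < D)    // (almost) equal precisions:
        p -= 1;                 // double the error, i.e. lose one bit
    return p;
} |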
If one of the operands is a floating zero {0.,m} (see below) and the other is a nonzero number {x,n}, then p=m-BitCount(x)+1. This is the same formula as above, if we pretend that the bit count of {0.,m} is equal to 1-m.
Formally, the relative precision r of the sum is defined by x'+y'=(x+y)*(1+r); hence r=(x*delta+y*epsilon)/(x+y), and we have the estimate Abs(r)<=(Abs(x)*2^(-m)+Abs(y)*2^(-n))/Abs(x+y).
Note that there is one important case when we can estimate the precision better than this. Suppose x and y have the same sign; then there is no cancellation when we compute x+y, because Abs(x+y)=Abs(x)+Abs(y). The above formula for r then gives the estimate Abs(r)<=Max(2^(-m),2^(-n)), i.e. p=Min(m,n) and no precision is lost.
If one of the operands is a floating zero represented by x={0.,m} (see below), then the calculation of the error is formally the same as in the case x={1.,m}. This is as if the bit count of {0.,m} were equal to 1 (unlike the case of multiplication).
Finally, if the sum x+y is a floating zero but x!=0 and y!=0, then it must be that a=b. In that case we represent x+y as {0.,p}, where p=Min(m,n)-a.
x.Add(y,z,100) |
We should truncate one or both of the arguments to a smaller precision before starting the operation. For the multiplication as well as for the addition, the precision tracking involves a comparison of two binary exponents 2^(-g) and 2^(-h) to obtain an estimate on 2^(-g)+2^(-h). Here g and h are some integers that are easy to obtain during the computation. For instance, the multiplication involves g=m and h=n. This comparison will immediately show which of the arguments dominates the error.
The ideal situation would be when one of these exponentials is much smaller than the other, but not very much smaller (that would be a waste of precision). In other words, we should aim for Abs(g-h)<8 or so, where 8 is the number of guard bits we would like to maintain. (Generally it is a good idea to have at least 8 guard bits; somewhat more guard bits do not slow down the calculation very much, but 200 guard bits would be surely an overkill.) Then the number that is much more precise than necessary can be truncated.
For example, if we find that g=250 and h=150, then we can safely truncate x to 160 bits or so; if, in addition, we need only 130 bits of final precision, then we could truncate both x and y to about 140 bits.
Note that when we need to subtract two almost equal numbers, there will be a necessary loss of precision, and it may be impossible to decide on the target precision before performing the subtraction. Therefore the subtraction will have to be performed using all available digits.
It is impossible to track the relative precision of a floating zero, but it is possible to track the absolute precision. Suppose we store the bit count of the absolute precision, just as we store the bit count of the relative precision with nonzero floats. Thus we represent a floating zero as a pair {0.,n} where n is an integer, and the meaning of this is a number between -2^(-n) and 2^(-n).
We can now perform some arithmetic operations on the floating zero. Addition and multiplication are handled similarly to the non-zero case, except that we interpret n as the absolute error rather than the relative error. This does not present any problems. For example, the error estimate for addition is the same as if we had a number 1 with relative error 2^(-n) instead of {0.,n}. With multiplication of {x,m} by {0.,n}, the result is again a floating zero {0.,p}, and the new estimate of absolute precision is p=n-BitCount(x)+1.
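A worked instance of the multiplication rule just stated (LaTeX notation):

$$ \{3.0,\,m\} \times \{0.,\,20\} = \{0.,\,p\}, \qquad p = 20 - B(3.0) + 1 = 20 - 2 + 1 = 19 , $$

since B(3.0)=2 (because 1/2<=3*2^(-2)<1); the product is therefore only known to lie between -2^(-19) and 2^(-19).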
The division by the floating zero, negative powers, and the logarithm of the floating zero are not representable in our arithmetic because, interpreted as intervals, they would correspond to infinite ranges. The bit count of the floating zero is therefore undefined. However, we can define a positive power of the floating zero (the result is again a floating zero).
The sign of the floating zero is defined as (integer) 0. (Then we can quickly check whether a given number is a zero.)
Suppose that x=12.0 and y=12.00. Then in fact x might represent a number such as 12.01, while y might represent 11.999. There may be two approaches: first, "12.0" is not equal to "12.00" because x and y might represent different numbers. Second, "12.0" is equal to "12.00" because x and y might also represent equal numbers. A logical continuation of the first approach is that "12.0" is not even equal to another copy of "12.0" because they might represent different numbers, e.g. if we compute x=6.0+6.0 and y=24.0/2.0, the roundoff errors might be different.
Here is an illustration in support for the idea that the comparison 12.0=12 should return True. Suppose we are writing an algorithm for computing the power, x^y. This is much faster if y is an integer because we can use the binary squaring algorithm. So we need to detect whether y is an integer. Now suppose we are given x=13.3 and y=12.0. Clearly we should use the integer powering algorithm, even though technically y is a float. (To be sure, we should check that the integer powering algorithm generates enough significant digits.)
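As an aside, the binary squaring algorithm mentioned here can be sketched as follows (C++ with a machine-size exponent; the same structure applies when the base is a BigNumber):

// Compute x^n for integer n >= 0 by binary squaring:
// O(log n) multiplications instead of n-1.
double IntPower(double x, unsigned long n)
{
    double result = 1.0;
    while (n > 0) {
        if (n & 1)      // lowest bit of the exponent is set
            result *= x;
        x *= x;         // square the base
        n >>= 1;        // move to the next bit of the exponent
    }
    return result;
} |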
However, the opposite approach is also completely possible: no two floating-point numbers should be considered equal, except perhaps when one is a bit-for-bit exact copy of the other and when we haven't yet performed any arithmetic on them. (The GMP library uses essentially this definition.) It seems that no algorithm really needs a test for equality of floats. The two useful comparisons on floats x, y seem to be the following:
Given these predicates, it seems that any floating-point algorithm can be implemented just as efficiently as with any "reasonable" definition of the floating-point equality.
Suppose the number x was entered as 0.1 with a precision of 5 digits, and the working precision is then increased to 10 digits. There are several possible approaches. 1) The number x stays the same but further calculations are done with 10 digits. In terms of the internal binary representation, the number is padded with binary zeros. This means that now e.g. 1+x will not be equal to 1.1 but to something like 1.100000381 (to 10 digits). And actually x itself should evaluate to 0.1000003815 now. It was 0.1 to 5 digits but it looks a little different when printed to 10 digits. (A "binary-padded decimal".)
This problem may look horrible at first sight -- "how come I can't write 0.1 any more??" -- but this seems so because we are used to calculations in decimals with a fixed precision, and the operation such as "increase precision by 10 digits" is largely unfamiliar to us except in decimals. This seems to be mostly a cosmetic problem. In a real calculation, we shouldn't be writing "0.1" when we need an exact number 1/10. When we request to increase precision in the middle of a calculation, this mistake surfaces and gives unexpected results.
2) When precision is increased, the number x takes its decimal representation, pads it with zeros, and converts back to the internal representation, just so that the appearance of "1.100000381" does not jar our eyes. (Note that the number x does not become "more precise" if we pad it with decimal zeros instead of binary zeros, unless we made a mistake and wrote "0.1" instead an exact fraction 1/10.)
With this approach, each number x that doesn't currently have enough digits must change in a complicated way. This will mean a performance hit in all calculations that require dynamically changing precision (Newton's method and some other fast algorithms require this). In these calculations, the roundoff error introduced by "1.100000381" is automatically compensated and the algorithm will work equally well no matter how we extend x to more digits; but it's a lot slower to go through the decimal representation every time.
3) The approach currently being implemented in Yacas is a compromise between the above two. We distinguish number objects that were given by the user as decimal strings (and not as results of calculations), for instance x:=1.2, from number objects that are results of calculations, for instance y:=1.2*1.4. Objects of the first kind are interpreted as exact rational numbers given by a decimal fraction, while objects of the second kind are interpreted as inexact floating-point numbers known to a limited precision. Suppose x and y are first assigned as indicated, with the precision of 5 digits each, then the precision is increased to 10 digits and x and y are used in some calculation. At this point x will be converted from the string representation "1.2" to 10 decimal digits, effectively making 1.2 a shorthand for 1.200000000. But the value of y will be binary-padded in some efficient way and may be different from 1.680000000.
In this way, efficiency is not lost (there are no repeated conversions from binary to decimal and back), and yet the cosmetic problem of binary-padded decimals does not appear. An explicitly given decimal string such as "1.2" is interpreted as a shorthand for 1.2000... with as many zeroes as needed for any currently selected precision. But numbers that are results of arithmetic operations are not converted back to a decimal representation for zero-padding. Here are some example calculations:
In> Precision(5)
Out> True
In> x:=1.2
Out> 1.2
In> y:=N(1/3)
Out> 0.33333 |
In> Precision(20)
Out> True
In> y
Out> 0.33333 |
In> y+0
Out> 0.33333333325572311878 |
In> x+0
Out> 1.2 |
In> z:=y+0
Out> 0.33333333325572311878
In> Precision(40)
Out> True
In> z
Out> 0.33333333325572311878 |
In> z+0
Out> 0.33333333325572311878204345703125 |
Suppose we have floating-point numbers x and y, known only to 2 and 3 significant digits respectively. For example, x=1.6 and y=2.00. These x and y are results of previous calculations and we do not have any more digits than this. If we now say
Precision(10); x*y; |
The first interpretation of Precision() is only possible to satisfy if we are given a self-contained calculation with integer numbers. For example,
N(Sin(Sqrt(3/2)-10^(20)), 50) |
So it seems that the second interpretation of Precision(n), namely: "please use n digits in all calculations now", is more sensible as a general-purpose prescription.
But this interpretation does not mean that all numbers will be printed with n digits. Let's look at a particular case (for simplicity we are talking about decimal digits but in the implementation they will be binary digits). Suppose we have x precise to 10 digits and y precise to 20 digits, and the user says Precision(50) and z:=1.4+x*y. What happens now in this calculation? (Assume that x and y are small numbers of order 1; the other cases are similar.)
First, the number "1.4" is now interpreted as being precise to 50 digits, i.e. "1.4000000...0" but not more than 50 digits.
Then we compute x*y using their internal representations. The result is good only to 10 digits, and it knows this. We do not compute 50 digits of the product x*y, it would be pointless and a waste of time.
Then we add x*y to 1.4000...0. The sum, however, will be precise only to 10 digits. We can now do one of two things: (a) we could pad x*y with 40 more zero digits and obtain a 50-digit result; however, this result will only be correct to 10 digits. (b) we could truncate 1.4 to 10 digits (1.400000000) and obtain the sum to 10 digits. In both cases the result will "know" that it only has 10 correct digits.
It seems that option (b) is better because we do not waste time with extra digits.
The result is a number that is precise to 10 digits. However, the user wants to see this result with 50 digits. Even if we chose option (a), we would have some bogus digits, in effect, 40 digits of somewhat random round-off error. Should we print 10 correct digits and 40 bogus digits? It seems better to print only 10 correct digits in this case. The GMP library already has this functionality in its string printing functions: it does not print more digits than the number actually holds.
If we choose this route, then the only effect of Precision(50) will be to interpret a literal constant 1.4 as a 50-digit number. All other numbers already know their real precision and will not invent any bogus digits.
In some calculations, however, we do want to explicitly extend the precision of a number to some more digits. For example, in Newton's method we are given a first approximation x[0] to a root of f(x)=0 and we want to have more digits of that root. Then we need to pad x[0] with some more digits and re-evaluate f(x[0]) to more digits (this is needed to get a better approximation to the root). This padding operation seems rather special and directed at a particular number, not at all numbers at once. For example, if f(x) itself contains some floating-point numbers, then we should be unable to evaluate it with higher precision than allowed by the precision of these numbers. So it seems that we need access to these two low-level operations: the padding and the query of current precision. The proposed interface is GetExactBits(x) and SetExactBits(x,n). These operations are directed at a particular number object x.
IsInteger(0) = True
IsIntValue(1.) = True
IsInteger(1.) = False |
x:=1.33; y:=x/3; |
We give formulae for p in terms of x, y, m, and n. Sometimes the bit count of a number x is needed; it is denoted B(x) for brevity.
The bit count B(x) is an integer function of x defined for real x!=0 by B(x):=1+Floor(Ln(Abs(x))/Ln(2)); in other words, B(x) is the integer such that 2^(B(x)-1)<=Abs(x)<2^B(x).
The bit count function can be usually computed in constant time because the usual representation of long numbers is by arrays of platform integers and a binary exponent. The length of the array of digits is usually available at no computational cost.
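As an illustration, here is a minimal C++ sketch of such a constant-time bit count; the types are simplified stand-ins, not the actual Yacas kernel classes:

#include <cstdint>
#include <vector>

// Sketch only: the magnitude is stored as 32-bit words (least significant
// first), scaled by 2^(32*exponent); the top word is assumed nonzero.
struct BigFloat
{
    std::vector<uint32_t> words;
    long exponent;
};

// B(x) = 1 + Floor(Ln(Abs(x))/Ln(2)). Only the array length and the top
// word are inspected, so the cost is independent of the number's size.
long BitCount(const BigFloat& x)
{
    uint32_t top = x.words.back();
    int topBits = 0;
    while (top != 0) { top >>= 1; ++topBits; }   // at most 32 iterations
    return 32L * (long)(x.words.size() - 1) + topBits + 32L * x.exponent;
} |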
The absolute error Delta[x] of {x,n} is of order Abs(x)*2^(-n). Given the bit count of x, this can be estimated from above and below as 2^(B(x)-n-1)<=Delta[x]<2^(B(x)-n).
From the definition of {x,n} with x!=0 it follows that 0 can be within the precision interval only if n<= -1. Therefore, we should transform any number {x,n} such that x!=0 and n<= -1 into a floating zero {0.,p}, where p=n-B(x).
First, we can quickly check that the values x and y have the same nonzero signs and the same bit counts, B(x)=B(y). If x>0 and y<0 or vice versa, or if B(x)!=B(y), then the two numbers are definitely unequal. We can also check whether both x=y=0; if this is the case, then we know that {x,m}={y,n} because any two zeros are equal.
However, a floating zero can sometimes be equal to a nonzero number. So we should now exclude this possibility: {0.,m}={y,n} if and only if Abs(y)<2^(-m). This condition is equivalent to B(y)<= -m.
If these checks do not provide the answer, the only possibility left is when x!=0 and y!=0 and B(x)=B(y).
Now we can consider two cases: (1) both x and y are floats, (2) one is a float and the other is an integer.
In the first case, {x,m}={y,n} if and only if the following condition holds: Abs(x-y)<Abs(x)*2^(-m)+Abs(y)*2^(-n), i.e. the precision intervals of the two numbers overlap.
It is now necessary to compute x-y (one long addition); this computation needs to be done with Min(m,n) bits of precision.
After computing x-y, we can avoid the full evaluation of the complicated condition by first checking some easier conditions on x-y. If x-y=0 as floating-point numbers ("exact cancellation"), then certainly {x,m}={y,n}. Otherwise we can assume that x-y!=0 and check the bit counts: if B(x-y)<=B(x)-Min(m,n)-1, the numbers are certainly equal, while if B(x-y)>=B(x)-Min(m,n)+2, they are certainly unequal.
If neither of these conditions gives us the answer, we have to evaluate the full condition by computing Abs(x)*2^(-m) and Abs(y)*2^(-n) and comparing with Abs(x-y).
In the second case, one of the numbers is an integer x and the other is a float {y,n}. Then x={y,n} if and only if Abs(x-y)<Abs(y)*2^(-n).
This procedure is basically the same as comparing {x,n} with Floor(x).
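The whole comparison procedure can be summarized in a short C++ sketch. Here ordinary doubles stand in for multiple-precision floats, the precisions m and n are carried separately, and the function names are ours, chosen for illustration:

#include <cmath>

static long BitCountOf(double x)      // B(x) = 1 + Floor(log2(Abs(x)))
{
    return 1 + (long)std::floor(std::log2(std::fabs(x)));
}

bool FloatsEqual(double x, long m, double y, long n)
{
    if (x == 0.0 && y == 0.0) return true;   // any two zeros are equal
    if (x == 0.0) return std::fabs(y) < std::ldexp(1.0, (int)-m);
    if (y == 0.0) return std::fabs(x) < std::ldexp(1.0, (int)-n);
    if ((x > 0.0) != (y > 0.0)) return false;          // opposite signs
    if (BitCountOf(x) != BitCountOf(y)) return false;  // different bit counts
    double diff = std::fabs(x - y);
    if (diff == 0.0) return true;                      // exact cancellation
    // full condition: the precision intervals overlap
    return diff < std::fabs(x) * std::ldexp(1.0, (int)-m)
                + std::fabs(y) * std::ldexp(1.0, (int)-n);
} |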
First consider the case when x+y!=0.
If x is zero, i.e. {0.,m} (but x+y!=0), then the situation with precision is the same as if x were {1.,m}, because then the relative precision is equal to the absolute precision. In that case we take the bit count of x as B(0)=1 and proceed by the same route.
First, we should decide whether it is necessary to add the given numbers at all. It may be unnecessary if, e.g., x+y=x within the precision of x (we may say that a "total underflow" occurred during addition). To check for this, we need to estimate the absolute errors of x and y: Delta[x]<2^(B(x)-m) and Delta[y]<2^(B(y)-n). If Abs(y) is below Delta[x], i.e. if B(y)<=B(x)-m-1, then the sum is x within its precision and y can be ignored; the symmetric check applies with x and y interchanged.
Suppose none of these checks were successful. Now the float value z:=x+y needs to be calculated. To find it, we need the target precision of only Max(B(x),B(y))+1-Max(B(x)-m,B(y)-n) bits; computing more bits would be pointless, since the absolute error of the result is of order 2^Max(B(x)-m,B(y)-n).
Then we compute B(z) and determine the precision p as p=B(z)-Max(B(x)-m,B(y)-n)-Dist(B(x)-m,B(y)-n). Here Dist(i,j) is an auxiliary function equal to 1 when Abs(i-j)<=2 and to 0 otherwise; it accounts for the possible doubling of the absolute error when both summands contribute comparably.
In the case when x and y have the same sign, we have a potentially better estimate p=Min(m,n). We should take this value if it is larger than the value obtained from the above formula.
Also, the above formula underestimates the precision of the result by 1 bit if the result and the absolute error are both dominated by one of the summands. In this case the absolute error is unchanged save for the Dist term, i.e. the value given by the above formula needs to be incremented by 1. The condition for this is B(x)>B(y) and B(x)-m>B(y)-n, or the same with x and y interchanged.
The result is now {z,p}.
Note that the obtained value of p may be negative (total underflow) even though we have first checked for underflow. In that case, we need to transform {z,p} into a floating zero, as usual.
Now consider the case when z:=x+y=0.
This is only possible when B(x)=B(y). Then the result is {0.,p} where p is found as p=Min(m,n)-B(x)-Dist(m,n).
If the addition needs to be performed with a given maximum precision P, and it turns out that p>P, then we may truncate the final result to P digits and set its precision to P instead. (It is advisable to leave a few bits untruncated as guard bits.) However, the first operation z:=x+y must be performed with the precision specified above, or else we run the danger of losing significant digits of z.
It is enough to convert the integer to a float {x,m} with a certain finite precision m and then follow the general procedure for adding floats. The precision m must be large enough so that the absolute error of {x,m} is smaller than the absolute error of {y,n}: B(x)-m<=B(y)-n-1, hence m>=n+1+B(x)-B(y).
Sometimes the formula gives a negative value for the minimum m; this means underflow while adding the integer (e.g. adding 1 to 1.11e150). In this case we do not need to perform any addition at all.
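As a sketch (the function name is ours, for illustration only), the formula and the underflow check amount to:

// Minimum precision m for converting an integer x (bit count bx) before
// adding it to a float {y,n} with bit count by, from the inequality above.
// A result m <= 0 signals total underflow: the addition can be skipped.
long MinIntegerPrecision(long bx, long by, long n)
{
    return n + 1 + bx - by;
} |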
First consider the case when x!=0 and y!=0. The resulting value is z=x*y, and since the relative errors of the factors add, the precision is p=Min(m,n)-Dist(m,n).
If one of the numbers is an integer x, and the other is a float {y,n}, it is enough to convert x to a float with somewhat more than n bits, e.g. {x,n+3}, so that the Dist function does not decrement the precision of the result.
Now consider the case when {x,m}={0.,m} but y!=0. The result is z=0 and the resulting precision is p=m-B(y).
Finally, consider the case when {x,m}={0.,m} and {y,n}={0.,n}. The result is z=0 and the resulting precision is p=m+n-1.
The last two formulae are the same if we define the bit count of {0.,m} as 1-m. This differs from the "standard" definition of B(0)=1. (The "standard" definition is convenient for the handling of addition.) With this non-standard definition, we may use the unified formula p=1-B(x)-B(y) whenever at least one factor is a floating zero.
If the multiplication needs to be performed to a given target precision P which is larger than the estimate p, then we can save time by truncating both operands to P digits before performing the multiplication. (It is advisable to leave a few bits untruncated as guard bits.)
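The precision bookkeeping for multiplication can be sketched as follows, using the non-standard bit count B({0.,m})=1-m introduced above (a simplified illustration; the Dist correction for nearly equal m and n is omitted):

#include <algorithm>

// Sketch only: returns the precision p of the product of {x,m} and {y,n},
// given their bit counts Bx, By and zero flags.
long MulPrecision(bool xIsZero, long Bx, long m,
                  bool yIsZero, long By, long n)
{
    if (!xIsZero && !yIsZero)
        return std::min(m, n);        // minus Dist(m,n) in the full formula
    long bx = xIsZero ? 1 - m : Bx;   // non-standard bit count for zeros
    long by = yIsZero ? 1 - n : By;
    return 1 - bx - by;               // unified formula for zero operands
} |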
When x=0 and y!=0, the result of division {0.,m}/{y,n} is a floating zero {0.,p} where p=m+B(y)-1. When x is an integer zero, the result is also an integer zero.
Division by an integer zero or by a floating zero is not permitted. The code should signal a zero division error.
If the number {x,n} is nonzero, then only x changes by shifting but n does not change; if {x,n} is a floating zero, then x does not change and n is decremented (ShiftLeft) or incremented (ShiftRight) by the shift amount:
{x, n} << s = {x<<s, n};
{0.,n} << s = {0., n-s};
{x, n} >> s = {x>>s, n};
{0.,n} >> s = {0., n+s}; |
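These rules transcribe directly into code; in the following C++ sketch doubles stand in for long floats and the pair {x,n} is made explicit:

#include <cmath>

struct Float { double x; long n; };   // stand-in for {x,n}

Float ShiftLeft(Float a, long s)
{
    if (a.x == 0.0) return { 0.0, a.n - s };   // floating zero: adjust n only
    return { std::ldexp(a.x, (int)s), a.n };   // nonzero: shift x, keep n
}

Float ShiftRight(Float a, long s)
{
    if (a.x == 0.0) return { 0.0, a.n + s };
    return { std::ldexp(a.x, (int)-s), a.n };
} |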
We could have implemented the exponent N as a big integer but this would be inefficient most of the time, slowing down the calculations. Arithmetic with floating-point numbers requires only very simple operations on their exponents (basically, addition and comparisons). These operations would be dominated by the overhead of dealing with big integers, compared with platform integers.
A known issue with limited exponents is floating-point overflow and exponent underflow. (This is not the same underflow as adding 1 to a very small number.) When the exponent becomes too large to be represented by a platform signed integer type, the code must signal an overflow error (e.g. if the exponent is above 2^31) or an underflow error (e.g. if the exponent is negative and below -(2^31)).
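A sketch of such a check when two binary exponents are combined (as happens in multiplication); long stands in here for the platform exponent type:

#include <limits>
#include <stdexcept>

// Add two exponents, signalling an error instead of silently wrapping.
long AddExponents(long e1, long e2)
{
    if (e2 > 0 && e1 > std::numeric_limits<long>::max() - e2)
        throw std::overflow_error("floating-point exponent overflow");
    if (e2 < 0 && e1 < std::numeric_limits<long>::min() - e2)
        throw std::underflow_error("floating-point exponent underflow");
    return e1 + e2;
} |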
First, we recognize that we shall only have one particular numerical library linked with Yacas, and we do not have to compile our implementation of the square root if this library already contains a good implementation. We can use conditional compilation directives (#ifdef) to exclude our square root code and to insert a library wrapper instead. This scheme could be automated, so that appropriate #defines are automatically created for all functions that are already available in the given multiple-precision library, and the corresponding Yacas kernel code that uses the BigNumber API is automatically replaced by library wrappers.
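A minimal illustration of this scheme (the macro name HAVE_LIB_SQRT and both function bodies are hypothetical; double stands in for a multiple-precision type):

#include <cmath>

#ifdef HAVE_LIB_SQRT
// Library wrapper: delegate to the linked numerical library.
double MathSqrt(double x) { return std::sqrt(x); }
#else
// Our own fallback: Newton's iteration for the square root.
double MathSqrt(double x)
{
    double r = x > 1 ? x : 1;
    for (int i = 0; i < 60; ++i) r = 0.5 * (r + x / r);
    return r;
}
#endif |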
Second, we might compile the library wrapper as a plugin, replacing the script-level square root function with a plugin-supplied function. This solution is easier in some ways because it doesn't require any changes to the Yacas core, only to the script library. However, the library wrapper will only be available to the Yacas scripts and not to the Yacas core functions. The basic assumption of the plugin architecture is that plugins can provide new external objects and functions to the scripts, but plugins cannot modify anything in the kernel. So plugins can replace a function defined in the scripts, but cannot replace a kernel function. Suppose that some other function, such as a computation of the elliptic integral which heavily uses the square root, were implemented in the core using the BigNumber API. Then it will not be able to use the square root function supplied by the plugin because it has been already compiled into the Yacas kernel.
Third, we might put all functions that use the basic API (MathSqrt, MathSin etc.) into the script library and not into the Yacas kernel. When Yacas is compiled with a particular numerical library, the functions available from the library will also be compiled as the kernel versions of MathSqrt, MathPower and so on (using conditional compilation or configured at build time). Since Yacas tries to call the kernel functions before the script library functions, the available kernel versions of MathSqrt etc. will supersede the script versions, but other functions such as BesselJ will be used from the script library. The only drawback of this scheme is that a plugin will not be able to use the faster versions of the functions, unless the plugin was compiled specifically with the requirement of the particular numerical library.
So it appears that either the first or the third solution is viable.
Suppose that the mantissa of a floating-point number is known to d decimal digits. It means that the relative error is no more than 0.5*10^(-d). The mantissa is represented internally as a binary number. The number b of precise bits of mantissa should be determined from the equation 10^(-d)=2^(-b), which gives b=d*Ln(10)/Ln(2).
One potential problem with the conversions is that of incorrect rounding. It is impossible to represent d decimal digits by some exact number b of binary bits. Therefore the actual value of b must be a little different from the theoretical one. Then suppose we perform the inverse operation on b to obtain the corresponding number of precise decimal digits; there is a danger that we shall obtain a number d' that is different from d.
To avoid this danger, the following trick is used. The binary base 2 is the smallest of all possible bases, so successive powers of 2 are spaced more densely than successive powers of 10 or of any other base. Therefore for any power of 10 there is a unique power of 2 that is the first one above it.
The recipe to obtain this power of 2 is simple: one should round d*Ln(10)/Ln(2) upwards using the Ceil function, but b*Ln(2)/Ln(10) should be rounded downwards using the Floor function.
This procedure will make sure that the number of bits b is high enough to represent all information in the d decimal digits; at the same time, the number d will be correctly restored from b. So when a user requests d decimal digits of precision, Yacas may simply compute the corresponding value of b and store it. The precision of b digits is enough to hold the required information, and the precision d can be easily computed given b.
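In code the recipe is a one-liner each way; a minimal sketch (function names ours):

#include <cmath>

// Decimal -> binary rounds up; binary -> decimal rounds down, so that d
// always survives the round trip BitsToDecimal(DecimalToBits(d)) == d.
long DecimalToBits(long d) { return (long)std::ceil(d * std::log2(10.0)); }
long BitsToDecimal(long b) { return (long)std::floor(b / std::log2(10.0)); }

// For example, DecimalToBits(10) == 34 and BitsToDecimal(34) == 10. |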
Higher up, Yacas only knows about objects derived from LispObject. Specifically, there are objects of class LispAtom which represent an atom.
Symbolic and string atoms are uniquely represented by the result returned by the String() method. For number atoms, there is a separate class, LispNumber. Objects of class LispNumber also have a String() method in case a string representation of a number is needed, but the main uniquely identifying piece of information is the object of class BigNumber stored inside a LispNumber object. This object is accessed using the Number() method of class LispNumber.
The life cycle of a LispNumber is as follows:
In order to fully support the LispNumber object, the function in the kernel that determines if two objects are the same needs to know about LispNumber. This is required to get valid behaviour. Pattern matching for instance uses comparisons of this type, so comparisons are performed often and need to be efficient.
The other functions working on numbers can, in principle, call the String() method, but that induces conversions from BigNumber to string, which are relatively expensive operations. For efficiency reasons, the functions dealing with numeric input should call the Number() method, operate on the BigNumber returned, and return a LispNumber constructed with a BigNumber. A function can call String() and return a LispNumber constructed with a string representation, but it will be less efficient.
When a BigNumber is constructed from a decimal string, one has to specify a desired precision (in decimal digits). Internally, BigNumber objects store numbers in binary and will allocate enough bits to cover the desired precision. However, if the given string has more digits than the given precision, the BigNumber object will not truncate it but will allocate more bits so that the information given in the decimal string is not lost. If later the string representation of the BigNumber object is requested, the produced string will match the string from which the BigNumber object was created.
Internally, the BigNumber object knows how many precise bits it has. The number of precise digits might be greater than the currently requested precision. But a truncation of precision will only occur when performing arithmetic operations. This behavior is desired, for example:
In> Precision(6)
Out> True;
In> x:=1.23456789
Out> 1.23456789;
In> x+1.111
Out> 2.345567;
In> x
Out> 1.23456789; |
This behavior is implemented by storing the string representation "1.23456789" in the LispNumber object x. When an arithmetic calculation such as x+1.111 is requested, the Number method is called on x. This method, when called for the first time, converts the string representation into a BigNumber object. That BigNumber object will have 28 bits to cover the 9 significant digits of the number, not the 19 bits normally required for 6 decimal digits of precision. But the result of an arithmetic calculation is not computed with more than 6 decimal digits. Later when x needs to be printed, the full string representation is available so it is printed.
If now we increase precision to 20 digits, the object x will be interpreted as 1.23456789 with 12 zeros at the end.
In> Precision(20)
Out> True;
In> x+0.000000000001
Out> 1.234567890001; |
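The mechanism can be sketched as follows; all names and types here are simplified stand-ins for the actual LispNumber and BigNumber classes:

#include <cstdlib>
#include <memory>
#include <string>
#include <utility>

struct BigNum                          // stand-in for a real big float
{
    double value;
    BigNum(const std::string& s, long /*precisionBits*/)
        : value(std::strtod(s.c_str(), nullptr)) {}
};

class NumberAtom                       // stand-in for LispNumber
{
    std::string text;                  // original decimal string
    std::unique_ptr<BigNum> num;       // created lazily, on first use
public:
    explicit NumberAtom(std::string s) : text(std::move(s)) {}
    const std::string& String() const { return text; }  // exact printing
    const BigNum& Number(long precisionBits)
    {
        if (!num) num.reset(new BigNum(text, precisionBits));
        return *num;                   // the string remains available
    }
}; |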
The advantages of this approach are:
The disadvantages are:
The Yacas system comes with a compiler that can compile scripts to C++ files. To invoke, one can call
CompileCpp(scriptfile,pluginname); |
while in Yacas. scriptfile needs to be a full path to a script file, and pluginname is the base name for the plugin. This function creates a file $(pluginname).cpp, a C++ file that can be compiled into a plugin.
For the developer, however, it is a nuisance to have to compile the scripts each time something is changed.
When working on scripts that need to be compiled, it is more convenient if the scripts are loaded instead of the plugins. To achieve this, a programmer can create a custom initialization script containing:
Set(LoadPlugIns,False);
Use("yacasinit.ys"); |
and pass this file as argument to the --init command line flag.
The calling sequence for Defun is:
Defun(FunctionName,Arguments)Body; |
For example, Defun can be used in a script as:
Defun("add",{x,y}) MathAdd(x,y); |
Defun is defined entirely in terms of functions supported by the kernel, so it can be the very first function defined in the initialization script. As such it can be defined even before plugins get loaded.
There is one additional limitation: compiled code can currently only call functions with a fixed number of arguments for which it already knows the prototype (it needs to know that the function takes a fixed number of arguments, and how many). This includes the functions in the kernel, and those in the scripts and plugins loaded before this script or plugin was loaded. If a function needs to be called that is defined later in the script, or defined in another script that is compiled, a forward declaration of the function can be made through:
ForwardDeclareFixedFunction(name,nrArguments); |
Functions defined in scripts (whether interpreted or compiled), functions with a variable number of arguments, and macros can be called using the function ApplyPure. For instance, to invoke addition as defined in the scripts one could call
ApplyPure("+",{a,b}); |
Unfencing a routine does not currently work in compiled mode. The preferred method here would be to use macros (which also do not work yet); macros can be inlined and thus optimized somewhat further.
Constructs using LocalSymbols do not work yet either, although this should be fixed soon. LocalSymbols is the mechanism used in Yacas to limit access to resources.
Global variables do not work yet either: a global variable used in a compiled script returns itself unevaluated.
Groups of transformation rules also cannot currently be converted to compiled code. This is only a problem in cases where the transformation rules dominate performance. In all other cases one can write:
f(x_IsNumber,y_IsNumber) <-- fcompiled(x,y); |
with fcompiled defined in some script that gets compiled.
Other than that, all kernel functions with a fixed number of arguments are supported, as well as some functions and macros with a variable number of arguments.
This will be extended over time.
There is specific code in the compiler to handle compilation of specific transformation rules (in order to make life easier for the people writing the code that gets compiled). One example is an expression -n where n is a number. Officially, this would have to go through the transformation rules, where one rule would determine that it can be replaced with the evaluation of MathNegate(n). The compiler makes this transformation itself.
TrapError(DllLoad("libmath"),Use("base.rep/math.ys")); |
This tries to load the plugin named libmath; if that fails, it loads the script base.rep/math.ys instead. So until the plugin exists, Yacas uses the script file from which the plugin is built.
The first time the Yacas executable is built, the plugins are not yet available, but Yacas itself is needed to compile them. Being able to use the scripts instead of the plugins solves this chicken-and-egg problem: at first only the scripts are there and Yacas runs a little slower, but it works, and it can be used to compile the scripts into plugins, after which it is faster.
The scripts that get compiled in are grouped in the base.rep/ directory in the scripts/ directory.
For a function that accepts n arguments, the n slots following the result slot contain the values of the arguments (which have already been evaluated). Thus for a function that accepts 2 arguments, stack[sp] is the slot for the return value, and stack[sp+1] and stack[sp+2] contain the two arguments to the function.
The routine that called this function is responsible for popping the arguments off of the stack. This model fits nicely with the evaluation order. Before a function gets evaluated (applied to its arguments), the arguments get evaluated first. Evaluating each argument leaves a new result on the stack. Then the function is applied to this set of arguments.
The called routine can infer the number of arguments on the stack because the stack also knows its size, stack.size.
The result slot is referred to with RESULT, and the arguments are referred to through ARGUMENT(i) where i is a value between 1 and the number of arguments n.
If m registers are needed, they can then be referred to through ARGUMENT(n+i) where n is the number of arguments, and i has a value between 1 and m.
Currently a separate register is created for each local variable declared through Local(...). This is done at the beginning of the execution of the routine: the necessary number of slots is pushed onto the stack for use as registers. At the end the function needs to pop them again, because the caller does not know about the registers the function uses.
The stack starting at stack[sp+n+m+1] is used to pass arguments to other functions called from within this function.
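The whole discipline can be illustrated with a small C++ sketch, with LispObject pointers simplified to plain int values:

#include <cstddef>
#include <vector>

struct Stack { std::vector<int> slots; };   // stand-in for the Yacas stack

// Callee side: RESULT is stack[sp], ARGUMENT(i) is stack[sp+i].
void AddFn(Stack& st, size_t sp)
{
    st.slots[sp] = st.slots[sp + 1] + st.slots[sp + 2];
}

// Caller side: reserve the result slot, push the (already evaluated)
// arguments, call, then pop the arguments, leaving only the result.
int CallWithTwoArgs(Stack& st, void (*fn)(Stack&, size_t), int a, int b)
{
    size_t sp = st.slots.size();
    st.slots.push_back(0);   // stack[sp]: result slot
    st.slots.push_back(a);   // stack[sp+1]: first argument
    st.slots.push_back(b);   // stack[sp+2]: second argument
    fn(st, sp);
    st.slots.resize(sp + 1); // the caller pops the arguments
    int result = st.slots[sp];
    st.slots.pop_back();
    return result;           // e.g. CallWithTwoArgs(st, AddFn, 2, 3) == 5
} |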
A function is opened through
VmFunction(name) |
where name represents the calling name.
At the end, a function is closed through
VmFunctionEnd() |
Objects can be pushed onto a stack through
VmPushNulls(nr)
VmPush(register)
VmPushConstant(constant) |
When pushing an object onto the stack, it is stored in the slot stack[stack.size], and stack.size is incremented by one afterwards. Values that can be pushed on the stack are: null objects (VmPushNulls), the contents of a register (VmPush), and constants prepared at load time (VmPushConstant).
Objects can be popped from the stack through
VmPop(i) |
When this instruction is invoked, i arguments are removed from the stack.
Registers can be initialized through
VmInitRegister(register,constant) |
where register can have the form ARGUMENT(i) or RESULT. This is usually called at the place where the local variable was declared (through the Local(...) macro). Variables that have no value evaluate to themselves; this can be simulated by setting the variable to a constant.
After some evaluation has taken place, stack[stack.size-1] usually contains the result of the calculation. This result can be stored in a register through
VmSetRegister(register) |
where again register usually has the form ARGUMENT(i). This operation does not pop the value off the stack, it just copies it to the register. register could in principle also be RESULT.
At the end of a function for instance the result of the last calculation is usually stored in the result slot on the stack through a call VmSetRegister(RESULT).
In principle all sorts of optimizations would be possible by smart re-use of the slots RESULT and ARGUMENT(i) on the stack at places where they are not needed. The values stored in the arguments slot could be replaced with other values, and combined with a jump statement that jumps to the beginning of the routine. Calculating a factorial could be implemented for instance without needing extra registers. The RESULT slot could keep the current result, while ARGUMENT(1) gets replaced with ARGUMENT(1)-1 and the routine terminates if ARGUMENT(1) equals one.
Control flow can be controlled through labels and (conditional) jumps to these labels. A label is defined through
VmLabel(label) |
where label is some unique name. One can jump to that label from any other part of the routine through
VmJump(label) |
In addition, one can do a conditional jump with
VmJumpIfTrue(label)
VmJumpIfFalse(label) |
These instructions perform a jump if the value stored in stack[stack.size-1] is True or False respectively. It doesn't pop the value from the stack.
Calling a function (after its arguments have been put on the stack) can be done with
VmCall(fname,nrArgs) |
where fname is the calling name of the routine and nrArgs the number of arguments that were pushed.
The instruction set has one instruction to build a list from elements currently on the stack. The procedure to use this would start by pushing a constant, List, onto the stack, and then evaluating n expressions. The instruction to combine this into a list is
VmConsList(n) |
it combines the n arguments on the stack, together with the initial constant List, into a list, and stores it in the slot where List was initially stored. This operation is expensive in its stack use, but it is intended for short lists anyway.
Defun("fac",{n}) [ If(Equals(n,1),1,MathMultiply(n,fac(MathAdd(n,-1)))); ]; |
results in
VmFunction(fac)
    VmPushNulls(1)
    VmPush(1)
    VmPushConstant(0)
    VmCall(LispEquals, 2)
    VmPop(2)
    VmJumpIfFalse(C20)
    VmPop(1)
    VmPushConstant(0)
    VmJump(C21)
VmLabel(C20)
    VmPop(1)
    VmPushNulls(1)
    VmPush(1)
    VmPushNulls(2)
    VmPush(1)
    VmPushConstant(1)
    VmCall(LispAdd, 2)
    VmPop(2)
    VmCall(Compiled_fac, 1)
    VmPop(1)
    VmCall(LispMultiply, 2)
    VmPop(2)
VmLabel(C21)
    VmSetRegister(0)
    VmPop(1)
VmFunctionEnd() |
Only the indented lines generate instructions, so this would be 23 instructions.
On the other end of the spectrum there is the byte code compiler. A conversion to byte code, for execution through a virtual machine, could also be performed. Code could then be compiled dynamically while Yacas is running, platform-independently, at a small performance cost. The byte code loader would have to handle preparing the constants at load time, and linking of function calls. One extra advantage is the compactness of the generated byte code.
If size is important, a mix of C++ with a byte code interpreter is possible. This leaves linking to the actual C++ linker. One has a C++ file which can be compiled into a plugin, but which defines the function through a byte array with the byte code. This would come at a performance cost, since interpreting the byte code would slow the system down.
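For illustration, the core of such a byte-code interpreter is just a dispatch loop; the opcodes and encoding below are invented for this sketch and mirror some of the VM instructions described above:

#include <cstddef>
#include <vector>

enum Op { OP_PUSH_CONST, OP_JUMP, OP_JUMP_IF_FALSE, OP_RETURN };

// Execute a flat code array; operands follow their opcode in-line.
int Run(const std::vector<int>& code, std::vector<int>& stack)
{
    size_t pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH_CONST:
            stack.push_back(code[pc++]);
            break;
        case OP_JUMP:
            pc = (size_t)code[pc];
            break;
        case OP_JUMP_IF_FALSE:
            if (!stack.back()) pc = (size_t)code[pc];   // does not pop,
            else ++pc;                                  // like VmJumpIfFalse
            break;
        case OP_RETURN:
            return stack.back();
        }
    }
} |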
The big difference, in this respect, from the other systems is the programming language the code is written in. All these algorithms would have to be re-implemented simply because Yacas runs on top of the Yacas scripting language, which is different from the languages in which the larger systems are implemented.
Another difference is of course that functions (algorithms) defined in these systems are built on top of other functions defined in these languages. It would thus at first seem that one cannot simply take a small part of a system without taking the entire rest of the system with it.
This chapter addresses the first problem, the one where the algorithms are already defined in other languages, but with a different syntax. A simple scheme is presented here that will hopefully allow conversion of algorithms written in a certain syntax to the Yacas syntax. The aim is to preserve the structure of the original files as much as possible; comments should thus be left untouched. Because of this, standard parsing and compilation techniques are not fully usable, as these schemes usually throw away comments as soon as the code is parsed in, and the intermediate data structures don't offer a place for storing comments with sub-expressions.
The potential is huge; hundreds of man-years of research and development went into the other systems, and it would thus be a big boost to Yacas to be able to use these facilities. For this the licensing issues with these systems need to be resolved too.
This document ends with a discussion of the second problem: being able to take over parts of other systems, as opposed to having to take over an entire system.
There are two steps to converting code so that it can be used in another system: converting the syntax, and changing the semantics.
Converting the syntax is usually a large and cumbersome task which, as this document will try to show, can easily be explained to a computer. The second step, changing the semantics, might require hand-work.
The general steps taken by the interpreter are as follows:
Converting to another programming language is a difficult task, and the author is not aware of other projects that have succeeded in this arena. There are alternative approaches; for instance, one could simply write a parser that converts to an internal format and write a layer of macros to emulate the other programming language. However, chances are that some handwork will be required to get the code into the desired form. In this case an initial conversion into a form accepted by the normal parser of the programming language already helps a lot.
In general it should only be necessary to load the [language][operation].ys file. This file should include the file [language]reader.ys which in turn should include the file codetransform.ys.
The second step is to convert the code to the native Yacas coding standards. There are two things to consider: the general coding standards of the entire system (including the use of the facilities provided by Common Lisp), and the coding standards of individual programmers, since in general many programmers have worked on these large systems. For this it might be necessary to write specific optimizers for specific routines, as opposed to optimizers for the entire system. The alternative is of course to optimize by hand.
The module genericparser.ys defines some functions that allow definition of a grammar. The algorithm implemented is a recursive-descent parser. This parser can be used with the tokenizer implemented in the rest of the code (that is, the code maintaining indices into the arrays of tokens).
The code is not concerned about whether the grammar is ambiguous per se. Interaction between the tokenizer and the parser might be necessary for some languages (like C++), if the parser is to read preprocessor directives and template definitions correctly (the tokenizer can temporarily switch to another set of definitions of tokens).
The top-level routine to call is:
MatchGrammar(grammar); |
where grammar is defined as follows. A grammar is either: a string, which matches a literal token; an atom naming another grammar defined through DefineGrammar; a call AcceptToken(class), which matches any token of the given token class; or a list of sentences, of which the first matching one is used.
A sentence has the form:
{Tag,grammar_1,grammar_2,...,grammar_n} |
The result returned will be a list with Tag as the first element (so that pattern matchers can recognize it later), and one element for each grammar matched.
Grammars can be defined with the DefineGrammar macro. The syntax is:
DefineGrammar(name,grammar); |
where name is an atom, and other grammars can refer to this grammar using this name, and grammar is the grammar to recognize.
As a very simple example, consider the following definition of a grammar, defined in examplegrammar1.ys:
Use("codecheck.rep/codetransform.ys"); Use("codecheck.rep/genericparser.ys"); |
Expr'Tokens :=
{
  {"^[\\s]+",Spaces},
  {"^//.*?\\n",Comments},
  {"^[\\(\\)]",Brackets},
  {"^[0-9]+",Atoms},
  {"^[\\+\\-\\*]",Operators},
  {"^[\\;]",Delimiters},
  {"^[a-zA-Z_][a-zA-Z0-9_]*",Atoms},
  {"^.+",UnParsed}
};

DefineGrammar(statement,
{
  {Statement,expression,";"}
});

DefineGrammar(expression,
{
  {Expr2,term,"+",expression},
  {Expr3,term,"-",expression},
  {Expr4,term}
});

DefineGrammar(term,
{
  {Term5,factor,"*",term},
  {Term6,factor},
});

DefineGrammar(factor,
{
  {SubExpression,"(",expression,")"},
  {SimpleAtom,AcceptToken(Atoms)}
}); |
100 # Expr'Test'TokenMap(_rest) <-- rest;

Expr'Test'PreProcess() :=
[
];

Expr'Test'PostProcess() :=
[
  Local(nrInputTokens);
  nrInputTokens := Length(CodeTrans'input);
  tokenIndex := 0;
  NextToken();
  While(tokenIndex < nrInputTokens)
  [
    Local(expression);
    expression := MatchGrammar(statement);
    MakePostFix(GetInputTokens(expression));
  ];
];

10 # MakePostFix({Statement,_expression,";"}) <--
[
  ConvertExpression(expression);
  Echo(";");
];

10 # ConvertExpression({Expr2,_term,"+",_expression}) <--
[
  ConvertExpression(term);
  Space(1);
  ConvertExpression(expression);
  Space(1);
  WriteString("+");
];

10 # ConvertExpression({Expr3,_term,"-",_expression}) <--
[
  ConvertExpression(term);
  Space(1);
  ConvertExpression(expression);
  Space(1);
  WriteString("-");
];

10 # ConvertExpression({Expr4,_term}) <-- ConvertExpression(term);

10 # ConvertExpression({Term5,_factor,"*",_term}) <--
[
  ConvertExpression(factor);
  Space(1);
  ConvertExpression(term);
  Space(1);
  WriteString("*");
];

10 # ConvertExpression({Term6,_factor}) <-- ConvertExpression(factor);

10 # ConvertExpression({SimpleAtom,_atom}) <-- WriteString(atom);

10 # ConvertExpression({SubExpression,"(",_expression,")"}) <-- ConvertExpression(expression);

100 # ConvertExpression(_rest) <-- Echo("UNHANDLED! ",rest); |
This code defines a simple grammar with addition, subtraction and multiplication. Statements are expressions that are terminated by a ";". Brackets can be used to define a sub-expression. The code further proceeds to convert the input infix expressions to postfix, as the following example interaction will show:
In> Load("examplegrammar1.ys")
Out> True;
In> FromString("A+B*2;")CodeTransform("Expr","Test")
A B 2 * +;
A+B*2;
Out> True;
In> FromString("(A+B)*2;")CodeTransform("Expr","Test")
A B + 2 *;
(A+B)*2;
Out> True; |
This example also clearly shows how transformation rules would map naturally to the lists returned by the parser. The variable parts of the sentences can just be matched to variables.
The function also outputs the original input, which has not been transformed. This facility would allow converting the code in place.
59 Temple Place, Suite 330 Boston, MA, 02111-1307 USA |
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
This License is a kind of ``copyleft'', which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
A ``Modified Version'' of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A ``Secondary Section'' is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The ``Invariant Sections'' are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License.
The ``Cover Texts'' are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License.
A ``Transparent'' copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not ``Transparent'' is called ``Opaque''.
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only.
The ``Title Page'' means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, ``Title Page'' means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
You may add a section entitled ``Endorsements'', provided it contains nothing but endorsements of your Modified Version by various parties -- for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections entitled ``History'' in the various original documents, forming one section entitled ``History''; likewise combine any sections entitled ``Acknowledgements'', and any sections entitled ``Dedications''. You must delete all sections entitled ``Endorsements.''
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document's Cover Texts may be placed on covers that surround only the Document within the aggregate. Otherwise they must appear on covers around the whole aggregate.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.
Copyright (C) YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. A copy of the license is included in the section entitled ``GNU Free Documentation License''. |
If you have no Invariant Sections, write ``with no Invariant Sections'' instead of saying which ones are invariant. If you have no Front-Cover Texts, write ``no Front-Cover Texts'' instead of ``Front-Cover Texts being LIST''; likewise for Back-Cover Texts.
If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.