A discussion of binding modules, the principles behind the tool, and a
discussion of related work can be found in a research paper located at
http://www.cse.unsw.edu.au/~chak/papers/papers.html#c2hs. All features
described in the paper, except enum define hooks, are implemented in the
tool, but since the publication of the paper, the tool has been extended
further. The library interface essentially consists of the new Haskell FFI
Marshalling Library. More details about this library are provided in the next
section.
The remainder of this section describes the hooks that are available in binding modules.
{#import [qualified] modid#}
Is translated into the same syntactic form in Haskell, which implies that it
may be followed by an explicit import list. Moreover, it implies that
the module modid is also generated by C->Haskell and instructs the
tool to read the file modid.chi.
If an explicit output file name is given (--output option), this name
determines the basename for the .chi file of the currently translated
module.
Currently, only pointer hooks generate information that is stored in a
.chi file and needs to be incorporated into any client module that makes
use of these pointer types. It is, however, regarded as good style to use
import hooks for any module generated by C->Haskell.
Restriction: C->Haskell does not use qualified names. This can be a problem, for example, if two pointer hooks are defined to have the same unqualified Haskell name in two different modules, which are then imported by a third module. To partially work around this problem, it is guaranteed that the declaration of the textually later import hook dominates.
{#context [lib = lib] [prefix = prefix]#}
Context hooks define a set of global configuration options. Currently, there are two parameters, both of which are strings:
lib: a dynamic library that contains symbols needed by the present binding.
prefix: an identifier prefix that may be omitted from identifiers referring
to C definitions in any binding hook. This is useful because C libraries
often use a prefix, such as gtk_, as a form of poor man's name spaces. Any
underscore characters between a prefix and the main part of an identifier
must also be dropped. Case is not relevant in a prefix. If the abbreviation
conflicts with an explicitly defined identifier, the explicit definition
takes precedence.
Both parameters are optional. An example of a context hook is the following:
{#context prefix = "gtk"#}
If a binding module contains a context hook, it must be the first hook in the module.
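The prefix rule can be illustrated with a small, self-contained sketch. The helper dropPrefix below is hypothetical and not part of C->Haskell; it only models how a prefix such as gtk is matched without regard to case and how separating underscores are dropped. It does not model the precedence of explicitly defined identifiers.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- Hypothetical helper (not part of C->Haskell): compute the abbreviated
-- form of a C identifier under a given prefix. The prefix is matched
-- case-insensitively, and any underscores between the prefix and the
-- main part of the identifier are dropped.
dropPrefix :: String -> String -> Maybe String
dropPrefix prefix cName
  | map toLower prefix `isPrefixOf` map toLower cName
      = Just (dropWhile (== '_') (drop (length prefix) cName))
  | otherwise
      = Nothing

main :: IO ()
main = do
  print (dropPrefix "gtk" "gtk_notebook_new")     -- prefix and underscore dropped
  print (dropPrefix "gtk" "GTK_WINDOW_TOPLEVEL")  -- case of the prefix is ignored
  print (dropPrefix "gtk" "g_free")               -- prefix does not apply
```

With the context hook shown above, a hook could then refer to gtk_notebook_new simply as notebook_new.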
{#type ident#}
A type hook maps a C type to a Haskell type. As an example, consider
type GInt = {#type gint#}
The type must be a defined type; primitive types, such as int, are not
admissible.
{#sizeof ident#}
A sizeof hook maps a C type to its size in bytes. As an example, consider
gIntSize :: Int
gIntSize = {#sizeof gint#}
The type must be a defined type; primitive types, such as int, are not
admissible. The size of primitive types can always be obtained using
Storable.sizeOf.
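For primitive C types, the size can be computed in plain Haskell without any hook. The following minimal sketch uses sizeOf from Foreign.Storable together with the type CInt from Foreign.C.Types; the name cIntSize is chosen here for illustration only.

```haskell
import Foreign.C.Types (CChar, CDouble, CInt)
import Foreign.Storable (sizeOf)

-- Size in bytes of the C type int on the current platform. The argument
-- of sizeOf is never evaluated, so undefined is safe here.
cIntSize :: Int
cIntSize = sizeOf (undefined :: CInt)

main :: IO ()
main = do
  putStrLn ("sizeof(char)   = " ++ show (sizeOf (undefined :: CChar)))
  putStrLn ("sizeof(int)    = " ++ show cIntSize)
  putStrLn ("sizeof(double) = " ++ show (sizeOf (undefined :: CDouble)))
```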
{#enum cid [as hsid] {alias1 , ... , aliasn}
[with prefix = pref] [deriving (clid1 , ... , clidn)]#}
Rewrite the C enumeration called cid into a Haskell data type
declaration, which is made an instance of Enum such that the ordinals
match those of the enumeration values in C. This takes explicit enumeration
values in the C definitions into account. If hsid is given, this is
the name of the Haskell data type. The identifiers clid1 to clidn
are added to the deriving clause of the Haskell type.
By default, the names of the C enumeration are used for the constructors in
Haskell. If alias1 is underscoreToCase, the original C names are
capitalised and the use of underscores is rewritten to caps. If it is
upcaseFirstLetter or downcaseFirstLetter, the first letter of the
original C name changes case correspondingly. It is also possible to combine
underscoreToCase with one of upcaseFirstLetter or downcaseFirstLetter.
Moreover, alias1 to aliasn may be aliases of the form cid as hsid, which
map individual C names to Haskell names. Instead of the global prefix
introduced by a context hook, a local prefix pref can optionally be
specified.
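The name translations can be pictured with the following sketch. These are hypothetical re-implementations written for illustration, not the code used inside C->Haskell. In particular, lower-casing the remainder of each underscore-separated part is an assumption made here.

```haskell
import Data.Char (toLower, toUpper)

-- Split a string at every occurrence of the separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Capitalise each underscore-separated part and drop the underscores.
-- Assumption: the rest of each part is converted to lower case.
underscoreToCase :: String -> String
underscoreToCase = concatMap capitalise . splitOn '_'
  where
    capitalise []       = []
    capitalise (c : cs) = toUpper c : map toLower cs

-- Change only the case of the first letter.
upcaseFirstLetter, downcaseFirstLetter :: String -> String
upcaseFirstLetter []         = []
upcaseFirstLetter (c : cs)   = toUpper c : cs
downcaseFirstLetter []       = []
downcaseFirstLetter (c : cs) = toLower c : cs

main :: IO ()
main = do
  putStrLn (underscoreToCase "WINDOW_TOPLEVEL")
  putStrLn (downcaseFirstLetter (underscoreToCase "notebook_query_tab_label_packing"))
```

Combining underscoreToCase with downcaseFirstLetter, as in the second call, yields names that are valid Haskell function identifiers.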
As an example, consider
{#enum WindowType {underscoreToCase} deriving (Eq)#}
Note: The enum define hooks described in the C->Haskell paper are
not implemented yet.
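To make the effect of an enum hook concrete, the following sketch shows the kind of Haskell code such a hook might expand to. The C enumeration is hypothetical (enum WindowType { WINDOW_TOPLEVEL, WINDOW_POPUP = 5 }), chosen so that one value is explicit, and the exact code emitted by the tool may differ. Show is derived in addition to Eq only so that the example can print values.

```haskell
-- Hand-written stand-in for the expansion of an enum hook over the
-- hypothetical C definition
--   enum WindowType { WINDOW_TOPLEVEL, WINDOW_POPUP = 5 };
data WindowType = WindowToplevel
                | WindowPopup
                deriving (Eq, Show)

-- The ordinals follow the C values, including the explicit value 5.
instance Enum WindowType where
  fromEnum WindowToplevel = 0
  fromEnum WindowPopup    = 5

  toEnum 0 = WindowToplevel
  toEnum 5 = WindowPopup
  toEnum n = error ("WindowType.toEnum: cannot match " ++ show n)

main :: IO ()
main = do
  print (map fromEnum [WindowToplevel, WindowPopup])
  print (toEnum 5 :: WindowType)
```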
{#call [pure] [unsafe] cid [as (hsid | ^)]#}
A call hook rewrites to a call to the C function cid and also ensures
that the appropriate foreign import declaration is generated. The tags
pure and unsafe specify that the external function is purely
functional and cannot re-enter the Haskell runtime, respectively. If
hsid is present, it is used as the identifier for the foreign
declaration, which otherwise defaults to the cid. When instead of
hsid, the symbol ^ is given, the cid after conversion from C's
underscore notation to a capitalised identifier is used.
As an example, consider
sin :: Float -> Float
sin = {#call pure sin as "_sin"#}
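For comparison, a hand-written binding in plain Haskell might look as follows. This is a sketch of what a pure, unsafe call hook saves the programmer from writing, not the literal output of the tool. The names c_sin and sinC are chosen here for illustration. Since the C function sin operates on double, the import uses CDouble.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble)

-- A foreign import without IO in its type corresponds to the tag pure;
-- the unsafe annotation states that the C function does not call back
-- into the Haskell runtime.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Convenience wrapper converting between Double and CDouble.
sinC :: Double -> Double
sinC = realToFrac . c_sin . realToFrac

main :: IO ()
main = print (sinC 0)
```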
{#fun [pure] [unsafe] cid [as (hsid | ^)]
[ctxt =>] { parm1 , ... , parmn } -> parm#}
Function hooks are call hooks including parameter marshalling. Thus, the
components of a function hook up to and including the as alias are the
same as for call hooks. However, an as alias has a different meaning; it
specifies the name of the generated Haskell function. The remaining
components use literals enclosed in backward and forward single quotes (`
and ') to denote Haskell code fragments (or more precisely, parts of the
Haskell type signature for the bound function). The first one is the phrase
ctxt preceding =>, which denotes the type context. This is followed
by zero or more type and marshalling specifications parm1 to parmn
for the function arguments and one parm for the function result. Each
such specification parm has the form
[inmarsh [* | -]] hsty [&] [outmarsh [* | -]]
where hsty is a Haskell code fragment denoting a Haskell type. The optional information to the left and right of this type determines the marshalling of the corresponding Haskell value to and from C; they are called the in and out marshaller, respectively.
Each marshalling specification parm corresponds to one or two arguments
of the C function, in the order in which they are given. A marshalling
specification in which the symbol & follows the Haskell type corresponds
to two C function arguments; otherwise, it corresponds only to one argument.
The parm following the arrow -> determines the marshalling of
the result of the C function and may not contain the symbol &.
Both inmarsh and outmarsh are identifiers of Haskell marshalling
functions. By default they are assumed to be pure functions; if they have to
be executed in the IO monad, the function name needs to be followed by a
star symbol *. Alternatively, the identifier may be followed by a minus
sign -, in which case the Haskell type does not appear as an
argument (in marshaller) or result (out marshaller) of the generated Haskell
function. In other words, the argument types of the Haskell function are
determined by the set of all marshalling specifications where the in
marshaller is not followed by a minus sign. Conversely, the result tuple of
the Haskell function is determined by the set of all marshalling
specifications where the out marshaller is not followed by a minus sign. The
order of function arguments and components in the result tuple is the same as
the order in which the marshalling specifications are given, with the exception
that the value of the result marshaller is always the first component in the
result tuple if it is included at all.
For a set of commonly occurring Haskell and C type combinations, default
marshallers are provided by C->Haskell if no explicit marshaller is
given. The out marshaller for function arguments is by default void-.
The defaults for the in marshallers for function arguments are as follows:
Bool and integral C type (including chars): cFromBool
Integral Haskell type and integral C type: cIntConv
Floating Haskell type and floating C type: cFloatConv
String and char*: withCString*
String and char* with explicit length: withCStringLen*
T and T*: with*
T and T* where T is an integral type: withIntConv*
T and T* where T is a floating type: withFloatConv*
Bool and T* where T is an integral type: withFromBool*
The defaults for the out marshaller of the result are the converse of the
above; i.e., instead of the with functions, the corresponding peek
functions are used. Moreover, when the Haskell type is (), the default
marshaller is void-.
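The default marshallers are thin wrappers around standard conversion functions. The definitions below are a sketch of how such marshallers could be written with the Haskell FFI libraries. They reuse the names from the list above, but the bodies and type signatures are assumptions and may differ from the library shipped with the tool. The function roundTrip is a hypothetical usage example.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Utils (fromBool, with)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable, peek)

-- Pure conversions between Haskell and C representations (assumed bodies).
cIntConv :: (Integral a, Integral b) => a -> b
cIntConv = fromIntegral

cFloatConv :: (Real a, Fractional b) => a -> b
cFloatConv = realToFrac

cFromBool :: Num a => Bool -> a
cFromBool = fromBool

-- In marshaller: store the converted value in temporary memory and pass
-- a pointer to it to the continuation.
withIntConv :: (Storable b, Integral a, Integral b)
            => a -> (Ptr b -> IO c) -> IO c
withIntConv = with . cIntConv

-- Out marshaller: the converse, reading and converting a value.
peekIntConv :: (Storable a, Integral a, Integral b) => Ptr a -> IO b
peekIntConv = fmap cIntConv . peek

-- Hypothetical usage: marshal an Int to a C int and back again.
roundTrip :: Int -> IO Int
roundTrip n = withIntConv n (\p -> peekIntConv (p :: Ptr CInt))

main :: IO ()
main = do
  roundTrip 42 >>= print
  print (cFromBool True :: CInt)
```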
As an example, consider
{#fun notebook_query_tab_label_packing as ^
`(NotebookClass nb, WidgetClass cld)' =>
{notebook `nb' ,
widget `cld' ,
alloca- `Bool' peekBool*,
alloca- `Bool' peekBool*,
alloca- `PackType' peekEnum*} -> `()'#}
which results in the Haskell type signature
notebookQueryTabLabelPacking :: (NotebookClass nb, WidgetClass cld)
=> nb -> cld -> IO (Bool, Bool, PackType)
which binds the following C function:
void gtk_notebook_query_tab_label_packing (GtkNotebook *notebook,
GtkWidget *child,
gboolean *expand,
gboolean *fill,
GtkPackType *pack_type);
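The marshalling that such a function hook automates can be written by hand as in the following sketch. Because the real GTK function is not available in a stand-alone example, fakeQueryPacking is a hypothetical stand-in, written in Haskell, for a C function that returns results through pointer arguments. The pattern of alloca followed by peek mirrors the alloca- in marshallers and peekBool* out marshallers of the hook.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Utils (toBool)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- Hypothetical stand-in for a C function with two out parameters.
fakeQueryPacking :: Ptr CInt -> Ptr CInt -> IO ()
fakeQueryPacking pExpand pFill = do
  poke pExpand 1
  poke pFill 0

-- Hand-written equivalent of the marshalling code: allocate temporary
-- storage for each out parameter, call the function, then read the
-- results back and convert them to Haskell values.
queryPacking :: IO (Bool, Bool)
queryPacking =
  alloca $ \pExpand ->
  alloca $ \pFill -> do
    fakeQueryPacking pExpand pFill
    expand <- peek pExpand
    fill   <- peek pFill
    return (toBool expand, toBool fill)

main :: IO ()
main = queryPacking >>= print
```

Since the in marshallers only allocate memory, the corresponding values do not appear as arguments of queryPacking; they only show up in the result tuple.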
{#get apath#}
A get hook supports accessing a member value of a C structure. The hook itself yields a function that, when given the address of a structure of the right type, performs the structure access. The member that is to be extracted is specified by the access path apath. Access paths are formed as follows (following a subset of the C expression syntax):
The root of any access path is a simple identifier, which denotes either a type name or a struct tag.
An access path of the form *apath denotes dereferencing of the pointer yielded by accessing the access path apath.
An access path of the form apath.cid specifies that the value of the struct member called cid should be accessed.
Finally, an access path of the form apath->cid, as in C, specifies a combination of dereferencing and member selection.
For example, we may have
visualGetType :: Visual -> IO VisualType
visualGetType (Visual vis) = liftM cToEnum $ {#get Visual->type#} vis
{#set apath#}
Set hooks are formed in the same way as get hooks, but yield a function that assigns a value to a member of a C structure. These functions expect a pointer to the structure as the first and the value to be assigned as the second argument. For example, we may have
{#set sockaddr_in.sin_family#} addr_in (cFromEnum AF_INET)
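Conceptually, get and set hooks expand to reads and writes at the byte offset of the selected member. The sketch below does this by hand for a hypothetical C structure struct point { int x; int y; }. The offset of y is an assumption (two adjacent int members without padding), whereas the tool derives offsets from the C header. The names getY, setY, and yOffset are illustrative only.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff, sizeOf)

-- Type tag for pointers to the hypothetical C struct
--   struct point { int x; int y; };
data Point

-- Assumed offset of the member y: directly after the int member x.
yOffset :: Int
yOffset = sizeOf (undefined :: CInt)

-- Roughly what a get hook for the member y would provide.
getY :: Ptr Point -> IO CInt
getY p = peekByteOff p yOffset

-- Roughly what a set hook for the member y would provide.
setY :: Ptr Point -> CInt -> IO ()
setY p v = pokeByteOff p yOffset v

-- Write a value into temporary storage and read it back.
roundTrip :: CInt -> IO CInt
roundTrip v =
  allocaBytes (2 * sizeOf (undefined :: CInt)) $ \p -> do
    setY p v
    getY p

main :: IO ()
main = roundTrip 7 >>= print
```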
{#pointer [*] cid [as hsid] [foreign | stable] [newtype | -> hsid2] [nocode]#}
A pointer hook facilitates the mapping of C to Haskell pointer types. In
particular, it enables the use of ForeignPtr and StablePtr types and
defines type name translations for pointers to non-basic types. In general,
such a hook establishes an association between the C type cid or
*cid and the Haskell type hsid, where the latter defaults to
cid if not explicitly given. The identifier cid will usually be a
type name, but in the case of *cid may also be a struct, union, or
enum tag. If both a type name and a tag of the same name are available, the
type name takes precedence. Optionally, the Haskell representation of
the pointer can be by a ForeignPtr or StablePtr instead of a plain
Ptr. If the newtype tag is given, the Haskell type hsid is
defined as a newtype rather than a transparent type synonym. In case of
a newtype, the type argument to the Haskell pointer type will be
hsid, which gives a cyclic definition, but the type argument is here
really only used as a unique type tag. Without newtype, the default
type argument is (), but another type can be specified after the symbol
->.
For example, we may have
{#pointer *GtkObject as Object newtype#}
This will generate a new type Object as follows:
newtype Object = Object (Ptr Object)
which enables exporting Object as an abstract type and facilitates type
checking at call sites of imported functions using the encapsulated
pointer. The latter is achieved by C->Haskell as follows. The tool
remembers the association of the C type *GtkObject with the Haskell type
Object, and so, it generates for the C function
void gtk_unref_object (GtkObject *obj);
the import declaration
foreign import gtk_unref_object :: Object -> IO ()
This function can obviously only be applied to pointers of the right type, and thus, protects against the common mistake of confusing the order of pointer arguments in function calls.
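The protection offered by newtype-wrapped pointers can be seen in a small sketch that needs no C library. The type Widget and the function unrefObject are hypothetical stand-ins introduced here for illustration; unrefObject merely prints the address it receives.

```haskell
import Foreign.Ptr (Ptr, nullPtr)

-- Two pointer types wrapped in distinct newtypes, as a pointer hook with
-- the newtype tag would define them.
newtype Object = Object (Ptr Object)
newtype Widget = Widget (Ptr Widget)

-- Hypothetical stand-in for an imported C function that takes an Object.
unrefObject :: Object -> IO ()
unrefObject (Object p) = putStrLn ("unref object at " ++ show p)

main :: IO ()
main = do
  unrefObject (Object nullPtr)
  -- The following line would be rejected by the type checker, because a
  -- Widget is not an Object:
  -- unrefObject (Widget nullPtr)
```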
However, as the Haskell FFI does not permit passing ForeignPtrs directly
to function calls or returning them, the tool will use the type Ptr HsName
in this case, where HsName is the Haskell name of the type. So, if we
modify the above declaration to be
{#pointer *GtkObject as Object foreign newtype#}
the type Ptr Object will be used instead of a plain Object in import
declarations; i.e., the previous import declaration will become
foreign import gtk_unref_object :: Ptr Object -> IO ()
To simplify the required marshalling code for such pointers, the tool automatically generates a function
withObject :: Object -> (Ptr Object -> IO a) -> IO a
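A helper of this shape can be defined with withForeignPtr, as the following sketch shows. The helper is called withObject here. The function readFirstInt is a hypothetical stand-in for an imported C function taking a Ptr Object, and the memory is allocated with mallocForeignPtrBytes only so that the example is self-contained.

```haskell
import Foreign.C.Types (CInt)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peek, poke, sizeOf)

-- Foreign pointer wrapped in a newtype, as a pointer hook with the tags
-- foreign and newtype would define it.
newtype Object = Object (ForeignPtr Object)

-- Unwrap the foreign pointer for the duration of an IO action.
withObject :: Object -> (Ptr Object -> IO a) -> IO a
withObject (Object fptr) = withForeignPtr fptr

-- Hypothetical stand-in for an imported C function taking a Ptr Object.
readFirstInt :: Ptr Object -> IO CInt
readFirstInt p = peek (castPtr p)

-- Allocate an object, store a value in it, and read the value back.
demo :: IO CInt
demo = do
  fptr <- mallocForeignPtrBytes (sizeOf (undefined :: CInt))
  let obj = Object fptr
  withObject obj (\p -> poke (castPtr p) (42 :: CInt))
  withObject obj readFirstInt

main :: IO ()
main = demo >>= print
```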
As an example that does not represent the pointer as an abstract type, consider the C type declaration:
typedef struct {int x, y;} *point;
We can represent it in Haskell as
data Point = Point {x :: Int, y :: Int}
{#pointer point as PointPtr -> Point#}
which will translate to
data Point = Point {x :: Int, y :: Int}
type PointPtr = Ptr Point
and establish a type association between point and PointPtr.
If the keyword nocode is added to the end of a pointer hook,
C->Haskell will not emit a type declaration. This is useful when a
C->Haskell module wants to make use of an existing type declaration in a
binding not generated by C->Haskell (i.e., where there are no .chi
files).
Restriction: The name cid cannot be a basic C type (such as int); it must
be a defined name.
{#class [hsid1 =>] hsid2 hsid3#}
Class hooks facilitate the definition of a single inheritance class hierarchy for external pointers, including up and down cast functionality. This is meant to be used in cases where the objects referred to by the external pointers are ordered in such a hierarchy in the external API; such structures are encountered in C libraries that provide an object-oriented interface. Each class hook rewrites to a class declaration and one or more instance declarations.
All classes in a hierarchy, except the root, will have a superclass identified by hsid1. The new class is given by hsid2 and the corresponding external pointer is identified by hsid3. Both the superclass and the pointer type must already have been defined by binding hooks that precede the class hook.
The pointers in a hierarchy must either all be foreign pointers or all be
normal pointers. Stable pointers are not allowed. Both pointers defined as
newtypes and those defined by type synonyms may be used in class
declarations, and they may be mixed. In the case of synonyms, Haskell's usual
restrictions regarding overlapping instance declarations apply.
The newly defined class has two members whose names are derived from the type
name hsid3. The name of the first member is derived from hsid3 by
converting the first character to lower case. This function casts from any
superclass to the current class. The name of the second member is derived by
prefixing hsid3 with from. It casts from the current class to
any superclass. A class hook generates an instance for the pointer in the
newly defined class as well as in all its superclasses.
As an example, consider
{#pointer *GtkObject newtype#}
{#class GtkObjectClass GtkObject#}
{#pointer *GtkWidget newtype#}
{#class GtkObjectClass => GtkWidgetClass GtkWidget#}
The second class hook generates an instance for GtkWidget for both the
GtkWidgetClass as well as for the GtkObjectClass.
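The following sketch shows one plausible shape of the declarations that these four hooks stand for. It is written by hand and is not the literal output of the tool. In particular, the directions of the two class members (a cast to the class's pointer type and a cast back from it) are an assumption, and the function describe is a hypothetical client added to show the casts in use.

```haskell
import Foreign.Ptr (Ptr, castPtr, nullPtr)

newtype GtkObject = GtkObject (Ptr GtkObject)
newtype GtkWidget = GtkWidget (Ptr GtkWidget)

-- Root class of the hierarchy.
class GtkObjectClass p where
  gtkObject     :: p -> GtkObject   -- cast to the pointer type of the class
  fromGtkObject :: GtkObject -> p   -- cast back from it

instance GtkObjectClass GtkObject where
  gtkObject     = id
  fromGtkObject = id

-- Subclass; every widget is also an object.
class GtkObjectClass p => GtkWidgetClass p where
  gtkWidget     :: p -> GtkWidget
  fromGtkWidget :: GtkWidget -> p

-- GtkWidget becomes an instance of its own class and of all superclasses.
instance GtkObjectClass GtkWidget where
  gtkObject (GtkWidget p)     = GtkObject (castPtr p)
  fromGtkObject (GtkObject p) = GtkWidget (castPtr p)

instance GtkWidgetClass GtkWidget where
  gtkWidget     = id
  fromGtkWidget = id

-- Hypothetical client that accepts any member of the hierarchy.
describe :: GtkObjectClass o => o -> String
describe o = let GtkObject p = gtkObject o in "object at " ++ show p

main :: IO ()
main = do
  putStrLn (describe (GtkObject nullPtr))
  putStrLn (describe (GtkWidget nullPtr))
```

A function such as describe can thus be applied to a GtkWidget without an explicit cast at the call site.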
A Haskell binding module may include arbitrary C pre-processor directives using the standard C syntax. The directives are used in two ways: Firstly, they are included in the C header file generated by C->Haskell in exactly the same order in which they appear in the binding module. Secondly, all conditional directives are honoured by C->Haskell in that all Haskell binding code in alternatives that are discarded by the C pre-processor are also discarded by C->Haskell. This latter feature is, for example, useful to maintain different bindings for multiple versions of the same C API in a single Haskell binding module.
In addition to C pre-processor directives, vanilla C code can be maintained in
a Haskell binding module by bracketing this C code with the pseudo directives
#c and #endc. Such inline C code is emitted into the C header
generated by C->Haskell at exactly the same position relative to CPP
directives as it occurs in the binding module. Pre-processor directives may
encompass the #include directive, which can be used instead of specifying
a C header file as an argument to c2hs. In particular, this enables the
simultaneous use of multiple header files without the need to provide a custom
header file that binds them together. If a header file lib.h is
specified as an argument to c2hs, the tool will emit the directive
#include "lib.h" into the generated C header before any other
CPP directive or inline C code.
As an artificial example of these features consider the following code:
#define VERSION 2
#if (VERSION == 1)
foo :: CInt -> CInt
foo = {#call pure fooC#}
#else
foo :: CInt -> CInt -> CInt
foo = {#call pure fooC#}
#endif
#c
int fooC (int, int);
#endc
One of two versions of the Haskell function foo (having different arities)
is selected depending on the value of the CPP macro VERSION, which in
this example is defined in the same file. In realistic code, VERSION
would be defined in the header file supplied with the C library that is made
accessible from Haskell by a binding module. The above code fragment also
includes one line of inline C code that declares a C prototype for fooC.
Current limitation of the implementation: Inline C code cannot contain any code blocks; i.e., only declarations as typically found in header files may be included.
The following grammar rules define the syntax of binding hooks:
hook -> `{#' inner `#}'
inner -> `import' [`qualified'] ident
| `context' ctxt
| `type' ident
| `sizeof' ident
| `enum' idalias trans [`with' prefix] [deriving]
| `call' [`pure'] [`unsafe'] idalias
| `fun' [`pure'] [`unsafe'] idalias parms
| `get' apath
| `set' apath
| `pointer' [`*'] idalias ptrkind
| `class' [ident `=>'] ident ident
ctxt -> [`lib' `=' string] [prefix]
idalias -> ident [(`as' ident | `^')]
prefix -> `prefix' `=' string
deriving -> `deriving' `(' ident_1 `,' ... `,' ident_n `)'
parms -> [verbhs `=>'] `{' parm_1 `,' ... `,' parm_n `}' `->' parm
parm -> [ident_1 [`*' | `-']] verbhs [`&'] [ident_2 [`*' | `-']]
apath -> ident
| `*' apath
| apath `.' ident
| apath `->' ident
trans -> `{' alias_1 `,' ... `,' alias_n `}'
alias -> `underscoreToCase' | `upcaseFirstLetter' | `downcaseFirstLetter'
| ident `as' ident
ptrkind -> [`foreign' | `stable'] [`newtype' | `->' ident]
Identifiers ident follow the lexis of Haskell. They may be enclosed in
single quotes to disambiguate them from C->Haskell keywords.