Kwalify Users' Guide (for Ruby and Java)

makoto kuwata <kwa(at)kuwata-lab.com>
last update: $Date: 2005-12-20 12:50:56 +0900 (Tue, 20 Dec 2005) $

Preface

Kwalify(*1) is a tiny schema validator for YAML and JSON documents.

You know the "80-20 rule", also known as the Pareto Law, don't you? This rule suggests that 20% of the population owns 80% of the wealth. Kwalify is based on a new "50-5 rule", which suggests that 5% of the population owns 50% of the wealth. This rule is more aggressive and cost-effective than the Pareto Law. The rule is named "Levi's Law".

schema technology   (A) cover range   (B) cost to pay   (A)/(B) effectiveness
XML Schema          95%               100%              0.95 (= 95/100)
RelaxNG             80%               20%               4.0  (= 80/20)
Kwalify             50%               5%                10.0 (= 50/5)

Kwalify is small and, in fact, less powerful than RelaxNG or XML Schema. I hope you will extend and customize Kwalify to suit your own needs.

(*1) Pronounced 'Qualify'.

Usage of Kwalify

Usage in Command-Line

usage1: validate YAML document in command-line
### kwalify-ruby
$ kwalify -f schema.yaml document.yaml [document2.yaml ...]

### kwalify-java
$ java -classpath kwalify.jar kwalify.Main -f schema.yaml document.yaml [document2.yaml ...]
usage2: validate schema definition in command-line
### kwalify-ruby
$ kwalify -m schema.yaml [schema2.yaml ...]

### kwalify-java
$ java -classpath kwalify.jar kwalify.Main -m schema.yaml [schema2.yaml ...]

Command-line options:

-h, --help
Print help message.
-v
Print version.
-s
Silent mode.
-f schema.yaml
Specify schema definition file.
-m
Meta-validation of schema definition.
-t
Expand tab characters to spaces automatically.
-l
Show the line number on which each error is found.
-E
Show errors in Emacs-compatible style (implies '-l' option).

Notice that the command-line option -l is an experimental feature, because the kwalify command uses its original YAML parser instead of the Syck parser when this option is specified.

If you are an Emacs user, try the -E option, which shows errors in a format that Emacs can parse and jump to. You can use C-x ` (next-error) to jump to each error.
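
As a rough illustration of output that Emacs can parse, the sketch below formats an error in the "file:line: message" layout recognised by Emacs's compilation mode. The emacs_style helper and the sample values are hypothetical, and the exact text printed by the -E option may differ.

```ruby
# Hypothetical helper: format one error in the "file:line: message" layout
# that Emacs's compilation mode can parse. This is not Kwalify's own code.
def emacs_style(filename, linenum, path, message)
  format("%s:%d: [%s] %s", filename, linenum, path, message)
end

line = emacs_style("document01b.yaml", 2, "/1", "'123': not a string.")
puts line
```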


Usage in Ruby Script

The following are example scripts for Ruby.

validate YAML document in Ruby script
require 'kwalify'
require 'yaml'

## parse schema definition and create validator
schema = YAML.load_file('schema.yaml')
validator = Kwalify::Validator.new(schema)  # raises Kwalify::SchemaError if wrong

## validate YAML document
document = YAML.load_file('document.yaml')
error_list = validator.validate(document)
unless error_list.empty?
   error_list.each do |error|     # error is instance of Kwalify::ValidationError
      puts "[#{error.path}] #{error.message}"
   end
end
validate YAML document and show the line number where each error is found.
require 'kwalify'
require 'yaml'

## parse schema definition and create validator
schema = YAML.load_file('schema.yaml')
validator = Kwalify::Validator.new(schema)  # raises Kwalify::SchemaError if wrong

## parse YAML document with Kwalify's parser
str = File.read('document.yaml')
parser = Kwalify::Parser.new(str)
document = parser.parse()

## validate document and show errors
error_list = validator.validate(document)
unless error_list.empty?
   parser.set_errors_linenum(error_list)  # set linenum on error
   error_list.sort.each do |error|
      puts "(line %d)[%s] %s" % [error.linenum, error.path, error.message]
   end
end

Kwalify's YAML parser is experimental. Note that it supports only the basic syntax of YAML.

The following is an example program for Java.

validate YAML document and show the line number where each error is found.
import kwalify.*;
import java.util.*;

public class Test {

  public static void main(String[] args) throws Exception {
    // read schema
    String schema_str = Util.readFile("schema.yaml");
    Object schema = new YamlParser(schema_str).parse();

    // read document file
    String document_str = Util.readFile("document.yaml");
    YamlParser parser = new YamlParser(document_str);
    Object document = parser.parse();

    // create validator and validate
    Validator validator = new Validator(schema);
    List errors = validator.validate(document);

    // show errors
    if (errors != null && errors.size() > 0) {
      parser.setErrorsLineNumber(errors);
      Collections.sort(errors);
      for (Iterator it = errors.iterator(); it.hasNext(); ) {
        ValidationException error = (ValidationException)it.next();
        int linenum = error.getLineNumber();
        String path = error.getPath();
        String mesg = error.getMessage();
        System.out.println("- " + linenum + ": [" + path + "] " + mesg);
      }
    }
  }
}


Schema Definition

Sequence

schema01.yaml : sequence of string
type:   seq
sequence:
  - type:   str
document01a.yaml : valid document example
- foo
- bar
- baz
validate
$ kwalify -lf schema01.yaml document01a.yaml
document01a.yaml#0: valid.
document01b.yaml : invalid document example
- foo
- 123
- baz
validate
$ kwalify -lf schema01.yaml document01b.yaml
document01b.yaml#0: INVALID
  - (line 2) [/1] '123': not a string.

The default 'type:' is str, so you can omit 'type: str'.
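
To make the default concrete, here is a minimal sketch that checks a sequence of strings using only Ruby's standard yaml library. The check_str_seq method is hypothetical and is not how Kwalify itself is implemented; it only shows a rule without 'type:' being treated as str and the error path being built from the element index.

```ruby
require 'yaml'

# Minimal sketch of checking a sequence of strings (not Kwalify's code).
# A rule without 'type' is treated as 'str', mirroring the default above.
def check_str_seq(schema, document)
  errors = []
  item_rule = schema['sequence'].first
  item_type = item_rule['type'] || 'str'     # default type is str
  document.each_with_index do |value, index|
    next unless item_type == 'str'
    errors << "[/#{index}] '#{value}': not a string." unless value.is_a?(String)
  end
  errors
end

schema   = YAML.safe_load("type: seq\nsequence:\n  - {}\n")   # 'type: str' omitted
document = YAML.safe_load("- foo\n- 123\n- baz\n")
errors   = check_str_seq(schema, document)
puts errors
```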


Mapping

schema02.yaml : mapping of scalar
type:       map
mapping:
  name:
    type:      str
    required:  yes
  email:
    type:      str
    pattern:   /@/
  age:
    type:      int
  birth:
    type:      date
document02a.yaml : valid document example
name:   foo
email:  foo@mail.com
age:    20
birth:  1985-01-01
validate
$ kwalify -lf schema02.yaml document02a.yaml
document02a.yaml#0: valid.
document02b.yaml : invalid document example
name:   foo
email:  foo(at)mail.com
age:    twenty
birth:  Jun 01, 1985
validate
$ kwalify -lf schema02.yaml document02b.yaml
document02b.yaml#0: INVALID
  - (line 2) [/email] 'foo(at)mail.com': not matched to pattern /@/.
  - (line 3) [/age] 'twenty': not a integer.
  - (line 4) [/birth] 'Jun 01, 1985': not a date.
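
The checks applied to a flat mapping can be sketched in plain Ruby as follows. This is an illustration under simplifying assumptions, not Kwalify's implementation: the check_mapping and type_class methods are hypothetical, only the types str and int are handled, and the document is given as a Ruby hash rather than parsed from a file.

```ruby
# Minimal sketch of flat mapping validation (not Kwalify's code).
# Handles only 'required', 'pattern' and the types str and int.
def type_class(type_name)
  { 'str' => String, 'int' => Integer }.fetch(type_name)
end

def check_mapping(rules, document)
  errors = []
  rules.each do |key, rule|
    unless document.key?(key)
      errors << "[/] key '#{key}:' is required." if rule['required']
      next
    end
    value = document[key]
    type  = rule['type'] || 'str'
    unless value.is_a?(type_class(type))
      errors << "[/#{key}] '#{value}': not of type #{type}."
      next
    end
    pattern = rule['pattern']
    if pattern && value !~ pattern
      errors << "[/#{key}] '#{value}': not matched to pattern #{pattern.inspect}."
    end
  end
  errors
end

rules = {
  'name'  => { 'type' => 'str', 'required' => true },
  'email' => { 'type' => 'str', 'pattern' => /@/ },
  'age'   => { 'type' => 'int' },
}
document = { 'name' => 'foo', 'email' => 'foo(at)mail.com', 'age' => 'twenty' }
errors = check_mapping(rules, document)
puts errors
```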

Sequence of Mapping

schema03.yaml : sequence of mapping
type:      seq
sequence:
  - type:      map
    mapping:
      name:
        type:      str
        required:  true
      email:
        type:      str
document03a.yaml : valid document example
- name:   foo
  email:  foo@mail.com
- name:   bar
  email:  bar@mail.net
- name:   baz
  email:  baz@mail.org
validate
$ kwalify -lf schema03.yaml document03a.yaml
document03a.yaml#0: valid.
document03b.yaml : invalid document example
- name:   foo
  email:  foo@mail.com
- naem:   bar
  email:  bar@mail.net
- name:   baz
  mail:   baz@mail.org
validate
$ kwalify -lf schema03.yaml document03b.yaml
document03b.yaml#0: INVALID
  - (line 3) [/1] key 'name:' is required.
  - (line 3) [/1/naem] key 'naem:' is undefined.
  - (line 6) [/2/mail] key 'mail:' is undefined.

Mapping of Sequence

schema04.yaml : mapping of sequence of mapping
type:      map
mapping:
  company:
    type:      str
    required:  yes
  email:
    type:      str
  employees:
    type:      seq
    sequence:
      - type:    map
        mapping:
          code:
            type:      int
            required:  yes
          name:
            type:      str
            required:  yes
          email:
            type:      str
document04a.yaml : valid document example
company:    Kuwata lab.
email:      webmaster@kuwata-lab.com
employees:
  - code:   101
    name:   foo
    email:  foo@kuwata-lab.com
  - code:   102
    name:   bar
    email:  bar@kuwata-lab.com
validate
$ kwalify -lf schema04.yaml document04a.yaml
document04a.yaml#0: valid.
document04b.yaml : invalid document example
company:    Kuwata Lab.
email:      webmaster@kuwata-lab.com
employees:
  - code:   A101
    name:   foo
    email:  foo@kuwata-lab.com
  - code:   102
    name:   bar
    mail:   bar@kuwata-lab.com
validate
$ kwalify -lf schema04.yaml document04b.yaml
document04b.yaml#0: INVALID
  - (line 4) [/employees/0/code] 'A101': not a integer.
  - (line 9) [/employees/1/mail] key 'mail:' is undefined.
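
The error paths shown above (such as /employees/1/mail) follow the nesting of the document. The recursive sketch below shows one way such paths could be built; the walk method is hypothetical, it ignores 'required:' and line numbers, and it is not Kwalify's own code.

```ruby
# Minimal recursive sketch (not Kwalify's code): walks a document alongside
# its schema and builds error paths such as /employees/1/mail.
def walk(rule, value, path, errors)
  case rule['type'] || 'str'
  when 'seq'
    value.each_with_index do |item, i|
      walk(rule['sequence'].first, item, "#{path}/#{i}", errors)
    end
  when 'map'
    value.each do |key, child|
      child_rule = rule['mapping'][key]
      if child_rule
        walk(child_rule, child, "#{path}/#{key}", errors)
      else
        errors << "[#{path}/#{key}] key '#{key}:' is undefined."
      end
    end
  when 'int'
    errors << "[#{path}] '#{value}': not a integer." unless value.is_a?(Integer)
  when 'str'
    errors << "[#{path}] '#{value}': not a string." unless value.is_a?(String)
  end
  errors
end

employee = { 'type' => 'map', 'mapping' => {
  'code'  => { 'type' => 'int' },
  'name'  => { 'type' => 'str' },
  'email' => { 'type' => 'str' } } }
schema = { 'type' => 'map', 'mapping' => {
  'company'   => { 'type' => 'str' },
  'employees' => { 'type' => 'seq', 'sequence' => [employee] } } }
document = {
  'company'   => 'Kuwata Lab.',
  'employees' => [
    { 'code' => 'A101', 'name' => 'foo', 'email' => 'foo@kuwata-lab.com' },
    { 'code' => 102,    'name' => 'bar', 'mail'  => 'bar@kuwata-lab.com' },
  ],
}
errors = walk(schema, document, '', [])
puts errors
```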

Rule and Entry

A rule is a set of entries. With a few exceptions, each entry represents a constraint.

The following are constraint entries.

required:
Value is required when true (default is false).
enum:
List of available values.
pattern:
Specifies regular expression pattern of value.
type:
Type of value. The following are available:
  • str
  • int
  • float
  • number (== int or float)
  • text (== str or number)
  • bool
  • date
  • time
  • timestamp
  • seq
  • map
  • scalar (all but seq and map)
  • any (means any data)
range:
Range of value between max/max-ex and min/min-ex.
  • 'max' means 'max-inclusive'.
  • 'min' means 'min-inclusive'.
  • 'max-ex' means 'max-exclusive'.
  • 'min-ex' means 'min-exclusive'.
range: cannot be used with the types seq, map, bool and any.
length:
Range of length of value between max/max-ex and min/min-ex. length: can be used only with the types str and text.
assert:
String which represents a validation expression. The string should contain the variable name val, which represents the value. (This is an experimental function and is supported only in kwalify-ruby.)
unique:
Value must be unique within the parent mapping or sequence. See the next subsection for details.
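
The relationships between the type names listed under type: (number is int or float, text is str or number, scalar is anything but seq and map) can be sketched as a single predicate. The type_match? method and the choice of Ruby classes for each name are assumptions made for illustration; Kwalify's internal table may differ.

```ruby
require 'date'

# Sketch of how the type names could map onto Ruby classes (an assumption
# for illustration; Kwalify's internal table may differ).
def type_match?(type, value)
  case type
  when 'str'    then value.is_a?(String)
  when 'int'    then value.is_a?(Integer)
  when 'float'  then value.is_a?(Float)
  when 'number' then type_match?('int', value) || type_match?('float', value)
  when 'text'   then type_match?('str', value) || type_match?('number', value)
  when 'bool'   then value == true || value == false
  when 'date'   then value.is_a?(Date)
  when 'time', 'timestamp' then value.is_a?(Time)
  when 'seq'    then value.is_a?(Array)
  when 'map'    then value.is_a?(Hash)
  when 'scalar' then !type_match?('seq', value) && !type_match?('map', value)
  when 'any'    then true
  else raise ArgumentError, "unknown type: #{type}"
  end
end

puts type_match?('number', 0.5)
puts type_match?('text', 20)
puts type_match?('scalar', [1, 2])
```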

The following are non-constraint entries.

name:
Name of schema.
desc:
Description. This is not used for validation.

A rule contains a 'type:' entry. A 'sequence:' entry takes a list of rules. A 'mapping:' entry takes a hash whose values are rules.
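
Because rules nest through 'sequence:' and 'mapping:', a schema forms a tree of rules. The sketch below lists every rule in a small schema by following those two entries. The each_rule method and its path notation ('[]' for sequence elements) are hypothetical and are not part of Kwalify.

```ruby
# Sketch: visit every rule in a schema by following 'sequence:' and
# 'mapping:' entries (illustrative only, not part of Kwalify).
def each_rule(rule, path = '/', &block)
  block.call(path, rule)
  Array(rule['sequence']).each do |child|
    each_rule(child, path + '[]/', &block)
  end
  (rule['mapping'] || {}).each do |key, child|
    each_rule(child, path + key + '/', &block)
  end
end

schema = {
  'type' => 'seq',
  'sequence' => [
    { 'type' => 'map',
      'mapping' => {
        'name' => { 'type' => 'str', 'required' => true },
        'age'  => { 'type' => 'int' },
      } },
  ],
}
found = []
each_rule(schema) { |path, rule| found << "#{path} (#{rule['type']})" }
puts found
```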

schema05.yaml : rule examples
type:      seq                                # new rule
sequence:
  -
    type:      map                            # new rule
    mapping:
      name:
        type:       str                       # new rule
        required:   yes
      email:
        type:       str                       # new rule
        required:   yes
        pattern:    /@/
      password:
        type:       text                      # new rule
        length:     { max: 16, min: 8 }
      age:
        type:       int                       # new rule
        range:      { max: 30, min: 18 }
        # or assert: 18 <= val && val <= 30
      blood:
        type:       str                       # new rule
        enum:
          - A
          - B
          - O
          - AB
      birth:
        type:       date                      # new rule
      memo:
        type:       any                       # new rule
document05a.yaml : valid document example
- name:     foo
  email:    foo@mail.com
  password: xxx123456
  age:      20
  blood:    A
  birth:    1985-01-01
- name:     bar
  email:    bar@mail.net
  age:      25
  blood:    AB
  birth:    1980-01-01
validate
$ kwalify -lf schema05.yaml document05a.yaml
document05a.yaml#0: valid.
document05b.yaml : invalid document example
- name:     foo
  email:    foo(at)mail.com
  password: xxx123
  age:      twenty
  blood:    a
  birth:    1985-01-01
- given-name:  bar
  family-name: Bar
  email:    bar@mail.net
  age:      15
  blood:    AB
  birth:    1980/01/01
validate
$ kwalify -lf schema05.yaml document05b.yaml
document05b.yaml#0: INVALID
  - (line 2) [/0/email] 'foo(at)mail.com': not matched to pattern /@/.
  - (line 3) [/0/password] 'xxx123': too short (length 6 < min 8).
  - (line 4) [/0/age] 'twenty': not a integer.
  - (line 5) [/0/blood] 'a': invalid blood value.
  - (line 7) [/1/given-name] key 'given-name:' is undefined.
  - (line 7) [/1] key 'name:' is required.
  - (line 8) [/1/family-name] key 'family-name:' is undefined.
  - (line 10) [/1/age] '15': too small (< min 18).
  - (line 12) [/1/birth] '1980/01/01': not a date.
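
The inclusive and exclusive bounds of range: and length: can be sketched as plain comparisons. The range_errors and length_errors methods below are hypothetical, and their messages only approximate the ones Kwalify prints.

```ruby
# Sketch of the max/min/max-ex/min-ex checks (illustrative, not Kwalify's code).
def range_errors(range, value)
  errors = []
  errors << "too large (> max #{range['max']})"        if range.key?('max')    && value >  range['max']
  errors << "too small (< min #{range['min']})"        if range.key?('min')    && value <  range['min']
  errors << "too large (>= max-ex #{range['max-ex']})" if range.key?('max-ex') && value >= range['max-ex']
  errors << "too small (<= min-ex #{range['min-ex']})" if range.key?('min-ex') && value <= range['min-ex']
  errors
end

# 'length:' applies the same comparison to the length of the value as text.
def length_errors(range, value)
  range_errors(range, value.to_s.length)
end

age_range = { 'max' => 30, 'min' => 18 }
puts range_errors(age_range, 15).inspect
puts range_errors(age_range, 18).inspect
puts length_errors({ 'max' => 16, 'min' => 8 }, 'xxx123').inspect
```

With 'min' the boundary value itself is accepted, while 'min-ex' rejects it; the same holds for 'max' and 'max-ex'.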

Unique constraint

The 'unique:' constraint entry can be applied to elements of a sequence or mapping. It is equivalent to a unique key or primary key in an RDBMS.

The type of a rule that has a 'unique:' entry must be scalar (str, int, float, ...). The type of its parent rule must be sequence or mapping.

schema06.yaml : unique constraint entry with mapping and sequence
type: seq
sequence:
  - type:     map
    required: yes
    mapping:
      name:
        type:     str
        required: yes
        unique:   yes
      email:
        type:     str
      groups:
        type:     seq
        sequence:
          - type: str
            unique:   yes
document06a.yaml : valid document example
- name:   foo
  email:  admin@mail.com
  groups:
    - users
    - foo
    - admin
- name:   bar
  email:  admin@mail.com
  groups:
    - users
    - admin
- name:   baz
  email:  baz@mail.com
  groups:
    - users
validate
$ kwalify -lf schema06.yaml document06a.yaml
document06a.yaml#0: valid.
document06b.yaml : invalid document example
- name:   foo
  email:  admin@mail.com
  groups:
    - foo
    - users
    - admin
    - foo
- name:   bar
  email:  admin@mail.com
  groups:
    - admin
    - users
- name:   bar
  email:  baz@mail.com
  groups:
    - users
validate
$ kwalify -lf schema06.yaml document06b.yaml
document06b.yaml#0: INVALID
  - (line 7) [/0/groups/3] 'foo': is already used at '/0/groups/0'.
  - (line 13) [/2/name] 'bar': is already used at '/1/name'.
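
A uniqueness check of this kind can be sketched by remembering the path at which each value was first seen. The unique_errors method is hypothetical and only imitates the message format shown above; it is not Kwalify's implementation.

```ruby
# Sketch of a uniqueness check (illustrative, not Kwalify's code): remember
# the path at which each value was first seen and report later repeats.
def unique_errors(values_with_paths)
  first_seen = {}
  errors = []
  values_with_paths.each do |value, path|
    if first_seen.key?(value)
      errors << "[#{path}] '#{value}': is already used at '#{first_seen[value]}'."
    else
      first_seen[value] = path
    end
  end
  errors
end

groups = %w[foo users admin foo]
pairs  = groups.each_with_index.map { |group, i| [group, "/0/groups/#{i}"] }
errors = unique_errors(pairs)
puts errors
```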

Validator#validate_hook()

You can extend the Kwalify::Validator class (Ruby) or the kwalify.Validator class (Java) and override the Kwalify::Validator#validate_hook() method (Ruby) or the kwalify.Validator#validateHook() method (Java). This method is called by Kwalify::Validator#validate() (Ruby) or kwalify.Validator#validate() (Java).

answers-schema.yaml : 'name:' is important.
type:      map
mapping:
  answers:
    type:      seq
    sequence:
      - type:      map
        name:        Answer
        mapping:
          name:
            type:      str
            required:  yes
          answer:
            type:      str
            required:  yes
            enum:
              - good
              - not bad
              - bad
          reason:
            type:      str
answers-validator.rb : validate script for Ruby
#!/usr/bin/env ruby

require 'kwalify'
require 'yaml'

## validator class for answers
class AnswersValidator < Kwalify::Validator

   ## load schema definition
   @@schema = YAML.load_file('answers-schema.yaml')

   def initialize()
      super(@@schema)
   end

   ## hook method called by Validator#validate()
   def validate_hook(value, rule, path, errors)
      case rule.name
      when 'Answer'
         if value['answer'] == 'bad'
            reason = value['reason']
            if !reason || reason.empty?
               msg = "reason is required when answer is 'bad'."
               errors << Kwalify::ValidationError.new(msg, path)
            end
         end
      end
   end

end

## create validator
validator = AnswersValidator.new

## load YAML document
input = ARGF.read()
document = YAML.load(input)

## validate
errors = validator.validate(document)
if errors.empty?
   puts "Valid."
else
   puts "*** INVALID!"
   errors.each do |error|
      # error.class == Kwalify::ValidationError
      puts " - [#{error.path}] : #{error.message}"
   end
end
document07a.yaml : valid document example
answers:
  - name:      Foo
    answer:    good
    reason:    I like this style.
  - name:      Bar
    answer:    not bad
  - name:      Baz
    answer:    bad
    reason:    I don't like this style.
validate
$ ruby answers-validator.rb document07a.yaml
Valid.
document07b.yaml : invalid document example
answers:
  - name:    Foo
    answer:  good
  - name:    Bar
    answer:  bad
  - name:    Baz
    answer:  not bad
validate
$ ruby answers-validator.rb document07b.yaml
*** INVALID!
 - [/answers/1] : reason is required when answer is 'bad'.

You can validate several documents with a single Validator instance because the Validator class and the Validator#validate() method are stateless. If you use instance variables in a custom validate_hook() method, the validator becomes stateful.
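
The difference can be illustrated with two stand-in classes. TinyValidator and CountingValidator are hypothetical and much simpler than Kwalify::Validator; they only show that a hook which keeps an instance variable makes the result depend on earlier validate() calls.

```ruby
# Stand-in classes for illustration only; they are not Kwalify's Validator.
class TinyValidator
  def validate(document)
    errors = []                 # fresh list on every call: no state is kept
    document.each_with_index { |value, i| validate_hook(value, "/#{i}", errors) }
    errors
  end

  # Default hook does nothing; subclasses override it.
  def validate_hook(value, path, errors)
  end
end

class CountingValidator < TinyValidator
  def validate_hook(value, path, errors)
    @seen = (@seen || 0) + 1    # instance variable survives between validate() calls
    errors << "[#{path}] more than 3 values seen in total." if @seen > 3
  end
end

stateless = TinyValidator.new
stateful  = CountingValidator.new
puts stateless.validate(%w[a b]).inspect
first  = stateful.validate(%w[a b])
second = stateful.validate(%w[c d])
puts first.inspect, second.inspect
```

The second call to the stateful validator reports an error even though its input alone would not, because the counter carried over from the first call.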

Here is a Java program equivalent to 'answers-validator.rb'.

AnswersValidator.java : validate program for Java
import kwalify.Validator;
import kwalify.Rule;
import kwalify.Util;
import kwalify.YamlUtil;
import kwalify.YamlParser;
import kwalify.SyntaxException;
import kwalify.ValidationException;

import java.util.*;
import java.io.IOException;


/**
 *  validator class for answers
 */
public class AnswersValidator extends Validator {

    /** schema string */
    private static final String SCHEMA = ""
        + "type:      map\n"
        + "mapping:\n"
        + "  answers:\n"
        + "    type:      seq\n"
        + "    sequence:\n"
        + "      - type:      map\n"
        + "        name:      Answer\n"
        + "        mapping:\n"
        + "          name:\n"
        + "            type:      str\n"
        + "            required:  yes\n"
        + "          answer:\n"
        + "            type:      str\n"
        + "            required:  yes\n"
        + "            enum:\n"
        + "              - good\n"
        + "              - not bad\n"
        + "              - bad\n"
        + "          reason:\n"
        + "            type:      str\n"
        ;

    /** schema object */
    private static Map schema = null;
    static {
        try {
            schema = (Map)YamlUtil.load(SCHEMA);
        } catch (SyntaxException ex) {
            assert false;
        }
    }

    /** constructor */
    public AnswersValidator() {
        super(schema);
    }

    /** hook method called by Validator#validate() */
    protected void validateHook(Object value, Rule rule, String path, List errors) {
        String rule_name = rule.getName();
        if (rule_name != null && rule_name.equals("Answer")) {
            assert value instanceof Map;
            Map val = (Map)value;
            assert val.get("answer") != null;
            if (val.get("answer").equals("bad")) {
                String reason = (String)val.get("reason");
                if (reason == null || reason.length() == 0) {
                    String msg = "reason is required when answer is 'bad'.";
                    errors.add(new ValidationException(msg, path));
                }
            }
        }
    }

    /** main program */
    public static void main(String[] args) throws IOException, SyntaxException {
        // create validator
        Validator validator = new AnswersValidator();

        // load YAML document
        String input;
        if (args.length > 0) {
            input = Util.readFile(args[0]);
        } else {
            input = Util.readInputStream(System.in);
        }
        YamlParser parser = new YamlParser(input);
        Object document = parser.parse();

        // validate and show errors
        List errors = validator.validate(document);
        if (errors == null || errors.size() == 0) {
            System.out.println("Valid.");
        } else {
            System.out.println("*** INVALID!");
            parser.setErrorsLineNumber(errors);
            Collections.sort(errors);
            for (Iterator it = errors.iterator(); it.hasNext(); ) {
                ValidationException error = (ValidationException)it.next();
                int linenum = error.getLineNumber();
                String path = error.getPath();
                String mesg = error.getMessage();
                String s = "- line " + linenum + ": [" + path + "] " + mesg;
                System.out.println(s);
            }
        }
    }
}
validate
$ java -classpath kwalify.jar AnswersValidator document07a.yaml
Valid.
$ java -classpath kwalify.jar AnswersValidator document07b.yaml
*** INVALID!
- line 4: [/answers/1] reason is required when answer is 'bad'.

Validator with Block

Notice: This is an experimental feature.

The Kwalify::Validator.new() method can take a block, which is invoked during validation.

validate08.rb : validate script
#!/usr/bin/env ruby

require 'kwalify'
require 'yaml'

## load schema definition
schema = YAML.load_file('answers-schema.yaml')

## create validator for answers
validator = Kwalify::Validator.new(schema) { |value, rule, path, errors|
   case rule.name
   when 'Answer'
      if value['answer'] == 'bad'
         reason = value['reason']
         if !reason || reason.empty?
            msg = "reason is required when answer is 'bad'."
            errors << Kwalify::ValidationError.new(msg, path)
         end
      end
   end
}

## load YAML document
input = ARGF.read()
document = YAML.load(input)

## validate
errors = validator.validate(document)
if errors.empty?
   puts "Valid."
else
   puts "*** INVALID!"
   errors.each do |error|
      # error.class == Kwalify::ValidationError
      puts " - [#{error.path}] : #{error.message}"
   end
end
validate
$ ruby validate08.rb document07a.yaml
Valid.
validate
$ ruby validate08.rb document07b.yaml
*** INVALID!
 - [/answers/1] : reason is required when answer is 'bad'.


Tips

Enclose Key Names in (Double) Quotes

In YAML, key names may be enclosed in single quotes (') or double quotes ("). Quoting makes user-defined key names stand out.

schema11a.yaml : enclosing in double-quotes
type:   map
mapping:
  "name":
    required:  yes
  "email":
    pattern:   /@/
  "age":
    type:      int
  "birth":
    type:      date

You may prefer to indent with 1 space and 3 spaces.

schema11b.yaml : indent with 1 space and 3 spaces
type:   map
mapping:
 "name":
    required:  yes
 "email":
    pattern:   /@/
 "age":
    type:      int
 "birth":
    type:      date

JSON

JSON is a lightweight data-interchange format, especially useful for JavaScript. JSON can be considered a subset of YAML. This means that a YAML parser can parse JSON and that Kwalify can validate JSON documents.
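
As a quick check of this claim, the sketch below feeds the same JSON text to a YAML parser and to a JSON parser and compares the results. It uses the yaml and json libraries bundled with current Ruby, which is an assumption about your environment and not necessarily the parser Kwalify uses.

```ruby
require 'yaml'
require 'json'

json_text = '{ "name": "Foo", "age": 20, "favorite": ["football", "baseball"] }'

from_yaml = YAML.safe_load(json_text)   # YAML parser reading JSON text
from_json = JSON.parse(json_text)       # JSON parser reading the same text

puts from_yaml == from_json
```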

schema12.yaml : an example schema written in JSON format
{ "type": "map",
  "required": true,
  "mapping": {
    "name": {
       "type": "str",
       "required": true
    },
    "email": {
       "type": "str"
    },
    "age": {
       "type": "int"
    },
    "gender": {
       "type": "str",
       "enum": ["M", "F"]
    },
    "favorite": {
       "type": "seq",
       "sequence": [
          { "type": "str" }
       ]
    }
  }
}
document12a.yaml : valid JSON document example
{ "name": "Foo",
  "email": "foo@mail.com",
  "age": 20,
  "gender": "F",
  "favorite": [
     "football",
     "basketball",
     "baseball"
  ]
}
validate
$ kwalify -lf schema12.yaml document12a.yaml
document12a.yaml#0: valid.
document12b.yaml : invalid JSON document example
{
  "mail": "foo@mail.com",
  "age": twenty,
  "gender": "X",
  "favorite": [ 123, 456 ]
}
validate
$ kwalify -lf schema12.yaml document12b.yaml
document12b.yaml#0: INVALID
  - (line 1) [/] key 'name:' is required.
  - (line 2) [/mail] key 'mail:' is undefined.
  - (line 3) [/age] 'twenty': not a integer.
  - (line 4) [/gender] 'X': invalid gender value.
  - (line 5) [/favorite/0] '123': not a string.
  - (line 5) [/favorite/1] '456': not a string.

Anchor

You can share rules within a schema by using YAML anchors and aliases.

schema13.yaml : anchor example
type:   seq
sequence:
  - &employee
    type:      map
    mapping:
     "given-name": &name
        type:     str
        required: yes
     "family-name": *name
     "post":
        enum:
          - executive
          - manager
          - clerk
     "supervisor":  *employee

Anchors can also be used in YAML documents.

document13a.yaml : valid document example
- &foo
  given-name:    foo
  family-name:   Foo
  post:          executive
- &bar
  given-name:    bar
  family-name:   Bar
  post:          manager
  supervisor:    *foo
- given-name:    baz
  family-name:   Baz
  post:          clerk
  supervisor:    *bar
- given-name:    zak
  family-name:   Zak
  post:          clerk
  supervisor:    *bar
validate
$ kwalify -lf schema13.yaml document13a.yaml
document13a.yaml#0: valid.
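
To see what an alias resolves to after parsing, the sketch below loads a shortened document with Ruby's bundled yaml library. The `aliases: true` keyword is assumed to be available (recent Ruby versions); this shows the parser's behaviour only and does not involve Kwalify.

```ruby
require 'yaml'

text = <<~YAML
  - &foo
    given-name:  foo
    post:        manager
  - given-name:  bar
    post:        clerk
    supervisor:  *foo
YAML

# aliases: true lets safe_load accept anchors and aliases
employees = YAML.safe_load(text, aliases: true)
foo, bar = employees
puts bar['supervisor'] == foo          # same content
puts bar['supervisor'].equal?(foo)     # whether it is the very same object
```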

Default of Mapping

YAML allows the user to specify a default value for a mapping.

For example, the following YAML document uses a default value for its mapping.

A: 10
B: 20
=: -1      # default value

This is equivalent to the following Ruby code.

map = {"A"=>10, "B"=>20}
map.default = -1
map

Kwalify allows the user to specify a default rule by using the default value of a mapping. This is useful when key names are unknown.

schema14.yaml : default rule example
type: map
mapping:
  =:              # default rule
    type: number
    range: { max: 1, min: -1 }
document14a.yaml : valid document example
value1: 0
value2: 0.5
value3: -0.9
validate
$ kwalify -lf schema14.yaml document14a.yaml
document14a.yaml#0: valid.
document14b.yaml : invalid document example
value1: 0
value2: 1.1
value3: -2.0
validate
$ kwalify -lf schema14.yaml document14b.yaml
document14b.yaml#0: INVALID
  - (line 2) [/value2] '1.1': too large (> max 1).
  - (line 3) [/value3] '-2.0': too small (< min -1).
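
One way to picture the default rule is as a fallback lookup: a key with its own rule uses it, and any other key uses the rule stored under '='. The rule_for method below is hypothetical and only illustrates this lookup; Kwalify's internals may differ.

```ruby
# Sketch: choose the rule for a key, falling back to the default rule stored
# under '=' (illustrative only; Kwalify's internals may differ).
def rule_for(mapping, key)
  mapping.fetch(key) { mapping['='] }
end

mapping = {
  'name' => { 'type' => 'str' },
  '='    => { 'type' => 'number', 'range' => { 'max' => 1, 'min' => -1 } },
}
puts rule_for(mapping, 'name')['type']     # rule defined for this key
puts rule_for(mapping, 'value1')['type']   # no rule for this key, so the default rule
```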

Merging Mappings

YAML allows the user to merge mappings.

- &a1
  A: 10
  B: 20
- <<: *a1            # merge
  A: 15              # override
  C: 30              # add

This is equivalent to the following Ruby code.

a1 = {"A"=>10, "B"=>20}
tmp = {}
tmp.update(a1)       # merge
tmp["A"] = 15        # override
tmp["C"] = 30        # add

This feature allows Kwalify to merge rule entries.

schema15.yaml : merging rule entries example
type: map
mapping:
 "group":
    type: map
    mapping:
     "name": &name
        type: str
        required: yes
     "email": &email
        type: str
        pattern: /@/
        required: no
 "user":
    type: map
    mapping:
     "name":
        <<: *name             # merge
        length: { max: 16 }   # add
     "email":
        <<: *email            # merge
        required: yes         # override
document15a.yaml : valid document example
group:
  name: foo
  email: foo@mail.com
user:
  name: bar
  email: bar@mail.com
validate
$ kwalify -lf schema15.yaml document15a.yaml
document15a.yaml#0: valid.
document15b.yaml : invalid document example
group:
  name: foo
  email: foo@mail.com
user:
  name: toooooo-looooong-name
validate
$ kwalify -lf schema15.yaml document15b.yaml
document15b.yaml#0: INVALID
  - (line 4) [/user] key 'email:' is required.
  - (line 5) [/user/name] 'toooooo-looooong-name': too long (length 21 > max 16).
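
To see the merged rule that results, the sketch below parses a shortened version of the email rules with Ruby's bundled yaml library and prints both mappings. The key names group_email and user_email are hypothetical, and the `aliases: true` keyword is assumed to be available (recent Ruby versions).

```ruby
require 'yaml'

text = <<~YAML
  group_email: &email
    type: str
    pattern: /@/
    required: no
  user_email:
    <<: *email
    required: yes
YAML

rules = YAML.safe_load(text, aliases: true)
p rules['group_email']
p rules['user_email']
```

The merged mapping keeps 'type:' and 'pattern:' from the anchor, while 'required:' takes the value written after the merge key. The anchored mapping itself is left unchanged.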