Eugene's Blog

I can't believe it's blog!

heya-unify: custom unification

Custom unification in heya-unify allows us to deal with our specific objects in JavaScript, create unification helpers, and even custom logical conditions. It is there to bridge unification with our existing projects.

Looking at the 1st part and the 2nd part of the series is recommended before diving into details.

Custom unification

Unification makes comparing simple objects a cinch no matter how complex they are, and we can easily apply it to JSON-like trees as is. Additionally, heya-unify “knows” how to unify other common JavaScript objects: dates and regular expressions. Yet in Real Life™ we face complications like these:

  • Our objects contain secondary sub-objects, which are there for purely technical reasons, e.g., to cache related objects, and they should be ignored.
  • While objects have many properties only a few of them are useful for identity checks, e.g., a Person object can be compared by its social security number, or its equivalent, rather than name components, date of birth, and other fields.
  • When building a binary tree we may include a reference to a parent object to simplify our algorithms making the whole structure circular (a parent refers to children too), which is not something we want.
  • While JSON-like structures are a huge chunk of data we process (they come from databases, and web services), there are many legitimate scenarios, when we cannot use a naive unification.

How to deal with it? heya-unify has a number of provisions to do it safely and efficiently.

Unifier: the base class

One way to do it is to make our objects unification-aware. In order to do that we should base them on a well-known base Unifier:

var unify    = require("heya-unify"), // for node.js
    Unifier  = unify.Unifier,
    variable = unify.variable;

// let's create a unification-aware object
function Person(name, dob, tin){
  Unifier.call(this); // calling our base constructor first

  this.name = name;
  this.dob  = dob;    // date of birth
  this.tin  = tin;    // tax identification number
}

// register our base using single inheritance
Person.prototype = Object.create(Unifier.prototype);

// because our government assigns unique TINs to individuals
// with similar names and even the same date of birth,
// we can use it for identification purposes disregarding the rest

// our custom unification function
Person.prototype.unify = function(val, ls, rs, env){
  return val instanceof Person && this.tin === val.tin;
};

Or, if we use dcl.js, it is even simpler:

var dcl = require("dcl");

var Person = dcl(Unifier, {
  constructor: function(name, dob, tin){
    this.name = name;
    this.dob  = dob;  // date of birth
    this.tin  = tin;  // tax identification number
  },
  unify: function(val, ls, rs, env){
    return val instanceof Person && this.tin === val.tin;
  }
});

Now we can use Person for unification without worrying about extra fields, or even circular references to parents, spouses, children, or whatever else our data model has.

One good question is: what kind of object is TIN? Typically it is a string of numbers or symbols, but in some countries it can have its own structure, which can be represented by its own object. How to deal with it? In the above example we assumed that it is a primitive value, likely a string. What if it is not?

Actually this part is super easy. The secret is to delegate unification down the line:

// our new custom unification function
Person.prototype.unify = function(val, ls, rs, env){
  if(val instanceof Person){
    ls.push(this.tin);
    rs.push(val.tin);
    return true;
  }
  return false;
};

Now let’s discuss the unification method. It has the following arguments:

  • val is an object, we are asked to unify with. It is up to us to check if it is a suitable target, and how to do the unification.
  • ls and rs are two arrays, which are used for pair-wise unification. A unification method should not read them, or modify existing items. The only proper way to use them is to put more items on them requesting them to be unified later. As a rule, we should put the same number of objects on ls and rs.
  • env is our current environment, which can be used to resolve or bind unification variables. In most cases, it is better to put variables on stacks rather than dealing with them directly.

A unification method should return a boolean value. A falsy value signals an unsuccessful unification and terminates it immediately, while a truthy value continues processing the ls and rs stacks.

val can be the “left” or the “right” object depending on the position of the Unifier. The method can be called against primitive values, but never against unbound variables.

Note: the very existence of stacks indicates that internally the unification algorithm is implemented iteratively rather than recursively for performance reasons.
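To make the contract more tangible, here is a toy model of a stack-driven loop. It is only a sketch and not heya-unify's actual implementation: toyUnify(), Tin, and this cut-down Person are hypothetical names, and the toy ignores variables, env, registries, and per-property comparison. It shows a method pushing pairs on ls and rs, and a falsy result stopping the whole process.

```javascript
// Toy model only: NOT heya-unify's real implementation.
// It ignores variables, env, registries, and plain-object comparison.
function toyUnify(l, r){
  var ls = [l], rs = [r];            // pending pairs, processed iteratively
  while(ls.length){
    var a = ls.pop(), b = rs.pop();
    if(a === b){ continue; }         // identical values unify trivially
    if(a && typeof a.unify == "function"){
      if(!a.unify(b, ls, rs)){ return false; } // falsy result stops everything
      continue;
    }
    if(b && typeof b.unify == "function"){
      if(!b.unify(a, ls, rs)){ return false; }
      continue;
    }
    return false;                    // the toy knows nothing else
  }
  return true;
}

// hypothetical unification-aware objects
function Tin(country, code){ this.country = country; this.code = code; }
Tin.prototype.unify = function(val, ls, rs){
  if(!(val instanceof Tin)){ return false; }
  ls.push(this.country, this.code);  // same number of items on both stacks
  rs.push(val.country, val.code);
  return true;
};

function Person(name, tin){ this.name = name; this.tin = tin; }
Person.prototype.unify = function(val, ls, rs){
  if(!(val instanceof Person)){ return false; }
  ls.push(this.tin);                 // delegate: request tin unification later
  rs.push(val.tin);
  return true;
};

var bob1 = new Person("Bob",    new Tin("US", "123")),
    bob2 = new Person("Robert", new Tin("US", "123")),
    eve  = new Person("Eve",    new Tin("US", "999"));

console.log(toyUnify(bob1, bob2));  // true
console.log(toyUnify(bob1, eve));   // false
console.log(toyUnify(bob1, "Bob")); // false
```

Note that neither unify() method reads the stacks or calls itself recursively: each one only schedules more work and returns.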

Unifier: instanceof

What if our project already has objects, which know nothing about unification? Frequently we cannot afford to rewrite existing code for business reasons. How to deal with it?

No problem: there is a simple way to register our objects without invasive modifications.

unify() function has a property called registry, which is a simple array. All even items are assumed to be constructor functions, while corresponding odd items are unification functions. Let’s add an external unifier for our Person:

unify.registry.push(
  Person, // our constructor function
  function(l, r, ls, rs, env){
    if(l instanceof Person && r instanceof Person){
      ls.push(l.tin);
      rs.push(r.tin);
      return true;
    }
    return false;
  }
);

Now we can unify all Person objects, and their derivations.

A unification function returns the same values as the unification method described above, and takes practically the same arguments. The only difference is that the method unifies this and val, while the function unifies l and r. Only one of those two objects is required to be an instance of the registered constructor function. That object cannot be null, and its typeof is “object”.

Note: initially registry is not empty. It contains unification functions for primordial JavaScript objects: Array, Date, RegExp. While we can reassign unify.registry completely, it is suggested to copy existing unifiers.
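The flat "constructor, function" layout can be pictured with a stand-in array. The sketch below does not touch the real unify.registry: builtins, registry, findUnifier(), and unifyPersons() are hypothetical names, and the lookup is only an assumption about how such pairs could be walked. It also shows copying existing pairs with concat() rather than replacing them.

```javascript
// Sketch only: a stand-in array, not the real unify.registry.
function Person(name, tin){ this.name = name; this.tin = tin; }

function unifyPersons(l, r, ls, rs, env){
  if(l instanceof Person && r instanceof Person){
    ls.push(l.tin);
    rs.push(r.tin);
    return true;
  }
  return false;
}

// flat layout: [constructor, function, constructor, function, ...]
var builtins = [Date, function unifyDates(l, r){
  return l instanceof Date && r instanceof Date &&
    l.getTime() === r.getTime();
}];

// copy the existing pairs instead of replacing them
var registry = builtins.concat([Person, unifyPersons]);

// walk the pairs from the beginning; either side may match a constructor
function findUnifier(registry, l, r){
  for(var i = 0; i < registry.length; i += 2){
    if(l instanceof registry[i] || r instanceof registry[i]){
      return registry[i + 1];
    }
  }
  return null;
}

var ann = new Person("Ann", "123");
console.log(findUnifier(registry, ann, {tin: "123"}) === unifyPersons); // true
console.log(findUnifier(registry, {}, {}));                             // null
```

Because only one side has to be a Person for the function to be selected, the function itself still checks both sides before pushing anything.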

Unifier: a function

While instanceof-based unifiers cover legacy objects, what if we didn’t bother creating them with formal constructor functions? What if we recognize our objects dynamically?

heya-unify has us covered: there is another way to recognize our special objects.

unify() function has another property called filters, which is a simple array too, just like registry. All even items are assumed to be filter functions, while corresponding odd items are unification functions. Let’s add an external unifier for our Person:

unify.filters.push(
  function(l, r){
    return "tin" in l || "tin" in r;
  },
  function(l, r, ls, rs, env){
    if("tin" in l && "tin" in r){
      ls.push(l.tin);
      rs.push(r.tin);
      return true;
    }
    return false;
  }
);

Now unify() will handle all objects with tin property, provided it makes sense for our project.

A unification function has the same arguments, and returns the same values as in the previous case. A filter function returns a truthy value, if objects can be unified with its unification function. Both objects passed to a filter function cannot be null, and their typeof is “object”.

Note: initially filters is an empty array, but it may change in future versions.
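To see how a filter and its unification function divide the work, we can call the two functions from the example by hand on plain object literals, without heya-unify. The names hasTin() and unifyByTin() are made up for this illustration, and both arguments are assumed to be non-null objects, as described above.

```javascript
// Stand-alone illustration: the two functions applied by hand.
// Both arguments are assumed to be non-null objects.
function hasTin(l, r){
  return "tin" in l || "tin" in r;
}
function unifyByTin(l, r, ls, rs, env){
  if("tin" in l && "tin" in r){
    ls.push(l.tin);
    rs.push(r.tin);
    return true;
  }
  return false;
}

var a = {name: "Ann",  tin: "123"},
    b = {name: "Anna", tin: "123"},
    c = {name: "Carl"};              // no tin at all

var ls = [], rs = [];
console.log(hasTin(a, b), unifyByTin(a, b, ls, rs)); // true true
console.log(ls, rs);                                 // [ '123' ] [ '123' ]

// the filter claims the pair because one side has tin,
// then the unification function rejects it
console.log(hasTin(a, c), unifyByTin(a, c, [], [])); // true false

// neither side has tin: the filter passes on this pair
console.log(hasTin(c, {}));                          // false
```

The filter is deliberately looser than the unification function: a Person-like object compared with something that has no tin is claimed and then rejected, instead of falling through to a per-property comparison.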

Order of unifications

Now we are ready to describe the exact order of how we unify objects:

  1. Direct unity using ===.
  2. Either object is unify.any (see the first part).
  3. Either object is a unification variable (see the first part).
  4. Either object supports the Unifier contract described above.
  5. Checking for matching typeof values.
  6. Checking for both NaN (unlike JavaScript two NaN values are unified successfully).
  7. At this point we pass only non-null objects (typeof is “object”).
  8. We iterate over registry from the beginning.
    • Array, Date, and RegExp are handled here.
  9. We iterate over filters from the beginning.
  10. We unify objects in JSON-like style on per-property basis.
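The order above can be sketched as a toy classifier that only reports which step would take over for a given pair. This is not library code: ANY, ToyVar, ToyUnifier, and whichStep() are stand-ins invented for the illustration, and registry and filters are ordinary arrays passed in as arguments.

```javascript
// Toy classifier, not library code: it only names the step that applies.
var ANY = {};                         // stand-in for unify.any
function ToyVar(name){ this.name = name; }  // stand-in for a variable
function ToyUnifier(){}               // stand-in for the Unifier base

function whichStep(l, r, registry, filters){
  if(l === r){ return "1: ==="; }
  if(l === ANY || r === ANY){ return "2: any"; }
  if(l instanceof ToyVar || r instanceof ToyVar){ return "3: variable"; }
  if(l instanceof ToyUnifier || r instanceof ToyUnifier){ return "4: Unifier"; }
  if(typeof l != typeof r){ return "5: fail, typeof mismatch"; }
  if(typeof l == "number" && isNaN(l) && isNaN(r)){ return "6: both NaN"; }
  if(typeof l != "object" || !l || !r){
    return "7: fail, not a pair of non-null objects";
  }
  var i;
  for(i = 0; i < registry.length; i += 2){
    if(l instanceof registry[i] || r instanceof registry[i]){
      return "8: registry";
    }
  }
  for(i = 0; i < filters.length; i += 2){
    if(filters[i](l, r)){ return "9: filters"; }
  }
  return "10: per-property";
}

var registry = [Date, function(){ return true; }], filters = [];

console.log(whichStep(1, 1, registry, filters));       // 1: ===
console.log(whichStep(NaN, NaN, registry, filters));   // 6: both NaN
console.log(whichStep(1, "1", registry, filters));     // 5: fail, typeof mismatch
console.log(whichStep(new Date(0), new Date(0), registry, filters)); // 8: registry
console.log(whichStep({a: 1}, {a: 1}, registry, filters));           // 10: per-property
```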

Example: ref()

Sometimes we have data like this:

var datum = {
  pos: {
    x: 23,
    y: 42
    // more properties
  }
  // more properties
};

We want to remember x and y inside pos, and the pos object itself. Having references to a parent object, and its components makes a lot of sense, because frequently we don’t want to reconstitute a parent object:

  • It is more expensive to recreate an object than to reuse a perfectly good existing object.
  • Remember that while we have x and y, pos can have other properties we are not aware of, and chose to ignore.

Precisely for such cases there is a helper ref(), which allows us to name an object, and unify it with a sub-pattern:

var ref = require("heya-unify/unifiers/ref");

var open = unify.open;

var pos = variable(), x = variable(), y = variable();

var pattern = open({
  pos: ref(pos, open({
    x: x,
    y: y
  }))
});

Mission accomplished: after a successful unification we will have three variables pointing to the proper sub-objects.

ref() takes two arguments:

  • variable, which is the main variable to unify with. It can be a variable object as in the example, or a string name of a variable.
  • value is an arbitrary object pattern, which in turn may contain more variables.

Let’s look at the code of this module to see how it is implemented: ref.js, 27 lines total. The main engine of this module is the first two lines of the unify() method:

function Ref(variable, value){
  this.variable = typeof variable == "string" ? new Var(variable) : variable;
  this.value = value;
  Var.call(this, this.variable.name);
}
// ...

Ref.prototype.unify = function(val, ls, rs, env){
  ls.push(this.value, this.variable);
  rs.push(val, val);
  return true;
};

As we can see, we unify an outer variable with a value, then an internal pattern with the same value, which binds all variables, and unifies all structures. This is how custom unification can be used to create compound unification statements.

Note: in many cases we can use chained unifications described in When to unify in JS to achieve the same effect, yet ref() allows us to do it within one pattern, rather than create several patterns for a chain.
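The effect of those two push() calls can be observed without the library. The sketch below uses ToyRef, a simplified stand-in with no Var base and no env, plus placeholder objects instead of real variables and patterns. It only inspects what ends up on the stacks.

```javascript
// Simplified stand-in for the Ref object (no Var base, no env).
function ToyRef(variable, value){
  this.variable = variable;
  this.value = value;
}
ToyRef.prototype.unify = function(val, ls, rs){
  ls.push(this.value, this.variable);
  rs.push(val, val);
  return true;
};

var posVar     = {name: "pos"},        // placeholder for a real variable
    subPattern = {x: "x?", y: "y?"},   // placeholder for a sub-pattern
    actual     = {x: 23, y: 42, z: 0};

var ls = [], rs = [];
new ToyRef(posVar, subPattern).unify(actual, ls, rs);

// two pairs are now scheduled, both against the same value
console.log(ls[0] === subPattern, rs[0] === actual); // true true
console.log(ls[1] === posVar,     rs[1] === actual); // true true
```

Both scheduled pairs share the same right-hand value, which is why the outer variable and the sub-pattern end up describing the same object.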

Custom conditions

There are practically no restrictions on implementing custom unification algorithms. For example, it is possible to unify objects of completely different types. This feature is there so we can implement arbitrary logical conditions easily.

heya-unify comes with several helpers, which use unification in such fashion.

matchInstanceOf()

This object matches its counterpart against a list of constructor functions using instanceof:

var matchInstanceOf = require("heya-unify/unifiers/matchInstanceOf");

var pattern = {
  date: matchInstanceOf(Date)
  // ...
};

var env = unify(pattern, {
  date: new Date(1961, 3, 12)
});

matchInstanceOf() takes a single argument, which can be a constructor function, or an array of constructor functions. It fails if its counterpart does not match any of the constructors.

matchTypeOf()

This is a sister object of matchInstanceOf(), which uses typeof instead of instanceof:

var matchTypeOf = require("heya-unify/unifiers/matchTypeOf");

var pattern = {
  date: matchTypeOf(["object", "string"])
  // ...
};

var env = unify(pattern, {
  date: new Date(1961, 3, 12)
});

matchTypeOf() takes a single argument, which can be a string, or an array of strings, indicating types as returned by the typeof operator. It fails if its counterpart does not match any of the types.
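The difference between the two checks is easy to try in plain JavaScript. The helpers below, isInstanceOfAny() and isTypeOfAny(), are hypothetical and only mirror the described behavior; they are not the library's code.

```javascript
// Hypothetical helpers mirroring the described checks.
function toArray(x){ return Array.isArray(x) ? x : [x]; }

function isInstanceOfAny(ctors, val){
  return toArray(ctors).some(function(ctor){ return val instanceof ctor; });
}
function isTypeOfAny(types, val){
  return toArray(types).indexOf(typeof val) >= 0;
}

var launch = new Date(1961, 3, 12);
console.log(isInstanceOfAny(Date, launch));             // true
console.log(isInstanceOfAny([RegExp, Array], launch));  // false
console.log(isTypeOfAny(["object", "string"], launch)); // true: a Date is an "object"
console.log(isTypeOfAny("object", "1961-04-12"));       // false: it is a "string"
console.log(isTypeOfAny("object", null));               // true: typeof null is "object"
```

The last line is a reminder that a typeof-based check on "object" in plain JavaScript also accepts null, so an instanceof-based check is the stricter choice when we expect a specific kind of object.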

matchString()

This object is a specialized matcher, which uses a regular expression to match a string:

var matchString = require("heya-unify/unifiers/matchString");

var ssn = variable(), last4digits = variable();

var pattern = {
  ssn: matchString(
    /(\d{3})-(\d{2})-(\d{4})/,
    [ssn, unify.any, unify.any, last4digits]
  )
  // ...
};

var env = unify(pattern, {
  ssn: "123-45-6789"
});
console.log(ssn.get(env));         // 123-45-6789
console.log(last4digits.get(env)); // 6789

matchString() takes 3 arguments to construct its object:

  • regexp is a regular expression object, which will be run with exec() against a string.
  • matches is an optional array of strings, which will be compared against the result of the match.
  • props is an optional object, which may define two properties:
    • index is a numeric index of a match.
    • input is a string, which was used for the match.

In order to better understand matches and props please consult RegExp documentation. One good source is RegExp.prototype.exec() on MDN — see result object.

One cool property is that matches and props or their components can be variables! We may use open arrays and objects, if we need a partial match for those properties.
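For reference, this is what a plain exec() call returns for a similar input, using only standard JavaScript. It shows why the matches array in the example has four slots (the whole match plus three groups), and what index and input hold. The "SSN: " prefix is added here only to make index non-zero.

```javascript
var result = /(\d{3})-(\d{2})-(\d{4})/.exec("SSN: 123-45-6789");

console.log(result.length); // 4: the whole match plus three groups
console.log(result[0]);     // 123-45-6789
console.log(result[3]);     // 6789
console.log(result.index);  // 5
console.log(result.input);  // SSN: 123-45-6789
```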

match()

This is a generic matcher, which delegates its unify() method to a function:

var match = require("heya-unify/unifiers/match");

var pattern = {
  errorRate: match(function(val){ return typeof val == "number" && val < 0.2; })
  // ...
};

var env = unify(pattern, {
  errorRate: 0.123
});

match() takes a function, which uses the same arguments as the unify() method described above, and it is expected to return the same values: truthy/falsy to pass/reject. It is called in the context of the object returned by match().

Its implementation is trivial (match.js is 19 lines):

function Match(f){
  this.f = f;
}
// ...

Match.prototype.unify = function(val, ls, rs, env){
  return this.f(val, ls, rs, env);
};

Caveats

heya-unify implements a first-order unification algorithm. In practice it means that objects that implement custom conditions should not be unified with unbound variables. If they are, the variables will be bound to the custom matchers themselves rather than to actual values. Arguably, that is not very useful.

This restriction will be addressed in future major versions of heya-unify.

Summary

heya-unify provides a rich set of interfaces for custom unification to fit any project. It is equally easy to add the unification library to an existing project, or to write new unification-aware code.

Custom unification is one piece of the puzzle. Other pieces are incomplete objects, and utilities to reconstruct JavaScript objects from unification results. The latter can substitute variables with their values, so a consumer may not know about unification steps at all working with regular JavaScript objects.

Incomplete objects, and utilities are subjects of upcoming blog posts.

Unification posts

All installments will be posted in this list as soon as they go online:

Thank you for your help and suggestions!

Attributions

This post uses image by Louise Leclerc under Creative Commons License.
