Eugene's Blog

I can't believe it's blog!

When to unify in JS

Unification for JS introduced heya-unify — a practical mini library to leverage unification in JavaScript. This post explains when it makes sense to use unification, and gives practical examples of how to use it.

When to unify?

Below is my laundry list for unification. Whenever I see a project that deals with items on this list, I investigate whether it makes sense to use heya-unify.

Matching and transforming

An obvious sweet spot is inspecting deep objects: saving some sub-objects for future use, and possibly matching others against expected values. Even complex arbitrary graphs with loops are fair game. The same goes for sub-objects: it is as easy to compare and save complex sub-trees as primitive values.

Matching against circular structures is possible, but may require some simple techniques like variables, open objects, or custom unifiers, which are explained in detail later.
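
To make the idea of "match some parts, save others" concrete, here is a minimal toy matcher in plain JavaScript. It is only a sketch and not how heya-unify is implemented: capture(), match(), and the order record are hypothetical names invented for this illustration, and the toy handles neither loops nor variables that must agree across several places.

```javascript
// Toy matcher for illustration only; this is NOT heya-unify's implementation.
// A capture is a hypothetical marker object: {capture: "name"}.
function capture(name){ return {capture: name}; }

function isCapture(x){
  return x !== null && typeof x === "object" &&
    typeof x.capture === "string" && Object.keys(x).length === 1;
}

// Returns true on success and records captured sub-objects in env.
// Properties missing from the pattern are ignored ("open" matching).
function match(pattern, data, env){
  if(isCapture(pattern)){
    env[pattern.capture] = data; // save the sub-object, whatever it is
    return true;
  }
  if(pattern === null || typeof pattern !== "object"){
    return pattern === data;     // primitives must be identical
  }
  if(data === null || typeof data !== "object"){
    return false;
  }
  return Object.keys(pattern).every(function(key){
    return key in data && match(pattern[key], data[key], env);
  });
}

// Hypothetical input with more properties than we care about.
var order = {
  id: 42,
  status: "shipped",
  customer: {name: "Jane Doe", address: {city: "Dallas", zip: "75201"}}
};

var env = {};
var ok = match({
  status: "shipped",                      // must match exactly
  customer: {address: capture("address")} // save a whole sub-tree
}, order, env);

console.log(ok, env.address.city); // prints: true Dallas
```

The pattern has the same shape as the data it describes, which is the property the rest of this post relies on.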

NoSQL

Unification is a natural fit for document-based NoSQL databases. Imagine that we have a collection of so-called sparse or incomplete documents. If we use a traditional SQL database for that by mapping our documents to an extended collection of fields, we will have a database with an extraordinary number of nulls, and bad performance. Even normalization, if it is possible at all, is unlikely to cure the problem.

The situation is even worse when we don’t have a fixed schema for our data, for example, when the schema has to be discovered first by collecting representative data samples. Sometimes we know what we are dealing with, but lack the statistics to properly normalize or denormalize our tables, and to allocate indices properly. This situation calls for NoSQL, and unification can help deal with such dynamic data.

Web services

While NoSQL is an important use case, dealing with web services is even more important. Usually web services are designed to provide generic responses to cover a wide range of possible clients. In most cases they provide more information than we need. Typically we validate their responses, and extract only relevant pieces ignoring the rest.

Unification doesn’t care whether our data comes as JSON or XML (both are popular choices for web services), because it deals with already instantiated JavaScript structures, which makes it agnostic to the data format.

RAD

Unification is a good RAD tool. Instead of imperatively writing a bunch of if statements and allocating temporary variables to remember sub-objects, we just specify the shape we need declaratively. The result is easy to write, understand, and modify if the need arises.
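
For contrast, this is roughly what the imperative style looks like in plain JavaScript. The record shape and the extractPhone() name are hypothetical, made up only to show how the checks and temporary variables pile up with every level of nesting.

```javascript
// Hypothetical record; only a few of its fields matter to us.
var record = {
  type: "user",
  profile: {name: "Jane Doe", contacts: {phone: "555-0100", fax: null}},
  flags: ["active"]
};

// Hand-written validation and extraction: every level needs its own check,
// and every saved sub-object needs its own temporary variable.
function extractPhone(rec){
  if(!rec || rec.type !== "user") return null;
  var profile = rec.profile;
  if(!profile || typeof profile !== "object") return null;
  var contacts = profile.contacts;
  if(!contacts || typeof contacts !== "object") return null;
  if(typeof contacts.phone !== "string") return null;
  return {name: profile.name, phone: contacts.phone};
}

console.log(extractPhone(record).phone); // prints: 555-0100
```

If the record format changes, the chain of checks has to be rewritten by hand, while a declarative pattern would only need its shape updated.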

Example: trivial data processing

Let’s assume that we deal with personnel records, and we want to plot annual salaries sorted from lowest to highest to see their distribution. For simplicity’s sake, each position has a base salary, which we can use. If an employee negotiated a custom compensation package, their salary is recorded directly in the personnel record, overriding the base salary. Let’s sketch our objects, keeping the example as minimal as possible:

var position = {
        title:      "Unit manager", // position's title
        baseSalary: 100000          // base compensation
        // more information
      };
var employee = {
        name:     "Jane Doe",       // employee's name
        position: position,         // employee's position
        salary:   101000            // optional salary information
        // more information
      };

employee is a model for personnel records. It refers to position. In fact, many records can refer to the same position object, and that object may have back links to employees in that position.
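
A small sketch of such a shared, looped structure in plain JavaScript follows. The employees back-link property and the second record are assumptions made for illustration; the post itself does not define them.

```javascript
var position = {
  title: "Unit manager",
  baseSalary: 100000,
  employees: []          // hypothetical back links to employees
};
var jane = {name: "Jane Doe",   position: position, salary: 101000};
var john = {name: "John Smith", position: position}; // no custom salary

position.employees.push(jane, john);

// Both records share one position object...
console.log(jane.position === john.position);     // prints: true
// ...and following the back link leads to the same record again: a loop.
console.log(jane.position.employees[0] === jane); // prints: true

// Such a graph cannot be serialized naively:
var serializable = true;
try{ JSON.stringify(jane); }catch(e){ serializable = false; }
console.log(serializable);                        // prints: false
```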

Our fictional charting package requires data as an array of the following data points:

var dataPoint = {
        value:   100000,    // a numeric value to plot
        tooltip: "Jane Doe" // name to identify this data point
      };

First let’s pull in the required modules (assuming Node.js here; AMD is equally simple):

var unify    = require("heya-unify"),
    variable = unify.variable;

var preprocess = require("heya-unify/utils/preprocess");
// preprocess() is a helper that can mark all objects as "open".
// Open objects let us ignore additional properties.

var assemble   = require("heya-unify/utils/assemble");
// assemble() is a helper that clones objects resolving
// unification variables. The result can be consumed by
// JavaScript unaware of unification specifics.

Now we have everything. Let’s write our variables and patterns:

var salary = variable(),
    base   = variable(),
    name   = variable();

// case #1: base package
var basic = preprocess({
        name: name,
        position: {
          baseSalary: base
        }
      }, true); // preprocess() makes all objects "open",
                // so additional properties are ignored.

// case #2: custom package
var custom = preprocess({
        salary: salary
      }, true);

Let’s write what we want to see at the end:

var datum = {
        value:   salary,
        tooltip: name
      };

Now we are ready to write the code:

// employees is an input array of personnel records
// data is an output array for our charting package

var data = employees.map(function(employee){
        var env = unify(basic, employee);
        if(!env){
          console.log("Can't match:", employee);
          return null;
        }
        if(!unify(custom, employee, env)){
          // there is no custom salary => let's use the base salary
          unify(salary, base, env);
        }
        return assemble(datum, env);
      }).filter(function(point){
        return point; // only non-null data points are accepted
      }).sort(function(a, b){ return a.value - b.value; });

That’s it. Note that if our data format changes (input or output), all we need to do is update the declarative structures. If we have more patterns to determine salary, they are straightforward to add.

Note: unification is an all-or-nothing proposition, so when chaining several unifications using the same environment, it is prudent to discard that environment if there is a failure anywhere in the chain. The reason is simple: if we have several variables in a pattern, some of them can be bound before failing on other matches. In this particular case the custom pattern has exactly one variable, and no other matches, so we are safe.

While we are already familiar with unify() and variable(), preprocess() and assemble() are new. They are simple helpers, which will be explained in detail later.
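
The partial-binding problem can be shown with a toy flat matcher. This is only a sketch, not heya-unify's algorithm or API: matchInto(), tryMatch(), the {bind: ...} marker, and the plain-object environment are all assumptions made for this illustration.

```javascript
// Toy flat matcher, for illustration only (not heya-unify's algorithm).
// A pattern value of the form {bind: "x"} stores the data value in env.x.
function matchInto(pattern, data, env){
  var keys = Object.keys(pattern);
  for(var i = 0; i < keys.length; ++i){
    var key = keys[i], expected = pattern[key];
    if(expected && typeof expected === "object" && "bind" in expected){
      env[expected.bind] = data[key]; // binding happens immediately
    }else if(expected !== data[key]){
      return false;                   // failure after a binding was made
    }
  }
  return true;
}

var pattern = {name: {bind: "name"}, status: "active"};
var record  = {name: "Jane Doe", status: "retired"};

var env = {};
console.log(matchInto(pattern, record, env)); // prints: false
console.log(env.name);                        // prints: Jane Doe (a stale binding)

// One defensive option: match into a scratch copy, commit only on success.
function tryMatch(pattern, data, env){
  var scratch = Object.assign({}, env);
  return matchInto(pattern, data, scratch) ? scratch : null;
}
console.log(tryMatch(pattern, record, {}));   // prints: null
```

The failed match left a binding behind in env, which is exactly why an environment that went through a failed chain should not be reused.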

Example: Flickr and Google Maps

Flickr is a popular repository for photos, which provides a rich API to use it as a web service. Let’s use it as an example. I would love to write live code for it, but the API requires a special key, so instead of dealing with provisioning and risking people reusing the key, I will provide snapshots of data. You can always follow the examples with Flickr’s interactive API Explorer.

Let’s look for the most interesting public photo that has location information, and show both the picture and the corresponding map using Google Maps.

Relevant docs:

  • flickr.photos.search
    • API Explorer: flickr.photos.search
    • Search parameters:
      • license should be anything but “All Rights Reserved”; otherwise we cannot publish our result. This parameter should list all allowed values for a license (1-8).
      • sort is interestingness-desc so the most interesting photo is first.
      • media is photos. We don’t want to deal with video, or anything else.
      • extras is geo, so we have coordinates. Please note that it does not guarantee that coordinates are available. I saw 0 in them on many occasions. We will guard against it.
  • Flickr URLs
  • Static Maps API V2 Developer Guide
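
The search parameters above can be assembled into a request URL along these lines. This is a sketch under assumptions: the REST endpoint and the api_key, format, and nojsoncallback parameters come from Flickr's public API documentation rather than from this post, the media value is spelled the way I recall it from those docs, and YOUR_API_KEY is a placeholder.

```javascript
// Assumptions: endpoint, api_key, format and nojsoncallback are taken from
// Flickr's public API docs, not from this post. YOUR_API_KEY is a placeholder.
var params = {
  method:  "flickr.photos.search",
  api_key: "YOUR_API_KEY",
  license: "1,2,3,4,5,6,7,8",      // everything but "All Rights Reserved"
  sort:    "interestingness-desc", // most interesting first
  media:   "photos",               // no video
  extras:  "geo",                  // ask for latitude and longitude
  format:  "json",
  nojsoncallback: "1"              // plain JSON instead of JSONP
};

// Encode every key and value, then join them into a query string.
var query = Object.keys(params).map(function(key){
  return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
}).join("&");

var searchUrl = "https://api.flickr.com/services/rest/?" + query;
console.log(searchUrl);
```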

Using API Explorer we can query data manually. With the search parameters above it forms a long URL. The results arrive in the following format:

{ "photos": { "page": 1, "pages": "2910670", "perpage": 100, "total": "291066924",
    "photo": [
      { "id": "14016692823", "owner": "[email protected]", "secret": "b3cc697cb9", "server": "7336", "farm": 8, "title": "Anywhere Is: The Another Sky", "ispublic": 1, "isfriend": 0, "isfamily": 0, "latitude": -5.954698, "longitude": 39.368305, "accuracy": 16, "context": 0, "place_id": "XF41Ka5QVr5UmuTK", "woeid": "1443630", "geo_is_family": 0, "geo_is_friend": 0, "geo_is_contact": 0, "geo_is_public": 1 },
      { "id": "13250377165", "owner": "[email protected]", "secret": "c7c51feb94", "server": "7069", "farm": 8, "title": "'Under the Stars' - Moelfre, Anglesey", "ispublic": 1, "isfriend": 0, "isfamily": 0, "latitude": 53.352318, "longitude": -4.255946, "accuracy": 16, "context": 0, "place_id": "UfWnkGJTW7vp4Q", "woeid": "29167", "geo_is_family": 0, "geo_is_friend": 0, "geo_is_contact": 0, "geo_is_public": 1 }
      // 98 more items were removed for brevity
    ] }, "stat": "ok" }

We will form two URLs so we can show the most interesting geocoded photo, and its location. But first let’s deal with patterns:

// our variables

var photo  = variable("photo"),
    title  = variable("title"),
    farm   = variable("farm"),
    server = variable("server"),
    user   = variable("user"),
    id     = variable("id"),
    secret = variable("secret"),
    latitude  = variable("latitude"),
    longitude = variable("longitude");

// We named variables explicitly, so we can use them directly
// by names later. We could use any name, but in this particular
// case it is easier to reuse API names to avoid any confusion.

// patterns

var validResponse = preprocess({
        stat: "ok",    // we need a valid response
        photos: {
          photo: photo // the array of photos
        }
      }, true),

    noLocation = preprocess({
        latitude:  0, // both zeros means no location is available
        longitude: 0
      }, true),

    data = preprocess({
        title:  title,
        farm:   farm,
        server: server,
        owner:  user,
        id:     id,
        secret: secret,
        latitude:  latitude,
        longitude: longitude
      }, true);

This is our data processing code, which includes simple error handling:

// response is JSON we received from flickr.

var env = unify(response, validResponse);
if(!env){
  console.log("The response is invalid!");
  return;
}

var found = photo.get(env).some(function(pic){
        if(unify(pic, noLocation)){
          return false; // next picture
        }
        env = unify(pic, data);
        return true; // stop
      });
if(!found){
  console.log("No photos with location!");
  return;
}

// at this point env contains values we need

Now we can form our URLs:

var replace = require("heya-unify/utils/replace");
// replace() is a simple text templating tool, which pulls
// unification variables from an environment by name.

// Below we use names of our variables in templates.

var picUrl  = replace("http://farm${farm}.staticflickr.com/" +
                "${server}/${id}_${secret}.jpg", env),
    pageUrl = replace("http://www.flickr.com/photos/" +
                "${user}/${id}", env),
    mapUrl  = replace("http://maps.googleapis.com/maps/api/staticmap?" +
                "size=500x500&zoom=11&center=${latitude},${longitude}", env);

When I tried this code I received the following results: the most interesting photo was Anywhere Is: The Another Sky by Sergey Golyshev:

"Anywhere Is: The Another Sky"

It was taken in Zanzibar’s Kichwele Forest Reserve.

Apparently astronomers are a very active community, and they like a good photo of a night sky!
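
The URL templating step used above can be approximated in a few lines of plain JavaScript. This is a sketch: fill() is a hypothetical stand-in that reads values from a plain object, and it is not heya-unify's replace(), which pulls them from a unification environment.

```javascript
// fill() is a hypothetical stand-in for this sketch, not heya-unify's replace().
function fill(template, values){
  return template.replace(/\$\{([^}]+)\}/g, function(match, name){
    // leave unknown placeholders untouched
    return Object.prototype.hasOwnProperty.call(values, name) ?
      String(values[name]) : match;
  });
}

// Values copied from the first photo in the sample response above.
var values = {
  farm: 8, server: "7336", id: "14016692823", secret: "b3cc697cb9",
  latitude: -5.954698, longitude: 39.368305
};

console.log(fill("http://farm${farm}.staticflickr.com/" +
                 "${server}/${id}_${secret}.jpg", values));
// prints: http://farm8.staticflickr.com/7336/14016692823_b3cc697cb9.jpg

console.log(fill("center=${latitude},${longitude}", values));
// prints: center=-5.954698,39.368305
```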

Tip: chained unifications

One frequent use case is chained unifications, when instead of a single pair, a set of pairs is unified, passing information through common unification variables. This way we can ensure that our data satisfies an arbitrary number of conditions rather than a simple match. Surely we have a helper for that, don’t we?

No, we don’t, because we don’t need it. Instead we should assemble our pairs in two arrays, and unify those. For example, in order to test that three given employees have the same manager we may write a function:

var open = unify.open;

var manager = variable();

var pattern = open({
  manager: manager
});

function sameManager(employee1, employee2, employee3){
  // takes three employee records, and returns true/false
  var env = unify(employee1, pattern);
  if(!env) return false;
  // now our manager variable is bound in env,
  // and we simply test the rest against the same env
  if(!unify(employee2, pattern, env)) return false;
  return !!unify(employee3, pattern, env);
}

An interesting fact is that here we don’t care about the actual manager object. It can be anything unifiable.

We can make sameManager() smaller, and more efficient, by using simple arrays:

function sameManager(employee1, employee2, employee3){
  return !!unify([employee1, employee2, employee3],
                 [pattern,   pattern,   pattern  ]);
}

The first unification will bind manager in pattern, and the next two will ensure that it is the same for the rest of employees. As easy as pie.

Summary

We explained typical unification use cases, and saw realistic examples of how to use it to our advantage. heya-unify provides helpers that make using unification a breeze.

Generally working with heya-unify revolves around a super simple recipe with two steps:

  1. Write down patterns we want to find in input data.
    • Patterns will look like data we model.
    • We may use variables instead of actual values.
    • We may use custom unifiers for our objects.
  2. Optionally write down patterns we want to recreate from our matched data.
    • Patterns will look like data we model.
    • We should use variables to fill in actual values.
    • We can recreate JavaScript objects, or strings using included helpers.

This recipe is a natural fit for RAD.

The next post will be about custom unification with numerous examples.


Thank you for your help and suggestions!

Attributions

This post uses image by mcknight424 under Creative Commons License.
