datasets.ui

One of the nicest features of SmartOS is the ability to deploy new instances from ZFS datasets. By default, a vanilla SmartOS install uses the official Joyent image server, which contains only Joyent-built datasets. The other option, which by now most of you are probably aware of, is datasets.at, the community image server. This is what the Project FiFo cloud orchestration suite uses as its official dataset image source.

Datasets.at is an incredibly well-written dataset server, generously contributed by Daniel Malon aka “MerlinDMC”, that not only mirrors all the Joyent images but is also the de facto repository for a ton of community-contributed datasets. Besides hosting and sharing datasets, DSAPID, the software datasets.at is built with, also features an intuitive web interface with tools that assist you in building a JSON payload file for any dataset you want to deploy.

Why a local image server makes sense

Let me share a little secret – I have been running private, self-hosted DSAPID servers locally for years – and now so can you!

This means you can automatically mirror all the datasets.at images locally, or selectively choose which images you want to synchronise – as well as host your own custom-built images. You can then use your dataset server as your SmartOS image source, or configure Project FiFo to use it as its image source. DSAPID includes the exact same web UI and JSON payload generator you see when you visit datasets.at.

Just like datasets.at, you may even choose to make your DSAPID server publicly accessible by exposing port 80/443 and sharing your datasets with the world.
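Once a mirror is running, pointing a SmartOS global zone at it takes a single command. A minimal sketch, assuming your mirror answers at datasets.mydomain.com (a hypothetical hostname) and serves the dsapi-style endpoint under /datasets:

```shell
# Hypothetical mirror URL – substitute your own hostname
MIRROR="http://datasets.mydomain.com/datasets"

# On the SmartOS global zone you would then register it as an image source
# (the -t dsapi flag is available on newer imgadm versions):
#   imgadm sources -a "$MIRROR" -t dsapi
#   imgadm avail            # should now list the images your mirror serves
echo "$MIRROR"
```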

Reliable, Fast & Efficient

Would you believe me if I told you that a full DSAPID server with its web interface uses only about 50MB of RAM?

Well it does – for real! It’s written in Go and is astoundingly resource-efficient and reliable – it has managed to save my bacon on more than one occasion. Even better, it’s trivial to get up and running and uses a logical, flat directory structure to store each dataset, without requiring any database. This means all your datasets are accessible and can be backed up or rsynced to other locations with ease.
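Because the store is just files on disk, a backup is nothing more than copying one directory tree. A sketch, where backup-host and its target path are assumptions:

```shell
# Everything dsapid serves lives under one flat tree: /data/files/<uuid>/
DATADIR="/data/files/"

# A complete, incremental backup is therefore a single rsync
# (backup-host:/backup/dsapid/ is a hypothetical destination):
#   rsync -av "$DATADIR" backup-host:/backup/dsapid/
echo "$DATADIR"
```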

Being a member of the Project FiFo team, I frequently need to reinstall FiFo during testing and pre-release stages. I keep all my datasets on a DSAPID server, and when the need arises to blow a FiFo instance away, I simply reinstall, point the new instance at my local DSAPID server as its image source, and re-pull all the images – which only takes a couple of seconds.

What we’re going to cover

In this tutorial we are going to cover how to quickly get a local DSAPID server up and running. Then we will look at how to automatically mirror specific datasets using fine-grained config control that selects by dataset name, such as “base64”, or by name and version, such as “base64 14.2.0”. We will also cover how to automatically mirror all available datasets and how to get your own custom datasets into your DSAPID server.

Let’s get started

Assuming you are already using datasets.at as your dataset source, you would simply import the “dsapid 0.6.7” dataset using “imgadm”, or if you are using Project FiFo, simply import it there. For the rest of this tutorial we will follow the steps as if FiFo is not installed and you are using SmartOS from the global zone. Once we have pulled down the dataset, we create a JSON payload file and create a new zone via “vmadm”.

imgadm import a0e719d6-4e21-11e4-92eb-2bf6399552e7
cd /opt
vi dsapid.json
vmadm create -f dsapid.json

dsapid.json

{
  "brand": "joyent",
  "image_uuid": "a0e719d6-4e21-11e4-92eb-2bf6399552e7",
  "autoboot": true,
  "alias": "dsapid-server",
  "hostname": "datasets.mydomain.com",
  "resolvers": [
    "8.8.8.8",
    "8.8.4.4"
  ],
  "max_physical_memory": 1024,
  "max_swap": 1024,
  "tmpfs": 1024,
  "quota": 120,
  "nics": [
    {
      "nic_tag": "admin",
      "ip": "10.1.1.21",
      "netmask": "255.255.255.0",
      "gateway": "10.1.1.1",
      "primary": true
    }
  ]
}

In the above payload file we make sure to specify enough disk space (a 120GB quota) in case you want to mirror every single dataset available. At the time of this post, mirroring all 312 datasets requires about 80GB of space.
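Those numbers also give a rough per-dataset figure you can use when sizing a partial mirror. A quick back-of-the-envelope check:

```shell
# 312 datasets in ~80GB works out to roughly 260MB per dataset on average
datasets=312
total_mb=$((80 * 1024))
avg_mb=$((total_mb / datasets))
echo "about ${avg_mb}MB per dataset"
```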

Once your zone is up and running, we simply log in and edit the configuration file:

vi /data/config.json

There are a couple of ways of customising this file. Generally, you will use one, or a combination, of the following configurations:
  • Mirror all available datasets automatically – nice when you have tons of bandwidth and disk space available.
  • Selectively sync specific datasets that match a certain name or pattern.
  • Selectively sync datasets based on name and version number.

Mirror All datasets from datasets.at

{
  "hostname": "datasets.coolweb.net",
  "base_url": "http://datasets.coolweb.net/",
  "datadir": "/data/files",
  "mount_ui": "/opt/dsapid/ui",
  "users": "/data/users.json",
  "listen": {
    "http": {
      "address": "0.0.0.0:80",
      "ssl": false
    }
  },
  "sync": [
    {
      "name": "official joyent dsapi",
      "active": false,
      "type": "dsapi",
      "provider": "joyent",
      "source": "https://datasets.joyent.com/datasets",
      "delay": "24h"
    },
    {
      "name": "official joyent imgapi",
      "active": false,
      "type": "imgapi",
      "provider": "joyent",
      "source": "https://images.joyent.com/images",
      "delay": "24h"
    },
    {
        "name": "datasets.at repository",
        "active": true,
        "type": "dsapi",
        "provider": "community",
        "source": "http://datasets.at/datasets",
        "delay": "12h"
    }
  ]
}
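One detail worth noting: since DSAPID is written in Go, the “delay” values above look like Go duration strings, so shorter intervals such as “30m” should also be accepted – this is an assumption based on the format, not something I have verified against the source. For example:

```
    {
      "name": "datasets.at repository",
      "active": true,
      "type": "dsapi",
      "provider": "community",
      "source": "http://datasets.at/datasets",
      "delay": "30m"
    }
```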

Sync based on a dataset’s name and version

{
  "name": "official joyent dsapi",
  "active": true,
  "type": "dsapi",
  "provider": "joyent",
  "source": "http://datasets.joyent.com/datasets?name=base64&version=14.2.0",
  "delay": "24h"
},
{
  "name": "datasets.at repository",
  "active": true,
  "type": "dsapi",
  "provider": "community",
  "source": "http://datasets.at/datasets?name=base64&version=14.3.0",
  "delay": "24h"
},
{
  "name": "datasets.onyxit.net",
  "active": true,
  "type": "dsapi",
  "provider": "community",
  "source": "http://datasets.onyxit.net/datasets?name=zurmo&version=1.1",
  "delay": "24h"
}

As you can see from the examples, you are not limited to a single source; you can mix multiple entries in the “sync” array as needed.
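Because the name and version filters are ordinary query parameters, a sync list like the one above can even be generated from a short list of name:version pairs. A sketch (the pairs shown are just examples):

```shell
# Compose filtered dsapi source URLs from name:version pairs
base="http://datasets.at/datasets"
for spec in "base64:14.3.0" "zurmo:1.1"; do
  name="${spec%%:*}"      # part before the colon
  version="${spec##*:}"   # part after the colon
  echo "${base}?name=${name}&version=${version}"
done
```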

Once you have added your settings and saved the file, just enable the service. You are up and running, and DSAPID will start syncing the targets you have specified.

svcadm enable dsapid

All datasets are stored in /data/files/ in a directory that matches the dataset UUID. Within each directory is the gzipped ZFS dataset and a manifest file. It really could not be any simpler – now could it?

You should start to see new directories (UUIDs) appear in /data/files as your datasets download.

e.g.

[root@datasets ~]# ls /data/files/5becfd74-a70d-11e4-93a6-470507be237c/
centos-6-20150128.zvol.gz  manifest.json

Well done – you now have your own dataset server with its own web interface. You could even share your datasets with anyone on the internet if you wanted. To view the web interface, point a web browser at the zone’s IP or hostname.