Ellie-to-VaultSpeed API integration experience

Intro

A Data Vault (DV) model that doesn’t represent reality is useless.

Your data model needs to represent your business: it needs to contain the entities, attributes and relationships that are familiar to the people who work with you. For example, customers buy products, product purchases are billed to customers, companies store product inventory, etc.

Building a conceptual data model that accurately reflects your business has a lot of value. But don’t try to make it perfect because there’s no such thing as the ideal representation that everyone in the organization agrees on.

Put your conceptual model into practice by incorporating it into the physical Data Vault model. The physical DV model is the actual design blueprint for your relational database, including columns, column lengths, primary keys and foreign keys. It should be based on the actual data in your sources.

This article explains how to easily build a conceptual data model and then incorporate it into the physical DV model, using Ellie and VaultSpeed.

Ellie

I’ve been using ellie.ai to build conceptual data models. Ellie is an intuitive data modeling tool (visual diagramming tool) with enterprise-level data modeling and information architecture features. Moreover, it’s cloud-native, so it’s easy to set up and use, and to collaborate with your team.

Consider the business case of a car-motor-bike products store. This is how they categorized their offer:

To test the tool, I tried to build this model using Ellie. It took me only about 20 minutes to figure out how Ellie works, to add my business concepts to the business glossary and to build this small conceptual model. So is Ellie easy to use? Yes, it is!

Once I had my conceptual model, I wanted to continue by integrating all the source systems into the VaultSpeed Speedshop model. To do that, I harvested all the source metadata, using VaultSpeed. This showed me the model as it exists in the source.

Watch out for the fake Data Vault gap

It might be tempting to base yourself on the pure source metadata to build the DV model, but you would end up with a model that represents only one source and not your actual business. This anti-pattern is often referred to as “fake Data Vault”. Instead, we should now try to map this source structure to the business taxonomy. In VaultSpeed, you can very easily build towards that model using our various Data Vault modeling options such as hub groupings, multi-active satellites, many-to-many links, transactional links, etc...

It would be cool though if I could import the concepts from the conceptual model that I built in Ellie. That would save me some time setting things up in VaultSpeed. This is where API integration comes into play.

API integration

Building integrations between tools is made so much easier when you can use REST APIs. Fortunately, both Ellie and VaultSpeed have well-documented REST APIs in place. In this example, I’ve written a python script that automatically builds hub groups for all the core business concepts in my model. This provides me with a pre-filled canvas for hub groups, saving me the time it would have taken to create them manually in VaultSpeed.

Hooks

To make this work, I needed to add some more metadata objects into my Ellie model.

Luckily, Ellie allows you to create custom objects in the glossary definitions. 3 fields were added as hooks in this integration:

implementation level: this shows whether you want to implement your taxonomy at this level in the data vault
short name: used to name the hub group
abbreviation: used to name the hub group

Python script

Python is the easiest way to translate the Ellie model JSON file into readable input for VaultSpeed’s API endpoints.

The script (you can find it at the bottom of this article) does several things:

authenticate and get the Ellie model JSON
authenticate to VaultSpeed
iterate through the Ellie JSON to find all the elements that are to be implemented as a hub group
automatic hub group creation using the metadata.

See the video below!

https://vimeo.com/687600986

Conclusion

There are 2 main criteria for building a good Data Vault model:

automation to map source models to an integrated Data Vault model
a conceptual data model to make sure that your Data Vault model accurately represents your business.

VaultSpeed’s ‘best of breed’ strategy encourages customers to integrate with other state-of-the-art tools to deliver results. In terms of conceptual data modeling, Ellie certainly fits the description.

Resulting Data Vault model

Python code

import json
import requests

#VARIABLES
#indicate which data vault release (dv_vs_id) you want to apply the script to
#browse data vault releases using https://training-eu.vaultspeed.com/api/data-vault/releases
#script enhancement: automate this into the script using a find datavault release by name/number and add those as variables
dv_vs_id = 8 


#API request to Ellie instance using model ID and API token
#Ellie's API will soon have a call for getting the list of models
token_file = open('ellie_credentials.json','r')
ellie_creds = json.load(token_file)
token = ellie_creds['token']
model = ellie_creds['model']
ellie_url = 'https://vaultspeed.ellie.ai/api/v1/models/'+model+'?token='+token
# print(ellie_url)
ellie_response = requests.get(ellie_url)



#VaultSpeed authentication
creds_file = open('vaultspeed_credentials.json','r')
creds = json.load(creds_file)
# print(creds)
vsp_auth = requests.post("https://training-eu.vaultspeed.com/api/login" , json=creds)
# print(vsp_auth.json())
access_token = vsp_auth.json()['access_token']
auth_headers = {'Authorization' : 'Bearer ' + access_token}
# print(access_token)
data = ellie_response.json()
count=0

# Iterating through the Ellie json
for i in data['model']['entities']:
    #read only elements that have to be implemented as HUB GROUPS in the data vault
    if 'Implementation level' in i['metadata'] and i['metadata']['Implementation level'] == 'DATA VAULT HUB GROUP':
        count +=1
        hub_ms_ids = []
        hub_names =[]

        #digging 2 levels deep for related subentities
        #find the hub_tab_master_slave_ids using VaultSpeed's API
        #data vault (vs) id must be built into a variable
        for j in data['model']['relationships']:
            if  j['sourceEntity']['name'] == i['name'] and j['sourceEntity']['startType'] == 'superType':
                print (j['sourceEntity']['name'] + ' - ' + j['targetEntity']['name'])
                hub = j['targetEntity']['name']
                myurl = 'https://training-eu.vaultspeed.com/api/data-vault/'+str(dv_vs_id)+'/hubs?filter[where][table_short_name_orig]=' + hub.replace(" ", "").upper()
                print (myurl)
                gethubs = requests.get(url=myurl , headers=auth_headers)
                print(gethubs)
                y = int(json.dumps(gethubs.json()[0]['hub_tab_master_slave_id']))
                hub_ms_ids.append(y)
                hub_names.append(hub)

                for r in data['model']['relationships']:
                     if  r['sourceEntity']['name'] == j['targetEntity']['name'] and r['sourceEntity']['startType'] == 'superType':
                         print (r['sourceEntity']['name'] + ' - ' + r['targetEntity']['name'])
                         hub = r['targetEntity']['name']
                         myurl = 'https://training-eu.vaultspeed.com/api/data-vault/'+str(dv_vs_id)+'/hubs?filter[where][table_short_name_orig]=' + hub.replace(" ", "").upper()
                         gethubs = requests.get(url=myurl , headers=auth_headers)
                         y = int(json.dumps(gethubs.json()[0]['hub_tab_master_slave_id']))
                         hub_ms_ids.append(y)
                         hub_names.append(hub)

        #print ("Ellie Done")
        #print ("Hub MS IDS")
        print (hub_ms_ids)
        y = {
        "abbreviated_name" : i['metadata']['Abbreviation'] ,
         "short_name"  : i['metadata']['Short name'],
         "hub_tab_master_slave_ids" : hub_ms_ids,
        "hub_group_id": None
         }
        #create hub group in vaultspeed
        create_hub_groups = requests.post('https://training-eu.vaultspeed.com/api/data-vault/'+str(dv_vs_id)+'/hub-group' , headers=auth_headers, json=y)
        print(create_hub_groups.json())
        # print('Created a Hub Group: ' + i['name'] + ' with entities: ' + str(hub_names))

#close creds file
creds_file.close()
token_file.close()