Custom PIT (Point in Time) template (1/3)

When building a presentation layer on top of Data Vault, you need a couple of objects:

- computed satellites
- custom PIT
- dimension
- bridges (in many cases)
- facts (based on bridges, links or transactional links).

I covered computed SATs in a previous post, this post will talk about custom PITs.

To define a template, we need to do some prep first. First we need to create a new template.
In this case, with the following settings:

Name: PIT
Description: custom point in time
Signature object: CPIT
Prefix: PIT
Suffix: none
Object type: Table (so will be a persisted object with DDL and ETL template)
Load Type: ALL (so our ETL logic should cover both initial and incremental loads)
Signature schema: BUSINESS VAULT LAYER
DB type: <your desired DB platform>
Base type: Satellite on HUB (so our template will be based on raw vault satellites)

Next, we need to create our target definition. This defines what the object will look like.
In this case, I want to inherit the LOAD_TIMESTAMPs from the respective raw vault SATs and OBJECT_H_KEY from the raw vault HUBs.
We also need to create 4 new attribute targets:
- Snapshot date (to load the PIT snapshot)
- End snapshot date (load the PIT end date)
- PIT HKEY ( to load the PIT hash key)
- SAT_OBJECT_HKEY (to load respective sat HKEYs)

We don't need to define specific signature attributes, we can work with the signature attributes from the raw vault objects.

Next, we need to define for which raw vault objects, this template will generate code. We do this by adding dependent objects for the template.

Finally, we can start writing template code. This template will consist of multiple inline views component groups.

First, we'll build a group to calculate the snapshot dates for which we want to calculate PIT records. There are many options to create a list of snapshots. You could base it on a specific snapshot control table, or you could use a function that creates regular intervals. In this case, we'll base it on the list of load timestamps available in all SATs that will be loaded in the PIT.

Notice that we will loop this over every linked SAT_SRC object in the raw vault, they will all be included in the UNION.

Template POINT_IN_TIME

comp_group_start SNAPSHOTDATES_GROUP INL_V_GRP
componentGroupRepeatedByComponent CPIT

	comp_subgroup_start $ SNAPSHOTDATES
	componentsubgrouprepeatedbycomponent SAT

			consists of UNION set SSDV_TGT
			componentrepeatedbycomponent  CPIT
			With Integrationtype distinsert

				Attribute OBJECT_H_KEY
					expressedBy SAT_SRC$.OBJECT_H_KEY
						expressionRepeatedByColumn SAT_SRC$.OBJECT_H_KEY

				Attribute SNAPSHOT_DATE
					expressedBy SAT_SRC$.LOAD_TIMESTAMP
					expressionRepeatedByColumn SAT_SRC$.LOAD_TIMESTAMP

			consists of source table SAT_SRC$
			componentrepeatedbycomponent SAT$

		comp_subgroup_end

comp_group_end

Second, we will create inline views for all satellites, these inline views will calculate the load end timestamps. We don't have these in our raw vault since we are using insert-only logic. You will see that we created the ENDING_LOAD_TIMESTAMP as an artifact. This means it will not be persisted, but it is used as an intermediate calculated field in the resulting SQL statements. Again, the statement will be repeated for all raw vault SATs.

comp_group_start $ SAT_TGT_GROUP$ INL_V_GRP
componentGroupRepeatedByComponent SAT

	consists of target inline_view SAT_TGT$
	componentrepeatedbycomponent SAT$

			Attribute OBJECT_H_KEY
				expressedBy SAT_SRC$.OBJECT_H_KEY
				expressionRepeatedByColumn SAT_SRC$.OBJECT_H_KEY
			
			Attribute LOAD_TIMESTAMP
				expressedBy SAT_SRC$.LOAD_TIMESTAMP
				expressionRepeatedByColumn SAT_SRC$.LOAD_TIMESTAMP

			Artifact ENDING_LOAD_TIMESTAMP
				expressedBy COALESCE(LEAD(SAT_SRC$.LOAD_TIMESTAMP) OVER (PARTITION BY SAT_SRC$.OBJECT_H_KEY ORDER BY SAT_SRC$.LOAD_TIMESTAMP),GTIMECAST[@#CURRENT_RECORD_LOAD_END_DATE#])
				expressionRepeatedByColumn SAT_SRC$.LOAD_TIMESTAMP					

	consists of source table SAT_SRC$
	componentrepeatedbycomponent SAT$

The last inline view is a query that pulls in the combinations of HKEY and snapshot dates that we already have loaded in the PIT, this allows us to use this template for both initial and incremental logic. In the final insert group, well do a left outer join and filter to insert only NEW records in the PIT.

Continued in second post...

2