Enhancing Data Vault Performance with Vaultspeed DDL Settings

Dear Vaultspeed Community,

Today, we are excited to share some valuable insights on optimizing and boosting the speed of your Data Vault within the Greenplum Database using Vaultspeed's powerful DDL settings.

Obviously every Data Warehouse is different. And depending on the amount of tables, columns and data, you might wish to alter these recommendation to fit your need.

DDL Settings Overview (applicable to Greenplum):

  1. Storage Options:

    • Choose between Row-oriented and Column-oriented storage for efficient data handling.

    • Opt for Heap Storage or Append-Optimized Storage based on your specific requirements.

  2. Compression:

    • Utilize compression to reduce storage space and enhance overall performance.

  3. Distribution Key:

    • Configure the Distribution Key using specific attribute types tailored to your use case.

  4. Partitioning:

    • Customize partitioning using specific attribute types to further streamline data management.

Distribution Strategies:

Vault Layer						Distribution

	EXT DISTRIBUTION				DISTRIBUTED RANDOMLY
	STG DISTRIBUTION				DISTRIBUTED RANDOMLY
	FL HUB DISTRIBUTION				DISTRIBUTED BY (OBJECT_H_KEY)
	FL SAT DISTRIBUTION				DISTRIBUTED BY (OBJECT_H_KEY)
	FL SAT_TEMP DISTRIBUTION		DISTRIBUTED RANDOMLY
	FL LNK DISTRIBUTION				DISTRIBUTED BY (OBJECT_P_H_KEY)
	FL LKS DISTRIBUTION				DISTRIBUTED BY (OBJECT_H_KEY)
	FL LKS_TEMP DISTRIBUTION		DISTRIBUTED RANDOMLY
	FL LND DISTRIBUTION				DISTRIBUTED BY (OBJECT_H_KEY)
	FL LDS DISTRIBUTION				DISTRIBUTED BY (OBJECT_H_KEY)
	FL LDS_TEMP DISTRIBUTION		DISTRIBUTED RANDOMLY
	FL REF DISTRIBUTION				DISTRIBUTED REPLICATED


Partitioning with Load End Date:

Vault Layer						Partitioning
	
	EXT DISTRIBUTION				/
	STG DISTRIBUTION				/
	FL HUB DISTRIBUTION				PARTITION BY RANGE(LOAD_END_TIMESTAMP)
	FL SAT DISTRIBUTION				PARTITION BY RANGE(LOAD_END_TIMESTAMP)
	FL SAT_TEMP DISTRIBUTION		/
	FL LNK DISTRIBUTION				PARTITION BY RANGE(LOAD_END_TIMESTAMP)
	FL LKS DISTRIBUTION				PARTITION BY RANGE(LOAD_END_TIMESTAMP)
	FL LKS_TEMP DISTRIBUTION		/
	FL LND DISTRIBUTION				PARTITION BY RANGE(LOAD_END_TIMESTAMP)
	FL LDS DISTRIBUTION				PARTITION BY RANGE(LOAD_END_TIMESTAMP)
	FL LDS_TEMP DISTRIBUTION		/	
	FL REF DISTRIBUTION				/

Partitioning with Insert-Only Logic:

Vault Layer						Partitioning
	EXT DISTRIBUTION	
	STG DISTRIBUTION	
	FL HUB DISTRIBUTION				PARTITION BY RANGE(LOAD_TIMESTAMP)
	FL SAT DISTRIBUTION				PARTITION BY RANGE(LOAD_TIMESTAMP)
	FL SAT_TEMP DISTRIBUTION
	FL LNK DISTRIBUTION				PARTITION BY RANGE(LOAD_TIMESTAMP)
	FL LKS DISTRIBUTION				PARTITION BY RANGE(LOAD_TIMESTAMP)
	FL LKS_TEMP DISTRIBUTION
	FL LND DISTRIBUTION				PARTITION BY RANGE(LOAD_TIMESTAMP)
	FL LDS DISTRIBUTION				PARTITION BY RANGE(LOAD_TIMESTAMP)
	FL LDS_TEMP DISTRIBUTION	
	FL REF DISTRIBUTION	

Additional Recommendations:

  • Create Indexes:

    • Implement indexes on large tables to enhance query performance.

  • Vacuum & Analyze:

    • Regularly perform Vacuum and Analyze operations to optimize storage and maintain peak database performance.

By leveraging these DDL settings and distribution strategies, you can significantly improve the efficiency of your Data Vault on the Greenplum Database. Feel free to reach out if you have any questions or need further assistance.

Wishing you unparalleled speed and efficiency in your Data Vault endeavors!

Best regards,

DataSense Team

1
1 reply