Dear Vaultspeed Community,
Today, we are excited to share some valuable insights on optimizing and boosting the speed of your Data Vault within the Greenplum Database using Vaultspeed's powerful DDL settings.
Obviously every Data Warehouse is different. And depending on the amount of tables, columns and data, you might wish to alter these recommendation to fit your need.
DDL Settings Overview (applicable to Greenplum):
Storage Options:
Choose between Row-oriented and Column-oriented storage for efficient data handling.
Opt for Heap Storage or Append-Optimized Storage based on your specific requirements.
Compression:
Utilize compression to reduce storage space and enhance overall performance.
Distribution Key:
Configure the Distribution Key using specific attribute types tailored to your use case.
Partitioning:
Customize partitioning using specific attribute types to further streamline data management.
Distribution Strategies:
Vault Layer Distribution
EXT DISTRIBUTION DISTRIBUTED RANDOMLY
STG DISTRIBUTION DISTRIBUTED RANDOMLY
FL HUB DISTRIBUTION DISTRIBUTED BY (OBJECT_H_KEY)
FL SAT DISTRIBUTION DISTRIBUTED BY (OBJECT_H_KEY)
FL SAT_TEMP DISTRIBUTION DISTRIBUTED RANDOMLY
FL LNK DISTRIBUTION DISTRIBUTED BY (OBJECT_P_H_KEY)
FL LKS DISTRIBUTION DISTRIBUTED BY (OBJECT_H_KEY)
FL LKS_TEMP DISTRIBUTION DISTRIBUTED RANDOMLY
FL LND DISTRIBUTION DISTRIBUTED BY (OBJECT_H_KEY)
FL LDS DISTRIBUTION DISTRIBUTED BY (OBJECT_H_KEY)
FL LDS_TEMP DISTRIBUTION DISTRIBUTED RANDOMLY
FL REF DISTRIBUTION DISTRIBUTED REPLICATED
Partitioning with Load End Date:
Vault Layer Partitioning
EXT DISTRIBUTION /
STG DISTRIBUTION /
FL HUB DISTRIBUTION PARTITION BY RANGE(LOAD_END_TIMESTAMP)
FL SAT DISTRIBUTION PARTITION BY RANGE(LOAD_END_TIMESTAMP)
FL SAT_TEMP DISTRIBUTION /
FL LNK DISTRIBUTION PARTITION BY RANGE(LOAD_END_TIMESTAMP)
FL LKS DISTRIBUTION PARTITION BY RANGE(LOAD_END_TIMESTAMP)
FL LKS_TEMP DISTRIBUTION /
FL LND DISTRIBUTION PARTITION BY RANGE(LOAD_END_TIMESTAMP)
FL LDS DISTRIBUTION PARTITION BY RANGE(LOAD_END_TIMESTAMP)
FL LDS_TEMP DISTRIBUTION /
FL REF DISTRIBUTION /
Partitioning with Insert-Only Logic:
Vault Layer Partitioning
EXT DISTRIBUTION
STG DISTRIBUTION
FL HUB DISTRIBUTION PARTITION BY RANGE(LOAD_TIMESTAMP)
FL SAT DISTRIBUTION PARTITION BY RANGE(LOAD_TIMESTAMP)
FL SAT_TEMP DISTRIBUTION
FL LNK DISTRIBUTION PARTITION BY RANGE(LOAD_TIMESTAMP)
FL LKS DISTRIBUTION PARTITION BY RANGE(LOAD_TIMESTAMP)
FL LKS_TEMP DISTRIBUTION
FL LND DISTRIBUTION PARTITION BY RANGE(LOAD_TIMESTAMP)
FL LDS DISTRIBUTION PARTITION BY RANGE(LOAD_TIMESTAMP)
FL LDS_TEMP DISTRIBUTION
FL REF DISTRIBUTION
Additional Recommendations:
Create Indexes:
Implement indexes on large tables to enhance query performance.
Vacuum & Analyze:
Regularly perform Vacuum and Analyze operations to optimize storage and maintain peak database performance.
By leveraging these DDL settings and distribution strategies, you can significantly improve the efficiency of your Data Vault on the Greenplum Database. Feel free to reach out if you have any questions or need further assistance.
Wishing you unparalleled speed and efficiency in your Data Vault endeavors!
Best regards,
DataSense Team