Hash Functions to Identify Skewed Data

 

On day to day activity we often get situation when the database has a lot of space left but query fails with “2644: No more room in database”.

If any table has severe skewing it can cause the database to run out of space.

Here we will learn how to identify data causing skewness using hash functions.

Following query identifies the database space:

locking table dbc.diskspace for access

 

Example:
locking table dbc.diskspace for access

 

Following query identifies which Vproc has more perm space than the other Vprocs within a database:

locking table dbc.diskspace for access

 

Example:
locking table dbc.diskspace for access

 

From the above query output we can determine which Vproc contains skewed data.

Now we will isolate which table has highest skewing on Vproc 31.

Following query identifies which table contains skewing on a specific vproc within a database:

locking table dbc.tablesize for access

 

NOTE: Above query can yield huge data ,you can import it into Excel for easy review of output.

Example:
locking table dbc.tablesize for access

 

Now we will find out the hashbucket which contains the skewed data for table.

The following query identifies which hashbucket on a specifc vproc contains the skewed data for a specific table:

 

NOTE: Use SHOW TABLE to get the Primary Index name for the skewed table.

Example:

 

The following query identifies the actual data that hashes to a specific hashbucket for a specific table.

 

Example:

 

NOTE: Above output shows skewing is caused by NULLs in the Primary Index value, however it could be any data value.

Following query identifies to which AMP “NULLS” hash:

Saumya Maurya
Author: Saumya Maurya

Leave a Reply

Your email address will not be published. Required fields are marked *