Hash Functions to Identify Skewed Data

Saumya Maurya September 5, 2018 TERADATA

On day to day activity we often get situation when the database has a lot of space left but query fails with “2644: No more room in database”.

If any table has severe skewing it can cause the database to run out of space.

Here we will learn how to identify data causing skewness using hash functions.

Following query identifies the database space:

locking table dbc.diskspace for access

select databasename as "Database_Name",
sum(maxperm)/(1024*1024*1024) as "Max_Perm (GB)",
sum(currentperm)/(1024*1024*1024) as "Current_Perm (GB)",
(sum(maxperm) - sum(currentperm))/(1024*1024*1024) as "Free_Actual (GB)",
(((sum(maxperm) - sum(currentperm)) * (100)) /
CASE
WHEN sum(maxperm) = 0 THEN 1
ELSE sum(maxperm)
END
) AS "Free_Space %"
from dbc.diskspace
where databasename = 'databasename'
group by databasename
order by 1;

select databasename as "Database_Name",

sum(maxperm)/(1024*1024*1024) as "Max_Perm (GB)",

sum(currentperm)/(1024*1024*1024) as "Current_Perm (GB)",

(sum(maxperm) - sum(currentperm))/(1024*1024*1024) as "Free_Actual (GB)",

(((sum(maxperm) - sum(currentperm)) * (100)) /

CASE

WHEN sum(maxperm) = 0 THEN 1

ELSE sum(maxperm)

END

) AS "Free_Space %"

from dbc.diskspace

where databasename = 'databasename'

group by databasename

order by 1;

Example:
locking table dbc.diskspace for access

select databasename as "Database_Name",
sum(maxperm)/(1024*1024*1024) as "Max_Perm (GB)",
sum(currentperm)/(1024*1024*1024) as "Current_Perm (GB)",
(sum(maxperm) - sum(currentperm))/(1024*1024*1024) as "Free_Actual (GB)",
(((sum(maxperm) - sum(currentperm)) * (100)) /
CASE
WHEN sum(maxperm) = 0 THEN 1
ELSE sum(maxperm)
END
) AS "Free_Space %"
from dbc.diskspace
where databasename = 'testdatabase'
group by databasename
order by 1;

Database_Name Max_Perm (GB) Current_Perm (GB) Free_Actual (GB) Free_Space %
------------  ------------  ------------      ------------     ------------
testdatabase  10,837.80     124.23            10,713.58        98.85

select databasename as "Database_Name",

sum(maxperm)/(1024*1024*1024) as "Max_Perm (GB)",

sum(currentperm)/(1024*1024*1024) as "Current_Perm (GB)",

(sum(maxperm) - sum(currentperm))/(1024*1024*1024) as "Free_Actual (GB)",

(((sum(maxperm) - sum(currentperm)) * (100)) /

CASE

WHEN sum(maxperm) = 0 THEN 1

ELSE sum(maxperm)

END

) AS "Free_Space %"

from dbc.diskspace

where databasename = 'testdatabase'

group by databasename

order by 1;

Database_Name Max_Perm (GB) Current_Perm (GB) Free_Actual (GB) Free_Space %

------------ ------------ ------------ ------------ ------------

testdatabase 10,837.80 124.23 10,713.58 98.85

Following query identifies which Vproc has more perm space than the other Vprocs within a database:

locking table dbc.diskspace for access

select vproc,
currentperm as "Current_Perm"
from dbc.diskspace
where databasename='databasename'
order by 2 desc 1;

select vproc,

currentperm as "Current_Perm"

from dbc.diskspace

where databasename='databasename'

order by 2 desc 1;

Example:
locking table dbc.diskspace for access

select vproc,
currentperm as "Current_Perm"
from dbc.diskspace
where databasename='testdatabase'
order by 2 desc 1;

Vproc           Current_Perm
------          -------------------
31               3,218,931,212.00
2                234,423,654.00
45               234,128,231.00
5                234,213,213.00
7                234,754,232.00
<<..>>

select vproc,

currentperm as "Current_Perm"

from dbc.diskspace

where databasename='testdatabase'

order by 2 desc 1;

Vproc Current_Perm

------ -------------------

31 3,218,931,212.00

2 234,423,654.00

45 234,128,231.00

5 234,213,213.00

7 234,754,232.00

<<..>>

From the above query output we can determine which Vproc contains skewed data.

Now we will isolate which table has highest skewing on Vproc 31.

Following query identifies which table contains skewing on a specific vproc within a database:

locking table dbc.tablesize for access

select vproc,
currentperm as "Current_Perm",
tablename as "Table_Name"
from dbc.tablesize
where databasename='databasename'
and Vproc in (baseline vproc,skewed vproc)
order by 2 desc,3,1;

select vproc,

currentperm as "Current_Perm",

tablename as "Table_Name"

from dbc.tablesize

where databasename='databasename'

and Vproc in (baseline vproc,skewed vproc)

order by 2 desc,3,1;

NOTE: Above query can yield huge data ,you can import it into Excel for easy review of output.

Example:
locking table dbc.tablesize for access

select vproc,
currentperm as "Current_Perm",
tablename as "Table_Name"
from dbc.tablesize
where databasename='testdatabase'
and Vproc in (1,31)
order by 2 desc,3,1

Vproc     Current_Perm    Table_Name
------   ---------------  ------------
1         39,792.00        testtable2
31        23,212.00        testtable2
1         2,793.00         testtable1
31        2,911.00         testtable1
1         793,791.00       testtable
31        1,654,210,212.00 testtable
1         3,702.00         testtable3
31        1,791.00         testtable3
<<..>>

select vproc,

currentperm as "Current_Perm",

tablename as "Table_Name"

from dbc.tablesize

where databasename='testdatabase'

and Vproc in (1,31)

order by 2 desc,3,1

Vproc Current_Perm Table_Name

------ --------------- ------------

1 39,792.00 testtable2

31 23,212.00 testtable2

1 2,793.00 testtable1

31 2,911.00 testtable1

1 793,791.00 testtable

31 1,654,210,212.00 testtable

1 3,702.00 testtable3

31 1,791.00 testtable3

<<..>>

Now we will find out the hashbucket which contains the skewed data for table.

The following query identifies which hashbucket on a specifc vproc contains the skewed data for a specific table:

select distinct(hashrow(primaryindexname))
,count(hashrow(primaryindexname))
from databasename.tablename
where HASHAMP(HASHBUCKET(HASHROW(fld1)))=vproc number
group by 1
order by 2 desc;

select distinct(hashrow(primaryindexname))

,count(hashrow(primaryindexname))

from databasename.tablename

where HASHAMP(HASHBUCKET(HASHROW(fld1)))=vproc number

group by 1

order by 2 desc;

NOTE: Use SHOW TABLE to get the Primary Index name for the skewed table.

Example:

select distinct(hashrow(testpi))
,count(hashrow(testpi))
from testdatabase.testtable
where HASHAMP(HASHBUCKET(HASHROW(fld1)))=31
group by 1
order by 2 desc;

HASHROW(testpi)           Count(HASHROW(testpi))
--------------------      ------------------------
23D54C43                  43124378
E3213D6C                  3482

select distinct(hashrow(testpi))

,count(hashrow(testpi))

from testdatabase.testtable

where HASHAMP(HASHBUCKET(HASHROW(fld1)))=31

group by 1

order by 2 desc;

HASHROW(testpi) Count(HASHROW(testpi))

-------------------- ------------------------

23D54C43 43124378

E3213D6C 3482

The following query identifies the actual data that hashes to a specific hashbucket for a specific table.

select distinct(indexname)
from databasename.tablename
where hashrow(indexname)='hashbucket id'xb;

select distinct(indexname)

from databasename.tablename

where hashrow(indexname)='hashbucket id'xb;

Example:

select distinct(testpi)
from testdatabase.testtable
where hashrow(testpi)='23D54C43'xb;

testpi
-------------------
0

select distinct(testpi)

from testdatabase.testtable

where hashrow(testpi)='23D54C43'xb;

testpi

-------------------

NOTE: Above output shows skewing is caused by NULLs in the Primary Index value, however it could be any data value.

Following query identifies to which AMP “NULLS” hash:

select HASHAMP(HASHBUCKET(HASHROW(0)));

1	select HASHAMP(HASHBUCKET(HASHROW(0)));

Database Tutorials MSSQL, Oracle, PostgreSQL, MySQL, MariaDB, DB2, Sybase, Teradata, Big Data, NOSQL, MongoDB, Couchbase, Cassandra, Windows, Linux

Hash Functions to Identify Skewed Data

About Saumya Maurya

Leave a Reply Cancel reply