Skip to main content Link Menu Expand (external link) Document Search Copy Copied

Join Partition vs Join Key

Summary

Reports Join stages where the join key does not match the partitioning of the input links.

Description

In the parallel DataStage engine, data is internal split into separate partitions in order to run a number of smaller operations concurrently. This results in faster and more efficient processing, especially when sorting and joining data.

For Hash, Range and Modulus partitioning methods, they use column values to determine the partition which partition each record is placed into. For Same, the actual partitioning method is propagatedfrom upstream, and so we need to traverse up to the preceding stages until a specific method is selected (or until we reach the data source and can go no further).

For a pair of records in the left and right links to join, they must both be placed in the same partition.

If the columns used to determine the partition allocation are not in alignment with the keys used to join the two sources of data, records that are supposed to join may be in different partitions, and therefore not join as expected.

Join KeyJoin PartitionResult
key1key2fail
key1, key2key2fail
key1, key2key1, key2pass
key1, key2key1pass
key1key1pass

Actions

Ensure partitioning methods and columns are in alignment with selected join keys.


Back to top

Copyright © 2017-2024 Data Migrators Pty Ltd.