Join Partition vs Join Key
Summary
Reports Join stages where the join key does not match the partitioning of the input links.
Description
In the parallel DataStage engine, data is internal split into separate partitions in order to run a number of smaller operations concurrently. This results in faster and more efficient processing, especially when sorting and joining data.
For Hash, Range and Modulus partitioning methods, they use column values to determine the partition which partition each record is placed into. For Same, the actual partitioning method is propagatedfrom upstream, and so we need to traverse up to the preceding stages until a specific method is selected (or until we reach the data source and can go no further).
For a pair of records in the left and right links to join, they must both be placed in the same partition.
If the columns used to determine the partition allocation are not in alignment with the keys used to join the two sources of data, records that are supposed to join may be in different partitions, and therefore not join as expected.
Join Key | Join Partition | Result |
---|---|---|
key1 | key2 | fail |
key1, key2 | key2 | fail |
key1, key2 | key1, key2 | pass |
key1, key2 | key1 | pass |
key1 | key1 | pass |
Actions
Ensure partitioning methods and columns are in alignment with selected join keys.