A disjoint set, also known as a union-find or merge-find data structure, is a collection of non-overlapping (disjoint) sets that supports two primary operations efficiently:
- Union: Join two sets together.
- Find: Determine which set a particular element belongs to.
Disjoint sets are commonly used in various applications, such as network connectivity, image processing, clustering, and Kruskal's algorithm for finding the minimum spanning tree in a graph.
Key Concepts
- Set Representation: Each element points to a "parent" element, with the root element representing the set.
- Path Compression: A technique used in the find operation to flatten the structure of the tree whenever
find
is called, making subsequent queries faster. - Union by Rank: A technique used in the union operation to ensure that the smaller tree is always added under the root of the larger tree, thereby keeping the structure balanced.
Basic Operations
-
Find(x):
- Returns the representative (or root) of the set containing
x
. Ifx
is not the root, the function recursively finds the root ofx
and applies path compression to speed up future queries.
- Returns the representative (or root) of the set containing
-
Union(x, y):
- Merges the sets that contain
x
andy
. It finds the roots of each set and attaches one root to the other based on rank (or size).
- Merges the sets that contain
Example Implementation (Python)
Here's a simple implementation of a disjoint set:
class DisjointSet:
def __init__(self, size):
self.parent = list(range(size))
self.rank = [1] * size
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # Path compression
return self.parent[x]
def union(self, x, y):
rootX = self.find(x)
rootY = self.find(y)
if rootX != rootY:
# Union by rank
if self.rank[rootX] > self.rank[rootY]:
self.parent[rootY] = rootX
elif self.rank[rootX] < self.rank[rootY]:
self.parent[rootX] = rootY
else:
self.parent[rootY] = rootX
self.rank[rootX] += 1
# Example usage:
ds = DisjointSet(10)
ds.union(1, 2)
ds.union(2, 3)
print(ds.find(1)) # Output: 3 (or 1 depending on union structure)
print(ds.find(2)) # Output: 3
print(ds.find(3)) # Output: 3
Complexity
- Find: Nearly constant time (O(α(n)), where α is the inverse Ackermann function)
- Union: Nearly constant time
Applications
- Network connectivity
- Kruskal's algorithm for minimum spanning trees
- Image segmentation
- Keeping track of connected components in dynamic graphs
The disjoint set structure is very efficient for handling dynamic connectivity queries and has become a fundamental algorithmic tool in computer science.