Consider a very large undirected graph representing an email network. The nodes (or vertices) represent email addresses, and an edge between two addresses means that at least one email was sent between them, in either direction.

Question

Which technique would you use to represent this graph in a computer program so that it can be manipulated?

MathMate · Answer

Hints:
Properties of the given dataset:

1. Could have a large number of nodes.
2. Generally very large and sparse.
3. Requires a relatively efficient way to look up a given node and its neighbours.
4. In general, contains cycles.

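Since cycles are present, we can safely eliminate trees. Plain lists are not useful since the number of branches per node is not constant. Consider an adjacency matrix (each row/column is a node, and the entry at an intersection records whether an edge exists), but a matrix needs space proportional to the square of the number of nodes, which is wasteful for a sparse graph.
Hash tables should provide efficient storage as well as rapid searches: map each address to the set of addresses it is connected to.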
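
The hash table idea above could be sketched roughly as follows in Python. This is only a minimal illustration, not a definitive implementation: the class name EmailGraph, its method names, and the example.com addresses are all made up for the example, and ignoring self-addressed mail is an assumption the question does not cover.

```python
from collections import defaultdict


class EmailGraph:
    """Undirected graph stored as a hash table of neighbour sets (hypothetical sketch)."""

    def __init__(self):
        # address -> set of addresses it has exchanged mail with
        self._adj = defaultdict(set)

    def add_email(self, sender, recipient):
        # Direction is ignored: one email either way creates the edge.
        if sender == recipient:
            return  # assumption: self-addressed mail adds no edge
        self._adj[sender].add(recipient)
        self._adj[recipient].add(sender)

    def neighbours(self, address):
        # Average O(1) lookup; unknown addresses yield an empty set.
        return self._adj.get(address, set())

    def connected(self, a, b):
        # True if at least one email was exchanged between a and b.
        return b in self._adj.get(a, ())


g = EmailGraph()
g.add_email("alice@example.com", "bob@example.com")
g.add_email("bob@example.com", "alice@example.com")  # same edge again, no effect
g.add_email("bob@example.com", "carol@example.com")
print(sorted(g.neighbours("bob@example.com")))
# → ['alice@example.com', 'carol@example.com']
```

Memory use grows with the number of addresses plus the number of edges rather than with the square of the number of addresses, which is the point of avoiding the matrix for a sparse graph.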

Khadija · Answer

I would use two flat tables:

Address (AddressID*, Email)
Connection (Address1*, Address2*)

Address1 is always the numerically lower ID of the pair, so each undirected edge is stored only once.
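
One possible way to try out the two-table layout above is with Python's built-in sqlite3 module and an in-memory database. The table and column names come from the answer, but everything else is an assumption made for illustration: the column types, the CHECK constraint, and the helper functions address_id, add_email and neighbours are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (
        AddressID INTEGER PRIMARY KEY,
        Email     TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Connection (
        Address1 INTEGER NOT NULL REFERENCES Address(AddressID),
        Address2 INTEGER NOT NULL REFERENCES Address(AddressID),
        PRIMARY KEY (Address1, Address2),
        CHECK (Address1 < Address2)  -- lower ID always goes first
    );
""")


def address_id(email):
    # Insert the address if it is new, then return its ID.
    conn.execute("INSERT OR IGNORE INTO Address (Email) VALUES (?)", (email,))
    row = conn.execute(
        "SELECT AddressID FROM Address WHERE Email = ?", (email,)
    ).fetchone()
    return row[0]


def add_email(sender, recipient):
    a, b = address_id(sender), address_id(recipient)
    if a == b:
        return  # assumption: self-addressed mail adds no edge
    # Storing (lower, higher) means A->B and B->A map to the same row.
    conn.execute(
        "INSERT OR IGNORE INTO Connection VALUES (?, ?)", (min(a, b), max(a, b))
    )


def neighbours(email):
    # An address may appear in either column, so both sides are queried.
    rows = conn.execute("""
        SELECT a2.Email FROM Address a1
        JOIN Connection c ON c.Address1 = a1.AddressID
        JOIN Address a2 ON a2.AddressID = c.Address2
        WHERE a1.Email = ?
        UNION
        SELECT a2.Email FROM Address a1
        JOIN Connection c ON c.Address2 = a1.AddressID
        JOIN Address a2 ON a2.AddressID = c.Address1
        WHERE a1.Email = ?
    """, (email, email)).fetchall()
    return sorted(r[0] for r in rows)


add_email("alice@example.com", "bob@example.com")
add_email("bob@example.com", "alice@example.com")  # same edge, ignored
add_email("carol@example.com", "bob@example.com")
print(neighbours("bob@example.com"))
```

Because an address can sit in either column of Connection, a real deployment would probably also want an index on Address2 so that the second half of the neighbour query does not scan the whole table.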