Graph Algorithm / Disjoint Set
I am trying to solve this problem but unable to do it fast.
In short - we have a graph (directed) and we want to find out from which node (a set of nodes to select from is given) we can visit the most nodes. A straightforward implementation will be to run DFS/BFS from every node and see how many we can visit. But that it too slow as there are over 5000 nodes in the graph. Running 5000 BFS/DFS will be take very long time.
On the other hand I also get a feeling that this problem may have something to do with Disjoint Set data structure? But I am unable to formulate it that way as in my disjoint set implementation some of the mentioned rules.
Can someone give a hint as to how to approach this problem?
- Find Strongly Connected Components (SCC) using Tarjan's algorithm (O(V+E)), and create the SCC graph.
- Topologically sort the resulting SCC graph (it is a DAG).
- From last to first, find the number of nodes reachable from each component.
- Choose a node which is in a SCC that can reach maximal number of nodes.
Step 3 - elaboration:
(For clarification reasons I will denote a vertex in the original graph as 'node', and a vertex in the SCC graph as 'vertex').
In step 3 you want to find the number of nodes that are reachable from each vertex of your SCC. This can be done by explicitly finding this set, or by finding only the number of nodes:
- Explicitly finding the set of nodes reachable from each vertex: This is pretty much straight forward, each vertex has an associated set of nodes, and you need to find the set associated to each vertex by doing a union on all edges on your SCC graph leading from the current vertex.
- Using inclusion/exclusion to find the number of nodes reachable: Inclusion/exclusion is a technique used to count size of union of sets where the sets might have repeats in them. For example, if you have 2 sets, the size of their union is |A|+|B|- |A[intersection]B|. For 3 sets A,B,C: |A|+|B|+|C|-|A[intersrction]B| - |A[intersection]C| - |B[intersection]C + |A[intersection]B[intersection]C| (and so on) Using inclusion/exclusion - the sets are the previous nodes, and the intersections are based on 2 different vertices that will later link themselves to the same vertex.