Abstract. Given a snapshot of a large graph, in which an infection has been spreading for some time, can we identify those nodes from which the infection started to spread? In other words, can we reliably tell who the culprits are? In this paper we answer this question affirmatively, and give an efficient method called NetSleuth for the wellknown SusceptibleInfected virus propagation model.
Essentially, we are after that set of seed nodes that best explain the given snapshot. We propose to employ the Minimum Description Length principle to identify the best set of seed nodes and virus propagation ripple, as the one by which we can most succinctly describe the infected graph.
We give an efficient algorithm to identify likely sets of seed nodes given a snapshot. Next, we show how given a set of seed nodes we can optimize the virus propagation ripple in a principled way by maximizing likelihood. All combined, NetSleuth can automatically identify the correct number of seed nodes, as well as which nodes are the culprits.
Experiments show we obtain high accuracy in the detection of seed nodes, in addition to the correct automatic identification of their number. Moreover, we show NetSleuth scales linearly in the number of nodes of the graph.
Implementation
Related Publications
Reconstructing an Epidemic over Time. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp 18351844, ACM, 2016. (18.1% acceptance rate) 

Hidden Hazards: Finding Missing Nodes in Large Graph Epidemics. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp 415423, SIAM, 2015. 

Efficiently Spotting the Starting Points of an Epidemic in a Large Graph. Knowledge and Information Systems vol.38(1), pp 3559, Springer, 2014. (IF 2.225) 

Spotting Culprits in Epidemics: How many and Which ones?. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), pp 1120, IEEE, 2012. (full paper, 10.7% acceptance rate; overall 20%) 