Quantifying paedophile users on a P2P system

P2P systems are known to host a large amount of paedophile activity. Quantifying the number of paedophile users on a P2P system is therefore crucial, for several reasons: easy access to such content is a major societal concern, policy-making and law-enforcement budgeting rely on this figure, and the spread of online paedophilia may influence real-world behaviors [1].

 

However, this issue is very challenging to address. One must obtain and process large-scale data and cope with the high dynamicity of users in the system: they leave shortly after they arrive. Moreover, identifying users is itself difficult: several users may share the same computer or IP address, and one user may use several computers.

 

We focused our study on the eDonkey system. We performed a ten-week measurement on a server to build a dataset [2] of 127 million queries submitted by users to the search engine. We then carefully designed a paedophile query detection tool, which we evaluated [3]. We consider a user to be paedophile as soon as a paedophile query originates from their identifier. We assess here two methods of identifying users: the first is based on the IP address only, whereas the second uses both the IP address and the port number.

 

This plot shows the ratios of paedophile queries, of paedophile IP addresses and of paedophile (IP, port) pairs discovered since the beginning of the measurement. The ratio of paedophile IP addresses (red plot) clearly grows with the measurement duration. This reveals a pollution phenomenon: since an IP address may host different users over the measurement, and since a single paedophile user is sufficient to consider an IP address as paedophile, the probability that any given IP address is considered paedophile grows with measurement time; all IP addresses may eventually be considered paedophile. This confirms that using IP addresses only is misleading in this case. Conversely, using both IP address and port number (green plot) gives a very different plot: it rapidly reaches a steady regime, very similar to the fraction of paedophile queries (blue plot). This shows that, with this identification, pollution due to dynamic allocation of addresses and ports, and to the change of users on the same computers, has a negligible impact in a measurement of such scale and duration.
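The three curves can be computed in a single pass over the query log. The following is only a minimal sketch, not the code used for the measurement: it assumes each record is a hypothetical tuple (timestamp, IP address, port, flagged), where flagged is the output of the detection tool, which is treated here as given.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout (an assumption, not the dataset's real format):
# (timestamp, ip, port, flagged), with `flagged` set by the detection tool.
Query = Tuple[float, str, int, bool]


def cumulative_ratios(
    queries: Iterable[Query],
) -> List[Tuple[float, float, float, float]]:
    """Return (timestamp, query ratio, IP ratio, (IP, port) ratio) after each query."""
    seen_ips, flagged_ips = set(), set()
    seen_pairs, flagged_pairs = set(), set()
    total = flagged_total = 0
    points = []
    # Process queries in chronological order.
    for ts, ip, port, flagged in sorted(queries, key=lambda q: q[0]):
        total += 1
        seen_ips.add(ip)
        seen_pairs.add((ip, port))
        if flagged:
            flagged_total += 1
            # One flagged query is enough to mark the identifier for good.
            flagged_ips.add(ip)
            flagged_pairs.add((ip, port))
        points.append((
            ts,
            flagged_total / total,
            len(flagged_ips) / len(seen_ips),
            len(flagged_pairs) / len(seen_pairs),
        ))
    return points
```

Because an identifier is never unmarked once flagged, an IP address shared by many successive users accumulates flags over time, whereas each (IP, port) pair is marked separately; this is the mechanism behind the pollution effect described above.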

 

We may then conclude that distinguishing users by their IP address and port is sufficient in our measurement, whereas the IP address alone is clearly not.

 

[1] C. Kim. From Fantasy to Reality: The Link Between Viewing Child Pornography and Molesting Children. Prosecutor 39(2): 17-18, 20, 47, 2005.

[2] F. Aidouni, M. Latapy, and C. Magnien. Ten weeks in the life of an eDonkey server. Proceedings of HotP2P'09, 2009.

[3] M. Latapy, C. Magnien, and R. Fournier. Automatic Detection of Paedophile Queries. Technical report, Measurement and Analysis of P2P Activity Against Paedophile Content project.

 

The Complex Networks Team

Uploaded on November 22, 2010
Taken on November 22, 2010