All Photos Tagged hadoop

Isabel Drost at FOSDEM answering questions about Hadoop.

(SOOC image: straight out of camera, only cropped)

Each node is configured roughly as follows:

- 2 CPUs

- 4 cores per CPU (so 8 cores total)

- 24 GB memory

- 4 x 2 TB hard drives

- Gigabit Ethernet

Presented by O’Reilly and Cloudera, Strata + Hadoop World focuses on how to put big data, cutting-edge data science, and new business fundamentals to work.

The graph represents a network of up to 1000 Twitter users whose recent tweets contained "hadoop". The network was obtained on Monday, 26 March 2012 at 22:31 UTC. There is an edge for each "follows" relationship, for each "replies-to" relationship in a tweet, and for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is neither a "replies-to" nor a "mentions". The earliest tweet in the network was tweeted on Friday, 23 March 2012 at 18:55 UTC; the latest was tweeted on Monday, 26 March 2012 at 19:46 UTC.

The graph is directed.

The graph's vertices were grouped by cluster using the Clauset-Newman-Moore cluster algorithm.

The graph was laid out using the Harel-Koren Fast Multiscale layout algorithm.

The edge colors are based on relationship values. The vertex sizes are based on followers values.

Overall Graph Metrics:

Vertices: 1000

Unique Edges: 6078

Edges With Duplicates: 1006

Total Edges: 7084

Self-Loops: 886

Connected Components: 237

Single-Vertex Connected Components: 223

Maximum Vertices in a Connected Component: 747

Maximum Edges in a Connected Component: 6752

Maximum Geodesic Distance (Diameter): 9

Average Geodesic Distance: 3.249811

Graph Density: 0.00584284284284284

Modularity: 0.380253


Top 10 Vertices, Ranked by Betweenness Centrality:

@cloudera

@hackingdata

@mikeolson

@al3xandru

@bigdata

@tlipcon

@infochimps

@allcloudnews

@merv

@twitteross


Top Keyword Pairs, Ranked by Frequency of Mention:

V1               V2               Weight
big              data             219
adds             hadoop           120
mapr             adds             100
hadoop           connectors       100
move             highlights       50
amazon           move             49
highlights       hadoop           47
hadoop           hurdles          47
cloud            computing        41
@ulitzer         #cloud           40
#cloud           #cloudexpo       40
#cloudexpo       #cloudcomputing  40
#cloudcomputing  #bigdata         40
#bigdata         @cloudexpo       40
@cloudexpo       @bigdataexpo     40
apache           #hbase           39
data             processing       33
open             source           32
#codemotion      #es              30
definitive       guide            29

More NodeXL network visualizations are available at www.flickr.com/photos/marc_smith/sets/72157622437066929/ and at www.nodexlgraphgallery.org/Pages/Default.aspx

A gallery of NodeXL network data sets is available here: nodexlgraphgallery.org/Pages/Default.aspx?search=twitter

NodeXL is free and open, and is available from www.codeplex.com/nodexl

NodeXL is developed by the Social Media Research Foundation (www.smrfoundation.org), which is dedicated to open tools, open data, and open scholarship.

Donations to support NodeXL are welcome through PayPal: www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_bu...

The book, Analyzing Social Media Networks with NodeXL: Insights from a Connected World, is available from Morgan Kaufmann and from Amazon:

www.amazon.com/gp/product/0123822297?ie=utf8&tag=conn...

Can def get a good shot of the earrings here. So excited I got my ears pierced!

In this Big Data training, candidates will gain a practical, in-depth skill set in Hadoop, along with its core and latest components, such as HDFS, MapReduce, Pig, Hive, Impala, HBase, Jasper, Sqoop, Flume, Oozie, ZooKeeper, Spark and Storm. To know more, please visit: www.analytixlabs.co.in/big-data-analytics-hadoop-training...