That illustrate the connections among people who tweet the term “#ecomm2010”, scaled by the number of followers.
Abstract: Social network analysis (SNA) is a powerful method for gaining insight into the massive collections of connections created when many people connect to one another through mobile devices. SNA has been widely applied to desktop social media and is moving into the mobile world. Prominent studies of the “call graph” have been produced at national scales.
Mobile providers are applying SNA to identify key subscribers who can reduce churn and help gain adoption of new services and products. Network analysis has historically had a steep learning curve, but now new tools are making SNA easier for less technical users. This talk will describe social network concepts and their application to mobile data sets. A free and open add-in for the popular Excel 2007 spreadsheet called NodeXL (http://www.codeplex.com/nodexl) can perform many complex SNA tasks like data import, scrubbing, metrics calculation, clustering, and visualization. Applying this tool to call graph and subscriber data sets can reveal key positions in the network that can attract and hold other subscribers in the system.
Examples of network analysis of social media and mobile data sets can be found on the Connected Action blog (http://www.connectedaction.net).
NodeXL has a number of data importers that can create a network of connections from social media data sources like Twitter, YouTube, flickr, email, and the WWW (along with a number of other data import formats like GraphML, UCINet, CSV, and other Excel workbooks with data).
To create a network you just select the search terms and configurations you want from the NodeXL>Data>Import menu.
If you want to create the same network every day (or at any schedule), a recent feature (since version .125) of NodeXL can help. NodeXLNetworkServer.exe is an application that ships with NodeXL along with a sample configuration file called SampleNetworkConfiguration.xml. By editing the configuration file you can set NodeXL to collect anything available in the menu through Excel. So far we have exposed the two Twitter data collectors (more on the way) so the configuration file asks you to select a search term or a user’s name, the size of the network and the details you want reported along with the location and name of the destination file that NodeXL will create. Answer these questions by editing the config file and save it with a useful name that includes the search term.
A few days before the conference started, the #CAT10 Twitter social network map looked like this:
26 July 2010 NodeXL Twitter map of the connections among people who tweet “#CAT10”, the hashtag for this year’s Catalyst conference.
This is the list of the most “between” contributors in the #CAT10 Twitter graph on July 26, 2010.
A few days later, as people began to arrive at the conference, the graph became far more dense and populous.
The network of #CAT10 mentioning users in Twitter has become much more dense, with more people and more connections among them as people reply, retweet, follow, and mention one another.
While the core people in this list are similar to the list generated a few days earlier, several people have shifted position.
Filtering the graph, we can remove all but the most between people to reveal the core members of the community.
These people are likely to play an influential role in the #CAT10 community.
This is a “pinwheel” diagram using the author’s Facebook personal network (captured July 15, 2009).
Nodes represent the author’s friends and links represent friendships among them. The author is not shown. Each ‘wing’ radiating outwards is a partition found using a greedy community detection algorithm (Wakita and Tsurumi, 2007). Wings are manually labelled. Node ordering within each wing is based on degree. Node color and size are also based on degree. Node position is based on a polar coordinate system: with n nodes in the graph, each node is given an equal angle of 360º/n, and its radius is a log-scaled measure of betweenness. Higher values are closer to the center, indicating a sort of cross-partition ‘gravity’.
This layout has several notable features:
- The angle of each wing is proportional to its share of the network. Thus a wing holding 25 percent of the nodes spans 0 to 90º.
- Partitions are distinguished by their position rather than a node’s color or shape.
- The tail indicates the periphery of each partition. A wing with many tail nodes indicates many people who are only tied to other group members.
- Edges crossing the center show between-partition connections. Since nodes are sorted by degree it is easy to see if edges originate from the most highly connected nodes or the entire partition.
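The placement rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used to draw the original diagram: the function name `pinwheel_layout`, the choice to sort each wing from highest to lowest degree, and the way the radius is normalised to the range 0 to 1 are all assumptions.

```python
import math

def pinwheel_layout(partitions, degree, betweenness):
    """Return {node: (angle_in_degrees, radius)} for a pinwheel-style layout.

    partitions  : list of lists of node ids, one list per wing
    degree      : dict mapping node -> degree
    betweenness : dict mapping node -> betweenness centrality (>= 0)
    """
    total = sum(len(wing) for wing in partitions)
    step = 360.0 / total  # every node gets an equal angular slot of 360/n degrees
    max_b = max(betweenness.values())
    positions = {}
    slot = 0
    for wing in partitions:
        # Assumption: within a wing, the highest-degree nodes come first.
        for node in sorted(wing, key=lambda n: degree[n], reverse=True):
            if max_b > 0:
                # Log-scaled betweenness; higher values sit closer to the center.
                scaled = math.log1p(betweenness[node]) / math.log1p(max_b)
                radius = 1.0 - scaled
            else:
                radius = 1.0
            positions[node] = (slot * step, radius)
            slot += 1
    return positions

# A wing holding 2 of 8 nodes (25 percent) occupies the slots from 0 up to 90 degrees.
wings = [["a", "b"], ["c", "d", "e", "f", "g", "h"]]
deg = {n: 1 for wing in wings for n in wing}
btw = {n: 0.0 for wing in wings for n in wing}
btw["c"] = 10.0
layout = pinwheel_layout(wings, deg, btw)
print(layout["c"])
```

Because the second wing starts where the first one ends, the share of the circle each wing covers follows directly from its share of the nodes.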
Bring a laptop (running Windows and Office 2007 or 2010) to this workshop and you can be analyzing a social media network from systems like Twitter, flickr, YouTube and your own email by the end of the day. If you can make a pie chart in Excel, then with the free and open NodeXL (http://nodexl.codeplex.com) you can now make a rich network graph from data extracted from social media systems and other common formats. If you have a network, bring it. If not, you can bring a suggested topic that we can map during the course of the day.
Even if you leave your laptop behind or have a Mac (sorry, no version is yet available for MacOS – unless you have a virtual machine with Windows and Office) this workshop will introduce the core concepts of network science with application to social networks in general and social media networks in particular. Applied to a range of topics and services, social media network maps can illuminate a variety of “publics” – populations who share a common interest and may share connections. Maps of topics like “oil spill”, “global warming” and other issue and event related keywords can reveal the groups and factions that cluster around different concepts and terms. Key contributors in these maps can be identified through the application of network measurements that capture various aspects of a person’s location in a network graph.
Businesses, entrepreneurs, individuals, and government agencies alike are looking to social network analysis (SNA) tools for insight into trends, connections, and fluctuations in social media. Microsoft’s NodeXL is a free, open-source SNA plug-in for use with Excel. It provides instant graphical representation of relationships of complex networked data. But it goes further than other SNA tools—NodeXL was developed by a multidisciplinary team of experts that bring together information studies, computer science, sociology, human-computer interaction, and over 20 years of visual analytic theory and information visualization into a simple tool anyone can use. This makes NodeXL of interest not only to end-users but also to researchers and students studying visual and network analytics and their application in the real world. NodeXL has the unique feature that it imports networks from Outlook email, Twitter, flickr, YouTube, WWW, and other sources, plus it offers a rich set of metrics, layouts, and clustering algorithms. This talk will describe NodeXL and our efforts to start the Social Media Research Foundation.
The NodeXL team has just released a new version (v.1.0.1.128) that contains a new “Automation” feature that allows users to define a collection of operations to perform on their network graphs and invoke the complete set in a single button click AND reuse that configuration on other workbook graphs. In fact, the feature will apply the configuration you define to all the files you specify, allowing easy processing of large collections of network data sets.
This week the feature is partially complete. Users can automate the following operations: merge duplicate edges, calculate graph metrics, auto-fill columns, create sub-graph images, find clusters, and show graph. Performed manually, these operations can require dozens of clicks. If you have dozens or hundreds of network data sets, the result is a daunting case of repetitive strain injury and carpal tunnel syndrome. With automation, these operations can instead be carried out orders of magnitude more frequently without much pain!
The next release will feature the complete package which will then include control over the layout and graph options. As a result, automatically generated network visualizations can be produced in a pipeline: users will be able to specify a query using the NodeXL desktop network data collector and then automate the processing of large collections of data sets.
The result should be better analysis of time series data sets that have many “slices”. The feature points the way to additional development work for supporting the comparison between networks to evaluate their evolution.
A new paper on visualizing social media has been released on the University of Maryland, Human Computer Interaction Laboratory tech report archive. Co-authored by Derek Hansen, myself, and Ben Shneiderman, the paper describes and visualizes the patterns of connections formed when people tweet about events like conferences and news stories.
Hansen, D., Smith, M., Shneiderman, B. EventGraphs: Charting Collections of Conference Connections
HCIL-2010-13
EventGraphs are social media network diagrams constructed from content selected by its association with time-bounded events, such as conferences. Many conferences now communicate a common “hashtag” or keyword to identify messages related to the event. EventGraphs help make sense of the collections of connections that form when people follow, reply or mention one another and a keyword. This paper defines EventGraphs, characterizes different types, and shows how the social media network analysis add-in NodeXL supports their creation and analysis. The paper also identifies the structural and conversational patterns to look for and highlight in EventGraphs and provides design ideas for their improvement.
This is the NodeXL map of connections among people who tweeted the hashtag used for the conference “#sunbelt”.
Having now seen several of these maps for other topics and events (see: http://www.flickr.com/photos/marc_smith/sets/72157622437066929/) this map can be placed in context. It is a small group, but has a high density of connections. It lacks isolates, the people who say the term but do not connect to others who say that term. This means that this is a very “in-group” population: if you know to use the #sunbelt hashtag, you probably connect to someone else who uses the term. It is a single major cluster of connected people; no obvious sub-graphs or clusters are visible. Not everyone is central in the graph, and those who are have a prominent role in the network science community. Here is the top ten list of #sunbelt mentioning Twitter users ranked by betweenness centrality.
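Isolates are easy to pick out once the network is reduced to a list of users and a list of connections. The following is a minimal sketch, assuming the data is available as plain Python lists; the function name `find_isolates` and the sample user names are made up for illustration and are not part of NodeXL.

```python
def find_isolates(users, edges):
    """Return the users who appear in no edge with another user.

    users : iterable of user names who tweeted the term
    edges : iterable of (source, target) pairs (follows, replies, mentions)
    """
    connected = set()
    for source, target in edges:
        if source != target:  # a self-loop does not connect a user to anyone else
            connected.add(source)
            connected.add(target)
    return sorted(user for user in users if user not in connected)

# Hypothetical sample data: "dana" tweets the term but connects to nobody.
sample_users = ["alice", "bob", "carol", "dana"]
sample_edges = [("alice", "bob"), ("carol", "alice"), ("dana", "dana")]
print(find_isolates(sample_users, sample_edges))
```

An empty result corresponds to the situation described for #sunbelt, where everyone who uses the hashtag is tied to at least one other person who uses it.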
Pierre has been a deep student of telecommunications policy regulation in the United States for many years. He has generated a remarkable network map built from the details of filings to the FCC over more than a decade. These filings are made by companies when they agree or disagree with a proposed policy. When two companies file in support (or opposition) to the same policy they create a tie between them. The collection of these connections creates a complex network of coalitions and factions.
“The graph is derived from meta-data associated with documents that are filed electronically whenever an organization interacts with the FCC, in accordance with the Administrative Procedures Act. Whenever a letter, comment or other document is filed, the filer provides information on the parties involved, number of pages, relevant proceedings, date, etc.”
…
“Once the data is cleaned up, an edge list is created in Excel by running another VBA macro. A graph is created from this list with NodeXL, a social network analysis and visualization add-in for Excel 2007. NodeXL’s Fruchterman-Reingold algorithm is used to prepare a preliminary layout; nodes are then moved by hand into visually intelligible positions, respecting the clusters suggested by NodeXL’s implementation of the Wakita-Tsurumi algorithm. Nodes are colored on the basis of eigenvector centrality. The degree of investment that organizations make in lobbying is measured by the total number of filings it made in this proceeding over the period of study, and reflected in the size of the node. This information is obtained by running another VBA macro against the underlying ECFS metadata, and then matching that to the vertices in the graph.”
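The rule that two organizations are tied when they file on the same side of the same policy can be sketched as follows. Pierre’s actual pipeline uses VBA macros in Excel, so this Python version is only an illustration of the idea; the record layout (organization, proceeding, stance), the function name `cofiling_edges`, and the sample organizations are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cofiling_edges(filings):
    """Build a weighted edge list from filing records.

    filings : iterable of (organization, proceeding, stance) tuples,
              where stance is for example "support" or "oppose".
    Returns a Counter mapping (org_a, org_b) -> number of shared positions.
    """
    groups = defaultdict(set)
    for organization, proceeding, stance in filings:
        groups[(proceeding, stance)].add(organization)

    edges = Counter()
    for organizations in groups.values():
        # Every pair of organizations on the same side of a proceeding gets a tie.
        for a, b in combinations(sorted(organizations), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical filings, invented for this example.
sample_filings = [
    ("OrgA", "P-1", "support"),
    ("OrgB", "P-1", "support"),
    ("OrgC", "P-1", "oppose"),
    ("OrgA", "P-2", "oppose"),
    ("OrgB", "P-2", "oppose"),
]
for (a, b), weight in sorted(cofiling_edges(sample_filings).items()):
    print(a, b, weight)
```

The resulting pairs, one per row, are the kind of edge list that can be pasted into a NodeXL workbook.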
The NodeXL team has released a new version (v.1.0.1.126) with better support for collecting data from social media network sources, starting with Twitter. The NodeXL Network Server program now ships in every NodeXL installation. Tony, the lead developer on the team, created the following FAQ to explain how to use the collector application.
This document describes how the NodeXL Network Server works.
What is the NodeXL Network Server?
It’s a Windows command-line program that downloads a network from Twitter and stores the network on disk in several file formats. It can be run directly from a command line, but is typically scheduled to run on a periodic basis via the Task Scheduler that is built into Windows.
Where can the files be found?
The files are in NodeXL’s program folder. To find out where the folder is, right-click the Microsoft NodeXL, Excel 2007 Template menu item in the Windows Start menu, then select Properties. On 32-bit English computers, the folder is “C:\Program Files\Microsoft Research\Microsoft NodeXL Excel Template.”
Who are its intended users?
The Server is meant for use by people with moderate system administration skills. It is not difficult to use, but it is not intended for the same audience as the NodeXL Excel Template, where ease of use is of high priority.
How do you run the Server from the Windows command line?
Like this:
NodeXLNetworkServer.exe NetworkConfiguration.xml
The program takes a single argument, which is the path to a configuration file that specifies which network should be downloaded and how the network should be saved to disk. A particular configuration file might specify “Get the Twitter search network for people whose tweets contain ‘Sociology,’ add an edge for each ‘mentions’ relationship, limit to 100 people, include tweets, include statistics, and store the network as a GraphML file in the C:\NodeXLNetworks folder.”
The program immediately gets the requested network, saves it to disk, and exits. On its own, it does not run on a periodic basis.
How do I create a configuration file?
You create a configuration file by copying a provided template file and editing the copy in Notepad. The template file is named SampleNetworkConfiguration.xml and is stored in the same folder as the program. The file is in XML format and the XML tags are clearly named and documented.
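For people who prefer to script the copy step, here is a minimal Python sketch. It only copies the template and builds the command line shown earlier; it does not edit the XML, because the tag names are documented in the template itself and are not reproduced here. The helper names (`config_file_name`, `copy_template`, `server_command`) and the naming pattern for the copy are assumptions made for this example.

```python
import shutil
from pathlib import Path

TEMPLATE_NAME = "SampleNetworkConfiguration.xml"  # template named in the FAQ
PROGRAM_NAME = "NodeXLNetworkServer.exe"          # program named in the FAQ

def config_file_name(search_term):
    """Return a file name that includes the search term (hypothetical pattern)."""
    safe = "".join(ch if ch.isalnum() else "_" for ch in search_term)
    return f"{safe}_NetworkConfiguration.xml"

def copy_template(program_folder, search_term, config_folder):
    """Copy the sample configuration so the copy can be edited in Notepad."""
    source = Path(program_folder) / TEMPLATE_NAME
    target = Path(config_folder) / config_file_name(search_term)
    shutil.copyfile(source, target)
    return target

def server_command(program_folder, config_path):
    """Return the command line: the program followed by its single argument."""
    return [str(Path(program_folder) / PROGRAM_NAME), str(config_path)]
```

After editing the copy, the list returned by `server_command` can be handed to `subprocess.run` on a Windows machine where NodeXL is installed, or typed at the command line by hand.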
In what file formats can the network be saved to disk?
You can save the network to either GraphML, which can be imported into a NodeXL workbook; directly to a NodeXL workbook; or both.
Do you typically run the program from the command line?
No. Instead, you typically run it as a scheduled task via a built-in Windows program called Task Scheduler.
Task Scheduler is a powerful utility that lets you run any program, including NodeXL Network Server, on a periodic basis. You can, for example, tell Task Scheduler to run NodeXL Network Server using a particular network configuration file every twelve hours starting June 1, 2010 and ending June 30, 2010; or once a week starting now and continuing forever. The scheduling options are endless.
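Task Scheduler can also be driven from the command line through the Windows `schtasks` tool, which is not mentioned in the FAQ. The sketch below only assembles a `schtasks /Create` command for an every-N-hours schedule; it is an assumption about how one might do this, not a procedure from the NodeXL team. The function name `build_schedule_command` and the task name are made up, and the start and end dates from the example above are left out because the date format `schtasks` expects depends on the machine’s regional settings.

```python
def build_schedule_command(task_name, program_path, config_path, every_hours=12):
    """Return a schtasks argument list that runs the Server every N hours.

    Nothing is executed here; the caller decides whether to run the command.
    """
    if not 1 <= every_hours <= 23:
        raise ValueError("an hourly schedule needs an interval from 1 to 23")
    # The task to run: the program followed by its single configuration argument.
    task_run = f'"{program_path}" "{config_path}"'
    return [
        "schtasks", "/Create",
        "/SC", "HOURLY",          # schedule type
        "/MO", str(every_hours),  # repeat interval in hours
        "/TN", task_name,         # name shown in Task Scheduler
        "/TR", task_run,          # command line to run
    ]

# Hypothetical paths, for illustration only.
command = build_schedule_command(
    "NodeXL Sociology",
    r"C:\NodeXL\NodeXLNetworkServer.exe",
    r"C:\NodeXLNetworks\Sociology_NetworkConfiguration.xml",
)
print(command)
```

On Windows the list could be passed to `subprocess.run(command, check=True)`, but it is worth confirming the result in the Task Scheduler window, since quoting of paths with spaces is a common source of trouble.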
Why not just include scheduling features in the NodeXL Network Server?
For two reasons. First, Task Scheduler’s extensive scheduling options would be difficult to duplicate. Second, if NodeXL Network Server had to download a network on a periodic basis, it would have to run as a Windows service, and Windows services are more complex to implement and to use than a simple command-line program.
How are the network files named?
Scheduling the NodeXL Network Server to run periodically can create any number of network files in the specified directory, so a file-naming scheme is needed. The file name format is
So the above example, in which NetworkConfiguration.xml specifies that networks are to be saved as GraphML, might create a set of network files that look like this:
What happens if the computer is not turned on at the scheduled time?
By default, the task won’t be performed until the next scheduled time when the computer is turned on. However, if the computer is sleeping, you can tell Task Scheduler to wake it at the scheduled time to run the task.
What happens if the NodeXL Network Server encounters an error?
If the error prevents the network from being downloaded, the NodeXL Network Server creates an error file instead of a network file. The file name starts with “Error” to make it easy to spot:
The error file contains the details of what went wrong.
If one or more errors block part of the network but other parts of the network are successfully downloaded, then the NodeXL Network Server creates the network file containing the partial network, along with a text file that explains how many errors occurred. The text file name starts with “PartialNetworkInfo” to make it easy to spot:
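Because both kinds of problem file are recognisable by their prefixes, a folder of scheduled downloads can be checked with a short script. This is a minimal sketch that relies only on the “Error” and “PartialNetworkInfo” prefixes described above; the function names and the sample file names are assumptions, and the full file-naming format is not reproduced here.

```python
from pathlib import Path

def classify_name(file_name):
    """Classify one output file by its prefix."""
    if file_name.startswith("Error"):
        return "error"
    if file_name.startswith("PartialNetworkInfo"):
        return "partial"
    return "other"

def classify_outputs(folder):
    """Group the files in a network folder into error, partial, and other."""
    report = {"error": [], "partial": [], "other": []}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            report[classify_name(path.name)].append(path.name)
    return report

# Hypothetical file names, used only to show the prefix test.
for name in ["Error_example.txt", "PartialNetworkInfo_example.txt", "example.graphml"]:
    print(name, "->", classify_name(name))
```

Running `classify_outputs` on the destination folder after each scheduled run gives a quick list of downloads that need attention.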
What if I want to periodically download more than one network?
Simply schedule more than one task, each using a different network configuration file. The tasks are independent of one another and can be scheduled to run at different times.
The Enterprise 2.0 conference is about to get underway in Boston. The event focuses on all the ways social media tools that are familiar on the consumer Internet are making their way behind the firewall in many enterprises and institutions. Why can’t you “friend” a colleague or “like” a spreadsheet or slide deck? Employees often come to their jobs expecting tools that resemble the social media tools with which they already spend much of their time.
Like many conferences, this one has a hashtag, actually two that I know of: #e2 and #e2conf. There is already a good deal of activity leading up to the event. Here is a map of connections among a group of people who mentioned either #e2 or #e2conf in the last few days.
In this map there are 532 vertices and 9,395 unique edges, forming 13 connected components. Eleven of the components contain only a single vertex, and the largest component has 519 vertices joined by 9,393 edges. The small number of isolated components indicates that this is a cohesive community of highly connected participants. These people know and follow, reply to, and mention one another. The graph has a density of 0.03, and the maximum geodesic distance (diameter) is 5 steps, with an average geodesic distance of 2.
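The density figure can be checked by hand. The sketch below assumes the graph is treated as directed, so that density is the number of edges divided by the number of possible ordered pairs of vertices; that assumption is not stated in the post, but it is consistent with the reported value. The function name `directed_density` is made up for this example.

```python
def directed_density(vertex_count, edge_count):
    """Edges divided by the number of possible directed edges, n * (n - 1)."""
    return edge_count / (vertex_count * (vertex_count - 1))

vertices = 532
unique_edges = 9395

density = directed_density(vertices, unique_edges)
print(round(density, 2))  # 0.03, matching the reported density

# The component counts also add up: 11 single-vertex components plus the
# 519-vertex component leave 2 vertices for the one remaining component.
single_vertex_components = 11
largest_component = 519
remaining_vertices = vertices - single_vertex_components - largest_component
print(remaining_vertices)  # 2
```

Under the undirected formula, which doubles the numerator, the same counts would give roughly 0.07, so the directed reading fits the reported number better.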
Within this mass of connected users is a core group of highly “between” people, those who most broadly span connections within the population. These are one possible set of “influentials” within the Enterprise 2.0 community.
Here is a two-screen view of the list of the most between #e2 OR #e2conf mentioning Twitter users, along with the overview graph of their internal linkages.
A closer look at the graph alone can reveal enough detail to read the names of these central participants.
This is a view of the list of authors sorted in Excel by their “Betweenness centrality” score, the measure of how much these people “bridge” across the network.
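For readers curious about what the score measures, here is a small pure-Python sketch of betweenness centrality using Brandes’ algorithm on an unweighted, undirected graph. NodeXL calculates this metric for you, so this is not NodeXL’s implementation; treating the Twitter graph as undirected is a simplifying assumption, and the function name and sample graph are made up.

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    adjacency : dict mapping each node to a list of its neighbours.
                Every neighbour must also appear as a key.
    Returns a dict mapping node -> number of shortest paths between
    other pairs of nodes that pass through it (fractional when tied).
    """
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        order = []                              # nodes in order of discovery
        predecessors = {v: [] for v in adjacency}
        path_count = {v: 0 for v in adjacency}  # shortest paths from source
        distance = {v: -1 for v in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:                            # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):               # farthest nodes first
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {v: total / 2 for v, total in score.items()}

# Hypothetical three-person chain: "b" bridges "a" and "c".
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
ranked = sorted(betweenness_centrality(chain).items(), key=lambda item: -item[1])
print(ranked)
```

Sorting the result from highest to lowest, as in the last lines, mirrors the sorted list of authors in the Excel view.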
An alternative view plots these contributors in an X/Y space based on their count of followers (along the x axis) and count of tweets (along the y axis).
Twitter users who mentioned #e2 or #e2conf on June 13, 2010 scaled by number of followers, x = log(followers), y = log(tweets).
There is a correlation between tweets and followers, but not everyone converts tweets to followers at the same rate. Those below the diagonal over-convert tweets to followers (they have more followers than their tweet count would suggest), while those above the diagonal under-convert.
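The reading of the chart can be expressed as a simple rule. This sketch assumes the diagonal is the line y = x in the log-log plot, so that a user falls below it when they have more followers than tweets; if the diagonal in the original chart is a fitted trend line instead, the comparison would need that line’s slope and intercept. The function name `conversion_side` and the sample counts are made up.

```python
import math

def conversion_side(followers, tweets):
    """Place a user relative to the y = x diagonal of the log-log plot.

    x = log(followers), y = log(tweets); both counts must be positive.
    """
    x = math.log10(followers)
    y = math.log10(tweets)
    if y < x:
        return "below"  # over-converts: more followers than tweets
    if y > x:
        return "above"  # under-converts: more tweets than followers
    return "on"

# Hypothetical users: (followers, tweets)
print(conversion_side(5000, 800))   # below
print(conversion_side(300, 12000))  # above
```

The base of the logarithm does not affect which side of the diagonal a user falls on, since the same base is applied to both axes.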
Hello! If you would like to request a custom social media network map made with NodeXL, complete the form below. I will generate the maps as requests come in and email you a pointer to the results which I will post to my flickr feed here: http://www.flickr.com/photos/marc_smith/sets/72157622437066929/
Ben Shneiderman spoke on June 2, 2010 at the Faculty of Management, Tel Aviv University in an event organized by UPA Israel and the Leon Recanati School of Business.