Social media networks tend to be “clumpy”. Here is the map of connections among people who tweeted the term “global warming”:
NodeXL v.210 and newer now supports text analysis of content collected from social media data sources. NodeXL applies social network clustering and then analyzes text that is grouped by social clusters.
Connections among people who tweet about a topic, keyword or hashtag form patterns that can lead to the formation of sub-groups and clusters. Multiple clusters are formed within a network when a sub-population of people link to one another far more than to people in other groups. These regions of dense connections define the boundaries between sub-populations. Clusters often reflect the variation in interest in certain people and topics in the population. Some people and topics are more interesting to one group than others. Within these groups certain people and words get repeated more often than others.
Networks can be partitioned by many methods. NodeXL implements several. A collection of vertices can be grouped by the user by applying labels to the vertex worksheet (“Group by vertex attribute”). Or a group of vertices can be determined by an algorithm that looks for differences in the density of connections and divides by the points of least association (“Group by cluster algorithm”). Networks can also be grouped into separate isolated collections of nodes, called “connected components”.
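To make the last of these grouping methods concrete, here is a minimal Python sketch of splitting a network into connected components. It is not NodeXL's code; the function name and the toy vertex names are assumptions made for illustration, and edge direction is ignored.

```python
from collections import defaultdict, deque

def connected_components(vertices, edges):
    """Return a list of vertex sets, one per connected component (direction ignored)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

# Hypothetical toy network: one three-person chain, one self-loop, one isolate.
vertices = ["ann", "bob", "cat", "dan", "eve"]
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "dan")]
groups = connected_components(vertices, edges)
```

In this toy network "dan" and "eve" each end up in a single-vertex component, which is the kind of isolate the graph metrics report separately.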
In NodeXL, groups can be visualized in multiple ways. Groups can be collapsed into meta-vertices that stand in for the members of that group (right-click the graph pane and select “Groups>Collapse all groups”). Group members can also be displayed within a “box” with the “group-in-a-box” feature (found in the layout selection menu in the Graph Pane – select “Layout Options”).
Within each group is a population of people along with the tweets they authored in the time period captured by the data set. Each group has a collection of tweets that can be analyzed. The contents of all the tweets in a network can be scanned, and certain types of strings can be counted to measure how frequently they are mentioned. These counts can be repeated for each group, allowing groups to be contrasted based on the relative rates of strings like URLs, hashtags, and @usernames. Here is a sample of the worksheet NodeXL creates to display all the data about people, URLs, and hashtags frequently mentioned in each group:
The worksheet offers top URLs, hashtags, and users across the entire network, and within each sub-group. The details offer insights into the people and topics of greatest interest.
This feature allows the content in sub-groups to be contrasted, thus answering the question: how is this sub-group the same or different from another sub-group?
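The per-group counting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how NodeXL does it: the regular expressions, the function name, the group labels and the sample tweets are all assumptions.

```python
import re
from collections import Counter

# Deliberately simple patterns; real tweet parsing has more edge cases.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")

def top_strings_by_group(tweets, group_of, top_n=3):
    """tweets: iterable of (author, text); group_of: dict mapping author -> group label."""
    counts = {}
    for author, text in tweets:
        group = group_of.get(author, "ungrouped")
        bucket = counts.setdefault(
            group, {"urls": Counter(), "hashtags": Counter(), "mentions": Counter()}
        )
        bucket["urls"].update(URL_RE.findall(text))
        # Remove URLs first so fragments inside them are not counted as tags.
        without_urls = URL_RE.sub(" ", text)
        bucket["hashtags"].update(t.lower() for t in HASHTAG_RE.findall(without_urls))
        bucket["mentions"].update(m.lower() for m in MENTION_RE.findall(without_urls))
    return {
        group: {kind: counter.most_common(top_n) for kind, counter in bucket.items()}
        for group, bucket in counts.items()
    }

# Hypothetical sample data.
tweets = [
    ("ann", "Reading about #climate policy http://example.org/a"),
    ("bob", "@ann agreed, #Climate matters"),
    ("cat", "New #hoax post http://example.org/b via @dan"),
    ("dan", "#hoax again http://example.org/b"),
]
group_of = {"ann": "G1", "bob": "G1", "cat": "G2", "dan": "G2"}
result = top_strings_by_group(tweets, group_of)
```

Comparing `result["G1"]` with `result["G2"]` is the contrast the worksheet is meant to support: each group surfaces its own hashtags, links and names.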
On June 4th in Dublin, Ireland, the 2012 International AAAI Conference on Weblogs and Social Media (ICWSM) opens. ICWSM gathers computer scientists, linguists, communications scholars, and social scientists to increase understanding of social media in all its incarnations. Now in its sixth year, ICWSM is a leading venue for cutting-edge research in social media.
ICWSM-12 features a program of workshops, tutorials, contributed technical talks, posters and invited presentations. The main conference features keynote talks from prominent social scientists and technologists.
Andrew Tomkins is an engineering director at Google working on measurement, modelling, and analysis of content, communities, and users on the World Wide Web. Prior to joining Google, he spent four years at Yahoo! as chief scientist of search, and eight years at IBM’s Almaden Research Center, where he co-founded the WebFountain project. Andrew holds bachelor’s degrees in Math and CS from MIT, and a PhD in CS from Carnegie Mellon University; he has published over a hundred technical papers.
Patrick Meier is a recognized expert and thought leader on the intersection between new technologies, crisis early warning, humanitarian response and human rights. He is the co-founder of the International Network of Crisis Mappers and previously co-directed Harvard University’s Program on Crisis Mapping and Early Warning. Over the past 10 years, Patrick has consulted extensively with several international organizations including the UN, OSCE and OECD in Africa, Asia and Europe. Patrick is also a distinguished scholar completing his PhD at The Fletcher School during which time he was a Doctoral Fellow at Stanford University. In 2010, President Bill Clinton publicly thanked him for his leadership and contributions. He blogs at iRevolution.net.
Lada A. Adamic is an associate professor in the School of Information and the Center for the Study of Complex Systems at the University of Michigan. She is also affiliated with EECS. Her research interests center on information dynamics in networks: how information diffuses, how it can be found, and how it influences the evolution of a network’s structure. Her projects have included identifying expertise in online question and answer forums, studying the dynamics of viral marketing, and characterizing the structure in blogs and other online communities. She has received an NSF CAREER award, and best paper awards from Hypertext ’08, ICWSM-10 and ICWSM-11, and the most influential paper of the decade award from Web Intelligence ’11.
“The goal of the workshop is to bring together researchers and industry practitioners interested in visual and interactive techniques for social media analysis, particularly in social sciences and humanities as well as in industry and to discuss ideas, techniques, and applications to support social media analysis.”
I will present a tutorial on Social Media Network Analysis with NodeXL on June 4th at the event:
Networks are a data structure commonly found across all social media services that allow populations to author collections of connections. The Social Media Research Foundation’s NodeXL project makes analysis of social media networks accessible to most users of the Excel spreadsheet application. With NodeXL, networks become as easy to create as pie charts. Applying the tool to a range of social media networks has already revealed the variations present in online social spaces. A review of the tool and images of Twitter, flickr, YouTube, and email networks will be presented.
This network graph represents a network of 29 Twitter users whose recent tweets contained “icwsm”. The network was obtained on Saturday, 21 April 2012 at 20:33 UTC. There is an edge for each follows relationship. There is an edge for each “replies-to” relationship in a tweet. There is an edge for each “mentions” relationship in a tweet. There is a self-loop edge for each tweet that is not a “replies-to” or “mentions”. The earliest tweet in the network was tweeted on Saturday, 14 April 2012 at 18:55 UTC. The latest tweet in the network was tweeted on Saturday, 21 April 2012 at 05:48 UTC.
The graph is directed.
The graph’s vertices were grouped by cluster using the Clauset-Newman-Moore cluster algorithm.
The graph was laid out using the Harel-Koren layout algorithm.
The edge colors are based on relationship values. The vertex sizes are based on followers values.
Top 10 Vertices, Ranked by Betweenness Centrality:
@icwsm
@johnbreslin
@IBMResearch
@CaptSolo
@marc_smith
@bde
@karenchurch
@imbenzene
@hemant_Pt
@_akisato
Overall Graph Metrics:
Vertices: 29
Unique Edges: 68
Edges With Duplicates: 32
Total Edges: 100
Self-Loops: 18
Connected Components: 5
Single-Vertex Connected Components: 4
Maximum Vertices in a Connected Component: 25
Maximum Edges in a Connected Component: 96
Maximum Geodesic Distance (Diameter): 3
Average Geodesic Distance: 1.866455
Graph Density: 0.082512315270936
Modularity: 0.2488
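A couple of these metrics are easy to reproduce by hand. The Python sketch below is an illustration, not NodeXL's implementation. It assumes that density for a directed graph is the number of distinct non-loop edges divided by V × (V − 1), and it measures geodesic distances on the undirected view of the graph, averaged over reachable pairs of distinct vertices. NodeXL's own averaging convention may differ. As a consistency check on the density assumption, 67 / (29 × 28) equals the 0.0825 reported above.

```python
from collections import defaultdict, deque

def directed_density(vertex_count, edges):
    """Distinct non-loop directed edges divided by the number of ordered vertex pairs."""
    distinct = {(a, b) for a, b in edges if a != b}
    possible = vertex_count * (vertex_count - 1)
    return len(distinct) / possible if possible else 0.0

def geodesic_stats(vertices, edges):
    """Return (longest, average) shortest-path length over reachable pairs, direction ignored."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    distances = []
    for source in vertices:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        distances.extend(d for target, d in dist.items() if target != source)

    if not distances:
        return 0, 0.0
    return max(distances), sum(distances) / len(distances)

# Hypothetical toy graph: a duplicated edge, a self-loop, and an isolated vertex "e".
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "c"), ("c", "d")]
density = directed_density(len(vertices), edges)
diameter, average = geodesic_stats(vertices, edges)
```

Duplicate edges and self-loops are dropped before the density is computed, which is why the three distinct edges of the toy graph give 3 / 20.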
This two-volume encyclopedia provides a thorough introduction to the wide-ranging, fast-developing field of social networking, a much-needed resource at a time when new social networks or “communities” seem to spring up on the internet every day. Social networks, or groupings of individuals tied by one or more specific types of interests or interdependencies ranging from likes and dislikes, or disease transmission to the “old boy” network or overlapping circles of friends, have been in existence for longer than services such as Facebook or YouTube; analysis of these networks emphasizes the relationships within the network. The Encyclopedia of Social Networks offers comprehensive coverage of the theory and research within the social sciences that has sprung from the analysis of such groupings, with accompanying definitions, measures, and research.
Featuring approximately 350 signed entries, along with approximately 40 media clips, organized alphabetically and offering cross-references and suggestions for further readings, this encyclopedia opens with a thematic reader’s guide in the front that groups related entries by topics. A chronology offers the reader historical perspective on the study of social networks. This two-volume reference work is a must-have resource for libraries serving researchers interested in the various fields related to social networks, including sociology, social psychology and communication and media studies.
Know-who is becoming more important than know-how. Networks are a data structure commonly found across all social media services that allow populations to author collections of connections. Innovation networks are created when new connections form among people who have a portion of a solution.
The Social Media Research Foundation’s NodeXL project makes analysis of social media networks accessible to most users of the Excel spreadsheet application. With NodeXL, networks become as easy to create as pie charts. Applying the tool to a range of social media networks has already revealed the variations present in online social spaces. A review of the tool and images of Twitter, flickr, YouTube, and email networks will be presented. In particular, innovation topics will be mapped to highlight the key people and groups talking about new ideas and opportunities.
I will speak about the results of collecting, analyzing and visualizing the collections of connections that form in political discussions in social media.
For example, this is a map of the connections among the people who recently tweeted about Scott Walker.
The graph represents a network of up to 1000 Twitter users whose recent tweets contained “scott AND walker”. The network was obtained on Friday, 13 April 2012 at 07:40 UTC. There is an edge for each “replies-to” relationship in a tweet. There is an edge for each “mentions” relationship in a tweet. There is a self-loop edge for each tweet that is not a “replies-to” or “mentions”. The earliest tweet in the network was tweeted on Thursday, 12 April 2012 at 03:32 UTC. The latest tweet in the network was tweeted on Friday, 13 April 2012 at 04:12 UTC.
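The edge rules in that description can be sketched in Python as follows. This is a rough illustration rather than the NodeXL importer: the tweet dictionary fields (`author`, `text`, `in_reply_to`), the `"tweet"` label for self-loops and the sample tweets are all hypothetical.

```python
import re

MENTION_RE = re.compile(r"@(\w+)")

def edges_from_tweets(tweets):
    """Turn tweets into (source, target, relationship) tuples.

    Each tweet is a dict with 'author' and 'text', plus an optional
    'in_reply_to' user name (hypothetical field names).
    """
    edges = []
    for tweet in tweets:
        author = tweet["author"].lower()
        reply_to = (tweet.get("in_reply_to") or "").lower()
        mentioned = [name.lower() for name in MENTION_RE.findall(tweet["text"])]

        linked = False
        if reply_to:
            edges.append((author, reply_to, "replies-to"))
            linked = True
        for name in mentioned:
            # The replied-to user is already covered by the replies-to edge.
            if name != reply_to:
                edges.append((author, name, "mentions"))
                linked = True
        if not linked:
            # A tweet that addresses nobody becomes a self-loop on its author.
            edges.append((author, author, "tweet"))
    return edges

# Hypothetical sample tweets.
tweets = [
    {"author": "ann", "text": "@bob did you see the recall news?", "in_reply_to": "bob"},
    {"author": "bob", "text": "Talking with @cat and @dan about the vote"},
    {"author": "cat", "text": "Heading to the rally"},
]
edges = edges_from_tweets(tweets)
```

The self-loops matter for the maps: they keep people who tweeted about the topic without addressing anyone in the network as isolates.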
Here is a map of connections among people who recently tweeted the term “peoplebrowsr”.
“But what does that picture mean?”
I hear this reaction frequently when I show people maps I have made of social media connections.
I often point out that the map and the data can reveal people who occupy important locations in the network as well as emergent clusters and groups.
“So why didn’t you just say so?”
I hear this reaction frequently when I explain what is important about a network.
In NodeXL version 203 we have released a new feature called Graph Summary. Our goal is to “just say so”.
In this version we introduce the basics of automatic captioning. In the NodeXL>Graph menu we now have a “Summary” button:
NodeXL will collect information about the creation and configuration of the network. The dialog box looks like this:
Note that NodeXL>Data>Save Import Details in Graph Summary must be selected in the Import menu for the “Data Import” field to be populated.
Selecting “Copy to Clipboard” will load a copy of these text fields into the buffer. An example of that caption is here:
The graph represents a network of up to 1000 Twitter
users whose recent tweets contained "peoplebrowsr".
The network was obtained on
Friday, 09 March 2012 at 01:21 UTC.
There is an edge for each follows relationship.
There is an edge for each "replies-to" relationship
in a tweet.
There is an edge for each "mentions"
relationship in a tweet.
There is a self-loop edge for each tweet that is
not a "replies-to" or "mentions".
The earliest tweet in the network was tweeted on
Friday, 02 March 2012 at 02:39 UTC.
The latest tweet in the network was tweeted on
Friday, 09 March 2012 at 00:47 UTC.
The graph is directed.
The graph was laid out using the
Harel-Koren Fast Multiscale layout algorithm.
The edge colors are based on relationship values.
The vertex sizes are based on followers values.
Overall Graph Metrics:
Vertices: 74
Unique Edges: 172
Edges With Duplicates: 123
Total Edges: 295
Self-Loops: 42
Connected Components: 15
Single-Vertex Connected Components: 13
Maximum Vertices in a Connected Component: 58
Maximum Edges in a Connected Component: 276
Maximum Geodesic Distance (Diameter): 4
Average Geodesic Distance: 2.014176
Graph Density: 0.036653091447612
Modularity: 0.288302
Top 10 Vertices, Ranked by Betweenness Centrality:
@peoplebrowsr
@andrewgrill
@traviswallis
@thenickfrost
@jas
@alexbudge
@getmingly
@milener
@jeffreyhayzlett
@johnnosta
The graph's vertices were grouped by cluster using the
Clauset-Newman-Moore cluster algorithm.
More NodeXL network visualizations are here:
www.flickr.com/photos/marc_smith/sets/72157622437066929/
and here:
www.nodexlgraphgallery.org/Pages/Default.aspx
A gallery of NodeXL network data sets is available here:
nodexlgraphgallery.org/Pages/Default.aspx?search=twitter
NodeXL is free and open and available from www.codeplex.com/nodexl
NodeXL is developed by the Social Media Research Foundation
(www.smrfoundation.org) - which is dedicated to
open tools, open data, and open scholarship.
Donations to support NodeXL are welcome through PayPal:
https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=J5AERGAAN552S
The book, Analyzing social media networks with NodeXL:
Insights from a connected world, is available from Morgan Kaufmann and from Amazon.
http://www.amazon.com/gp/product/0123822297?ie=utf8&tag=conneactio-20&linkcode=as2&camp=1789&creative=390957&creativeasin=0123822297
This caption will expand in our next several releases to include information about the top URLs, hashtags, and @usernames in text fields associated with nodes and edges. Following that we will release a series of features to allow for the extraction of keyword pairs in those text fields (our current version of this feature is described here: Keyword Networks: create word association networks from text with NodeXL (with a macro)).
This is the collection of keyword pairs that appeared in two clusters of people who Tweeted about “Paul Ryan”, the Republican Congressman from Wisconsin who delivered the GOP rebuttal to the 2011 United States State of the Union Address. This network illustrates the ways that certain word pairs appear only or predominantly in one cluster (colored here Red and Blue) or the other. Terms that appeared in both clusters appear as purple.
Social networks are built from relationships between people. Keyword networks are built from relationships between words and other text strings. When two words appear in the same message, in the same sentence, or alongside one another, ties of different strengths are created. The networks that result can illuminate the relationships among topics of importance in a collection of messages.
Markus Strohmaier from the Technical University Graz (TUG) along with Claudia Wagner gave us inspiration in a paper:
in which they defined a range of ways two words (technically these are strings, they may not really be words) can be associated with one another. Words could be linked if they are in the same tweet, next to one another, or sequential among other ways to link terms.
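Two of these association rules, "in the same tweet" and "next to one another", can be sketched in Python. This is not the NodeXL macro and not the method from the paper. The tokenizer, the small stop word set, the function names and the sample messages are assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stop word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "in", "is", "on", "for"}
TOKEN_RE = re.compile(r"[#@]?\w+")

def tokenize(text):
    """Lowercase the strings in a message and drop stop words."""
    tokens = (match.lower() for match in TOKEN_RE.findall(text))
    return [t for t in tokens if t not in STOP_WORDS]

def same_message_pairs(messages):
    """Count unordered pairs of distinct strings that share a message."""
    pairs = Counter()
    for text in messages:
        for a, b in combinations(sorted(set(tokenize(text))), 2):
            pairs[(a, b)] += 1
    return pairs

def adjacent_pairs(messages):
    """Count ordered pairs of strings that sit next to each other.

    Adjacency is measured after stop words are removed.
    """
    pairs = Counter()
    for text in messages:
        tokens = tokenize(text)
        for a, b in zip(tokens, tokens[1:]):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical sample messages.
messages = [
    "Paul Ryan gives the budget rebuttal",
    "Budget plan from Paul Ryan",
]
same = same_message_pairs(messages)
adjacent = adjacent_pairs(messages)
```

Each counted pair is an edge in the keyword network, and the count can serve as the edge weight. The same-message rule produces many more ties than the adjacency rule, so the choice of rule changes the shape of the resulting network.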
NodeXL has not had any features for exploring the networks in texts. Now, with the addition of a new macro from Scott Golder, it is fairly simple to extract pairs of keywords from a collection of tweets. NodeXL’s Twitter importer can optionally include the content of the tweet that included the search term, and this column of text can now be processed into a new network based on the ways words appear together in tweets.
This feature builds on the work of several people. Scott Golder from Cornell started the ball rolling with a simple but effective VBA script that allowed others to build and refine the models of what counts as a tie between two words. Vladimir Barash added several refinements including support for stop word lists to remove common terms. Scott then picked up the code again and added a set of features for selecting the nature of the graph and making it easier to select the options needed.
The code for the Keyword Network macro is below.
The instructions to use it take a few steps to complete:
Interested in applying social network methods to better understand the structure of your business or organization?
In collaboration with Optimice, I will teach a workshop on Social Network Analysis for enterprises, organizations, and businesses using NodeXL.
Self-paced e-learning (4 hours)
Introduction to Social/Organisational Network Analysis
Network patterns and metrics
Software tools for network analysis
Managing an ONA Project
Module 1: Scoping your ONA Project (2 hour virtual session hosted by Patti Anklam)
Determining which business problem to solve with ONA
Review of case-studies
Determining your questions
Module 2: Setting up your ONA survey (2 hour virtual session hosted by Cai Kjaer / Laurence Lock Lee)
Setting up your survey
Working with mailing lists and other lists
Creating relationship sets and network questions
Previewing and launching the survey
Tracking progress and downloading responses
Module 3: Visualise networks with NodeXL (2 hour virtual session hosted by Marc Smith)
Getting started with NodeXL
Calculating and visualizing network metrics
Preparing data and filtering
Importing data from Social Media tools
Clustering and grouping
A number of ONA Practitioner Courses are available to suit the timezones of participants located in the US, Europe and/or Asia-Pacific (but not restricted to these regions):
Course Code
Date and Time
Time Zone
Payment
OPC-2012-13-APAC
27 March 2012 to 25 April 2012
(Registration deadline is 13 March 2012)
Module 1: 11 April 2012 (11am – 1pm)
Module 2: 18 April 2012 (11am – 1pm)
Module 3: 25 April 2012 (11am – 1pm)
Self-paced to be completed before starting module 1.
Asia-Pacific – Sydney EST
$US 1,599
OPC-2012-17-US
25 April 2012 to 22 May 2012
(Registration deadline is 11 April 2012)
Module 1: 8 May 2012 (4 – 6pm)
Module 2: 15 May 2012 (4 – 6pm)
Module 3: 22 May 2012 (4 – 6pm)
Self-paced to be completed before starting module 1.
Crowds of people gather in social media around many products, services, businesses, and events but they can be difficult to see and understand. With new free and open tools, it is now possible to map and measure social media spaces, capturing the sub-groups and key people within and between them. Learn how to capture social media data and quickly generate a visual map of the crowd. With maps in hand, we will discuss ways they guide a journey to the key influencers and concepts in the crowd.
Description: Maps of the complex connections that form when people link, like, reply, rate, review, favorite, friend, follow, edit, and mention one another can reveal important trends. It is possible to create network maps with free and open tools that identify key people and sub-groups in any social media population with just a few key clicks. Can you make a pie chart? You can now make a network chart.
Abstract: Networks are a data structure commonly found across all social media services that allow populations to author collections of connections. The Social Media Research Foundation’s (http://www.smrfoundation.org) free and open NodeXL project (http://nodexl.codeplex.com) makes analysis of social media networks accessible to most users of the Excel spreadsheet application. With NodeXL, networks become as easy to create as pie charts. Applying the tool to a range of social media networks has already revealed the variations present in online social spaces. A review of the tool and images of Twitter, flickr, YouTube, and email networks will be presented.
We now live in a sea of tweets, posts, blogs, and updates coming from a significant fraction of the people in the connected world. Our personal and professional relationships are now made up as much of texts, emails, phone calls, photos, videos, documents, slides, and game play as by face-to-face interactions. Social media can be a bewildering stream of comments, a daunting fire hose of content. With better tools and a few key concepts from the social sciences, the social media swarm of favorites, comments, tags, likes, ratings, and links can be brought into clearer focus to reveal key people, topics and sub-communities. As more social interactions move through machine-readable data sets new insights and illustrations of human relationships and organizations become possible. But new forms of data require new tools to collect, analyze, and communicate insights.
My talk this year will focus on collecting and analyzing connections between digital objects (like users) and the insights these tools make possible.
Abstract: While digital content is archived in various ways, the “arcs” or links among people and their digital objects are not systematically saved. Efforts to store social media often overlook data about collections of connections. The Social Media Research Foundation is dedicated to open tools, open data, and open scholarship related to social media. It is producing tools that can collect, analyze and upload social media data, including the arcs that link people and objects. Using the free and open NodeXL application, users can collect, analyze and visualize complex networks and then upload the data to a growing archive on the web at NodeXLGraphGallery.org. As the group of researchers grows, an archive is being assembled to provide researchers around the world with the data about social media needed to understand the ways computer mediated communication tools shape society.
Mastering Social Media will give you practical tools on how to plan, execute and monitor your social media campaigns. Discussions will lead you through the introduction to social media marketing, understanding community dynamics, mapping social networks and applying network insights to your goals.
Brand Managers, Marketing Managers, Advertising Agencies, Digital Agencies, and PR Agencies are likely to find the day useful.
Venues and Dates
Cape Town
28 November 2011
Protea Hotel Breakwater Lodge, Waterfront
Johannesburg
30 November 2011
Gordon Institute of Business Science, Illovo