Applying Social Network Analysis to the Information in CVS Repositories
By: Janna • Research Paper • 2,398 Words • November 25, 2009 • 1,721 Views
Abstract
The huge quantities of data available in the CVS repositories
of large, long-lived libre (free, open source) software
projects, and the many interrelationships among those data,
offer opportunities for extracting large amounts of valuable
information about their structure, evolution and internal
processes. Unfortunately, the sheer volume of that information
renders it almost unusable without applying methodologies
which highlight the relevant information for a given
aspect of the project. In this paper, we propose the use of
a well-known set of methodologies (social network analysis)
for characterizing libre software projects, their evolution
over time and their internal structure. In addition,
we show how we have applied such methodologies to real
cases, and draw some preliminary conclusions from that
experience.
Keywords: source code repositories, visualization techniques,
complex networks, libre software engineering
1 Introduction
The study and characterization of complex systems is an
active research area, with many interesting open problems.
Special attention has been paid recently to techniques based
on network analysis, thanks to their power to capture some
important characteristics and relationships. Network characterization
is widely used in many scientific and technological
disciplines, ranging from neurobiology [14] to computer
networks [1] [3] or linguistics [9] (to mention just
some examples). In this paper we apply this kind of analysis
to software projects, using as a base the data available in
their source code versioning repository (usually CVS). Fortunately,
most large (both in code size and number of developers)
libre (free, open source) software projects maintain
such repositories, and grant public access to them.
The information in the CVS repositories of libre software
projects has been gathered and analyzed using several
methodologies [12] [5], but many other approaches are still
possible. Among them, we explore here how to apply some
techniques already common in traditional (social) network
analysis. The proposed approach is based on considering
either modules (usually CVS directories) or developers
(committers to the CVS) as vertices, and the number of common
commits as the weight of the link between any two vertices
(see section 3 for a more detailed definition). This way,
we end up with a weighted graph which captures some relationships
between developers or modules, in which characteristics
such as information flow or communities can be studied.
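
As a minimal sketch of this construction, the following Python fragment
builds the developer graph from a list of commit records. Several details
are assumptions made only for illustration: the input is taken to be
(developer, module) pairs already extracted from the repository log, the
function name committer_network and the sample data are hypothetical, and
"common commits" is read here as the total number of commits made by both
developers to the modules they share. Section 3 gives the definition
actually used, which may differ from this reading.

```python
from collections import Counter, defaultdict
from itertools import combinations


def committer_network(commits):
    """Build a weighted developer graph from (developer, module) records.

    Returns a dict mapping a sorted (dev_a, dev_b) pair to the edge weight.
    Assumption: the weight is the number of commits both developers made
    to the modules they have in common.
    """
    # commits_per_module[module][developer] -> number of commits
    commits_per_module = defaultdict(Counter)
    for developer, module in commits:
        commits_per_module[module][developer] += 1

    weights = Counter()
    for per_developer in commits_per_module.values():
        # Link every pair of developers who committed to this module.
        for a, b in combinations(sorted(per_developer), 2):
            weights[(a, b)] += per_developer[a] + per_developer[b]
    return dict(weights)


# Hypothetical sample data, not taken from any real repository.
sample = [
    ("alice", "core"), ("alice", "core"), ("bob", "core"),
    ("bob", "docs"), ("carol", "docs"), ("alice", "docs"),
]
network = committer_network(sample)
for pair, weight in sorted(network.items()):
    print(pair, weight)
```

Swapping the two fields of each record yields the module graph instead,
with modules as vertices and the commits of shared developers as weights.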
There have been some other works analyzing social networks
in the libre software world. [7] hypothesizes that the
organization of libre software projects can be modeled as
self-organizing social networks and shows that this seems
to be true at least when studying SourceForge projects.
[6] also proposes a sort of network analysis for libre software
projects, but considering