Research Methods
By: ebiotoadese • May 3, 2011
Accountability Modules
Data Analysis: Describing Data - Frequency Distribution
Texas State Auditor's Office, Methodology Manual, rev. 5/95
WHAT IT IS
Frequency distributions summarize and compress data by grouping it into classes and recording how many data points fall into each class. That is, they show how many observations of a given variable have a particular attribute. For example, suppose 50 people are surveyed about their favorite color. The frequency distribution might show that 15 people selected green, 12 blue, 6 red, 7 yellow, and 10 purple. Converting these raw counts into percentages (green, for instance, is 15 of 50, or 30 percent) provides an even more useful description of the data.
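The favorite-color example can be reproduced in a few lines of Python. This is a minimal sketch: the individual responses are reconstructed from the counts given above, and the helper name `frequency_distribution` is hypothetical.

```python
from collections import Counter

# Hypothetical raw responses rebuilt from the example's counts:
# 15 green, 12 blue, 6 red, 7 yellow, 10 purple (50 people in total).
responses = (["green"] * 15 + ["blue"] * 12 + ["red"] * 6
             + ["yellow"] * 7 + ["purple"] * 10)

def frequency_distribution(values):
    """Return (class, count, percent) rows, most frequent class first."""
    counts = Counter(values)
    total = len(values)
    return [(cls, n, 100.0 * n / total) for cls, n in counts.most_common()]

for color, count, pct in frequency_distribution(responses):
    print(f"{color:<8}{count:>4}{pct:>8.1f}%")
```

Sorting by frequency puts the dominant classes first, which is usually what a reader of a nominal distribution wants to see. For an ordered scale, the classes would instead be listed in their natural order.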
The frequency distribution is the foundation of descriptive statistics. It is a prerequisite both for the various graphs used to display data and for the basic statistics used to describe a data set: mean, median, mode, variance, standard deviation, and so forth. Frequency distributions are generally used to describe nominal and interval data, though they can also describe ordinal data.
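To illustrate how the basic statistics follow from a frequency distribution, the sketch below computes them directly from value/count pairs without expanding the raw data. The distribution is hypothetical. The variance uses the sample (n - 1) divisor, which is an assumption; a population variance would divide by n instead.

```python
import math

# Hypothetical frequency distribution of a numeric variable:
# each key is a value, each entry the number of observations with that value.
freq = {1: 4, 2: 7, 3: 5, 4: 3, 5: 1}

def value_at_rank(freq, rank):
    """Return the value at a 1-based rank in the sorted data,
    found by accumulating class frequencies."""
    cumulative = 0
    for value in sorted(freq):
        cumulative += freq[value]
        if cumulative >= rank:
            return value
    raise ValueError("rank exceeds number of observations")

def describe(freq):
    """Basic descriptive statistics computed from value/frequency pairs."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    # Average the two middle ranks (they are the same rank when n is odd).
    median = (value_at_rank(freq, (n + 1) // 2)
              + value_at_rank(freq, n // 2 + 1)) / 2
    # First value, in dictionary order, with the highest frequency.
    mode = max(freq, key=freq.get)
    # Sample variance: frequency-weighted squared deviations over n - 1.
    variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (n - 1)
    return {"n": n, "mean": mean, "median": median, "mode": mode,
            "variance": variance, "std_dev": math.sqrt(variance)}

print(describe(freq))
```

For this distribution, the 20 observations have a mean of 2.5, and both the median and the mode are 2. Every figure comes from the counts alone, which is why the frequency distribution is built first.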
WHEN TO USE IT
A frequency distribution should be constructed for virtually every data set. Frequency distributions are especially useful whenever a broad, easily understood description of how data are concentrated and spread is needed. Most data provided by third parties are grouped into a frequency distribution.
HOW TO PREPARE IT
Regardless of whether manual or automated methods are used to prepare a frequency distribution, it is usually necessary to code data numerically to facilitate further analysis. Numeric coding in turn requires a data dictionary that defines the codes used to identify each data category. For example, assume that an auditor/evaluator wants to classify both demographic data and entity staff's opinions of a particular policy. A data dictionary for use with computer software might resemble the following:
Variable Name    Code                       Field Width    Field Type
Division         Actual division name       20             Alphanumeric
Age              Age in years               3              Numeric
Gender           1 = Male                   1              Numeric
                 2 = Female
Salary Range     1 = $0 - 20,000            5              Numeric
                 2 = $20,000 - 30,000
                 3 = $30,000 - 50,000
                 4 = Over $50,000
Policy Opinion   1 = Excellent              1              Numeric
                 2 = Good
                 3 = Fair
                 4 = Poor
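A data dictionary maps naturally onto a lookup table in code. The sketch below is illustrative only. The Python key names (`gender`, `policy_opinion`), the `encode`/`decode` helpers, and the survey answers are all hypothetical, and only the codes and labels come from the dictionary above. Division and Age are left out because they are recorded as actual values rather than codes.

```python
from collections import Counter

# Coded variables from the data dictionary, as code -> label maps.
# Key names are hypothetical; codes and labels follow the dictionary.
DATA_DICTIONARY = {
    "gender": {1: "Male", 2: "Female"},
    "policy_opinion": {1: "Excellent", 2: "Good", 3: "Fair", 4: "Poor"},
}

def encode(variable, label):
    """Translate a category label into its numeric code.
    Raises KeyError for a label the dictionary does not define."""
    reverse = {text: code for code, text in DATA_DICTIONARY[variable].items()}
    return reverse[label]

def decode(variable, code):
    """Translate a numeric code back into its category label."""
    return DATA_DICTIONARY[variable][code]

# Hypothetical staff responses to the policy question.
answers = ["Good", "Fair", "Good", "Excellent", "Poor", "Good", "Fair"]
codes = [encode("policy_opinion", a) for a in answers]
counts = Counter(codes)

# Report every defined category, including any with zero responses.
for code in sorted(DATA_DICTIONARY["policy_opinion"]):
    print(code, decode("policy_opinion", code), counts[code])
```

Driving the report from the dictionary rather than from the data means that a category nobody chose still appears with a count of zero. Rejecting undefined labels also catches coding errors before any counts are made.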
It is also necessary to determine how many classes to use for the frequency distribution. Selecting the number of classes is not as arbitrary as it may first appear. If data are nominal, simply list all possible classes (i.e., categories) into which a data point might fall. If data are interval, the table below can serve as a rule of thumb:
Number of Observations    Number of Classes
Under 50                  5 - 7
50 - 200                  7 - 9
200 - 500                 9 - 10
500 - 1,000               10 - 11
1,000 - 5,000             11 - 13
5,000 - 50,000            13 - 17
Over 50,000               17 - 20
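The rule of thumb and the grouping step can be sketched together, under two labeled assumptions. First, the table's ranges share endpoints (50 appears in two rows, for example), so this sketch treats each row as covering counts strictly below its upper bound. Second, this excerpt does not say how class limits are drawn, so the sketch uses equal-width classes spanning the observed minimum and maximum. The helper names and the ages are hypothetical.

```python
import math

# Rule-of-thumb table: (exclusive upper bound on observations, min, max classes).
# ASSUMPTION: a shared endpoint such as 50 is assigned to the higher row.
CLASS_RULES = [
    (50, 5, 7),
    (200, 7, 9),
    (500, 9, 10),
    (1_000, 10, 11),
    (5_000, 11, 13),
    (50_000, 13, 17),
    (math.inf, 17, 20),
]

def suggested_classes(n_observations):
    """Return the (min, max) number of classes suggested for n observations."""
    for upper, low, high in CLASS_RULES:
        if n_observations < upper:
            return low, high

def equal_width_classes(data, k):
    """Group interval data into k equal-width classes (an ASSUMED method).
    The top class includes the maximum value.
    Returns a list of (lower limit, upper limit, count)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1  # fall back to width 1 if all values are equal
    counts = [0] * k
    for x in data:
        index = min(int((x - lo) / width), k - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(k)]

# Hypothetical ages of 30 staff members.
ages = [23, 25, 27, 29, 31, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43,
        44, 45, 46, 47, 48, 50, 51, 52, 54, 55, 57, 58, 60, 62, 64]

low, high = suggested_classes(len(ages))
for lower, upper, count in equal_width_classes(ages, low):
    print(f"{lower:5.1f} - {upper:5.1f}: {count}")
```

With 30 observations the table suggests 5 to 7 classes, and the sketch uses the lower figure. In practice, analysts often round the class width to a convenient number so that the class limits are easier to read.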
If