Data Management
By: Fatih • March 17, 2010
Project 1
Method
In this study, we examined Daniel Wallace's novel Big Fish, whose protagonist is Edward Bloom. We sampled 10 pages of the book and, on each page, counted the lines that began with particular types of words and particular types of letters. To select the 10 pages at random, we used the VassarStats randomizer to generate 10 random page numbers. On each selected page, we recorded the number of lines that started with a noun, a verb, an adjective, a vowel, and a consonant. For the noun and verb counts, we also split the results into lines whose first word began with a vowel and lines whose first word began with a consonant.
The data were entered into an Excel spreadsheet and then transferred to JMP IN. Each of the 10 pages was treated as one individual (one observation), so for each word or letter type we had 10 values, one per page. In JMP IN we constructed 5 histograms showing the frequency distribution of the number of lines starting with each word or letter type. We then obtained the descriptive statistics for each of the 5 data sets and summarized them in a table. The statistics included were the mean, median, maximum, minimum, upper quartile, lower quartile, 95% confidence interval for the mean, and sample size. All histograms were created, and all statistics calculated, in JMP.
Next, we entered into JMP the numbers of noun lines and verb lines that began with a vowel and with a consonant. From this data table, JMP produced a contingency table of word type (noun or verb) against initial letter (vowel or consonant), to determine whether there is a statistically significant relationship between the type of word and the type of letter it begins with. Finally, JMP performed a Pearson chi-square test on the contingency table.
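Two computational steps of the method, drawing random page numbers and running a Pearson chi-square test on a 2 × 2 contingency table, can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not what VassarStats or JMP do internally: the function names, the book's page count, and the contingency counts in the usage lines are hypothetical (the real counts are not given in this section), and we assume JMP's Pearson statistic is the uncorrected one (no Yates continuity correction). A 2 × 2 table has 1 degree of freedom, so the p-value equals erfc(sqrt(χ²/2)).

```python
import math
import random


def pick_pages(total_pages, k=10, seed=None):
    """Draw k distinct page numbers from 1..total_pages (sampling without replacement).

    total_pages is a hypothetical parameter: the book's page count is not stated.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total_pages + 1), k))


def pearson_chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table of counts.

    Rows: word type (noun, verb); columns: initial letter (vowel, consonant).
    Assumes every row and column total is positive.
    Returns (chi_square, p_value); no continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so P(X > chi_square) = erfc(sqrt(chi_square / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


if __name__ == "__main__":
    # Hypothetical 180-page book
    print(pick_pages(total_pages=180, seed=1))
    # Hypothetical counts: [[noun-vowel, noun-consonant], [verb-vowel, verb-consonant]]
    chi2, p = pearson_chi_square_2x2([[20, 10], [10, 20]])
    print(f"chi-square = {chi2:.4f}, p = {p:.4f}")
```

A p-value below the chosen significance level (commonly 0.05) would suggest that word type and initial letter are not independent; a table whose rows are proportional, such as [[10, 10], [10, 10]], gives a chi-square of 0 and a p-value of 1.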
Results
Figure 1. Histogram of lines starting with nouns on 10 pages of Daniel Wallace's Big Fish
Figure 2. Histogram of lines starting with verbs on 10 pages of Daniel Wallace's Big Fish
Figure 3. Histogram of lines starting with adjectives on 10 pages of Daniel Wallace's Big Fish
Figure 4. Histogram of lines starting with vowels on 10 pages of Daniel Wallace's Big Fish
Figure 5. Histogram of lines starting with consonants on 10 pages of Daniel Wallace's Big Fish
Table 1. Descriptive statistics for the number of lines starting with nouns, verbs, adjectives, vowels, and consonants on 10 pages of Daniel Wallace's Big Fish
                                   Number of lines starting with
Statistic                          Nouns      Verbs      Adjectives  Vowels     Consonants
Mean                               4.6        4.5        2.5         5.4        21.6
Median                             4.5        4          2.5         4.5        22.5
Upper quartile                     5          6.25       4           7.25       23
Lower quartile                     3          3          1           4          19.75
Maximum                            10         8          4           9          24
Minimum                            2          2          1           3          18
Upper 95% confidence limit (mean)  6.152689   5.90059    3.407999    6.83864    23.03864
Lower 95% confidence limit (mean)  3.047311   3.09941    1.592001    3.96136    20.16136
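The statistics in Table 1 can be recomputed from a column of 10 per-page counts. The Python sketch below is our illustration, not JMP's implementation, and rests on two assumptions: that JMP computes quartiles with the (n + 1)p interpolation rule, which is what Python's `statistics.quantiles` uses with `method="exclusive"`, and that the 95% limits are mean ± t(0.975, 9) × s/√n, with t(0.975, 9) ≈ 2.262157 hard-coded because the standard library has no t distribution. The raw per-page counts are not listed in this section, so the usage lines use a hypothetical set of vowel counts; it is not the recorded data, only a set of values that is consistent with the Vowels column of the table. The function name `describe` is also our own.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (n = 10),
# hard-coded because the standard library has no t distribution.
T_975_DF9 = 2.262157


def describe(counts):
    """Summary statistics for one column of per-page line counts (n must be 10)."""
    n = len(counts)
    if n != 10:
        raise ValueError("T_975_DF9 is only valid for n = 10")
    mean = statistics.mean(counts)
    # 'exclusive' uses the (n + 1)p rule; assumed here to match JMP's quartiles
    q1, median, q3 = statistics.quantiles(counts, n=4, method="exclusive")
    # Half-width of the t-based 95% confidence interval for the mean
    half_width = T_975_DF9 * statistics.stdev(counts) / math.sqrt(n)
    return {
        "mean": mean,
        "median": median,
        "upper_quartile": q3,
        "lower_quartile": q1,
        "maximum": max(counts),
        "minimum": min(counts),
        "upper_95_limit": mean + half_width,
        "lower_95_limit": mean - half_width,
        "n": n,
    }


if __name__ == "__main__":
    # Hypothetical per-page vowel counts consistent with the Vowels column of Table 1
    vowels = [4, 9, 3, 5, 4, 7, 4, 8, 6, 4]
    for name, value in describe(vowels).items():
        print(f"{name:>15}: {round(value, 5)}")
```

With these hypothetical counts the printed mean, median, quartiles, extremes, and 95% limits should agree with the Vowels column of Table 1 to within rounding. The same function can be applied to any other column once its 10 per-page values are available.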