x <- read.csv("http://www.stat.yale.edu/~jay/diving/Diving2000.csv", as.is=TRUE)
dim(x)
## [1] 10787 10
names(x)
## [1] "Event" "Round" "Diver" "Country" "Rank"
## [6] "DiveNo" "Difficulty" "JScore" "Judge" "JCountry"
Most of the variables are obvious. Country is the country of the diver, JCountry is the country of the judge. There are seven judges for each dive. Rank and DiveNo are used only within rounds of events and should be ignored (other than perhaps to observe who eventually won medals).
The legend is clunky but it’s fine for basic explorations.
There is an odd left/right division in the plot above. There is a lot of overplotting, which the jittering partly addresses. The coloring is a bit of a mess, not clearly explaining the left/right split. Let’s dig into this more…
## Final Prelim Semi
## 3.061364 2.981118 1.895745
##
## 1.5 1.6 1.8 1.9 2 2.1 2.4 2.5 2.6 2.7 2.8 2.9 3
## Final 0 0 0 0 0 0 49 0 0 91 189 119 553
## Prelim 0 0 0 0 0 0 294 42 14 637 1029 532 1771
## Semi 35 455 273 504 392 644 0 0 0 0 0 0 0
##
## 3.1 3.2 3.3 3.4 3.5 3.6 3.8
## Final 224 231 147 168 63 14 0
## Prelim 840 623 413 308 105 14 14
## Semi 0 0 0 0 0 0 0
The gap seems to be explained by semi dives on left, final & prelims on the right of the original scatterplot code provided. No semi dive was more than 2.1 difficulty. Now I’ll look at some summary stats:
## What is the average score in the competition?
mean(x$JScore)
## [1] 6.832576
# 6.832576
## What is the average score for Chinese divers (a single average)?
mean(x$JScore [x$Country=="CHN"])
## [1] 8.158986
# 8.158986
## What is the average score for American divers (a single average)?
mean(x$JScore [x$Country=="USA"])
## [1] 7.477191
# 7.477191
## Average score for all countries
tapply(x$JScore, x$Country, mean)
## ARG ARM AUS AUT AZE BLR BRA CAN
## 4.614286 5.238095 7.302885 6.445714 6.226190 6.651786 6.391534 7.440179
## CHN COL CUB CZE ESP FIN FRA GBR
## 8.158986 5.903361 6.486711 5.488095 6.243243 6.458333 6.109375 6.363839
## GEO GER GRE HKG HUN INA ITA JPN
## 6.000000 7.212798 5.544974 4.666667 6.510823 4.473214 6.811224 7.590909
## KAZ KOR MAS MEX PER PHI PRK PUR
## 6.606516 5.844156 6.010204 6.913095 6.017857 5.603896 6.672131 5.831633
## ROM RUS SUI SWE THA TPE UKR USA
## 5.662338 7.623894 5.240260 7.647619 5.107143 5.185714 6.824580 7.477191
## VEN ZIM
## 5.934783 5.583333
# Confirms our previous averages are correct
## What about just the semi-final round average?
tapply(x$JScore, x$Round, mean)
## Final Prelim Semi
## 7.474838 6.320148 7.793747
# "Semi" mean = 7.793747
The average level of difficulty is much lower in the semi round (~1.90) compared with the prelims (2.98) or the finals (3.06). Easier dives might imply higher scores because of fewer mistakes, or lower scores in the prelims (weaker divers) while divers seeking medals try difficult dives in the Finals with more room for error.
And here is one more graph given what we know now. Now we can clearly see the differences by round. Interesting that the lower scores of semi dives are relatively high compared to the more difficult dives performed in other rounds, but from a strategy standpoint this makes sense.
Now let’s see if we can’t find some judge biases. Will use same dataset as above so no need to re-read into R. I want to see if there are statistically significant differences in the scores a judge gives to divers from their own country compared with divers from other countries, so I calculate the discrepancies.
This exercise mirrors research conducted by Yale’s Jay Emerson, who was also my professor for this class. [Highly recommend.] His short report will be in my GitHub repo, labeled “MathHorizons.pdf.”
x$match <- x$Country == x$JCountry
x$divenum <- rep(1:1541, each=7)
temp <- tapply(x$JScore, x$divenum, mean)
x$avg <- rep(temp, each=7)
x$discrepancy <- x$JScore - x$avg
Next I choose a judge to explore. Mexico’s Jesus Mena sounds like a winner. Now I want to create a graphic with two histograms comparing the distributions of the discrepancies for the matching and non-matching “groups”. I’ve included vertical lines at the group means for added clarity.
Visually interesting, but to be thorough I’ll conduct a one-sided 2-sample t-test for this judge’s bias in favor of divers from the same country (if one exists). The null hypothesis is one of no bias, obviously. We were asked to also provide a 99% confidence interval.
a <- t.test(y$discrepancy[y$match],y$discrepancy[!y$match], alternative = "greater")
b <- t.test(y$discrepancy[y$match],y$discrepancy[!y$match], conf.level = .99)
names(a)
## [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
## [6] "null.value" "alternative" "method" "data.name"
print(a["estimate"])
## $estimate
## mean of x mean of y
## 0.24744898 -0.05719462
0.24744898 - -0.05719462 # = 0.3046436
## [1] 0.3046436
print(a["p.value"])
## $p.value
## [1] 0.0001313695
names(b)
## [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
## [6] "null.value" "alternative" "method" "data.name"
print(b["conf.int"])
## $conf.int
## [1] 0.1025203 0.5067669
## attr(,"conf.level")
## [1] 0.99
# a) estimated size of bias: 0.3046436
# b) one-sided test p-value: 0.0001313695
# c) 99% confidence interval: 0.1025203 0.5067669
# Discrepencies are significant!
But before we get mad at Judge Jesus Mena, let’s see if some of his fellow judges show any bias toward their respective compatriots. Here I also look at judges from Russia, Germany and the U.S.
I originally made the resulting graphs a .pdf, but for the sake of the website I’ll keep them as R plots so that they’re visible online. Just uncomment the next line and the dev.off() at the end of the loop if you want your own .pdf. Example will be posted in my repo.
Visually we see these judges follow the same pattern: countrymen get a bump in score compared with non-countrymen. The statistics bear this out as well.
But it is important to keep in mind: this may not be intentional bias; perhaps each judge above has implicit desires for their fellow citizens to do well. Regardless, we now have data that supports a nationalistic bias in Olympic diving judges, at least at the 2000 games.