Olympic Dives

x <- read.csv("http://www.stat.yale.edu/~jay/diving/Diving2000.csv", as.is=TRUE)
dim(x)

## [1] 10787    10

names(x)

##  [1] "Event"      "Round"      "Diver"      "Country"    "Rank"      
##  [6] "DiveNo"     "Difficulty" "JScore"     "Judge"      "JCountry"

Most of the variables are obvious. Country is the country of the diver, JCountry is the country of the judge. There are seven judges for each dive. Rank and DiveNo are used only within rounds of events and should be ignored (other than perhaps to observe who eventually won medals).

The legend is clunky but it’s fine for basic explorations.

There is an odd left/right division in the plot above. There is a lot of overplotting, which the jittering partly addresses. The coloring is a bit of a mess, not clearly explaining the left/right split. Let’s dig into this more…

##    Final   Prelim     Semi 
## 3.061364 2.981118 1.895745

##         
##           1.5  1.6  1.8  1.9    2  2.1  2.4  2.5  2.6  2.7  2.8  2.9    3
##   Final     0    0    0    0    0    0   49    0    0   91  189  119  553
##   Prelim    0    0    0    0    0    0  294   42   14  637 1029  532 1771
##   Semi     35  455  273  504  392  644    0    0    0    0    0    0    0
##         
##           3.1  3.2  3.3  3.4  3.5  3.6  3.8
##   Final   224  231  147  168   63   14    0
##   Prelim  840  623  413  308  105   14   14
##   Semi      0    0    0    0    0    0    0

The gap seems to be explained by semi dives on left, final & prelims on the right of the original scatterplot code provided. No semi dive was more than 2.1 difficulty. Now I’ll look at some summary stats:

## What is the average score in the competition?
mean(x$JScore)

## [1] 6.832576

# 6.832576

## What is the average score for Chinese divers (a single average)?
mean(x$JScore [x$Country=="CHN"])

## [1] 8.158986

# 8.158986

## What is the average score for American divers (a single average)?
mean(x$JScore [x$Country=="USA"])

## [1] 7.477191

# 7.477191

## Average score for all countries
tapply(x$JScore, x$Country, mean)

##      ARG      ARM      AUS      AUT      AZE      BLR      BRA      CAN 
## 4.614286 5.238095 7.302885 6.445714 6.226190 6.651786 6.391534 7.440179 
##      CHN      COL      CUB      CZE      ESP      FIN      FRA      GBR 
## 8.158986 5.903361 6.486711 5.488095 6.243243 6.458333 6.109375 6.363839 
##      GEO      GER      GRE      HKG      HUN      INA      ITA      JPN 
## 6.000000 7.212798 5.544974 4.666667 6.510823 4.473214 6.811224 7.590909 
##      KAZ      KOR      MAS      MEX      PER      PHI      PRK      PUR 
## 6.606516 5.844156 6.010204 6.913095 6.017857 5.603896 6.672131 5.831633 
##      ROM      RUS      SUI      SWE      THA      TPE      UKR      USA 
## 5.662338 7.623894 5.240260 7.647619 5.107143 5.185714 6.824580 7.477191 
##      VEN      ZIM 
## 5.934783 5.583333

# Confirms our previous averages are correct

## What about just the semi-final round average?
tapply(x$JScore, x$Round, mean)

##    Final   Prelim     Semi 
## 7.474838 6.320148 7.793747

# "Semi" mean = 7.793747

The average level of difficulty is much lower in the semi round (~1.90) compared with the prelims (2.98) or the finals (3.06). Easier dives might imply higher scores because of fewer mistakes, or lower scores in the prelims (weaker divers) while divers seeking medals try difficult dives in the Finals with more room for error.

And here is one more graph given what we know now. Now we can clearly see the differences by round. Interesting that the lower scores of semi dives are relatively high compared to the more difficult dives performed in other rounds, but from a strategy standpoint this makes sense.

Now let’s see if we can’t find some judge biases. Will use same dataset as above so no need to re-read into R. I want to see if there are statistically significant differences in the scores a judge gives to divers from their own country compared with divers from other countries, so I calculate the discrepancies.

This exercise mirrors research conducted by Yale’s Jay Emerson, who was also my professor for this class. [Highly recommend.] His short report will be in my GitHub repo, labeled “MathHorizons.pdf.”

x$match <- x$Country == x$JCountry

x$divenum <- rep(1:1541, each=7)
temp <- tapply(x$JScore, x$divenum, mean)
x$avg <- rep(temp, each=7)
x$discrepancy <- x$JScore - x$avg

Next I choose a judge to explore. Mexico’s Jesus Mena sounds like a winner. Now I want to create a graphic with two histograms comparing the distributions of the discrepancies for the matching and non-matching “groups”. I’ve included vertical lines at the group means for added clarity.

Visually interesting, but to be thorough I’ll conduct a one-sided 2-sample t-test for this judge’s bias in favor of divers from the same country (if one exists). The null hypothesis is one of no bias, obviously. We were asked to also provide a 99% confidence interval.

a <- t.test(y$discrepancy[y$match],y$discrepancy[!y$match], alternative = "greater")
b <- t.test(y$discrepancy[y$match],y$discrepancy[!y$match], conf.level = .99)

names(a)

## [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
## [6] "null.value"  "alternative" "method"      "data.name"

print(a["estimate"])

## $estimate
##   mean of x   mean of y 
##  0.24744898 -0.05719462

0.24744898 - -0.05719462 # = 0.3046436

## [1] 0.3046436

print(a["p.value"])

## $p.value
## [1] 0.0001313695

names(b)

## [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
## [6] "null.value"  "alternative" "method"      "data.name"

print(b["conf.int"])

## $conf.int
## [1] 0.1025203 0.5067669
## attr(,"conf.level")
## [1] 0.99

# a) estimated size of bias: 0.3046436
# b) one-sided test p-value: 0.0001313695
# c) 99% confidence interval: 0.1025203 0.5067669
# Discrepencies are significant!

But before we get mad at Judge Jesus Mena, let’s see if some of his fellow judges show any bias toward their respective compatriots. Here I also look at judges from Russia, Germany and the U.S.

I originally made the resulting graphs a .pdf, but for the sake of the website I’ll keep them as R plots so that they’re visible online. Just uncomment the next line and the dev.off() at the end of the loop if you want your own .pdf. Example will be posted in my repo.

Visually we see these judges follow the same pattern: countrymen get a bump in score compared with non-countrymen. The statistics bear this out as well.

But it is important to keep in mind: this may not be intentional bias; perhaps each judge above has implicit desires for their fellow citizens to do well. Regardless, we now have data that supports a nationalistic bias in Olympic diving judges, at least at the 2000 games.

Olympic Dives

Kat Devlin

2017年9月22日