Learning objectives

- Learn to enjoy EDA adventures
Exploratory Data Analysis, or EDA, is an approach to data analysis that allows the data analyst to explore data and identify hypotheses or additional questions to test. In the book, R for Data Science, EDA is described as an iterative cycle where you:
- Generate questions about your data.
- Search for answers by visualizing, transforming, and modeling your data.
- Use what you learn to refine your questions and/or generate new questions for communication.
This process can be applied to any data, and is foundational to data science. Ultimately it is how we understand and then communicate our data.
Illustration by @allison_horst for Dr. Julia Lowndes’ useR!2019 keynote.
In previous modules, we’ve covered the building blocks to perform EDA in R, and in this module, we’re going to bring it all together and perform EDA on the groundwater measurements dataset[1] and CalEnviroScreen 3.0 data.[2]
We will focus on generating questions and answering them through visualization.[3]
Our dataset contains observations of groundwater level measurements at monitoring stations throughout Sacramento County, and these observations have been spatially joined by census tract to CalEnviroScreen 3.0 (CES) scores.
To provide context for our analysis, the groundwater elevation dataset is based on measurements of the depth to groundwater in an aquifer system. These measurements are taken at individual wells (Figure 2).
Figure 2: Wells at different locations and depths in an aquifer (from Wikimedia Commons).
To begin, let’s ask a few general questions (each is explored in turn below):

- What well uses are most common, and where are these wells located?
- How deep are wells of different uses?
- How have groundwater levels changed over the period of record?
- How do groundwater level changes relate to CalEnviroScreen (CES) scores?
We will use the following packages in our EDA, so we load them now.
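A minimal sketch of that setup ({glue}, {mapview}, and {colormap} are loaded later in the module, where they’re first used):

library(tidyverse) # dplyr, ggplot2, readr, forcats, and friends
library(sf)        # vector spatial data
library(patchwork) # combining ggplots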
First we need to load our data, which we created in the module on joins and binds, and then inspect what’s there.
# groundwater level measurements joined to stations, perforations, and CES data
# load imports an object named "gwl"
load("data/sacramento_gw_data_w_calenviro.rda")
# alternatively we can specify an object name when using an .rds file
# gwl <- read_rds("data/sacramento_gw_data_w_calenviro.rds")
# check class, dim, names to refresh memory
class(gwl)
[1] "tbl_df" "tbl" "data.frame"
dim(gwl)
[1] 31735 91
names(gwl)
[1] "STN_ID"
[2] "SITE_CODE"
[3] "SWN"
[4] "WELL_NAME"
[5] "LATITUDE"
[6] "LONGITUDE"
[7] "WLM_METHOD"
[8] "WLM_ACC"
[9] "BASIN_CODE"
[10] "BASIN_NAME"
[11] "COUNTY_NAME"
[12] "WELL_DEPTH"
[13] "WELL_USE"
[14] "WELL_TYPE"
[15] "WCR_NO"
[16] "TOP_PRF"
[17] "BOT_PRF"
[18] "WLM_ID"
[19] "MSMT_DATE"
[20] "WLM_RPE"
[21] "WLM_GSE"
[22] "RDNG_WS"
[23] "RDNG_RP"
[24] "WSE"
[25] "RPE_WSE"
[26] "GSE_WSE"
[27] "WLM_QA_DESC"
[28] "WLM_DESC"
[29] "WLM_ACC_DESC"
[30] "WLM_ORG_ID"
[31] "WLM_ORG_NAME"
[32] "MSMT_CMT"
[33] "COOP_AGENCY_ORG_ID"
[34] "COOP_ORG_NAME"
[35] "tract"
[36] "Total Population"
[37] "California County"
[38] "ZIP"
[39] "Nearby City \n(to help approximate location only)"
[40] "Longitude"
[41] "Latitude"
[42] "CES 3.0 Score"
[43] "CES 3.0 Percentile"
[44] "CES 3.0 \nPercentile Range"
[45] "SB 535 Disadvantaged Community"
[46] "Ozone"
[47] "Ozone Pctl"
[48] "PM2.5"
[49] "PM2.5 Pctl"
[50] "Diesel PM"
[51] "Diesel PM Pctl"
[52] "Drinking Water"
[53] "Drinking Water Pctl"
[54] "Pesticides"
[55] "Pesticides Pctl"
[56] "Tox. Release"
[57] "Tox. Release Pctl"
[58] "Traffic"
[59] "Traffic Pctl"
[60] "Cleanup Sites"
[61] "Cleanup Sites Pctl"
[62] "Groundwater Threats"
[63] "Groundwater Threats Pctl"
[64] "Haz. Waste"
[65] "Haz. Waste Pctl"
[66] "Imp. Water Bodies"
[67] "Imp. Water Bodies Pctl"
[68] "Solid Waste"
[69] "Solid Waste Pctl"
[70] "Pollution Burden"
[71] "Pollution Burden Score"
[72] "Pollution Burden Pctl"
[73] "Asthma"
[74] "Asthma Pctl"
[75] "Low Birth Weight"
[76] "Low Birth Weight Pctl"
[77] "Cardiovascular Disease"
[78] "Cardiovascular Disease Pctl"
[79] "Education"
[80] "Education Pctl"
[81] "Linguistic Isolation"
[82] "Linguistic Isolation Pctl"
[83] "Poverty"
[84] "Poverty Pctl"
[85] "Unemployment"
[86] "Unemployment Pctl"
[87] "Housing Burden"
[88] "Housing Burden Pctl"
[89] "Pop. Char."
[90] "Pop. Char. Score"
[91] "Pop. Char. Pctl"
We might want to View(gwl) to refresh our memory of these data. Remember that when View()ing data in RStudio, only 50 columns are shown at a time, so you need to click the arrows in the top navigation bar to move between sets of 50 columns.
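As an alternative for wide data like this (an aside, not part of the original workflow), dplyr’s glimpse() prints every column with its type and a preview of values:

# print every column with its type and the first few values
glimpse(gwl)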
It appears that we have two columns with “County” information, so let’s drop the column from CES and keep the one from the groundwater level dataset.
gwl <- select(gwl, -`California County`)
Finally, let’s keep in mind the structure of this dataset: it contains many samples per well through time (each SITE_CODE is repeated across multiple MSMT_DATE observations).
What well uses are most common? We can answer this with table(gwl$WELL_USE), but let’s do the same thing with {dplyr} functions.
gwl %>%
  count(WELL_USE) %>% # group by well use and summarize the count
  arrange(desc(n))    # sort in decreasing order
# A tibble: 8 × 2
WELL_USE n
<chr> <int>
1 Irrigation 11539
2 Unknown 6813
3 Residential 5863
4 Observation 4170
5 Other 2435
6 Stockwatering 497
7 <NA> 248
8 Industrial 170
We could have come to this same table by way of a plot.
gwl %>%
  ggplot(aes(WELL_USE)) +
  geom_bar() +
  coord_flip()
Let’s clean things up a bit and make this plot more visually appealing.
p1 <- gwl %>%
  count(WELL_USE) %>%          # group by well use and summarise the count
  arrange(desc(n)) %>%         # sort in decreasing order
  filter(!is.na(WELL_USE)) %>% # remove NA well uses
  ggplot(aes(fct_reorder(WELL_USE, n), n)) + # reorder well use by n
  geom_col(aes(fill = WELL_USE)) + # use column geometry
  coord_flip() +                   # flip x and y axes
  theme_classic() +                # use a theme
  labs(title = "Monitoring well use",
       subtitle = "Sacramento County",
       x = "", y = "Count") +
  guides(fill = "none") # remove the fill legend
p1
Understanding where wells are located is a spatial question. Luckily, these data contain spatial information we can use to make a map (see our last module for how to convert a dataframe to an sf object). We know the well coordinates are in the NAD83 coordinate reference system (EPSG 4269), so we can convert this gwl object to an {sf} object class. We will also import a Sacramento County polygon shapefile for plotting.
# convert gwl from dataframe to sf
gwl <- st_as_sf(gwl,
                coords = c("LONGITUDE", "LATITUDE"), # note x goes first
                crs = 4269,         # geographic CRS, NAD83
                remove = FALSE) %>% # don't remove lat/lon columns
  st_transform(3310) # transform to projected CRS, NAD83 / California Albers
# verify transformation worked
class(gwl)
[1] "sf" "tbl_df" "tbl" "data.frame"
# also read in the Sacramento county shapefile for plotting
# and transform it to the same crs as gwl
sac <- st_read("data/shp/sac/sac_county.shp") %>%
  st_transform(st_crs(gwl))
Reading layer `sac_county' from data source
`/Users/richpauloo/Documents/GitHub/r4wrds/intro/data/shp/sac/sac_county.shp'
using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 9 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -13565710 ymin: 4582007 xmax: -13472670 ymax: 4683976
Projected CRS: WGS 84 / Pseudo-Mercator
Let’s plot the well use in Sacramento County, similar to what we did in the previous mapmaking module. Note, because we have many measurements per well, we need to reduce these data to a distinct() list of SITE_CODEs. There are several ways to do this, but one option is to leverage how group_by() works and slice() just one observation from each group, giving us a unique list of sites.
# because we're only plotting location, which doesn't change between
# measurements, we slice the first observation per SITE_CODE
gwl_minimal <- gwl %>%
  filter(!is.na(WELL_USE)) %>% # remove wells without a well use
  group_by(SITE_CODE) %>%      # take first well per site code
  slice(1)
# check number of stations, should be n=486
length(unique(gwl_minimal$SITE_CODE))
[1] 486
# map
p2 <- ggplot() +
  geom_sf(data = sac) +
  geom_sf(data = gwl_minimal, aes(color = WELL_USE), alpha = 0.4) +
  facet_wrap(~WELL_USE) +
  guides(color = "none") +
  theme_void()
p2
The {patchwork} R package is great for combining plots. We can combine the previous 2 plots with a simple “+” once patchwork is loaded, as shown below.
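For example (a sketch; p1 and p2 are the plots created above):

# combine the bar chart (p1) and the faceted map (p2)
p1 + p2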
These plots tell us a lot about the distribution of monitoring wells in Sacramento County. For instance, they show that irrigation and residential wells are among the most common known well uses. Interestingly, a substantial number of wells have an unknown use. Irrigation and residential wells appear collocated.
Challenge 1: Grouped summary

- What is the mean number of samples per SITE_CODE per WELL_USE?
- Can you express the number of samples per SITE_CODE per WELL_USE as a boxplot?

# group by the site code and well use and count
count(gwl, SITE_CODE, WELL_USE) %>%
  pull(n) %>%
  mean()
[1] 64.24089
# express as boxplot of number of samples per site code, at each well use
count(gwl, SITE_CODE, WELL_USE) %>%
  ggplot(aes(WELL_USE, n)) +
  geom_boxplot() +
  # limit y axis scale to focus on main bulk of distribution
  coord_cartesian(ylim = c(0, 250))
Now that we understand a bit about the relative proportion and spatial distribution of wells, let’s compare total completed depths (WELL_DEPTH), which measure how deep each well is and, all else being equal, relate to a well’s ability to access groundwater.
We made a plot that explored these trends in the previous mapmaking module.
We have the spatial distribution of these values, but now let’s summarize the distribution of well depth values themselves.
gwl_minimal %>%
  ggplot(aes(WELL_USE, WELL_DEPTH)) +
  geom_boxplot() +
  coord_flip(ylim = c(0, 1000)) # zoom in on main data distribution
It is clear that irrigation wells tend to be much deeper than residential, observation, and stock watering wells. There’s about a 140 foot difference between median irrigation and residential well depth. We can calculate this exact difference as follows:
# median well depths
median_well_depths <- filter(gwl_minimal,
                             WELL_USE %in% c("Residential", "Irrigation")) %>%
  group_by(WELL_USE) %>%
  summarize(med_depth = median(WELL_DEPTH, na.rm = TRUE)) %>%
  st_drop_geometry() # don't need spatial data here, drop geometry column
median_well_depths
# A tibble: 2 × 2
WELL_USE med_depth
* <chr> <dbl>
1 Irrigation 317
2 Residential 180
# difference of median well depths
diff(median_well_depths$med_depth)
[1] -137
Pause and think
Would we get a different boxplot if we passed in gwl instead of gwl_minimal? Which is correct to use, and why?
We would indeed get a different result if we passed in gwl instead of gwl_minimal. Recall that gwl has 31735 rows but only 494 unique SITE_CODEs. If we pass in gwl instead of gwl_minimal, we’re computing a boxplot on duplicate values of WELL_DEPTH for each SITE_CODE, where the number of samples per individual well can influence the computed summary statistics. It’s correct to use gwl_minimal because there’s only one WELL_DEPTH for each SITE_CODE. This is easier to visualize than explain. Note the subtle difference in the boxplots.
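A sketch of that comparison, combining both boxplots with {patchwork}:

# well depth boxplots from gwl (duplicated depths) vs. gwl_minimal
p_all <- ggplot(gwl, aes(WELL_USE, WELL_DEPTH)) +
  geom_boxplot() +
  coord_flip(ylim = c(0, 1000)) +
  labs(title = "gwl: all measurements")

p_one <- ggplot(gwl_minimal, aes(WELL_USE, WELL_DEPTH)) +
  geom_boxplot() +
  coord_flip(ylim = c(0, 1000)) +
  labs(title = "gwl_minimal: one row per site")

p_all + p_one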
We can also look at raw numbers to spot differences in the median values computed from gwl and gwl_minimal.
# demonstrate differences in median well depth
gwl %>%
  group_by(WELL_USE) %>%
  summarise(med = median(WELL_DEPTH, na.rm = TRUE)) %>%
  st_drop_geometry()
# A tibble: 8 × 2
WELL_USE med
* <chr> <dbl>
1 Industrial 85
2 Irrigation 310
3 Observation 250
4 Other 440
5 Residential 185
6 Stockwatering 191
7 Unknown 205
8 <NA> 210
gwl_minimal %>%
  group_by(WELL_USE) %>%
  summarise(med = median(WELL_DEPTH, na.rm = TRUE)) %>%
  st_drop_geometry()
# A tibble: 7 × 2
WELL_USE med
* <chr> <dbl>
1 Industrial 248.
2 Irrigation 317
3 Observation 140.
4 Other 420
5 Residential 180
6 Stockwatering 175
7 Unknown 166
This is all to highlight that it’s important to remember what data you’re feeding into functions. Many a nightmarish bug has been caused by the data analyst thinking their data is in one form, when it’s actually in another!
We’ve been working mostly with station data above for the 494 unique stations. In fact, we could have performed most of our analyses above using only the stations.csv file we saw in previous modules. Now we drill down into the groundwater data itself, which contains many more observations (n = 31735).
Let’s start by plotting all depth to groundwater measurements per site. Recall that GSE_WSE is the depth to groundwater in feet below land surface. We need to add the group aesthetic here to group observations by the station, or SITE_CODE.
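A sketch of that plot (lines overplot heavily, so we lower the alpha):

gwl %>%
  ggplot(aes(MSMT_DATE, GSE_WSE, group = SITE_CODE)) +
  geom_line(alpha = 0.3)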
It generally appears that depth to groundwater is increasing over time (groundwater depletion), but a few clearly erroneous values > 250 feet[4] are impairing the plot. Let’s remove these values and re-plot.
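One way to do this is a simple filter() at the 250 foot threshold (a sketch; the threshold comes from the footnote), creating the gwl_filt object used in the rest of this module:

# remove clearly erroneous depth to groundwater values
gwl_filt <- filter(gwl, GSE_WSE <= 250)

# re-plot without the erroneous values
gwl_filt %>%
  ggplot(aes(MSMT_DATE, GSE_WSE, group = SITE_CODE)) +
  geom_line(alpha = 0.3)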
This is better, but still hard to discern individual trends over time. Let’s facet by well use to see if we can address this, and color each line by the SITE_CODE. Note, we turn the legend off with show.legend = FALSE because there are hundreds of stations, and a legend for each one would overwhelm the plot.
gwl_filt %>%
  ggplot(aes(MSMT_DATE, GSE_WSE, group = SITE_CODE, color = SITE_CODE)) +
  geom_line(alpha = 0.5, show.legend = FALSE) +
  facet_wrap(~WELL_USE)
Let’s tie this back to our spatial data to answer the more specific question, “which areas have experienced the largest drop in groundwater levels over their historical period of record?” Here are a few possible ways to constrain this analysis to make interpretation easier (these match the filters in the code that follows):

- Consider only Residential and Irrigation wells.
- Require a period of record that begins on or before 1980-01-01.
- Require at least 30 samples per site.
First we need to filter() to the SITE_CODEs that meet our constraints.
# find SITE_CODEs that meet well use and time constraints
ids_use_time <- gwl_filt %>%
  filter(WELL_USE %in% c("Residential", "Irrigation"),
         MSMT_DATE <= "1980-01-01") %>%
  pull(SITE_CODE) %>%
  unique()

# ids that meet time constraints and sample constraints
ids_time_samp <- gwl_filt %>%
  filter(SITE_CODE %in% ids_use_time) %>%
  count(SITE_CODE) %>%
  filter(n >= 30) %>%
  pull(SITE_CODE)
# total number of well station ids that meet constraints
ids_time_samp %>% length()
[1] 74
# the time span these data cover
gwl_filt %>%
  filter(SITE_CODE %in% ids_time_samp) %>%
  pull(MSMT_DATE) %>%
  range()
[1] "1942-08-05 UTC" "2020-09-17 UTC"
Of the 415 site codes, 74 (about 15%) meet our constraints, with observations starting as early as 1942.
These IDs represent long-term monitoring sites that meet our constraints. We can use them to filter our gwl measurement data and calculate the groundwater level change over the period of record.
gwl_diff <- gwl_filt %>%
  # use only SITE_CODE that occur in ids_time_samp
  filter(SITE_CODE %in% ids_time_samp) %>%
  group_by(SITE_CODE, WELL_USE) %>% # for each site code and well use type
  arrange(MSMT_DATE) %>%            # arrange dates in ascending order
  summarise(t1 = first(MSMT_DATE),        # first date
            t2 = last(MSMT_DATE),         # last date
            gse_wse_t1 = first(GSE_WSE),  # first gwl measurement
            gse_wse_t2 = last(GSE_WSE)) %>% # last gwl measurement
  mutate(diff = gse_wse_t2 - gse_wse_t1) # diff btwn last and first gwl
# preview result
gwl_diff %>%
  select(SITE_CODE, t1, t2, diff) %>%
  head()
Simple feature collection with 6 features and 4 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -113341.9 ymin: 27273.94 xmax: -102404.7 ymax: 31018.77
Projected CRS: NAD83 / California Albers
# A tibble: 6 × 5
# Groups: SITE_CODE [6]
SITE_CODE t1 t2 diff
<chr> <dttm> <dttm> <dbl>
1 382548N1212908W001 1961-04-13 00:00:00 2012-10-11 00:00:00 15.9
2 382613N1212086W001 1966-10-21 00:00:00 1994-04-13 00:00:00 21.3
3 382623N1212973W001 1963-05-10 00:00:00 2020-09-17 00:00:00 15.5
4 382625N1212626W001 1972-03-09 00:00:00 2020-03-04 00:00:00 21.1
5 382727N1211718W001 1966-10-21 00:00:00 1997-04-25 00:00:00 27.2
6 382893N1212127W001 1972-03-09 00:00:00 2005-11-22 00:00:00 35.8
# … with 1 more variable: geometry <POINT [m]>
Let’s map these changes at our 74 long-term monitoring sites.
ggplot() +
  geom_sf(data = sac) +
  geom_sf(data = gwl_diff, aes(color = diff), size = 2.5) +
  scale_color_viridis_c()
Outliers are a fairly common problem in real world data. For some reason, one site shows a change of about -100 feet (the dark purple dot). Is this a problem with the data or our analysis, or is it a real observation? Let’s inspect it with a plot.
# package to help with pasting values/text together
library(glue)

# find the SITE_CODE associated with the outlier
id_problem <- gwl_diff %>%
  filter(diff < -90) %>%
  pull(SITE_CODE)

# plot the outlier's hydrograph
gwl_filt %>%
  filter(SITE_CODE == id_problem) %>%
  ggplot(aes(MSMT_DATE, GSE_WSE)) +
  geom_line() +
  # add label to show the station. Surround variables with {}
  labs(subtitle = glue("Outlier {id_problem}"))
Ah ha! Everything looks okay until measured values fall off a cliff around 1990. This is likely an erroneous value, so we can remove this single observation. Luckily, each water level measurement has a unique ID (WLM_ID), so we can use it to remove the value and then recompute our groundwater level difference. Because we already have this transformation in code, it’s easy to rerun this complex operation! One way to find the WLM_ID is to use View(gwl_filt) and the Search option in the upper right hand corner. Copy the id_problem value (386576N1212907W001), paste it into the box, and hit Enter. We should now see all the values associated with this id. We can look for a value after 1990 by sorting by MSMT_DATE, where we should see an extreme outlier in WSE (-1.0)! Then look for the associated WLM_ID of that observation and copy that ID (1345292) to use with the code below.
# uncomment and run to inspect the problem_id SITE_CODE
# we filter for the problem ID, then select only the cols we care about
# gwl_filt %>%
#   filter(SITE_CODE == id_problem) %>%
#   select(MSMT_DATE, GSE_WSE, WLM_ID) %>%
#   View()

# remove one erroneous measurement
gwl_filt <- filter(gwl_filt, WLM_ID != "1345292")
# recompute groundwater level difference
gwl_diff <- gwl_filt %>%
  # only SITE_CODE meeting time and sample constraints
  filter(SITE_CODE %in% ids_time_samp) %>%
  group_by(SITE_CODE, WELL_USE) %>% # for each site code and well use
  arrange(MSMT_DATE) %>%            # arrange dates in ascending order
  summarise(t1 = first(MSMT_DATE),        # first date
            t2 = last(MSMT_DATE),         # last date
            gse_wse_t1 = first(GSE_WSE),  # first gwl measurement
            gse_wse_t2 = last(GSE_WSE)) %>% # last gwl measurement
  mutate(diff = gse_wse_t2 - gse_wse_t1) # diff btwn last and first gwl
Next we can replot our map without this erroneous value and spruce things up a bit. Let’s also add major rivers in Sacramento County, just to demonstrate the LINESTRING {sf} data type.
# read in major rivers in Sacramento County
riv <- read_rds("data/sac_co_main_rivers_dissolved.rds") %>%
  st_transform(st_crs(sac))

ggplot() +
  geom_sf(data = sac) +
  geom_sf(data = riv, color = "blue") +
  geom_sf(data = gwl_diff, aes(fill = diff),
          pch = 21, size = 2.7, alpha = 0.8) +
  scale_fill_viridis_c("GWL \nchange (ft)", option = "B", direction = -1) +
  facet_wrap(~WELL_USE) +
  labs(title = "Difference in groundwater elevation (ft)",
       subtitle = "For wells with > 30 samples and data from at least 1980-01-01",
       caption = "Larger (darker) values indicate groundwater depletion.") +
  theme_void()
It appears that larger groundwater level changes occur in the interior of Sacramento County, and in the southern portions, both of which are further from urban areas. Smaller changes along the western boundary may result from groundwater recharge from surface water along the Sacramento River.
Finally, let’s examine the changes in groundwater level at the sites that meet our constraints and add linear trendlines using geom_smooth().
# go back and grab gwl measurements at the specified site codes
gwl_res_ir <- filter(gwl_filt,
                     SITE_CODE %in% ids_time_samp)

# plot all groundwater levels and a linear trendline
gwl_res_ir %>%
  ggplot(aes(MSMT_DATE, GSE_WSE)) +
  geom_line(aes(group = SITE_CODE, color = WELL_DEPTH), alpha = 0.8) +
  geom_smooth(method = "lm", color = "orange", se = FALSE, lwd = 2) +
  facet_wrap(~WELL_USE) +
  labs(x = "", y = "Depth to groundwater (ft)")
These trendlines suggest that observed depths to groundwater have increased during the period of record at the long-term monitoring sites identified by our selection criteria. These declines are likely due to groundwater pumping for urban expansion and irrigated agriculture. Remember also that residential wells were substantially shallower than irrigation wells (about a 140 foot difference in median depth; deeper wells appear as lighter blue in the plot), so these groundwater level changes are likely impacting different aquifer systems.
Finally, let’s incorporate CES scores into our analysis and see how they relate to groundwater level trends. First, let’s look at CES scores in Sacramento County on a basemap with {mapview}. Scores are assigned per census tract. Urban areas tend to have higher CES scores due to greater exposures across the range of variables that CES measures.
# pre-process CES3 data for Sac County
# (run once to create the .rds file read below)
st_read("data/calenviroscreen/CES3_shp/CES3June2018Update.shp") %>%
  st_transform(st_crs(sac)) %>%
  st_intersection(sac) %>%
  write_rds("data/ces3_sac.rds")
library(mapview)
library(colormap) # one of many color palette packages in R
mapviewOptions(fgb = FALSE)

# read pre-processed CES Score spatial file, plot the CES percentile
ces <- read_rds("data/ces3_sac.rds")

# calculate the population density per tract
ces$pop_density <- ces$pop2010 / ces$Shape_Area

# view the data
mapview(select(ces, CIscoreP),
        zcol = "CIscoreP",
        layer.name = "CES Score") +
  mapview(select(ces, pop_density),
          zcol = "pop_density",
          col.regions = colormap(colormaps$magma, nshades = 100),
          layer.name = "Pop Density")
As a first cut, let’s just look at the CES percentile (higher indicates a more negative outcome).
# sacramento polygon
mapview(sac, alpha.regions = 0, color = "red",
        lwd = 2, layer.name = "Sac Co") +
  # select the river name column so only it shows in the popup table
  mapview(select(riv, HYDNAME), color = "blue",
          legend = FALSE) +
  # select the CES percentile column so only it shows in the popup table
  mapview(select(gwl_minimal, `CES 3.0 Percentile`),
          zcol = "CES 3.0 Percentile",
          layer.name = "CES score")
Next, we might be interested to examine the groundwater level decline per census tract at our long-term monitoring sites.
# select minimal subset of data to join back to gwl
gwl_diff_df <- st_drop_geometry(gwl_diff) %>%
  select(SITE_CODE, diff)

# join differenced data back to groundwater level
gwl_ces <- gwl_filt %>%
  filter(SITE_CODE %in% ids_time_samp) %>% # selection criteria
  group_by(SITE_CODE) %>% # slice 1st row per group as CES scores duplicate
  slice(1) %>%
  ungroup() %>%
  left_join(gwl_diff_df, by = "SITE_CODE") # add groundwater level diff
gwl_ces %>%
  ggplot(aes(diff, `CES 3.0 Percentile`)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
It appears that if there is any trend at all, there’s a slight negative relationship between CES percentile and groundwater level decline. This is likely because CES scores tend to be higher in and near urban areas, and most groundwater pumping takes place in rural areas away from urban development. To confirm, let’s inspect points with a difference in groundwater level >= 25 feet.
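A sketch of that inspection with {mapview} (use the layers control to toggle basemaps):

# inspect sites with >= 25 ft of groundwater level decline
mapview(filter(gwl_ces, diff >= 25),
        zcol = "diff",
        layer.name = "GWL decline (ft)")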
If we toggle the basemap to “Esri.WorldImagery” it’s clear that areas with large groundwater declines are on the leading edges of expanding suburban zones and in rural areas.
Let’s now define a class of monitoring points called high_priority, characterized by high CES scores and large groundwater declines.
# add a new column "high_priority" if gwl change >= 25 feet and CES >= 50%
gwl_ces <- gwl_ces %>%
  mutate(high_priority = ifelse(diff >= 25 & `CES 3.0 Percentile` >= 50,
                                TRUE, FALSE))
# verify that our mutate worked
gwl_ces %>%
  ggplot(aes(diff, `CES 3.0 Percentile`, color = high_priority)) +
  geom_point(size = 3, alpha = 0.9) +
  scale_color_manual(name = "High Priority", values = c("grey", "red")) +
  theme_classic()
# plot location priority
ggplot() +
  geom_sf(data = sac) +
  geom_sf(data = gwl_ces, aes(color = high_priority), size = 3) +
  scale_color_manual("High Priority", values = c("grey", "red")) +
  theme_void()
Inspecting these locations in closer detail over a satellite basemap (“Esri.WorldImagery”) may give us clues into what’s happening here.
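A sketch of that closer look (mapview’s map.types argument sets the default basemap):

# view the high priority sites over a satellite basemap
mapview(filter(gwl_ces, high_priority),
        map.types = "Esri.WorldImagery",
        col.regions = "red",
        layer.name = "High priority sites")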
Compared to our previous map, which only showed sites with groundwater declines >= 25 feet over the historical record, the “high priority” monitoring sites shown here (those that also have high CES scores, i.e., they fall in census tracts at higher risk of impacts from environmental pollutants) appear to be located at or near the suburban fringe.
Pause and think
Did this EDA spark any questions for you? What questions would you like to explore with these data given what you’ve seen?
EDA is the synthesis of every module we’ve covered so far, and many more that are beyond the scope of this course. As you learn more data transformation, visualization, and modeling skills, the depth of your EDA capabilities will increase. A good place to pick up general R knowledge and practice new skills is to walk through a textbook like R for Data Science, which will cover the basics and give you a good foundation on which to stand.
An EDA may generate tables, visualizations, text, and code that all encapsulate the greater meaning that you’ve derived from the data. An excellent, R-centric way to share the tables, visualizations, text, and code that result from your EDA is an RMarkdown (.Rmd) document, which is the topic of the next module.
Footnotes

[2] OEHHA CalEnviroScreen 3.0 data.

[3] Statistical modeling is beyond the scope of this module, but if you are interested, you can read more about statistical modeling in R here.

[4] This is real world data, and real world data is messy. Depths to groundwater > 250 feet happen all the time, but in this plot we see that these values suddenly spike by 300 to 400 feet. It’s safe to assume that this amount of variance is physically impossible and likely due to errors in data entry. We could zoom into these data and filter them out, but the plot also tells us that we can remove these few values with a simple filter() at the 250 ft threshold.