
HydroApp Part 1: Using Shiny and googleVis to download and display US water quality data

Update: I’ve added a link to the HydroApp under the Web App tab at the top of the page! It’s a work in progress, so you might (probably will) get some weird behavior. Some things I’ve already noticed:

1) It can take some time to load the Google Map with the sampling locations. I’m guessing it has to do with calling the Google Maps server, but I’m not totally sure.

2) The map doesn’t display all the sites; I think it just displays the first 400. This is something that I’m working on. Maybe someone out there has some suggestions???

3) I’m still adding functionality, so right now, all it does is display the sampling locations.  I hope to add data visualization and download capabilities soon…

Anyway, enjoy!

It’s been a busy fall season for me, so I haven’t updated the blog as much as I was hoping. But the good news is that I’ve been using R quite a bit in my research and have continued playing around with the Shiny package. Recently, I’ve been working on an app that’s (arguably) more useful than my first Shiny app and has some potential for people interested in water quality data (like myself!). I’d like to extend a big thank you again to RStudio for creating Shiny and, specifically, for starting a free beta program for hosting Shiny apps on their servers. I don’t know how long this beta program will last, but it has worked great so far. Keep up the good work. Anyway, on to the app…

For those who have ever attempted to examine water quality data collected by the United States Geological Survey (USGS) or the Environmental Protection Agency (EPA), their websites are not the most intuitive, to say the least. Though the USGS and EPA have begun to address this (their Water Quality Portal is actually pretty slick), it’s still quite laborious to quickly and efficiently parse and display the data, particularly if you’re interested in looking at many sites. Even with the improved website, you still need to download and analyze the data in separate steps, and just organizing all the data from one station can be quite time consuming in a spreadsheet program.

So, I decided to do something about it. The USGS and EPA have implemented a nice web service for querying their data over the web (details can be found here). I wanted to create a simple app where you enter a US state and a water quality parameter of interest. You would then be presented with a map of all the sampling locations within the state where that parameter has been measured. Finally, you would be able to view the data and download it as a .csv or .txt file for offline analysis. I haven’t had time to add the plotting or download features yet, but I hope to have them up soon.

As in my previous post, I will briefly go through the scripts, describing some key points and the general layout. This is just part 1, so the app is still a work in progress.  Let me know what you think!

First, the entire HydroApp script can be seen here. And, if you’d like to try it out, I’ve hosted it on the RStudio server here!
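
The ui.R side is pretty simple, so I won’t walk through it in detail: two selection boxes that feed input$stateID and input$parmID, plus an HTML output slot for the map. It boils down to something like this (a stripped-down sketch, not my exact ui.R, and the choice lists here are just placeholders):

library(shiny)

shinyUI(pageWithSidebar(
  headerPanel("HydroApp"),

  sidebarPanel(
    # The real app lists all US states and many water quality parameters;
    # these choices are placeholders
    selectInput("stateID", "State:", choices = c("VT", "NY", "CA")),
    selectInput("parmID", "Water quality parameter:",
                choices = c("Phosphorus", "pH", "Dissolved oxygen (DO)"))
  ),

  mainPanel(
    # googleVis output is rendered as raw HTML
    htmlOutput("stationMap")
  )
))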

As you can see, the server side is made up of three reactive functions (makeURL(), testURL(), and stateStationsInput()). Essentially, the inputs from the state and parameter selection boxes in the ui.R script are combined with the base Water Quality Portal URL to download a .csv file containing all the stations where the desired water quality parameter has been measured. The makeURL() function handles the correct formatting of the URL so we can download the data:

  makeURL <- reactive({
    # Format the state selection as "US:XX" and percent-encode it
    inStateID <- input$stateID
    inStateID.url <- URLencode(paste("US", inStateID, sep=":"),
                               reserved = T)

    # Percent-encode the selected water quality parameter
    inParmID <- input$parmID
    inParmID.url <- urlEncode(inParmID)

    # Assemble the Water Quality Portal station-search URL
    url <- paste("http://www.waterqualitydata.us/Station/search?countrycode=US",
                 "&statecode=", inStateID.url,
                 "&characteristicName=", inParmID.url,
                 "&siteType=Stream&mimeType=csv&zip=no", sep="")
    print(url)
    return(url)
  })
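
Just to make that concrete, here’s roughly what comes out of makeURL() for a stand-in selection of Vermont and “Phosphorus” (example values only, plugged into the same paste() call outside of Shiny):

inStateID.url <- URLencode(paste("US", "VT", sep=":"), reserved = T)  # "US%3AVT"
inParmID.url <- "Phosphorus"   # nothing here for urlEncode() to change

paste("http://www.waterqualitydata.us/Station/search?countrycode=US",
      "&statecode=", inStateID.url,
      "&characteristicName=", inParmID.url,
      "&siteType=Stream&mimeType=csv&zip=no", sep="")
# [1] "http://www.waterqualitydata.us/Station/search?countrycode=US&statecode=US%3AVT&characteristicName=Phosphorus&siteType=Stream&mimeType=csv&zip=no"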

You will notice that both the state ID (inStateID) and parameter ID (inParmID) need to be appropriately encoded to work within a URL. The state code is handled by base R’s URLencode(), and for the parameter I wrote a small helper called urlEncode() that converts troublesome characters into a % followed by a two-digit hexadecimal code, which can be read by a web browser:

# Percent-encode a handful of special characters so they are safe in a URL
# (gsubfn() comes from the gsubfn package)
urlEncode <- function(URL) {
  gsubfn(".", list("_" = "%5F", "." = "%2E", "+" = "%2B", "!" = "%21",
                   "*" = "%2A", "'" = "%27"), c(URL))
}
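
For instance (assuming the gsubfn package is loaded, since that’s where gsubfn() lives, and using a made-up parameter name just to show the substitution):

library(gsubfn)

urlEncode("NH4+")
# [1] "NH4%2B"  (the "+" becomes "%2B"; characters not in the list pass through unchanged)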

Once the URL is properly encoded, I then test the URL to see if any sites match the criteria. This is done with testURL():

  testURL <- reactive({
    url <- makeURL()

    # Grab just the response headers (no data download yet)
    urlHeaders <- HEAD(url)$headers
    warnings <- which(names(urlHeaders) %in% "warning")

    if(!length(warnings)){
      # No warning header: at least one station matches the criteria
      noData <- FALSE
      summary <- cbind(paste("Total Site Count: ",
                             urlHeaders$'total-site-count'))
      print(summary)
      print(noData)
    } else {
      # The portal returns a warning header when nothing matches
      noData <- TRUE
      summary <- paste("WARNING :", unlist(urlHeaders)[warnings])
      print(summary)
    }
    return(list(noData, summary))
  })

Using the httr library, I read in the header information for the website using HEAD() and check to see if there are any warnings; specifically, whether no stations match both of the input values (i.e., state and water quality parameter). The header information also reports how many sites match the criteria, something I print out later for the user to see.
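
If you want to poke at the headers outside of Shiny, the same check boils down to something like this (reusing the Vermont/Phosphorus example URL from above):

library(httr)

url <- paste0("http://www.waterqualitydata.us/Station/search?countrycode=US",
              "&statecode=US%3AVT&characteristicName=Phosphorus",
              "&siteType=Stream&mimeType=csv&zip=no")
urlHeaders <- HEAD(url)$headers

urlHeaders$'total-site-count'      # how many stations match the criteria
"warning" %in% names(urlHeaders)   # TRUE when the portal says nothing matches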

Finally, assuming there are no warnings, I use the URL to download the requested data, and from the downloaded .csv file I create a new data frame with two columns, stationLatLong and stationTip. The stationLatLong column is needed by googleVis to correctly display the location of each station. The stationTip column is used for the descriptive text that pops up whenever you click on a station location.

  stateStationsInput <- reactive({
    if(testURL()[[1]] == F){
      url <- makeURL()

      # Download the matching station list straight from the Water Quality Portal
      stateStations <- read.csv(url, colClasses = "character",
                                sep =",", na.strings = "", fill = F,
                                allowEscapes = F, header = T)

      # googleVis wants locations formatted as "latitude:longitude"
      stationLatLong <- paste(stateStations$LatitudeMeasure,
                              stateStations$LongitudeMeasure, sep=":")

      # Tooltip text: station name and ID separated by an HTML line break
      stationTip <- paste(stateStations$MonitoringLocationName,
                          stateStations$MonitoringLocationIdentifier,
                          sep="<BR>")

      data.frame(stationLatLong, stationTip)
    }
  })

Finally, for the app’s one output (right now), I use the awesome gvisMap() function from the googleVis package to display a Google Map with all the stations that match the desired criteria (I think only the first 400 stations are displayed… I’m still working on figuring out a way to show more).

  output$stationMap <- renderGvis({
    # Only draw the map if the query actually returned stations
    if(testURL()[[1]] == F){
      mapData <- stateStationsInput()
      gvisMap(mapData, "stationLatLong", "stationTip",
              options = list(showTip = TRUE, showLine = FALSE,
                             enableScrollWheel = TRUE, mapType = 'terrain',
                             useMapTypeControl = TRUE))
    }
  })
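
And if you just want to see what gvisMap() expects without running the whole app, here’s a tiny standalone sketch with two made-up stations (the coordinates and IDs are placeholders):

library(googleVis)

# Dummy data in the same two-column format built by stateStationsInput()
mapData <- data.frame(
  stationLatLong = c("44.48:-73.21", "43.61:-72.97"),
  stationTip     = c("Made-up Station A<BR>USGS-00000001",
                     "Made-up Station B<BR>USGS-00000002"),
  stringsAsFactors = FALSE
)

# plot() on a gvis object opens the map in your default browser
plot(gvisMap(mapData, "stationLatLong", "stationTip",
             options = list(showTip = TRUE, mapType = 'terrain')))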