A shiny new tool? Developing apps using R

Rotem_D
Apr 19, 2021

Updated: Apr 20, 2021

After a (too) long hiatus from running data projects (mostly due to my PhD commitments), I am back with a new tool – a shiny app! This week, I'll discuss the development of the app, including the data collection and preparation phases.

What are shiny apps?

Shiny is an R package that lets you build interactive web applications (apps) straight from R (for more, see the Shiny tutorial).

My shiny app project (link to app)

Learning how to develop shiny apps is a skill I was really looking forward to picking up. My idea was to begin with an area where I have a sound understanding of the data: NFL football metrics. The app that I designed offers multiple formats to view the league leaders in a variety of passing categories across multiple seasons. I chose two formats: a table to display the top-15 leaders in the selected category and season, and a bar-plot that provides a visual display of the top-10 leaders for the same selected metric and season. The data for the app includes both classic metrics (such as total number of touchdowns and passer rating), as well as advanced ones (for example, EPA/play and QBR).

Step 1: Data prep

In order to prepare the data to be displayed in the app, I began with an R script in which I code the steps for data collection, cleaning, and construction of the full dataset.

First, data collection - I use web scraping tools to pull the information from the relevant websites (the app includes links to the data sources, see more below).

Most of the metrics are collected from the same source. After pulling the data, I employ several editing tools to clean it. For example, the raw table includes the full name of each passer in a single column. In order to build the full data file and present clear results in the app, I split these cells into First and Last name columns.
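A minimal sketch of this kind of name split, using tidyr::separate. The column names (Player, First, Last) and the rows are placeholders for illustration, not the app's actual data or code:

```r
library(tidyr)

# Hypothetical raw table: one column holds the passer's full name.
raw <- data.frame(
  Player = c("Alex Example", "Sam Van Placeholder"),
  TD = c(30, 25),
  stringsAsFactors = FALSE
)

# Split on the first space only (extra = "merge"), so multi-word
# last names stay together in the Last column.
clean <- separate(raw, Player, into = c("First", "Last"),
                  sep = " ", extra = "merge")

print(clean)
```

Splitting only on the first space is a design choice for this sketch; names with a multi-word first name would need a different rule.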

Next, since the app will display the leaders in each selected metric, I use functions from dplyr to create ranking variables for each metric. Then, I arrange the data in ascending order of rank (to fit a top-10 or top-15 display in the app). The next step is to add other data (compiled using a similar approach from other sources), and build a single dataset for each season (in the example below, this is the code for creating the data for the 2020 season).
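One way such a ranking variable could be built with dplyr is sketched here. The column names (Last, TD, TD_rank) and values are assumptions made for the example:

```r
library(dplyr)

# Placeholder data: four passers and their touchdown totals.
qb <- data.frame(
  Last = c("A", "B", "C", "D"),
  TD = c(30, 41, 25, 41),
  stringsAsFactors = FALSE
)

ranked <- qb %>%
  # Rank 1 goes to the highest total; tied passers share a rank.
  mutate(TD_rank = min_rank(desc(TD))) %>%
  # Ascending rank puts the leaders at the top of the table.
  arrange(TD_rank)

print(ranked)
```

With min_rank, ties share the lower rank and the next rank is skipped, which matches how league leaderboards are usually shown.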

After building 5 separate preliminary datasets (one for each season), the final step is combining all five into one large data file that consists of all QBs in all five seasons. Because every season table has the same columns, I stack them row-wise with rbind to generate the final data file, and export it as a .csv file for use in the app code.
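Combining per-season tables that share the same columns amounts to stacking their rows. A minimal sketch with base R's rbind and write.csv, using two placeholder seasons and a hypothetical file name written to a temporary directory:

```r
# Placeholder season tables with identical columns.
s2019 <- data.frame(Year = 2019, Last = c("A", "B"), TD = c(30, 25),
                    stringsAsFactors = FALSE)
s2020 <- data.frame(Year = 2020, Last = c("A", "C"), TD = c(28, 33),
                    stringsAsFactors = FALSE)

# rbind stacks rows; the Year column keeps the seasons distinguishable.
all_seasons <- rbind(s2019, s2020)

# Export without row names so the file reads back with the same columns.
out_file <- file.path(tempdir(), "qb_all_seasons.csv")
write.csv(all_seasons, out_file, row.names = FALSE)
```

Keeping a Year column in the stacked file is what lets the app later filter down to a single season.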

Step 2: Building an app

I follow the common procedure of building a shiny app by first defining the UI component. The idea is that users can choose among five NFL seasons and a variety of metrics to display the leaders in every category. Thus, I generate two lists (with selectInput), one for the year, and then a second one for the selected metric. I break the metric input down into two short lists – either advanced or classic metrics.
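A rough sketch of a UI along these lines. The input IDs, labels, season range, and metric codes are all assumptions for illustration; the grouped choices list is one way to present classic and advanced metrics as two short lists:

```r
library(shiny)

# Hypothetical UI: IDs ("season", "metric") and choices are placeholders.
ui <- fluidPage(
  titlePanel("NFL passing leaders"),
  sidebarLayout(
    sidebarPanel(
      # First list: the season.
      selectInput("season", "Season:", choices = 2016:2020),
      # Second list: the metric, grouped into classic and advanced.
      selectInput("metric", "Metric:", choices = list(
        Classic  = c("Touchdowns" = "TD", "Passer rating" = "Rate"),
        Advanced = c("EPA/play" = "EPA", "QBR" = "QBR")
      ))
    ),
    mainPanel(
      htmlOutput("leaders_table"),
      plotOutput("leaders_plot")
    )
  )
)
```

Passing a named list to choices makes selectInput render option groups, so a single drop-down can still separate the two metric families.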

To complete the UI design phase, I add two more components. First, while classic NFL passing data is pretty straightforward (total number of TDs or yards/game), advanced metrics are a little more complicated. Thus, I add a glossary block that includes links to external sites where these metrics are discussed in more detail. Second, I add another block with two links to the sources of the data used in this project.

With the UI component defined, I shift the focus to the server function. This part of the code includes all the elements that create the table and figure outputs based on the user’s selected inputs. Since the user can select among multiple years and relevant metric values, this section revolves around multiple if statements. These conditional statements allow me to “isolate” the relevant data (year and metric) and display both types of outputs in the main panel (this section of the code is repeated for every season; below is 2016).

First, I build the code for the tables. For each season, I create its own dataset. Then, another set of conditional statements allows me to filter the season data based on the selected metric. This reduced dataset is used to generate the output table, and I narrow it to the top-15 in the selected category with functions from dplyr.
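The season-then-metric filtering can be sketched as a plain function, which is easier to try outside a running app. The helper name top_leaders and the column names (Year, Last, QBR, EPA) are hypothetical, and base R stands in here for the dplyr calls:

```r
# Hypothetical helper: narrow the full data to one season, then to one
# metric, and keep the top n rows.
top_leaders <- function(data, season, metric, n = 15) {
  season_data <- data[data$Year == season, ]
  if (metric == "QBR") {
    out <- season_data[order(-season_data$QBR), c("Last", "QBR")]
  } else if (metric == "EPA") {
    out <- season_data[order(-season_data$EPA), c("Last", "EPA")]
  } else {
    stop("Unknown metric: ", metric)
  }
  head(out, n)
}

# Placeholder data, not real league statistics.
qbs <- data.frame(
  Year = c(2016, 2016, 2016, 2017),
  Last = c("A", "B", "C", "A"),
  QBR  = c(70.2, 80.1, 75.4, 66.0),
  EPA  = c(0.10, 0.25, 0.31, 0.05),
  stringsAsFactors = FALSE
)

print(top_leaders(qbs, season = 2016, metric = "QBR", n = 2))
```

Inside a server function, the season and metric arguments would come from the user's inputs.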

The tables are built with the kableExtra package. One critical goal is to create outputs (both table and figure) that are clear and simple to understand within the main screen of the app. To ensure that the table does not take up too large a portion of the output screen, I add a scroll box that offers the reader the option to stay within the main panel of the results but scroll throughout the table (this section of the code is repeated for every metric in every season; below we can see the code for tables of leaders in QBR or EPA for the 2016 season).
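A minimal sketch of a scrollable kableExtra table. The data, caption, styling options, and box dimensions are assumptions for the example, not the app's settings:

```r
library(kableExtra)

# Placeholder leaderboard rows.
top <- data.frame(
  Rank = 1:3,
  Last = c("A", "B", "C"),
  QBR  = c(80.1, 75.4, 70.2)
)

tbl <- kbl(top, format = "html", caption = "QBR leaders (placeholder data)") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  # Fixed-height box: a long table scrolls inside it instead of
  # pushing the rest of the page down.
  scroll_box(width = "100%", height = "300px")
```

scroll_box works on HTML tables, which is why the format is set explicitly here.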

The second chunk of code creates the figures (top-10 in each passing category). I begin with the same code as the table section and create a dataset for each season. With another set of conditional statements, I narrow this season data to the relevant metric (input selected by the user). The last data preparation step is to filter the top-10 QBs in the relevant passing category, and arrange the final dataset.

The figures are barplots, and I leverage the powerful tools offered by the ggplot2 package to edit the visual outputs. First, I add the names of each QB and their associated metric to the relevant bars in the figure. Then, I remove the labels and ticks from both the x-axis and y-axis. The end result is a clean visual display of the data (in the code below, I create the barplot for top-10 leaders in QBR for the 2016 season).
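A sketch of a barplot in this style with ggplot2. The data, colours, and the horizontal orientation are choices made for the example and may differ from the app's figure:

```r
library(ggplot2)

# Placeholder leaderboard rows.
top <- data.frame(
  Last = c("A", "B", "C"),
  QBR  = c(80.1, 75.4, 70.2),
  stringsAsFactors = FALSE
)

p <- ggplot(top, aes(x = reorder(Last, QBR), y = QBR)) +
  geom_col(fill = "steelblue") +
  # Write each passer's name and value inside their bar.
  geom_text(aes(label = paste(Last, QBR)), hjust = 1.1, colour = "white") +
  coord_flip() +
  theme_minimal() +
  # Drop axis text, ticks, and titles on both axes.
  theme(
    axis.text  = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank()
  )
```

Because the labels sit on the bars themselves, the axes carry no extra information and can be removed without losing anything.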

The last part of the server function includes the code to add the links to the glossary and data source sections (as defined in the UI section). I use renderUI and the a() function to create hyperlinks for all relevant websites/pages.
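A small sketch of how hyperlinks can be rendered from the server side. The output ID, link texts, and example.com URLs are placeholders, not the app's real glossary links:

```r
library(shiny)

# Hypothetical glossary links; the URLs are placeholders.
glossary_links <- tagList(
  a("EPA/play explained", href = "https://example.com/epa", target = "_blank"),
  br(),
  a("QBR explained", href = "https://example.com/qbr", target = "_blank")
)

# The UI reserves a slot, and the server fills it with the links.
ui <- fluidPage(uiOutput("glossary"))

server <- function(input, output, session) {
  output$glossary <- renderUI(glossary_links)
}

# To try it interactively: shinyApp(ui, server)
```

Setting target = "_blank" opens each link in a new tab, so the reader does not lose the app's current state.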

Final Thoughts

I chose this particular project as an introduction to using shiny apps. I used various online resources (primarily the Shiny tutorial) that were instrumental in the basic design of the application. The tricky part in this exercise is the structure of multiple layers of conditional statements (if and else) that are required in order to narrow down the full dataset to the relevant variables to be displayed in the table and figure outputs.

I published the app using the services offered by the website shinyapps.io.

As always, the code for the app (and for creating the data with web-scraping and data editing tools) is on my GitHub.

Rotem Dvir
