Access this project

Demo URL: https://akashmagnadia.shinyapps.io/cs424_p2/
Introduction video: https://youtu.be/VtEGv0v-xSU
GitHub repo: https://github.com/komar41/Dont-Sleep-in-the-Subway
Tools used: Python, R, and Shiny.

The visualization was designed for a screen with a resolution of 5760x1620 (the SAGE screen at the EVL lab, UIC). [For reference see picture below]

User interface displayed in SAGE screen of EVL lab, UIC.

This project visualizes the geographic information of all CTA 'L' stations and helps uncover trends and interesting patterns in Chicago 'L' station ridership data.

User Interface

The All Stations page displays the total ridership entries at every CTA station for a single date. The opacity of each station on the map changes according to its ridership entries. Users can select a date using the "Single Date" option on the left side. To highlight a station on the map and on the bar chart, the user can click on that station on the map or use the "Stations" dropdown menu on the left. Users can also choose a line color using the "Lines on Map" dropdown menu. Initially, the "All lines" option is selected. Upon selecting a line color (e.g., the Pink Line), the "Stations" dropdown menu will contain only the stations on that line, and the map will display only those stations. The bar chart can be sorted in ascending, descending, or alphabetical order using the "Sort Bar Chart" dropdown menu, and the data table is sorted accordingly. Users can move to the previous or next date using the two buttons at the bottom left.

All stations page
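
As a rough illustration of how marker opacity could follow ridership, the sketch below rescales each station's entries for one date to an opacity between a floor value and 1.0. The min-max scaling, the 0.2 floor, the function name `opacity_by_ridership`, and the sample counts are all assumptions made for illustration; the app's R code may compute opacity differently.

```python
def opacity_by_ridership(rides_by_station, floor=0.2):
    """Map each station's rides to an opacity in [floor, 1.0] (min-max scaling)."""
    if not rides_by_station:
        return {}
    lowest = min(rides_by_station.values())
    highest = max(rides_by_station.values())
    span = highest - lowest
    opacities = {}
    for station, rides in rides_by_station.items():
        # If every station has the same count, draw them all fully opaque.
        scaled = 1.0 if span == 0 else (rides - lowest) / span
        opacities[station] = round(floor + (1.0 - floor) * scaled, 3)
    return opacities


# Hypothetical entry counts for a single date (not taken from the dataset).
sample = {"O'Hare": 9000, "UIC-Halsted": 5000, "Kostner": 1000}
print(opacity_by_ridership(sample))
```

A non-zero floor keeps the quietest station faintly visible instead of letting it vanish from the map.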

Upon selecting the checkbox "Ridership Difference between Two Dates" in the All Stations page, users will be able to see the change in entries between two selected days (Date 1 and Date 2) in a divergent color scheme on the bar chart. The data table and the map also change accordingly.

All stations page - ridership difference between two dates
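
The two-date comparison can be sketched as a per-station subtraction. The example below is a minimal illustration, not the app's R implementation: the function names, the `(station, date, rides)` row layout, the color labels, and the sample numbers are assumptions.

```python
def ridership_difference(rows, date_1, date_2):
    """Return {station: rides on date_2 minus rides on date_1}."""
    totals = {date_1: {}, date_2: {}}
    for station, date, rides in rows:
        if date in totals:
            totals[date][station] = totals[date].get(station, 0) + rides
    stations = set(totals[date_1]) | set(totals[date_2])
    # A station missing on one of the dates counts as 0 entries for that date.
    return {
        station: totals[date_2].get(station, 0) - totals[date_1].get(station, 0)
        for station in sorted(stations)
    }


def diverging_color(difference):
    """Pick one side of a two-sided (divergent) scheme from the sign."""
    if difference > 0:
        return "increase"
    if difference < 0:
        return "decrease"
    return "neutral"


# Hypothetical rows: (station name, date, rides).
rows = [
    ("Station A", "2021-08-23", 100),
    ("Station A", "2021-08-24", 150),
    ("Station B", "2021-08-23", 80),
    ("Station B", "2021-08-24", 50),
]
for station, diff in ridership_difference(rows, "2021-08-23", "2021-08-24").items():
    print(station, diff, diverging_color(diff))
```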

The second page - One Station - gives users an overview of the ridership data for a particular CTA station. Users can select a station by clicking on it on the map or by using the dropdown menu. To narrow down the list of stations, users can choose a line color from the dropdown menu "Line On Map". Initially, the "All Lines" option is selected. The yearly bar chart on the left gives a general overview of how ridership at the chosen CTA station changed over the years. On the right side, users can choose between three chart types: daily, monthly, or weekdays. Users must also select the year for which the chart on the right side is displayed. The raw data is shown below each chart.

One station page
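
The yearly, monthly, and weekday charts each amount to grouping one station's daily entries by a different part of the date. The sketch below shows one way to do that grouping; the function name, the `(stationname, date, rides)` row layout, the `MM/DD/YYYY` date format, and the sample numbers are assumptions, and the app itself does this in R.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def aggregate_station(rows, station, year):
    """Return (yearly, monthly, weekday) ride totals for one station.

    yearly covers all years; monthly and weekday cover only `year`.
    """
    yearly = defaultdict(int)
    monthly = defaultdict(int)
    weekday = defaultdict(int)
    for name, date_text, rides in rows:
        if name != station:
            continue
        day = datetime.strptime(date_text, "%m/%d/%Y")
        yearly[day.year] += rides
        if day.year == year:
            monthly[day.month] += rides
            weekday[WEEKDAYS[day.weekday()]] += rides
    return dict(yearly), dict(monthly), dict(weekday)


# Hypothetical rows, not taken from the dataset.
rows = [
    ("UIC-Halsted", "12/25/2020", 100),
    ("UIC-Halsted", "01/01/2021", 200),
    ("UIC-Halsted", "01/02/2021", 300),
    ("UIC-Halsted", "02/01/2021", 400),
    ("O'Hare", "01/01/2021", 999),
]
print(aggregate_station(rows, "UIC-Halsted", 2021))
```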

The About page lists details such as the creators of the application, the publication date, the data sources, and the data owner.

About page

About the Data

Data source: Two datasets were used to build the application. Both datasets were collected from the Chicago Data Portal.

  1. Dataset that contains information about all the CTA L stops including their latitude and longitude can be found at: https://data.cityofchicago.org/Transportation/CTA-System-Information-List-of-L-Stops/8pix-ypme. The file size is 48KB.

  2. Ridership data of all the CTA L stations can be found at: https://data.cityofchicago.org/Transportation/CTA-Ridership-L-Station-Entries-Daily-Totals/5neh-572f. The file size is 39MB.

The CTA L stops information data provides location and basic service availability information for each place on the CTA system where a train stops, along with formal station names, stop descriptions, and line colors (RED, BLUE, G (Green), O (Orange), BRN (Brown), P (Purple), Pexp (Purple Express), Y (Yellow), and Pnk (Pink)). DIRECTION_ID refers to the normal direction of train traffic at a platform (E - East, W - West, N - North, S - South). STOP_ID is a unique identifier for each stop and MAP_ID is a unique identifier for each station. The ADA column tells whether the stop is ADA (Americans with Disabilities Act) compliant.

CTA L stops information data
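
To make the column codes above concrete, here is a small sketch that turns one stop record into readable line and direction names. It assumes each line column holds the text `true` or `false`, which is not stated in this document; the function name and the sample row (including its STOP_ID and MAP_ID values) are hypothetical.

```python
# Line-color columns and direction codes as described in the text above.
LINE_COLUMNS = {
    "RED": "Red", "BLUE": "Blue", "G": "Green", "BRN": "Brown",
    "P": "Purple", "Pexp": "Purple Express", "Y": "Yellow",
    "Pnk": "Pink", "O": "Orange",
}
DIRECTIONS = {"E": "East", "W": "West", "N": "North", "S": "South"}


def describe_stop(stop_row):
    """Return (lines served, direction name) for one stop record."""
    lines = [
        name for column, name in LINE_COLUMNS.items()
        # Assumption: flags are stored as the strings "true" / "false".
        if str(stop_row.get(column, "")).strip().lower() == "true"
    ]
    direction = DIRECTIONS.get(stop_row.get("DIRECTION_ID"), "Unknown")
    return lines, direction


# Hypothetical stop record; the identifiers are made up.
row = {"STOP_ID": "1", "MAP_ID": "100", "DIRECTION_ID": "W",
       "RED": "false", "Pnk": "true", "G": "true"}
print(describe_stop(row))
```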

The ridership data contains daily entry totals for all CTA stations in Chicago from 2001 to 2021. The dataset shows entries at all turnstiles, combined, for each station. Daytypes are as follows: W = Weekday, A = Saturday, U = Sunday/Holiday.

CTA L station ridership data
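
The daytype codes can be used to total entries per day type. The sketch below reads tab-separated text and sums rides for each code; the column names (`station_id`, `stationname`, `date`, `daytype`, `rides`) and every value in the sample are assumptions for illustration and are not listed in this document.

```python
import csv
import io

# Daytype codes as described in the text above.
DAYTYPES = {"W": "Weekday", "A": "Saturday", "U": "Sunday/Holiday"}


def rides_by_daytype(tsv_text):
    """Sum the rides column per daytype label for tab-separated input."""
    totals = {label: 0 for label in DAYTYPES.values()}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        totals[DAYTYPES[row["daytype"]]] += int(row["rides"])
    return totals


# Made-up sample rows; the column names are assumed.
SAMPLE = (
    "station_id\tstationname\tdate\tdaytype\trides\n"
    "1\tStation A\t01/01/2021\tU\t300\n"
    "1\tStation A\t01/02/2021\tA\t500\n"
    "1\tStation A\t01/04/2021\tW\t700\n"
    "2\tStation B\t01/04/2021\tW\t500\n"
)
print(rides_by_daytype(SAMPLE))
```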

The free web-based version of the Shiny server used to publish this project has a limit of 5 MB per data file. We therefore split the ridership data file (39 MB) into smaller pieces before uploading it. The Python script used to split the ridership data and create the TSV files is provided below:

#!/usr/bin/env python3

import csv
import os
import sys

os_path = os.path
csv_writer = csv.writer
sys_exit = sys.exit

if __name__ == '__main__':
   # number of rows per file
   chunk_size = 130000

   # file path to master tsv file
   file_path = "C:/Users/Akash/UIC/CS 424/tsv_splitter/CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.tsv"

   if (not os_path.isfile(file_path) or
        not file_path.endswith('.tsv')):
       print('You must input path to .tsv file for splitting.')
       sys_exit()
   file_name = os_path.splitext(file_path)[0]

   with open(file_path, 'r', newline='', encoding='utf-8') as tsv_file:
       chunk_file = None
       writer = None
       counter = 1
       reader = csv.reader(tsv_file, delimiter='\t', quotechar='\'')

       # get header_chunk
       header_chunk = None
       for index, chunk in enumerate(reader):
           header_chunk = chunk
           header_chunk[0] = header_chunk[0][1:]
           break

       for index, chunk in enumerate(reader):
           if index % chunk_size == 0:
               if chunk_file is not None:
                   chunk_file.close()

               chunk_name = '{0}_{1}.tsv'.format(file_name, counter)
               chunk_file = open(chunk_name, 'w', newline='', encoding='utf-8')
               counter += 1
               writer = csv_writer(chunk_file, delimiter='\t', quotechar='\'')
               writer.writerow(header_chunk)
               print('File "{}" complete.'.format(chunk_name))

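           chunk[1] = chunk[1].replace("'", "")
           writer.writerow(chunk)

       # close the last chunk file, which the loop above leaves open
       if chunk_file is not None:
           chunk_file.close()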
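
Once the file is split, the pieces have to be read back and combined before use. The Shiny app does this in R; the sketch below shows the same idea in Python for consistency with the splitter. The function name `read_chunks` and the glob pattern are assumptions, and the demo writes two tiny made-up chunk files to a temporary folder so the example runs on its own.

```python
import csv
import glob
import os
import tempfile


def read_chunks(pattern):
    """Read every chunk file matching `pattern`; return (header, rows)."""
    header, rows = None, []
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", newline="", encoding="utf-8") as chunk_file:
            reader = csv.reader(chunk_file, delimiter="\t", quotechar="'")
            chunk_header = next(reader, None)
            if chunk_header is None:
                continue  # skip empty files
            if header is None:
                header = chunk_header  # every chunk repeats the same header
            rows.extend(reader)
    return header, rows


if __name__ == "__main__":
    # Demo with two tiny, made-up chunk files in a temporary folder.
    with tempfile.TemporaryDirectory() as folder:
        for number, values in enumerate([["1", "a"], ["2", "b"]], start=1):
            path = os.path.join(folder, "ridership_{}.tsv".format(number))
            with open(path, "w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out, delimiter="\t", quotechar="'")
                writer.writerow(["id", "name"])
                writer.writerow(values)
        print(read_chunks(os.path.join(folder, "ridership_*.tsv")))
```

Because `sorted` orders file names as text, `_10` would sort before `_2`; that only affects row order, not the combined contents.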


Interesting Findings

Findings 1: Looking at the data, or at the opacity of each station on the map during 2021, it appears that there were more riders on the Red Line than on any other line. We can also observe that O'Hare station has the most riders of all stations when all lines are shown.

Findings 1

Findings 2: As we zoom in towards the Loop, we can see that there are significantly more riders at stations in the Loop and at Red Line stations near the Loop. Another interesting observation is that the UIC-Halsted station stands out with a darker blue shade because it has more entries than usual, which can be explained by the fact that about 85% of UIC students commute to the university.

Findings 2

Findings 3: Among all the Red Line CTA stations, 95th/Dan Ryan was the busiest during the COVID-19 lockdown restrictions. It was also the third busiest CTA station in 2020 (source).

Findings 3

Findings 4: In 2019, Lake/State had an average of 19,364 weekday passenger entries, making it the busiest 'L' station (source). During the COVID-19 restrictions, its ridership fell. After the restrictions were lifted, it again became the busiest Red Line station.

Findings 4

Findings 5: The Sox 35th Dan Ryan station usually has fewer riders, but ridership spikes during games or concerts at the Chicago White Sox stadium. On September 24, 2016, the stadium had a record attendance of 47,754 for a Chance the Rapper concert (source). As a result, the Sox 35th Dan Ryan station had almost as many entries as the O'Hare station that day.

Findings 5

Findings 6: Most CTA stations have very few ridership entries on Christmas Day (Dec 25). However, O'Hare remains busy as numerous people fly to and from Chicago airports during Christmas (source).

Findings 6

Findings 7: On March 24, 2014 at 2:50 a.m. local time, a CTA passenger train overran the bumper at O'Hare, injuring 34 people. Following the accident, the line between O'Hare and Rosemont was closed, with a replacement bus service in place. The bar chart clearly shows that, because of the incident, there were almost no ridership entries at O'Hare station on March 24 and during the following week.

Findings 7

Findings 8: The Grant Park Music Festival is a ten-week (Jul 2 – Aug 21, 2021) classical music concert series held annually in Chicago, Illinois, United States. During this time Washington/Wabash station remains busy as it is the public transport access to Grant Park. Also, the station remains busy as Millennium Park is a very popular destination for visitors.

Findings 8


Setup and Installations

Install R:

Download R (version 4.1.2) from https://www.r-project.org/. Click “download R”.

R installation

You can select the default link https://cloud.r-project.org/.

Select default link

Download and install the version that matches your OS.

Match OS installation

Install RStudio:

Download and install RStudio from https://rstudio.com/products/rstudio/.

RStudio installation

Download the free version.

RStudio free version

Setup the project:

Create a folder on your local machine where you want the project to be located. Open the terminal and change the directory to the created folder. Run the following command:

git clone https://github.com/komar41/Dont-Sleep-in-the-Subway.git

git clone

Open RStudio. Go to “File” and select “Open Project”.

Open project in RStudio

Choose the project folder (“Dont-Sleep-in-the-Subway”) that you cloned from GitHub.

Select project folder

Open the file “app.R” and press the “Run App” button in RStudio.

Run app

RStudio will tell you if any required packages are missing. When the pop-up shows up, click “Yes” to install all those packages.

Install missing packages

After the packages are installed, RStudio will start the Shiny app on your local machine.