An Analytical Report on the South East Queensland Public Transport Network

A comprehensive analysis of the SEQ public transport system using GTFS data, Python, and PostgreSQL to uncover transit insights and operational patterns.

Technology Stack

Language

Data Manipulation & Analysis

Database

Data Visualization

Development & Version Control



1. Interactive Dashboard Analysis in Power BI

To make the findings of this analysis accessible and allow for dynamic, user-driven exploration, a comprehensive dashboard was developed using Microsoft Power BI. This tool moves beyond the static charts generated by Python to provide a live, interactive experience.

 

Note: You need to log in to Power BI in order to use the interactive dashboard below.

Interactive Power BI Dashboard - SEQ Public Transport Analysis

Key Features & Insights from the Dashboard

The dashboard was designed to confirm the initial hypotheses while empowering users to discover more nuanced patterns through interactive filters.

1. KPIs and Dynamic Filtering
2. Geospatial Analysis
3. Hourly and Weekly Patterns
4. Performance and Hierarchy Analysis

By building this dashboard, the analysis transitions from a one-time report into a reusable and persistent analytical tool, allowing ongoing exploration of the dataset.


2. Executive Summary

This report presents a comprehensive analysis of the South East Queensland (SEQ) public transport network using scheduled data from Translink's General Transit Feed Specification (GTFS). The analysis reveals a system that is heavily optimized for weekday commuters and geographically centered around the Brisbane CBD.

Key findings indicate that a small number of high-frequency bus routes, such as the CityGlider (60 & 61) and university-focused Route 66, form the backbone of the network. Service levels drop by nearly 50% on weekends. The network's busiest hubs, including King George Square Bus Station and Central Train Station, are concentrated in the CBD and exhibit classic bimodal peak activity during morning (7-9 AM) and evening (4-6 PM) commuter periods. This analysis provides a foundational understanding of the network's operational structure, which is critical for urban planning and service optimization.

3. Introduction

Understanding the structure and usage patterns of a public transport network is essential for effective urban planning, resource allocation, and improving citizen mobility. This project undertakes a deep-dive analysis of the SEQ public transport system, operated by Translink. By processing and analyzing the publicly available GTFS data, we aim to transform raw, tabular data into actionable insights about the network's scale, key corridors, operational patterns, and geographic distribution.

4. Problem Statement & Hypotheses

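The primary challenge is to process a complex, multi-file relational dataset to answer fundamental questions about the network's characteristics. This project tests the following hypotheses:

  1. Hypothesis 1: A small number of routes dominate service frequency.
  2. Hypothesis 2: Service levels drop significantly on weekends compared with weekdays.
  3. Hypothesis 3: The busiest stops are concentrated in or directly adjacent to the Brisbane CBD.
  4. Hypothesis 4: Stop activity follows a bimodal commuter pattern, with morning and evening peaks.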

5. Methodology

The analysis was conducted using a two-phase methodology:

  1. Data Processing (ETL): Raw GTFS files (seq-translink-etl/data) were ingested into a PostgreSQL database using a Python script. This phase included the critical data cleaning steps: handling missing values, correcting data types, removing duplicates, and standardizing text fields. The result is a clean, analysis-ready relational database.
  2. Data Analysis & Visualization: A Jupyter Notebook was used to connect to the database and load the cleaned tables. The analysis was then performed with SQL queries, including window functions (RANK(), DENSE_RANK(), ROW_NUMBER()), to aggregate the data and extract insights. The results were visualized with Python libraries (Matplotlib, Seaborn, Folium).
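The cleaning steps listed in phase 1 can be illustrated with a minimal sketch. It is not the project's ETL script: the sample rows and coordinates are made up, the helper names `clean_stops` and `load_stops` are hypothetical, skipping rows without coordinates is only one possible policy for missing values, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server.

```python
import csv
import io
import sqlite3

# Hypothetical sample in the shape of a GTFS stops.txt file (values are made up).
RAW_STOPS = """stop_id,stop_name,stop_lat,stop_lon
1,  King George  Square ,-27.4683,153.0235
1,  King George  Square ,-27.4683,153.0235
2,Cultural Centre,-27.4740,153.0190
3,Unknown Stop,,
"""


def clean_stops(raw_text):
    """Return cleaned (stop_id, stop_name, stop_lat, stop_lon) tuples."""
    seen_ids = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        stop_id = row["stop_id"].strip()
        lat, lon = row["stop_lat"].strip(), row["stop_lon"].strip()
        # Missing values: skip stops that have no coordinates.
        if not lat or not lon:
            continue
        # Duplicates: keep only the first row for each stop_id.
        if stop_id in seen_ids:
            continue
        seen_ids.add(stop_id)
        # Text standardization: trim and collapse repeated whitespace.
        name = " ".join(row["stop_name"].split())
        # Data types: coordinates become floats.
        cleaned.append((stop_id, name, float(lat), float(lon)))
    return cleaned


def load_stops(conn, rows):
    """Create the stops table if needed and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stops ("
        "stop_id TEXT PRIMARY KEY, stop_name TEXT NOT NULL, "
        "stop_lat REAL, stop_lon REAL)"
    )
    conn.executemany("INSERT INTO stops VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# SQLite stand-in (assumption); the report's pipeline targets PostgreSQL.
conn = sqlite3.connect(":memory:")
load_stops(conn, clean_stops(RAW_STOPS))
```

Keeping the cleaning in a pure function, separate from the load step, makes it possible to test the rules without a database connection.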

6. Analysis and Findings

6.1. Top 10 Most Frequent Routes

Top 10 Bus Routes


Finding: As predicted by Hypothesis 1, a few routes dominate service frequency. The CityGlider (60, 61) and UQ Lakes (66) routes have substantially more scheduled trips than others, establishing them as the primary arteries of the bus network.
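A ranking of this kind can be produced by counting scheduled trips per route. The sketch below assumes the standard GTFS `routes` and `trips` tables joined on `route_id`; the rows are toy data, and SQLite is used as a stand-in for PostgreSQL so the query runs as written. It is not necessarily the query used in the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (route_id TEXT PRIMARY KEY, route_short_name TEXT);
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT);
-- Toy rows for illustration only; real trip counts are far larger.
INSERT INTO routes VALUES ('r60', '60'), ('r66', '66'), ('r330', '330');
INSERT INTO trips VALUES
  ('t1', 'r60'), ('t2', 'r60'), ('t3', 'r60'),
  ('t4', 'r66'), ('t5', 'r66'),
  ('t6', 'r330');
""")

# Count scheduled trips per route and keep the ten most frequent.
TOP_ROUTES_SQL = """
SELECT r.route_short_name, COUNT(*) AS trip_count
FROM trips AS t
JOIN routes AS r ON r.route_id = t.route_id
GROUP BY r.route_short_name
ORDER BY trip_count DESC, r.route_short_name
LIMIT 10;
"""

top_routes = conn.execute(TOP_ROUTES_SQL).fetchall()
```

Grouping by `route_short_name` rather than `route_id` merges route variants that share a public route number, which may or may not be desirable depending on the question being asked.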

6.2. Service Levels by Day of the Week

Trips per Weekday


Finding: This chart strongly supports Hypothesis 2. There is a clear and significant drop in service levels on Saturday and Sunday compared to the consistent, high volume of trips from Monday to Friday.
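One way to derive trips per weekday from a GTFS feed is to combine the 0/1 weekday flags in calendar.txt with the `service_id` attached to each trip. The sketch below uses made-up rows and a hypothetical helper, `trips_per_weekday`; it ignores the date exceptions in calendar_dates.txt, so it approximates a regular week only.

```python
from collections import Counter

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Hypothetical calendar.txt rows: one weekday service and one weekend service.
CALENDAR = [
    dict(service_id="WK", monday="1", tuesday="1", wednesday="1",
         thursday="1", friday="1", saturday="0", sunday="0"),
    dict(service_id="WE", monday="0", tuesday="0", wednesday="0",
         thursday="0", friday="0", saturday="1", sunday="1"),
]

# Hypothetical service_id column from trips.txt, one entry per trip.
TRIP_SERVICE_IDS = ["WK", "WK", "WK", "WK", "WE", "WE"]


def trips_per_weekday(calendar_rows, trip_service_ids):
    """Sum the trips of every service that runs on each weekday."""
    trips_by_service = Counter(trip_service_ids)
    totals = dict.fromkeys(WEEKDAYS, 0)
    for row in calendar_rows:
        for day in WEEKDAYS:
            if int(row[day]) == 1:
                totals[day] += trips_by_service[row["service_id"]]
    return totals


totals = trips_per_weekday(CALENDAR, TRIP_SERVICE_IDS)
```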

6.3. Busiest Transport Hubs

Top 10 Bus Stops


Finding: The busiest bus and train stops are overwhelmingly located in or directly adjacent to the Brisbane CBD (e.g., King George Square, Cultural Centre, Central Station). This confirms Hypothesis 3, highlighting the CBD's role as the network's central nexus.

6.4. Geographic Distribution of All Stops

Map of All Stops


Finding: This interactive map shows a high density of stops in the urban core of Brisbane, with services extending to surrounding regions. The clustering capability allows for an intuitive exploration of service coverage.

6.5. Weekly & Hourly Activity Heatmaps

Weekly Heatmap


Hourly Heatmap


Finding: These heatmaps provide compelling evidence for Hypothesis 4. The weekly heatmap reinforces the weekday vs. weekend service drop-off across all major stops. The hourly heatmap clearly illustrates the bimodal commuter pattern, with activity peaking between 7-9 AM and 4-6 PM.
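The hourly view depends on binning stop_times departure times by hour. GTFS allows times past 24:00:00 (for example 25:10:00) for trips that run after midnight of the service day, so the hour cannot be parsed as an ordinary clock time. The sketch below uses hypothetical helper names and made-up times; wrapping late trips into hours 0 to 23 is an assumption, and keeping them as hour 24 and above is an equally valid choice.

```python
from collections import Counter


def hour_of_day(gtfs_time):
    """Return the clock hour (0-23) of a GTFS 'H:MM:SS' or 'HH:MM:SS' string."""
    hours = int(gtfs_time.split(":")[0])
    # Times such as 25:10:00 belong to the previous service day; wrap them.
    return hours % 24


def hourly_counts(departure_times):
    """Return a 24-element list of departure counts indexed by hour."""
    counts = Counter(hour_of_day(t) for t in departure_times if t)
    return [counts.get(hour, 0) for hour in range(24)]


# Made-up departure times for illustration.
sample_times = ["07:15:00", "07:45:00", "08:05:00",
                "16:30:00", "17:10:00", "25:10:00"]
by_hour = hourly_counts(sample_times)
```

A list like `by_hour`, computed per stop, gives one row of an hourly heatmap.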

6.6. Analysis with Window Functions

To gain deeper insights, we used SQL window functions for ranking and sequential analysis.

Service Span of Key Routes

Finding: By using ROW_NUMBER() to identify the first and last trips of the day for key routes, we can visualize their operational hours. The chart shows that high-frequency routes like the CityGlider (60) not only have many trips but also have the longest service span, starting early in the morning and running late into the night, which confirms their role as the network's core arteries.
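The first-and-last-trip technique can be sketched as follows. The `trip_starts` table, its columns, and all times are hypothetical; the project's actual schema is not shown in this report. SQLite (3.25 or later, for window function support) stands in for PostgreSQL, and the same SQL should also be valid there. Start times are stored as zero-padded HH:MM:SS text so that they sort correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one row per trip with its first departure time.
CREATE TABLE trip_starts (route_short_name TEXT, trip_id TEXT, start_time TEXT);
INSERT INTO trip_starts VALUES
  ('60', 'a1', '12:00:00'), ('60', 'a2', '05:00:00'), ('60', 'a3', '23:30:00'),
  ('330', 'b1', '18:45:00'), ('330', 'b2', '06:15:00');
""")

# Number each route's trips from the earliest and from the latest start,
# then keep the rows numbered 1 in each direction.
SERVICE_SPAN_SQL = """
WITH ordered AS (
  SELECT route_short_name, start_time,
         ROW_NUMBER() OVER (PARTITION BY route_short_name
                            ORDER BY start_time ASC)  AS rn_first,
         ROW_NUMBER() OVER (PARTITION BY route_short_name
                            ORDER BY start_time DESC) AS rn_last
  FROM trip_starts
)
SELECT route_short_name,
       MAX(CASE WHEN rn_first = 1 THEN start_time END) AS first_trip,
       MAX(CASE WHEN rn_last = 1 THEN start_time END)  AS last_trip
FROM ordered
GROUP BY route_short_name
ORDER BY route_short_name;
"""

service_span = conn.execute(SERVICE_SPAN_SQL).fetchall()
```

For the start times alone, MIN() and MAX() would give the same answer; ROW_NUMBER() becomes useful when the whole first or last trip row (trip ID, headsign, and so on) is needed.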

Ranking Routes by Number of Stops

Top 15 Bus Routes Ranked by Number of Unique Stops Serviced:
     route_short_name                      route_long_name  stop_count  rank_val  dense_rank_val
  0               599     Great Circle Line anti-clockwise         173         1               1
  1               598          Great Circle Line clockwise         172         2               2
  2               330  Bracken Ridge to City, Queen Street         109         3               3
  ...

Finding: Using RANK() and DENSE_RANK(), we identified the routes with the most extensive coverage. The "Great Circle Line" routes (598/599) serve the highest number of unique stops, which indicates that they connect a wide range of suburbs rather than serving a single point-to-point corridor. This analysis helps distinguish high-frequency routes from high-coverage routes.
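In the table above the two rank columns agree because no routes tie on stop count. The sketch below uses toy data with a deliberate tie to show where RANK() and DENSE_RANK() diverge. The `route_stops` table and the route names A, B, and C are hypothetical, and SQLite (3.25 or later) stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one row per (route, stop) visit; repeats are possible.
CREATE TABLE route_stops (route_short_name TEXT, stop_id TEXT);
INSERT INTO route_stops VALUES
  ('A', 's1'), ('A', 's2'), ('A', 's3'), ('A', 's1'),
  ('B', 's2'), ('B', 's3'), ('B', 's4'),
  ('C', 's1');
""")

# Count unique stops per route first, then rank the routes by that count.
RANKING_SQL = """
WITH stop_counts AS (
  SELECT route_short_name, COUNT(DISTINCT stop_id) AS stop_count
  FROM route_stops
  GROUP BY route_short_name
)
SELECT route_short_name,
       stop_count,
       RANK()       OVER (ORDER BY stop_count DESC) AS rank_val,
       DENSE_RANK() OVER (ORDER BY stop_count DESC) AS dense_rank_val
FROM stop_counts
ORDER BY rank_val, route_short_name;
"""

ranking = conn.execute(RANKING_SQL).fetchall()
```

Routes A and B tie on three unique stops, so both receive rank 1. RANK() then skips to 3 for route C, while DENSE_RANK() continues with 2.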


7. Limitations of the Analysis

It is crucial to acknowledge the limitations of this study, which offer avenues for future work:

8. Conclusion & Recommendations

This analysis successfully processed raw transit data to reveal the core operational characteristics of the SEQ public transport network. The findings confirmed all initial hypotheses, painting a picture of a system that is CBD-centric and tailored to a weekday-commuting population.

Recommendations for future work include:

Explore the Code

For a detailed look at the Python scripts, ETL process, and SQL queries used in this analysis, please visit the project repository on GitHub.
