Mission Success: SQL in Python in Colab

31 Jan 2025 — 3 min read

After my project to export RocketLaunch.Live's data via API, I was itching to really get into the datasets and knock out a couple of joins to merge my files into a clean dataset for further downstream analysis. But I didn't want to spin up a local DB server, so I used Colab to import the output CSV files from my Google Drive into a few DataFrames:

#Step 1: Set up environment

# Mount Google Drive to Colab
#Reference: https://stackoverflow.com/questions/48340341/how-to-read-csv-to-dataframe-in-google-colab
from google.colab import drive
drive.mount('/content/gdrive')

#Import libraries
import pandas as pd
import duckdb
import matplotlib.pyplot as plt
import numpy as np

path='/content/gdrive/MyDrive/faroutdata/rocketlaunch-live-api/'

companies=pd.read_csv(path+'companies.csv')
launches=pd.read_csv(path+'launches.csv')
locations=pd.read_csv(path+'locations.csv')
missions=pd.read_csv(path+'missions.csv')
pads=pd.read_csv(path+'pads.csv')
tags=pd.read_csv(path+'tags.csv')
vehicles=pd.read_csv(path+'vehicles.csv')

I could then run standard SQL queries over these DataFrames to establish logical relationships between the launch, company, vehicle and pad entities to get a single combined DataFrame and write to a new CSV file:

df = duckdb.query(
    "SELECT c.name as co_name, c.country_name as co_country_name, c.country_code as co_country_code, l.t0 as launch_planned_dt, l.date_str as launch_actual_dt, right(l.date_str ,4) as launch_actual_year, l.suborbital,l.result, l.missions_0_id as mission_id, l.missions_0_name as mission_name, v.id as vehicle_id, v.name as vehicle_name, v.company_name as vehicle_co_name, l.pad_name, l.pad_location_name, l.pad_location_state, l.pad_location_statename, l.pad_location_country, p.location_country_code as pad_location_country_code, p.location_latitude as pad_location_latitude, p.location_longitude as pad_location_longitude FROM launches as l LEFT JOIN companies as c on l.provider_id = c.id LEFT JOIN vehicles as v on l.vehicle_id = v.id LEFT JOIN pads as p on l.pad_id = p.id"
).to_df()

You can use Colab to merge multiple CSVs into one table-like DataFrame

It's not the most elegant approach, but perfectly suited to quick and dirty wrangling of multiple CSV files for downstream analysis. I was able to use my new dataset to visualize mission success rates for 2024:

Colab is great for parsing and plotting data, but Tableau is better for interactive visualizations, so I connected Tableau Public to my Google drive and created a new extract using the merged CSV file. Launch Dates from the API are coded as MMM DD YYYY (e.g. Jan 01 2025), so I had to use a Calculated Field to parse API values into actual Tableau Date time-series values:

MAKEDATE(int(right([Launch Actual Dt],4)), [mmm_str_to_int], int(mid([Launch Actual Dt],5,2)))

Finally, I created a quick dashboard that allows you to compare Mission Success/Failure Counts and Rates across different launch vehicles and time periods:

Success / Failure Count by Rocket launch vehicle

Now that we can drill down into the API data, we can run a few data quality checks to ensure the reliability of our dashboard:

Blue Origin's New Glenn had 1 successful launch on 16th January 2025, this is captured in our Tableau dashboard
SpaceX's Starship prototype had 1 planned failure on 16th January 2025, this is also captured in our Tableau dashboard
India launched 5 rockets in 2024, and all 5 missions are reflected both in our dashboard and in the ISRO's mission log

Areas for improvement are to automate the Colab notebook using colabctl to write the API data to Google Sheets, and set up a Tableau extract refresh schedule (which isn't available on Tableau Public) for end-to-end automation.

Stack: Google Colab, Python, Tableau
Code: Colab notebook

SQLove

My career began with SQL, from when I first dabbled in it in college, to when I started teaching it to professionals on weekends for quick money after I graduated, and all through my working years. It's become part of my analytical toolkit and one of the few

31 Jan 2025

SQL

Rocket launch API data

There are few publicly available datasets with detailed rocket launch information. Two that I found are LL2 (free) and RocketLaunch.Live (USD 3/monthly). I thought it'll be an interesting exercise to export launch data from an API to see what insights and charts could be created. I

29 Jan 2025

Python Datawrapper

Related Posts

SQLove

Rocket launch API data