3  Actually Getting Flight Data (feat. httr2)

Alright, I’ve talked your ear off (typed your eyes off?) about the project background and the mental model framing my development approach. This section actually walks you through some code! We’ll be putting all of the code from this section into an R script called support_functions.R.

3.1 Interacting with APIs

There are many methods of interacting with APIs, but for this project I’ll be using httr2. Nothing replaces the docs for a comprehensive treatment of httr2, but I’ll give a quick rundown of some overarching principles before we set up our data pipelines.

A request in httr2 generally starts like this:

install.packages("httr2")
library(httr2)

req <- request("https://some_example.com")

… where some_example.com is the base URL that all API requests build on. For the OpenSky network, the root of the REST API is https://opensky-network.org/api, though you could also pass a specific endpoint as the base_url, for example, https://opensky-network.org/api/states/all?

The function that follows request() usually specifies what the request needs to pass to the API. In this project, I use the following request varieties:

  • req_url_query(): forms API requests where the query roughly takes on the form of variable=value, i.e., https://opensky-network.org/api/states/all?icao24=a448d1
  • req_url_path_append(): for when you simply need to append something to the end of the base URL, i.e., https://api.adsbdb.com/v0/n-number/
  • req_body_json(): for when you need to pass information in a JSON to an API

These are chained together using the pipe operator like so:

request(url) |> req_url_query()

We then perform the request by chaining req_perform() onto it. Once we use req_perform(), our request is sent! But we still need to extract our data from the response. For that, we’ll use one of httr2’s functions starting with resp. In this project, I only use one type:

  • resp_body_json(): returns the parsed JSON information from the httr2 response object
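If you want to see what these verbs do before sending anything, you can build a request and look at the URL it would hit. This is a minimal sketch: nothing is performed over the network, the icao24 and registration values are just the example aircraft used later in this section, and peeking at the url element of the request object is my own shortcut for illustration rather than something the httr2 docs prescribe.

```r
library(httr2)

# Query-style request: req_url_query() adds ?icao24=... to the URL
states_req <- request("https://opensky-network.org/api/states/all") |>
  req_url_query(icao24 = "a448d1")

# Path-style request: req_url_path_append() tacks a value onto the end of the path
adsbdb_req <- request("https://api.adsbdb.com/v0/n-number/") |>
  req_url_path_append("N37502")

# Neither request has been performed; we are only inspecting the URL
states_req$url
adsbdb_req$url
```

Only once req_perform() is chained on does anything actually leave your machine.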

For the rest of this section, we’ll be building functions so we can send httr2 requests and get data back for our aircraft, thereby building the pipeline to supply our tables with new information updated at regular intervals throughout the day.

3.1.1 Interacting with the ADSBDB API

We’ll start with the most straightforward API to work with: the ADSBDB API. The ADSBDB API doesn’t require any authentication and simply requires us to append some information onto a few base URLs. Among other things, the ADSBDB API allows us to switch between icao24 (or Mode-S) identification information and registration numbers, which is important for us to construct a list of aircraft from the United Fleet Google Sheet that we can check against the OpenSky API, since the OpenSky API only takes icao24 addresses.

Let’s start by reading through the ADSBDB API documentation for going from registration values to icao24 values.

Okay, so we’ve got the base URL, https://api.adsbdb.com/v0/, the endpoint, n-number/, and what we’ll query by, the registration, denoted by [N-NUMBER].

To keep it simple, we’ll just treat the URL up to n-number/ as the base URL when passing our url argument to request(). Since the registration number is simply appended to the base URL in this case, we’ll use req_url_path_append(). We will wrap our httr2 request in a function. Why? So we can programmatically use our HTTP request to the RESTful API across multiple aircraft using functions like map() later on. Let’s name the function something so we can easily remember what it does; maybe get_icao24_from_registration? And let’s call the variable we’re passing registration, since that’s what [N-NUMBER] serves as a stand-in for. Doing that gives us something like this:

library(httr2)

get_icao24_from_registration <- function(registration) {
  icao24 <- request("https://api.adsbdb.com/v0/n-number/") |>
    req_url_path_append(registration) |>
    req_perform() |>
    resp_body_json()

  return(icao24)
}

Alright, let’s try it on registration number N37502:

get_icao24_from_registration("N37502")
$response
[1] "A448D1"

This is a good start. But we might want the results to come out directly instead of being nested behind $response. We can change the return to return(icao24$response) to do that. We might also want to return the results in a tibble, so we can easily use the bind_rows() function from dplyr in combination with map() later to combine a bunch of results.

In order to use map(), we might also want to add some error handling to our function. Seeing that ADSBDB is being continuously updated to improve its coverage of registrations to icao24 values, there may be some gaps in the data, causing the function to error out if we input a registration that hasn’t been included in the database. There are several methods to deal with this, including the use of possibly() and safely(), though I chose to use tryCatch() in combination with the logger package to emit an error log.
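For comparison, here’s roughly what the possibly() route could look like. This is a sketch only: fake_lookup() is a hypothetical stand-in for the real API call (so it runs without network access), and its lookup table contains just the one example aircraft from above.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical stand-in for the API call: errors on unknown registrations
fake_lookup <- function(registration) {
  known <- c(N37502 = "A448D1")
  if (!registration %in% names(known)) {
    stop("HTTP 400 Bad Request.")
  }
  tibble(icao24 = unname(known[registration]))
}

# possibly() returns `otherwise` instead of raising the error
safe_lookup <- possibly(fake_lookup, otherwise = NULL)

# bind_rows() silently drops the NULL left behind by the failed lookup
results <- c("N37502", "not a real registration") |>
  map(safe_lookup) |>
  bind_rows()

results
```

The trade-off is that possibly() swallows the error quietly, which is part of why I went with tryCatch() plus a log message instead.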

Let’s put all these suggestions into practice. First, we’ll add the ability to export the result we want, icao24$response, as a tibble.

library(dplyr)  # rename()
library(tibble) # as_tibble()

get_icao24_from_registration <- function(registration, return_tibble = TRUE) {
  icao24 <- request("https://api.adsbdb.com/v0/n-number/") |>
    req_url_path_append(registration) |>
    req_perform() |>
    resp_body_json()

  if (return_tibble) {
    return(as_tibble(icao24$response) |> rename(icao24 = value))
  } else {
    return(icao24$response)
  }
}

Next, we’ll wrap that in tryCatch(), emitting a log_error() whenever icao24 information isn’t found on a given registration.

library(logger)

get_icao24_from_registration <- function(registration, return_tibble = TRUE) {
  tryCatch(
    {
      icao24 <- request("https://api.adsbdb.com/v0/n-number/") |>
        req_url_path_append(registration) |>
        req_perform() |>
        resp_body_json()

      if (return_tibble) {
        return(as_tibble(icao24$response) |> rename(icao24 = value))
      } else {
        return(icao24$response)
      }
    },
    error = function(e) {
      log_error("No icao24 information found for registration {registration}")
      return(NULL)
    }
  )
}

And now we’ll try outputting the result again:

get_icao24_from_registration("N37502", return_tibble = FALSE)
[1] "A448D1"

Voila!

I think it’s useful to note things that I’d do differently if I had to do this again from scratch (now that I’m typing up this documentation after the fact). One thing I’d change is the error function in tryCatch(). The reason is that I’m making a pretty substantial assumption here about the nature of the error, i.e., that the icao24/Mode-S value doesn’t exist, whereas the source of the error might be something different, like a network timeout or a rate limit.

As such, I’d recommend doing something like this:

library(logger)

get_icao24_from_registration <- function(registration, return_tibble = TRUE) {
  tryCatch(
    {
      icao24 <- request("https://api.adsbdb.com/v0/n-number/") |>
        req_url_path_append(registration) |>
        req_perform() |>
        resp_body_json()

      if (return_tibble) {
        return(as_tibble(icao24$response) |> rename(icao24 = value))
      } else {
        return(icao24$response)
      }
    },
    error = function(e) {
      log_error("{registration} encountered error {e}")
      return(NULL)
    }
  )
}

get_icao24_from_registration("not a real registration", return_tibble = FALSE)
NULL

… which returns NULL as seen above, but also prints this message to the console:

ERROR [2025-09-30 20:03:09] not a real registration encountered error Error in `req_perform()`:
! HTTP 400 Bad Request.

… and, in fact, I’ve gone back and made sure {e} is included in my log_error() calls where appropriate.
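To make the point concrete without hitting the API, here’s a small sketch showing how the same handler reports two different failure causes. The not_found() and timed_out() functions are made up for illustration, and I’m using conditionMessage() to pull out just the message text, which is an alternative to interpolating the whole condition object.

```r
# Wrap any lookup function and describe whatever error it raises
describe_failure <- function(registration, lookup) {
  tryCatch(
    lookup(registration),
    error = function(e) {
      # conditionMessage() extracts the message text from the condition
      paste0(registration, " encountered error: ", conditionMessage(e))
    }
  )
}

# Hypothetical failure modes standing in for real request errors
not_found <- function(x) stop("HTTP 400 Bad Request.")
timed_out <- function(x) stop("Timeout was reached")

describe_failure("N00000", not_found)
describe_failure("N37502", timed_out)
```

A hardcoded “no icao24 information found” message would have described both of these the same way.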

We might also want a reverse function that allows us to move from icao24 numbers to registration numbers. That’s fairly easy: all we have to do is create a new function with a base URL hitting the mode-s endpoint and swap registration for icao24. That looks like so:

get_registration_from_icao24 <- function(icao24, return_tibble = TRUE) {
  tryCatch(
    {
      registration <- request("https://api.adsbdb.com/v0/mode-s/") |>
        req_url_path_append(icao24) |>
        req_perform() |>
        resp_body_json()

      if (return_tibble) {
        return(as_tibble(registration$response) |> rename(registration = value))
      } else {
        return(registration$response)
      }
    },
    error = function(e) {
      log_error("{icao24} encountered error {e}")
      return(NULL)
    }
  )
}

I’m storing these functions in a file called support_functions.R which I’ll call later in my other scripts, just to keep them in a separate, single source of truth.

Important

Hi reader. Future me here. Turns out I neglected to mention something important on the first go around. ADSBDB’s API actually has a time-boxed rate limit of 512 requests per 60 seconds.

So, we’ll need to add some request throttling. Luckily httr2 makes it really easy to do that with req_throttle(), which you can learn more about here. We’ll need to specify one parameter, capacity, which sets the number of requests we can make in a time period, fill_time_s, measured in seconds. fill_time_s conveniently defaults to 60. That gives us this line:

... |> req_throttle(capacity = 500)

We could set it to exactly 512. I set it to 500 just to add some buffer. That changes our functions above to this:

get_icao24_from_registration <- function(registration, return_tibble = TRUE) {
  tryCatch(
    {
      icao24 <- request("https://api.adsbdb.com/v0/n-number/") |>
        req_url_path_append(registration) |>
        req_throttle(capacity = 500) |>
        req_perform() |>
        resp_body_json()

      if (return_tibble) {
        return(as_tibble(icao24$response) |> rename(icao24 = value))
      } else {
        return(icao24$response)
      }
    },
    error = function(e) {
      log_error("{registration} encountered error {e}")
      return(NULL)
    }
  )
}

get_registration_from_icao24 <- function(icao24, return_tibble = TRUE) {
  tryCatch(
    {
      registration <- request("https://api.adsbdb.com/v0/mode-s/") |>
        req_url_path_append(icao24) |>
        req_throttle(capacity = 500) |>
        req_perform() |>
        resp_body_json()

      if (return_tibble) {
        return(as_tibble(registration$response) |> rename(registration = value))
      } else {
        return(registration$response)
      }
    },
    error = function(e) {
      log_error("{icao24} encountered error {e}")
      return(NULL)
    }
  )
}

Note that, as of the time of writing, there’s currently a bug in httr2 where the token bucket in req_throttle gets reset every time the request function is called, so you may still run into 429 limits until a fix is issued. In the meantime, see the commentary in that GitHub issue for ways to get around the bug.
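If you just need something that works regardless of that bug, one crude fallback is to space the calls out yourself. To be clear, this is my own simplification and not the workaround from the GitHub issue: throttled_map() is a hypothetical helper, and it trades speed for simplicity by sleeping a fixed amount before every call.

```r
# 500 requests per 60 seconds works out to one request every 0.12 seconds
min_gap_s <- 60 / 500

# Hypothetical helper: apply f to each element, pausing before each call
throttled_map <- function(x, f, gap_s = min_gap_s) {
  lapply(x, function(item) {
    Sys.sleep(gap_s)
    f(item)
  })
}

# Usage with a harmless stand-in function instead of a real API call
throttled_map(c("N37502", "N12003"), toupper)
```

In practice you’d pass get_icao24_from_registration as f, and the fixed gap keeps you under the limit even if the token bucket misbehaves.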

3.1.2 Interacting with the adsb.lol API

The adsb.lol API is a little trickier but still fairly friendly. Instead of req_url_path_append(), we’ve got to use req_body_json(), since adsb.lol’s routeset endpoint only accepts requests formatted as JSON. Here’s the schema from the docs:

{
  "planes": [
    {
      "callsign": "string",
      "lat": 0,
      "lng": 0
    }
  ]
}

Okay, a little daunting if you’re not used to working with JSON - speaking for myself here! Luckily LLMs were a good tool that helped me tease this out. To prepare an R object for representation as JSON, we generally use list(). It’s likely an oversimplification, but in this case, each set of braces {} and brackets [] roughly represents where we need to include list(): a named list becomes a JSON object, and an unnamed list becomes a JSON array. Using that rule, our JSON setup should look like this:

json <- list(
  planes = list(list(callsign = callsign, lat = 0, lng = 0))
)

… where callsign is going to be the variable we use in a function. lat and lng are optional latitude and longitude parameters we won’t concern ourselves with for the time being.
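If you’d like to check the list()-to-JSON mapping before sending anything, you can serialize the object yourself. This sketch uses jsonlite directly, with auto_unbox = TRUE so that length-one vectors become plain values rather than one-element arrays; the callsign is the example flight used further down.

```r
library(jsonlite)

callsign <- "UAL881"

# Outer named list -> JSON object; unnamed list -> array; inner named list -> object
json <- list(
  planes = list(list(callsign = callsign, lat = 0, lng = 0))
)

# Serialize locally to compare against the schema from the docs
json_text <- toJSON(json, auto_unbox = TRUE)
json_text
```

The result has the same shape as the schema above: a planes array holding one object with callsign, lat, and lng.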

Alright, so putting that together with what we’ve learned from working with ADSBDB, all we need to do is replace req_url_path_append() with req_body_json() and we should be good to go. Let’s call this one get_route_information.

get_route_information <- function(callsign) {
  json <- list(
    planes = list(list(callsign = callsign, lat = 0, lng = 0))
  )

  route <- request("https://api.adsb.lol/api/0/routeset/") |>
    req_body_json(json) |>
    req_perform() |>
    resp_body_json()

  return(route)
}

… and let’s try it with a random aircraft I pulled from the dashboard, N12003 operating as UAL881:

get_route_information("UAL881")
[[1]]
[[1]]$`_airport_codes_iata`
[1] "ORD-HND"

[[1]]$`_airports`
[[1]]$`_airports`[[1]]
[[1]]$`_airports`[[1]]$alt_feet
[1] 672

[[1]]$`_airports`[[1]]$alt_meters
[1] 204.83

[[1]]$`_airports`[[1]]$countryiso2
[1] "US"

[[1]]$`_airports`[[1]]$iata
[1] "ORD"

[[1]]$`_airports`[[1]]$icao
[1] "KORD"

[[1]]$`_airports`[[1]]$lat
[1] 41.9786

[[1]]$`_airports`[[1]]$location
[1] "Chicago"

[[1]]$`_airports`[[1]]$lon
[1] -87.9048

[[1]]$`_airports`[[1]]$name
[1] "Chicago O'Hare International Airport"


[[1]]$`_airports`[[2]]
[[1]]$`_airports`[[2]]$alt_feet
[1] 35

[[1]]$`_airports`[[2]]$alt_meters
[1] 10.67

[[1]]$`_airports`[[2]]$countryiso2
[1] "JP"

[[1]]$`_airports`[[2]]$iata
[1] "HND"

[[1]]$`_airports`[[2]]$icao
[1] "RJTT"

[[1]]$`_airports`[[2]]$lat
[1] 35.5523

[[1]]$`_airports`[[2]]$location
[1] "Tokyo"

[[1]]$`_airports`[[2]]$lon
[1] 139.78

[[1]]$`_airports`[[2]]$name
[1] "Tokyo-Haneda International Airport"



[[1]]$airline_code
[1] "UAL"

[[1]]$airport_codes
[1] "KORD-RJTT"

[[1]]$callsign
[1] "UAL881"

[[1]]$number
[1] "881"

[[1]]$plausible
[1] 0

Yeah, that doesn’t look very tidy. Well, they do say that most of data work is cleaning. Let’s see if we can massage this a bit. Looks like everything is stored under [[1]], so let’s get route[[1]] and try putting it into a tibble.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(glue)

get_route_information <- function(callsign) {
  json <- list(
    planes = list(list(callsign = callsign, lat = 0, lng = 0))
  )

  route <- request("https://api.adsb.lol/api/0/routeset/") |>
    req_body_json(json) |>
    req_perform() |>
    resp_body_json()

  route <- route[[1]] |> as_tibble()

  return(route)
}

get_route_information("UAL881")
# A tibble: 2 × 7
  `_airport_codes_iata` `_airports`  airline_code airport_codes callsign number
  <chr>                 <list>       <chr>        <chr>         <chr>    <chr> 
1 ORD-HND               <named list> UAL          KORD-RJTT     UAL881   881   
2 ORD-HND               <named list> UAL          KORD-RJTT     UAL881   881   
# ℹ 1 more variable: plausible <int>

Better. But the shape isn’t entirely correct. If we think a little bit farther ahead to how the other data might look, we want one observation, i.e., one row, per plane. We’ve got two rows here, one for each airport it looks like. In fact, sometimes this API will return a multi-leg route - i.e., three rows or more! These routes are generally more common for point-to-point airlines like Southwest than they are with hub-and-spoke airlines like United. For now, to deal with these uncertain cases, we’ll plan on stopping the function entirely if there’s more than two rows returned by route.

In the future, one improvement would be to take the tracks provided by OpenSky and infer the departure airport based on spatial proximity using a spatial dataset of airports.

For now, we’ll focus on single-leg routes, i.e., routes with exactly two airports. After grabbing route[[1]], we need to pull apart the first row (origin) and second row (destination). We’ve also got to do something about these named lists sitting under the `_airports` column. Let’s try using unnest_wider(), separating the names with an underscore. We’re also going to try and bring these columns back together in a single row, so we’ve got to name them something different from one another. Let’s prepend origin_ to all columns in the object for the first row, and destination_ to all columns in the object for the second row. Then, let’s bind the columns using bind_cols().
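To see what unnest_wider() and the prefixing do in isolation, here’s a toy version with a pared-down, hand-built tibble. The two airports mirror the example response, but the toy object itself is an assumption for illustration, and I’m using rename_with() with paste0() here in place of rename_all() and glue().

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hand-built stand-in for the API response: one row per airport
toy <- tibble(
  callsign = c("UAL881", "UAL881"),
  `_airports` = list(
    list(iata = "ORD", icao = "KORD"),
    list(iata = "HND", icao = "RJTT")
  )
)

# Spread the named list into columns, then prefix every column name
toy_origin <- toy[1, ] |>
  unnest_wider(`_airports`, names_sep = "_") |>
  rename_with(~ paste0("origin_", .x))

names(toy_origin)
```

Note how the leading underscore in `_airports` combines with the separator to produce a double underscore in the column names.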

get_route_information <- function(callsign) {
  json <- list(
    planes = list(list(callsign = callsign, lat = 0, lng = 0))
  )

  route <- request("https://api.adsb.lol/api/0/routeset/") |>
    req_body_json(json) |>
    req_perform() |>
    resp_body_json()

  route <- route[[1]] |> as_tibble()
  if (nrow(route) > 2) {
    stop(
      "Callsign has multiple routes or a multi-leg route. Unable to determine routing."
    )
  }
  route_origin <- route[1, ] |>
    unnest_wider(`_airports`, names_sep = "_") |>
    rename_all(~ glue("origin_{.x}"))
  route_destination <- route[2, ] |>
    unnest_wider(`_airports`, names_sep = "_") |>
    rename_all(~ glue("destination_{.x}"))

  route <- bind_cols(route_origin, route_destination)

  return(route_origin)
}

get_route_information("UAL881")
# A tibble: 1 × 15
  origin__airport_codes_iata origin__airports_alt_feet origin__airports_alt_me…¹
  <chr>                                          <dbl>                     <dbl>
1 ORD-HND                                          672                      205.
# ℹ abbreviated name: ¹​origin__airports_alt_meters
# ℹ 12 more variables: origin__airports_countryiso2 <chr>,
#   origin__airports_iata <chr>, origin__airports_icao <chr>,
#   origin__airports_lat <dbl>, origin__airports_location <chr>,
#   origin__airports_lon <dbl>, origin__airports_name <chr>,
#   origin_airline_code <chr>, origin_airport_codes <chr>,
#   origin_callsign <chr>, origin_number <chr>, origin_plausible <int>

Alright, just a few more things to take care of. First, notice that the version above returns route_origin rather than the combined route, which is why only the origin_ columns show up in the output; the next version returns route. Some of these columns also have a double underscore. Additionally, we don’t have a plain callsign column in this tibble (only the prefixed origin_callsign), so we’ll need to add one using mutate(callsign = callsign). Let’s also add some error handling similar to the other functions, where an error returns a tibble in a similar schema filled with NA, pared down to only the columns used by table creation functions later in the project.

get_route_information <- function(callsign) {
  tryCatch(
    {
      json <- list(
        planes = list(list(callsign = callsign, lat = 0, lng = 0))
      )

      route <- request("https://api.adsb.lol/api/0/routeset/") |>
        req_body_json(json) |>
        req_perform() |>
        resp_body_json()

      route <- route[[1]] |> as_tibble()
      if (nrow(route) > 2) {
        stop(
          "Callsign has multiple routes or a multi-leg route. Unable to determine routing."
        )
      }
      route_origin <- route[1, ] |>
        unnest_wider(`_airports`, names_sep = "_") |>
        rename_all(~ glue("origin_{.x}"))
      route_destination <- route[2, ] |>
        unnest_wider(`_airports`, names_sep = "_") |>
        rename_all(~ glue("destination_{.x}"))

      route <- bind_cols(route_origin, route_destination) |>
        rename_all(~ str_replace_all(.x, "__", "_")) |>
        mutate(callsign = callsign)

      return(route)
    },
    error = function(e) {
      log_error("Error getting route info for {callsign} {e}")
      route <- tibble(
        callsign = callsign,
        origin_airports_iata = NA,
        origin_airports_name = NA,
        origin_airports_countryiso2 = NA,
        origin_plausible = 0, ### See the callout box below
        destination_airports_iata = NA,
        destination_airports_name = NA,
        destination_airports_countryiso2 = NA,
        destination_plausible = 0,
      )
      return(route)
    }
  )
}

get_route_information("UAL881")
# A tibble: 1 × 31
  origin_airport_codes_iata origin_airports_alt_feet origin_airports_alt_meters
  <chr>                                        <dbl>                      <dbl>
1 ORD-HND                                        672                       205.
# ℹ 28 more variables: origin_airports_countryiso2 <chr>,
#   origin_airports_iata <chr>, origin_airports_icao <chr>,
#   origin_airports_lat <dbl>, origin_airports_location <chr>,
#   origin_airports_lon <dbl>, origin_airports_name <chr>,
#   origin_airline_code <chr>, origin_airport_codes <chr>,
#   origin_callsign <chr>, origin_number <chr>, origin_plausible <int>,
#   destination_airport_codes_iata <chr>, …

And here’s our final product!

Important

Hi. Future me interjecting here in the hope of saving you some frustration. Remember when I said we were going to fill out the tibble when catching errors with NA for every variable? Turns out doing that, and deferring the correction of origin_plausible and destination_plausible to another function is actually a bad idea that’s going to cause some headaches down the line. It’s better for us to handle this at the source so we don’t run into issues, namely the issue of trying to check if these columns are equal to 1 or not, and giving our machine an existential crisis when it tries to determine if NA == 1.

As such, I’ve made sure that origin_plausible and destination_plausible return 0 here. I’ve further made sure in Section 5.1 to drop any NA values from these columns. I think it’s a sensible return: given that we can’t determine where the origin/destination are, the result isn’t plausible, by virtue of there being no result (and I don’t think it’s plausible that a plane appeared/disappeared, even though I vaguely remember there being a network TV and then a Netflix show about this premise).
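Here’s the NA problem in miniature, using plain base R and made-up values. It’s only a sketch of the failure mode, not the actual downstream code from Section 5.1.

```r
# Made-up plausibility flags, including one missing value
plausible <- c(1L, 0L, NA)

# Comparing NA to anything yields NA, not FALSE
comparison <- plausible == 1
comparison

# if() cannot branch on NA, so this errors; tryCatch() captures the message
branch_result <- tryCatch(
  if (NA == 1) "plausible" else "not plausible",
  error = function(e) conditionMessage(e)
)
branch_result

# Defaulting the flag to 0 at the source keeps the comparison well defined
safe_comparison <- c(1L, 0L, 0L) == 1
safe_comparison
```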

Hopefully that’ll save you from this:

3.1.3 Interacting with the OpenSky API

Even though I’ve listed the OpenSky API last, it isn’t really that daunting beyond authenticating with your credentials. The authentication piece adds a little extra bit of complexity, but nothing we can’t handle. As alluded to in Section 2.1.1, the OpenSky API uses API credits as a way to deal with demand (among other things). For non-registered users, the limit is 400 credits a day, which roughly comes out to about 100 flights a day. That’s not very much. Luckily, with a simple and free registration, we can get ourselves up to 4,000 credits a day, or 1,000 flights a day instead!

3.1.3.1 Interacting as an Unauthenticated User

I’ll cover the unauthenticated piece first and provide more details about authentication in the next section. We might as well use up the 400 credits we get as an unauthenticated user on messing around and actually getting our functions to work, since the 400 and 4,000 are separate limits.

We want both track and state vector data for a given aircraft, defined by an icao24 address. The base URLs for those endpoints are https://opensky-network.org/api/states/all? and https://opensky-network.org/api/tracks/all?. Let’s start with the track information first. Taking a look at the docs:

We can see that our request needs to at least pass the property icao24. The time property is optional; it’s useful if we know a time when the given aircraft was in the air (i.e., a time between the start and end of a given flight). Since we don’t have that information, we’ll just pass the icao24 code in our request. To do that, we’ll use req_url_query(), which is structured in a variable = value pattern, and call this function get_flight_track.

get_flight_track <- function(icao24, as_sf = TRUE) {
  opensky_response <- request("https://opensky-network.org/api/tracks/all?") |>
    req_url_query(icao24 = str_to_lower(icao24)) |>
    req_perform()
}

This is the basic body of our request. Note the str_to_lower() wrapping icao24. The OpenSky API only accepts lower case icao24 addresses. To make life a little easier, I added str_to_lower() as a failsafe to make sure the request doesn’t fail because of a malformed icao24 address. Before we include the code to actually get the response, we should probably add something here that we haven’t added to our previous functions, which is a way to tell how many credits we have left. For the OpenSky API, this is contained in the response header "X-Rate-Limit-Remaining". We’ll grab that information and output it to the console using log_info().
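If you want to see what resp_header() hands back without spending any credits, you can try it on a hand-made response. This is a sketch: response() is httr2’s helper for constructing response objects for testing, and the header value of 398 is made up.

```r
library(httr2)

# Build a fake response locally; no request is sent
fake_resp <- response(
  status_code = 200,
  headers = list(`X-Rate-Limit-Remaining` = "398")
)

# Header values come back as character strings
remaining <- resp_header(fake_resp, "X-Rate-Limit-Remaining")
remaining

# Convert if you want to do arithmetic or comparisons on it
as.integer(remaining)
```

Since the value is a string, it drops straight into a log message, but it needs converting before you compare it against a numeric threshold.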

get_flight_track <- function(icao24, as_sf = TRUE) {
  opensky_response <- request("https://opensky-network.org/api/tracks/all?") |>
    req_url_query(icao24 = str_to_lower(icao24)) |>
    req_perform()

  check_remaining_credits <- opensky_response |>
    resp_header("X-Rate-Limit-Remaining")

  log_info("Remaining API credits: {check_remaining_credits}")
}

… and now we’ll add the code to actually get the response:

get_flight_track <- function(icao24, as_sf = TRUE) {
  opensky_response <- request("https://opensky-network.org/api/tracks/all?") |>
    req_url_query(icao24 = str_to_lower(icao24)) |>
    req_perform()

  check_remaining_credits <- opensky_response |>
    resp_header("X-Rate-Limit-Remaining")

  log_info("Remaining API credits: {check_remaining_credits}")

  flight_track <- opensky_response |>
    resp_body_json() |>
    as_tibble()

  return(flight_track)
}

What do we get? Well, depending on what time of day you run this, you might get a result, or you might not. So, for the purposes of this tutorial, I’m going to cheat a little bit, find an active flight, capture it, and store it to a CSV.

# flight_track <- get_flight_track("A126CC")
# write_csv(flight_track, "data/flight_track.csv")

read_csv("data/flight_track.csv")
Rows: 60 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): icao24, callsign
dbl (2): startTime, endTime
lgl (1): path

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 60 × 5
   icao24 callsign  startTime    endTime path 
   <chr>  <chr>         <dbl>      <dbl> <lgl>
 1 a126cc UAL2353  1759356477 1759359956 NA   
 2 a126cc UAL2353  1759356477 1759359956 NA   
 3 a126cc UAL2353  1759356477 1759359956 NA   
 4 a126cc UAL2353  1759356477 1759359956 NA   
 5 a126cc UAL2353  1759356477 1759359956 NA   
 6 a126cc UAL2353  1759356477 1759359956 NA   
 7 a126cc UAL2353  1759356477 1759359956 NA   
 8 a126cc UAL2353  1759356477 1759359956 NA   
 9 a126cc UAL2353  1759356477 1759359956 NA   
10 a126cc UAL2353  1759356477 1759359956 NA   
# ℹ 50 more rows

Alright, a few things to note here. First, path claims to be of type <lgl> and is full of NA values. That’s an artifact of the round trip through CSV: path actually comes through from the API as a list-column, which a flat file can’t represent. So we’ll need to do the same kind of unnesting we did for the adsb.lol response:

get_flight_track <- function(icao24, as_sf = TRUE) {
  opensky_response <- request("https://opensky-network.org/api/tracks/all?") |>
    req_url_query(icao24 = str_to_lower(icao24)) |>
    req_perform()

  check_remaining_credits <- opensky_response |>
    resp_header("X-Rate-Limit-Remaining")

  log_info("Remaining API credits: {check_remaining_credits}")

  flight_track <- opensky_response |>
    resp_body_json() |>
    as_tibble() |>
    unnest_wider(col = path, names_sep = "_")

  return(flight_track)
}
# flight_track <- get_flight_track("A126CC")
# write_csv(flight_track, "data/flight_track_2.csv")

read_csv("data/flight_track_2.csv")
Rows: 60 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): icao24, callsign
dbl (7): startTime, endTime, path_1, path_2, path_3, path_4, path_5
lgl (1): path_6

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 60 × 10
   icao24 callsign  startTime  endTime path_1 path_2 path_3 path_4 path_5 path_6
   <chr>  <chr>         <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <lgl> 
 1 a126cc UAL2353  1759356477   1.76e9 1.76e9   29.9  -95.3    304    142 FALSE 
 2 a126cc UAL2353  1759356477   1.76e9 1.76e9   29.9  -95.3    304    131 FALSE 
 3 a126cc UAL2353  1759356477   1.76e9 1.76e9   29.9  -95.3    304    124 FALSE 
 4 a126cc UAL2353  1759356477   1.76e9 1.76e9   29.9  -95.3    304    122 FALSE 
 5 a126cc UAL2353  1759356477   1.76e9 1.76e9   29.9  -95.3    304    121 FALSE 
 6 a126cc UAL2353  1759356477   1.76e9 1.76e9   29.9  -95.3    609    123 FALSE 
 7 a126cc UAL2353  1759356477   1.76e9 1.76e9   29.9  -95.3    609    124 FALSE 
 8 a126cc UAL2353  1759356477   1.76e9 1.76e9   29.9  -95.2    914    126 FALSE 
 9 a126cc UAL2353  1759356477   1.76e9 1.76e9   29.9  -95.2   1219    125 FALSE 
10 a126cc UAL2353  1759356477   1.76e9 1.76e9   29.8  -95.2   1524    124 FALSE 
# ℹ 50 more rows

Okay. Unlike the adsb.lol API response which returns named lists, the OpenSky API returns an unnamed list that we have to unnest. This means we get less descriptive column names when we run unnest_wider(), since we’re lacking the metadata that a named list gives us, hence the path_1, path_2, path_n schema. Luckily, if we take a look at the OpenSky API documents again, each property of the JSON response is documented by index.

R uses one-based indexing, while the table in the docs is zero-based, so add 1 to every index in that table (index 0 becomes path_1, index 1 becomes path_2, and so on). Doing that, we can rename our columns by including this bit of code:

rename(
  timestamp = path_1,
  latitude = path_2,
  longitude = path_3,
  baro_altitude = path_4,
  true_track = path_5,
  on_ground = path_6
)
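Here’s that unnest-then-rename step on a single hand-built waypoint, so you can see the positional names appear and get replaced. The values are made up to resemble the output above; only the positional layout is taken from the API docs.

```r
library(tibble)
library(tidyr)
library(dplyr)

# One made-up waypoint in the same positional layout as OpenSky's path entries
toy_track <- tibble(
  icao24 = "a126cc",
  path = list(list(1759356477, 29.9, -95.3, 304, 142, FALSE))
)

# The unnamed list elements become path_1 ... path_6, which we then rename
toy_named <- toy_track |>
  unnest_wider(col = path, names_sep = "_") |>
  rename(
    timestamp = path_1,
    latitude = path_2,
    longitude = path_3,
    baro_altitude = path_4,
    true_track = path_5,
    on_ground = path_6
  )

names(toy_named)
```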

The columns startTime, endTime, and timestamp are currently long numbers, because they represent Unix time: a running total of seconds since January 1, 1970, midnight UTC. I’m not sure about the merits of using Unix timestamps, and I won’t go down that rabbit hole right now, but I’ll extend you the courtesy of linking the Wikipedia article on Unix time if you’re so inclined. lubridate’s got a function, as_datetime(), that helps us convert Unix timestamps into something more intelligible. We’ll mutate across all the time columns to convert them into human (and machine) readable timestamps.
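As a quick sanity check, here’s as_datetime() applied to the startTime value from the output above. It’s a minimal sketch; as_datetime() treats a bare number as seconds since the Unix epoch and defaults to UTC.

```r
library(lubridate)

# startTime value taken from the flight track output above
start_time <- as_datetime(1759356477)

# Format explicitly so the printed value doesn't depend on locale settings
format(start_time, "%Y-%m-%d %H:%M:%S")
# "2025-10-01 22:07:57" (UTC)
```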

get_flight_track <- function(icao24) {
  opensky_response <- request("https://opensky-network.org/api/tracks/all?") |>
    req_url_query(icao24 = str_to_lower(icao24)) |>
    req_perform()

  check_remaining_credits <- opensky_response |>
    resp_header("X-Rate-Limit-Remaining")

  log_info("Remaining API credits: {check_remaining_credits}")

  flight_track <- opensky_response |>
    resp_body_json() |>
    as_tibble() |>
    unnest_wider(col = path, names_sep = "_") |>
    rename(
      timestamp = path_1,
      latitude = path_2,
      longitude = path_3,
      baro_altitude = path_4,
      true_track = path_5,
      on_ground = path_6
    ) |>
    mutate(across(c(startTime, endTime, timestamp), \(x) as_datetime(x))) |>
    mutate(callsign = trimws(callsign)) |>
    arrange(timestamp)

  return(flight_track)
}

Finally, we’re dealing with geospatial data here. This data has a bunch of rows, and that’s because it has a bunch of different coordinates throughout time. For present purposes, we really just want the complete picture of an aircraft’s flight path up until the point in time we pulled its data. So let’s convert the data to a simple features dataframe, summarize it, and cast it to "LINESTRING". We’ll use st_as_sf() for the conversion, specifying coords = c("longitude", "latitude", "baro_altitude") as the X, Y, and Z (height) dimensions of our data. We can specify dim = "XYZ", though the default of this function already captures this, so it can be omitted. Finally, we need to define the coordinate reference system (CRS). The World Geodetic System 1984 (WGS84) is the standard for this kind of longitude/latitude data, hence we’ll specify its EPSG code, 4326.

library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
get_flight_track <- function(icao24, as_sf = TRUE) {
  opensky_response <- request("https://opensky-network.org/api/tracks/all?") |>
    req_url_query(icao24 = str_to_lower(icao24)) |>
    req_perform()

  check_remaining_credits <- opensky_response |>
    resp_header("X-Rate-Limit-Remaining")

  log_info("Remaining API credits: {check_remaining_credits}")

  flight_track <- opensky_response |>
    resp_body_json() |>
    as_tibble() |>
    unnest_wider(col = path, names_sep = "_") |>
    rename(
      timestamp = path_1,
      latitude = path_2,
      longitude = path_3,
      baro_altitude = path_4,
      true_track = path_5,
      on_ground = path_6
    ) |>
    mutate(across(c(startTime, endTime, timestamp), \(x) as_datetime(x))) |>
    mutate(callsign = trimws(callsign)) |>
    arrange(timestamp)

  if (as_sf) {
    flight_track <- st_as_sf(
      flight_track,
      coords = c("longitude", "latitude", "baro_altitude"),
      dim = "XYZ",
      crs = 4326
    ) |>
      group_by(icao24, callsign) |>
      summarize(do_union = FALSE) |>
      st_cast("LINESTRING") |>
      st_wrap_dateline()
  }
  return(flight_track)
}

Easy, right? Well, easier than me trying to figure out why on Earth I was getting weird lines for the longest time. I’ll draw special attention to two things here:

  1. summarize(do_union = FALSE): prevents sf from trying to union all the geometries (in this case, points) together when summarizing. If you fail to specify this argument, what you get is a weird connection of lines rather than a flight track.
  2. st_wrap_dateline(): If you’ve ever mapped anything in the Pacific Ocean, you’ll probably be painfully aware that computers seem to have a tough time with the international date line.1 Longitude jumps from 180 to -180 at the date line, which creates a discontinuity in longitude/latitude coordinate systems like WGS84. Interactive maps generally solve this issue, but if you’re plotting WGS84 coordinates as-is, sf will, rightfully, connect points on either side of the date line by drawing the line the long way around the world to make a valid, continuous geometry. st_wrap_dateline() solves this problem by detecting geometry that crosses the international date line, splitting it, and casting to "MULTILINESTRING".2
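
Here’s a minimal sketch of the date line behavior on its own, using a two-point line modeled on the example in the sf documentation. The coordinates are made up and aren’t from any real flight.

```r
library(sf)

# A two-point line that jumps from 179 degrees west to 179 degrees east
crossing <- st_sfc(
  st_linestring(rbind(c(-179, 0), c(179, 0))),
  crs = 4326
)

# Split the line where it crosses the date line
wrapped <- st_wrap_dateline(crossing)

st_geometry_type(crossing)
st_geometry_type(wrapped)
```

The original geometry is a single LINESTRING; after wrapping, it should come back as a MULTILINESTRING with one short piece on each side of the date line, rather than one long line spanning the whole map.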

I mentioned above as well as in Section 2.1.1 that this function will fail and consume credits if the queried icao24 address is not actually in flight at the moment. Let’s add error handling for that similar to our other functions:

get_flight_track <- function(icao24, as_sf = TRUE) {
  tryCatch(
    {
      opensky_response <- request(
        "https://opensky-network.org/api/tracks/all?"
      ) |>
        req_url_query(icao24 = str_to_lower(icao24)) |>
        req_perform()

      check_remaining_credits <- opensky_response |>
        resp_header("X-Rate-Limit-Remaining")

      log_info("Remaining API credits: {check_remaining_credits}")

      flight_track <- opensky_response |>
        resp_body_json() |>
        as_tibble() |>
        unnest_wider(col = path, names_sep = "_") |>
        rename(
          timestamp = path_1,
          latitude = path_2,
          longitude = path_3,
          baro_altitude = path_4,
          true_track = path_5,
          on_ground = path_6
        ) |>
        mutate(across(c(startTime, endTime, timestamp), \(x) as_datetime(x))) |>
        mutate(callsign = trimws(callsign)) |>
        arrange(timestamp)

      if (as_sf) {
        flight_track <- st_as_sf(
          flight_track,
          coords = c("longitude", "latitude", "baro_altitude"),
          dim = "XYZ",
          crs = 4326
        ) |>
          group_by(icao24, callsign) |>
          summarize(do_union = FALSE) |>
          st_cast("LINESTRING") |>
          st_wrap_dateline()
      }
      return(flight_track)
    },
    error = function(e) {
      log_error(
        "Error occurred when retrieving flight track for {icao24}. {e} Check your icao24 value: {icao24} may be valid but currently inactive."
      )
      # Fallback: return a minimal tibble so map() can continue past failures
      tibble(
        icao24 = str_to_lower(icao24),
        on_ground = TRUE
      )
    }
  )
}

If the function doesn’t return any data, it could very well be the case that the aircraft associated with the icao24 address is inactive (on the ground) at the moment. For actually valid icao24 values, it seems fair to return a tibble with the icao24 value and a status of TRUE for on_ground. If we were to build this out further, we’d probably want to add some validation to the user’s icao24 input.
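
As a rough idea of what that validation could look like: an ICAO 24-bit address is written as six hexadecimal characters, so a regular expression gets us most of the way. The helper name is_valid_icao24() is hypothetical and isn’t used elsewhere in this project; this only checks the format, not whether the address belongs to a real aircraft.

```r
# Hypothetical helper: TRUE when x looks like a 24-bit ICAO address,
# i.e. exactly six hexadecimal characters (upper or lower case)
is_valid_icao24 <- function(x) {
  grepl("^[0-9A-Fa-f]{6}$", x)
}

is_valid_icao24(c("A126CC", "a126cc", "abcdefg", "12345"))
# TRUE TRUE FALSE FALSE
```

A check like this could run before the request is built, so obviously malformed input never costs an API credit.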

The way I’ve handled the errors here is in anticipation of using map() on this function. This isn’t the only way, or maybe even the best way, to achieve this goal. One could use purrr’s possibly() or safely() as alternatives, which will move past errors when applying this function against a vector. But, for present purposes, this is what I’ve gone with.
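
For comparison, here’s a minimal sketch of the possibly() approach on a toy function. risky_lookup() is a made-up stand-in for an API call that sometimes errors; it is not one of the project’s functions.

```r
library(purrr)

# Made-up stand-in for a function that errors on some inputs
risky_lookup <- function(x) {
  if (x == "bad") {
    stop("no data for ", x)
  }
  toupper(x)
}

# possibly() wraps the function so an error returns `otherwise` instead
safe_lookup <- possibly(risky_lookup, otherwise = NA_character_)

results <- map_chr(c("a", "bad", "c"), safe_lookup)
results
# "A" NA  "C"
```

The trade-off is that possibly() swallows the error silently, whereas the tryCatch() version above logs it before returning the fallback value.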

Okay, let’s do the same thing for state vectors. A lot of this is going to be similar so I’m going to move substantially quicker through this one. The base URL this time is https://opensky-network.org/api/states/all?, the arguments we need to pass (icao24) are the same, and we’ve got a whole bunch more return values. Spoiler: we’ll need to unnest, rename, coerce Unix values, and convert to simple features like last time.3 What’s different this time is that we’ve got a callsign column with some whitespace in it. We’ll deal with the whitespace using mutate(callsign = trimws(callsign)) to get rid of it. We’ll call this function get_state_vector.
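
If the unnest-then-rename pattern feels abstract, here’s a sketch on a hand-built list shaped roughly like the parsed JSON. The parsed object is an assumption for illustration: it’s truncated to the first three fields of a single state vector, whereas the real response carries many more.

```r
library(dplyr)
library(tidyr)

# Hand-built stand-in for resp_body_json() output (truncated to 3 fields)
parsed <- list(
  time = 1759359956,
  states = list(list("a126cc", "UAL2353 ", "United States"))
)

toy_state <- parsed |>
  as_tibble() |>
  # Spread the unnamed inner list into states_1, states_2, states_3
  unnest_wider(col = states, names_sep = "_") |>
  rename(
    icao24 = states_1,
    callsign = states_2,
    origin_country = states_3
  ) |>
  # Strip the padding from the callsign
  mutate(callsign = trimws(callsign))

toy_state
```

Because the inner list is unnamed, unnest_wider() needs names_sep to invent column names, which is why the rename() step maps positions to meaningful names.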

get_state_vector <- function(icao24, as_sf = TRUE) {
  tryCatch(
    {
      opensky_response <- request(
        "https://opensky-network.org/api/states/all?"
      ) |>
        req_url_query(icao24 = str_to_lower(icao24)) |>
        req_perform()

      check_remaining_credits <- opensky_response |>
        resp_header("X-Rate-Limit-Remaining")

      log_info("Remaining API credits: {check_remaining_credits}")

      flight_position <- opensky_response |>
        resp_body_json() |>
        as_tibble() |>
        unnest_wider(col = states, names_sep = "_") |>
        rename(
          icao24 = states_1,
          callsign = states_2,
          origin_country = states_3,
          time_position = states_4,
          last_contact = states_5,
          longitude = states_6,
          latitude = states_7,
          baro_altitude = states_8,
          on_ground = states_9,
          velocity = states_10,
          true_track = states_11,
          vertical_rate = states_12,
          sensors = states_13,
          geo_altitude = states_14,
          squawk = states_15,
          special_purpose = states_16,
          position_source = states_17
        ) |>
        mutate(callsign = trimws(callsign)) |>
        mutate(across(c(time, time_position, last_contact), \(x) {
          as_datetime(x)
        }))

      if (as_sf) {
        flight_position <- st_as_sf(
          flight_position,
          coords = c("longitude", "latitude", "baro_altitude"),
          dim = "XYZ",
          crs = 4326
        )
      }

      return(flight_position)
    },
    error = function(e) {
      log_error(
        "Error occurred when retrieving state vector for {icao24}. {e} Check your icao24 value: {icao24} may be valid but currently inactive."
      )
      # Fallback: return a minimal tibble so map() can continue past failures
      tibble(
        icao24 = str_to_lower(icao24),
        on_ground = TRUE
      )
    }
  )
}

Let’s save this data for later using a GeoParquet file.

library(sfarrow)

# st_write_parquet(get_flight_track("A126CC"), "data/flight_track.parquet")
# st_write_parquet(get_state_vector("A126CC"), "data/state_vector.parquet")

st_read_parquet("data/flight_track.parquet")
Simple feature collection with 1 feature and 2 fields
Geometry type: LINESTRING
Dimension:     XYZ
Bounding box:  xmin: -95.3315 ymin: 28.0255 xmax: -87.0249 ymax: 29.9453
z_range:       zmin: 304 zmax: 10668
Geodetic CRS:  WGS 84
  icao24 callsign                       geometry
1 a126cc  UAL2353 LINESTRING Z (-95.3315 29.9...
st_read_parquet("data/state_vector.parquet")
Simple feature collection with 1 feature and 15 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: -87.0249 ymin: 28.0255 xmax: -87.0249 ymax: 28.0255
z_range:       zmin: 10668 zmax: 10668
Geodetic CRS:  WGS 84
                 time icao24 callsign origin_country       time_position
1 2025-10-01 23:05:56 a126cc  UAL2353  United States 2025-10-01 23:05:56
         last_contact on_ground velocity true_track vertical_rate sensors
1 2025-10-01 23:05:56     FALSE   267.79     112.48             0      NA
  geo_altitude squawk special_purpose position_source
1     11254.74     NA           FALSE               0
                        geometry
1 POINT Z (-87.0249 28.0255 1...
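
If you’d like to try the write/read round trip without spending any API credits, here’s a sketch using a single invented point and a temporary file. The toy_point object and the temp path are assumptions for illustration; st_write_parquet() and st_read_parquet() are the same sfarrow functions used above.

```r
library(sf)
library(sfarrow)

# One invented 3D point in WGS84
toy_point <- st_sf(
  icao24 = "abc123",
  geometry = st_sfc(st_point(c(-87.0249, 28.0255, 10668)), crs = 4326)
)

# Write to a temporary file, then read it straight back
tmp <- tempfile(fileext = ".parquet")
st_write_parquet(toy_point, tmp)
round_trip <- st_read_parquet(tmp)

round_trip
```

The object that comes back should still be an sf dataframe with its geometry column intact, which is the main reason to prefer this over a plain CSV.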

3.1.3.2 Interacting as an Authenticated User

If we want to up our credit limit from 400 to 4,000 we’ll need to register with OpenSky. You can do that by clicking sign in here. Once you create an account, you’ll see a box on the right-hand side of your screen to issue credentials. Your credentials will come in the form of a JSON file called credentials.json.

OpenSky uses an OAuth2 client credentials flow to authenticate requests to the API. All you need to know about that at the moment is httr2 has a fairly straightforward way to deal with this. First, we’ll need to define our oauth_client:

client1 <-
  oauth_client(
    id = Sys.getenv("OPENSKY_CLIENT_ID"),
    token_url = "https://auth.opensky-network.org/auth/realms/opensky-network/protocol/openid-connect/token",
    secret = Sys.getenv("OPENSKY_CLIENT_SECRET"),
    auth = "header"
  )

… where token_url is where we get our token from, id is the ID in the credentials.json file we got from OpenSky, and secret is the secret in that same file.

I use Sys.getenv() here for reasons I’ll get into in Section 6.2. You can use the keyring package and httr2 has some built-in functions for you to use as well. But for now, let’s use Sys.getenv().

As Hadley Wickham writes in the secrets portion of the httr2 documentation:

While you can manage the key explicitly in a variable, it’s much easier to store in an environment variable. In real life, you should NEVER use Sys.setenv() to create this env var because you will also store the secret in your .Rhistory. Instead add it to your .Renviron using usethis::edit_r_environ() or similar.

Add this to your .Renviron file:

OPENSKY_CLIENT_ID="{your_id}"
OPENSKY_CLIENT_SECRET="{your_secret}"

For example:

OPENSKY_CLIENT_ID="youropenskyid@whatever_this_domain_was_probably_openskyapi.com"
OPENSKY_CLIENT_SECRET="imnotputtingarealsecretinhereasanexamplewejustmetmaybegettoknowmealittlemorefirst"
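
Since Sys.getenv() quietly returns an empty string for variables that aren’t set, it can be worth confirming that R actually picked up your .Renviron before making authenticated requests. Here’s a small sketch; missing_env_vars() is a hypothetical helper, not something from httr2 or OpenSky.

```r
# Hypothetical helper: return the names of any variables that are
# unset or empty in the current R session
missing_env_vars <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

not_set <- missing_env_vars(c("OPENSKY_CLIENT_ID", "OPENSKY_CLIENT_SECRET"))

if (length(not_set) > 0) {
  message(
    "Not set: ", paste(not_set, collapse = ", "),
    ". Restart R after editing .Renviron."
  )
}
```

The restart matters because .Renviron is only read when an R session starts.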

After you’ve done that and defined client1 somewhere in your script (you can also define it within the functions below), we’ll modify our OpenSky functions slightly by adding one line: req_oauth_client_credentials(client1) |>

client1 <-
  oauth_client(
    id = Sys.getenv("OPENSKY_CLIENT_ID"),
    token_url = "https://auth.opensky-network.org/auth/realms/opensky-network/protocol/openid-connect/token",
    secret = Sys.getenv("OPENSKY_CLIENT_SECRET"),
    auth = "header"
  )

# The line we're adding:
# req_oauth_client_credentials(client1) |>

get_flight_track <- function(icao24, as_sf = TRUE) {
  tryCatch(
    {
      opensky_response <- request(
        "https://opensky-network.org/api/tracks/all?"
      ) |>
        req_oauth_client_credentials(client1) |>
        req_url_query(icao24 = str_to_lower(icao24)) |>
        req_perform()

      check_remaining_credits <- opensky_response |>
        resp_header("X-Rate-Limit-Remaining")

      log_info("Remaining API credits: {check_remaining_credits}")

      flight_track <- opensky_response |>
        resp_body_json() |>
        as_tibble() |>
        unnest_wider(col = path, names_sep = "_") |>
        rename(
          timestamp = path_1,
          latitude = path_2,
          longitude = path_3,
          baro_altitude = path_4,
          true_track = path_5,
          on_ground = path_6
        ) |>
        mutate(across(c(startTime, endTime, timestamp), \(x) as_datetime(x))) |>
        mutate(callsign = trimws(callsign)) |>
        arrange(timestamp)

      if (as_sf) {
        flight_track <- st_as_sf(
          flight_track,
          coords = c("longitude", "latitude", "baro_altitude"),
          dim = "XYZ",
          crs = 4326
        ) |>
          group_by(icao24, callsign) |>
          summarize(do_union = FALSE) |>
          st_cast("LINESTRING") |>
          st_wrap_dateline()
      }
      return(flight_track)
    },
    error = function(e) {
      log_error(
        "Error occurred when retrieving flight track for {icao24}. {e} Check your icao24 value: {icao24} may be valid but currently inactive."
      )
      # Fallback: return a minimal tibble so map() can continue past failures
      tibble(
        icao24 = str_to_lower(icao24),
        on_ground = TRUE
      )
    }
  )
}

get_state_vector <- function(icao24, as_sf = TRUE) {
  tryCatch(
    {
      opensky_response <- request(
        "https://opensky-network.org/api/states/all?"
      ) |>
        req_oauth_client_credentials(client1) |>
        req_url_query(icao24 = str_to_lower(icao24)) |>
        req_perform()

      check_remaining_credits <- opensky_response |>
        resp_header("X-Rate-Limit-Remaining")

      log_info("Remaining API credits: {check_remaining_credits}")

      flight_position <- opensky_response |>
        resp_body_json() |>
        as_tibble() |>
        unnest_wider(col = states, names_sep = "_") |>
        rename(
          icao24 = states_1,
          callsign = states_2,
          origin_country = states_3,
          time_position = states_4,
          last_contact = states_5,
          longitude = states_6,
          latitude = states_7,
          baro_altitude = states_8,
          on_ground = states_9,
          velocity = states_10,
          true_track = states_11,
          vertical_rate = states_12,
          sensors = states_13,
          geo_altitude = states_14,
          squawk = states_15,
          special_purpose = states_16,
          position_source = states_17
        ) |>
        mutate(callsign = trimws(callsign)) |>
        mutate(across(c(time, time_position, last_contact), \(x) {
          as_datetime(x)
        }))

      if (as_sf) {
        flight_position <- st_as_sf(
          flight_position,
          coords = c("longitude", "latitude", "baro_altitude"),
          dim = "XYZ",
          crs = 4326
        )
      }

      return(flight_position)
    },
    error = function(e) {
      log_error(
        "Error occurred when retrieving state vector for {icao24}. {e} Check your icao24 value: {icao24} may be valid but currently inactive."
      )
      # Fallback: return a minimal tibble so map() can continue past failures
      tibble(
        icao24 = str_to_lower(icao24),
        on_ground = TRUE
      )
    }
  )
}
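
Since the anonymous and authenticated versions of these functions differ by a single line, one option is to build the request in a small helper that only adds OAuth when a client is supplied. This is a sketch of an alternative, not how the functions above are written: opensky_request() and its arguments are hypothetical names, and it only builds the request without sending it.

```r
library(httr2)

# Hypothetical helper: build (but don't send) an OpenSky request,
# adding the OAuth step only when a client is supplied
opensky_request <- function(endpoint, icao24, client = NULL) {
  req <- request("https://opensky-network.org/api") |>
    req_url_path_append(endpoint) |>
    req_url_query(icao24 = tolower(icao24))

  if (!is.null(client)) {
    req <- req_oauth_client_credentials(req, client)
  }

  req
}

# Anonymous request; pass client = client1 for the authenticated version
anonymous_req <- opensky_request("tracks/all", "A126CC")
anonymous_req$url
```

Both get_flight_track() and get_state_vector() could then start from this helper and chain req_perform() onto the result, keeping the authentication decision in one place.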

Well, that was a lot. We still don’t even have all of our data yet! We’re still missing our fleet information. To be continued… in the next chapter.


  1. I imagine there’s an opportunity for a self-deprecating joke here. I don’t want to write it out and you’re probably clever enough to infer it considering I’m more content sitting in front of my computer typing this than actually going outside.↩︎

  2. This won’t fix all issues with respect to the date line. For example, the issue might be that an aircraft simply doesn’t ping while it’s crossing the Pacific due to the lack of an ADS-B or other station along the flight path. Thus, when summarize() gets run, the computer defaults to drawing a line that doesn’t cross the date line. So, take this footnote as my note-to-self for improvement.↩︎

  3. You might notice there’s a disparity between the screenshot and the code below, namely that there’s no rename pair for index 17 (category). This API call doesn’t seem to return category data, at least not in a way that gets captured when unnesting wider.↩︎