
combine_gtfs_feeds documentation

stefancoe edited this page Feb 4, 2023 · 21 revisions

Overview

combine_gtfs_feeds is a tool to combine multiple GTFS feeds into a single feed/dataset. The package can be run as a command line tool or as a Python library that provides access to the merged feed as pandas DataFrames. The output only includes service for the specified 'service_date' parameter, so it is meant to represent a broader level of service, such as a typical spring weekday. The main purpose of combine_gtfs_feeds is to make it possible to work from one GTFS feed when performing transit service analysis for a geographic area served by several transit agencies, each with its own GTFS feed. The Puget Sound region, for example, has 7 different transit agencies, and each publishes its own GTFS feed. We (PSRC) use GTFS for many analytical, mapping and network modeling applications. For example, we may need to find all the transit stops in the region that have frequent service and then determine the population or number of jobs within a certain distance of those stops. We often rely on other Python packages for this work, but starting from one unified GTFS feed for the region makes the process much easier.

Installation

Enter the following in a command prompt:
pip install combine-gtfs-feeds
This will install combine_gtfs_feeds in your current Python environment. You can visit the PyPI page here:
https://pypi.org/project/combine-gtfs-feeds/

As a command line tool:

Arguments:
combine_gtfs_feeds run -g <gtfs_dir> -s <service_date> -o <output_dir>

combine_gtfs_feeds is the entry point to the module.

run is a sub-command that was added because we hope to add the ability to download GTFS feeds, which would require its own sub-command and arguments.

  • -g, --gtfs_dir The directory containing the GTFS folders, one folder per feed. Each feed should be stored in its own folder, and all of these folders should be in the same directory. Name each folder so that it clearly identifies the feed it contains; for example, a good name for the folder holding Sound Transit's feed might be 'ST'. This name will be prepended to each ID (route_id, stop_id, trip_id & shape_id) in the output GTFS files to identify the originating feed and prevent duplicate IDs.

  • -s, --service_date The date, in YYYYMMDD format, whose service the combined GTFS will represent. Each feed in --gtfs_dir must have at least one service_id that includes this date, or the program will exit. The idea is to pick a date that is typical of the service you wish to analyze; for example, we use a non-holiday Tuesday in May to represent spring weekday service. Note that the output of this program will only include service for this date, so the program must be run separately for each date of interest.

  • -o, --output_path The location of the resulting GTFS feed. This will include the following GTFS files and log file:

    • calendar.txt
    • routes.txt
    • trips.txt
    • stop_times.txt
    • stops.txt
    • shapes.txt
    • agency.txt
    • run_log.txt

Example:
combine_gtfs_feeds run -g c:/gtfs_folder -s 20210914 -o c:/output_folder

As a Python library:

Import the library:
from combine_gtfs_feeds import cli

Use the following command:
cli.run.combine(gtfs_dir, service_date, output_path)

This will return an instance of the Combined_GTFS class, in which each merged GTFS file is available as a pandas DataFrame through instance properties such as .trips_df. The class also includes a method to export the files, called export_feed.

Here is an example:
merged_gtfs = cli.run.combine(gtfs_dir, service_date, output_path)
merged_gtfs.routes_df.head()
merged_gtfs.export_feed()

Notes:

Some feeds include a file called frequencies.txt, which is often used for trips that run at regular headways. In such cases, the individual trips listed in frequencies.txt are not uniquely represented in trips.txt and stop_times.txt. Instead, one representative trip_id appears in trips.txt and stop_times.txt, and the headway at which that trip repeats is given in frequencies.txt. For analytical work involving transit service, it is convenient to have every trip represented the same way. So, using the information in frequencies.txt, combine_gtfs_feeds creates unique trips, which are included in trips.txt and stop_times.txt. The file frequencies.txt is not included in the output. See more here:
https://developers.google.com/transit/gtfs/reference#frequenciestxt
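The frequency expansion described above can be sketched as follows. This is not the package's code: it follows the GTFS rule that a representative trip's stop_times only supply time offsets between stops, generates a departure every headway_secs from start_time up to (but not including) end_time as if exact_times were 1, and names the generated trips with a hypothetical "<trip_id>_<n>" scheme.

```python
def to_seconds(hms):
    """Parse a GTFS HH:MM:SS time; hours may exceed 23 for after-midnight service."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total):
    """Format seconds after midnight back into GTFS HH:MM:SS."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def shift_time(value, shift):
    """Shift a GTFS time by shift seconds, leaving blank (untimed) values blank."""
    return to_hms(to_seconds(value) + shift) if value else value


def expand_frequencies(frequencies, stop_times):
    """Replace frequency-based representative trips with explicit trips.

    frequencies: dicts from frequencies.txt (trip_id, start_time, end_time, headway_secs).
    stop_times: dicts from stop_times.txt for the representative trips.
    Returns new stop_times rows for the generated trips.
    """
    templates = {}
    for row in stop_times:
        templates.setdefault(row["trip_id"], []).append(row)

    generated = []
    trip_counts = {}
    for freq in frequencies:
        trip_id = freq["trip_id"]
        template = sorted(templates[trip_id], key=lambda r: int(r["stop_sequence"]))
        # Template times only give offsets relative to the first departure.
        first_departure = to_seconds(template[0]["departure_time"])
        headway = int(freq["headway_secs"])
        if headway <= 0:
            raise ValueError(f"headway_secs must be positive for trip {trip_id}")
        departure = to_seconds(freq["start_time"])
        end = to_seconds(freq["end_time"])
        while departure < end:
            # Number generated trips per representative trip so IDs stay unique
            # even when one trip_id has several frequencies.txt rows.
            trip_counts[trip_id] = trip_counts.get(trip_id, 0) + 1
            new_id = f"{trip_id}_{trip_counts[trip_id]}"
            shift = departure - first_departure
            for row in template:
                new_row = dict(row, trip_id=new_id)
                new_row["arrival_time"] = shift_time(row["arrival_time"], shift)
                new_row["departure_time"] = shift_time(row["departure_time"], shift)
                generated.append(new_row)
            departure += headway
    return generated
```

A full expansion would also copy the representative trip's trips.txt row once per generated trip_id and then drop the representative trip itself.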

Both calendar.txt and calendar_dates.txt can be used, in different ways, to define service. We believe we have captured the different ways an agency might use these two files, but please notify us by creating an issue if this is not the case.
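The standard GTFS rules for combining the two files can be sketched as below: calendar.txt gives a weekly pattern that applies between start_date and end_date, and calendar_dates.txt adds (exception_type 1) or removes (exception_type 2) service on specific dates. The helper active_service_ids is hypothetical and follows the GTFS reference rather than combine_gtfs_feeds' internal logic; a check like this is also one way to confirm that each feed has at least one service_id active on the chosen --service_date.

```python
from datetime import datetime

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")


def active_service_ids(service_date, calendar, calendar_dates):
    """Return the set of service_ids running on service_date (a YYYYMMDD string).

    calendar and calendar_dates are lists of dicts read from calendar.txt and
    calendar_dates.txt; either may be empty, since a feed may use only one file.
    """
    # strptime rejects anything that is not a valid YYYYMMDD date.
    day = datetime.strptime(service_date, "%Y%m%d")
    weekday = WEEKDAYS[day.weekday()]  # weekday() is 0 for Monday

    active = set()
    # Regular weekly pattern; YYYYMMDD strings compare correctly as text.
    for row in calendar:
        in_range = row["start_date"] <= service_date <= row["end_date"]
        if in_range and row[weekday] == "1":
            active.add(row["service_id"])

    # Date-specific exceptions: 1 adds service, 2 removes it.
    for row in calendar_dates:
        if row["date"] != service_date:
            continue
        if row["exception_type"] == "1":
            active.add(row["service_id"])
        elif row["exception_type"] == "2":
            active.discard(row["service_id"])
    return active
```

For example, 20210914 (the date in the command line example) is a Tuesday, so a calendar.txt row with tuesday set to 1 whose date range covers it is active unless calendar_dates.txt removes it on that date.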
