
combine_gtfs_feeds documentation

stefancoe edited this page Oct 14, 2021 · 21 revisions

Overview

combine_gtfs_feeds is a command line tool that combines multiple GTFS feeds into a single feed/dataset. Its main purpose is to let you work from one GTFS feed when performing transit service analysis for a particular geographic location. The Puget Sound region, for example, has 7 different transit agencies, and each publishes its own GTFS feed. We (PSRC) use GTFS for many analytical, mapping and network modeling applications. For example, we may need to find all the transit stops in the region that have frequent service and then determine the population or number of jobs within a certain distance of those stops. We often rely on other Python packages to do this work, but starting from one unified GTFS feed for the region makes the process much easier.

Installation

First clone the repository, then navigate to the root directory and enter the following in a command prompt:
python setup.py install
This will install combine_gtfs_feeds in your current Python environment.

Command line arguments:

Usage:
combine_gtfs_feeds run -g <gtfs_dir> -s <service_date> -o <output_dir>

combine_gtfs_feeds is the entry point to the module.

run is a sub-command. It exists because we hope to add the ability to download GTFS feeds, which would require its own sub-command and arguments.

  • -g, --gtfs_dir The directory containing the GTFS feeds. Each feed should be stored in its own folder, and all of these folders should sit directly inside this directory. Give each folder a name that clearly identifies the feed it contains. For example, a good name for the folder holding Sound Transit's feed might be 'ST'. This name will be prepended to each ID (route_id, stop_id, trip_id & shape_id) in the output GTFS files to uniquely identify the origin feed and prevent duplicate IDs.

  • -s, --service_date The date, in YYYYMMDD format, of the service that the combined GTFS will represent. Each feed in the --gtfs_dir must have at least one service_id that includes this date or the program will exit. The idea here is to pick a date that is typical of the service you wish to analyze. For example, we use a non-holiday Tuesday in May to represent weekday spring service. Note that the output of this program will only include service for this date. The program must be run separately for each date of interest.

  • -o, --output_path The location of the resulting GTFS feed. This will include the following GTFS files and log file:

    • calendar.txt
    • routes.txt
    • trips.txt
    • stop_times.txt
    • stops.txt
    • shapes.txt
    • agency.txt
    • run_log.txt

Example:
combine_gtfs_feeds run -g c:/gtfs_folder -s 20210914 -o c:/output_folder
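
The ID prefixing described under --gtfs_dir can be illustrated with a short Python sketch. This is not the tool's actual code: the helper name prefix_ids and the "_" separator are assumptions made for illustration, and the documentation above only states that the folder name is prepended to route_id, stop_id, trip_id and shape_id.

```python
import csv
import io

# Columns the documentation says are prefixed with the feed folder name.
ID_COLUMNS = ("route_id", "stop_id", "trip_id", "shape_id")


def prefix_ids(rows, feed_name, sep="_"):
    """Return copies of the rows with each non-empty ID column prefixed.

    `sep` is a hypothetical separator; the real tool may join differently.
    """
    prefixed = []
    for row in rows:
        new_row = dict(row)
        for col in ID_COLUMNS:
            if new_row.get(col):  # skip missing or empty IDs
                new_row[col] = f"{feed_name}{sep}{new_row[col]}"
        prefixed.append(new_row)
    return prefixed


# A tiny in-memory trips.txt standing in for a file in a feed folder named 'ST'.
trips_txt = "route_id,service_id,trip_id,shape_id\n100,WKDY,t1,s1\n"
rows = list(csv.DictReader(io.StringIO(trips_txt)))
print(prefix_ids(rows, "ST"))
```

Because every feed gets a different prefix, two agencies that both use a trip_id such as 't1' no longer collide once their files are concatenated.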

Notes: Some feeds have a file called frequencies.txt, which is often used for trips that have regular headways. In this case, trips are not uniquely represented in trips.txt and stop_times.txt. Rather, one representative trip_id is used in trips.txt and stop_times.txt and the frequency of the representative trip is indicated in frequencies.txt. Using the information in frequencies.txt, combine_gtfs_feeds will create unique trips, which are included in trips.txt and stop_times.txt. The file frequencies.txt will not be included in the output. See more here:
https://developers.google.com/transit/gtfs/reference#frequenciestxt
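
As a rough illustration of how a representative trip can be expanded into unique trips, here is a minimal Python sketch. It is not taken from combine_gtfs_feeds: the function names, the "_<n>" suffix for new trip IDs, and the choice to treat end_time as exclusive are all assumptions. Only the frequencies.txt field names (trip_id, start_time, end_time, headway_secs) come from the GTFS reference.

```python
def to_seconds(hms):
    """Convert a GTFS HH:MM:SS string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert seconds back to a zero-padded HH:MM:SS string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def expand_frequency(freq_row, template_stop_times):
    """Create one set of stop_times per departure implied by a frequencies.txt row."""
    start = to_seconds(freq_row["start_time"])
    end = to_seconds(freq_row["end_time"])
    headway = int(freq_row["headway_secs"])
    # Every template time is shifted relative to the template's first departure.
    first_departure = to_seconds(template_stop_times[0]["departure_time"])
    expanded = []
    # Assumption: departures run from start_time up to, but not including, end_time.
    for n, departure in enumerate(range(start, end, headway)):
        shift = departure - first_departure
        trip_id = f"{freq_row['trip_id']}_{n}"  # hypothetical naming scheme
        for stop_time in template_stop_times:
            expanded.append({
                **stop_time,
                "trip_id": trip_id,
                "arrival_time": to_hms(to_seconds(stop_time["arrival_time"]) + shift),
                "departure_time": to_hms(to_seconds(stop_time["departure_time"]) + shift),
            })
    return expanded


# Stop times of the representative trip, as they might appear in stop_times.txt.
template = [
    {"trip_id": "t1", "stop_id": "A", "stop_sequence": "1",
     "arrival_time": "06:00:00", "departure_time": "06:00:00"},
    {"trip_id": "t1", "stop_id": "B", "stop_sequence": "2",
     "arrival_time": "06:10:00", "departure_time": "06:10:00"},
]
# One frequencies.txt row: a trip every 30 minutes between 06:00 and 07:00.
frequency = {"trip_id": "t1", "start_time": "06:00:00",
             "end_time": "07:00:00", "headway_secs": "1800"}

for row in expand_frequency(frequency, template):
    print(row["trip_id"], row["stop_id"], row["departure_time"])
```

A matching row would also need to be written to trips.txt for each new trip_id; that step is omitted here for brevity.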

Both calendar.txt and calendar_dates.txt can be used in different ways to define service. We feel that we have captured the different ways that an agency might use these two files, but please notify us by creating an issue if this is not the case.
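
To show one common way the two files interact, the following Python sketch finds the service_ids active on a given --service_date. It follows the GTFS reference (weekday flags plus a date range in calendar.txt, with exception_type 1 adding and 2 removing service in calendar_dates.txt). The function name and sample data are hypothetical, and the logic inside combine_gtfs_feeds may differ.

```python
from datetime import datetime

# calendar.txt weekday columns, ordered to match datetime.weekday() (Monday = 0).
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def active_service_ids(service_date, calendar_rows, calendar_dates_rows):
    """Return the set of service_ids that run on service_date (YYYYMMDD)."""
    weekday_col = WEEKDAYS[datetime.strptime(service_date, "%Y%m%d").weekday()]
    active = set()
    # Regular service: the date is inside the range and the weekday flag is set.
    # YYYYMMDD strings compare correctly as plain strings.
    for row in calendar_rows:
        in_range = row["start_date"] <= service_date <= row["end_date"]
        if in_range and row[weekday_col] == "1":
            active.add(row["service_id"])
    # Exceptions: type 1 adds service on the date, type 2 removes it.
    for row in calendar_dates_rows:
        if row["date"] != service_date:
            continue
        if row["exception_type"] == "1":
            active.add(row["service_id"])
        elif row["exception_type"] == "2":
            active.discard(row["service_id"])
    return active


def weekly(service_id, flags):
    """Build a hypothetical calendar.txt row from a 7-character flag string."""
    row = {"service_id": service_id,
           "start_date": "20210901", "end_date": "20211231"}
    row.update(zip(WEEKDAYS, flags))
    return row


calendar_rows = [weekly("WKDY", "1111100"), weekly("SAT", "0000010"),
                 weekly("CANCELLED", "1111100")]
calendar_dates_rows = [
    {"service_id": "EXTRA", "date": "20210914", "exception_type": "1"},
    {"service_id": "CANCELLED", "date": "20210914", "exception_type": "2"},
]

print(sorted(active_service_ids("20210914", calendar_rows, calendar_dates_rows)))
```

A feed that defines service only through calendar_dates.txt, with no calendar.txt rows at all, is handled by the same function because the first loop simply finds nothing.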
