Introduction
On December 19, 2024, the MBTA adopted a new version of its Service Delivery Policy (SDP) in an effort to update how the MBTA measures the quality of its transit service against agency objectives. This new version of the SDP introduces several new standards and makes various changes to existing standards, all aimed at ensuring that the agency’s public-facing performance measures better reflect rider experiences.
OPMI was heavily involved in the development and calculation of these changes to SDP standards, which range from minor definition adjustments to major overhauls of performance measures, including some changes which are made possible by new data collection systems. We’re proud to share a roundup of the SDP metrics which have been updated or added in the 2024 version of the policy, with a focus on answering the following questions:
- How have metrics’ calculation methods changed from previous versions of the policy?
- How do SDP performance scores increase or decrease as a result of these changes in standards?
In this post, we use the Fall 2023 service period as a case study, calculating how Fall 2023 service would be evaluated under the old policy (the 2021 SDP) compared to how it would be evaluated under the new policy (the 2024 SDP). However, it should be noted that these new metrics, computations, and/or standards did not exist when planning for or operating service in Fall 2023. Hence, these calculations provide a useful baseline for judging future service, but should not be used to evaluate the quality of the effort put into planning or operations over this period. Note: this blog post will not discuss the metrics for which data collection is still in progress, since calculation methods are being finalized.
Metric Comparison
Heavy Rail Reliability
One of the most important changes in the new policy is the introduction of the Excess Trip Time (ETT) metric, which overhauls how the MBTA measures service reliability. This new metric is being rolled out to heavy rail (Blue Line, Orange Line, and Red Line) first, but we are setting up the data processes necessary to evaluate other service modes using ETT as well. There are two major ways in which Excess Trip Time improves on our previous reliability metric and paints a more accurate picture of transit reliability: it benchmarks MBTA performance against an “ideal” level of service rather than against current schedules, and it captures full trip times rather than only evaluating passenger wait times. For more information on ETT, please visit the “Introducing Excess Trip Time” blog post.
Our previous reliability metric, On Time Performance (OTP), is the percentage of actual departures or headways that are on time relative to their scheduled departures or headways. Because service schedules regularly change in response to operational considerations (e.g., operator availability), OTP can diverge from rider expectations of service reliability: changing schedules can effectively “move the goalposts” of performance. Additionally, OTP is narrowly focused on how long passengers wait for transit vehicles to arrive prior to boarding, meaning that delays during in-vehicle travel time do not factor into its scoring of reliability.
MBTA Red Line performance over the past 3-4 years provides a strong illustration of the difference between ETT and our previous reliability metric, because during that time, the MBTA experienced both operator availability challenges and speed restrictions which led to schedule adjustments. Our previous reliability metric, OTP, remained mostly flat during this time, because even though headways got longer in 2022 and 2023, they were evaluated against adjusted schedules, and longer travel times did not directly factor into OTP. ETT scores, meanwhile, decreased sharply in response to the introduction of speed restrictions, before recovering in 2024 in response to the removal of speed restrictions and the improvement of operating headways (Figure 1).

The Fall 2023 service period would receive significantly lower reliability scores for heavy rail using the new ETT methodology than it would receive using the previous OTP methodology, as seen in Figure 2. Speed restrictions and slower schedules remained widespread in Fall 2023, so the fact that ETT captures these sources of delay better matches the way heavy rail riders experienced service reliability during this time. An analysis of differences among the MBTA heavy rail routes shows that the lowest scores were on the Red Line, where speed restrictions were more prevalent than on the Blue or Orange Lines, and where passengers tend to take longer trips, making the 5-minute threshold more ambitious.
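To make the idea concrete, here is a rough sketch of how an excess-trip-time score could be computed. It is not OPMI's actual implementation: the only detail taken from this post is the 5-minute threshold, while the function names, the trip records, the benchmark values, and the choice to score each trip equally are all assumptions made for illustration.

```python
# Illustrative sketch only: the 5-minute threshold comes from the post;
# the benchmark values and trip records below are hypothetical.
THRESHOLD_MIN = 5.0


def excess_trip_time(wait_min, in_vehicle_min, benchmark_min):
    """Full trip time (wait plus in-vehicle) minus the benchmark trip time."""
    return (wait_min + in_vehicle_min) - benchmark_min


def ett_score(trips):
    """Share of trips whose excess trip time is within the threshold.

    trips: list of (wait_min, in_vehicle_min, benchmark_min) tuples.
    Treating "exactly at the threshold" as passing is an assumption.
    """
    if not trips:
        raise ValueError("no trips to score")
    passing = sum(
        1 for w, v, b in trips if excess_trip_time(w, v, b) <= THRESHOLD_MIN
    )
    return passing / len(trips)


sample_trips = [
    (3.0, 20.0, 22.0),  # excess 1 min
    (6.0, 25.0, 22.0),  # excess 9 min
    (2.0, 24.0, 22.0),  # excess 4 min
    (4.0, 30.0, 28.0),  # excess 6 min
]
print(ett_score(sample_trips))
```

Because the benchmark in this sketch is fixed rather than taken from the current schedule, lengthening the schedule would not change a trip's score, which is the property that separates ETT from OTP.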

Bus Reliability
For Bus service, the 2024 SDP adjusts the existing On Time Performance measure to reflect the impacts of unplanned cancellations of service, often called “dropped trips”. This change is an interim improvement to the metric, pending the completion of the additional work necessary to calculate Bus reliability using Excess Trip Time instead, a methodology which already accounts for dropped trips.
Our previous measure of Bus OTP excluded cancelled service from the calculation. This stemmed from limitations in our data sources, which could not reliably distinguish cancelled service from sensor outages and other factors that could cause a bus that operated and carried passengers to be absent from our data set. Thanks to improvements in our data sources and processes, we can now accurately incorporate service cancellations into the Bus Reliability standard. The new standard counts all Bus timepoints affected by these cancellations as failures when computing the overall Bus reliability scores (Figure 3).
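The difference between the two approaches can be sketched in a few lines. This is a simplified illustration rather than the production calculation: the `Timepoint` record, the function names, and the sample counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Timepoint:
    """One scheduled timepoint crossing (hypothetical record layout)."""
    on_time: Optional[bool]  # None when there is no observation
    cancelled: bool = False  # True when the trip was dropped


def otp_excluding_cancellations(timepoints: List[Timepoint]) -> float:
    """Previous approach: cancelled timepoints are left out entirely."""
    observed = [t for t in timepoints if not t.cancelled and t.on_time is not None]
    if not observed:
        raise ValueError("no observed timepoints")
    return sum(1 for t in observed if t.on_time) / len(observed)


def otp_counting_cancellations(timepoints: List[Timepoint]) -> float:
    """New approach: cancelled timepoints are counted as failures."""
    scored = [t for t in timepoints if t.cancelled or t.on_time is not None]
    if not scored:
        raise ValueError("no scored timepoints")
    passed = sum(1 for t in scored if not t.cancelled and t.on_time)
    return passed / len(scored)


# Hypothetical route: 8 on-time, 2 late, and 2 cancelled timepoints.
sample = (
    [Timepoint(on_time=True)] * 8
    + [Timepoint(on_time=False)] * 2
    + [Timepoint(on_time=None, cancelled=True)] * 2
)
print(otp_excluding_cancellations(sample))  # 8 of 10 observed timepoints
print(otp_counting_cancellations(sample))   # 8 of 12 scored timepoints
```

In this made-up example the score falls from 8 of 10 to 8 of 12, because the two dropped timepoints now enter the denominator as failures.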

Under this new standard, the Fall 2023 service period would receive slightly lower scores on Bus Reliability than it would receive under the old policy, as seen in Figure 2 above. An analysis of route-by-route differences in reliability scores between the old and new policies showed that many of the largest differences are on Key Bus routes, since the MBTA tends to pull buses from more frequent routes in order to avoid missing service on less frequent routes, resulting in many higher-frequency routes having higher rates of trip cancellations.
Bus Frequency
The new policy adjusts the definitions and terminology used for the agency’s most frequent Bus routes to better align with the ongoing implementation of the Bus Network Redesign. The new policy does not alter the calculation methodology for the Bus Frequency standard, nor does it represent any proposed or actual change in service – it just simplifies the definition of what headways a given route needs to attain in order to pass the Frequency test.
Under the previous policy, the frequency expectations for Key Bus routes were:
- 10-minute or better headways at peak weekday times
- 15-minute or better headways at early morning and midday times on weekdays
- 20-minute or better headways on weekday evenings and all day on weekends
The new policy evaluates the current Key Bus routes under a new Frequent Bus standard, which is:
- 15-minute or better headways all day, every day
This streamlines the standard and makes it easier to interpret. Because the new standard is tighter for weekday evenings and weekends, looser at weekday peak times, and unchanged for weekday early morning and midday, the Fall 2023 service period would have slightly higher weekday scores and significantly lower weekend scores than it would under the old policy, as seen in Figure 4.
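The two frequency tests described above can be written as a simple lookup. The headway thresholds come from the lists above, but the day-type and period labels, the function names, and the example headways are assumptions chosen for illustration.

```python
# Maximum allowed headway in minutes under the previous Key Bus standard.
# The (day type, period) labels are hypothetical names for the periods above.
OLD_KEY_BUS_MAX_HEADWAY = {
    ("weekday", "peak"): 10,
    ("weekday", "early_morning"): 15,
    ("weekday", "midday"): 15,
    ("weekday", "evening"): 20,
    ("weekend", "all_day"): 20,
}

# The new Frequent Bus standard uses one threshold all day, every day.
NEW_FREQUENT_BUS_MAX_HEADWAY = 15


def passes_old_standard(day_type, period, headway_min):
    """True if the headway meets the previous period-specific threshold."""
    return headway_min <= OLD_KEY_BUS_MAX_HEADWAY[(day_type, period)]


def passes_new_standard(headway_min):
    """True if the headway meets the single 15-minute threshold."""
    return headway_min <= NEW_FREQUENT_BUS_MAX_HEADWAY


# Hypothetical headways: 12 minutes at weekday peak, 18 minutes on a weekend.
print(passes_old_standard("weekday", "peak", 12), passes_new_standard(12))
print(passes_old_standard("weekend", "all_day", 18), passes_new_standard(18))
```

A 12-minute peak headway fails the old test but passes the new one, while an 18-minute weekend headway does the opposite, which matches the direction of the score changes described above.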
Notably, these scores for how Fall 2023 service would be evaluated under the new policy are retroactively applying a standard that did not exist when Fall 2023 service was being planned. For example, many of the current Key Bus routes are scheduled to operate with 15- to 20-minute effective headways on weekends, in alignment with the previous frequency standard. As the MBTA increases the number of Frequent Bus routes and adds service to the current Key Bus routes to run 15-minute or better headways at all times, the weekend scores for the Frequent Bus category are expected to rise.

Platform Accessibility
The new policy modifies the Platform Accessibility standard to better align performance scores with rider experiences, particularly for times when shuttle alternatives are provided as mitigation for elevator outages.
The Platform Accessibility calculation counts how many hours of service at MBTA platforms are inaccessible because of elevator outages, a number which is subtracted from the total duration of service to yield a percentage of platform hours that are accessible. Some elevator outages do not impact platform access because of redundant elevators that provide alternative accessible pathways to platforms, while for other elevators, one outage may prevent access to multiple platforms at once. However, under the previous policy, elevator outages during which accessible shuttle alternatives were provided as a mitigation measure were considered accessible platform-hours.

The new policy tightens and clarifies the Platform Accessibility calculation such that the existence of shuttle alternatives by itself does not have any bearing on how an elevator outage is counted (Figure 5). To reflect the fact that an accessible shuttle alternative often doesn’t provide the same quality of service that riders would get with a working elevator, all elevator outages are now considered to be inaccessible hours for the purposes of this calculation by default. The new policy does outline some minor exceptions to this rule, mainly for situations when an elevator is proactively taken out of service in order to reconstruct it as part of an accessibility modernization project. However, these exceptional situations are excluded from the calculation entirely rather than counted as accessible hours.
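As a rough sketch of the two treatments, the snippet below computes the accessible share of platform-hours both ways. The `Outage` record, the function names, and the hour totals are hypothetical. The sketch also assumes that "excluded from the calculation entirely" means removing those hours from both the numerator and the denominator, and it does not model how the previous policy handled modernization outages.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outage:
    """Platform-hours made inaccessible by one elevator outage.

    Outages covered by a redundant elevator are assumed to be filtered out
    upstream, so every record here affects platform access.
    """
    platform_hours: float
    shuttle_provided: bool = False
    modernization_exempt: bool = False


def accessibility_old(total_hours: float, outages: List[Outage]) -> float:
    """Previous policy: outages mitigated by a shuttle count as accessible."""
    inaccessible = sum(o.platform_hours for o in outages if not o.shuttle_provided)
    return (total_hours - inaccessible) / total_hours


def accessibility_new(total_hours: float, outages: List[Outage]) -> float:
    """New policy: shuttles are ignored; exempt outages leave the calculation."""
    excluded = sum(o.platform_hours for o in outages if o.modernization_exempt)
    inaccessible = sum(
        o.platform_hours for o in outages if not o.modernization_exempt
    )
    scored_hours = total_hours - excluded
    return (scored_hours - inaccessible) / scored_hours


# Hypothetical period with 1,000 platform-hours of scheduled service.
outages = [
    Outage(platform_hours=30),
    Outage(platform_hours=20, shuttle_provided=True),
    Outage(platform_hours=50, shuttle_provided=True, modernization_exempt=True),
]
print(accessibility_old(1000, outages))
print(accessibility_new(1000, outages))
```

With these made-up numbers the score drops from 97.0% to about 94.7%, since the 20 shuttle-mitigated hours now count as inaccessible and the 50 exempt hours are removed from the base.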
This change in standard makes a small difference in performance scores, with the Fall 2023 service period having a Platform Accessibility score that is about two percentage points lower using the new policy than it would have had under the old policy, as seen in Figure 6. Going forward, the new standard should ensure that real changes in platform access aren’t obscured from SDP annual reports simply because of mitigating shuttle service.

Heavy Rail Comfort
The new policy adds a standard for Passenger Comfort on heavy rail services, taking advantage of data sources and processes that were not previously available. Similar to the existing Passenger Comfort standard for Bus, the Heavy Rail Comfort standard evaluates how many minutes of passenger time are in non-crowded conditions, using maximum vehicle load standards defined in the SDP to define what’s considered unacceptably crowded.
However, the calculation of crowding is somewhat more approximate for the new Heavy Rail standard than it is for the existing bus standard. The Heavy Rail calculation uses passenger counts that are derived from faregate entries and exits for each station, whereas for Bus we use passenger counts from the Automated Passenger Counters (APCs) that are installed on individual vehicles. The new heavy rail calculation therefore requires making assumptions about gate-to-platform walk times, and it evaluates average crowding across each train relative to the maximum capacity thresholds, not the crowding on individual cars.
The policy also defines the maximum capacity thresholds for heavy rail cars in a slightly different way than it does for buses. Both modes use separate thresholds for high-volume (peak) travel times and lower-volume (off-peak) times. However, where the Bus standard uses thresholds based on percentage ratios of passengers to the number of seats on a given vehicle, the heavy rail maximum vehicle capacity thresholds are set based on the seating capacity of each car plus a fixed amount of space per standing passenger, using information about the floor areas of Heavy Rail cars. This difference reflects the fact that on Heavy Rail cars, a smaller portion of vehicle floor area is dedicated to seating than on MBTA buses.
During the Fall 2023 service period, a greater share of passenger minutes was spent in crowded conditions on the Blue Line than on the Red or Orange Lines, resulting in lower Passenger Comfort scores for the Blue Line, as seen in Figure 7.
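A minimal sketch of this style of calculation follows. The seat count, standing area, space per standee, and train loads are invented for illustration and are not the values defined in the SDP. The function names are hypothetical as well.

```python
def car_capacity(seats, standing_area_sqft, sqft_per_standee):
    """Seats plus the standees that fit at a fixed floor area per person."""
    return seats + standing_area_sqft / sqft_per_standee


def comfort_score(segments, cars_per_train, capacity_per_car):
    """Share of passenger-minutes spent at or below the train-level threshold.

    segments: list of (train_load, minutes) pairs, where train_load is the
    estimated number of passengers on the whole train. Crowding is judged on
    the train average, not car by car, as described above.
    """
    train_capacity = cars_per_train * capacity_per_car
    total = sum(load * minutes for load, minutes in segments)
    if total == 0:
        raise ValueError("no passenger-minutes to score")
    comfortable = sum(
        load * minutes for load, minutes in segments if load <= train_capacity
    )
    return comfortable / total


# Hypothetical car: 50 seats, 300 sq ft of standing area, 3 sq ft per standee.
capacity = car_capacity(seats=50, standing_area_sqft=300, sqft_per_standee=3.0)

# Hypothetical six-car train observed over three segments of a trip.
segments = [(600, 3), (1000, 2), (400, 5)]
print(capacity)
print(comfort_score(segments, cars_per_train=6, capacity_per_car=capacity))
```

Separate peak and off-peak thresholds could be modeled by passing a different space-per-standee value for each period.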

Ferry Dock Accessibility
Another new SDP standard measures the accessibility of ferry docks. Docks are considered accessible if they allow accessible boarding and alighting via a bridge plate or gangways level with the vessel, and if they are designed to mitigate excessive slopes caused by changing tides. OPMI calculates Ferry Dock Accessibility using both an unweighted score (the percent of docks that are accessible) and a ridership-weighted score (the percent of ferry riders using accessible docks, using average ridership data). For Fall 2023, the ridership-weighted score (30.5%) is lower than the unweighted score (56.3%) because several of the docks with the highest ridership, such as Rowes Wharf, are not accessible (see Figures 8 and 9).
One notable difference in how the new Ferry Dock Accessibility standard is calculated compared to the existing Station Accessibility standard is that instead of recording the accessibility status of whole ferry terminals (e.g. Long Wharf North), we evaluate the individual docks within terminals, since docks serving different lines at a single terminal can vary in their accessibility to passengers with mobility impairments.
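The gap between the two scores is easy to see with a toy example. The docks and ridership figures below are entirely made up and do not correspond to real MBTA docks. The function names are likewise hypothetical.

```python
def unweighted_score(docks):
    """Share of docks that are accessible."""
    return sum(1 for d in docks if d["accessible"]) / len(docks)


def ridership_weighted_score(docks):
    """Share of riders who use an accessible dock."""
    total_riders = sum(d["riders"] for d in docks)
    accessible_riders = sum(d["riders"] for d in docks if d["accessible"])
    return accessible_riders / total_riders


# Hypothetical docks, evaluated individually rather than per terminal.
docks = [
    {"dock": "A", "accessible": True, "riders": 100},
    {"dock": "B", "accessible": True, "riders": 200},
    {"dock": "C", "accessible": False, "riders": 600},
    {"dock": "D", "accessible": True, "riders": 100},
]
print(unweighted_score(docks))
print(ridership_weighted_score(docks))
```

Here three of four docks are accessible, but the one inaccessible dock carries most of the riders, so the weighted score is much lower than the unweighted one.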
Span of Service
The changes made in the new policy to the definition of the Span of Service standard are intended primarily to make the standard easier to understand, not to tighten or loosen service expectations. As a result, performance scores change only slightly for some route categories, and those changes do not reflect real differences in service. Specifically, the new standard is intended to align the expected start- and end-of-service hours defined in the policy with typical rider expectations of when service starts and ends on a given day.
The previous SDP used an “inside bounds” methodology, evaluating each route based on whether its first trip of the day was scheduled to arrive in downtown Boston or the route terminal at or before the expected start time, and whether its last trip of the day was scheduled to depart downtown Boston or the route terminal at or after the expected end time. This meant, for example, that the 6:00 am expected start of service for the Orange Line represented the time when the first train of the day was expected to arrive in Downtown Crossing, rather than the time when it was expected to leave from its first stop (Forest Hills or Oak Grove). However, riders have a variety of destinations, not just downtown, and most riders want to know what the earliest time is that they can depart from their stop, not when they would arrive downtown or at the route terminal.
The new policy therefore switches to an “outside bounds” methodology, evaluating each route based on whether its first trip of the day is scheduled to depart its origin station at or before the expected start hour and whether its last trip of the day arrives at its destination at or after the expected end hour, with the expected start and end hours being revised accordingly (Figure 10). For example, the new expected start of service for the Orange Line is 5:30 am, and represents the time at which the Orange Line is expected to start picking up riders, rather than the time it’s expected to arrive downtown, which remains unchanged at 6:00 am.
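The two methodologies can be compared side by side in a short sketch. The 6:00 am and 5:30 am start times come from the Orange Line example above. The trip times, the end-of-service hours, and the function names are hypothetical. Times are expressed as minutes since the start of the service day so that trips after midnight still compare correctly.

```python
def hm(hour, minute):
    """Minutes since the start of the service day; hours of 24+ are after midnight."""
    return hour * 60 + minute


def passes_inside_bounds(first_arrival_downtown, last_departure_downtown,
                         expected_start, expected_end):
    """Previous test: first trip arrives downtown (or at the terminal) by the
    expected start, and the last trip departs there at or after the expected end."""
    return (first_arrival_downtown <= expected_start
            and last_departure_downtown >= expected_end)


def passes_outside_bounds(first_departure_origin, last_arrival_destination,
                          expected_start, expected_end):
    """New test: first trip departs its origin by the expected start, and the
    last trip arrives at its destination at or after the expected end."""
    return (first_departure_origin <= expected_start
            and last_arrival_destination >= expected_end)


# Hypothetical schedule: first trip leaves its origin at 5:25 am and reaches
# downtown at 5:50 am; the end-of-service values are made up.
old_result = passes_inside_bounds(hm(5, 50), hm(24, 20), hm(6, 0), hm(24, 0))
new_result = passes_outside_bounds(hm(5, 25), hm(24, 45), hm(5, 30), hm(24, 30))
print(old_result, new_result)
```

The same schedule passes both tests here because the expected hours were shifted along with the definition, which is why most modes see no change in score.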

This change in definition has no effect on Span performance scores for most MBTA modes, as seen in Figure 11. For bus service, some route categories have lower scores for Fall 2023 service under the new policy, but all of the differences in score are relatively small, and stem mostly from edge cases where the round-numbered expected start and end times in the new definition do not exactly match some routes’ scheduled start and end times.

Green Line Reliability
Another minor change in the new policy concerns how we calculate On Time Performance for trunk stops on the Green Line (stations in the downtown subway portion of the Green Line which are served by multiple branches). This change is temporary, pending the assembly of the data processes necessary to evaluate Green Line reliability using Excess Trip Time instead.
The previous SDP evaluated reliability at Green Line trunk stops against a 3-minute standard rather than against scheduled headways, as we do for the rest of Green Line service. Typical scheduled headways at trunk stops can be on the order of 90-100 seconds between trains, and actual headways between 90 seconds and 3 minutes were deemed to still be acceptably reliable service. However, this meant that at times of the day or week when the scheduled headways at trunk stops are greater than 3 minutes, we evaluated whether each actual headway was less than 3 minutes instead of whether it was less than the scheduled headway.
Under the new policy, instead of evaluating all trunk-stop Green Line headways against a flat 3-minute standard, we evaluate these stops using either the 3-minute standard or the scheduled headway, whichever is greater. This results in the Green Line having slightly higher scores for its Fall 2023 service than it would have under the previous policy, as seen in Figure 2 above.
However, the method we used to calculate On Time Performance on the Green Line using this new standard does not perfectly reproduce how Green Line OTP has been traditionally calculated, owing to the recent retirement of the MBTA’s previous data collection systems for reliability. The preliminary results presented here are therefore approximate, and once the data processes necessary for evaluating the Green Line using Excess Trip Time are in place, SDP annual reports will use that standard instead of this interim revision of the traditional OTP standard.
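The trunk-stop rule described above amounts to taking the larger of two benchmarks. A minimal sketch follows. The function names and sample headways are hypothetical, and treating a headway exactly equal to the benchmark as on time is an assumption.

```python
FLAT_STANDARD_S = 180  # the 3-minute trunk-stop standard, in seconds


def trunk_benchmark_old(scheduled_headway_s):
    """Previous policy: always the flat 3-minute standard."""
    return FLAT_STANDARD_S


def trunk_benchmark_new(scheduled_headway_s):
    """New policy: the 3-minute standard or the scheduled headway, whichever is greater."""
    return max(FLAT_STANDARD_S, scheduled_headway_s)


def headway_on_time(actual_headway_s, benchmark_s):
    """Assumes a headway equal to the benchmark still counts as on time."""
    return actual_headway_s <= benchmark_s


# Hypothetical late-evening case: trains scheduled every 5 minutes, actual gap 4 minutes.
late_old = headway_on_time(240, trunk_benchmark_old(300))
late_new = headway_on_time(240, trunk_benchmark_new(300))

# Hypothetical peak case: trains scheduled every 95 seconds, actual gap 150 seconds.
peak_old = headway_on_time(150, trunk_benchmark_old(95))
peak_new = headway_on_time(150, trunk_benchmark_new(95))
print(late_old, late_new, peak_old, peak_new)
```

Only the low-frequency case changes outcome, which is consistent with the slightly higher Green Line scores under the new policy.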
Conclusion
Overall, the new Service Delivery Policy updates MBTA performance standards to paint a fuller and more accurate picture of how riders and residents experience service. Multi-year investments in improved data collection and processing are bearing fruit, as seen especially in the reliability calculations, which more accurately capture the experience of riders over the past few years. We in OPMI are excited to start applying the new SDP metrics to the service delivered in 2024 (check back in May for our next Annual Report), and we look forward to expanding and extending the new data and metrics to cover more modes.
The new policy also aligns the definition of good service with future plans for the agency. As the MBTA advances the Better Bus Program, procures new rail vehicles, prioritizes accessibility for all modes and stops, and plans for high-frequency, all-day regional rail service, the service definitions and standards in the SDP help measure our progress toward a more reliable, more accessible, and more useful transit system.


