## Real-time data analytics and optimization of shared-vehicles networks

## Abstract

Bike-sharing systems have been growing rapidly over the past decade: more than 850 cities benefit today from a bike-share scheme. Recently, car-sharing systems have also appeared, working along the same principles. Such systems can help solve the last-mile problem and induce a greater use of public transportation. But to be widely adopted, these systems need to be both convenient and reliable, and the ensuing costs need to be contained. This requires a well-designed network of stations, with enough docks and bikes (or cars) for it to run smoothly even during rush hours. Big Data techniques provide a unique insight into mobility patterns, which allows for the optimization of shared-vehicles networks. More precisely, at least three issues can be tackled this way: stations that are too often either full or empty; the failure to detect broken bikes and docks soon enough; and the complexity of designing an efficient system.

## Introduction

Bike-sharing systems have been growing rapidly over the past decade, and are now a familiar sight in many countries. In January 2015, more than 850 cities benefited from a bike-share scheme (up from 700 in June 2014), and that figure is still growing rapidly, especially in Asia and the Americas. For instance, new systems are planned in Chennai, Lima, Lisbon, and Los Angeles, to name a few. On a global level, however, the number of bikes has been growing even more quickly than the number of schemes (by more than 50% each year), as existing schemes continue to be expanded: see Chicago, Madrid, or New York, where Citi Bike is expected to double in size by 2017.

While specifics vary, bike-share schemes often work according to the same principles: bikes can be hired at docking stations spread across the city, and the first 30 minutes are free—at least for members, but membership can be purchased annually, often for a small fee.

Recently, the concept has been transposed to car-sharing systems. Car-sharing solutions have existed for some time now, with companies such as City Car Club, Flexcar, or Zipcar being founded in the early 2000s. However, in the 2010s a new car-share system was introduced in Paris, Autolib, which borrows the model of a bike-share system, with predefined parking stations across the city working like the docking stations of a bike-share system (as its name suggests, Autolib sprang from Paris’s bike-share system, Vélib). The cost depends on the hire duration, without any free period, but there is a discount for subscribers.

The challenges encountered by these car-sharing schemes are thus almost the same as those encountered when dealing with bikes. Therefore, many of the solutions we present here are relevant to any shared-vehicle system. In order to keep this article as concise as possible, however, we will mostly focus in the following on bike-share networks.

All these schemes solve the last-mile problem, encouraging people to switch to using more public transportation instead of their own car, and thus reduce traffic congestion and air pollution. However, for this to happen on a noticeable scale, the system needs to be both convenient and reliable, and the ensuing costs need to be contained. This requires a well-designed network of docking or parking stations, with enough spaces and vehicles for it to run smoothly even during rush hours.

We believe that ubiquitous data sources, combined with the recent advancement of Big Data techniques, offer a unique opportunity to design smarter systems that better satisfy the population’s needs at a lower cost. In particular, we are working on the three main problems that occur with any shared-vehicles scheme: stations are often either full or empty; stations’ or vehicles’ failures remain undetected for too long; and the networks are not designed as well as they could be in the first place.

## Stations are either full or empty

As the reader might have already experienced, a frequent problem in any large bike-sharing scheme is that docking stations always seem to be empty when one is looking for a bike, and always full when one wants to return it.

This is clearly visible on the following figure. It shows the filling level of all the docking stations in Paris at around 3pm: a red circle indicates a full station, a blue circle an empty one. The city center, where most people work, is bright red while the residential areas are deep blue.

Figure – Filling level of the stations in Paris in the afternoon: red dots mean full stations, blue dots mean empty stations

This phenomenon shows the main limit to a more widespread use of bike-sharing systems: people need bikes at roughly the same time to make the same trips. A perfect scheme would need enough bikes and enough docks to handle the rush hours, but most of this infrastructure would remain unused most of the time.

*A worse problem for larger systems*

A thorough statistical analysis of available data reveals that, as a bike-share system expands, the number of trips grows much more quickly than the number of stations. Compiling the data from 202 cities of various sizes, we have estimated that if there are $n$ stations, the order of magnitude for the number of trips is $n^{1.6}$. In other words, each time the number of stations is multiplied by 2, the number of trips is roughly multiplied by 3.
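As a rough illustration of how such an exponent can be estimated, the sketch below fits a power law by ordinary least squares on log-transformed values and then checks the doubling factor. The function name `fit_power_law` and the sample data are hypothetical: the data is synthetic and is not the 202-city dataset.

```python
import math

def fit_power_law(stations, trips):
    """Fit trips ~ c * stations**alpha by least squares in log-log space.

    Returns (alpha, c). Inputs are equal-length sequences of positive numbers.
    """
    xs = [math.log(n) for n in stations]
    ys = [math.log(t) for t in trips]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                       # slope of the log-log line
    c = math.exp(mean_y - alpha * mean_x)   # intercept, back in linear scale
    return alpha, c

# Synthetic data following an exact n**1.6 law (illustrative only).
stations = [10, 50, 100, 500, 1000]
trips = [2.0 * n ** 1.6 for n in stations]
alpha, c = fit_power_law(stations, trips)

# Doubling the number of stations multiplies the number of trips by 2**alpha.
doubling_factor = 2 ** alpha
```

With an exponent of 1.6, the doubling factor is slightly above 3, which is where the "multiplied by 3" rule of thumb comes from.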

The fact that the number of trips does not grow linearly with the system’s size reveals a “network effect” that relates to a rapid increase of the number of possible trips in a wider network. The concept of network effect is often used to describe what happens with social media platforms, such as Facebook or LinkedIn, where the value of a subscription for the consumers increases with the overall number of subscriptions. With bike-share systems, the positive feedback loop may not be as strong, but the value for the users of each station in the system still increases with the overall number of stations, because the number of possible destinations from each station increases.

The following figure represents this. It shows the number of trips per day as a function of the number of stations on a log-log scale. Schemes above the average line generate more trips than expected for their size, and are in that sense more “efficient” than schemes under the line.

Figure – Relationship between the number of stations and the number of daily trips for 202 bike-share systems, on a log-log scale

*Current solutions*

For any shared-vehicles system, two approaches are possible to mitigate the problem of stations that are too often either full or empty.

The first solution consists in moving the vehicles between stations. For instance, bikes are moved using special trucks, and cars are moved by the network operator’s own drivers. To establish a bike redistribution strategy, the operators rely on their own expertise, drawn from their experience. This method is costly ($5-$10 to move one bike) and not very effective, since it has a limited impact and contributes to traffic and pollution.

A less expensive solution is to provide the users with real-time information on the network’s current status, so that they can plan their journeys accordingly. Today, most cities provide mobile applications to their users, allowing them to know the number of bikes and slots available in each station in real time across the network. While this may alleviate their frustration, it leaves the underlying problem unsolved.

Some operators have also introduced a gratification system to induce the users to contribute to the bike redistribution efforts. People are rewarded if they hire or leave a bike in some specific locations, for instance if they bring a bike to a station at the top of a hill. This is the case in Paris, where Vélib users can gain 15 minutes of free riding by dropping a bike at one of the most elevated stations. In Bordeaux, registered users have their membership extended by one day each time they take a bike from a full station or drop one at an empty station. This is certainly cheaper than trucks, and it may help reduce imbalances, but it can only go so far.

*Big Data solutions*

Any comprehensive solution to this problem should be based on a deep understanding of mobility patterns, so that future needs are predicted and an optimal strategy is designed to satisfy them.

Thanks to the recent advancement of Big Data techniques, this is now possible. We have been able to forecast the system’s needs for the next 24 hours, using a public dataset covering a 2-year period and comparing several recent regressors (Ridge, AdaBoost, Support Vector Regression, Random Forest, Gradient Tree Boosting) [Giot–Cherrier 2014].

This approach has a big advantage: the more data sources are used, the more reliable the predictions become. Thus, the system’s current state, weather data, local events, incidents on the public transportation services, the urban graph provided by OpenStreetMap, etc., all contribute to a better understanding of the dynamics.

*Application 1: nudging the users*

By knowing beforehand how the distribution of bikes should evolve in order to better meet the demand, and by providing the right information in the right way to the bike-share system’s users, it is possible to nudge them to contribute to a better distribution of bikes in the system. A “nudge”, as originally defined by Richard Thaler and Cass Sunstein, is “any aspect of the choice architecture that alters people’s behavior in a predictable way without forbidding any options or significantly changing their economic incentives. To count as a mere nudge, the intervention must be easy and cheap to avoid. Nudges are not mandates. Putting fruit at eye level counts as a nudge. Banning junk food does not.” [Thaler–Sunstein 2009]

In September 2014, in partnership with Keolis Bordeaux, we launched the first predictive mobile application for bike-share users that is able to forecast the stations’ loads 12 hours in advance, based on real-time contextual data. Using the current state of the network, but also past data, the calendar’s particularities, and weather forecasts, we predict the number of VCub bikes that will be docked at each station with a 94% accuracy [Keo 2015]. These predictions are then displayed on the VCub mobile app, La Bonne Station.

Figure – Screenshot of “La Bonne Station” mobile app, with our predictions for the next 12 hours.

While the real-time display of the number of bikes in each station only allows the users to decide whether to take a bike here or there, these predictions enable them to also decide whether to take a bike now or then. Each time the users choose to advance or postpone their trips slightly, the peak demand is spread across a longer interval of time. The issue of stations that are either full or empty is thus mitigated.

Knowing the future load of each station precisely also allows the bike-share operator to fine-tune its gratification scheme, if it has one. It is for instance unnecessary to give a reward to a user bringing a bike to a station that is empty but will soon gain many bikes naturally.
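The rule described above can be sketched as a simple decision function. This is only an illustration under stated assumptions: the function name `should_reward_drop_off`, the `min_bikes` threshold, and the idea of comparing a single forecast value are hypothetical and do not describe the operator's actual scheme.

```python
def should_reward_drop_off(current_bikes, forecast_bikes, min_bikes=2):
    """Reward a drop-off only if the station lacks bikes now AND is
    forecast to still lack bikes later without any intervention.

    current_bikes:  bikes docked at the station right now
    forecast_bikes: predicted bikes at the station a few hours ahead
    min_bikes:      assumed threshold below which the station is "short"
    """
    short_now = current_bikes < min_bikes
    short_later = forecast_bikes < min_bikes
    return short_now and short_later

# Empty station that is expected to fill up on its own: no reward needed.
fills_naturally = should_reward_drop_off(current_bikes=0, forecast_bikes=12)

# Empty station that is expected to stay empty: a reward is worthwhile.
stays_empty = should_reward_drop_off(current_bikes=0, forecast_bikes=1)
```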

*Application 2: optimizing the bike redistribution*

Big Data techniques also offer the opportunity to optimize the bike redistribution operations performed by maintenance trucks.

The first step is to define a measure of performance for the network, which could for instance reflect the Service-Level Agreement. The most common key performance indicators (KPIs) are the number of trips on the one hand, and the availability of bikes and free docks across the network on the other hand. The latter is defined as the proportion of stations having at least a given number of bikes and free docks.

This KPI is translated into an error function, for which our machine learning algorithms find an optimal model using past data, such as the network’s state, weather forecasts, calendar data, and our own predictions of the stations’ loads.

We can then use the resulting model to compute, for each station, an optimal number of bikes that maximizes the measured performance. The truck drivers are given this information in real time through a mobile application and can then prioritize which stations to empty or replenish.

To illustrate our point, we have run a simulation in Bordeaux between July 2014 and June 2015, after having trained our models with data from July 2012 to June 2014. The targeted KPI was the “2-availability” of bikes and docks, defined as the proportion of stations having at least two bikes and two docks. With this definition, a single broken bike or dock does not inflate the indicator and give an artificial sense of efficiency, as would be the case with the plain “1-availability”.
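The availability KPI defined above is straightforward to compute from a snapshot of the network. The sketch below is a minimal illustration: the function name `k_availability` and the sample snapshot are hypothetical and are not taken from the Bordeaux data.

```python
def k_availability(stations, k=2):
    """Proportion of stations with at least k bikes and at least k free docks.

    `stations` is an iterable of (bikes, free_docks) pairs, one per station.
    """
    stations = list(stations)
    if not stations:
        raise ValueError("at least one station is required")
    ok = sum(1 for bikes, docks in stations if bikes >= k and docks >= k)
    return ok / len(stations)

# Made-up snapshot: (bikes, free docks) for five stations.
snapshot = [(0, 20), (1, 14), (5, 10), (18, 2), (20, 0)]

two_avail = k_availability(snapshot, k=2)  # 2 of 5 stations qualify
one_avail = k_availability(snapshot, k=1)  # 3 of 5 stations qualify
```

In this made-up snapshot, the station with a single bike counts toward the 1-availability but not toward the 2-availability, which is exactly the case where that last bike may well be a broken one.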

Since moving bikes all day long can have a deep, complex impact on the users’ behaviors, in our simulation we raised or lowered the number of bikes in all the stations simultaneously at 6am only. We could thus assume that the number of trips induced by our bike redistribution would have been negligible in the short term, and yet enough in the longer term to ensure that by 6am the following day each station would have returned to its natural state (that is, to the state that was actually observed).

The targeted numbers of bikes, which are the ideal numbers to have in each station to ensure a maximal 2-availability, were computed using only previous data from the system as well as contextual information (and not, obviously, the actual number of bikes that would be hired or returned in the following hours). By contextual information, we mean the current state of the system, weather forecasts for the day, planned events, calendar particularities (such as school or bank holidays), etc.

In our simulation, the average 2-availability between 6am and 11pm went up from 76.4% to 81.3%. This means that our method reduced the number of empty-or-full stations by 20.8%. At the same time, it required moving 75,000 bikes over a year. This number corresponds to what is generally required for a system like the one in Bordeaux (around 3% of the number of trips).
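The 20.8% figure can be recovered from the two availability values, assuming that “empty-or-full” refers to the stations that fail the 2-availability criterion. Their share goes from $100 - 76.4 = 23.6\%$ to $100 - 81.3 = 18.7\%$, so the relative reduction is:

$$
\frac{23.6 - 18.7}{23.6} = \frac{4.9}{23.6} \approx 20.8\%
$$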

The following figure shows the evolution of the availability KPI over the day on average, first as it can be measured from what really happened (in red), and then as it would have been if our bike redistribution had taken place at 6am (in blue).

Figure – Proportion of stations having at least 2 bikes and 2 docks (“2-availability”) during the day in Bordeaux, historical average (red) and with our bike redistribution at 6am (blue)

Since we limited our virtual operations to 6am, our figures start much higher and tend to fall during the day, while always remaining above the actual figures. Real bike redistribution operations are performed throughout the day (which is a clear advantage), so it is not surprising that the real numbers rise for much of the day while ours only decline. Nevertheless, our method outperforms the historical figures at all times.

## Bikes and docks break down unexpectedly

Sadly, for the bike-share user the journey does not end when a non-full docking station has been found. Indeed, when only one dock remains, one can fear it is out-of-order, and unable to accept one’s bike. Similarly, when only one bike remains at a station, one can often expect it to be broken in some way (a flat tire, malfunctioning brakes, etc.). This leads to more frustration for the user, and can sometimes affect their safety.

*Current solutions*

In most systems, bikes are rather basic and no sensors are embedded. This means that damage cannot be directly monitored, and maintenance teams have to regularly visit all the stations to look for defective elements. This approach is not very efficient, as the staff spend most of their time on low-value-added duties. Moreover, stations that are less used are visited less often, and a bike with a flat tire or defective brakes there can remain undetected for a longer period.

To mitigate the problem, most bike-share operators have a way to get feedback from their users, for instance by allowing them to report a defective bike on their mobile app. In Bordeaux, hiring a bike and returning it soon after will prompt the station to ask if there is a problem with it. However, the results are generally disappointing, as the actual number of problems reported by the users is rather small. Moreover, dock malfunctions are not reported.

*Big Data solution*

By analyzing different streams of data collected from various sources (number of bikes, station properties, bike trajectories when available, etc.), it is possible to detect anomalous behaviors and prioritize them, so that maintenance teams can focus on the most urgent repairs.

*Detecting broken docks:* Observing the number of bikes in each station over a long period can give us an idea about which docks are dysfunctional. A station that has shown several upper plateaus over a period of days without ever reaching its maximal capacity is likely to have a malfunctioning dock.
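As a minimal sketch of this plateau heuristic, the function below counts how often a station's bike count stalls at its highest observed value while staying below the nominal capacity. The function name `suspect_broken_docks` and the thresholds `min_plateaus` and `min_len` are illustrative assumptions, not values used in our system.

```python
def suspect_broken_docks(counts, capacity, min_plateaus=3, min_len=2):
    """Return the suspected number of broken docks, or 0 if none is suspected.

    counts:   chronologically ordered bike counts for one station
    capacity: nominal number of docks at the station
    A "plateau" is a run of at least `min_len` consecutive samples at the
    highest count observed. Both thresholds are illustrative assumptions.
    """
    if not counts:
        return 0
    ceiling = max(counts)
    if ceiling >= capacity:
        return 0  # the station did reach its nominal capacity
    plateaus = 0
    run = 0
    for c in counts:
        if c == ceiling:
            run += 1
        else:
            if run >= min_len:
                plateaus += 1
            run = 0
    if run >= min_len:  # a plateau may end the series
        plateaus += 1
    return capacity - ceiling if plateaus >= min_plateaus else 0

# Made-up history of a 15-dock station that repeatedly stalls at 14 bikes.
history = [5, 9, 14, 14, 14, 8, 3, 10, 14, 14, 6, 12, 14, 14, 14, 7]
suspected = suspect_broken_docks(history, capacity=15)
```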

*Detecting “severely damaged” bikes:* A severely damaged bike is one that shows clear signs of malfunction, such as a completely deflated tire, a broken seat, or other obvious defects. As with broken docks, such a bike will most likely never be hired, and thus the station will remain non-empty for a suspiciously long period of time.

In both cases, since a bike-share network station has in many respects the same dynamics as a diffusive process, one can detect a broken piece of equipment (dock or bike) by comparing the time that has elapsed since the station last went from full to empty or from empty to full (the “crossing time”) to the station’s diffusion time, and thus determine if there is an anomaly.

Figure – Evolution of the ratio between the diffusion time and the crossing times for a station in Bordeaux. The peaks correspond to when the station had a broken dock (top, blue) or a broken bike (bottom, red).

When the empty-to-full crossing time is much larger than the diffusion time, that is if it takes too long for a station to be full again, one can suspect there is a defective dock. Similarly, when the full-to-empty crossing time is much larger, that is if it takes too long to be empty again, one can suspect that there is a defective bike.

The difficulty lies in estimating the diffusion time, as it depends on the number of bikes hired or dropped at the station. For instance, in the winter, when a bike-share system is often less used, the diffusion time should be longer than in the summer.

Looking at the ratio between the diffusion time and the crossing times, one can quickly spot anomalies (see previous figure).
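The comparison between crossing times and diffusion time can be sketched as follows. This is a simplified illustration, not our production method: the function name `classify_station` and the `threshold` value are hypothetical, and the diffusion time is taken as a given input since estimating it is the difficult part.

```python
def classify_station(empty_to_full_hours, full_to_empty_hours,
                     diffusion_hours, threshold=3.0):
    """Flag a station whose crossing times far exceed its diffusion time.

    empty_to_full_hours: time elapsed since the station was last full
    full_to_empty_hours: time elapsed since the station was last empty
    diffusion_hours:     estimated diffusion time of the station (given)
    threshold:           assumed ratio above which a delay is suspicious
    """
    if diffusion_hours <= 0:
        raise ValueError("diffusion time must be positive")
    flags = []
    # Too long without being full: one dock may refuse bikes.
    if empty_to_full_hours / diffusion_hours > threshold:
        flags.append("suspected defective dock")
    # Too long without being empty: one bike may never be hired.
    if full_to_empty_hours / diffusion_hours > threshold:
        flags.append("suspected defective bike")
    return flags

dock_case = classify_station(30, 5, diffusion_hours=6)
bike_case = classify_station(4, 40, diffusion_hours=6)
normal_case = classify_station(4, 5, diffusion_hours=6)
```

Because the diffusion time is longer in winter than in summer, a fixed threshold on the ratio adapts to the season in a way that a fixed threshold on the crossing time alone would not.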

Identifying less obvious bike malfunctions is, however, much more complex, as we need to follow each bike to spot any anomalous behavior and integrate many different aspects: staying for too long at a docking station; being used for shorter times than the average; or being used for significantly slower trips.

## Designing new schemes

As of today, more than 200 bike-share schemes are planned or under construction around the world, and many more are still to be developed. Before a new scheme is set up, or an existing one is expanded, the new stations’ locations and sizes must be decided. Usually, multiple factors are taken into account, and the system is designed to maximize some parameter (e.g. the utility for the community) and/or minimize some other (e.g. the cost).

*Current solutions*

One of the most common approaches consists of relying on urban-planning agencies. They conduct a study based on demographic factors (number of residents, number of shops, etc.) to infer the number of trips that will be made in each neighborhood of the city. Using this first result, they then estimate each station’s ideal size. This is the methodology that was used for instance in 2007 for the Vélib system in Paris by the Atelier Parisien d’Urbanisme, but as we have seen the result is far from optimal.

Installed in 2013, the New York bike-share program used a different approach. The population was surveyed to decide the stations’ locations, which were then matched with demographic variables to infer how much they would be used and determine their sizes accordingly. Even though this was a step toward a better design, bike redistribution is still an issue.

*Data-analytics viewpoint*

Even though a planned bike-share system, by nature, does not have any data on itself yet, a data-analytics viewpoint is nevertheless possible. It suffices to leverage the experience of already existing systems (meaning their data) to design it.

The first step is therefore to be able to describe a network and its surroundings using a number of key variables, and find relationships between those variables that are as universal as possible—assuming enough data has been collected from enough cities around the world to provide a clear picture.

In the following sections, we give two examples of this approach: first, the computation of the ideal number of bikes to have in the system to maximize availability; then, the use of seemingly universal mobility patterns to design a network.

*First example: optimal number of bikes to maximize availability*

The number of bikes to deploy is a key issue for any bike-share system. If there are too few bikes, users will often be unable to find one when they need one, and thus many trips will just not happen. Conversely, if there are too many bikes, then too many stations will be full too often, and the users will soon take notice and be discouraged.

As with the optimization of bike redistribution, the first step is to define a Key Performance Indicator reflecting the Service-Level Agreement. In the following, we will work with the “2-availability”, which we previously defined as the proportion of stations having at least two bikes and two docks. This is to minimize the influence of broken bikes and docks, which can artificially raise the 1-availability.

We have compared the 2-availability of the Bordeaux bike-share system, or VCub, to the number of bikes it had between May 2012 and May 2014. This period was chosen to rule out any influence of the number of stations, which was basically constant the whole time at around 140.

More precisely, to get a clearer picture, we have computed the average 2-availability given different values for the number of bikes, and we also give the average 2-availability for bikes and for docks separately.

Figure – Top, average 2-availability of the docks (yellow), bikes (blue), and the system as a whole (red) for different value segments of the number of bikes. Bottom, number of days of observation in each of those value segments.

The previous figure presents our findings. As one can see, the system as a whole was better off with around 1350 bikes, which is 100 more than the usual number of bikes during this period. It seems as if the network operator decided to give more weight to the availability of docks than to that of bikes, perhaps judging that, for its users, not being able to drop one’s bike was much worse than not being able to hire one.
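The per-segment averaging used for this analysis can be sketched as follows. The function names `average_by_segment` and `best_segment`, the segment width of 50 bikes, and the daily records are all hypothetical: the numbers below are made up for illustration and are not the Bordeaux measurements.

```python
from collections import defaultdict

def average_by_segment(daily_records, segment_width=50):
    """Average 2-availability per value segment of the number of bikes.

    daily_records: iterable of (bikes_in_system, availability), one per day.
    Returns {segment_lower_bound: (mean_availability, number_of_days)}.
    """
    buckets = defaultdict(list)
    for bikes, availability in daily_records:
        lower = (bikes // segment_width) * segment_width
        buckets[lower].append(availability)
    return {
        lower: (sum(values) / len(values), len(values))
        for lower, values in sorted(buckets.items())
    }

def best_segment(summary):
    """Lower bound of the segment with the highest mean availability."""
    return max(summary, key=lambda lower: summary[lower][0])

# Made-up daily records: (number of bikes in the system, 2-availability).
records = [(1240, 0.75), (1260, 0.77), (1290, 0.79),
           (1340, 0.80), (1360, 0.82), (1410, 0.78)]
summary = average_by_segment(records)
best = best_segment(summary)
```

Keeping the number of days alongside each mean matters when reading the result: a segment observed on only a handful of days gives a much less reliable average than one observed for months.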

*Another example: using mobility patterns to design a network expansion or a whole new network*

By looking at the data produced by a bike-share system, we are able to identify all of the main mobility patterns: people commuting to work every day, shopping on the weekend, going out in the evening, etc.

For instance, the following figures represent the contribution of “working” to the filling rate of a station for Bordeaux (left) and New York (right): during the night, a workplace station is empty; it fills up in the morning, and empties in the evening. In France, people like to take a long lunch break, and may move to eat, going back home or just riding a few minutes to find their restaurant préféré. This explains the glitch at around noon, which can be found in many French cities.

Figure – Influence of “going out” in Paris (left) and New York (right): stations near restaurants, pubs, or movie theaters fill up in the evening and empty during the night

It should be noted that these behavioral patterns have been obtained without any a priori assumption, just from the numbers of bikes recorded in each station over a long period, using unsupervised learning techniques. Their classification, as corresponding to “working” or “going out”, is an a posteriori interpretation.

The profiles change slightly with the culture, but often keep the same shape within a given country. They can thus be considered as somewhat universal. When designing a new bike-share network, it is therefore possible to use the profile of a nearby city that already has one to predict the dynamics of the future system. When merely expanding an already existing network, the city’s own profiles can be applied.

The next step is to identify which neighborhoods fall into which category, that is, where people go to work and where they go out. More precisely, one should determine for each neighborhood the contribution of each of these behaviors, for instance using demographic data, the urban graph provided by OpenStreetMap, or the dominant human activities in nearby stations when planning an extension (see next figures).

With this approach, it is not the predicted level of activity of a station that will determine the number of docks that a station should get, but rather the size it should have to accommodate the pendular nature of the trips made by the users at this particular location.

Figure – Unsupervised analysis of the stations in Paris and New York: red stations are close to working areas, blue stations are close to housing areas

## Acknowledgements

Our R&D project has been funded by two grants from the French government (Concours Mondial d’Innovation 2014 and I-Lab 2014).

## References

1. Borgnat P., P. Abry, P. Flandrin, C. Robardet, J.-B. Rouquier, and E. Fleury (2011). Shared bicycles in a city: A signal processing and data analysis perspective, Advances in Complex Systems, vol. 14, no. 03, pp. 415–438.

2. Giot R., R. Cherrier (2014). Predicting Bikeshare System Usage Up to One Day Ahead. IEEE Symposium Series in Computational Intelligence 2014 (SSCI 2014). Workshop on Computational Intelligence in Vehicles and Transportation Systems (CIVTS 2014), pp. 1–8.

3. Shaheen S. A., E. W. Martin, N. D. Chan, A. P. Cohen, M. Pogodzinski (2014). Public Bikesharing in North America during a Period of Rapid Expansion: Understanding Business Models, Industry Trends and User Impacts. Mineta Transportation Institute.

4. Thaler R. H., C. R. Sunstein (2009). Nudge: Improving Decisions about Health, Wealth, and Happiness. Penguin Books.

5. The Bike-sharing Blog. http://bike-sharing.blogspot.com/

6. Vcub Predict, un temps d’avance. Magazine Keo. Keolis. May 2015.

7. NYC Bike Share: Designed by New Yorkers. New York City DOT. http://www.nyc.gov/html/dot/downloads/pdf/bike-share-outreach-report.pdf