As cities contend with the rise in demand for private transport and the subsequent spike in congestion, data is considered a tool to address intractable challenges and simultaneously advance public policy goals. Municipalities and transit agencies, in particular, are starting to recognize the benefits of using mobility data to shape dynamic and responsive transport management.
In Los Angeles, data from private transport companies helps policymakers assess whether mobility services are delivered equitably and made accessible to communities. Developers in London rely on data feeds to create user-centric applications that improve the discoverability of transport choices for commuters[1]. In other parts of the world, data is harnessed from private and public transit to direct investment toward required infrastructure, effectively implement regulations and encourage passengers to opt for more sustainable transport options. To facilitate these possibilities, however, it is necessary to consider how data can be appropriately collected, securely stored, managed and shared.
While the opportunities that data sharing presents are immense, several barriers impede the process. First, a great deal of this data is held in silos by private mobility companies. To unlock it, governments either mandate or incentivize data sharing. In both cases, companies have largely responded with reluctance, arguing that releasing proprietary data may threaten user privacy and compromise their competitive position. Second, without technical standards, mobility data is messy, lacks interoperability and comes with reconciliation costs[2]. If other stakeholders share data in unusable formats, cities find it challenging to leverage it for transport management. While progress has been made on standards, questions remain on how data sharing can be balanced with privacy.
Private mobility services collect demographics, location data and movement patterns from citizens, who often have little to no visibility into why the data is collected, how it is used and with whom it is shared. This data may contain personal identifiers or be sensitive in itself, like location data, which reveals intimate patterns of how we live.
These privacy concerns also emerge due to the over-collection and subsequent sharing of data with third parties. Moreover, as is the case with other industries, data sharing between Mobility Service Providers (MSPs) and agencies is often governed by contractual data sharing agreements. While these agreements may be comprehensive, they are exclusive between parties and as a result offer individuals little by way of transparency or accountability.
Typically, in conjunction with smart city initiatives, several governments have adopted open data platforms and publish datasets or live feeds of mobility data. This is seen as an increasingly attractive data sharing model which, though meriting encouragement, is limited in scope. Yu and Robinson (2012)[3] argue that open government and open data cannot be conflated and that often the success of open data initiatives is dependent on the consistency of public policies that may shift according to the whims of administrations[4]. Also, as these are government-led, there are no guarantees of the quality or accuracy of the data shared.
While some of these open data sharing initiatives are designed to be collaborative and engage stakeholders and data-users throughout the process, other models do not take into account the usability of the data released. This is further complicated by the heterogeneity of mobility data, which requires technical expertise to reconcile and make available in multiple forms[5]. Public departments often do not possess the required capabilities or resources to carry out these processes.
Where aggregate data is made accessible through Application Programming Interfaces (APIs), as in the case of the Mobility Data Specification, privacy concerns arise over how these datasets can be secured. These aggregated datasets, though anonymized, are vulnerable to re-identification due to their level of granularity[6].
Aside from this, the key limitation is that open data initiatives may exclude useful data that is considered Personally Identifiable Information (PII), Sensitive Personal Information (SPI) and proprietary information held by the private sector. Open data platforms and initiatives are most valuable when they comprise data from the government, private sector, research/academia and communities. However, they possess limited structures to incentivize or govern these relations.
The broader challenge in sharing mobility data is the lack of trust between stakeholders. Without collaboration with these players and access to data, cities will be unable to make policy decisions that reflect the needs of their stakeholders. Data sharing mandates imposed by cities on companies are considered burdensome and further disincentivize cooperation.
Frameworks for data sharing should be citizen-centric, enabling citizens to have greater control over data collection and sharing processes.
To address these barriers and challenges, data stewardship could present an alternative and effective solution.
The stewardship of mobility data can drive innovation and build an interconnected mobility ecosystem by working with stakeholders to balance public requirements and business interests while preserving individual privacy rights.
To encourage data-driven policymaking, stewards can also work closely with the public sector to build data sharing standards, policies and protocols.
STEWARDSHIP IN MOBILITY
Stewardship is a mechanism for sharing data that fosters trust among stakeholders, enhances transparency and enables greater control over data. A steward acts as a trusted, neutral intermediary who engages and negotiates with stakeholders to represent their best interests while preserving the privacy of individuals.
As a third party, this entity provides oversight and accountability in data use, storage and sharing. A steward may also provide technical capabilities for processing and securing proprietary or PII data. The roles a steward can play, in facilitating the stock and flow of data, are dependent on the model employed. Personal data stores, exchanges, collaboratives and trusts represent a few of the frameworks that can be considered.
Personal data stores are data subject-centric, where individuals can exercise maximum control over their data based on the notion of individual data sovereignty. As part of a project to build open data commons called the Decentralised Citizen Owned Data Ecosystem (DECODE), Barcelona and Amsterdam are piloting technical solutions that enable individual ownership of data through distributed ledger technology[7].
Through this model, individuals have the option of sharing data with the larger open data commons through ‘smart contracts’, a series of rules that help users in managing sharing preferences and authorization[8]. Existing solutions enable residents to manage and share data derived from environmental sensors with their own communities, which could later be used to guide municipal policies on congestion and pollution. High levels of civic engagement and the ability to access personal data are two strengths of this model that can present opportunities in stewarding mobility data, in which vast amounts of personal data are collected and mined by private players. Involving a steward could enable individual access to data that in turn can be shared with third parties or governing agencies for broader social benefit or to achieve policy goals.
Data trusts carry out both stock and flow functions, whereby data is collected from various entities and shared on the basis of a prescribed legal framework and clear, purposeful conditions. The model is unique as it embeds a ‘fiduciary responsibility’ in the steward to protect the rights of data subjects or beneficiaries which, if underpinned by a legal precedent, may also enable this entity to take legal action in case of a violation or breach.
In Canada, Orion and Compute Ontario propose the creation of a mobility data trust for these reasons — trusts possess the necessary structural requirements to share data based on clearly articulated purposes while outlining specific rights, responsibilities and access controls for transit operators, public agencies, private organizations, startups, academia and civil society[9]. In this model of stewardship, data sharing is incentivized through value exchanges and enforced through checks and balances.
Regardless of the model employed, data stewardship can foster public-private partnerships that can drive innovation and shape the future of mobility.
For cities that require MSPs to directly share data, it often becomes subject to freedom-of-information access requests. When released, this data can compromise competitive interests and have broader business implications.
In contrast, sharing data with a steward grants companies assurances that the data will either be processed to remove company identifiers or secured through technical measures that will protect it against harm or malicious intent. As a result, MSPs may be more confident in sharing data with a trusted neutral intermediary, which may encourage collaboration and a reciprocal exchange of data.
This framework is also conducive to various other use cases in mobility data sharing. It can support cities in driving multi-modal transport integration, which can reduce congestion levels. Stewards can help cities aggregate data, provide data visualization tools and pull out relevant mobility insights. Experts from the private sector also note that processing public and private data together can give commuters point-to-point connectivity options. Where there is high demand for public transportation, private mobility companies can be directed to provide hop-offs to transit stops or supplement transport to areas that do not receive public transit services.
Stewards can also consult with citizens to create data governance structures that account for their needs and communicate usage policies in a clear and accessible way. For cities, which often have limited technical capabilities and bandwidth, a steward could also bring in expertise to securely store data.
To strengthen the conceptualization of a steward and provide guidance in applying this framework, there are two fundamental principles to consider:
- Stewards must maintain and foster an ecosystem of collaboration and facilitate partnerships between key stakeholders
- Stewards must ensure processes are consultative and enable participation from design to implementation
In the mobility sector, key stakeholders include, but are not limited to, transport departments, transit agencies, municipalities, mobility service providers, third parties, institutions, funding agencies, community representative groups, non-profits, think-tanks and individuals. Incentives to share data may differ across stakeholder groups, so the benefits and value of sharing data must be communicated to all. Moreover, stakeholders’ interests should be acknowledged and represented in order to build more robust data sharing policies that reflect those interests. The first step in designing a good steward is to bring together governmental, non-governmental, private sector and community organizations. Collaboration and consultation can be operationalized in a few different ways.
In Seattle, the Transport Data Collaborative (TDC) governed by the University of Washington (UW) manages, distributes and secures data shared by mobility providers. Framed as a collaborative, stakeholders involved are considered members or ‘project partners’ who share roles and responsibilities based on a trust framework. As the steward, UW is able to incentivize these private players to share data by creating a digital layer that prevents data from being accessed by competitors. Moreover, the TDC management structure comprises an Executive Oversight Committee chaired by key stakeholders including: USDOT (United States Department of Transportation), the University of Washington, Microsoft and the city of Seattle. They are involved to provide consultation, direction and project oversight[10].
Similarly, before building smart city data governance policies, the government of Ontario awarded Orion and Compute Ontario a research grant to study smart cities and various governance models. The study recommended that the government form a personal mobility data trust as a not-for-profit corporation. Orion and Compute Ontario arrived at these findings by hosting a workshop with 125 stakeholders from the public sector, industry and academia, and by carrying out user interviews that assessed incentives and barriers to sharing data.
Active involvement of stakeholders continued through the designing process, through an interactive board game tool and a prototyping workshop. These workshops evolved into diversely represented advisory committees that were frequently consulted. Orion and Compute Ontario’s objective was to actively involve the committee in consultations to ‘facilitate more equitable access to the data market’.[11]
In an alternative example, Sidewalk Labs was commissioned by an authority to develop a waterfront in Toronto. While their proposals were met with heavy resistance by several stakeholders, Sidewalk held several virtual and in-person town hall sessions with the aim of empowering stakeholders to participate and encouraged them to ask questions, provide solutions and air concerns[12]. These examples highlight the necessity of involving key stakeholders in the design of stewardship frameworks to ensure that access, sharing and accountability measures reflect the requirements of these actors.
Data stewardship models are seen as advantageous for their ability to ensure that data sharing processes are transparent, privacy is safeguarded and individual control over data is enabled. This can be implemented effectively if principles of independent governance, transparency and purpose limitation are adhered to.
To begin with, stewards must be able to exercise independent governance and act as impartial and neutral actors with no conflicting interests or desire to commodify data. These are defining qualities of stewards that can be put in action by instating a trustee board that provides oversight and checks for compliance of the steward. Transport for London’s governing board also ensures that board members are subject to frequent changes, and requires members to declare interest, terms, and conditions of appointment, remuneration and any gifts received on a public platform[13].
For increased transparency, information about key decisions or actions taken by the steward that may have implications for the sharing of stakeholder data must be made available to concerned parties. Sidewalk envisaged implementing this by making privacy assessments carried out by their proposed data trust publicly accessible. These assessments were required to be conducted by any company or entity that wished to collect urban data or PII data[14].
Transport for London (TfL) enhances transparency through its rights-centric policy framework, which grants users several rights such as being informed of the reasons for data collection, rectifying false information, restricting data processing and erasure, among others. Additionally, under the European Union’s General Data Protection Regulation (GDPR), TfL is required to provide users access to data collected about them after they submit a request to the Data Protection Officer (DPO). DPOs have similar functions to a steward and serve as contact points for individuals, enabling them to file complaints and seek redressal[15].
Lastly, purpose limitation requires that the steward restrict data processing to the scope of consent negotiated with beneficiaries. Possible routes to achieve this include consultations to secure consent, data use policies and data sharing agreements.
Stewards must also possess relevant technical capacity. The purpose of this is two-fold. First, the steward or an associated entity it oversees must carry out the necessary cleaning, pre-processing and processing of data, which ensures the data’s quality, integrity and interoperability. Second, stewards must ensure that data is protected both in storage and in transmission. Protection is defined in the context of the steward’s overarching responsibility toward the data subject, which entails protecting the subject from any harm. To operationalize the de-personalization of personal data and secure datasets, a range of technical measures can be employed.
This may include innovative computational techniques that remove identifiers or prevent re-identification like data masking, synthetic data generation or recombinant sequencing. Other measures may include encryption of data at the source through one-time hashing techniques. TfL employs this method, which corresponds to the principle of anonymization at source. Data is de-personalized at the source through this process, which masks personal identifiers with another unique identifier[16].
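One way to picture de-personalization at the source is a keyed one-way hash applied to an identifier before it leaves the collecting system. The sketch below is an illustration under stated assumptions, not TfL’s actual process: it uses HMAC-SHA256 from Python’s standard library as a stand-in technique, and the function name `make_pseudonymizer`, the key handling and the sample identifier are all hypothetical.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer(secret_key: bytes):
    """Return a function that replaces an identifier with a keyed one-way hash.

    Hypothetical helper: the key would be held only at the collection point.
    """
    def pseudonymize(identifier: str) -> str:
        # HMAC-SHA256: without the key, the token cannot feasibly be
        # recomputed from a guessed identifier or mapped back to it.
        digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    return pseudonymize


# Assumed setup: a random 32-byte key generated at the source.
key = secrets.token_bytes(32)
pseudonymize = make_pseudonymizer(key)

# The raw identifier (hypothetical) never leaves the source; only the token does.
token = pseudonymize("device-001")
```

Because the same identifier always maps to the same token under one key, journeys can still be linked for analysis. That makes this pseudonymization rather than full anonymization; rotating or discarding the key limits how long records remain linkable.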
While the measures employed depend on the data type that is collected, what should be broadly considered from a technical standpoint are principles of data minimization and access control. Data minimization entails advocating minimal data collection based on explicitly identified purposes, limited retention policies, and deletion policies. This prevents the over-collection of data that is burdensome for Mobility Service Providers (MSPs) and raises privacy issues for data subjects.
For instance, the Los Angeles Department of Transportation (LADOT) requires MSPs to share both individual and aggregate level data[17]. Even when de-personalized, this trip data can be de-anonymized with relative ease. To get around this challenge and ensure the privacy of users is not compromised, regulators can require that data captured at the start and end of trips is mapped to public locations like bus stops or landmarks. If data minimization policies are put into practice, this would mean that individual-trip data need not be collected at all.
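Mapping trip start and end points to public locations can be sketched as a nearest-neighbour lookup followed by aggregation. This is a minimal illustration, not a method prescribed by any regulator: the `Place` type, the function names, the trip dictionary keys and the sample coordinates are hypothetical, and a real deployment would need a vetted list of public locations.

```python
import math
from collections import Counter
from typing import NamedTuple


class Place(NamedTuple):
    name: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def snap_to_public_place(lat: float, lon: float, places: list) -> Place:
    """Return the public place closest to the given coordinates."""
    return min(places, key=lambda p: haversine_m(lat, lon, p.lat, p.lon))


def generalize_trip(trip: dict, places: list) -> dict:
    """Replace exact trip endpoints with the names of nearby public places."""
    start = snap_to_public_place(trip["start_lat"], trip["start_lon"], places)
    end = snap_to_public_place(trip["end_lat"], trip["end_lon"], places)
    return {"start": start.name, "end": end.name}


def aggregate_flows(trips: list, places: list) -> Counter:
    """Count trips per (start, end) pair so that only aggregates are shared."""
    flows = Counter()
    for trip in trips:
        g = generalize_trip(trip, places)
        flows[(g["start"], g["end"])] += 1
    return flows


# Hypothetical public locations and a single trip record.
PLACES = [Place("Stop A", 34.0500, -118.2500), Place("Stop B", 34.0600, -118.2400)]
TRIP = {"start_lat": 34.0502, "start_lon": -118.2499,
        "end_lat": 34.0598, "end_lon": -118.2402}
FLOWS = aggregate_flows([TRIP], PLACES)
```

Sharing only the counts in `FLOWS` keeps exact pick-up and drop-off coordinates out of the shared dataset, although very small counts can still single out individuals and may need further suppression.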
Access control can be technically implemented through credential-based logins, architecture controls and security features. The TDC model relies on a three-tiered technical architecture, which enables the storage of raw, semi-structured, and unstructured data. Through the cloud provider’s access control features, mobility service providers may access any data they have sent UW.
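The rule that a provider can read back only what it has submitted can be sketched as a simple ownership check. This is a hypothetical in-memory model, not the TDC’s architecture or any cloud provider’s API; the `StewardStore` class, the role names and the provider identifiers are assumptions made for illustration.

```python
from typing import Any, List, Tuple


class StewardStore:
    """Toy data store in which every record is tagged with its submitter."""

    def __init__(self) -> None:
        self._records: List[Tuple[str, Any]] = []

    def submit(self, provider_id: str, record: Any) -> None:
        # Tag each record with the provider that sent it.
        self._records.append((provider_id, record))

    def read(self, requester_id: str, role: str = "provider") -> List[Any]:
        if role == "steward":
            # The steward oversees all submitted data.
            return [rec for _, rec in self._records]
        if role == "provider":
            # Providers see only their own submissions, never a competitor's.
            return [rec for owner, rec in self._records if owner == requester_id]
        raise PermissionError(f"unknown role: {role}")


store = StewardStore()
store.submit("provider_a", {"trips": 10})
store.submit("provider_b", {"trips": 7})
own_data = store.read("provider_a")
```

In practice the role and identity would come from authenticated credentials rather than from caller-supplied arguments, which is where the credential-based logins mentioned above come in.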
Stewardship has the potential to enable data sharing in mobility that will grant more control over data, facilitate partnerships between key stakeholders, and build in transparency and accountability mechanisms.
For cities and urban planners, mobility data opens up opportunities to invest in required infrastructure, effectively implement regulations and encourage passengers to opt for more sustainable transport options. This is largely dependent on gaining access to data held by the private sector. Data sharing in mobility is complicated by existing challenges of trust, privacy, and transparency that prevent effective sharing.
The framework of stewardship can address these challenges and unlock data by incentivizing both public and private stakeholders to share data in a way that is accessible, usable and interoperable.
As trusted, neutral intermediaries, stewards ensure that data is protected against harm, stakeholder interests are aligned and individual privacy rights are upheld. One of the primary roles of stewards is to build and enforce governance and accountability mechanisms. In the context of mobility, this model is relevant and can encourage collaboration and inculcate a sense of trust among stakeholders.
Depending on the design, a steward may also provide technical capabilities in processing and securing proprietary or PII data. This matters because mobility data is heterogeneous and public agencies often lack the technical capacity and bandwidth to store, share and secure data. Crucially, this framework extends beyond open data sharing in that it can steward both non-PII and PII data based on requirements, rights and responsibilities imagined by communities.
In mobility, where data collected from people is sensitive, stewards can push to ensure minimal data is collected and relevant technical measures are put in place to prevent it from being accessed by malicious actors.
While models of data stewardship may take different forms, the design and implementation of this model in mobility must adhere to the principles of collaboration and consultation among stakeholders. This ensures that trust is built, which can help the steward navigate conflicts that arise in data access and ownership. To map the contours of a good steward, governance, accountability and technical measures must be put in place to ensure the steward exercises independent governance, enables transparency and maintains purpose limitation. Potential examples of how this could be tangibly implemented have been drawn from real-world data sharing models operational in Europe, the U.K. and the U.S.
Stewardship opens up critical pathways to access data, but simultaneously harbours the structures that empower the entity to govern data in a way that suits all stakeholders: citizens, mobility service providers, and cities.
This article is part of a study carried out by The Data Economy Lab, a space that explores systems, processes and models that can enable safe and secure data sharing for innovation while safeguarding individual rights, agency and security.