g0: Paradigms for community participation in data stewardship

Diagnosis of challenges – What inhibits participation of communities in the stewardship ecosystem?

Our research highlights that while participation seems to be an area of interest, many data stewards are still reconciling how this can be factored in and prioritised – particularly during the early stages of business development and at the point of scaling these initiatives. This is corroborated in part by the emergence of efforts across the ecosystem, like Mozilla Foundation’s Data Futures Lab, which seeks to build the capacities of builders and ‘supportive entities’ in the ecosystem to move beyond extractive, status quo data governance models and instead prototype more just, inclusive and participatory efforts. Despite this progress, our interviews with experts and practitioners suggest substantial ground is yet to be covered – and in the absence of blueprints around participation, this remains an area that demands greater attention.

This foundational Play grapples with the concept of participation itself in the context of data stewardship. It considers the barriers that data stewards/intermediaries and supportive ecosystem enabler organisations face when looking to define, design and pilot more participatory mechanisms, and subsequently outlines emerging findings and recommendations. These insights build on an existing body of literature by like-minded organisations in this ecosystem and aim to kickstart greater dialogue and action around making participation more tangible. Given the dynamic nature of participation and its centrality to data stewardship, we consider this Play foundational, one that sits above the other Plays, which look specifically at how participatory data stewardship (and data governance more broadly) can be fostered.

Challenge g0.1

Participation is structured as a ‘one-off’ engagement and does not persist throughout the lifecycle of data usage & governance

Most contemporary efforts for data collection and use focus on the participation of individuals and communities at the early stages of the data lifecycle – when the project is being mapped out and when data is being collected.

Avenues for participation, even within citizen-science-oriented efforts, assume that participation is sufficient at the stage of seeking consent. However, emerging research suggests that traditional paper- or oral-based consent mechanisms fail to be inclusive of diverse capabilities and levels of awareness of the value of data and of related data rights and protections.

This challenge is likely due to several variables, which may include the capacity and resources of a steward, limited individual or community collectivization around the value of data, and insufficient regulatory frameworks for participatory data governance and protections. It may also relate to a limited understanding of data and the related perception of the value and possibilities of participation for individuals and communities.

These factors limit opportunities for citizens to be in control of how data may be used and distributed, and subsequently offer little option to be a part of the broader data value chain. Few engagements account for participation of communities throughout the lifecycle of data – from collection to processing through to storage, sharing with third parties and the different purposes for which data is used. The consequent marginalisation of communities could be attributed to a variety of factors, including the resource constraints of stewards (see Play III for more on this challenge), who may not have the financial or physical means to enable granular consent provisioning for the use of community data.

Elsewhere, the lack of engagement could also be attributed to the absence of meaningful blueprints to inform and involve communities in data decisions. Consent dashboards and comic-based consent narratives are emerging pathways to address this long-standing problem, but they are too few and far between to constitute a robust blueprint for enabling participatory data governance. These limitations are compounded by the prevailing regulatory landscape around data and information markets, which is overwhelmingly concerned with privacy protection as the overarching goal of regulatory action.

As a result, individual and community participation in data sharing and governance is limited to consent provisioning through cookie notices and prolix terms of use displayed across a variety of digital platforms. Not only are cookie notices inaccessible in themselves, but they also lead to ‘consent fatigue’ among users, actively undermining their ability to make informed decisions about their data. For instance, a study of cookie consent mechanisms examining 80,000 unique users on German websites has demonstrated that they are manipulative inasmuch as they do not offer a reject button on the notice.

Furthermore, users are “nudged” towards assenting to such notices through deceptive design, and where privacy-preserving options exist, they lie buried or inaccessible within cookie banners. Given the manifest limitations of the ‘notice and choice’ approach furnished by cookie notices, there is a need to move away from this “privacy model” of data regulation in favour of an “accountability model” for data governance. The “accountability model” places dual emphasis on one’s ability to control the downstream use of data and to hold data users liable for their actions. As a result, meaningful consent and responsible data use become the guiding imperatives for regulation of data.

Strategy g0.1.1

Identify incentives of data generators (individuals and communities) through consultation to better identify approaches to embed participation

In order to enhance the possibilities for participation, it is necessary to first be aware of the varying values, incentives, bandwidth, interests and capabilities of individual communities in participating in data collection and governance-related decision-making. It is worth noting that incentives are likely to vary significantly between the individual and community levels; this may present challenges where individual values are at odds with collective community goals. Assessing these factors is necessary to better understand how stewarding entities can structure participation and, for those supporting these efforts, what must be done to facilitate greater engagement.

Tools such as the Data Maturity Assessment help a variety of organisations take stock of their data goals by examining three axes: purpose, practice and people – to produce a data lifecycle evaluation. Purpose relates to an organisation’s strategy, analysis and applications of data while practice encompasses the infrastructure, quality, ethics and security protocols in place to handle data. The ‘people’ axis looks at the decision-makers within the organisation and their approach to data use. These assessments, however, are likely to have the greatest utility for established organisations – and carrying out these assessments also requires significant commitment and organisational buy-in – which may be viewed as burdensome for newer ‘stewards’ or data intermediary entities.

Similarly, other tools that may be used to assess data priorities include the Stewardship Navigator – this guides potential data stewards (or interested parties) and the communities they aim to serve through essential considerations about structure, sector and internal data governance standards of an entity, providing pathways towards responsible data use and sharing for public benefit.

The results of the above assessments function as precursors to crucial conversations with a community of data producers who can determine their subjective interests and incentives for participation in data governance. On the one hand, certain communities may choose not to be involved throughout the lifecycle of data. To this end, mechanisms of delegated representation may prove useful to ensure trusted intermediation and sharing of their data. A valuable example of such a mechanism for delegated representation can be found in the MindKind Study, supported by the Wellcome Trust. The goal of the study is to establish a Global Mental Health Databank – a kind of data collaborative – through volunteer contribution of mental health data by youth in participating institutions across India, South Africa, the United Kingdom and the United States. Participants in the study – in effect, the research community – are involved in co-designing the questions that the study should aim to address, as well as in designing a mobile platform to collect mental health data from other youth. Therefore, the involvement of the community in the MindKind Study extends only as far as project design, data collection and recruitment of youth data contributors to the platform, while the data bank and the purposes for which it can be used are not within the scope of the community’s control. The role of the community falls within the ‘inform’ and ‘consult’ components of the Arnstein spectrum of participation.

Elsewhere, communities may wish to be involved at every stage of the data lifecycle – right from its collection to processing and sharing with third parties for certain pre-defined purposes with certain approved and vetted data users. Such participation is also anchored in perceptions of value generated through data use and sharing, and whether community members can exercise meaningful control over their data – falling in the ‘empower’ category of Arnstein’s spectrum.

For instance, a group of patients with multiple sclerosis might be more invested in sharing their data for medical research on the disease than in monetizing their data for drug development by pharmaceutical companies. The patient community here chooses to prioritise the creation of public value through knowledge generation and research on multiple sclerosis over proprietary value that accrues to a narrow subset of pharmaceutical companies. MiDATA, a health data cooperative founded in Zurich, Switzerland, attempts to create precisely this sort of public value through its platform by allowing users to share their data for specific medical research projects, among which multiple sclerosis features prominently. The cooperative structure allows its members to contribute data to causes and projects that resonate with community values, with data-sharing decisions made through democratic voting mechanisms within a general assembly of cooperative members.

Strategy g0.1.2

Remodel the prevailing regulatory landscape for data governance to embed mechanisms for community participation throughout the lifecycle of data usage

The prevailing landscape for regulation of information markets is predominantly preoccupied with consent and data protection as means to authorise data sharing in the context of personal data. The problem with such an approach is that it violates the contextual integrity of privacy in ways that hinder individuals and communities from engaging effectively with downstream uses of their data, and from directing its use by actors and for purposes that ultimately benefit the communities that produce this data. The European Union’s General Data Protection Regulation, 2016, India’s Digital Personal Data Protection Bill, 2022 and Ghana’s Data Protection Act 2012 are some of the data protection laws that follow this “privacy model” of regulation. Such regulations suffer from an inability to comprehend the ‘social value’ of data, failing to account for the many positive externalities (e.g. research through combining different datasets) and negative externalities (e.g. potential privacy loss through data de-anonymization) that are inherent to data-driven innovation. Communities neither benefit from the positive externalities nor have any avenues to mitigate risks produced by negative externalities, due to their fundamental lack of autonomy over data decisions. The paradigm of creation, collection and use is even worse in the context of non-personal data, where there is as yet no significant legislation that recognises the interests of communities over non-personal data. Aside from fledgling efforts in Europe and India, there are no significant policy endeavours worldwide to govern non-personal data sharing. Data such as energy use information, crop yields and air quality data are created through shared endeavour, hold important insights about communities and have immense public value. Communities, however, do not derive any benefit from such data.

Therefore, it is incumbent upon public institutions to recognize the invisibilisation of communities and remedy the same through the introduction of participatory mechanisms for data governance such as data stewardship. Precursors of such an impulse to further participation of communities find expression in the EU’s Data Governance Act which recognises data cooperatives. However, the language of the Act frames cooperatives in a very restrictive way, with significant debate over whether individuals can even delegate their rights under the GDPR to cooperatives. Elsewhere public authorities are contemplating standard-setting for data stewardship, opening doors for institutionalisation of participatory mechanisms for data governance within policymaking. Canada’s CIO Strategy Council has proposed operational models for data stewardship like data trusts, data collaboratives and data cooperatives as a part of its National Standards for Responsible Data Sharing.

Challenge g0.2

Data gathering and sharing efforts risk being one-sided and exploitative in the absence of clear incentives that would deliver broad-based public benefit for communities that share data

Despite the many positive externalities afforded by data, emerging technologies such as machine learning have led to new approaches to the collection, use and sharing of data that are often extractive and inequitable, and that disenfranchise communities from participating in the governance of their own data. For instance, users signing up on digital health platforms have little control over how their data is used, just as gig workers are excluded from the audit of algorithms that govern their working conditions.

This practice is also prevalent in mainstream scientific and academic data gathering exercises which often maintain and reproduce the ‘researcher-subject’ dynamic, where the researcher extracts data and knowledge often with little in return to the ‘subject’ of enquiry – often treated as passive agents in these one-off, transactional engagements.

Persistent concerns around the misuse of data, combined with a lack of bottom-up engagement with communities that are affected by data use, have systematically eroded public trust in the process of data sharing. This trust deficit manifests itself as ‘data hoarding’ and ‘data fearing’ – two inter-related phenomena in which communities and organisations either restrict access to data, preventing it from being leveraged for public benefit, or withdraw consent for the use of data for fear of privacy loss, preventing data from being collected in the first place.

This is further exacerbated as communities are not able to visualise the benefits (tangible or otherwise) due to the complex routes of data usability. For marginalized communities (indigenous peoples, gender or sexual minorities, refugees, disenfranchised groups, etc.) that have historically been subject to extractive relationships with the state or academic communities, the consequences of this dynamic impact more than just the willingness to engage, participate or trust in data collection or sharing exercises. More insidiously, many of these dominant frameworks of knowledge, data gathering and usage can often subvert, limit or contribute to the erasure of indigenous ways of knowing and traditional forms of knowledge and practices.

Strategy g0.2.1

Data gathering and sharing efforts risk being one-sided and exploitative in the absence of clear incentives that would deliver broad-based public benefit for communities that share data

Many organisations that actively hold data assets now assume a greater responsibility or custodianship over the data of their beneficiaries (either end-user organisations like community-based organisations, individuals or communities) and are keen to understand how their rights can similarly be protected.

For instance, trade or credit unions are increasingly concerned about the harm poorly stewarded data may inflict on their members and in parallel are alive to the possibilities data presents in furthering collective negotiation efforts. Building the capacities of these entities to enhance their role as data stewards can be a meaningful way of facilitating greater participation and decision-making around data.

Prospect’s Lighthouse ‘purpose-made digital governance maturity test for trade unions’ offers a useful case study of how this guidance can be designed to be domain- or organisation-specific. However, some of these stewards or steward-like entities may not have an active tether to the community. This may be intentional on the part of some organisations who choose to focus their resources and vision on building out the technology or tools to enable self-governance and collective decision-making.

For organisations with a similar vision to Pesca Data, there is a need to build the capacities of these end-user organisations to leverage their tool efficiently and self-sufficiently. In this process, they invest significant resources to support the onboarding process as well.

However, the relationship between these end user organisations can also be a feedback loop of sorts – where the needs and priorities of their beneficiaries can be better surfaced. In these cases, stewards would do well to first identify organisations, associations or collectives that work closely with communities – ideally those that have pre-existing relationships of trust. From there, a steward’s role more specifically would be to understand how these organisations deliver value to their communities – for example, many of these organisations provide legal advocacy services or are an intermediary organisation that creates offline architectures to facilitate greater access to state services and associated rights.

Many of these efforts can be better aligned with or strengthened through participatory and secure data collection and governance processes. For instance, through an organisation called Rainforest Connection, GIS and bioacoustics data has been leveraged by indigenous communities to raise alerts about instances of poaching, and can be used as a tool to highlight encroachment of land by illegitimate entities in legal cases. In this instance, Rainforest Connection provides the technical resources and guidance around the collection of data – yet the end goal of its usage is determined by the communities, who can choose how and where (or for which use-case) that data can be put to its greatest value.

The value and associated “use-cases” around data act as both a precursor and incentive for communities to participate or see tangible benefit(s) in engaging with a steward or steward-like entity.

Defining and delivering on the “value” of data in isolation, without consulting communities and end-users, is often a complex task for a steward. Mainstream discourse valorizes the monetization of data in the current data economy, even if that is not always what individuals or communities actually desire. Treating monetization as the only expression of the value of data can also create perverse incentives, both for communities to forgo their data along with the rights it bears and for stewards to privilege monetary returns over community imperatives.

Monetization of data is likely to deepen existing inequalities, and make privacy a privilege for the rich. Therefore, it is important for stewards or steward-like initiatives to broaden their understanding of the value of data and how this can surface differently depending on use-case and communities’ subjective needs and requirements. For newly established organisations (technical with limited resources/personnel), other supportive entities in the ecosystem may provide the missing key of what this value can look like and therefore they should explore how to collaborate to combine and supplement this value addition.

Digital Democracy, a nonprofit that has co-developed an open-source mapping tool for indigenous communities (Mapeo), identified a few qualities in a partner that are important to account for. For instance, those with technical personnel/capacity are better placed to provide communities with greater on-ground onboarding support and meaning-making around data. These partners are also better enmeshed in the societal conditions, realities and needs of the communities to support them in identifying new use cases for data.

One such methodology that can be leveraged by partners (whether end-users or the stewards themselves) is to carry out data feedback sessions. In partnership with Blue Ventures, Abalobi released a report with a toolkit on ‘Community Engagement with Data’. This document outlines their experience and recommendations for practitioners building out community based marine management systems.

Key to their methodology is facilitating ‘data feedback sessions’, a participatory strategy which brings together communities in a forum to engage on questions, reflect on experiences and take active decisions around both the insights from data and its possible usage or sharing. These feedback sessions are described to have several benefits in that they support communities in recognizing data rights, empowering communities (through building credibility around knowledge contribution and enabling decision-making input), building trust and resolving conflicts, data validation and informing decision-making, adaptation, and change.

Strategy g0.2.2

Creating supportive legal and technical infrastructures that prioritise ‘participation by design’ for meaningful community participation in data stewardship

An essential corollary to “privacy by design”, the concept of “participation by design” borrows from the former and refers to a range of technical instruments available to public institutions to embed community participation in the process of data governance and sharing. Such a move to embed participation within the technical architecture of platforms can redress longstanding asymmetries in the digital economy where individuals and communities have little visibility into how their data is used by private corporations and public agencies that collect their data. Supportive regulation that carves out a role for communities in data governance is necessary and Strategy 1.2 of this Play delves into the specifics of what such legal infrastructure might look like.

More importantly, it is imperative that public institutions also invest in the creation of digital public infrastructure (DPI) – “digital solutions that enable basic functions essential for public and private service delivery, i.e., collaboration, commerce, and governance” – that would allow communities to participate meaningfully in data decisions. Examples of such DPI include the X-tee data exchange layer, built and managed by Estonia’s Information Systems Authority, which provides a secure information exchange that is confidential and interoperable. Estonian citizens can access a variety of services such as health insurance, digital signatures, banking and voting through their digital identifiers, which are linked to the X-tee framework, retaining control over who has access to this information and how it is shared.

In a similar vein, India’s banking sector regulator, the Reserve Bank of India, has rolled out the Account Aggregator (AA) framework – a data intermediary that facilitates consent-driven data exchange between financial information providers (FIPs) (e.g. the bank holding an individual’s account) and financial information users (FIUs) (e.g. credit lending agencies). When an FIU requests data from an FIP, the AA asks the data principal (the owner of the data) for consent to share the data. The promise of the AA framework lies in creating an efficient and connected financial information ecosystem that is powered by user consent and recognition of one’s agency over their data.
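The consent-gated exchange described above can be illustrated with a minimal sketch. It is not the Account Aggregator specification: the class names, the `handle` method, the callbacks and the in-memory audit log are all hypothetical, and real-world concerns such as encryption, consent expiry and revocation are left out. It only shows the ordering of steps, in which the data principal is asked first and the provider is contacted only if consent is granted.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRequest:
    fiu: str      # financial information user asking for data
    fip: str      # financial information provider holding the data
    purpose: str  # stated purpose of the request


@dataclass
class AccountAggregator:
    # Hypothetical in-memory record of every decision, kept for accountability.
    audit_log: list = field(default_factory=list)

    def handle(self, request, ask_principal, fetch_from_fip):
        """Relay data from the FIP to the FIU only if the data principal consents."""
        granted = bool(ask_principal(request))
        self.audit_log.append((request.fiu, request.fip, request.purpose, granted))
        if not granted:
            return None  # no consent, so the FIP is never contacted
        return fetch_from_fip(request.fip)


# Illustrative usage with made-up names and values.
aa = AccountAggregator()
request = ConsentRequest(fiu="lender", fip="bank", purpose="loan assessment")
statement = aa.handle(
    request,
    ask_principal=lambda r: r.purpose == "loan assessment",  # principal's decision
    fetch_from_fip=lambda fip: {"provider": fip, "balance": 1000},
)
```

Logging refused requests alongside granted ones is a design choice made here to echo the “accountability model” discussed earlier in this Play; it is an assumption of the sketch rather than a documented feature of the framework.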

Challenge g0.3

Data gathering efforts although defined as participatory continue to be surface-level and non-diverse as structural barriers (class, gender, ethnicity/race, age, citizenship status), vulnerabilities or capacities (technical, financial and data literacy) are not accounted for

Several data gathering projects that rely on citizen generation or “donation” of data often presuppose a few different factors that contribute to a ‘citizen’s’ ability to substantively participate. Varying incentives, access restrictions and abilities are often not accounted for when designing these data-related projects or infrastructures. For data intermediaries that are cognizant of these barriers and are intent on building more diverse and inclusive data pipelines and spaces for effective participation, this lack of representation is more of an unintended outcome – largely the result of narrow sampling methods or limited pathways to “customer acquisition”. For example, Posmo, a recently established mobility data cooperative based in Switzerland, captures data from a limited set of citizens who are characterised as able-bodied, city-dwelling and perhaps from a particular age range. This sample is reflective of the early outreach Posmo carried out to its existing community of supporters, colleagues and other value-aligned individuals. This challenge is particularly acute among emerging stewards who are in the process of defining the modes and mechanisms to best acquire customers or build out their membership. This lack of diversity in the representation of data generators must be addressed: without adequate inclusion, both in the data collected and in the range of individuals able to participate, initiatives risk reinforcing existing data biases.

Strategy g0.3.1

Forge partnerships with existing community-based organisations to solve for issues of representation within data stewardship initiatives

Reimagining data futures is critical to ensure that data governance, and the communities that are helming these efforts, are sensitive to the lack of diversity that is pervasive within data stewardship initiatives. Forging partnerships with community mobilisation and advocacy organisations presents a tangible pathway to resolve challenges of inclusion faced by stewardship entities. Research by Aapti documenting stewardship initiatives across the world, undertaken as a part of its efforts to build the Stewardship Navigator tool, indicates that 56.6% of all such initiatives originate in the Global North – countries based in Europe, North America and Oceania.

The Open Data Institute’s Data Institutions Register contains a log of 204 organisations working as stewards, of which 89.70% operate in the Global North. Consequently, low- and middle-income countries in the developing world, as well as marginalised communities within the Global North, find little mention within such databases. While it is likely that there are fewer stewardship or steward-like initiatives overall in the Global South, the gap is unlikely to be large enough to explain why an overwhelming majority of institutions in such databases are from the Global North. Nonetheless, there is cause for actors across the stewardship ecosystem to place greater emphasis on fostering stewardship initiatives in the Global South and in marginalised communities the world over.

As a result, there is a pressing need to move away from Eurocentric visions and practices of data governance to ensure that the stewardship community is alive to the experiences of discrimination and exclusion faced by disenfranchised groups. Stewarding organisations can stand to benefit from partnering with initiatives such as Data for Black Lives – “a movement of activists, organisers and scientists committed to the mission of using data to make concrete and measurable change in the lives of black people”. In turn, Data for Black Lives partners with organisations working for racial justice to counter bias inherent to data and algorithmic systems. Forming partnerships with such initiatives can help stewards solve for twin issues of lack of diversity and representation as well as scalability by leveraging existing networks within communities that Data for Black Lives enjoys.

Other valuable partners include the Environmental Data & Governance Initiative (EDGI) – a research and advocacy network working with organisations and communities concerned with climate change, science policy, good governance, and environmental and data justice. EDGI hopes to focalize stewardship of public knowledge about environmental issues by enhancing the use of existing environmental data, through tools like Jupyter Notebooks, which can support greater awareness and data-driven decision-making. Ethics such as intersectionality and a commitment towards anti-oppression are further affirmed through partnerships with grassroots communities that lead climate action. Lastly, EDGI also offers much-needed technical support to communities to gather, process, make sense of and act on their data.

Strategy g0.3.2

Locate and empower engaged community members to build bottom-up data-oriented communities and facilitate more diverse onboarding for stewarding organisations

In the absence of well-defined communities or relevant infrastructures (e.g. unions, collectives, community-based organisations and self-help groups), stewards or enabling organisations should invest in identifying specific individuals (champions and early adopters) from the community who are interested in and aligned with the goals of data stewardship. Stewards may also be best placed to identify members of the community who are typically not represented in other, broader community groups. For instance, while fisherwomen or women in the fishery management supply chain contribute significant labour, in many communities this remains invisibilized and unrewarded. This asymmetry of power was recognized by Abalobi, a South Africa-based social enterprise that empowers fishers through co-created ICT technologies and data analytics products.

Upon reaching out to some of these members, Abalobi found that women were more likely to demonstrate interest and, in some instances, possessed greater capacity and bandwidth to leverage its technologies. They also showed greater engagement and incentive to participate in the co-creation, development and governance of Abalobi technologies.

Abalobi chose to centre the fisherwomen as key pioneers in building their own collective data cultures and associations. At a structural level this meant creating a layer of foundational organisational governance composed of representatives from these communities. Their responsibilities also involved seeking and onboarding new members through the articulation of benefits. This intentional sampling and overarching self-governance model that Abalobi has put into action is closely tied in with its theory of change – which is to empower and build capacities for more agential and transparent collection and usage of data.