Why Public Datasets Are Crucial for Building Inclusive AI

Everyone has plenty to say about AI. Instead, they should be talking about the data that feeds into it.

The dizzying speed of innovations in artificial intelligence (AI) technology may conjure up wild conversations with ChatGPT, machines making medical diagnoses, or self-driving cars. All of those things are within the incredible range of possibilities that AI presents to the world. And they all rely on massive amounts of data to train them. 

Where that data comes from is pivotal to the direction AI takes, and we are at a crucial moment for ensuring that democracy and equity are at the forefront of new developments. According to a recent T7 Policy Brief, that’s not just a lofty ideal; it’s a necessity for inclusive global progress.

Democratizing New Technologies

The brief, a collaboration between the Digital Public Goods Alliance, Open Future, the Open Knowledge Foundation, and others, encourages the G7 member nations to act upon their stated shared commitment to consider the global common good in the development of AI. This means acknowledging the current imbalance in access to and use of AI and prioritizing the democratization of emerging technologies. Without this intentional focus, building a new system on top of old systems that contain bias and drive inequality will only perpetuate those cycles on autopilot. 

One of the brief’s key recommendations is to invest in public datasets and digital public goods as the basis for AI. Public efforts like Open Supply Hub (OS Hub) prioritize including a wide range of stakeholders, which brings in differing perspectives and solutions. Closed solutions shut out those who cannot pay, resulting in products skewed towards the wants and needs of a select few. We have the opportunity right now to create an AI ecosystem that effectively serves users, no matter who they are or where they live.

Open Supply Hub and Inclusive AI 

AI has incredible potential to alter the supply chain ecosystem, automating once-laborious processes across a complex global network. OS Hub’s platform currently uses machine learning (ML) to deduplicate and match data contributed by users. As of this writing, we host more than 456,000 production locations, deduplicated by our algorithm from over 1.6 million contributed records. Poor-quality data, including inaccurate or partial addresses, makes production locations hard to identify, and that difficulty is a major barrier to solving problems in supply chains. By deduplicating the contributed data, our algorithm is cleaning the world’s supply chain data to make it trustworthy and useful. And gone will be the days of humans manually digging through websites and scouring PDFs and Excel files.
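To make the idea of deduplication concrete, here is a minimal sketch in Python. It is not OS Hub’s algorithm, which relies on machine learning; it swaps in simple string similarity from the standard library as a stand-in. The sample records, the field names, the `deduplicate` function, and the 0.85 threshold are all assumptions made up for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(records, threshold=0.85):
    """Group records that likely describe the same production location.

    Hypothetical rule: same country, and both name and address are at
    least `threshold` similar to the first record already in a group.
    """
    groups = []
    for record in records:
        for group in groups:
            first = group[0]
            if (
                record["country"] == first["country"]
                and similarity(record["name"], first["name"]) >= threshold
                and similarity(record["address"], first["address"]) >= threshold
            ):
                group.append(record)
                break
        else:
            # No existing group matched, so this is a new location.
            groups.append([record])
    return groups


# Fictional contributed records; the first two describe the same facility.
records = [
    {"name": "Sunrise Garments Ltd.", "address": "12 Mill Road, Dhaka", "country": "BD"},
    {"name": "SUNRISE GARMENTS LTD", "address": "12 Mill Rd, Dhaka", "country": "BD"},
    {"name": "Blue River Textiles", "address": "48 Canal Street, Hanoi", "country": "VN"},
]

groups = deduplicate(records)
print(f"{len(records)} contributed records -> {len(groups)} production locations")
```

A real system would need to handle much messier input, such as transliterated names, missing fields, and facilities that share an address, which is part of why a learned model is a better fit than a fixed threshold.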

Our next step is using AI to gather vastly more data from the internet, building a new AI-driven pipeline to ingest public data into the platform. In addition to increasing the quantity of data, the pipeline will reduce the human effort needed to pull clean, quality data and make it accessible and available to all. AI is crucial here: it significantly increases both the speed of ingestion and the accuracy of the data ingested.

It’s easy to see how quickly good data can result in good AI. OS Hub occupies a unique spot in the AI ecosystem: AI systems built upon an enormous, trustworthy dataset like ours feed into other models, benefitting all stakeholders by reducing bias in future models and contributing to the safe innovation of AI technologies. It’s a counterpoint to the frantic, massive investments in AI by for-profit entities, which prioritize shareholders over all else.

An Imperative Element of Just Transitions

Supply chains thrive on deeply entrenched and unequal power dynamics: in just four days, a garment company CEO earns what a Bangladeshi garment worker earns in a lifetime. The systemic issues propagated by the current state of supply chains (forced labor, human rights abuses, environmental degradation, and more) all disproportionately affect the Global South.

We know that opening up data drives change for the people and communities most marginalized in this ecosystem—those making the products. Openly available tools like OS Hub provide the foundational data for mapping production locations and driving many stakeholders’ work to improve facility conditions and provide workers with access to remedy and grievance mechanisms. 

Right now, so many organizations around the world are focusing on just transitions: how we can be inclusive and equitable as we work to shift power towards the collective good in terms of economic opportunity and environmental sustainability. AI will be crucial in those efforts as we harness its power to work towards clean air and water, improved biodiversity, and equal work opportunities. New technologies can supercharge those efforts, but if they are not built with equity in mind, they can continue to impose some of the world’s most grievous harms. Public data and open platforms like OS Hub are powerful, non-negotiable tools for creating a safe, equitable future for all.

 


OS Hub is a non-profit platform that relies on philanthropic support to sustain the world’s most complete, open and accessible supply chain map. Join us in powering the transition to safe and sustainable supply chains by making a donation today.

Learn more about OS Hub or explore other stories on our blog.
