Like ethically sourced diamonds or coffee beans, ethically sourced data can be hard to find. But as AI chews through all the easily sourced training data, the ways and means by which data is obtained are becoming increasingly important. One outfit that’s building a business around ethically sourced data is Prolific.
Prolific was founded at Oxford University in 2014 primarily to provide data for academic research. If a behavioral scientist needed data for a study on how consumer decision-making changes with age, for instance, they could tap Prolific to find vetted participants and gather the data for the experiment.
The London-based company has run more than 750,000 studies since it was founded and collected more than 100 million responses from half a million participants. Prolific boasts a global network of 200,000 active contractors (and another 800,000 on a waitlist), who are paid to turn their particular expertise, or simply their average human perception, into human-curated data.
As generative AI has taken off, Prolific has found itself helping customers turn raw text, video, audio, and imagery data into useful information. The contractors Prolific works with are often called upon to gauge the accuracy of AI model outputs and to give their opinions on the prompts fed into the models.
“We work with pretty much every foundational AI model creator that you’ve heard of in the news,” says Sara Saab, Prolific’s vice president of product. “Fifty percent of the OpenAI grant winners use Prolific. We’re most suited to use cases where they’ve already got a model and then they would like to use human evaluation to specialize it or otherwise fine-tune it. That’s where we really shine.”
In an industry where some companies have been accused of taking advantage of data labeling and annotation workers, Prolific’s mantra of ethically sourced and human-centered data curation stands out.
“The people behind your data matter: who fills in your survey, takes part in your user research, or trains your AI,” Phelim Bradley, the CEO and co-founder of Prolific, says on the company website. “My hope is that Prolific can be the infrastructure for quality human insights which will power the innovations of the future.”
The message appears to be getting through. In July 2023, the company closed a £25 million ($32 million) Series A round of financing led by Partech and Oxford Science Enterprises (OSE). Then in February, Prolific expanded its reach in the U.S. with a new office in New York City.
Excitement around GenAI is fueling the training data boom, and Prolific is primed to help. As AI companies vacuum up the low-hanging fruit spread across the Web, the company hopes that its mantra of high-quality training data gathered in an ethical and responsible way resonates with a wider crowd.
“What we’ve seen in this sort of first wave of generative AI models is that a lot of the data that they’re trained on is scraped, laundered, or stolen,” Saab tells Datanami. “Sometimes the people licensed to use that data are passing it on. Sometimes no one is licensed to use that data. Sometimes the model you’re producing is generating a watermark.
“We’ve seen a lot of data that really shouldn’t be fed into AI being fed into AI,” she continues. “And I think that’s where we’re trying to hold the line and kind of be on the side of humanity and say, come on, we’re not going to produce agents and assistants that represent us well if we’re implementing those practices.”
Good pay is also a priority for Prolific. The company sets a minimum wage of $8 per hour for AI annotation, although compensation is often much higher than that, particularly for certain types of work. “Demand for these kinds of specializations outstrips supply,” Saab says.
Data annotation at times requires exposing workers to unseemly content, which can take a toll on their mental health. Prolific has a dedicated participant support team to make sure workers’ needs are being met. It also tracks workers’ wellness over time using an accredited wellness scale, Saab says.
The company is a big backer of diversity in its workforce. Diversity not only bolsters Prolific’s reputation, but it also leads to better, richer AI via better, richer data.
“Diversity of thought on our platform contributes more interesting and richer data to these AI models,” Saab says. “At the end of the day, they’re supposed to represent humanity, right? So we want them to have a pretty good baseline for what they’re learning from.”
AI is clearly driving demand in the data annotation world at the moment, particularly as the stock of open data sets that large language models haven’t yet seen continues to dwindle. Synthetic data may provide some relief for the coming data cliff, but high-quality data annotated by humans will always be in high demand.
Prolific was left off a recent analyst group’s report on the top data annotation and labeling firms, which Saab calls “a big miss.” Needless to say, Prolific is proud of its heritage in serving academia and providing ethically sourced, human-centered data.
“I feel like we have a big bedrock of academic clients and I don’t think that will ever change. The academic world and the AI model creation world are not separate worlds. They’re like a Venn diagram with a lot of overlap,” she says. “At the end of the day, I don’t think anybody does things the way Prolific does. We really live, breathe, and think about the ethics of what we’re doing and the human element of it, and try to live those values internally every day.”
Related Items:
Are We Running Out of Training Data for GenAI?
The Top Five Data Labeling Firms According to Everest Group
OpenAI Outsourced Data Labeling to Kenyan Workers Earning Less than $2 Per Hour: TIME Report