Dhruv Rathee, Marques Brownlee, PewDiePie YouTube video subtitles used to train AI models

The Hindu Bureau | 07-18 00:11

Dhruv Rathee, Marques Brownlee, and PewDiePie YouTube video subtitles were used to train AI models, according to a tool shared by the Proof News outlet.

Anthropic, Nvidia, Apple, and Salesforce were among the leading tech firms that used a YouTube video subtitle dataset to train their AI models, according to the outlet.

The outlet said it found subtitles from 173,536 YouTube videos that were pulled from over 48,000 channels, but warned that the tool could result in false negatives.

Some of the videos that were used to train AI included uploads by tech reviewer Marques Brownlee, apart from content creators such as PewDiePie and Dhruv Rathee, as well as news publications and talk shows worldwide.

A search using the tool also returned a 2020 video by The Hindu.


Most of the videos were from 2020 or earlier, suggesting a cut-off of sorts.

Brownlee criticised companies that scraped video transcripts for AI training content.

“Fun fact, I pay a service (by the minute) for more accurate transcriptions of my own videos, which I then upload to YouTube’s back-end. So companies that scrape transcripts are stealing *paid* work in more than one way. Not great,” Brownlee posted on X on Tuesday.

Anthropic and Salesforce confirmed using training datasets that included the scraped video subtitles, but did not accept any wrongdoing, per the outlet. Nvidia, Apple, Databricks, and Bloomberg did not confirm or deny the allegations.

The question of scraping YouTube videos—or their transcripts—to train AI models is a contentious one.

Earlier in the year, when OpenAI official Mira Murati was asked whether the ChatGPT-maker used YouTube videos for AI training, she struggled with the question and could not answer clearly.


