Published in 2020, the YouTube Subtitles dataset includes subtitles from more than 12,000 videos that have since been deleted from YouTube; in at least one instance, a creator’s entire deleted online presence has been incorporated into numerous AI models. (Source: Image by RR)

Content from Popular Channels and Educational Institutions Used Without Permission

An investigation by Proof News has revealed that several major AI companies, including Apple, Nvidia, Anthropic, and Salesforce, have used subtitles from more than 173,000 YouTube videos, without the creators’ knowledge or consent, to train their AI models. The practice violates YouTube’s terms of service, which prohibit harvesting material from the platform without permission. As reported at proofnews.org, the dataset, called YouTube Subtitles, includes transcripts from educational channels like Khan Academy, media outlets such as the BBC and NPR, popular YouTube creators, and even channels promoting conspiracy theories.

Affected creators such as David Pakman and Dave Wiskus have voiced their frustration, noting that their work was used without compensation and could ultimately undermine their livelihoods. They argue that using their content without consent is theft and disrespectful, and that creators whose work feeds AI training deserve proper regulation and compensation. Representatives of the companies involved either declined to comment or justified their actions by saying the data was publicly available, even though collecting it violates YouTube’s terms of service.

The dataset forms part of a larger compilation called the Pile, which also draws on sources such as the European Parliament and Wikipedia. AI companies, including those with substantial financial backing, have used this data to train high-profile models, raising concerns about the ethics and legality of such datasets. The issue has already sparked lawsuits from authors whose works were similarly used without permission, and the ongoing litigation underscores the complex legal landscape surrounding AI training data.

Many creators remain deeply concerned about the future, fearing that AI could generate content similar to theirs and eventually replace them entirely. Pakman, for instance, encountered a fake Tucker Carlson video on TikTok that used his own words, down to the exact cadence, underscoring how convincingly AI can mimic human content. The episode captures a broader fear among creators: that AI-generated content could become indistinguishable from their own, leading to a proliferation of digital copycats that diminish the value and uniqueness of original work. Beyond the economic threat to their livelihoods, creators worry about losing control over their intellectual property altogether.

read more at proofnews.org