Your Web News in One Place

Help Webnuz

Referal links:

Sign up for GreenGeeks web hosting
April 19, 2024 05:22 pm

Meta's Not Telling Where It Got Its AI Training Data

An anonymous reader shares a report: Today Meta unleashed its ChatGPT competitor, Meta AI, across its apps and as a standalone. The company boasts that it is running on its latest, greatest AI model, Llama 3, which was trained on "data of the highest quality"! A dataset seven times larger than Llama2! And includes 4 times more code! What is that training data? There the company is less loquacious. Meta said the 15 trillion tokens on which its trained came from "publicly available sources." Which sources? Meta told The Verge that it didn't include Meta user data, but didn't give much more in the way of specifics. It did mention that it includes AI-generated data, or synthetic data: "we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3." There are plenty of known issues with synthetic or AI-created data, foremost of which is that it can exacerbate existing issues with AI, because it's liable to spit out a more concentrated version of any garbage it is ingesting.

Read more of this story at Slashdot.


Original Link: https://tech.slashdot.org/story/24/04/19/1236256/metas-not-telling-where-it-got-its-ai-training-data?utm_source=rss1.0mainlinkanon&utm_medium=feed

Share this article:    Share on Facebook
View Full Article

Slashdot

Slashdot was originally created in September of 1997 by Rob "CmdrTaco" Malda. Today it is owned by Geeknet, Inc..

More About this Source Visit Slashdot