March 8, 2019 07:50 pm PST

A machine-learning system that guesses whether text was produced by machine-learning systems

GLTR is an MIT-IBM Watson AI Lab/Harvard NLP joint project that analyzes a text and predicts whether it was generated by a machine-learning model.

Text generators use language models derived from statistical analysis of vast corpora of human-written text, and their output can be very hard for a human to distinguish from text written by another person. These models could help malicious actors in many ways, including generating convincing spam, fake reviews, and comments -- so it's important to develop tools that help us tell human-written text from machine-generated text.

GLTR uses OpenAI's GPT-2 117M language model, the same model that underlies many text generators. GLTR looks for text that fits the GPT-2 model too well, on the basis that text written by humans nearly always contains "surprising" word choices that GPT-2 rates as highly unlikely. In other words, if every word in a text is just what the model would predict, the text is probably machine generated.
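
To make that per-word test concrete, here is a minimal sketch of the idea in Python, assuming the Hugging Face transformers library rather than GLTR's own code; the top-k cutoff and threshold below are illustrative assumptions, not GLTR's actual settings.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # "gpt2" on the Hugging Face hub is the 117M-parameter model GLTR uses
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def token_ranks(text):
        # For each token, compute its rank in GPT-2's predicted
        # distribution given the preceding context. Consistently low
        # ranks mean the text fits the model suspiciously well.
        ids = tokenizer.encode(text, return_tensors="pt")
        with torch.no_grad():
            logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
        ranks = []
        for pos in range(ids.shape[1] - 1):
            next_id = ids[0, pos + 1].item()
            # rank 1 = the model's single most likely next token
            rank = int((logits[0, pos] > logits[0, pos, next_id]).sum()) + 1
            ranks.append((tokenizer.decode([next_id]), rank))
        return ranks

    def looks_generated(text, top_k=10, threshold=0.8):
        # Heuristic flag: if most tokens land in the model's top-k
        # predictions, the text is "too smooth" for typical human prose.
        ranks = token_ranks(text)
        frac = sum(r <= top_k for _, r in ranks) / len(ranks)
        return frac >= threshold, frac

GLTR itself visualizes these per-word ranks as color overlays rather than collapsing them into a single pass/fail score, but the underlying test is the same kind of rank check.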

It's not hard to think of ways to defeat this tactic: GPT-2 could be modified to inject some random word choices that rough up its otherwise overly smooth statistics. But GLTR relies on human beings to review the scores it assigns, displaying per-word confidence values that pop up when you hover your mouse over a word in a candidate text. This makes it harder to trick GLTR with random words, but it also means GLTR is hard to scale up to analyzing large volumes of text, like all the tweets under a popular hashtag.
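For illustration, here is a toy sketch of the evasion described above: a generator that occasionally picks a deliberately low-ranked token so its output no longer matches the model's own statistics too smoothly. The noise rate and rank window are made-up values for the sketch, not a technique from the article.

    import random
    import torch

    def sample_with_noise(model, tokenizer, prompt, steps=50, noise_p=0.1):
        # Greedy decoding, except that with probability noise_p we pick
        # an unlikely-but-plausible token (ranks 101-500 here) to "rough
        # up" the statistics a detector like GLTR would measure.
        ids = tokenizer.encode(prompt, return_tensors="pt")
        for _ in range(steps):
            with torch.no_grad():
                logits = model(ids).logits[0, -1]
            if random.random() < noise_p:
                candidates = torch.topk(logits, 500).indices[100:]
                next_id = candidates[random.randrange(len(candidates))]
            else:
                next_id = torch.argmax(logits)  # the model's smoothest choice
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
        return tokenizer.decode(ids[0])

The trade-off is visible in the detector sketch earlier: each injected token raises the fraction of high-rank words, at the cost of occasionally producing odd phrasing a human reviewer might notice.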


Original Link: http://feeds.boingboing.net/~r/boingboing/iBag/~3/pJ_qG6vNs-A/gpt-2-vs-gpt-2.html
