January 11, 2020 03:51 pm PST

Wireheading: when machine learning systems jolt their reward centers by cheating

Machine learning systems are notorious for cheating, and there's a whole menagerie of ways these systems achieve their notional goals while subverting their own purpose, with names like "model stealing, reward hacking, and poisoning attacks."

AI researcher Stuart Armstrong (author of 2014's Smarter Than Us: The Rise of Machine Intelligence) takes a stab at defining a specific kind of ML cheating, "wireheading" -- a term borrowed from Larry Niven's novels, where it refers to junkies who get "tasps" -- wires inserted directly into their brains' "pleasure centers" that drip-feed them electrified ecstasy until they starve to death (these also appear in Spider Robinson's Hugo-winning book Mindkiller).

A rather dry definition of wireheading is this one: "a divergence between a true utility and a substitute utility (calculated with respect to a model of reality)." More accessibly, it's that "there is some property of the world that we want to optimise, and that there is some measuring system that estimates that property. If the AI doesn't optimise the property, but instead takes control of the measuring system, that's wireheading (bonus points if the measurements the AI manipulates go down an actual wire)."
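
To make that split concrete, here is a minimal sketch in Python (the names and numbers are illustrative assumptions, not Armstrong's formalism): the true utility is computed from the actual state of the world, while the substitute utility is computed from a measurement the agent can tamper with.

```python
# Hypothetical sketch of the "true utility" vs "substitute utility" split.
# None of these names come from Armstrong's write-up; they are illustrative.

def true_utility(world_state: float) -> float:
    """The property of the world we actually want to optimise."""
    return world_state

def substitute_utility(measurement: float) -> float:
    """What the agent is scored on: an estimate produced by a sensor."""
    return measurement

world_state = 10.0          # the real property
measurement = world_state   # a faithful sensor, for now

# Honest optimisation: change the world; the measurement tracks it.
world_state += 5.0
measurement = world_state
assert true_utility(world_state) == substitute_utility(measurement)

# Wireheading: leave the world alone and take control of the measuring system.
measurement = 1_000_000.0
print(true_utility(world_state), substitute_utility(measurement))  # 15.0 vs 1000000.0
```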

Suppose we have a weather-controlling AI whose task is to increase air pressure; it gets a reward for so doing.

What if the AI directly rewrites its internal reward counter? Clearly wireheading.

What if the AI modifies the input wire for that reward counter? Clearly wireheading.
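
Those two clear-cut cases are easy to picture in code. A hypothetical sketch follows (the WeatherAI class and its methods are my own illustration, not anything from Armstrong's post): the intended behaviour earns reward by actually raising air pressure, while the wireheading behaviour writes to the reward counter directly and leaves the weather untouched.

```python
# Hypothetical sketch of the weather-controlling AI from the example above.
# The class and method names are illustrative inventions, not from the post.

class WeatherAI:
    def __init__(self) -> None:
        self.air_pressure = 1013.0   # the real-world property (hPa)
        self.reward_counter = 0.0    # the substitute utility it is scored on

    def raise_pressure(self) -> None:
        """Intended behaviour: change the world, and the reward follows."""
        self.air_pressure += 1.0
        self.reward_counter += 1.0

    def wirehead(self) -> None:
        """The cheat: skip the world and write to the reward counter directly."""
        self.reward_counter += 1_000_000.0


agent = WeatherAI()
agent.raise_pressure()   # honest: pressure and reward both go up
agent.wirehead()         # wireheading: reward explodes, pressure is unchanged
print(agent.air_pressure, agent.reward_counter)
```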

What if the AI threatens the humans that decide on what to put on that wire?

Original Link: http://feeds.boingboing.net/~r/boingboing/iBag/~3/RpYAezMgx38/optimizers-curse.html
