An Interest In:
Web News this Week
- March 28, 2024
- March 27, 2024
- March 26, 2024
- March 25, 2024
- March 24, 2024
- March 23, 2024
- March 22, 2024
Wireheading: when machine learning systems jolt their reward centers by cheating
Machine learning systems are notorious for cheating, and there's a whole menagerie of ways that these systems achieve their notional goals while subverting their own purpose, with names like "model stealing, rewarding hacking and poisoning attacks."
AI researcher Stuart Armstrong (author of 2014's Smarter Than Us: The Rise of Machine Intelligence) takes a stab at defining a specific kind of ML cheating, "wireheading" -- a term borrowed from Larry Niven's novels, where it refers to junkies who get "tasps" -- wires inserted directly into their brains' "pleasure centers" that drip feed them electrified ecstasy until they starve to death (these also appear in Spider Robinson's Hugo-winning book Mindkiller).
A rather dry definition of wireheading is this one: "a divergence between a true utility and a substitute utility (calculated with respect to a model of reality)." More accessibly, it's that "there is some property of the world that we want to optimise, and that there is some measuring system that estimates that property. If the AI doesn't optimise the property, but instead takes control of the measuring system, that's wireheading (bonus points if the measurements the AI manipulates go down an actual wire).
Read the restSuppose we have a weather-controlling AI whose task is to increase air pressure; it gets a reward for so doing.
What if the AI directly rewrites its internal reward counter? Clearly wireheading.
What if the AI modifies the input wire for that reward counter? Clearly wireheading.
What if the AI threatens the humans that decide on what to put on that wire?
Original Link: http://feeds.boingboing.net/~r/boingboing/iBag/~3/RpYAezMgx38/optimizers-curse.html