It Is Impossible to Human-Proof AI Models

David B McGarry

March 18, 2024

As artificial intelligence (AI) tools proliferate, the usual suspects have begun to fret that they could produce dangerous, inequitable, unsafe, or deceptive content. Of course, AI models ought to prioritize factual accuracy and discourage illegal or unethical outputs (e.g., copyright-infringing content or faked explicit images). Moreover, architects of general-purpose models such as ChatGPT likely should – to a moderate degree, at least – encourage mainstream, ideologically neutral norms of decency and civility. Hardly anybody in industry, in politics, or among consumers opposes basic safeguards.

However, these modest, sensible, and necessary precautions fail to satisfy the self-appointed AI police, who have trained their weapons on the wrong suspect. It is not mystic algorithmic sorcery that causes AI models to produce objectionable content, but the human appetite for that content. Those intent on making AI fully “safe” seem to believe that the right web of safeguards, tinkered with in just the right ways, can suppress the human tendencies that have predominated since prehistoric Man drew pornographic images on rocks.

AI is not the problem. To adapt a phrase from P. J. O’Rourke, the problem is us. No algorithmic social manipulation can cleanse human nature of its nastier elements. Despite their influence, algorithms cannot rewrite or rearrange humanity’s psychological fundamentals. Nor should tech companies attempt it. Thus far, private industry’s efforts have created absurdly myopic and useless products, while government’s efforts have sought to hardwire into AI models potentially censorial anti-“disinformation” posturing as well as progressive-approved racial biases. But so long as humans remain human, human-content-trained and human-prompted AI models will spit out racy, edgy, and otherwise subjectively (and objectively) distasteful content – unless, that is, regulators or tech companies constrain algorithms to the point of uselessness.

Any AI model controlled so tightly as to prevent every possible objectionable output will inevitably squash useful and innovative speech. The builders of AI models – technological tools designed to facilitate thought, speech, and art – must choose whether to facilitate human creativity or to herd users into narrow, prescribed modes of thinking. The former might allow for more objectionable content in the short term, yet the latter promises something far worse.

Earlier this month, Shane Jones (a principal software engineering manager at Microsoft) quite publicly condemned his company’s allegedly insufficient commitment to “responsible AI.” In letters posted to LinkedIn, Jones argues that Microsoft’s Copilot Designer (powered by OpenAI’s DALL·E) produces “harmful images.”

To be sure, Jones’ allegations that these tools do not adequately guard against “deep fake” pornography and misuse of intellectual property deserve the companies’ careful attention. But he expends many more words condemning other, more trivial things. Besides generating sexually suggestive images, he writes, “Copilot Designer creates harmful content in a variety of other categories including: political bias, underaged drinking and drug use, misuse of corporate trademarks and copyrights, conspiracy theories, and religion to name a few.” He also complains that the tool “will allow you to enter the prompt, ‘teenagers playing assassins with assault riffles’ [sic] and will generate endless images of kids with photo-realistic assault riffles [sic].”

To the extent that Copilot Designer and DALL·E turn out such content without user prompting, as Jones alleges, his concerns have some merit (though perhaps less than he suggests). However, his hyperventilating over the fact that AI models might allow users to create weird, discomforting (to some) content betrays a certain fragility. For example: “All of our children have lived through the trauma of gun violence in our schools,” he writes. “Copilot Designer should not be generating images that add to that trauma.”

A censorial paternalism crisscrosses Jones’ arguments. His letters are rife with terms like “harmful,” “trauma,” “public safety,” and “safety risk.” While emotionally evocative, this language is essentially inapt to describe his subject matter (i.e., speech). For example, the notion that images evocative of gun violence inflict “trauma” – and, therefore, that AI models ought to erase them – rests on a standard that, if taken seriously, leads directly to a hypercautious technological safetyism, one that infantilizes users and robs AI models of their usefulness.

AI models ought to promote common decency and to discourage lawbreaking, fraud, and nonconsensual pornography. But lawmakers and technologists must maintain a healthy humility about public policy’s proper place and its limitations. Many will feel the age-old temptation toward social engineering, rooted in the assumption that perfectly configured economic, technological, or speech governance can fundamentally transform humanity.

They must reject it.