Critics denounce a lack of transparency around GPT-4’s tech

 

By Chris Stokel-Walker

The much ballyhooed, long-rumored release of GPT-4 is finally upon us. In a blog post on Tuesday, OpenAI, the company behind ChatGPT, announced its latest landmark AI model.

As predicted, the results are impressive: The multimodal model, which unlike its predecessor can accept images as well as text inputs, can show “human-like performance” on a number of tasks. It scores in the top 10% of a bar exam for lawyers, and generally can pass a battery of tests with uncanny ability.

But to some, it’s not the eye-popping results that are the most significant element of the report; it’s the lack of transparency.

Though OpenAI released an accompanying 98-page paper on Tuesday, the company is relatively tight-lipped on the matter of the tech itself. “Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar,” the paper reads.

That explanation hasn’t sat well with some members of the AI and academic communities. Citing safety implications is a “cop-out,” claims Catherine Flick, a researcher in computing and social responsibility at De Montfort University, in the U.K. “It’s entirely the competitive landscape; if they truly were ‘Open’ AI, they would be wanting to be as transparent as possible.”

The issue, according to Irina Raicu, director of the Internet Ethics Program at the Markkula Center for Applied Ethics at Santa Clara University, is that it’s vital for fellow researchers to have access to the chatbot’s training data set. “Knowing what’s in the data set enables researchers to point out what’s missing,” she says.

The move may well be a legal defense, says Andres Guadamuz, an intellectual property law researcher at the University of Sussex, perhaps reactive to the ongoing drama around competitor Stability AI, which was hit with a lawsuit by stock image database Getty in January for allegedly using its paid-for database of stock images to train its image-generation model. “This is going to become the standard going forward,” he says. (For its part, OpenAI did not immediately respond to a request to comment for this story.)

 

The decision to not disclose anything about how the GPT-4 model is trained, nor even the size of the model or how it’s built, is “safety through obscurity,” says Guadamuz. “Openness makes you vulnerable.”

But openness is also important when talking about a transformative technology. GPT-4’s predecessors, ChatGPT and GPT-3, have already been layered into a number of different applications. Within hours of OpenAI’s announcement, GPT-4 was already being used in humanitarian applications, and as an AI customer service bot run by software company Intercom. The likely ubiquity of the GPT-4 technology makes it more important than ever that the general population is able to keep track of the unique foibles inherent in any machine learning model, critics say. Without knowing how it functions, it’s harder to know how it’s malfunctioning, which will matter all the more as AI becomes intertwined in our daily lives.

Beyond just OpenAI, Flick, the De Montfort University researcher, is concerned that the broader push to develop newer, more powerful AI models has created an arms race that isn’t conducive to careful development that accounts for the technology’s societal impact.

“The space is competitive, and they want to keep pushing out untested prototypes to keep ahead of the others, promising future fixes for key noncore technical aspects, such as transparency,” she says. “In practice, these are likely to not actually be properly worked through because the focus will shift to firefighting this version and pushing out the next version, and will instead just be hand waved over or pushed to a back burner.”

Then there’s the issue of how OpenAI is hyping its own creation, and what the company is implicitly suggesting through its assessment of the latest chatbot. “For me, the striking thing in this new release was how much they foregrounded this performance on the bar exam and LSAT,” says Nick Seaver, the director of the Science, Technology & Society program at Tufts University. “It’s starting to suggest the thing that makes lawyers lawyers and humans humans is their ability to pass these tests. But these things are already bad for evaluating humans.”

Fast Company
