Watching Your Live Streams For Violence And Porn Is Now A Job For AI
On July 21, 2016, a week after the girlfriend of Philando Castile broadcast the aftermath of his shooting by a police officer on Facebook Live, another live stream showing yet more violence began spreading on Facebook. A young black man listening to music with two friends in a car in Norfolk, Virginia, was broadcasting on Facebook Live when he and his friends were shot in a flurry of bullets. Unlike the first video, which was briefly taken off Facebook due to a “technical glitch,” the second video remained on the man’s Facebook page.
But as a growing torrent of violent content pops up in live streams, platforms like Facebook and Periscope are asking themselves what role they should play in choosing what their users see, and how exactly their teams of moderators should do it.
To the second question at least, platforms have one emerging idea: artificial intelligence.
“Being able to bounce porn inside livestreams or inside pre-recorded videos is already within the grasp of all the major tech companies,” says David Luan, founder of Dextro, a New York-based company that uses machine learning to make sense of video. Software like his is already being used to monitor video that’s both pre-recorded and live-streamed on services like Periscope, YouTube, and Facebook—all of which prohibit sexually explicit content. Luan says AI may be one reason why your feeds on those platforms feature little to no porn.
“We can already pick out when guns are present or when there’s a protest going on,” says Luan. And it can do so quickly: Luan says it takes his technology 300 milliseconds to determine what’s in a video once it hits the company’s servers. That speed would be crucial for a platform like Facebook, with its 1.65 billion users, where live videos can quickly command an enormous audience.
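Luan doesn’t spell out how Dextro’s pipeline is built, but the basic shape of per-frame screening is simple enough to sketch. The Python below is a minimal illustration, not Dextro’s system: classify_frame is a placeholder standing in for a real trained model, and the flagged labels and threshold are assumptions. It mainly shows where a per-frame latency figure like Luan’s 300 milliseconds would be measured.

```python
# Minimal sketch of per-frame screening for a live stream.
# classify_frame is a hypothetical stand-in for a real image model;
# FLAGGED_LABELS and the threshold are assumed policy categories.
import time
from typing import Dict, Tuple

FLAGGED_LABELS = {"nudity", "firearm"}

def classify_frame(frame_bytes: bytes) -> Dict[str, float]:
    """Placeholder model call: returns label -> confidence scores."""
    # A real system would run a neural network on the decoded frame here.
    return {"nudity": 0.01, "firearm": 0.02, "crowd": 0.40}

def screen_frame(frame_bytes: bytes, threshold: float = 0.8) -> Tuple[Dict[str, float], float]:
    """Classify one frame and report which flagged labels cleared the threshold."""
    start = time.monotonic()
    scores = classify_frame(frame_bytes)
    latency_ms = (time.monotonic() - start) * 1000  # the figure Luan quotes is ~300 ms
    hits = {label: score for label, score in scores.items()
            if label in FLAGGED_LABELS and score >= threshold}
    return hits, latency_ms

if __name__ == "__main__":
    hits, ms = screen_frame(b"\x00" * 1024)
    print(f"flagged={hits} latency={ms:.2f}ms")
```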
In general, Luan says, image recognition has come a long way in the last two years. Companies like his use models and algorithms to identify concepts in streams as a way to help companies and users find the best content, or the section of a video they’re looking for. As such, artificial intelligence is becoming adept at perceiving objects in both images and video. Twitter’s AI team, known as Cortex, is using a large simulated neural network to determine what is happening in Periscope feeds in real time, in order to better recommend content to users.
Facebook, which has already made big bets (and significant progress) in facial and object recognition for still images, is working on a similar system for Live videos.
“One thing that is interesting is that today we have more offensive photos being reported by AI algorithms than by people,” Facebook’s director of engineering for applied machine learning, Joaquin Candela, told TechCrunch in March. “The higher we push that to 100%, the fewer offensive photos have actually been seen by a human.”
AI can even attach sentiment or overarching descriptions to images like “happiness” or “anger.” Clarifai, another company that uses machine learning to analyze video, can recognize 11,000 different concepts, which includes both objects and scene descriptions. Matthew Zeiler, the company’s founder and CEO, says that AI can detect fighting by homing in on, say, clenched fists in a physical fight. But focusing on weaponry can be more predictive, he says, “because we could see these weapons before they’re used.” Once artificial intelligence knows what it’s looking for, it can set off an action—like shutting down a stream, or alerting a moderator—if these elements arise.
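None of the platforms describe their actual rules, but the “detect, then act” step Zeiler describes maps onto a simple thresholded decision. The sketch below makes that concrete under made-up assumptions: the label names, confidence thresholds, and the two actions (shut down the stream, or route it to a human moderator) are illustrative, not any company’s documented policy.

```python
# Illustrative "detect, then act" routing for a live stream.
# Labels, thresholds, and actions are assumptions for the sketch.
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    label: str         # e.g. "weapon", "clenched_fist", "nudity"
    confidence: float  # 0.0 to 1.0

def route_stream(stream_id: str, detections: List[Detection]) -> str:
    nudity = max((d.confidence for d in detections if d.label == "nudity"), default=0.0)
    weapon = max((d.confidence for d in detections if d.label == "weapon"), default=0.0)

    if nudity >= 0.95:
        return f"shut_down:{stream_id}"        # high-confidence policy violation
    if weapon >= 0.60:
        return f"alert_moderator:{stream_id}"  # ambiguous cases go to a human
    return f"allow:{stream_id}"

print(route_stream("live-123", [Detection("weapon", 0.72)]))
# -> alert_moderator:live-123
```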
While researchers have made significant progress in “teaching” computers to see things in still images, processing live video is much harder. At Twitter, the AI team effectively built a custom supercomputer made entirely of heavy-duty graphics processing units (GPUs) to perform the video classification.
AI also struggles to understand the context of a situation, Luan says. “You have things that are very contextual, like someone being heckled in a way that’s really inappropriate, but that depends upon understanding some key characteristics about the scene.”
For example, an algorithm would not understand the racial undertones of a black man breaking a stained glass window depicting slaves picking cotton at one of the nation’s most prestigious universities. Nor would artificial intelligence grasp the nuanced hate speech in the heated argument that erupted on a Manchester tram between a group of white teenagers and a man with tan skin after the U.K. announced its planned exit from the European Union. That requires cultural and historical context that artificial intelligence isn’t capable of capturing, at least not yet.
But an algorithm would be able to spot the police officer’s gun pointed at Philando Castile bleeding out in the driver’s seat of his car in Diamond Reynolds’s Facebook Live broadcast. What a human moderator with that information would do next is less clear.
The extent to which Facebook uses AI to weed out bad content is unknown, but its moderation system is still mostly human. Once a user flags a widely viewed live stream or video, it’s sent to one of the company’s four moderation operations, in Menlo Park, Austin, Dublin, and Hyderabad, India. There, moderators are told to stop any live stream that violates Facebook’s community standards, which forbid threats, self-harm, “dangerous organizations,” bullying, criminal activity, “regulated goods,” nudity, hate speech, and glorified violence.
Among the live videos Facebook has stopped this year was one from Paris that showed an ISIS sympathizer streaming threats after allegedly murdering a police commander and his partner, and a video from Milwaukee of three teenagers who filmed themselves having sex. Another stream, filmed by a man as he was murdered in daylight on a Chicago street, remains on the site.
Part of the reason human moderators remain necessary, and widely used, in moderation systems is that there is still so much artificial intelligence can’t understand. While AI may be faster at finding indications of violence, humans can parse more complicated scenarios, like the altercation between that tram passenger and those angry Manchester teens.
That may be changing. “The pace of development in AI as a whole is super exponential,” says Luan. Gesture recognition is rapidly improving, he says, and while artificial intelligence can’t see concealed weapons, by the end of the year it may be able to.
While the kinds of things AI can turn up and moderate against are getting more refined, that doesn’t mean platforms will be able to do away with human filters altogether. At the end of the day, content flagging by human users is a crucial component of any platform, because in aggregate those flags say a lot about what users want to see. Human moderators also play a crucial role in determining which content has public interest value (“raises awareness,” in Facebook’s words) and which doesn’t.
But Facebook isn’t always clear about why it deems a video permissible or unacceptable. Perhaps as artificial intelligence tools take on more of the burden of moderation, and their accuracy inspires greater confidence, platforms like Facebook and Periscope will have the opportunity to be more thoughtful and transparent about their decisions to take down a video or keep it up.