Detecting Offensive Content In The Era Of AI

This piece was taken from The Technocrat, MIT Technology Review's weekly newsletter about Silicon Valley politics, power, and technology. Sign up here to get it in your inbox every Friday.

Big Tech has improved significantly over the past decade in a number of areas, including language, prediction, personalization, archiving, text parsing, and data processing. But it still does a shockingly poor job of identifying, labeling, and removing harmful content. To grasp the real-world damage this causes, you only need to think back to the recent surge of conspiracy theories in the US about elections and vaccines.

And that gap raises some questions. Why hasn't content moderation at tech companies improved? Can they be compelled to improve it? And will new advances in AI make us better at detecting misinformation?

When they've been hauled before Congress to answer for spreading hate and misinformation, tech companies typically blame the inherent difficulty of language for their shortcomings. Executives at Meta, Twitter, and Google say it's hard to interpret context-dependent hate speech at scale and across many languages, and Mark Zuckerberg often repeats the line that tech companies shouldn't be responsible for solving all the world's political problems.

Today, most companies use a combination of technology and human content moderators, whose labor is undervalued, as their meager pay reflects.

For example, AI today detects 97% of the content that Facebook removes from its platform.

But AI isn't especially good at interpreting nuance and context, says Renee DiResta, research manager at the Stanford Internet Observatory, so it can't fully replace human content moderators, who aren't always great at interpreting those things either.

Cultural context and language pose difficulties as well, because most automated content moderation systems were built with English-language data and perform poorly in other languages.

Hany Farid, a professor at the University of California, Berkeley's School of Information, offers a more straightforward explanation: content moderation hasn't kept up with the threats because doing so isn't in the tech companies' financial interest. "Greed is at the heart of this. Stop trying to make this about something other than money."

And because there is no government regulation in the US, it's very hard for victims of online abuse to pursue financial compensation from the platforms.

Content moderation can look like an endless battle between tech companies and bad actors. The companies publish rules for what's allowed; bad actors get around them by posting with emojis or deliberately misspelling words; the companies try to close those gaps; the offenders find new ones; and so on.
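To see why this battle is so lopsided, here is a deliberately naive sketch of an exact-match keyword filter (not any platform's actual system): a single misspelling or character swap is enough to slip past it.

```python
# Toy illustration only: an exact-match blocklist is trivially evaded
# by misspellings and character substitutions.
BLOCKLIST = {"scam", "fraud"}

def naive_filter(post: str) -> bool:
    """Flag a post if any word matches the blocklist exactly."""
    words = post.lower().split()
    return any(word.strip(".,!?") in BLOCKLIST for word in words)

print(naive_filter("This giveaway is a scam"))   # True  -- exact match caught
print(naive_filter("This giveaway is a sc4m"))   # False -- misspelling slips through
print(naive_filter("This giveaway is a s©am"))   # False -- character swap slips through
```
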
Enter large language models

That's hard enough as it is. But with the rise of generative AI and large language models like ChatGPT, it's about to get much harder. The technology has its problems, such as a tendency to confidently invent facts and present them as truth, but one thing is undeniable: AI is getting dramatically better at language.

So what does that mean for content moderation?

DiResta and Farid both say it's too early to tell how things will shake out, but both seem wary. Although many of the bigger systems, such as GPT-4 and Bard, come with built-in content moderation filters, they can still be manipulated into producing undesirable output, such as hate speech or instructions for building bombs.

Generative AI could allow bad actors to run disinformation campaigns at much greater speed and scale. That's an alarming prospect, especially given how inadequate the current methods for identifying and labeling AI-generated content are.

On the other hand, modern large language models are far better at interpreting text than earlier AI systems, so in theory they could be used to bolster automated content moderation.

To make that work, though, tech companies would need to invest in retooling large language models for that specific purpose. And while some companies, including Microsoft, have started researching it, there hasn't been much notable progress.
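As a rough sketch of the general idea, and not anything those companies have actually built, one could wrap an off-the-shelf language model as a zero-shot moderation classifier; the model, labels, and threshold below are illustrative assumptions.

```python
# Rough sketch of language-model-assisted moderation, not any company's
# real pipeline. The model, labels, and threshold are assumptions.
from transformers import pipeline  # pip install transformers

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

LABELS = ["hate speech", "misinformation", "spam", "benign"]

def moderate(post: str, threshold: float = 0.7) -> dict:
    """Score a post against moderation-style labels and decide whether to escalate."""
    result = classifier(post, candidate_labels=LABELS)
    top_label, top_score = result["labels"][0], result["scores"][0]
    return {
        "label": top_label,
        "score": round(top_score, 3),
        "needs_human_review": top_label != "benign" and top_score >= threshold,
    }

print(moderate("Drinking bleach cures the flu."))
```

In practice, context, sarcasm, and non-English text would trip up a setup this simple, which is exactly the gap the researchers describe.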

Despite the many technological advances that should make content moderation easier, Farid is skeptical that it will actually improve.

Large language models still struggle with context, so it's likely they won't be able to parse the nuances of posts and photos as well as human moderators can. Scalability and specificity across different cultures raise questions too. Do you deploy one model for every kind of niche? Do you do it by country? By community? It's not a one-size-fits-all problem, says DiResta.
New tech tools

Whether generative AI ends up doing more harm or good to the online information sphere may largely depend on whether tech companies can build reliable, widely adopted tools that tell us whether content is AI-generated or not.

That's a real technical challenge, and DiResta says the detection of synthetic media is likely to be a top priority. That includes techniques like digital watermarking, which embeds a small piece of code in a file as a kind of permanent marker indicating that the attached piece of content was made with artificial intelligence. Automated tools for detecting posts generated or manipulated by AI are appealing because, unlike watermarking, they don't require the creator of the AI-generated content to proactively label it as such. That said, the current tools that try to do this haven't been particularly good at identifying machine-made content.
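To make the watermarking idea concrete, here is a deliberately simplified toy, nothing like production schemes (which tend to be statistical or embedded at the pixel or token level): an invisible marker is appended to generated text and checked for later, and it is trivially easy to strip.

```python
# Toy watermark sketch, not a real scheme: hide a zero-width marker in
# generated text and check for it later. Easy to strip, which is the catch.
ZW_MARK = "\u200b\u200c\u200b"  # arbitrary invisible "AI-generated" tag

def watermark(text: str) -> str:
    """Append an invisible marker to AI-generated text."""
    return text + ZW_MARK

def is_watermarked(text: str) -> bool:
    """Check whether the invisible marker is still present."""
    return text.endswith(ZW_MARK)

tagged = watermark("Here is a model-written paragraph.")
print(is_watermarked(tagged))                        # True
print(is_watermarked(tagged.replace("\u200b", "")))  # False -- marker stripped
```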

Watermarking, for its part, relies on voluntary disclosure. Some companies have even proposed cryptographic signatures that use math to securely log information such as how a piece of content originated.
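Here is a minimal sketch of how such a provenance signature could work, with assumed field names and metadata rather than any real standard's format: the creator signs a hash of the content plus its origin information, and anyone holding the public key can verify that record later.

```python
# Minimal provenance-signature sketch with assumed fields; not any real
# standard's format. Requires: pip install cryptography
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

content = b"A generated image or article, as raw bytes."
record = json.dumps({
    "sha256": hashlib.sha256(content).hexdigest(),
    "origin": "generated by model X on 2023-05-12",  # illustrative metadata
}).encode()

signature = private_key.sign(record)

# verify() raises InvalidSignature if the record or signature was tampered with.
public_key.verify(signature, record)
print("provenance record verified")
```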

The latest version of the European Union's AI Act, proposed just this week, requires companies that use generative AI to inform users when content is indeed machine-generated. We're likely to hear much more about these kinds of emerging solutions in the coming months as demand for transparency around AI-generated content grows.
Other things I’m reading

The EU may soon ban facial recognition in public spaces and predictive policing algorithms. Such a ban would be a major victory for the movement against facial recognition, which has lost momentum in the US lately.
OpenAI CEO Sam Altman will testify before the US Congress on Tuesday as part of a hearing on AI oversight, following a bipartisan dinner the night before. My expectations aren't high, but I'm curious to see how well US lawmakers understand artificial intelligence and whether anything concrete comes out of the hearing.
Chinese police detained a man last weekend for using ChatGPT to spread fake news. China banned ChatGPT in February as part of a set of stricter rules governing generative AI, and this appears to be the first arrest to follow.

What I discovered last week

Misinformation is a serious problem for society, but fewer people may be exposed to it than you'd think. Researchers at the Oxford Internet Institute examined more than 200,000 Telegram posts and found that although misinformation crops up on the platform frequently, most users don't appear to go on to share it.

In their study, the authors conclude: "Contrary to popular received wisdom, the audience for misinformation is not a general one, but a small and active community of users." Telegram is largely unmoderated, but the findings suggest there may be something of an organic, demand-driven effect that limits the spread of misinformation.
