OpenAI Releases an Improved Version of Its Content Moderation Tool, ‘The Moderation Endpoint,’ and It’s Free to OpenAI API Developers

The success of natural language classification systems for real-world content moderation depends on a series of carefully planned and executed steps. These include designing content taxonomies and labeling guidelines, controlling data quality, running an active learning pipeline to capture rare events, and applying various techniques to make the model robust and avoid overfitting. To help developers protect their applications against possible misuse, OpenAI has launched the Moderation endpoint, which is built on this holistic approach to developing a robust and useful natural language classification system. This improved endpoint is faster and more accurate, and it gives OpenAI API developers free access to GPT-based classifiers that detect undesired content.

The moderation system has been trained to detect several categories of undesired content, including sexual content, hate speech, violence, self-harm, and harassment. The approach generalizes to a wide range of content taxonomies and can be used to build high-quality content classifiers that outperform commercial models. The technical paper describing the team’s methodology and the evaluation dataset have also been released publicly. The endpoint has been trained to be fast, accurate, and robust across a range of applications. Crucially, this reduces the chance of products “saying” the wrong thing, even when they are deployed to users at scale. As a result, AI can deliver benefits in sensitive settings, such as education, where it would otherwise be difficult to deploy. The Moderation endpoint also lets developers benefit from OpenAI’s infrastructure investments. As the documentation notes, developers can access accurate classifiers through a single API call rather than building and maintaining their own, which is often a time-consuming process.
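To illustrate the "single API request" mentioned above, here is a minimal sketch in Python using only the standard library. It assumes the publicly documented REST route `https://api.openai.com/v1/moderations`, a JSON body with an `input` field, and a response containing a `results` list whose entries carry `flagged` and `categories`. The helper names (`build_moderation_request`, `summarize_result`, `moderate`) are hypothetical and chosen for this example only. Treat it as an illustration rather than the official client.

```python
import json
import urllib.request

# Assumed endpoint URL, taken from OpenAI's public API documentation.
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text, api_key):
    """Build the HTTP POST request that submits one piece of text."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def summarize_result(response_json):
    """Return (flagged, sorted names of flagged categories) for the first result."""
    result = response_json["results"][0]
    flagged_categories = sorted(
        name for name, hit in result["categories"].items() if hit
    )
    return result["flagged"], flagged_categories


def moderate(text, api_key):
    """Send the text to the endpoint and summarize the classifier verdict."""
    request = build_moderation_request(text, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize_result(json.load(response))
```

Calling `moderate(some_text, api_key)` with a valid API key performs the network request. The two helper functions can be exercised offline, which keeps the request construction and response parsing easy to test separately from the call itself.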

By making moderation of all OpenAI API-generated content free, the endpoint supports OpenAI’s mission to make the AI ecosystem safer. For instance, Inworld, an OpenAI API customer, uses the Moderation endpoint to help its AI-based virtual characters “stay on-script.” This lets Inworld focus on its core product, which is creating memorable characters with OpenAI’s technology. The team also welcomes use of the endpoint to moderate content that was not generated with the OpenAI API. In one such case, NGL, an anonymous messaging platform with a focus on safety, uses the Moderation endpoint to detect bullying and hateful language on its platform. NGL finds that the classifiers generalize to the latest slang, which allows the company to remain confident in them over time. Use of the Moderation endpoint to monitor non-API traffic is in private beta and is subject to a fee. The technical paper describes the training process and model performance in detail. The team hopes that releasing the evaluation dataset, which includes Common Crawl data, will encourage further research in this area.
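A platform that screens user messages, as in the NGL example, might turn a moderation result into an action. The sketch below assumes each result is a dictionary with a boolean `flagged` field and a `category_scores` mapping of category name to a score between 0 and 1, as in OpenAI's public API documentation. The function name `route_message` and the thresholds `block_at` and `review_at` are hypothetical, chosen only for illustration. Neither the article nor the paper prescribes this policy.

```python
def route_message(result, block_at=0.9, review_at=0.5):
    """Map one moderation result to ("block" | "review" | "allow", top category).

    The thresholds are illustrative assumptions, not recommended values.
    """
    scores = result.get("category_scores", {})
    # Pick the category with the highest score; fall back to (None, 0.0)
    # when the result carries no scores at all.
    top_category, top_score = max(
        scores.items(), key=lambda item: item[1], default=(None, 0.0)
    )
    # Trust the endpoint's own verdict, or block on a very high score.
    if result.get("flagged") or top_score >= block_at:
        return "block", top_category
    # Borderline scores go to a human reviewer instead of being dropped.
    if top_score >= review_at:
        return "review", top_category
    return "allow", top_category
```

Sending borderline messages to human review rather than blocking them outright is one possible design choice. A real deployment would tune thresholds per category against its own labeled traffic.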

This article is a research summary written by Marktechpost staff, based on the research paper 'A Holistic Approach to Undesired Content Detection in the Real World'. All credit for this research goes to the researchers on this project. Check out the paper, docs, GitHub, and reference article.


Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development. She enjoys learning more about the technical field by participating in several challenges.