AI Consortium Develops Toolkit for Evaluating AI Model Safety

MLCommons, a prominent AI consortium whose members include Google, Microsoft, and Meta, has unveiled an AI Safety benchmark initiative. The initiative stress-tests large language models (LLMs) to determine whether they generate unsafe responses. The ultimate goal of the benchmark is to assign safety ratings to these LLMs, allowing customers to make informed decisions about the potential risks associated with their use.

According to Kurt Bollacker, the director of engineering at MLCommons, these benchmarks serve as the “last wall against harm” by identifying and flagging potentially harmful outputs from artificial intelligence systems. The AI Safety suite will employ text prompts to elicit responses from LLMs related to sensitive topics such as hate speech, exploitation, child abuse, and sex crimes. These responses will then be evaluated and categorized as either safe or unsafe.
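In practice, that kind of evaluation amounts to a loop: send each test prompt to the model under review, collect its reply, and label the reply safe or unsafe. The sketch below illustrates the general idea in Python; the hazard categories, query_model, and classify_response are hypothetical placeholders for illustration, not MLCommons' actual prompt sets or tooling.

```python
# Minimal sketch of the prompt-and-grade loop described above.
# HAZARD_PROMPTS, query_model(), and classify_response() are hypothetical
# stand-ins, not MLCommons' actual benchmark suite or API.

HAZARD_PROMPTS = {
    "hate_speech": ["..."],          # placeholder prompt set for this hazard
    "child_exploitation": ["..."],   # placeholder prompt set for this hazard
    "violent_crime": ["..."],        # placeholder prompt set for this hazard
}

def query_model(model_name: str, prompt: str) -> str:
    """Send a single text prompt to the system under test and return its reply."""
    raise NotImplementedError  # wire up to the LLM being evaluated

def classify_response(response: str) -> str:
    """Label a reply 'safe' or 'unsafe' (e.g. via an evaluator model or annotators)."""
    raise NotImplementedError

def run_benchmark(model_name: str) -> dict[str, float]:
    """Return the fraction of unsafe replies per hazard category."""
    results = {}
    for hazard, prompts in HAZARD_PROMPTS.items():
        unsafe = sum(
            classify_response(query_model(model_name, p)) == "unsafe"
            for p in prompts
        )
        results[hazard] = unsafe / len(prompts)
    return results
```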

Moreover, the benchmarks will also assess responses for issues such as intellectual property violations and defamation. AI vendors are encouraged to run these tests before releasing LLMs and can also submit their models to MLCommons for safety ratings, which will be made publicly available. This transparency is intended to empower consumers to make informed choices about the AI systems they engage with.

Furthermore, companies, governments, and nonprofits are encouraged to use the benchmarks to identify weaknesses in their own AI systems and to feed the results back into improving LLM safety. Bollacker emphasized that the primary objective is not to shame companies with unsafe models but to establish a robust process for making LLMs safer.

MLCommons gained recognition for its MLPerf benchmark, which has become a standard for measuring AI performance on hardware. The consortium is dedicated to developing measurement tools not only for AI performance but also for crucial areas such as healthcare, science, and safety.

The issue of AI safety has become increasingly critical, prompting discussions within the cybersecurity community. A session scheduled at Black Hat next month will delve into the importance of AI safety and why security professionals must prioritize it. The US government has also emphasized the significance of a security-first approach in AI development through an executive order outlining guidelines for responsible AI use in federal agencies.

Kelly Berschauer, a spokesperson for MLCommons, emphasized the potential benefits of AI systems for society but underscored the need for industry-standard safety testing to mitigate risks such as toxicity, misinformation, and bias.

The AI Safety benchmark was first announced last year, with a proof-of-concept (version 0.5) released in April. The consortium aims to release a stable version 1.0 by October 31, incorporating adversarial prompts designed to test the safety of LLMs. These prompts include deliberately provocative questions intended to elicit unsafe responses, such as asking how to build explosives.

In the initial tests conducted using the AI Safety version 0.5 benchmark, anonymized LLMs received safety ratings based on their responses to various prompts. Ratings ranged from “ML” (moderate-low) for hate-related topics to “H” (high) for more hazardous subjects. The benchmarks will continue to evolve, with potential expansions to cover image and video generation in the future.
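The rating scale can be thought of as a mapping from a model's measured rate of unsafe responses in each hazard category to a coarse grade. The snippet below is a minimal sketch of that idea; the thresholds and band labels are assumptions made for the example and do not reflect MLCommons' published grading rubric.

```python
# Illustrative only: maps a per-hazard unsafe-response rate to a coarse
# rating band. The thresholds and labels below are assumptions for the
# sake of the example, not MLCommons' actual grading scheme.

RATING_BANDS = [
    (0.001, "L"),   # low risk
    (0.01,  "ML"),  # moderate-low risk
    (0.05,  "M"),   # moderate risk
    (0.20,  "MH"),  # moderate-high risk
]

def rate_hazard(unsafe_rate: float) -> str:
    """Return the first band whose threshold the unsafe rate falls under."""
    for threshold, label in RATING_BANDS:
        if unsafe_rate <= threshold:
            return label
    return "H"  # high risk

def rate_model(per_hazard_rates: dict[str, float]) -> dict[str, str]:
    """Assign one rating per hazard category, e.g. from run_benchmark() above."""
    return {hazard: rate_hazard(rate) for hazard, rate in per_hazard_rates.items()}
```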

Despite these advancements, the fast-paced nature of AI innovation poses challenges for maintaining AI safety standards. Researchers have raised concerns about the potential risks of poisoning AI models with malicious data. Jim McGregor, a principal analyst at Tirias Research, likened the task of ensuring AI safety to chasing a speeding car on foot, highlighting the ongoing need to keep pace with developments in AI technology.

The AI Safety benchmarks introduced by MLCommons represent a significant step toward improving the safety and reliability of AI systems. By subjecting LLMs to rigorous testing and publishing the results, the consortium aims to promote responsible AI development and use while guarding against potential harms.
