Cisco Introduces AI Provenance Tool to Enhance Security and Compliance

As artificial intelligence (AI) models become embedded in enterprise applications, their origin remains a persistent cybersecurity concern. Tracing a model’s lineage has become increasingly important, particularly as organizations work to comply with regulations. To address this need, Cisco has released the Model Provenance Kit, an open-source tool designed to trace the lineage of AI models. The initiative aims to bring transparency to AI supply chains and to help organizations meet compliance requirements.

The Problem of Opaque AI Supply Chains

AI development relies heavily on open-source repositories such as Hugging Face, which currently hosts more than two million models. Developers often download and modify these models without keeping precise records of their changes. This lack of documentation presents considerable risks for organizations deploying generative AI tools.

When an enterprise deploys a model that contains corrupted data or hidden vulnerabilities, these issues can propagate unnoticed into versions derived from it. Tracing the source of a problem is therefore essential for incident management and for identifying the root cause of unexpected model behavior. Compounding this challenge is regulatory pressure from frameworks such as the EU AI Act, which demand detailed documentation of training datasets and system components throughout the supply chain.

To address these visibility gaps, Cisco developed the Model Provenance Kit, which it likens to a DNA test for AI models. Documentation alone is not a reliable guide, because metadata can be falsified or manipulated before a model’s public release. Moreover, because many contemporary models share identical architectures, it is often impossible to determine their origins from configuration files alone.

Innovative Functionality of the Model Provenance Kit

Cisco’s Model Provenance Kit addresses these limitations by examining both a model’s metadata and its learned parameters. The tool uses a two-stage verification process to determine a model’s origins.

In the first stage, a rapid architectural screening compares model configurations and structural metadata. If the metadata is ambiguous, the tool moves to the second stage and analyzes the learned weights directly. In this deeper analysis, the toolkit extracts five complementary signals from the model’s weights, which together form a fingerprint for the model:

Embedding anchor similarity assesses the geometric relationships between tokens to identify structure that survives fine-tuning.
Embedding norm distribution examines word frequency patterns established during the model’s original training.
Norm layer fingerprints focus on small normalization layers that remain stable across modifications.
Layer energy profiles evaluate normalized energy distributions at multiple depths of the neural network.
Weight-value cosine metrics compare weight values directly across layers.
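
To make the last signal concrete, the following is a minimal sketch of a layer-by-layer cosine comparison. It is not the toolkit’s implementation: the function names, the dictionary layout of layer names to flattened weights, and the toy values are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # undefined for zero vectors; treated as "no similarity" here
    return dot / (norm_u * norm_v)


def mean_layer_cosine(layers_a, layers_b):
    """Average cosine similarity over layers both models share by name and size."""
    shared = [
        name for name in layers_a
        if name in layers_b and len(layers_a[name]) == len(layers_b[name])
    ]
    if not shared:
        return 0.0  # nothing comparable
    total = sum(cosine_similarity(layers_a[n], layers_b[n]) for n in shared)
    return total / len(shared)


# Toy data (hypothetical): a copy of a base model with slightly perturbed weights.
base = {"layer0": [0.5, -1.0, 2.0], "layer1": [1.0, 0.0, -0.5]}
tuned = {"layer0": [0.52, -0.98, 2.01], "layer1": [1.0, 0.01, -0.49]}
print(mean_layer_cosine(base, tuned))
```

A value close to 1 means the weights point in nearly the same direction, which is what one would expect when a model has been only lightly modified from its base.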

Cisco tested the Model Provenance Kit on a benchmark of 111 pairs of similar and dissimilar models. The evaluation included challenging real-world cases such as aggressive distillation, same-tokenizer traps, and cross-organizational fine-tuning. By combining the extracted signals into a cumulative provenance score, the toolkit determines whether the two models in a pair share a common lineage.
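
The two-stage flow and the cumulative score can be sketched as follows. The article does not say how the signals are weighted or where the decision threshold lies, so the equal weights, the 0.5 threshold, the stage-one verdict strings, and every function name below are assumptions rather than the toolkit’s actual behavior.

```python
# Signal names follow the article; the equal weights are an illustrative assumption.
SIGNAL_WEIGHTS = {
    "embedding_anchor_similarity": 0.2,
    "embedding_norm_distribution": 0.2,
    "norm_layer_fingerprint": 0.2,
    "layer_energy_profile": 0.2,
    "weight_value_cosine": 0.2,
}
THRESHOLD = 0.5  # hypothetical cut-off for "shares a lineage"


def provenance_score(signals):
    """Weighted average of per-signal similarities, each expected in [0, 1]."""
    total_weight = sum(SIGNAL_WEIGHTS[name] for name in signals)
    if total_weight == 0:
        raise ValueError("at least one signal is required")
    weighted = sum(SIGNAL_WEIGHTS[name] * value for name, value in signals.items())
    return weighted / total_weight


def compare(stage_one_verdict, extract_signals):
    """Run stage two only when the metadata screening is ambiguous.

    stage_one_verdict: "related", "unrelated", or "ambiguous" (hypothetical labels).
    extract_signals: zero-argument callable returning {signal_name: similarity}.
    """
    if stage_one_verdict in ("related", "unrelated"):
        # Metadata was conclusive, so the weight analysis is skipped.
        return {"stage": 1, "related": stage_one_verdict == "related", "score": None}
    score = provenance_score(extract_signals())
    return {"stage": 2, "related": score >= THRESHOLD, "score": score}


# Toy usage with made-up signal values.
toy_signals = {name: 0.9 for name in SIGNAL_WEIGHTS}
print(compare("ambiguous", lambda: toy_signals))
```

Passing the signal extraction as a callable keeps the weight analysis lazy, so it runs only when the metadata screening cannot settle the question.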

Impressive Performance Metrics

The tool achieved 100% recall for standard derivatives, including fine-tuned, quantized, and aligned models. It maintained the same recall for cross-organization derivatives, where models may be renamed or republished. It also achieved 100% specificity on same-tokenizer traps, in which independent models share a tokenizer, and it correctly classified independent reproductions as unrelated.

In total, the toolkit correctly classified 107 of 111 model pairs, an overall accuracy of roughly 96.4%. These results support Cisco’s aim of improving transparency in AI supply chains and establishing a foundation for responsible AI governance.
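
For readers less familiar with these terms, the sketch below shows how recall, specificity, and accuracy are conventionally computed. Only the 107 and 111 figures come from the article; the per-category counts behind the reported 100% rates are not given, so none are assumed here.

```python
def recall(true_positives, false_negatives):
    """Share of truly related pairs that were flagged as related."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of truly unrelated pairs that were flagged as unrelated."""
    return true_negatives / (true_negatives + false_positives)


def accuracy(correct, total):
    """Share of all pairs classified correctly."""
    return correct / total


# Figures reported in the article: 107 correct out of 111 pairs.
overall = accuracy(107, 111)
print(f"{overall:.1%}")  # 96.4%
```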

User-Friendly and Accessible

Built as a Python toolkit, the Model Provenance Kit provides a command-line interface that runs on standard CPUs without specialized hardware. In comparison mode, users analyze two models side by side and obtain a detailed similarity breakdown. In scan mode, users match a single model against a database to identify the closest lineage candidates.
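
Conceptually, scan mode amounts to ranking known fingerprints by their similarity to the model in question. The sketch below illustrates that idea only. It does not reproduce the toolkit’s command-line interface or fingerprint format, and the function names, model names, and toy similarity measure are hypothetical.

```python
def scan(query, database, similarity, top_k=3):
    """Return the top_k (name, score) pairs most similar to the query fingerprint."""
    scored = [(name, similarity(query, fp)) for name, fp in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest score first
    return scored[:top_k]


def toy_similarity(a, b):
    """Illustrative stand-in: 1 for identical vectors, approaching 0 as they diverge."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))


# Hypothetical fingerprint database keyed by made-up base-model names.
database = {
    "base-a": [0.1, 0.9, 0.3],
    "base-b": [0.8, 0.2, 0.5],
    "base-c": [0.4, 0.4, 0.4],
}
query = [0.12, 0.88, 0.31]

for name, score in scan(query, database, toy_similarity, top_k=2):
    print(name, round(score, 3))
```

Comparison mode corresponds to calling the similarity function on a single pair, while scan mode repeats it across the whole database and keeps the best matches.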

The Model Provenance Kit is available on GitHub and initially ships with a fingerprint dataset of 150 base models hosted on Hugging Face. Organizations can use the tool to improve the transparency and accountability of their AI systems.

Conclusion

As demand for AI technologies grows, so does the need for transparent tracking and validation of models. Cisco’s Model Provenance Kit gives organizations a practical way to verify where their models come from. With regulatory pressure mounting, model provenance is becoming a core part of AI governance, and tools like this one are likely to be useful to organizations working through those requirements.
