A Fake OpenAI Model Hit #1 Trending on Hugging Face With Bot-Inflated Likes. The Trending Rank Became a Publisher Attestation It Was Never Built to Be.
A developer browsing Hugging Face on May 7 would have seen a repository called Open-OSS/privacy-filter sitting at the top of the trending list, carrying OpenAI's "Privacy Filter" model card copied word for word. The signal a reasonable engineer reads from that placement is straightforward: this is popular, it is current, and other people are already using it, so it is probably the real thing. Every one of those inferences was manufactured, and the model card's setup instructions told the reader to clone the repo and run python loader.py.
The interesting part of this incident is not the malware, which was competent but ordinary. It is that two different ecosystems looked at the same problem (how does a buyer know a published artifact comes from who it claims to come from) and arrived at opposite answers. The package registries built cryptographic provenance. The model registry built a popularity leaderboard, and an attacker turned that leaderboard into a fake identity badge.
The Model-Layer Attack
HiddenLayer's disclosure lays out the chain in detail. The repository reached the #1 trending position with roughly 244,000 displayed downloads and 667 likes in under 18 hours. That download figure is what the platform displayed, and HiddenLayer concluded it was almost certainly inflated rather than a count of real victims; the true number of compromised machines is unknown. The engagement was the tell: of the 667 accounts that liked the repository, 504 matched a "firstname-lastname###" naming template and 153 matched an "adjectivenoun####" pattern, leaving only about ten that looked organic. The likes and downloads existed to move the repository up the ranking, and the ranking existed to tell the next visitor that this artifact was trustworthy. Engagement metrics stop functioning as a trust signal the moment bots can manufacture them at will, and a model registry's leaderboard turned out to be no exception.
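The naming-template analysis can be approximated in a few lines. The sketch below is an illustration only: the two regular expressions are assumptions that approximate the "firstname-lastname###" and "adjectivenoun####" templates, not HiddenLayer's actual rules, and the sample usernames are invented.

```python
import re
from collections import Counter

# Hypothetical regexes approximating the two templates named in the text.
TEMPLATES = {
    "firstname-lastname###": re.compile(r"[a-z]+-[a-z]+\d{3}"),
    "adjectivenoun####": re.compile(r"[a-z]+\d{4}"),
}


def classify(username):
    """Return the label of the first template the username fully matches."""
    name = username.lower()
    for label, pattern in TEMPLATES.items():
        if pattern.fullmatch(name):
            return label
    return "unmatched"


def summarize(usernames):
    """Count how many usernames fall under each template."""
    return Counter(classify(u) for u in usernames)


# Invented sample accounts, for illustration only.
sample = ["john-smith123", "brightriver4821", "ml_researcher", "mary-jones007"]
counts = summarize(sample)
print(counts)

# The arithmetic behind "about ten" organic-looking accounts.
organic_looking = 667 - 504 - 153
print(organic_looking)
```

A real account can match either pattern by coincidence, so the share of templated names across all likers matters more than any single match.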
What ran when a user followed the instructions was plain Python, not a serialized model. A function named _verify_checksum_integrity() disabled SSL verification, decoded a base64 URL pointing at jsonkeeper.com, extracted a command field, and piped it to a hidden PowerShell process invoked with -ExecutionPolicy Bypass -WindowStyle Hidden. That stage pulled down an update.bat, added Windows Defender exclusions, and created a scheduled task disguised as MicrosoftEdgeUpdateTaskCore. The final payload was a 1.07 MB Rust infostealer that disabled AMSI and ETW, checked for VMs and debuggers, and harvested Chromium and Firefox cookies, Discord tokens, crypto wallets, FileZilla and SSH and VPN credentials, and screenshots. Using jsonkeeper as the command-and-control channel let the attacker rotate the payload without ever touching the repository, and six more malicious repositories were found under an account named "anthfu" sharing the same infrastructure.
The consequence is worth stating plainly because it changes the remediation math: stolen session cookies bypass multi-factor authentication even after a password reset, which is why HiddenLayer advised treating any affected system as fully compromised and reimaging it rather than cleaning it.
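The loader stage described above leaves recognizable traces in source text, which a reviewer can look for before running any script a model card recommends. The following is a minimal heuristic sketch, not a detection product: the pattern names and regexes are assumptions derived from the behaviors listed in the disclosure, and the sample string is an invented fragment that is scanned but never executed.

```python
import re
from typing import List

# Illustrative red flags only; a determined attacker can evade all of them.
RED_FLAGS = {
    "tls-verification-disabled": re.compile(
        r"verify\s*=\s*False|_create_unverified_context|CERT_NONE"
    ),
    "base64-decoding": re.compile(r"b64decode"),
    "powershell-launch": re.compile(r"powershell", re.IGNORECASE),
    "execution-policy-bypass": re.compile(
        r"ExecutionPolicy\W+Bypass", re.IGNORECASE
    ),
    "hidden-window": re.compile(r"WindowStyle\W+Hidden", re.IGNORECASE),
    "process-spawn": re.compile(r"\bsubprocess\b|os\.system"),
}


def scan_source(source: str) -> List[str]:
    """Return the sorted names of every red flag found in the source text."""
    return sorted(name for name, pat in RED_FLAGS.items() if pat.search(source))


# Invented fragment for illustration; it is plain text here, not run.
SAMPLE = '''
resp = requests.get(url, verify=False)
subprocess.Popen(["powershell", "-ExecutionPolicy", "Bypass",
                  "-WindowStyle", "Hidden", "-Command", cmd])
'''

findings = scan_source(SAMPLE)
print(findings)
```

A match is a reason for a human to read the script, not a verdict; a clean result proves nothing, because the real payload in this incident lived off-repository.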
Why Every Model Scanner Was Structurally Irrelevant
For two years the industry has been hardening Hugging Face against malicious models, and the work was real. ReversingLabs documented the "nullifAI" technique in February 2025, which hid a reverse shell inside a pickle stream and used 7z compression to break picklescan. JFrog separately found more than 100 models on the hub capable of arbitrary code execution. Hugging Face's own scanning stack runs ClamAV for malware, picklescan for dangerous pickle imports, and TruffleHog for secrets, and Protect AI reported scanning 4.47 million model versions across 1.41 million repositories and flagging roughly 352,000 unsafe or suspicious issues.
None of that touched this attack, and the reason is categorical rather than a matter of scanner quality. Those defenses all inspect file content for a deserialization exploit: picklescan reads imports through pickletools without executing anything, ClamAV matches known-malware signatures, TruffleHog looks for leaked keys. The Open-OSS/privacy-filter repository contained no pickle exploit and no embedded malware in the model weights. It contained a model card that asked a human to run a script, and a script that fetched its real payload from somewhere else. This was a social-engineering attack wearing a model registry as a costume, and the entire malicious-model defense industry was aimed at a different class of threat. Sakshi Grover of IDC put the structural mismatch directly: "Traditional SCA was designed to inspect dependency manifests, libraries, and container images, not the increasingly complex behaviors associated with AI development workflows."
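To make the category concrete, here is a simplified sketch of the kind of check a pickle scanner performs: walking the opcode stream with Python's standard `pickletools` module and listing the imports it would trigger, without ever unpickling. This is not picklescan's implementation. The `STACK_GLOBAL` handling in particular is a simplifying assumption (it takes the two most recent string arguments and ignores memo lookups), and the demo payload is built but never loaded.

```python
import pickle
import pickletools


def list_pickle_imports(data: bytes) -> set:
    """Return (module, name) pairs a pickle stream references, without loading it."""
    imports = set()
    strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # The argument is "module name" joined by a single space.
            module, name = arg.split(" ", 1)
            imports.add((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Simplification: assume the last two pushed strings are
            # the module and the attribute name.
            if len(strings) >= 2:
                imports.add((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return imports


class Suspicious:
    """Demo object whose pickle would call eval() if it were ever loaded."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


# dumps() only serializes the reference to eval; nothing is executed here.
payload = pickle.dumps(Suspicious(), protocol=4)
found = list_pickle_imports(payload)
print(found)
```

A check like this has nothing to inspect when the malicious logic sits in an ordinary `loader.py` that a human is asked to run, which is the gap this incident walked through.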
The Package Registries Already Solved This
Now set the model layer next to the package ecosystems, because they faced this same trust problem and answered it years ago. The question "did this artifact come from the publisher it claims?" is exactly the question npm and PyPI have spent the better part of a decade closing. npm now ships signed provenance attestations built on sigstore and tied to the build that produced the package, so a consumer can cryptographically verify that a release came from a specific source repository and workflow rather than from an account that merely chose a convincing name. PyPI added trusted publishing and mandatory two-factor authentication for maintainers, which binds a release to a verified identity rather than a reusable token.
A signature is not a guarantee that the contents are safe, and it is worth being honest about its limits: a provenance attestation can be cryptographically valid and still attest to a compromised build, as the TanStack incident showed when SLSA provenance truthfully recorded a hijacked pipeline. But that is an argument for better-scoped attestations, not against having one at all. A trending rank is a weaker signal by an entire category: it measures attention, and attention is cheap to buy, while a sigstore attestation measures provenance, and provenance is expensive to forge.
The model registry has no equivalent enforced control. There is no mandatory publisher verification and no required signing, which is precisely why a popularity ranking could be pressed into service as an identity signal. The attacker did not break a cryptographic control because there was no cryptographic control to break. They gamed the only trust signal the platform offered, and that signal was a leaderboard.
The fix exists and is in motion, which is the part worth being precise about. OpenSSF published Model Signing v1.0, built on sigstore in collaboration with NVIDIA and HiddenLayer. Adoption is uneven: NVIDIA's NGC catalog and Google's Kaggle have moved to support the standard, while Hugging Face support is still in development. This is the provenance control that was not in place for Open-OSS/privacy-filter and would have given a verifier something real to check.
What a Buyer Should Do Before the Registry Catches Up
Waiting for the platform to ship signing is not a plan, because the gap is open right now and the controls a buyer can apply do not depend on the registry. The practitioner move is to stop treating any platform-rendered signal (rank, download count, or like total) as evidence of who published an artifact, and to substitute identity and integrity checks you control.
Three concrete steps:
- Record model provenance in an AI bill of materials. Capture the model identity and its SHA-256 hash at acquisition, the same way you would pin a container image by digest. CISA's 2025 minimum elements for an SBOM, published with the G7, added AI-specific fields including model identity, data provenance, neural architecture, framework, and cryptographic component hashes. IDC projects that by 2027, 60% of enterprises deploying agentic AI will require an AI bill of materials; building the inventory now is cheaper than retrofitting it later.
- Pin by hash and verified publisher, never by name or rank. A repository named Open-OSS copying an OpenAI model card is exactly the failure mode that name-based and rank-based trust produces. Resolve artifacts to a specific digest and a verified publisher identity, and reject anything that resolves only to a popular string.
- Verify signatures before load, where they exist. For artifacts published with OpenSSF Model Signing, validate the sigstore attestation in your ingestion pipeline before the model is ever loaded, and treat an unsigned artifact from an unverified account as untrusted by default rather than trusted until scanned.
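The first two steps can be sketched with nothing beyond the Python standard library. This is a minimal illustration under stated assumptions: the record's field names are hypothetical and do not follow CISA's or any other AI BOM schema, the `publisher_verified` flag stands in for whatever out-of-band identity check your organization performs, and signature validation is deliberately left out rather than guessing at a signing library's API.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are never fully in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bom_entry(path, model_id, publisher, publisher_verified):
    """Record identity and digest at acquisition time (hypothetical fields)."""
    return {
        "model_id": model_id,
        "publisher": publisher,
        "publisher_verified": publisher_verified,
        "sha256": sha256_file(path),
    }


def admit(path, entry):
    """Allow a load only for a verified publisher and a matching digest."""
    if not entry["publisher_verified"]:
        return False
    return sha256_file(path) == entry["sha256"]


# Demo with a throwaway file standing in for model weights.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.safetensors"
    weights.write_bytes(b"example weights")
    entry = make_bom_entry(
        weights, "example-org/example-model", "example-org", True
    )
    ok_before = admit(weights, entry)

    # Any change to the bytes after acquisition fails the pin.
    weights.write_bytes(b"tampered weights")
    ok_after = admit(weights, entry)

print(ok_before, ok_after)
```

Note that nothing in `admit` consults a rank, a download count, or a repository name: the decision depends only on facts the buyer recorded and can recompute.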
This is procurement work as much as engineering work, and it is the same vendor-questionnaire gap that lets self-hosted AI runtimes operate as invisible shadow IT: the contract language that should govern where a model comes from and how its provenance is verified usually does not exist yet. Gartner's Jaishiv Prakash framed the organizational version cleanly: "Enterprises must establish dedicated controls for model sources, approved versions, access, and runtime validation at the registry layer." The controls at that layer are not the same controls that catch a poisoned pickle, and an organization that has invested only in model scanning has covered one class of attack and left this one open.
The lesson from Open-OSS/privacy-filter is not that Hugging Face is dangerous; it is that a popularity ranking is not an identity, and an attacker will use whatever trust signal a platform exposes. The package registries learned to answer "who published this?" with a signature instead of a number. Until the model registry does the same, a security team's job is to record the SHA-256 hash and the verified publisher of every model it loads, and to refuse to let a trending rank stand in for either one.