Saved February 14, 2026
Do you care about this?
This GitHub repository provides an open-source dataset of over 20,000 identified malicious software packages. It includes samples from npm, PyPI, and IDE extensions, along with tools for analysis. Users can check whether a specific package version is known to be malicious, but must handle the samples with caution.
If you do, here's more
The repository hosts an open-source dataset of over 20,000 identified malicious software packages, sourced primarily from ecosystems such as npm, PyPI, and IDE extensions. Datadog spearheaded the initiative to strengthen software supply-chain security, and most of the packages were flagged by its GuardDog tool. The malicious packages are stored in encrypted ZIP archives protected with the password "infected". The date in each file name indicates when the malware was discovered, not when it was published.
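Because the archives are password-protected, a reasonable workflow is to inspect samples in memory rather than unpacking them onto disk. The sketch below uses Python's standard `zipfile` module with the documented password. It assumes the archives use the legacy ZIP (ZipCrypto) encryption that `zipfile` can decrypt; if they turn out to use AES, a different tool would be needed. The helper names `list_sample` and `read_member` are hypothetical.

```python
import zipfile

# Password stated for the dataset's archives.
PASSWORD = b"infected"


def list_sample(path: str) -> list[str]:
    """List the file names inside a sample archive without extracting anything."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def read_member(path: str, member: str) -> bytes:
    """Decrypt one archive member into memory as raw bytes.

    Nothing is written to disk or executed. Assumes ZipCrypto encryption,
    which is the only scheme the standard zipfile module can decrypt.
    """
    with zipfile.ZipFile(path) as zf:
        return zf.read(member, pwd=PASSWORD)
```

Reading members as bytes keeps the malicious code inert and sidesteps path-traversal tricks that a crafted archive could use during a normal extraction. Even so, this should only be done in an isolated environment, in line with Datadog's warning.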
Samples are organized by ecosystem and split into two categories: compromised packages that were originally benign, and packages created with malicious intent. Each ecosystem's subdirectory includes a manifest.json file that records which packages, and which of their versions, are known to be malicious. If a package isn't listed in the manifest, its status is unknown. A null entry means every version of the package is malicious; otherwise, the entry lists the specific compromised versions.
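The manifest rules above translate into a three-way lookup: flagged, listed but not flagged for this version, or unknown. The following is a minimal sketch that assumes each manifest maps a package name either to `null` or to a list of malicious version strings; the exact JSON shape and the function names are assumptions for illustration.

```python
import json
from typing import Optional


def load_manifest(path: str) -> dict:
    """Load one ecosystem's manifest.json into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def is_malicious(manifest: dict, package: str, version: str) -> Optional[bool]:
    """Check a package version against a manifest.

    Returns True if the version is flagged, False if the package is listed
    but this version is not flagged, and None if the package is absent
    (the dataset makes no claim either way).
    """
    if package not in manifest:
        return None  # not listed: status unknown
    versions = manifest[package]
    if versions is None:
        return True  # null entry: every version is malicious
    return version in versions  # assumed: a list of malicious version strings


# Hypothetical manifest contents and package names.
manifest = {"evil-pkg": None, "hijacked-lib": ["1.2.3"]}
print(is_malicious(manifest, "evil-pkg", "0.0.1"))      # True
print(is_malicious(manifest, "hijacked-lib", "1.2.4"))  # False
print(is_malicious(manifest, "left-pad", "1.3.0"))      # None
```

Returning `None` rather than `False` for unlisted packages keeps "not in the dataset" distinct from "listed, but this version is not flagged", and neither result means the package is safe.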
The dataset is shared under the Apache-2.0 license, requiring attribution for use. Datadog emphasizes that the included software is actively malicious, advising against running any packages on personal machines. While the dataset aims to provide a comprehensive view, it may suffer from bias since most samples were identified using a single ruleset. Datadog plans to expand the dataset regularly and has manually reviewed each package to ensure accuracy. Contributions aren't currently accepted, but feedback and findings are welcome via email.