Script to check VirusTotal for many files and folders

Edit 09/04/2021: Clarified installation instructions by adding explanation on how to install the YAML module.

Edit 13/03/2021: Corrected the reference to the VirusTotal module, thanks to a hint from user Ray O.: the required VT module was incorrectly described as “virustotal-python”. The correct module is “virustotal-api”. The setup instructions were updated accordingly.

Edit 10/02/2021: Added new features and described these (additional configurations, automatic config file creation, etc.).

If you have followed this blog for a while, you might know that I use NetworkMiner for forensic analyses of captured network traffic. The tool automatically extracts data from pcap files and can calculate hashes for files that were observed in these captured network streams. If you suspect that a file is malicious, you can manually look up the file hash in NetworkMiner, then navigate to VirusTotal.com and check if the hash belongs to a known malicious piece of software.

Purpose of the Script

Following the process explained above is fine if you just want to check a handful of files, but if you have to analyze a huge pcap with hundreds or even thousands of files, it is unrealistic to check all of them manually.

For this reason, I wrote this small Python script, which recursively goes through all the extracted NetworkMiner files, calculates their SHA256 hashes and checks VirusTotal for matches. While the idea for the script arose from my work with NetworkMiner, you can use it in any situation where you want to check a large number of files against VirusTotal. There is no dependency on NetworkMiner: all my script does is recursively crawl through a specific directory path, calculate the hash of each file and print an alert if enough scan engines flag it.
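To make the crawl-and-hash step concrete, here is a minimal sketch using only the Python standard library. It is not the code from the actual script; the function names sha256_of_file, iter_files and hash_tree are hypothetical and only illustrate the idea.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_files(root, recursive=True):
    """Yield file paths under root; descend into subfolders only if recursive."""
    for folder, _subfolders, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(folder, name)
        if not recursive:
            break  # os.walk is top-down by default, so the first folder is root itself


def hash_tree(root, recursive=True):
    """Map each file path under root to its SHA256 hash."""
    return {path: sha256_of_file(path) for path in iter_files(root, recursive)}
```

Reading in chunks keeps memory usage flat even for large extracted files. Calling hash_tree on your search path gives a dictionary of path to hash that the later steps can work with.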

For a test I used a pcap that contains Emotet malware traffic. The full description and the pcap download can be found on Palo Alto’s Unit42 blog. I then loaded the pcap into NetworkMiner, which results in a large number of extracted files under /opt/NetworkMiner_2-6. I then used my script to recursively go through all subfolders and check all files for malicious contents:

As you can see, the script quickly ran through 116 files and identified the one that contains the malicious Emotet code. Of course, analyzing the network traffic can also lead you to the same result, but to get a quick overview without any manual intervention, this script can really save a lot of time.

Also note that the script performs only one check per hash. This means that if several files with different names or in different locations have identical content (and therefore the same hash), the script will perform the security check only once. This saves valuable analysis time. If you did the checks manually, you wouldn’t know which files are identical (unless you remembered the hash of each file) and would waste time checking identical files.
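The one-check-per-hash idea can be sketched as follows. This is an illustration under assumptions, not the script's actual implementation: group_by_hash and check_unique_hashes are hypothetical names, and lookup stands in for whatever function queries VirusTotal for a single hash.

```python
from collections import defaultdict


def group_by_hash(path_to_hash):
    """Invert a {path: hash} mapping into {hash: [paths]}."""
    groups = defaultdict(list)
    for path, file_hash in path_to_hash.items():
        groups[file_hash].append(path)
    return dict(groups)


def check_unique_hashes(path_to_hash, lookup):
    """Call lookup(hash) once per distinct hash and share the result with all paths."""
    results = {}
    for file_hash, paths in group_by_hash(path_to_hash).items():
        verdict = lookup(file_hash)  # one lookup, no matter how many files share the hash
        for path in paths:
            results[path] = verdict
    return results
```

With a rate-limited API, every duplicate that is skipped directly saves waiting time, which is why grouping by hash before querying pays off.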

Below is another example from another pcap, this time with 354 resources that needed to be checked. Note that the file path of resources extracted by NetworkMiner lets you see which system was involved, because the IP address and protocol form part of the file path. You can then go back to NetworkMiner and focus specifically on the identified file, IP address and protocol to gain a better understanding of what was happening:

Download and Setup

  1. Download the script from my GitHub page.
  2. Get a VirusTotal API key.
  3. Run the script once, so that it will create a default config file for you. Then, open this file (it should be named config.yaml and be located in the same folder where you saved the script) and replace the values for api_key and file_path with your own VirusTotal API key and the file path where you want the script to check files.
  4. Since this is a Python script, you will need to have the Python runtime environment installed on your computer.
  5. If you never worked with the VirusTotal API before, you will also need to install the VirusTotal library for Python. If you use pip, the installation is easy:
pip install virustotal-api
  6. If you don’t have the YAML parser PyYAML installed yet, you can install it via pip:
pip install PyYAML
  7. If you don’t have suspicious data to be checked on your computer already, you might want to get some captured malware traffic for testing. I explained this step by step in a previous post.
  8. Now that you have suspicious files to be checked in the provided file_path, you can go ahead and run the script. In a command line interface, navigate to the folder where you saved the script and run the following command.

Windows:

python recursive-vt.py

Linux:

python3 ./recursive-vt.py
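The setup relies on the script creating a default config.yaml next to itself on the first run. A minimal sketch of how such a step could look is shown here. The key names (api_key, file_path, alerting_level, recursive) are the ones mentioned in this post; the placeholder values and the function name ensure_config are assumptions for illustration and may differ from what the script actually writes.

```python
import os

# Placeholder defaults (assumed); replace api_key and file_path with your own values.
DEFAULT_CONFIG = """\
api_key: YOUR_VIRUSTOTAL_API_KEY
file_path: /path/to/suspicious/files
alerting_level: 0.1
recursive: True
"""


def ensure_config(script_dir, filename="config.yaml"):
    """Create a default config file if none exists; return (path, was_created)."""
    # os.path.join picks the correct separator on both Windows and Linux.
    config_path = os.path.join(script_dir, filename)
    if os.path.exists(config_path):
        return config_path, False
    with open(config_path, "w", encoding="utf-8") as handle:
        handle.write(DEFAULT_CONFIG)
    return config_path, True
```

An existing file is never overwritten, so running the script again after you have entered your API key leaves your settings untouched.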

Tips and Tricks

  • If you want to use different parameters than the ones provided in your config.yaml file, you can simply pass them as a parameter when calling your script, e.g.
python recursive-vt.py --path=C:\my_alternative\search-path

The following options are available as of now (use --help with the script to get the list):

usage: recursive-vt.py [-h] [-p PATH] [-a ALERTLV] [-r recursive]

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Directory of files to be checked, e.g.
                        C:\SuspiciousFiles\
  -a ALERTLV, --alertlv ALERTLV
                        Percentage of reporting scanners to define a file as
                        malicious, e.g. 0.1
  -r recursive, --recursive recursive
                        Include subfolders into search or not, e.g. True
  • The script will wait 15 seconds between each file if you are going to check more than 4 files at a time. This is simply because the free VirusTotal API is rate-limited to 4 requests per minute. If you are not subject to these restrictions (i.e. if you are paying for a premium API key), you can of course adjust my script accordingly. If you don’t care about clean code, the single edit that will immediately speed up the script is to remove the following line:
time.sleep(waiting_time)
  • The threshold that must be reached for the script to recognize a file as malicious is currently defined as “at least 10% of all scan engines reported this file as malicious”. You can of course use another threshold by adjusting the value in the config file (config.yaml):
alerting_level: 0.1

If you want to be alerted as soon as 5% of all scanners report the file, you can simply set:

alerting_level: 0.05

Alternatively, if you only want to work once with a different parameter, you can pass it as a command line parameter:

python recursive-vt.py --alertlv=0.05
  • If you don’t want to include any subfolders in your file check, just switch the value “recursive” in the config file:
recursive: False

Alternatively, you can pass it as a command line parameter:

python recursive-vt.py --recursive=False
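The tips above boil down to three small pieces of logic: command line options override the config file, a file is flagged once the share of reporting scanners reaches the alerting level, and a delay is only needed when more than 4 files are checked. The following is a rough sketch of these pieces, not the script's actual code. All function names are hypothetical, the flag names follow the help output above, and treating the threshold as “greater than or equal” is an assumption based on the “at least 10%” wording.

```python
import argparse


def str_to_bool(value):
    """Parse True/False strings, since the flag is passed as --recursive=False."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Options default to None so that 'not given' can be told apart from a value."""
    parser = argparse.ArgumentParser(prog="recursive-vt.py")
    parser.add_argument("-p", "--path", default=None)
    parser.add_argument("-a", "--alertlv", type=float, default=None)
    parser.add_argument("-r", "--recursive", type=str_to_bool, default=None,
                        metavar="recursive")
    return parser


def resolve_settings(config, argv):
    """Start from the config file values and let command line options win."""
    args = build_parser().parse_args(argv)
    settings = dict(config)
    if args.path is not None:
        settings["file_path"] = args.path
    if args.alertlv is not None:
        settings["alerting_level"] = args.alertlv
    if args.recursive is not None:
        settings["recursive"] = args.recursive
    return settings


def is_malicious(positives, total, alerting_level):
    """True if the share of scanners reporting the file reaches the alerting level."""
    if total == 0:
        return False  # no scanner results at all, so nothing to alert on
    return positives / total >= alerting_level  # '>=' is an assumption


def waiting_time(file_count, free_limit=4, delay=15):
    """Seconds to wait between lookups: 15 if more than 4 files, otherwise 0."""
    return delay if file_count > free_limit else 0
```

For example, resolve_settings(config, ["--alertlv=0.05"]) keeps the path and recursion setting from config.yaml and only lowers the threshold for this one run.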

This article was written by Fabian

4 thoughts on “Script to check VirusTotal for many files and folders”

  1. Thanks Fabian. Looks like the required python package has changed, solution:
    pip install virustotal-api

    Perhaps add to documentation, such as readme.md, changelog & release notes.

    Initial problem:
    python3 ./recursive-vt.py
    Traceback (most recent call last):
    File "./recursive-vt.py", line 20, in <module>
    from virus_total_apis import PublicApi as VirusTotalPublicApi
    ModuleNotFoundError: No module named 'virus_total_apis'

    More information including module name
    https://pypi.org/project/virustotal-api/
    https://github.com/blacktop/virustotal-api/blob/master/virus_total_apis/api.py

    1. Hi Ray,

      Thanks a lot for the detailed feedback including the solution! I checked my installed modules and in fact I think the modules did not change, but it was a slip on my behalf that I was referencing the wrong library. I had considered both of them (virustotal-api; virustotal-python) when I started the project and then when writing the documentation I copy/pasted the wrong one. Sorry for that… I updated the documentation and added some clarification on GitHub as well.

      Thanks again and happy threat hunting,
      Fabian

  2. Thanks for this. For using the script on Linux you have to change, in line 156, \\ to // otherwise it messed up the path to the config.yaml

    1. Hi Thomas,

      Thank you very much for testing and delivering the solution right away! I updated the code so that the path won’t be hardcoded but is using the os.path.join function instead.

      If there are any further issues that should be fixed, please don’t hesitate to let me know, I will try to fix it faster than last time 😉

      Regards,
      Fabian
