Zeekurity Zen – Part VI: Zeek File Analysis Framework

If you're looking for professional services on this topic or interested in other cybersecurity consulting services, please reach out to me via my Contact page to discuss further.

This is part of the Zeekurity Zen Zeries on building a Zeek (formerly Bro) network sensor.

Overview

In our Zeek journey thus far, we’ve:

Zeek’s incredible network traffic visibility goes beyond just protocol analysis.  Using the File Analysis Framework, we can perform automatic file hashing (e.g., MD5, SHA1, SHA256), identify malicious files, and extract suspicious files to disk for forensic analysis.  These capabilities are easily some of Zeek’s most impressive and useful features.

To do this, we’ll walk through these steps:

  1. Enable file hashing and Team Cymru’s Malware Hash Registry lookups.
  2. Enable SHA256 hashing for all files.
  3. Understand the contents of files.log.
  4. Enable automatic file extraction of commonly exploited file types.
  5. Discuss a real-world example.
  6. Troubleshoot common issues.

Enable file hashing and Team Cymru’s Malware Hash Registry lookups

  1. By default, automatic file hashing and Team Cymru’s Malware Hash Registry lookups are enabled.  To confirm this, open /opt/zeek/share/zeek/site/local.zeek and look for the following lines. Make sure the @load lines are not commented out (i.e., they do not begin with a # symbol), and update the file if needed.
    # Enable MD5 and SHA1 hashing for all files.
    @load frameworks/files/hash-all-files
    # Detect SHA1 sums in Team Cymru's Malware Hash Registry.
    @load frameworks/files/detect-MHR
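For context, detect-MHR works over DNS. For files whose MIME type matches its list (which includes application/x-dosexec), Zeek's script looks up a TXT record named `<sha1>.malware.hash.cymru.com`. It raises a notice when the returned detection rate meets its `notice_threshold` option. Because of this, the sensor needs working DNS resolution for these lookups to succeed.

The Python sketch below mimics that parsing and threshold logic offline. It rests on several assumptions:

- The TXT answer has the form `<epoch-seconds> <detection-percent>`.
- The threshold is 10. Check the default in your Zeek version.
- Timestamps are rendered in UTC.

It does not perform a real DNS query, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

MHR_ZONE = "malware.hash.cymru.com"
NOTICE_THRESHOLD = 10  # assumed default; check notice_threshold in your Zeek version


def mhr_query_name(sha1: str) -> str:
    """DNS name whose TXT record holds the MHR answer for this SHA1."""
    return f"{sha1.lower()}.{MHR_ZONE}"


def mhr_notice_message(txt_answer: str, threshold: int = NOTICE_THRESHOLD) -> Optional[str]:
    """Parse '<epoch> <rate>'; return a notice-style message, or None if no notice."""
    parts = txt_answer.strip().split(" ", 1)
    if len(parts) != 2:
        return None  # unexpected answer format
    try:
        epoch, rate = int(parts[0]), int(parts[1])
    except ValueError:
        return None  # non-numeric fields
    if rate < threshold:
        return None  # below threshold: no notice
    # Assumes UTC; the timestamp in your notice.log may be rendered differently.
    seen = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    # Two spaces before "Last seen" mirror the message format in notice.log.
    return f"Malware Hash Registry Detection rate: {rate}%  Last seen: {seen}"


if __name__ == "__main__":
    print(mhr_query_name("5ff465afaabcbf0150d1a3ab2c2e74f3a4426467"))
    # Hypothetical TXT answer, chosen to match the example notice in this post.
    print(mhr_notice_message("1591345779 38"))
```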

Enable SHA256 hashing for all files

  1. SHA256 hashing is not enabled by default.  We will enable this by creating a simple Zeek script.  As the zeek user, create a new file /opt/zeek/share/zeek/site/hash_sha256.zeek, add the following lines, and then save the file.
    ##! Perform SHA256 hashing on all files.
    @load base/files/hash
    event file_new(f: fa_file)
        {
        Files::add_analyzer(f, Files::ANALYZER_SHA256);
        }
  2. As the zeek user, edit /opt/zeek/share/zeek/site/local.zeek, add the following lines, and then save the file.
    # Add SHA256 hash for files
    @load hash_sha256
  3. As the zeek user, stop Zeek.
    zeekctl stop
  4. As the zeek user, apply the new settings and start Zeek.
    zeekctl deploy
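Conceptually, the MD5, SHA1, and SHA256 file analyzers compute their hashes as the file's bytes stream past, rather than after the whole file is stored. The Python sketch below illustrates the same idea with hashlib by feeding each chunk to all three digests at once. It is an analogy, not Zeek's implementation, and the chunk size is arbitrary. The `hash_file` helper can also hash a file on disk so you can compare the result with files.log.

```python
import hashlib
from typing import Dict, Iterable


def hash_stream(chunks: Iterable[bytes]) -> Dict[str, str]:
    """Feed each chunk to MD5, SHA1, and SHA256 at once and return hex digests."""
    digests = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
    }
    for chunk in chunks:
        for d in digests.values():
            d.update(chunk)
    return {name: d.hexdigest() for name, d in digests.items()}


def hash_file(path: str, chunk_size: int = 64 * 1024) -> Dict[str, str]:
    """Hash a file on disk (e.g., an extracted file) without loading it into memory."""
    with open(path, "rb") as fh:
        return hash_stream(iter(lambda: fh.read(chunk_size), b""))


if __name__ == "__main__":
    # Splitting the input into chunks does not change the resulting digests.
    print(hash_stream([b"hello ", b"world"]))
```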

Understand files.log

  1. Take a look at your own files.log and note the types of files that are hashed.  Below is a sample files.log entry in JSON format.
    {
      "ts": 1597593633.224633,
      "fuid": "FB4Sx62yaleypxnhIb",
      "tx_hosts": [
        "23.246.2.148"
      ],
      "rx_hosts": [
        "10.2.2.23"
      ],
      "conn_uids": [
        "CUgYfkjoZLP4BR8Ol"
      ],
      "source": "HTTP",
      "depth": 0,
      "analyzers": [
        "JPEG",
        "SHA1",
        "MD5",
        "SHA256"
      ],
      "mime_type": "image/jpeg",
      "duration": 0.01756000518798828,
      "local_orig": false,
      "is_orig": false,
      "seen_bytes": 58175,
      "total_bytes": 58175,
      "missing_bytes": 0,
      "overflow_bytes": 0,
      "timedout": false,
      "md5": "0671e92b0fb8ffe5724579c229a43689",
      "sha1": "e855561e88f0bc57733eafa05a9d7681d276e55a",
      "sha256": "fc58cf109988af3b3dbc499001ff300584eff638cb120405558d3df69c22fdf4"
    }
    
  2. Let’s examine some of the key fields to better understand how we can use them to analyze files on our own network.  For a full listing, check out the official Zeek documentation.
    • fuid (e.g., FB4Sx62yaleypxnhIb): The file’s unique ID.  Note that this is not the same as the uid commonly found in other Zeek logs.
    • tx_hosts (e.g., 23.246.2.148): The host(s) that sent the file.
    • rx_hosts (e.g., 10.2.2.23): The host(s) that received the file.
    • conn_uids (e.g., CUgYfkjoZLP4BR8Ol): The uid(s) of the connection(s) that carried the file.  This is the same uid used to correlate activity across conn.log and other Zeek logs.
    • source (e.g., HTTP): This indicates which protocol the file was transferred over.
    • analyzers (e.g., JPEG, SHA1, MD5, SHA256): The file analyzers used to analyze this file.
    • mime_type (e.g., image/jpeg): The MIME type Zeek identified for the file.
    • seen_bytes (e.g., 58175): The number of bytes that Zeek observed.
    • total_bytes (e.g., 58175): The expected total size of the file in bytes, when Zeek can determine it.
    • missing_bytes (e.g., 0): The number of bytes that were missing in the analysis, likely due to dropped packets.
    • overflow_bytes (e.g., 0): The number of bytes that were not analyzed either due to overlapping bytes or reassembly errors.
    • md5 (e.g., 0671e92b0fb8ffe5724579c229a43689): The MD5 hash of the file.
    • sha1 (e.g., e855561e88f0bc57733eafa05a9d7681d276e55a): The SHA1 hash of the file.
    • sha256 (e.g., fc58cf109988af3b3dbc499001ff300584eff638cb120405558d3df69c22fdf4): The SHA256 hash of the file.
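If your files.log is written as JSON (one object per line, like the sample above), a few lines of Python show which file types you're hashing and pull out the hashes for lookups elsewhere. This is a minimal sketch that assumes JSON-formatted logs. It tolerates records that lack a field; for example, `sha256` only appears once SHA256 hashing is enabled.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List


def load_records(lines: Iterable[str]) -> List[dict]:
    """Parse JSON-lines files.log content, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def mime_type_counts(records: Iterable[dict]) -> Counter:
    """Count records per MIME type ("unknown" when no mime_type was logged)."""
    return Counter(r.get("mime_type", "unknown") for r in records)


def hashes_by_fuid(records: Iterable[dict]) -> Dict[str, Dict[str, str]]:
    """Map each fuid to whichever of md5/sha1/sha256 the record contains."""
    return {
        r["fuid"]: {k: r[k] for k in ("md5", "sha1", "sha256") if k in r}
        for r in records
    }


if __name__ == "__main__":
    import sys

    # Usage: python files_summary.py /opt/zeek/logs/current/files.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            recs = load_records(fh)
        for mime, count in mime_type_counts(recs).most_common():
            print(f"{count:6d}  {mime}")
```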

Enable automatic file extraction

  1. As the zeek user, stop Zeek if it is currently running.
    zeekctl stop
  2. Use zkg to install the file extraction package.
    zkg install zeek/hosom/file-extraction
    The following packages will be INSTALLED:
      zeek/hosom/file-extraction (2.0.3)
    Proceed? [Y/n] y
    Installing "zeek/hosom/file-extraction".
    Installed "zeek/hosom/file-extraction" (2.0.3)
    Loaded "zeek/hosom/file-extraction"
  3. Configure file extraction options by editing /opt/zeek/share/zeek/site/file-extraction/config.zeek. The sample config.zeek below stores extracted files in /opt/zeek/extracted/ and limits automatic extraction to commonly exploited file types (e.g., Java, PE, Microsoft Office, and PDF).
    # All configuration must occur within this file.
    # All other files may be overwritten during upgrade
    module FileExtraction;
    # Configure where extracted files will be stored
    redef path = "/opt/zeek/extracted/";
    # Configure 'plugins' that can be loaded
    # these are shortcut modules to specify common
    # file extraction policies. Example:
    # @load ./plugins/extract-pe.bro
    @load ./plugins/extract-common-exploit-types
  4. As the zeek user, create the directory where extracted files will be saved. Its path must match the path set in config.zeek.
    mkdir /opt/zeek/extracted
  5. If this is your first time installing a Zeek package, edit /opt/zeek/share/zeek/site/local.zeek and add the following lines at the bottom. This loads all the packages you’ve installed, and you only need to do it once.
    # Load Zeek Packages
    @load packages
  6. As the zeek user, apply the new settings and start Zeek.
    zeekctl deploy
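Once extraction is running, it's worth checking that the files on disk match what Zeek logged. The sketch below works as follows:

- It reads JSON files.log records that have an `extracted` field, which Zeek adds when a file is extracted.
- It confirms each file exists on disk.
- It compares the file's SHA256 with the logged value.

It assumes JSON logs, SHA256 hashing enabled as described earlier, and /opt/zeek/extracted/ as the extraction directory. Relative `extracted` values are joined to that directory. The function names are illustrative.

```python
import hashlib
import json
import os
from typing import Iterable, List, Tuple

EXTRACT_DIR = "/opt/zeek/extracted/"  # assumed; must match the path in config.zeek


def sha256_of(path: str) -> str:
    """SHA256 of a file on disk, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def check_extractions(lines: Iterable[str], extract_dir: str = EXTRACT_DIR) -> List[Tuple[str, str]]:
    """Return (fuid, status) for each files.log record that has an 'extracted' field."""
    results = []
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        if "extracted" not in rec:
            continue
        # os.path.join keeps 'extracted' unchanged when it is already an absolute path.
        path = os.path.join(extract_dir, rec["extracted"])
        if not os.path.exists(path):
            status = "missing on disk"
        elif rec.get("extracted_cutoff"):
            status = "truncated (extracted_cutoff)"
        elif "sha256" not in rec:
            status = "no sha256 logged"
        elif sha256_of(path) == rec["sha256"]:
            status = "ok"
        else:
            status = "hash mismatch"
        results.append((rec["fuid"], status))
    return results


if __name__ == "__main__":
    import sys

    # Usage: python check_extractions.py /opt/zeek/logs/current/files.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for fuid, status in check_extractions(fh):
                print(f"{fuid}\t{status}")
```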

Real World Example

So how could we use this in the real world? Imagine a user received an email with a malicious link claiming to point to this quarter’s employee bonus payouts.  The user clicks the link and immediately downloads a file.  We want to know whether the file is malicious and, if so, what actions we can take to prevent other systems from downloading the same file.  Since our Zeek instance is configured to automatically hash all files, extract Windows PE files, and perform Team Cymru Malware Hash Registry lookups, we’re confident that we can perform a thorough analysis of the event.

  1. We’re first alerted to suspicious activity by an alert raised in notice.log. The log entry below tells us three things:
    • The file’s MIME type is “application/x-dosexec”.
    • The notice is a “TeamCymruMalwareHashRegistry::Match”.
    • Team Cymru reports a detection rate of 38%.
    The notice also provides a VirusTotal search link for the file’s SHA1 hash, which shows virtually every scanner detecting this file as malicious.  From the detection names, we see that this is related to the WannaCry ransomware. Finally, the notice conveniently tells us where the file came from (149.202.220.122) and which host downloaded it (10.2.2.23).
    {
      "ts": 1597850503.829048,
      "uid": "CO3tTx2lknzNvQe7P3",
      "id.orig_h": "10.2.2.23",
      "id.orig_p": 56197,
      "id.resp_h": "149.202.220.122",
      "id.resp_p": 80,
      "fuid": "F1sCdV2rXJ9afKdlP2",
      "file_mime_type": "application/x-dosexec",
      "file_desc": "http://s000.tinyupload.com/download.php?file_id=91645583928538055155&t=9164558392853805515507216",
      "proto": "tcp",
      "note": "TeamCymruMalwareHashRegistry::Match",
      "msg": "Malware Hash Registry Detection rate: 38%  Last seen: 2020-06-05 08:29:39",
      "sub": "https://www.virustotal.com/en/search/?query=5ff465afaabcbf0150d1a3ab2c2e74f3a4426467",
      "src": "10.2.2.23",
      "dst": "149.202.220.122",
      "p": 80,
      "peer_descr": "worker-1-2",
      "actions": [
        "Notice::ACTION_LOG"
      ],
      "suppress_for": 3600
    }
  2. Using the uid (CO3tTx2lknzNvQe7P3) from the notice, let’s search our logs for related activity and see what comes up.  You could search for this in Splunk or use grep to search through your raw logs.  Assuming we use grep, we find related activity in conn.log, http.log, and files.log as shown below.
    • conn.log
      First, we confirm the connection metadata detailed in notice.log and observe that the file was transferred via HTTP.

      {
        "ts": 1597850493.368458,
        "uid": "CO3tTx2lknzNvQe7P3",
        "id.orig_h": "10.2.2.23",
        "id.orig_p": 56197,
        "id.resp_h": "149.202.220.122",
        "id.resp_p": 80,
        "proto": "tcp",
        "service": "http",
        "duration": 113.54712104797363,
        "orig_bytes": 624,
        "resp_bytes": 3514699,
        "conn_state": "RSTR",
        "local_orig": true,
        "local_resp": false,
        "missed_bytes": 0,
        "history": "ShADadfr",
        "orig_pkts": 1398,
        "orig_ip_bytes": 73512,
        "resp_pkts": 2433,
        "resp_ip_bytes": 3641211
      }
    • http.log
      Next, we see that the user (10.2.2.23) made a GET request to s000.tinyupload.com to download a file.  Note the file information that Zeek includes in this log: the file’s unique ID (F1sCdV2rXJ9afKdlP2), the file’s name (bonus.exe), and the file’s MIME type (application/x-dosexec).

      {
        "ts": 1597850493.556732,
        "uid": "CO3tTx2lknzNvQe7P3",
        "id.orig_h": "10.2.2.23",
        "id.orig_p": 56197,
        "id.resp_h": "149.202.220.122",
        "id.resp_p": 80,
        "trans_depth": 1,
        "method": "GET",
        "host": "s000.tinyupload.com",
        "uri": "/download.php?file_id=91645583928538055155&t=9164558392853805515507216",
        "referrer": "http://s000.tinyupload.com/index.php?file_id=91645583928538055155",
        "version": "1.1",
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36",
        "request_body_len": 0,
        "response_body_len": 3514368,
        "status_code": 200,
        "status_msg": "OK",
        "tags": [],
        "resp_fuids": [
          "F1sCdV2rXJ9afKdlP2"
        ],
        "resp_filenames": [
          "bonus.exe"
        ],
        "resp_mime_types": [
          "application/x-dosexec"
        ]
      }
    • files.log
      Finally, we see the same file information that http.log provided (unique ID, name, and MIME type), but now we also see the file’s MD5, SHA1, and SHA256 hashes.  Since we’ve also enabled automatic file extraction for commonly exploited file types, there is a new field named “extracted” that tells us where Zeek saved a copy of the file (/opt/zeek/extracted/HTTP-F1sCdV2rXJ9afKdlP2.exe).  Note that the filename follows the format SOURCE-fuid, plus a file extension.  We confirm that “seen_bytes” matches “total_bytes”, that “missing_bytes” is zero, and that “extracted_cutoff” is false.  Together, these tell us that Zeek analyzed and extracted the file in its entirety.

      {
        "ts": 1597850493.672357,
        "fuid": "F1sCdV2rXJ9afKdlP2",
        "tx_hosts": [
          "149.202.220.122"
        ],
        "rx_hosts": [
          "10.2.2.23"
        ],
        "conn_uids": [
          "CO3tTx2lknzNvQe7P3"
        ],
        "source": "HTTP",
        "depth": 0,
        "analyzers": [
          "SHA1",
          "EXTRACT",
          "PE",
          "MD5",
          "SHA256"
        ],
        "mime_type": "application/x-dosexec",
        "filename": "bonus.exe",
        "duration": 10.055749893188477,
        "local_orig": false,
        "is_orig": false,
        "seen_bytes": 3514368,
        "total_bytes": 3514368,
        "missing_bytes": 0,
        "overflow_bytes": 0,
        "timedout": false,
        "md5": "84c82835a5d21bbcf75a61706d8ab549",
        "sha1": "5ff465afaabcbf0150d1a3ab2c2e74f3a4426467",
        "sha256": "ed01ebfbc9eb5bbea545af4d01bf5f1071661840480439c6e5babe8e080e41aa",
        "extracted": "/opt/zeek/extracted/HTTP-F1sCdV2rXJ9afKdlP2.exe",
        "extracted_cutoff": false
      }
  3. From here, we can use our endpoint security systems to determine if the user executed the file or examine additional Zeek logs to identify subsequent suspicious behavior.  To prevent other systems from downloading this file, we can block the identified file hashes or IP/URL in our network and endpoint security platforms.  Additionally, since we have a copy of the raw file we can perform deeper analysis and generate additional IOCs and threat intelligence, further strengthening our defenses.  Pretty cool, huh?
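The grep-based search in step 2 can also be scripted. This sketch scans JSON-formatted Zeek logs for a connection uid. It matches both the `uid` field (used by conn.log, http.log, and notice.log) and the `conn_uids` list (used by files.log). It assumes JSON logs in a single directory, such as /opt/zeek/logs/current/. Rotated or compressed logs would need extra handling, and the function name is illustrative.

```python
import json
import os
from typing import Dict, List


def records_for_uid(uid: str, log_dir: str) -> Dict[str, List[dict]]:
    """Return {log file name: [matching records]} for every *.log file in log_dir."""
    matches: Dict[str, List[dict]] = {}
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".log"):
            continue
        with open(os.path.join(log_dir, name)) as fh:
            for line in fh:
                line = line.strip()
                if not line.startswith("{"):
                    continue  # skip blank lines and non-JSON content
                rec = json.loads(line)
                # conn/http/notice use "uid"; files.log lists connections in "conn_uids".
                if rec.get("uid") == uid or uid in rec.get("conn_uids", []):
                    matches.setdefault(name, []).append(rec)
    return matches


if __name__ == "__main__":
    import sys

    # Usage: python find_uid.py CO3tTx2lknzNvQe7P3 /opt/zeek/logs/current
    if len(sys.argv) > 2:
        for log_name, recs in records_for_uid(sys.argv[1], sys.argv[2]).items():
            print(f"{log_name}: {len(recs)} record(s)")
```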

Troubleshooting

If you find that files aren’t properly captured in files.log or automatically extracted, there are two likely causes:

  1. You’re not actually performing full packet capture. In Part I of this series, we enabled network optimizations to ensure your sensor is performing full packet capture and not utilizing any “NIC offloading functions.”  Refer to the steps in the section titled “Enable network service and disable NIC offloading functions” and confirm they’re applied properly on your system.  Zeek will typically warn you in reporter.log if it believes that NIC offloading functions have not been disabled.
  2. You’re dropping packets. This could be due to an underpowered Zeek sensor or an overwhelmed network mirror/TAP.  Make sure your Zeek sensor uses appropriately sized hardware for the traffic it’s monitoring and that your network mirror/TAP can handle your network’s traffic volume.
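A quick way to gauge whether either problem affects you is to look for files.log records that Zeek did not see in full. The sketch below flags records with any of the following:

- nonzero `missing_bytes`
- `seen_bytes` below a known `total_bytes`
- `timedout` set

It assumes JSON-formatted logs. A handful of hits is not by itself proof of packet loss, since a client may simply abort a download.

```python
import json
from typing import Iterable, List


def incomplete_files(lines: Iterable[str]) -> List[dict]:
    """Return files.log records that suggest Zeek did not see the whole file."""
    flagged = []
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        seen = rec.get("seen_bytes", 0)
        total = rec.get("total_bytes")  # absent when the size wasn't known in advance
        reasons = []
        if rec.get("missing_bytes", 0) > 0:
            reasons.append("missing_bytes")
        if total is not None and seen < total:
            reasons.append("seen_bytes < total_bytes")
        if rec.get("timedout"):
            reasons.append("timedout")
        if reasons:
            flagged.append({"fuid": rec.get("fuid"), "reasons": reasons})
    return flagged


if __name__ == "__main__":
    import sys

    # Usage: python incomplete_files.py /opt/zeek/logs/current/files.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for item in incomplete_files(fh):
                print(item["fuid"], ", ".join(item["reasons"]))
```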

Up Next

In Part VII of this series, we’ll look at how to analyze and gain visibility into encrypted traffic.




2 thoughts on “Zeekurity Zen – Part VI: Zeek File Analysis Framework”

  • Eric,
    I am a big fan of your online work and the good spirit of sharing your knowledge.
    I ran into some issues while doing the Zeek series. I followed your steps and found some issues using the zkg file extraction package. I did the steps below.

    zkg install file-extraction
    The following packages will be INSTALLED:
    zeek/hosom/file-extraction (2.0.3)

    Proceed? [Y/n] y
    Installing “zeek/hosom/file-extraction”
    Installed “zeek/hosom/file-extraction” (2.0.3)
    Loaded “zeek/hosom/file-extraction”

    Then I tested the package to confirm it installed properly.

    zkg load file-extraction
    The following installed dependencies could not be loaded for “zeek/hosom/file-extraction”.
    zeek: Loading dependency failed. Package not installed. //will this prevent me from extracting the downloaded files to the /opt/zeek/extracted folder? //

    Failed to load “zeek/hosom/file-extraction”: True

    On the same note:

    Will the “ENABLE FILE HASHING AND TEAM CYMRU’S MALWARE HASH REGISTRY LOOKUPS” only work for files downloaded from HTTP websites?
    I did multiple tests from websites using HTTPS, and files.log does not reflect the intended file; the extracted folder has no data either.

    I confirmed the offloading parameters from the NIC are as stated in your part I, and my zeek capstats is not yelling packet drops.
    I am using a VM (CentOS 8) on my ESXi server.

    Any guidance is appreciated.

    Regards.
    DR

    • Hi DR,

      Thanks for the kind words on the blog! It’s always great to feel appreciated. 🙂

      I tried that same “zkg load” command on my instance and got the same output as you did. Assuming you’ve followed the steps I’ve outlined to install the package, a better way to confirm that it has loaded successfully is to grep for it (e.g. cat /opt/zeek/logs/current/loaded_scripts.log | grep file-extraction) in your logs folder as soon as you start zeek. If it’s there, you should be good to go. If it isn’t, double-check that you’ve added @load packages in your /opt/zeek/share/zeek/site/local.zeek as detailed in Part II. But since you brought this up, I realized that some people may not have completed that step since I listed it as optional, so I’ve updated this page with that step, too (see step 5 under “ENABLE AUTOMATIC FILE EXTRACTION”).

      You are correct, those lookups will only work on files transferred in cleartext (e.g., over HTTP rather than HTTPS). Zeek does not have a way to natively decrypt traffic on its own. Check out the next part of my series for techniques on analyzing encrypted traffic.

      Hope that helps!
      Eric
