Zeekurity Zen – Part VI: Zeek File Analysis Framework

This is part of the Zeekurity Zen Zeries on building a Zeek (formerly Bro) network sensor.

Overview

In our Zeek journey thus far, we’ve:

Zeek’s incredible network traffic visibility goes beyond just protocol analysis.  Using the File Analysis Framework, we can perform automatic file hashing (e.g., MD5, SHA1, SHA256), identify malicious files, and extract suspicious files to disk for forensic analysis.  These capabilities are easily some of Zeek’s most impressive and useful features.

To do this, we’ll walk through these steps:

  1. Enable file hashing and Team Cymru’s Malware Hash Registry lookups.
  2. Enable SHA256 hashing for all files.
  3. Understand the contents of files.log.
  4. Enable automatic file extraction of commonly exploited file types.
  5. Discuss a real world example.
  6. Troubleshoot common issues.

Enable file hashing and Team Cymru’s Malware Hash Registry lookups

  1. By default, automatic file hashing and Team Cymru’s Malware Hash Registry lookups are enabled.  To confirm this, open /opt/zeek/share/zeek/site/local.zeek and look for the following lines.  Ensure they appear as below and that the @load lines are not commented out (i.e., they do not have a # symbol in front).  Update the file if needed.
    # Enable MD5 and SHA1 hashing for all files.
    @load frameworks/files/hash-all-files
    # Detect SHA1 sums in Team Cymru's Malware Hash Registry.
    @load frameworks/files/detect-MHR
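
The confirmation step above can also be scripted.  The following is a minimal sketch rather than part of the original walkthrough: the function name check_loads is hypothetical, and it only checks that each @load line is present and not commented out.

```shell
#!/bin/sh
# Hypothetical helper: report whether each required @load line is active
# (present and not commented out) in the given local.zeek.
check_loads() {
    file="$1"
    for script in frameworks/files/hash-all-files frameworks/files/detect-MHR; do
        # Match an @load at the start of the line (leading whitespace allowed).
        # A commented-out line starts with "#", so it will not match.
        if grep -Eq "^[[:space:]]*@load[[:space:]]+${script}([[:space:]]|\$)" "$file"; then
            echo "enabled: $script"
        else
            echo "MISSING: $script"
        fi
    done
}
```

Assuming the path used in this series, it could be run as: check_loads /opt/zeek/share/zeek/site/local.zeek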

Enable SHA256 hashing for all files

  1. SHA256 hashing is not enabled by default.  We will enable this by creating a simple Zeek script.  As the zeek user, create a new file /opt/zeek/share/zeek/site/hash_sha256.zeek, add the following lines, and then save the file.
    ##! Perform SHA256 hashing on all files.
    @load base/files/hash
    event file_new(f: fa_file)
        {
        Files::add_analyzer(f, Files::ANALYZER_SHA256);
        }
  2. As the zeek user, edit /opt/zeek/share/zeek/site/local.zeek, add the following lines, and then save the file.
    # Add SHA256 hash for files
    @load hash_sha256
  3. As the zeek user, stop Zeek.
    zeekctl stop
  4. As the zeek user, apply the new settings and start Zeek.
    zeekctl deploy
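
Once Zeek has processed some traffic, one way to confirm that the new script took effect is to count the files.log entries that carry a sha256 field.  This is a sketch under two assumptions that the steps above do not state: that logs are written in JSON (as in the samples in this post) and that the current logs live in /opt/zeek/logs/current/.  The function name count_with_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count log entries that contain a given JSON field.
count_with_field() {
    field="$1"
    log="$2"
    # grep -c prints the number of matching lines. Zeek writes one JSON
    # object per line, so this equals the number of matching entries.
    grep -c "\"${field}\":" "$log"
}
```

For example: count_with_field sha256 /opt/zeek/logs/current/files.log.  A count of 0 after files have been transferred suggests hash_sha256.zeek is not being loaded.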

Understand files.log

  1. Take a look at your own files.log and note the types of files that are hashed.  Below is a sample files.log file in JSON format.
    {
      "ts": 1597593633.224633,
      "fuid": "FB4Sx62yaleypxnhIb",
      "tx_hosts": [
        "23.246.2.148"
      ],
      "rx_hosts": [
        "10.2.2.23"
      ],
      "conn_uids": [
        "CUgYfkjoZLP4BR8Ol"
      ],
      "source": "HTTP",
      "depth": 0,
      "analyzers": [
        "JPEG",
        "SHA1",
        "MD5",
        "SHA256"
      ],
      "mime_type": "image/jpeg",
      "duration": 0.01756000518798828,
      "local_orig": false,
      "is_orig": false,
      "seen_bytes": 58175,
      "total_bytes": 58175,
      "missing_bytes": 0,
      "overflow_bytes": 0,
      "timedout": false,
      "md5": "0671e92b0fb8ffe5724579c229a43689",
      "sha1": "e855561e88f0bc57733eafa05a9d7681d276e55a",
      "sha256": "fc58cf109988af3b3dbc499001ff300584eff638cb120405558d3df69c22fdf4"
    }
    
  2. Let’s examine some of the key fields to better understand how we can use them to analyze files on our own network.  For a full listing, check out the official Zeek documentation.
    • fuid (e.g., FB4Sx62yaleypxnhIb): The file’s unique ID.  Note that this is not the same as the uid commonly found in other Zeek logs.
    • tx_hosts (e.g., 23.246.2.148): The host that transferred the file.
    • rx_hosts (e.g., 10.2.2.23): The host that received the file.
    • conn_uids (e.g., CUgYfkjoZLP4BR8Ol): This is equivalent to the uid or unique ID that’s used to correlate activity across conn.log and other Zeek logs.
    • source (e.g., HTTP): This indicates which protocol the file was transferred over.
    • analyzers (e.g., JPEG, SHA1, MD5, SHA256): The file analyzers used to analyze this file.
    • mime_type (e.g., image/jpeg): What Zeek believes the MIME type of the file is.
    • seen_bytes (e.g., 58175): The number of bytes that Zeek observed.
    • total_bytes (e.g., 58175): The total number of bytes that the file should be.
    • missing_bytes (e.g., 0): The number of bytes that were missing in the analysis, likely due to dropped packets.
    • overflow_bytes (e.g., 0): The number of bytes that were not analyzed either due to overlapping bytes or reassembly errors.
    • md5 (e.g., 0671e92b0fb8ffe5724579c229a43689): The MD5 hash of the file.
    • sha1 (e.g., e855561e88f0bc57733eafa05a9d7681d276e55a): The SHA1 hash of the file.
    • sha256 (e.g., fc58cf109988af3b3dbc499001ff300584eff638cb120405558d3df69c22fdf4): The SHA256 hash of the file.
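
The byte-count fields are useful for spotting files that Zeek only partially observed.  The sketch below prints the fuid of any entry where missing_bytes is greater than zero or seen_bytes differs from total_bytes.  It assumes JSON logs with one compact object per line, and the function name incomplete_files is hypothetical.  Because total_bytes may be absent from an entry, the comparison is skipped in that case.

```shell
#!/bin/sh
# Hypothetical helper: print the fuid of every files.log entry that Zeek
# did not see in full. Assumes one compact JSON object per line.
incomplete_files() {
    awk '
    # Return the integer value of a JSON key on this line, or -1 if absent.
    function num(key,    s) {
        if (match($0, "\"" key "\":[0-9]+") == 0) return -1
        s = substr($0, RSTART, RLENGTH)
        sub(/^.*:/, "", s)
        return s + 0
    }
    {
        seen = num("seen_bytes")
        total = num("total_bytes")
        missing = num("missing_bytes")
        # Skip the seen/total comparison when total_bytes is not present.
        if (missing > 0 || (total >= 0 && seen != total)) {
            if (match($0, /"fuid":"[^"]*"/))
                print substr($0, RSTART + 8, RLENGTH - 9)
        }
    }' "$1"
}
```

Any fuid it prints is worth a closer look, since hashes computed over an incomplete file will not match the original.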

Enable automatic file extraction

  1. As the zeek user, stop Zeek if it is currently running.
    zeekctl stop
  2. Use zkg to install the file extraction package.
    zkg install zeek/hosom/file-extraction
    The following packages will be INSTALLED:
      zeek/hosom/file-extraction (2.0.3)
    
    Proceed? [Y/n] y
    Installing "zeek/hosom/file-extraction".
    Installed "zeek/hosom/file-extraction" (2.0.3)
    Loaded "zeek/hosom/file-extraction"
  3. Configure file extraction options by editing /opt/zeek/share/zeek/site/file-extraction/config.zeek. Below is a sample config.zeek that will set the directory to store extracted files to /opt/zeek/extracted/ and set the files we want to automatically extract to commonly exploited file types (e.g., Java, PE, Microsoft Office, and PDF).
    # All configuration must occur within this file.
    # All other files may be overwritten during upgrade
    module FileExtraction;
    
    # Configure where extracted files will be stored
    redef path = "/opt/zeek/extracted/";
    
    # Configure 'plugins' that can be loaded
    # these are shortcut modules to specify common
    # file extraction policies. Example:
    # @load ./plugins/extract-pe.bro
    @load ./plugins/extract-common-exploit-types
  4. Create the directory to save all extracted files. It must match what we set in config.zeek.
    mkdir /opt/zeek/extracted
  5. As the zeek user, apply the new settings and start Zeek.
    zeekctl deploy
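
If no files appear after deployment, a quick check is whether the extraction directory exists and is writable by the user running Zeek.  This is a minimal sketch; the function name check_extract_dir is hypothetical, and it should be run as the zeek user so that the writability test reflects Zeek’s own permissions.

```shell
#!/bin/sh
# Hypothetical helper: confirm the extraction directory exists and is
# writable by the current user.
check_extract_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    elif [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}
```

For the configuration above: check_extract_dir /opt/zeek/extracted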

Real World Example

So how could we use this in the real world?  Imagine a user was emailed a malicious link that claimed to be this quarter’s employee bonus payouts.  The user clicks on the link and immediately downloads a file.  We want to know whether the file was malicious and, if so, determine what actions we can take to prevent other systems from downloading the same file.  Since we’ve configured our Zeek instance to automatically hash all files, extract Windows PE files, and perform Team Cymru Malware Hash Registry lookups, we’re confident that we can perform a thorough analysis of the event.

  1. We’re first alerted to suspicious activity through an alert raised in notice.log.  The log entry below tells us the file’s MIME type is “application/x-dosexec”, that the notice type is “TeamCymruMalwareHashRegistry::Match”, and that the Malware Hash Registry reports a detection rate of 38%.  Additionally, the notice provides a direct VirusTotal link for the file’s SHA1 hash, which shows virtually every scanner detecting this file as malicious.  From the detection names, we see that this is related to the WannaCry ransomware.  The notice also conveniently tells us where the file originated from (149.202.220.122) and which host downloaded the file (10.2.2.23).
    {
      "ts": 1597850503.829048,
      "uid": "CO3tTx2lknzNvQe7P3",
      "id.orig_h": "10.2.2.23",
      "id.orig_p": 56197,
      "id.resp_h": "149.202.220.122",
      "id.resp_p": 80,
      "fuid": "F1sCdV2rXJ9afKdlP2",
      "file_mime_type": "application/x-dosexec",
      "file_desc": "http://s000.tinyupload.com/download.php?file_id=91645583928538055155&t=9164558392853805515507216",
      "proto": "tcp",
      "note": "TeamCymruMalwareHashRegistry::Match",
      "msg": "Malware Hash Registry Detection rate: 38%  Last seen: 2020-06-05 08:29:39",
      "sub": "https://www.virustotal.com/en/search/?query=5ff465afaabcbf0150d1a3ab2c2e74f3a4426467",
      "src": "10.2.2.23",
      "dst": "149.202.220.122",
      "p": 80,
      "peer_descr": "worker-1-2",
      "actions": [
        "Notice::ACTION_LOG"
      ],
      "suppress_for": 3600
    }
  2. Using the uid (CO3tTx2lknzNvQe7P3) from the notice, let’s search our logs for related activity and see what comes up.  You could search for this in Splunk or use grep to search through your raw logs.  Assuming we use grep, we find related activity in conn.log, http.log, and files.log as shown below.
    • conn.log
      First, we confirm the connection metadata detailed in notice.log and observe that the file was transferred via HTTP.

      {
        "ts": 1597850493.368458,
        "uid": "CO3tTx2lknzNvQe7P3",
        "id.orig_h": "10.2.2.23",
        "id.orig_p": 56197,
        "id.resp_h": "149.202.220.122",
        "id.resp_p": 80,
        "proto": "tcp",
        "service": "http",
        "duration": 113.54712104797363,
        "orig_bytes": 624,
        "resp_bytes": 3514699,
        "conn_state": "RSTR",
        "local_orig": true,
        "local_resp": false,
        "missed_bytes": 0,
        "history": "ShADadfr",
        "orig_pkts": 1398,
        "orig_ip_bytes": 73512,
        "resp_pkts": 2433,
        "resp_ip_bytes": 3641211
      }
    • http.log
      Next, we see that the user (10.2.2.23) made a GET request to s000.tinyupload.com to download a file.  Note the file information that Zeek includes in this log: the file’s unique ID (F1sCdV2rXJ9afKdlP2), the file’s name (bonus.exe), and the file’s MIME type (application/x-dosexec).

      {
        "ts": 1597850493.556732,
        "uid": "CO3tTx2lknzNvQe7P3",
        "id.orig_h": "10.2.2.23",
        "id.orig_p": 56197,
        "id.resp_h": "149.202.220.122",
        "id.resp_p": 80,
        "trans_depth": 1,
        "method": "GET",
        "host": "s000.tinyupload.com",
        "uri": "/download.php?file_id=91645583928538055155&t=9164558392853805515507216",
        "referrer": "http://s000.tinyupload.com/index.php?file_id=91645583928538055155",
        "version": "1.1",
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36",
        "request_body_len": 0,
        "response_body_len": 3514368,
        "status_code": 200,
        "status_msg": "OK",
        "tags": [],
        "resp_fuids": [
          "F1sCdV2rXJ9afKdlP2"
        ],
        "resp_filenames": [
          "bonus.exe"
        ],
        "resp_mime_types": [
          "application/x-dosexec"
        ]
      }
    • files.log
      Finally, we again see the same file information that http.log provided: unique ID, name, and MIME type.  But now we also see the MD5, SHA1, and SHA256 hashes of the file.  Since we’ve also enabled automatic file extraction for commonly exploited file types, we see a new field named “extracted” that tells us where Zeek extracted a copy of the file to (/opt/zeek/extracted/HTTP-F1sCdV2rXJ9afKdlP2.exe).  Note that the filename is formatted as SOURCE-fuid followed by a file extension (here, .exe).  We confirm that “seen_bytes” matches “total_bytes” and that there are zero “missing_bytes”, which tells us that Zeek was able to analyze and extract the file in its entirety.

      {
        "ts": 1597850493.672357,
        "fuid": "F1sCdV2rXJ9afKdlP2",
        "tx_hosts": [
          "149.202.220.122"
        ],
        "rx_hosts": [
          "10.2.2.23"
        ],
        "conn_uids": [
          "CO3tTx2lknzNvQe7P3"
        ],
        "source": "HTTP",
        "depth": 0,
        "analyzers": [
          "SHA1",
          "EXTRACT",
          "PE",
          "MD5",
          "SHA256"
        ],
        "mime_type": "application/x-dosexec",
        "filename": "bonus.exe",
        "duration": 10.055749893188477,
        "local_orig": false,
        "is_orig": false,
        "seen_bytes": 3514368,
        "total_bytes": 3514368,
        "missing_bytes": 0,
        "overflow_bytes": 0,
        "timedout": false,
        "md5": "84c82835a5d21bbcf75a61706d8ab549",
        "sha1": "5ff465afaabcbf0150d1a3ab2c2e74f3a4426467",
        "sha256": "ed01ebfbc9eb5bbea545af4d01bf5f1071661840480439c6e5babe8e080e41aa",
        "extracted": "/opt/zeek/extracted/HTTP-F1sCdV2rXJ9afKdlP2.exe",
        "extracted_cutoff": false
      }
  3. From here, we can use our endpoint security systems to determine if the user executed the file or examine additional Zeek logs to identify subsequent suspicious behavior.  To prevent other systems from downloading this file, we can block the identified file hashes or IP/URL in our network and endpoint security platforms.  Additionally, since we have a copy of the raw file we can perform deeper analysis and generate additional IOCs and threat intelligence, further strengthening our defenses.  Pretty cool, huh?
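
The grep-based pivot described in step 2, plus a check that the extracted copy matches the hash Zeek logged, can be sketched as follows.  Both function names are hypothetical.  The sketch assumes the current logs are uncompressed files in a directory such as /opt/zeek/logs/current/ (an assumption, not stated above) and that GNU sha256sum is available on the sensor.

```shell
#!/bin/sh
# Hypothetical helper: list the log files in a directory that mention a
# given uid or fuid.
logs_mentioning() {
    id="$1"
    dir="$2"
    # -l prints only the names of files with a match; -F treats the ID as
    # a literal string rather than a regular expression.
    grep -lF -- "$id" "$dir"/*.log
}

# Hypothetical helper: succeed only if the file hashes to the expected
# SHA256 value recorded in files.log.
verify_sha256() {
    file="$1"
    expected="$2"
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}
```

For the example above, logs_mentioning CO3tTx2lknzNvQe7P3 /opt/zeek/logs/current would be expected to list conn.log, http.log, files.log, and notice.log if the event were still in the current logs.  Similarly, verify_sha256 /opt/zeek/extracted/HTTP-F1sCdV2rXJ9afKdlP2.exe followed by the sha256 value from files.log confirms the extracted copy is intact before it is handed off for deeper analysis.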

Troubleshooting

If you find that files aren’t properly captured in files.log or automatically extracted, there are two likely causes:

  1. You’re not actually performing full packet capture. In Part I of this series, we enabled network optimizations to ensure your sensor is performing full packet capture and not utilizing any “NIC offloading functions.”  Refer to the steps in the section titled “Enable network service and disable NIC offloading functions” and confirm they’re applied properly on your system.  Zeek will typically warn you in reporter.log if it believes that NIC offloading functions have not been disabled.
  2. You’re dropping packets.  This could be due to an underpowered Zeek sensor or an overwhelmed network mirror/TAP.  Make sure your Zeek sensor uses appropriately sized hardware for the traffic it’s monitoring and that your network mirror/TAP is capable of handling your network’s traffic volume.
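
To get a rough sense of whether packets are being dropped, the counters reported by zeekctl netstats can be turned into a percentage per worker.  The sketch below assumes each output line looks like "worker-1: <timestamp> recvd=N dropped=N link=N"; check the output format of your own ZeekControl version before relying on it.  The function name drop_percent is hypothetical, and the result is a simple dropped/recvd ratio, not an exact loss measurement.

```shell
#!/bin/sh
# Hypothetical helper: read netstats-style lines on stdin and print the
# dropped/recvd ratio as a percentage for each node.
drop_percent() {
    awk '{
        recvd = ""; dropped = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^recvd=/)   { recvd = substr($i, 7) }
            if ($i ~ /^dropped=/) { dropped = substr($i, 9) }
        }
        # Skip lines without counters and avoid dividing by zero.
        if (recvd != "" && recvd + 0 > 0)
            printf "%s %.2f\n", $1, 100 * dropped / recvd
    }'
}
```

Usage, as the zeek user: zeekctl netstats | drop_percent.  A consistently non-zero percentage points toward the sizing issues described above.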

Up Next

In Part VII of this series, we’ll look at how to analyze and gain visibility into encrypted traffic.