Zeekurity Zen – Part IV: Threat Hunting With Zeek
This is part of the Zeekurity Zen Zeries on building a Zeek (formerly Bro) network sensor.
In our Zeek journey thus far, we’ve:
- Set up Zeek to monitor some network traffic.
- Used Zeek Package Manager to install packages.
- Configured Zeek to send logs to Splunk for analysis.
So you’ve got all your Zeek logs going into your Splunk server. Now what?
This is the fun part — threat hunting. It’s where we realize the potential of combining Zeek’s rich network metadata with Splunk’s powerful analytics for incredible network visibility. Let’s go through several examples of actionable queries you can use today. These should get you started finding notable events in your own network and hopefully inspire you to develop your own useful queries.
Note that the queries below assume Splunk is indexing your Zeek logs to the zeek index and that all sourcetypes are prefixed with zeek_. Modify the search queries below as needed.
In conn.log you’ll find information on every TCP, UDP, and ICMP connection that Zeek observes. It’s similar to Cisco’s NetFlow and a great way to look for anomalous network events at a high level.
Connections To Destination Ports Above 1024
Recommended visualization: Column Chart
index=zeek sourcetype=zeek_conn id.resp_p>1024 | chart count over service by id.resp_p
Well-known service ports are below 1024, so connections to higher destination ports may be worth a closer look to confirm the traffic is expected. On my own network, I observed a high number of connections to destination port 5353, which turned out to be legitimate multicast DNS (mDNS) traffic. As you analyze your traffic, filter expected behavior out of your search query.
Top 10 Sources By Number Of Connections
Recommended visualization: Column Chart
index=zeek sourcetype=zeek_conn | top id.orig_h | head 10
This is a classic “top talkers” search query that reveals which systems originate the highest number of connections. If there are any clear outliers, you can dive deeper into the connections and systems involved.
Top 10 Sources By Bytes Sent
Recommended visualization: Statistics view
index=zeek sourcetype=zeek_conn | stats values(service) as Services sum(orig_bytes) as B by id.orig_h | sort -B | head 10 | eval MB = round(B/1024/1024,2) | eval GB = round(MB/1024,2) | rename id.orig_h as Source | fields Source B MB GB Services
This is also a “top talkers” search query, but this time we’re looking for systems that are sending high volumes of data. The resulting table displays the top 10 source IPs and the amount of data each one sent in bytes, megabytes, and gigabytes. The final column shows the services that were used.
Top 10 Destinations By Number Of Connections
Recommended visualization: Column Chart
index=zeek sourcetype=zeek_conn | top id.resp_h | head 10
This shows which systems were the top destinations on your network. You’d expect your domain controller and DNS server to appear. Again, look for systems that seem out of the ordinary or wouldn’t make sense to have high levels of traffic.
Top 10 Destinations By Bytes Received
Recommended visualization: Statistics view
index=zeek sourcetype=zeek_conn | stats values(service) as Services sum(orig_bytes) as B by id.resp_h | sort -B | head 10 | eval MB = round(B/1024/1024,2) | eval GB = round(MB/1024,2) | rename id.resp_h as Destination | fields Destination B MB GB Services
This is similar to the Top 10 Sources By Bytes Sent query but now focusing on the top destinations for large amounts of data transferred. You’d expect file servers to show up here. Workstations or unfamiliar public IPs would be worth looking further into.
Bytes Transferred Over Time By Service
Recommended visualization: Line Chart
index=zeek (sourcetype="zeek_conn" OR sourcetype="zeek_conn_long") | eval orig_gigabytes = orig_bytes/1024/1024/1024 | eval resp_gigabytes = resp_bytes/1024/1024/1024 | timechart span=1h sum(orig_gigabytes) AS "Outgoing Gigabytes", sum(resp_gigabytes) AS "Incoming Gigabytes" by service
This will help you easily spot any spikes in data transfers. Depending on what normal traffic levels look like on your network and the timeframe you’re analyzing, you’ll want to adjust the unit of measure and the span accordingly. For example, you could try span=1h for a 7 day timeframe or span=1d for a 30 day timeframe. You could also chart this by protocol or by source / destination. There’s a lot of flexibility depending on what you’re looking for.
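When converting units in a query like this, the order of rounding and summing matters. Here’s a minimal Python sketch of the arithmetic, using made-up byte counts (the `conn_bytes` list and the helper name are illustrative assumptions, not Zeek or Splunk output). Rounding each connection to gigabytes before summing can erase small flows entirely, while summing raw bytes first and converting once does not.

```python
# Hypothetical per-connection byte counts: 1,000 connections of 2 MiB each.
conn_bytes = [2 * 1024 * 1024] * 1000


def to_gb_rounded(num_bytes):
    """Convert bytes to gigabytes, rounding to 2 decimals at each step."""
    megabytes = round(num_bytes / 1024 / 1024, 2)
    return round(megabytes / 1024, 2)


# Rounding each connection first: 2 MiB is about 0.00195 GiB, which rounds to 0.0.
sum_of_rounded = sum(to_gb_rounded(b) for b in conn_bytes)

# Summing raw bytes first and converting once keeps the small flows.
rounded_sum = round(sum(conn_bytes) / 1024 ** 3, 2)

print(sum_of_rounded)  # 0.0
print(rounded_sum)     # 1.95
```

If your charted totals look lower than you expect, this is one thing worth checking.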
Many of the major Internet properties now enforce encryption by default. This is great from a privacy perspective but is often seen as a burden for security professionals who now lack visibility into this traffic. Fortunately, Zeek collects a wealth of information from the initial TLS handshake negotiation since it’s processed in cleartext. So while we can’t see everything about a TLS encrypted communication, we can still get a rough idea of what’s happening.
Rare JA3 Hashes
index=zeek sourcetype=zeek_ssl | rare ja3
Zeek can perform TLS fingerprinting through the use of the JA3 hash, developed by the Salesforce Engineering team. If you installed the ja3 package from Part II, you’ll see the ja3 field appear in your ssl.log. As more network traffic becomes encrypted, JA3 becomes an increasingly valuable way to identify malicious activity on your network without having to perform decryption. A great use of JA3 hashes is cross-referencing them against known malicious JA3 hashes.
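The cross-referencing idea can also be sketched outside Splunk. This is a minimal Python illustration, assuming ssl.log records have been parsed into dictionaries with a `ja3` key. The function name, sample records, and placeholder hashes are hypothetical; a real list of malicious hashes would come from a threat intelligence source you trust.

```python
def match_known_bad_ja3(ssl_records, known_bad):
    """Return ssl.log records whose ja3 hash appears in the known-bad set."""
    bad = {h.lower() for h in known_bad}
    return [rec for rec in ssl_records if rec.get("ja3", "").lower() in bad]


# Hypothetical sample data. The hashes are placeholders, not real indicators.
ssl_records = [
    {"id.orig_h": "10.0.0.5", "id.resp_h": "203.0.113.7", "ja3": "a" * 32},
    {"id.orig_h": "10.0.0.9", "id.resp_h": "198.51.100.2", "ja3": "b" * 32},
    {"id.orig_h": "10.0.0.9", "id.resp_h": "198.51.100.3"},  # no ja3 field logged
]
known_bad = {"B" * 32}  # case differences are normalized before matching

hits = match_known_bad_ja3(ssl_records, known_bad)
for rec in hits:
    print(rec["id.orig_h"], "->", rec["id.resp_h"], rec["ja3"])
```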
Related to the ssl.log, the x509.log captures the certificate information that a server presents during the TLS handshake. It’s yet another way to add context to encrypted traffic without having to perform full decryption.
index=zeek sourcetype=zeek_x509 | convert num(certificate.not_valid_after) AS cert_expire | eval current_time = now(), cert_expire_readable = strftime(cert_expire,"%Y-%m-%dT%H:%M:%S.%Q"), current_time_readable=strftime(current_time,"%Y-%m-%dT%H:%M:%S.%Q") | where current_time > cert_expire
Wouldn’t it be great to show all the connections on your network to systems with expired certificates? Lucky for us, Zeek automatically captures the dates (in UNIX time) between which a certificate is valid. With some Splunk magic we can take the current time at the start of our search and compare it to a certificate’s expiration date. Any certificate with an expiration date earlier than the current time has expired. The query above also adds human-readable versions of the certificate’s expiration date (cert_expire_readable) and current time (current_time_readable) so you can see for yourself.
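The comparison itself is simple. As a rough Python sketch of the same logic, assuming x509.log records parsed into dictionaries keyed by Zeek’s field names (the function name and sample records are made up for illustration):

```python
import time
from datetime import datetime, timezone


def expired_certs(x509_records, now=None):
    """Return records whose certificate.not_valid_after is earlier than now."""
    now = time.time() if now is None else now
    expired = []
    for rec in x509_records:
        not_after = float(rec["certificate.not_valid_after"])  # UNIX time
        if now > not_after:
            readable = datetime.fromtimestamp(not_after, tz=timezone.utc).isoformat()
            expired.append({**rec, "cert_expire_readable": readable})
    return expired


# Hypothetical sample records with UNIX-time expiration dates.
records = [
    {"certificate.subject": "CN=old.example", "certificate.not_valid_after": 1500000000.0},
    {"certificate.subject": "CN=new.example", "certificate.not_valid_after": 4102444800.0},
]

# A fixed "now" keeps the example reproducible; omit it to use the current time.
result = expired_certs(records, now=1700000000.0)
for rec in result:
    print(rec["certificate.subject"], rec["cert_expire_readable"])
```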
By far, my favorite log is dns.log, given how much insight you can gain from DNS data. Though there is a push to encrypt DNS, most DNS traffic is still unencrypted, and it remains one of the most effective data sources for detecting malicious activity on your network. Zeek’s dns.log captures just about everything you’d want to know about a DNS query and its response. This means that even if an attacker has set up a malicious website using HTTPS, you’ll be able to clearly identify the domain being used. Combine this with ssl.log and x509.log and you’ll still know a lot about what’s going on.
Before we get to the queries, I wanted to mention that Splunk runs a great security blog filled with useful information on how to get the most from your security data in Splunk. No, I’m definitely not saying that just because they featured this series as one of their September 2019 monthly security reading staff picks.😛 One of my favorite posts, Hunting Your DNS Dragons, is about collecting and analyzing DNS data for anomalies. Splunk’s examples use their Splunk Stream solution to collect and analyze logs. We won’t cover the equivalent Zeek-based queries here, but they should be easy to translate.
Large DNS Queries
index=zeek sourcetype=zeek_dns | eval query_length = len(query) | where query_length > 75 | table _time id.orig_h id.resp_h proto query query_length answer
This calculates the length of a DNS query and filters for any queries longer than a threshold you consider notable (e.g. one standard deviation above the mean query length on your network). Large queries may be an indicator of possible DNS tunneling or exfiltration. In the query above, I’ve defined anything with a length greater than 75 characters to be interesting, but you should adjust this to your own liking.
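One way to pick a threshold is to derive it from the data itself. The Python sketch below is an illustration rather than part of the query above: it assumes dns.log records parsed into dictionaries with a `query` key, and flags queries longer than the mean length plus a chosen number of (population) standard deviations. The function name and sample queries are hypothetical.

```python
from statistics import mean, pstdev


def long_queries(dns_records, num_stdevs=1.0):
    """Flag queries longer than mean + num_stdevs * standard deviation."""
    if not dns_records:
        return 0.0, []
    lengths = [len(rec.get("query", "")) for rec in dns_records]
    threshold = mean(lengths) + num_stdevs * pstdev(lengths)
    flagged = [rec for rec in dns_records if len(rec.get("query", "")) > threshold]
    return threshold, flagged


# Hypothetical sample: three ordinary lookups and one 80-character query.
dns_records = [
    {"id.orig_h": "10.0.0.5", "query": "example.com"},
    {"id.orig_h": "10.0.0.5", "query": "zeek.org"},
    {"id.orig_h": "10.0.0.7", "query": "splunk.com"},
    {"id.orig_h": "10.0.0.9", "query": "a" * 68 + ".example.com"},
]

threshold, flagged = long_queries(dns_records)
for rec in flagged:
    print(rec["id.orig_h"], len(rec["query"]))
```

On a real network you would compute the threshold over a much larger baseline, and a few extreme outliers can inflate the standard deviation, so treat the result as a starting point.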
Large DNS Responses
index=zeek sourcetype=zeek_dns | eval answer_length = len(answer) | where answer_length > 475 | table _time id.orig_h id.resp_h proto query answer answer_length
This calculates the length of the response (answer) to a DNS query and filters for any responses longer than a threshold you consider notable. In the query above, I’ve set the notable length to anything greater than 475, but feel free to experiment with this on your own network. I’ve personally used this query to identify a misconfigured enterprise DNS server operating as an open resolver on the public internet, which was ultimately being used as part of a DNS amplification attack. Yikes.
Query Responses With NXDOMAIN
index=zeek sourcetype=zeek_dns rcode_name=NXDOMAIN | table _time id.orig_h id.resp_h proto query
This identifies any DNS queries that result in a non-existent domain (NXDOMAIN) response. If you’re seeing a lot of these responses for a given system, it may be that its DNS settings are misconfigured or that it is trying to resolve a malicious domain that is no longer active. It may also be evidence of possible DNS exfiltration.
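To see which systems generate the most NXDOMAIN responses, count them per source. A minimal Python sketch, assuming dns.log records as dictionaries with `id.orig_h` and `rcode_name` keys (the function name and sample data are invented for illustration):

```python
from collections import Counter


def nxdomain_by_source(dns_records):
    """Count NXDOMAIN responses per originating host."""
    return Counter(
        rec["id.orig_h"]
        for rec in dns_records
        if rec.get("rcode_name") == "NXDOMAIN"
    )


# Hypothetical sample records.
dns_records = [
    {"id.orig_h": "10.0.0.5", "query": "a1.bad.example", "rcode_name": "NXDOMAIN"},
    {"id.orig_h": "10.0.0.5", "query": "a2.bad.example", "rcode_name": "NXDOMAIN"},
    {"id.orig_h": "10.0.0.5", "query": "a3.bad.example", "rcode_name": "NXDOMAIN"},
    {"id.orig_h": "10.0.0.7", "query": "typo.example", "rcode_name": "NXDOMAIN"},
    {"id.orig_h": "10.0.0.7", "query": "zeek.org", "rcode_name": "NOERROR"},
]

counts = nxdomain_by_source(dns_records)
for host, count in counts.most_common(10):
    print(host, count)
```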
Top 10 Responding Servers
index=zeek sourcetype=zeek_dns | chart count by id.resp_h | sort -count | head 10
This lists your top 10 DNS servers that are responding to queries on your network. Use this to spot any unauthorized DNS servers.
Top 10 Sources For Queries
index=zeek sourcetype=zeek_dns | chart count by id.orig_h | sort -count | head 10
This lists your top 10 clients making DNS queries on your network. If there’s a particular system making significantly more DNS queries than your other systems, it may be evidence of DNS tunneling or exfiltration.
Yep, Zeek also logs syslog (can it get more meta than this?) and will tell you not only which systems are sending and receiving syslog but the full syslog message itself. Depending on how much syslog goes across your network, this log can grow quickly, which may lead you to turn it off altogether. I leave it enabled to serve as more of a troubleshooting aid.
index=zeek sourcetype=zeek_syslog | where 'id.resp_h' != "YOUR_SYSLOG_IP_ADDRESS"
In a past corporate life, we configured all of our server and network infrastructure to send syslog to our Splunk server. Following the classic trust but verify model, I wanted to confirm this was actually true. Upon running the query, we discovered a number of legacy VoIP phones sending syslog to a public IP address belonging to an international fried chicken restaurant chain. Oops.
We’ve only scratched the surface of what’s possible. Consider the questions you have about your own network and how Zeek might help you answer them. Here are a few questions to get you started.
- What systems are using cleartext FTP? What data is being transferred?
- What kind of HTTP methods are most prevalent on my network?
- Is there an unauthorized internal mail relay in use?
- Is a server experiencing an SSH brute force attack?
- Are there suspicious user agents on my network?
- Are there older versions of TLS still in use?
In Part V of this series, we will discuss and set up Zeek’s Intelligence Framework.
4 thoughts on “Zeekurity Zen – Part IV: Threat Hunting With Zeek”
Hi dear Eric
Thank you for this complete step by step tutorial.
When I use the Top 10 Sources By Bytes Sent query, I get very large numbers that don’t look right. For example, I get 40 GB for a specific source, but my NetFlow analyzer shows 300 MB for that source. (This also happens with Top 10 Destinations By Bytes Received and Top 10 Destinations By Number Of Connections.)
Whats your opinion?
It could be any number of things. I’d start by verifying that your NetFlow analyzer is looking at the same traffic that Zeek is inspecting and logging to Splunk. For example, it could be that your NetFlow analyzer is only monitoring traffic from a source out to the internet, while Zeek is seeing that plus internal to internal traffic, etc.
Awesome post! Thanks for sharing!!!!
No problem, glad you liked it!