
How to Build Your Own DNS Sinkhole and DNS Logs Monitoring System

Updated October 26, 2023; blog post originally published on February 5, 2018 by Ben Hughes.

Pi-Hole Worldwide

Recently, I’ve been playing around with Pi-hole, an increasingly popular network adblocker designed to run on a Raspberry Pi. Pi-hole functions as your network’s DNS server, allowing it to block ad domains, malicious domains, and any other domains (or TLD wildcards) that you add to its block lists -- effectively turning it into an open source, lightweight DNS sinkhole. This blocking occurs at the network level, meaning blocked resources never even reach your endpoint’s browser. Along with caching, this can improve website load performance and block ads that are difficult to block client-side, such as in-app ads on Android and iOS devices.

Pi-hole also logs each DNS event, including domain resolutions and blocks. DNS logs are a gold mine that is sadly often overlooked by network defenders. Examples of malicious network traffic that can be identified in DNS logs include command and control (C2) traffic from a variety of malware including ransomware; malicious ads and redirects; exploit kits; phishing; typosquatting attacks; DNS hijacking; denial of service (DoS) attacks; and DNS tunneling.

While BIND and Windows DNS servers are perhaps more popular DNS resolver implementations, Pi-hole uses the very capable and lightweight dnsmasq as its DNS server. And while Pi-hole includes a nice web-based admin interface, I started to experiment with shipping its dnsmasq logs to the Elastic (AKA ELK) stack for security monitoring and threat hunting purposes. In the end, I quickly prototyped a Pi-hole based DNS sinkhole deployment, DNS log pipeline, and accompanying DNS log monitoring system thanks to Pi-hole’s dnsmasq implementation, the ELK (Elasticsearch, Logstash, and Kibana) stack, and Beats. This project is still a work in progress in my lab, but I thought I would share what I’ve learned so far. The steps are not difficult, but this guide assumes you have at least a basic familiarity with Linux commands, DNS logs, and the ELK stack.


Pi-hole is a DNS server / network adblocker / DNS sinkhole that is designed to run on minimal hardware, including the Raspberry Pi. If you plan to use a Raspberry Pi, keep in mind that the DNS logs must be shipped to Logstash using rsyslog (another blog post will provide the steps for this). We will be covering the setup process for both a Pi-hole VM and Raspberry Pi OS. I installed Pi-hole in an Ubuntu 22.04 Server VM. Typically, Pi-hole runs fine with only 1 CPU core and 512 MB RAM, though I allocated more to account for log shipping overhead. Pi-hole is suitable for SOHO and SMB networks, with reports of success on networks containing hundreds of endpoints.

Pi-hole installation and configuration are well documented elsewhere, so I won’t dwell on the details here. You can actually install Pi-hole with a one-line command (curl -sSL | bash), though of course it is always good security practice to review the script before executing it. Running the install script walks you through the initial setup, where you can assign a static IP address to the Pi-hole server, choose your upstream DNS resolution service (I recommend a security- and privacy-oriented option such as OpenDNS or Quad9), and enable the web admin interface.

Pi-hole: A Black Hole for Internet Advertisements

The admin password is displayed at the end of the install script, though you can always change it later. Once installed, you can review the excellent Pi-hole dashboard and take care of most administrative tasks by logging into its web interface at:

Pi-Hole Admin Dashboard

Once Pi-hole is up and running, you need to point your endpoints to your Pi-hole server’s IP address (which should be static) so that they will use the Pi-hole for DNS resolution going forward. You can set this manually per device, or configure most routers to use the Pi-hole as the DNS server. Beyond functioning as your network’s DNS server, Pi-hole (again thanks to dnsmasq) can also be a DHCP server. Each of these deployment options has pros and cons, such as endpoint IP visibility, so read Pi-hole’s relevant documentation for more details. By default, Pi-hole leverages several ad blocklists, though you are free to add your own lists, domains, or wildcards via the web interface or command line.

Pi-Hole Domain Management Screen

DNS Logs Pipeline

By default, Pi-hole stores its dnsmasq logs at /var/log/pihole/pihole.log. Beyond glancing at the Dashboard metrics and top lists, there are several ways to manually review these logs, including the “Query Log” area of the web interface, “Tail pihole.log” under Tools in the web interface, and directly via SSH access to the underlying server running Pi-hole (e.g., tail -f /var/log/pihole/pihole.log). Based on the Dashboard metrics, I know that on most days, my Pi-hole lab deployment blocks an average of about 10% of the total domain resolutions, and Windows 10 telemetry subdomains are often the most blocked DNS requests (which is great, because disabling such Win10 telemetry client-side ranges from extremely difficult to impossible). While such information is useful, we can ship these valuable DNS logs to a centralized location for log enrichment and monitoring purposes, including security analytics and threat hunting.

Below is an example of raw dnsmasq logs from pihole.log:

Oct 25 12:28:13 dnsmasq[616]: 195649 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195649 forwarded to
Oct 25 12:28:13 dnsmasq[616]: 195649 reply is <CNAME>
Oct 25 12:28:13 dnsmasq[616]: 195649 reply is
Oct 25 12:28:13 dnsmasq[616]: 195650 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195650 cached is
Oct 25 12:28:13 dnsmasq[616]: 195651 query[AAAA] from
Oct 25 12:28:13 dnsmasq[616]: 195651 forwarded to
Oct 25 12:28:13 dnsmasq[616]: 195651 reply is 2a04:4e42:77::773
Oct 25 12:28:13 dnsmasq[616]: 195652 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195652 gravity blocked is
Oct 25 12:28:13 dnsmasq[616]: 195653 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195653 gravity blocked is
Oct 25 12:28:13 dnsmasq[616]: 195654 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195654 gravity blocked is
Oct 25 12:28:13 dnsmasq[616]: 195655 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195655 gravity blocked is
Oct 25 12:28:13 dnsmasq[616]: 195656 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195656 gravity blocked is
Oct 25 12:28:13 dnsmasq[616]: 195657 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195657 gravity blocked is
Oct 25 12:28:13 dnsmasq[616]: 195658 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195658 forwarded to
Oct 25 12:28:13 dnsmasq[616]: 195659 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195659 gravity blocked is
Oct 25 12:28:13 dnsmasq[616]: 195658 reply is
Oct 25 12:28:13 dnsmasq[616]: 195661 query[A] from
Oct 25 12:28:13 dnsmasq[616]: 195661 gravity blocked is
Oct 25 12:28:13 dnsmasq[616]: 195662 query[AAAA] from
Oct 25 12:28:14 dnsmasq[616]: 195684 forwarded to
Oct 25 12:28:14 dnsmasq[616]: 195684 reply is
Oct 25 12:28:14 dnsmasq[616]: 195685 query[A] from
Oct 25 12:28:14 dnsmasq[616]: 195685 cached is
Oct 25 12:28:14 dnsmasq[616]: 195685 cached is
Oct 25 12:28:14 dnsmasq[616]: 195685 cached is
Oct 25 12:28:14 dnsmasq[616]: 195685 cached is
Oct 25 12:28:14 dnsmasq[616]: 195686 query[AAAA] from
Oct 25 12:28:14 dnsmasq[616]: 195686 forwarded to
Oct 25 12:28:14 dnsmasq[616]: 195686 reply is NODATA-IPv6
Oct 25 12:28:15 dnsmasq[616]: 195687 query[PTR] from
Oct 25 12:28:15 dnsmasq[616]: 195687 forwarded to
Oct 25 12:28:15 dnsmasq[616]: 195688 query[PTR] from
Oct 25 12:28:15 dnsmasq[616]: 195688 forwarded to
Oct 25 12:28:15 dnsmasq[616]: 195689 query[PTR] from
Oct 25 12:28:15 dnsmasq[616]: 195689 forwarded to
Oct 25 12:28:15 dnsmasq[616]: 195687 reply is NXDOMAIN
Oct 25 12:28:15 dnsmasq[616]: 195688 reply is NXDOMAIN

You can see from the sample logs that one of my lab machines is using IP, that I am using the relatively new Quad9 (9.9.9.9, get it?) as my upstream DNS provider in Pi-hole, and that there are multiple DNS requests that are probably related to Microsoft software.

These aren’t the prettiest types of logs I’ve ever seen -- essentially there are multiple lines for each type of DNS event -- but they get the job done and have a standardized syslog-style timestamp per line, which we’ll need for our log shipment pipeline to the ELK stack.
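Since each DNS event spans several lines tied together by dnsmasq’s per-query serial number, a little scripting can reassemble them. Below is a rough Python sketch of that idea; the domains and IPs are made-up placeholders, since real values vary per network:

```python
import re
from collections import defaultdict

# Regex approximating the pihole.log lines shown above. The serial number
# after the PID ties the multiple lines of one DNS event together.
LINE = re.compile(
    r"^(?P<logdate>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"dnsmasq\[(?P<pid>\d+)\]: (?P<serial>\d+) "
    r"(?P<action>query\[\w+\]|forwarded|reply|cached|gravity blocked) "
    r"(?P<domain>\S+) (?:is|from|to) (?P<value>\S+)$"
)

# Hypothetical sample lines; real pihole.log entries follow this shape.
sample = [
    "Oct 25 12:28:13 dnsmasq[616]: 195649 query[A] example.com from 192.168.1.50",
    "Oct 25 12:28:13 dnsmasq[616]: 195649 forwarded example.com to 9.9.9.9",
    "Oct 25 12:28:13 dnsmasq[616]: 195649 reply example.com is 93.184.216.34",
    "Oct 25 12:28:13 dnsmasq[616]: 195652 query[A] ads.example.net from 192.168.1.50",
    "Oct 25 12:28:13 dnsmasq[616]: 195652 gravity blocked ads.example.net is 0.0.0.0",
]

# Group the actions of each DNS event by its serial number.
events = defaultdict(list)
for line in sample:
    m = LINE.match(line)
    if m:
        events[m["serial"]].append(m["action"])

print(dict(events))
# {'195649': ['query[A]', 'forwarded', 'reply'], '195652': ['query[A]', 'gravity blocked']}
```

Grouping on the serial number like this is also the key to correlating lines that dnsmasq logs separately, such as a blocked domain and the client that requested it.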

At a high level, this represents the log shipment pipeline I set out to prototype:

Endpoints (client DNS requests) > Pi-hole (DNS server/sinkhole) > Filebeat (log shipper) > Logstash (log shaper) > Elasticsearch (log storage and indexing backend) and Kibana (log analysis frontend)

Essentially, the endpoints use Pi-hole as their DNS server. Pi-hole logs dnsmasq events including domain resolutions and blocklist matches to a local log file. I opted to use Filebeat, one of Elastic’s lightweight log shippers, directly on the Pi-hole server to ship those dnsmasq logs in real-time to a Logstash server. I created some custom configs for Logstash in order to implement basic field mappings, implement an accurate timestamp, and enrich the logs by adding GeoIP location lookups for external IP addresses from resolved domains. Logstash then ships those processed logs to a separate Elasticsearch server for storage and indexing, with Kibana serving as the frontend on the same server for manual searches, visualizations, and dashboards.

As an aside, one reason I found this project interesting is that there seems to be plentiful Internet chatter on working with BIND and Microsoft DNS logs, but not nearly as much about dnsmasq logs. That said, although the DNS log pipeline described here is designed for Pi-hole’s dnsmasq logs, it can be easily adapted for other types of DNS logs such as BIND and Microsoft.

Back to business. Let’s walk through each of the major parts of the DNS logs pipeline in more detail. This guide will not cover the installation and basic configuration of the ELK stack itself, as this is well documented elsewhere. For my testing, I installed Logstash on an Ubuntu 22.04 Server VM, and Elasticsearch and Kibana on a separate Ubuntu Server VM. My main advice for deploying ELK is to ensure you allocate plenty of RAM. Ensure that your Logstash, Elasticsearch, and Kibana servers are all operational and you know their static IPs before proceeding. For this project, I am using the 8.x versions of the ELK stack components.


First, we need to install Filebeat on the Pi-hole server. Note that while Pi-hole itself has minimal system requirements (it typically runs fine with 1 core and 512 MB RAM), running Filebeat on the same server will generate some performance overhead. In my case, I erred on the side of caution and allocated 2 cores and 2 GB RAM to the Pi-hole server to account for the Filebeat addition, but even that is likely overkill for a small deployment. CPU usage is minuscule, and total RAM utilization is typically <10% on my Pi-hole server.

Since I am using Ubuntu Server, I can manually wget and install a 64-bit DEB package, or follow Elastic’s instructions for installing from the official repo. The process would be the same for other Debian-based distros.

Once Filebeat is installed, I need to customize its filebeat.yml config file to ship Pi-hole’s logs to my Logstash server. You can either modify the default Filebeat input, which covers the default log location /var/log/*.log, to also include the pihole folder (/var/log/pihole/*.log), or specify /var/log/pihole/pihole.log to ship only Pi-hole’s dnsmasq logs. Keep in mind that Filebeat’s default paths config (“/var/log/*.log”) will not pick up Pi-hole’s logs, since they are located in a subfolder called ‘pihole’.

Filebeat default paths

We also need to point Filebeat to the Logstash server’s IP. I’m sticking with Logstash’s default port 5044.
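Putting both settings together, a minimal filebeat.yml sketch looks something like the following. The 192.0.2.10 address is a placeholder for your own Logstash server’s IP, and filestream is the input type used by Filebeat 8.x (older Filebeat versions used "log" inputs/prospectors instead):

```yaml
filebeat.inputs:
  # Ship only Pi-hole's dnsmasq log rather than all of /var/log/*.log
  - type: filestream
    id: pihole-dnsmasq
    paths:
      - /var/log/pihole/pihole.log

output.logstash:
  # Placeholder address -- use your Logstash server's static IP and port 5044
  hosts: ["192.0.2.10:5044"]
```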

Logstash Port 5044

Since I’m using Ubuntu 22.04 Server as the underlying OS for everything, the proper command to then start Filebeat manually is: sudo systemctl start filebeat. Filebeat will immediately start shipping the specified logs to Logstash. You can also configure Filebeat (as well as the ELK stack components) to start automatically on boot, e.g., sudo systemctl enable filebeat.

Filebeat Logstash config file

While Filebeat requires minimal configuration to get started, Logstash configuration is much more involved. For my DNS logs pipeline, I installed Logstash on a dedicated Ubuntu Server VM. I named my custom config file dnsmasq.conf and ended up writing my own grok pattern filters to match interesting dnsmasq logs in order to properly process and enrich them.

First, we specify the Logstash input in our custom config file, which is simply listening on its default port 5044 for logs shipped from Filebeat:

input {
  beats {
    port => 5044
    type => "logs"
    tags => ["pihole","5044"]
  }
}

Then we need to create a custom grok filter to match on the specific dnsmasq logs we are interested in. This has been the most time consuming part of this project, as there are multiple formats that dnsmasq logs take, and essentially a single DNS event gets broken into multiple lines. This is where I first learned about Grok Constructor, an extremely useful web-based tool for building and testing grok regular expression (regex) patterns. Through trial and error, I got a few basic matches working for DNS query and reply logs. There is clearly still work to be done; for example, a blacklisted domain and the originating client IP are logged on separate lines by dnsmasq (they are effectively separate logs), so addressing that remains on my to-do list.

filter {

  if "pihole" in [tags] {
    grok {
      patterns_dir => ["/etc/logstash/patterns/"]
      match => {
        "message" => [
          "%{logdate:LOGDATE} dnsmasq\[(?<dnsmasq>\d+)\]: (?<serial>\d+) (?<type>reply|cached|query|forwarded|query\[A\]|query\[AAAA\]|query\[HTTPS\]|query\[PTR\]|gravity blocked) %{domain:domain_request} (?<direction>is|from|to) %{IP:ip_response}",

          "%{logdate:LOGDATE} dnsmasq\[(?<dnsmasq>\d+)\]: (?<serial>\d+) (?<type>reply|cached|query|forwarded|query\[A\]|query\[AAAA\]|query\[HTTPS\]|query\[PTR\]|gravity blocked) %{domain:domain_request} (?<direction>is|from|to) %{IPV6:ip_response}",

          "%{logdate:LOGDATE} dnsmasq\[(?<dnsmasq>\d+)\]: (?<serial>\d+) (?<type>reply|cached|query|forwarded|query\[A\]|query\[AAAA\]|query\[HTTPS\]|query\[PTR\]) %{domain:domain_request} (?<direction>is|from|to) (?<ip_response>NODATA-IPv6|\<CNAME\>|NODATA|NXDOMAIN)"
        ]
      }
    }

    # to do: cached and cached reverse

    if [message] =~ "cached" and [message] =~ "NXDOMAIN" {
      mutate {
        add_tag => [ "cached NXDOMAIN" ]
      }
    } else if [message] =~ "NODATA" {
      mutate {
        add_tag => [ "NODATA" ]
      }
    } else if "reply" in [type] {
      mutate {
        add_tag => [ "reply" ]
      }
    }

    geoip {
      source => "ip_response"
      target => "ip_response_geo"
    }

    date {
      match => [ "LOGDATE", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss" ]
    }
  }
}
The above example grok patterns match the majority of distinct types of dnsmasq logs, including initial DNS queries, replies, and blocked requests.

You can see in my filter that I also specify a “patterns_dir”. In order to use custom patterns (which I have named the same as their respective fields in ALL CAPS) in a grok match, you must list them in a patterns file located in the specified directory.

The contents of my custom patterns file, which I simply saved to /etc/logstash/patterns/dnsmasq:

logdate [\w]{3}\s[\s\d]{2}\s\d\d\:\d\d\:\d\d
blocklist [\/\w\.]+
domain [\w\.\-]+
clientip \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
ip \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
ipv6 ([0-9]|[a-f]|[A-F]){0,4}:{1,2}
FQDN \b(?:[\w-][\w-]{0,62})(?:\.(?:[\w-][\w-]{0,62}))*(\.?|\b)
DNSMASQPREFIX %{SYSLOGTIMESTAMP:date} %{SYSLOGPROG}: %{INT:logrow} %{IP:source_host}\/%{POSINT:source_port}
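Since grok patterns are ordinary regexes under the hood, you can sanity-check them outside of Logstash before a full pipeline run. Here is a quick Python check of the logdate and domain patterns above (the test values are invented for illustration):

```python
import re

# Pattern definitions copied from the custom patterns file above.
logdate = r"[\w]{3}\s[\s\d]{2}\s\d\d\:\d\d\:\d\d"
domain = r"[\w\.\-]+"

# logdate must accept both two-digit days and single-digit days
# padded with an extra space, as syslog-style timestamps use both.
assert re.fullmatch(logdate, "Oct 25 12:28:13")
assert re.fullmatch(logdate, "Oct  5 12:28:13")

# A typical (made-up) domain from a query line.
assert re.fullmatch(domain, "telemetry.example.com")
print("patterns OK")
```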

Note that I have not finished writing and perfecting grok patterns for all possible dnsmasq log types and fields. There are a few types of dnsmasq logs that I still need to address, and I’m sure refinements are needed for the somewhat crude-but-effective patterns I did write, to account for things like odd characters in domain names. See the screenshots below of Grok Constructor in action:

Grok Constructor 1
Grok Constructor 2
Grok Constructor 3

One issue I quickly ran into during testing was that the @timestamp field did not match my LOGDATE field once the logs arrived in Elasticsearch for indexing. LOGDATE represents the original timestamp of the dnsmasq event, while the @timestamp added in Elasticsearch represents the time the log was successfully shipped into Elasticsearch, which typically lags slightly behind the LOGDATE. Fortunately, Logstash’s date filter plugin makes it easy to fix this as follows:

    date {
      match => [ "LOGDATE", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss" ]
    }

Essentially, the dnsmasq logs have two possible representations for their syslog-style timestamp field, which I have named LOGDATE: two-digit days, and single-digit days preceded by an extra space. The date filter above normalizes this so that the @timestamp field exactly matches the original corresponding LOGDATE field, and also appends the current year.
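As a small aside on why the year must be appended, this Python sketch (not part of the pipeline itself) shows what happens when a year-less syslog timestamp is parsed naively:

```python
from datetime import datetime

logdate = "Oct 25 12:28:13"

# Syslog-style timestamps carry no year, so strptime falls back to 1900;
# Logstash's date filter fills in the current year instead.
raw = datetime.strptime(logdate, "%b %d %H:%M:%S")
print(raw.year)  # 1900

normalized = raw.replace(year=datetime.now().year)
print(normalized.isoformat())
```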

For resolved domains, Pi-hole also logs the resulting IP addresses. Accordingly, we want to enrich our logs with GeoIP location data. Logstash’s geoip filter plugin makes this remarkably easy:

    geoip {
      source => "ip_response"
      target => "ip_response_geo"
    }

What does this accomplish? Whenever this filter identifies an IP address for a resolvable domain, it enriches the document with GeoIP location data by adding various fields (drawing from the bundled MaxMind GeoLite2 database) like so:

GeoIP and other Geo Fields
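For reference, the enriched document ends up with a nested object roughly like the following. The field names reflect a typical Logstash 8.x geoip result with ECS compatibility enabled; the values here are invented for illustration:

```json
"ip_response_geo": {
  "ip": "93.184.216.34",
  "geo": {
    "country_name": "United States",
    "country_iso_code": "US",
    "city_name": "Norwell",
    "location": { "lat": 42.15, "lon": -70.82 }
  }
}
```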

With this GeoIP data, we will be able to run searches and build Kibana visualizations, such as maps based on where IPs are geolocated. A detailed guide to creating Kibana visualizations with this GeoIP / geopoint data will follow in a future blog post.

Finally, we need to configure Logstash to send these freshly shaped and enriched logs to the Elasticsearch server. In my sample Logstash config, it looks like this (be sure to specify your own IP and index naming convention preference):

output {
  elasticsearch {
    hosts => ["xxx"]
    user => "elastic"
    password => "${ES_PWD}"
    ssl_enabled => true
    ssl_certificate_authorities => "xxx.crt"
    index => "pihole-%{+YYYY.MM.dd}"
  }
#  stdout { codec => rubydebug { metadata => true } }
}

Once the config file is ready, run Logstash and specify that it load our config file:

sudo bin/logstash -f dnsmasq.conf

Don’t be discouraged if Logstash throws an error related to your config file; read the error message carefully and fix your config accordingly. An errant or missing brace or other typo is usually to blame in my experience. You can also validate a config without starting the pipeline by adding the --config.test_and_exit flag.

Once Logstash is running, you should see something like the following, indicating that it is successfully listening on its default port for logs:

Logstash Successfully Listening

And if you enable the stdout output (shown commented out in the sample config), processed logs will be printed to the screen in real time. This is often helpful for debugging problems with your grok filter or other parts of your overall log pipeline.

Before getting to this point, you should have Elasticsearch and Kibana installed and running on a separate server with plenty of RAM allocated. To ensure that our log pipeline is working properly from end to end, query Elasticsearch from the command line or web browser to list the relevant indices (e.g., curl -u elastic 'https://<your-elasticsearch-ip>:9200/_cat/indices/pihole-*?v'):

Logstash Indices

Once that is done, you can finish setting up your index in Kibana and start reviewing logs. In Kibana, go to Management > Index Patterns and finish creating a new index pattern corresponding to the index naming convention you configured in Logstash.

Index Patterns in Kibana

Be sure to use the @timestamp field as the “Time Filter field name”, click “Create index pattern” and you are all set to start working with the logs in Kibana.

Logs in Kibana

In my next post, I’ll share some sample Kibana searches, visualizations, and dashboards that make good use of our new and improved Pi-hole DNS logs for security monitoring and analytics. This includes a component template and ingest pipeline to add a GeoIP geopoint field for visualizations. In addition, that post will walk through the steps for shipping logs from a Raspberry Pi based Pi-hole instance into Logstash.

I’ll also share additional lessons learned and recommended next steps for this project. In the meantime, you can find my sample configs on GitHub, with the caveat that they should still be considered mostly in beta stage at this point.

Polito - Cybersecurity Consulting

Polito Inc. offers a wide range of security consulting services including penetration testing, vulnerability assessments, red team assessments, incident response, digital forensics, threat hunting, and more. If your business or your clients have any cybersecurity needs, contact our experts and experience what Masterful Cyber Security is all about.

Phone: 571-969-7039



