
A WHOIS + DNS scraper for a list of domains - what to run before you inherit a portfolio

Before you migrate a website, take over an account, or pitch a multi-domain consolidation, you need to know what you're actually inheriting. This post covers a small Python script that takes a list of domains and writes back the registrar, nameservers, MX records, A records, TXT records, expiration dates, and creation dates for each one.

The day-one move on any new client engagement that involves DNS or domains: run a registrar and DNS audit on everything they own.

Two reasons. First, you find out what you're inheriting - sometimes that includes expired-registration risks, domains pointed at servers nobody remembers, or nameservers at a registrar nobody on the team has credentials for. Second, you find out what's missing from the inventory - for example, the company's primary brand variant still registered to an ex-employee.

The big tools will do this for you for a price. The Python version takes 100 lines and runs against an arbitrary CSV of domains.

What it returns

For each domain in the input list, the script writes back:

  • DNS records: A records (IP addresses), MX records (mail exchange), TXT records (SPF, DKIM, DMARC, verification tags)
  • Socket-level DNS info: full getaddrinfo output including resolved IPs, address family, and socket type
  • WHOIS data: registrar, WHOIS server, creation date, expiration date, nameservers, registrant info, admin contact, tech contact
  • HTTP reachability: a quick requests.get to confirm the domain actually serves a response

Output is JSON. One record per domain. The script handles failures gracefully - a domain that doesn't resolve or has WHOIS privacy enabled doesn't break the run; it gets an "error" field in its record and the script moves to the next.
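A minimal sketch of that failure handling, assuming each lookup is a plain callable that takes a domain string. The names audit_domain, audit_all, and the lookups mapping are illustrative and not taken from the actual script, and the "error" field here is a dict keyed by lookup name, which is one possible shape among several.

```python
def audit_domain(domain, lookups):
    """Run every lookup for one domain; a failing lookup is recorded, not raised."""
    record = {"domain": domain}
    for name, fn in lookups.items():
        try:
            record[name] = fn(domain)
        except Exception as exc:
            # Broad on purpose: one bad domain must not stop the whole run.
            record.setdefault("error", {})[name] = f"{type(exc).__name__}: {exc}"
    return record


def audit_all(domains, lookups):
    """Return one record per input domain, in input order."""
    return [audit_domain(domain, lookups) for domain in domains]
```

Passing the lookups in as a mapping keeps the loop independent of whichever WHOIS or DNS library sits behind each callable, and json.dumps(audit_all(...), default=str) is enough to write the result out.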

The libraries doing the actual work

Three Python modules do most of the heavy lifting, two third-party packages and one from the standard library:

  • whois for the registrar/WHOIS lookup. The package wraps the underlying WHOIS protocol and normalizes responses across different registrars (which all return slightly different field names for the same data).
  • dnspython (imported as dns.resolver) for the A/MX/TXT record lookups. The standard library's socket module can resolve a hostname to IP addresses, but it has no way to query MX or TXT records.
  • socket for the low-level resolved-IP info that dns.resolver doesn't provide.

That's it. No paid API, no API keys, no quota to manage. Everything runs against public DNS servers and public WHOIS databases, although WHOIS servers do throttle aggressive clients.
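For the socket part, a sketch of how the getaddrinfo output might be reduced to something JSON-friendly. The function name socket_info and the choice of fields are assumptions for illustration; the real script may keep the raw tuples.

```python
import socket


def socket_info(host, port=None):
    """Return getaddrinfo results as a list of small dicts."""
    results = []
    for family, socktype, _proto, _canonname, sockaddr in socket.getaddrinfo(host, port):
        results.append({
            # Usually enum members; fall back to str() if the platform returns a bare int.
            "family": getattr(family, "name", str(family)),
            "socktype": getattr(socktype, "name", str(socktype)),
            "ip": sockaddr[0],
        })
    return results
```

The same IP normally shows up several times, once per socket type, which is why the raw output looks noisier than a plain A-record lookup.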

The failure modes you'll hit

The first time you run this on 50 domains you'll see all of them:

WHOIS rate limits. WHOIS servers are slow and rate-limited. Running 50 lookups in a tight loop will get you throttled by some TLDs. The fix is a small time.sleep(2) between queries or a smarter exponential backoff on errors.
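A sketch of the backoff variant, assuming the WHOIS lookup is any callable that raises when it gets throttled. with_backoff and its parameters are hypothetical names; the sleep argument exists only so the delay can be swapped out when testing.

```python
import time


def with_backoff(fn, *args, retries=4, base_delay=2.0, sleep=time.sleep):
    """Call fn(*args); on failure wait base_delay, then 2x, 4x, ... and retry."""
    for attempt in range(retries + 1):
        try:
            return fn(*args)
        except Exception:
            if attempt == retries:
                raise  # out of retries: let the caller record the error
            sleep(base_delay * (2 ** attempt))
```

With the defaults, a domain that keeps failing costs 2 + 4 + 8 + 16 seconds before the error surfaces, so it is worth keeping retries low on large lists.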

WHOIS privacy. Most domains today are registered with privacy enabled - the registrant info is hidden behind a privacy service like Domains by Proxy or WhoisGuard. The script doesn't try to defeat privacy; it just records "Privacy Protected or Unavailable" and moves on.
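One way that placeholder logic could look. The marker list is illustrative only (real privacy services use many more strings), and registrant_or_placeholder is a hypothetical helper name.

```python
PLACEHOLDER = "Privacy Protected or Unavailable"
# Illustrative substrings only; extend for the privacy services you actually encounter.
PRIVACY_MARKERS = ("privacy", "redacted", "domains by proxy", "whoisguard")


def registrant_or_placeholder(value):
    """Return the registrant as text, or the placeholder if it is missing or masked."""
    if not value:
        return PLACEHOLDER
    text = str(value)
    if any(marker in text.lower() for marker in PRIVACY_MARKERS):
        return PLACEHOLDER
    return text
```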

DNS providers that lie. Some DNS providers return cached/stale records or hijack NXDOMAIN responses with a "did you mean..." landing page (looking at you, certain ISPs). Running through a known-clean resolver (Cloudflare's 1.1.1.1, Google's 8.8.8.8) is more reliable than the default system resolver.
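A sketch of pinning the lookups to those public resolvers with dnspython 2.x, rather than whatever the machine is configured to use. make_resolver and dns_records are assumed names; dnspython is imported inside the function so the rest of the code loads even where the package is not installed.

```python
PUBLIC_RESOLVERS = ("1.1.1.1", "8.8.8.8")


def make_resolver(nameservers=PUBLIC_RESOLVERS, lifetime=5.0):
    """Build a dnspython resolver that ignores the system resolver configuration."""
    import dns.resolver  # third-party: pip install dnspython

    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = list(nameservers)
    resolver.lifetime = lifetime  # total seconds allowed per query
    return resolver


def dns_records(resolver, domain, rtypes=("A", "MX", "TXT")):
    """Look up each record type; a failed type is recorded instead of raised."""
    out = {}
    for rtype in rtypes:
        try:
            out[rtype] = [rdata.to_text() for rdata in resolver.resolve(domain, rtype)]
        except Exception as exc:  # NXDOMAIN, no answer, timeout, ...
            out[rtype] = {"error": type(exc).__name__}
    return out
```

Usage would be along the lines of dns_records(make_resolver(), "example.com"). Recording the exception class name per record type keeps "no MX record" distinguishable from "domain does not exist".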

Domains that fail to resolve at all. Expired domains, parked domains, and domains pointed at dead nameservers will throw errors. Each error is caught in a try/except and logged - they don't stop the run.

What you actually use the output for

The audit produces a JSON file. The actionable parts:

Expiration dates. Sort by expiration_date ascending and the top of the list is whatever's expiring soonest. On most engagements, you'll find at least one domain inside 90 days of expiration that nobody at the client knew about. Catching it early is the value.
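A sketch of that sort, assuming each record carries "domain" and an "expiration_date" stored as a timezone-free ISO 8601 string (the actual script's date format may differ). expiring_soon is a hypothetical helper; already-expired domains are deliberately included.

```python
from datetime import datetime, timedelta


def expiring_soon(records, today, window_days=90):
    """Return (domain, expiration) pairs inside the window, soonest first."""
    cutoff = today + timedelta(days=window_days)
    dated = []
    for rec in records:
        raw = rec.get("expiration_date")
        if not raw:
            continue  # no WHOIS date recorded; review these by hand
        dated.append((datetime.fromisoformat(raw), rec["domain"]))
    dated.sort()
    return [(domain, exp.date().isoformat()) for exp, domain in dated if exp <= cutoff]
```

Records with no expiration date are skipped rather than sorted to the end, so it is worth listing them separately: a missing date is itself a finding.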

Nameserver consolidation. Group by nameservers and you see the registrar/DNS sprawl. A typical mid-sized company will have nameservers across three or four providers because nobody's ever consolidated. The audit makes the sprawl visible and the consolidation conversation possible.
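The grouping itself is a few lines. This sketch assumes "nameservers" is a list of hostnames (or missing) on each record, and normalizes case and trailing dots so the same nameserver set always lands in the same group.

```python
from collections import defaultdict


def group_by_nameservers(records):
    """Map each distinct nameserver set to the domains that use it."""
    groups = defaultdict(list)
    for rec in records:
        nameservers = rec.get("nameservers") or []
        key = tuple(sorted(ns.lower().rstrip(".") for ns in nameservers))
        groups[key].append(rec["domain"])
    return dict(groups)
```

Domains with no nameservers at all end up under the empty tuple, which makes them easy to pull out for a closer look.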

SPF/DKIM/DMARC gaps. The TXT records column shows which domains have email auth set up and which don't. Email-sending domains without SPF/DMARC are vulnerable to spoofing and tend to have poor deliverability. The audit surfaces these immediately.
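One caveat when reading the TXT data: SPF is published at the domain itself, but DMARC lives at _dmarc.<domain> and DKIM at <selector>._domainkey.<domain>, so a TXT lookup at the apex alone will not show them. The sketch below assumes the audit has collected TXT strings from both the apex and the _dmarc name; email_auth_flags is a hypothetical helper, and DKIM is left out because the selector cannot be guessed.

```python
def email_auth_flags(apex_txt, dmarc_txt):
    """apex_txt: TXT strings at the domain; dmarc_txt: TXT strings at _dmarc.<domain>."""
    def clean(value):
        # DNS libraries often return TXT data wrapped in double quotes.
        return value.strip().strip('"').lower()

    return {
        "spf": any(clean(t).startswith("v=spf1") for t in apex_txt),
        "dmarc": any(clean(t).startswith("v=dmarc1") for t in dmarc_txt),
    }
```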

Orphan/forgotten domains. Domains registered to ex-employees, pointed at servers that no longer exist, or held in registrar accounts the company can't access. These are usually the most uncomfortable findings - and the most valuable to find before they bite.

Why JSON output, not CSV

For most reporting work I default to CSV because Excel is universal. For domain audits the output is genuinely tree-shaped - WHOIS returns nested fields, DNS records are lists, socket info is arrays-of-arrays. Flattening that into a CSV loses information.

JSON keeps the structure intact. A follow-up script (or a few minutes in jq) can pull the specific fields the audit conversation needs into a flat table when that's what the client wants to see.
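A sketch of such a follow-up script. The field names in FIELDS are assumptions about the JSON layout (the real output nests WHOIS data more deeply); list values are joined with "; " so each domain stays on one row.

```python
import csv
import io

# Assumed top-level keys; adjust to match the actual JSON records.
FIELDS = ["domain", "registrar", "expiration_date", "nameservers", "a_records"]


def to_csv(records, fields=FIELDS):
    """Flatten selected fields of each record into CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for rec in records:
        row = {}
        for field in fields:
            value = rec.get(field)
            if isinstance(value, list):
                value = "; ".join(str(item) for item in value)
            row[field] = "" if value is None else value
        writer.writerow(row)
    return buf.getvalue()
```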

What I would change

A few next-iteration improvements:

Parallel WHOIS queries. Sequential WHOIS is slow because each query is gated on a remote server. Threading the WHOIS lookups (with conservative concurrency to avoid rate limits) would cut a 50-domain audit from 4 minutes to 30 seconds.
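A sketch of what the threaded version might look like, using the standard library's ThreadPoolExecutor. The lookup argument stands in for whatever WHOIS call the script uses, and max_workers=4 is a guess at "conservative", not a measured value.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_all(domains, lookup, max_workers=4):
    """Run lookup(domain) across a small thread pool, keeping input order."""
    def safe(domain):
        try:
            return {"domain": domain, "whois": lookup(domain)}
        except Exception as exc:
            return {"domain": domain, "error": f"{type(exc).__name__}: {exc}"}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, domains))  # map() yields results in input order
```

Catching the exception inside the worker matters here: an exception raised inside pool.map would otherwise surface when the results are consumed and abort the rest of the run.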

SSL certificate inspection. While we're already fetching each domain, capturing the SSL cert's issuer, expiration, and SAN (Subject Alternative Names) would add another dimension to the audit. Cert expiration is another "we didn't know" landmine on most client engagements.
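The standard library can already do this. A sketch using ssl and socket, with summarize_cert and fetch_cert as hypothetical helper names and port 443 assumed:

```python
import socket
import ssl
from datetime import datetime, timezone


def summarize_cert(cert):
    """Reduce the dict returned by SSLSocket.getpeercert() to audit fields."""
    # issuer is a tuple of RDNs, each a tuple of (key, value) pairs.
    issuer = dict(pair for rdn in cert.get("issuer", ()) for pair in rdn)
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )
    return {
        "issuer": issuer.get("organizationName") or issuer.get("commonName"),
        "expires": expires.date().isoformat(),
        "san": [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"],
    }


def fetch_cert(host, port=443, timeout=5.0):
    """Connect over TLS with default verification and summarize the peer certificate."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return summarize_cert(tls.getpeercert())
```

Because the default context verifies the certificate, an expired or self-signed cert raises ssl.SSLCertVerificationError instead of returning data. For an audit that is useful in itself, as long as the error is caught and recorded like any other failure.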

Email deliverability scoring. SPF, DKIM, and DMARC presence is recorded but not interpreted. A scoring layer ("this domain has SPF but no DMARC and is sending mail - here's the spoofing risk") would turn raw records into a triage report.
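A deliberately crude sketch of what a first scoring pass might be. The levels and rules are illustrative placeholders rather than an established scoring scheme, and sends_mail is assumed to come from somewhere else in the audit (for instance, whether the domain has MX records).

```python
def triage(has_spf, has_dmarc, sends_mail):
    """Return (level, reason) from SPF/DMARC presence; rules are illustrative."""
    checks = (("SPF", has_spf), ("DMARC", has_dmarc))
    missing = [name for name, present in checks if not present]
    if not missing:
        return ("ok", "SPF and DMARC are both published")
    level = "high" if sends_mail else "medium"
    return (level, "missing " + " and ".join(missing))
```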

Auto-renewal status from registrar APIs. WHOIS tells you when a domain expires. It doesn't tell you whether auto-renewal is on. For registrars with APIs (Cloudflare, Google Domains, etc.) the script could check renewal status and surface "domain expires in 21 days, auto-renewal is OFF" as a separate flag.

But for the actual question - "what do they own, where is it registered, when does each one expire, and what's pointed where?" - the current script answers it with one CSV in and one JSON out, in under five minutes. Run it on day one of any DNS-adjacent engagement. The findings pay for the engagement.

Source: github.com/schandler7171/portfolio-example-scripts/tree/main/standalone-scripts