Friday reports, automated - GA4, Search Console, and a GPT-4 commentary layer

Every agency I have ever worked at or for has the same Friday morning ritual. Somebody scrambles to pull last week's numbers for every client, drops them into a template, writes the same three paragraphs of "highlights and observations" they wrote last week with three numbers swapped, and clicks send. It is the kind of work that takes four hours and nobody wants to do.

So I built a Python pipeline that does it in twelve seconds.

The pieces are not exotic. GSC's API, GA4's API, GPT-4, Gmail, some HTML templates. What is interesting is what each piece is responsible for, and the rule I made for myself about where AI gets to touch the workflow.

The shape

Six steps, run sequentially every Friday at 6 AM:

Pull the last 30 days of GSC data per client (clicks, impressions, CTR, average position).
Pull the matching window from GA4 (new users, engaged sessions, engagement rate, average session duration).
Calculate the comparisons: 30 days vs prior 30 days, and 30 days vs same period last year.
Send each client's data block to GPT-4 with a prompt that asks for a performance summary in plain English.
Generate charts for the metrics that move.
Stitch summary, charts, and raw tables into an HTML email and send via Gmail API.

The whole thing finishes before the coffee maker.

What AI does, and what it does not do

The rule that took longest to land: AI gets to write the commentary, not the numbers.

The actual metric values, the comparisons, the growth percentages - those are pulled from APIs and computed by Pandas. They go straight from the source of truth to the report without an LLM in between. Nothing GPT outputs can become a number the client sees.

What GPT does is read those numbers and write the "what this means" paragraph. "Organic clicks were up 14% vs the prior 30 days, driven by improved rankings on long-tail variants of the priority keyword. Average position improved by 1.2, the largest single-week movement we have seen this quarter." That kind of sentence used to take me ten minutes to write per client. Now it costs about 200 tokens.

The rule matters because the failure modes are different. If GPT hallucinates a number, the client trusts it, and you find out two weeks later. If GPT writes a slightly awkward sentence about a real number, the worst case is mild embarrassment.

The multi-client config

A surprisingly large fraction of the build was just "make it work for many clients without me touching anything per client." Two files in /config:

clients.xlsx: one row per client with their GSC property URL, GA4 property ID, and a display name.
recipients.csv: one row per email recipient mapped to a client.

Adding a new client is a two-line edit in those files. No code changes, no redeploy, no testing. The pipeline iterates over the client list, runs the six steps for each, and emails the right people. If anything fails for a single client - bad credentials, GSC sync delay, GA4 quota hit - that client gets skipped and logged, the rest proceed.

The boring infrastructure decisions that paid off

Three things I would do again on day one:

OAuth refresh tokens, not credentials in env vars. Google's OAuth flow gives you a refresh token. Store that, not the username and password. Refresh tokens basically last forever as long as you do not revoke them. No re-auth interruptions in the middle of a Friday morning.

Charts as PNG, uploaded to SFTP, embedded by URL in the HTML email. Inline base64 images in HTML emails break Outlook in ways that make you want to throw your laptop out the window. Hosting the PNG and referencing it by URL has worked flawlessly across Gmail, Apple Mail, and Outlook for two years.

A "dry run" mode. python src/main.py --dry-run runs the whole pipeline but skips sending email. Lets me test a new client onboarding without spamming them with seven test reports.

The OpenAI API moment that taught me a lesson

The first version of the prompt asked GPT-4 to "write an SEO performance summary." It came back with something that sounded great and contained two completely fabricated metrics. The pipeline shipped it. A client called me about a "23% organic CTR increase" that did not exist anywhere in the GSC data.

The fix was to restructure the prompt so the numbers were explicit:

Here are this client's GSC metrics for the last 30 days, vs the prior 30 days:

- Clicks: 4,521 (up 14.2% from 3,961)
- Impressions: 142,883 (up 8.1% from 132,144)
- CTR: 3.16% (up 5.6% from 2.99%)
- Average position: 11.3 (improved by 1.2 from 12.5)

Write a 2-paragraph performance summary based ONLY on these numbers.
Do not invent metrics. Do not project. Reference only the values listed above.
If a number does not appear in the data, do not mention it.

That last line - "if a number does not appear in the data, do not mention it" - was the one that fixed it. LLMs are pleasers. They will fill in plausible-looking values to round out a paragraph if you let them. Tell them explicitly not to.

What I would change

The 30-day window is fine for steady performance. It is too long for catching sudden problems. The next iteration adds a 7-day check that runs daily, just looking for anomalies, and emails me, not the client, so I can dig in before the Friday report.

I would also separate the data layer from the presentation layer entirely. Right now the pipeline pulls, computes, and renders in sequence. Splitting "pull and store in SQLite" from "read SQLite and render report" would let me re-render last month's report without re-fetching the data, which is useful for both debugging and demonstrating to a new client what the reports will look like.

But the core - APIs in, comparisons computed locally, GPT writes the prose, HTML out, send - has held up across dozens of weekly cycles. It is the rare automation project where the maintenance time is genuinely close to zero.

Source: github.com/schandler7171/friday-reports-tool