Building an Automated Content Pipeline with AI
How I built a fully automated system that collects trending tech data from GitHub, LinkedIn, and newsletters, then uses AI to generate daily blog posts in Portuguese.
The Problem
As a Brazilian platform engineer working in the US, I wanted to help the Brazilian developer community stay on top of global tech trends without having to scroll through multiple feeds every day.
The solution: Café com Dopamina — an automated content platform that aggregates data from 70+ sources and generates daily episode-style blog posts in Portuguese.
The Architecture
The pipeline runs entirely on GitHub Actions:
- Data Collection: daily cron jobs scrape GitHub Trending, monitor LinkedIn engagement, and pull from RSS/Reddit/HN feeds
- Digest Building: a Python script aggregates the last 24 hours of data into a structured JSON digest with themes, top repos, and community insights
- Content Generation: the digest feeds into GPT-4o via the GitHub Models API (zero extra secrets; it uses the built-in GITHUB_TOKEN)
- Publishing: the generated episode is pushed as a PR to the blog repo for human review, then auto-deployed to Vercel on merge
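The digest-building step can be sketched in a few lines of Python. This is a minimal illustration, not the actual script: the field names (`collected_at`, `source`, `title`) and the digest shape are assumptions, since the post doesn't show the real schema.

```python
import json
from datetime import datetime, timedelta, timezone

def build_digest(items, now=None):
    """Aggregate the last 24 hours of collected items into a digest dict.

    `items` is assumed to be a list of dicts with an ISO-8601 `collected_at`
    timestamp, a `source` name, and a `title` -- hypothetical field names.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    # Keep only items collected in the last 24 hours
    recent = [
        it for it in items
        if datetime.fromisoformat(it["collected_at"]) >= cutoff
    ]
    digest = {
        "generated_at": now.isoformat(),
        "item_count": len(recent),
        "by_source": {},
    }
    # Group item titles by their originating source
    for it in recent:
        digest["by_source"].setdefault(it["source"], []).append(it["title"])
    return digest

if __name__ == "__main__":
    now = datetime(2024, 6, 2, 12, 0, tzinfo=timezone.utc)
    items = [
        {"collected_at": "2024-06-02T09:00:00+00:00",
         "source": "github_trending", "title": "repo-a"},
        {"collected_at": "2024-05-30T09:00:00+00:00",
         "source": "hn", "title": "old-post"},
    ]
    print(json.dumps(build_digest(items, now), indent=2))
```

The key design point is that the digest is plain JSON: anything that feeds an LLM prompt downstream benefits from a stable, structured intermediate format rather than raw scraped HTML.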
Technical Highlights
- Cross-repo automation: One workflow in the data repo pushes content to the blog repo and opens a PR
- Anti-detection scraping: LinkedIn data is collected via Playwright with a full stealth stack
- SEO-first: Server-rendered Next.js, JSON-LD structured data, dynamic OG images via Satori
- Human-in-the-loop: AI generates, but every episode is reviewed before publishing
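The JSON-LD part of the SEO setup is straightforward to sketch. This is an illustrative example only, assuming a schema.org `BlogPosting` type; the real site is Next.js, where an object like this would be serialized into a `<script type="application/ld+json">` tag, and the exact fields used are not shown in the post.

```python
import json

def episode_json_ld(title, description, published, url):
    """Build a schema.org BlogPosting JSON-LD payload for an episode page.

    Hypothetical helper: field choices are assumptions, not the site's code.
    """
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": title,
        "description": description,
        "datePublished": published,  # ISO-8601 date string
        "url": url,
        "inLanguage": "pt-BR",  # episodes are written in Portuguese
    }

if __name__ == "__main__":
    payload = episode_json_ld(
        "Episódio de hoje",
        "Resumo diário de tendências tech",
        "2024-06-02",
        "https://example.com/episodios/hoje",
    )
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Serializing with `ensure_ascii=False` keeps Portuguese accented characters readable in the emitted markup.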
What I Learned
Building this taught me that the hard part isn't the AI — it's the data pipeline. Getting clean, structured data from diverse sources is 80% of the work. The LLM is just the last mile.