Building an Automated Content Pipeline with AI
How I built a fully automated system that collects trending tech data from GitHub, LinkedIn, and newsletters, then uses AI to generate daily blog posts in Portuguese.
The Problem
As a Brazilian platform engineer working in the US, I wanted to help the Brazilian developer community stay on top of global tech trends without having to scroll through multiple feeds every day.
The solution: Café com Dopamina — an automated content platform that aggregates data from 70+ sources and generates daily episode-style blog posts in Portuguese.
The Architecture
The pipeline runs entirely on GitHub Actions:
- Data Collection: daily cron jobs scrape GitHub Trending, monitor LinkedIn engagement, and pull from RSS/Reddit/HN feeds
- Digest Building: a Python script aggregates the last 24 hours of data into a structured JSON digest with themes, top repos, and community insights
- Content Generation: the digest feeds into GPT-4o via the GitHub Models API (zero extra secrets; it uses the built-in GITHUB_TOKEN)
- Publishing: the generated episode is pushed as a PR to the blog repo for human review, then auto-deployed to Vercel on merge
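The digest-building step can be sketched in a few lines of Python. This is a minimal illustration, not the actual script: the field names (`collected_at`, `source`, `title`) and the digest shape are assumptions, since the post doesn't show the real schema.

```python
import json
from datetime import datetime, timedelta, timezone

def build_digest(items, now=None):
    """Aggregate the last 24 hours of collected items into a digest dict.

    `items` is assumed to be a list of dicts with an ISO-8601 `collected_at`
    timestamp, a `source` name, and a `title` -- hypothetical field names.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    # Keep only items collected in the last 24 hours
    recent = [
        it for it in items
        if datetime.fromisoformat(it["collected_at"]) >= cutoff
    ]
    digest = {
        "generated_at": now.isoformat(),
        "item_count": len(recent),
        "by_source": {},
    }
    # Group item titles by their originating source
    for it in recent:
        digest["by_source"].setdefault(it["source"], []).append(it["title"])
    return digest

if __name__ == "__main__":
    now = datetime(2024, 6, 2, 12, 0, tzinfo=timezone.utc)
    items = [
        {"collected_at": "2024-06-02T09:00:00+00:00",
         "source": "github_trending", "title": "repo-a"},
        {"collected_at": "2024-05-30T09:00:00+00:00",
         "source": "hn", "title": "old-post"},
    ]
    print(json.dumps(build_digest(items, now), indent=2))
```

The key design point is that the digest is plain JSON: anything that feeds an LLM prompt downstream benefits from a stable, structured intermediate format rather than raw scraped HTML.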
Technical Highlights
- Cross-repo automation: One workflow in the data repo pushes content to the blog repo and opens a PR
- Anti-detection scraping: LinkedIn data is collected via Playwright with a full stealth stack
- SEO-first: Server-rendered Next.js, JSON-LD structured data, dynamic OG images via Satori
- Human-in-the-loop: AI generates, but every episode is reviewed before publishing
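The JSON-LD part of the SEO setup is straightforward to sketch. This is an illustrative example only, assuming a schema.org `BlogPosting` type; the real site is Next.js, where an object like this would be serialized into a `<script type="application/ld+json">` tag, and the exact fields used are not shown in the post.

```python
import json

def episode_json_ld(title, description, published, url):
    """Build a schema.org BlogPosting JSON-LD payload for an episode page.

    Hypothetical helper: field choices are assumptions, not the site's code.
    """
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": title,
        "description": description,
        "datePublished": published,  # ISO-8601 date string
        "url": url,
        "inLanguage": "pt-BR",  # episodes are written in Portuguese
    }

if __name__ == "__main__":
    payload = episode_json_ld(
        "Episódio de hoje",
        "Resumo diário de tendências tech",
        "2024-06-02",
        "https://example.com/episodios/hoje",
    )
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Serializing with `ensure_ascii=False` keeps Portuguese accented characters readable in the emitted markup.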
What I Learned
Building this taught me that the hard part isn't the AI — it's the data pipeline. Getting clean, structured data from diverse sources is 80% of the work. The LLM is just the last mile.