
Building an Automated Content Pipeline with AI

2 min read
via Café com Dopamina

How I built a fully automated system that collects trending tech data from GitHub, LinkedIn, and newsletters, then uses AI to generate daily blog posts in Portuguese.

AI · Automation · GitHub Actions · Content Generation

The Problem

As a Brazilian platform engineer working in the US, I wanted to help the Brazilian developer community stay on top of global tech trends without having to scroll through multiple feeds every day.

The solution: Café com Dopamina — an automated content platform that aggregates data from 70+ sources and generates daily episode-style blog posts in Portuguese.

The Architecture

The pipeline runs entirely on GitHub Actions:

  1. Data Collection — Daily cron jobs scrape GitHub Trending, monitor LinkedIn engagement, and pull from RSS/Reddit/HN feeds
  2. Digest Building — A Python script aggregates the last 24 hours of data into a structured JSON digest with themes, top repos, and community insights (see the first sketch after this list)
  3. Content Generation — The digest feeds into GPT-4o via the GitHub Models API (zero extra secrets — uses the built-in GITHUB_TOKEN); see the second sketch after this list
  4. Publishing — The generated episode is pushed as a PR to the blog repo for human review, then auto-deployed to Vercel on merge
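
Here's a minimal sketch of the digest-building step. The directory layout (data/collected/*.json), the field names (collected_at, tags, source), and the exact grouping are assumptions for illustration, not the actual script:

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Assumed layout: each collector drops a JSON array of records into data/collected/,
# and each record carries an ISO-8601 "collected_at" timestamp.
COLLECTED_DIR = Path("data/collected")
WINDOW = timedelta(hours=24)


def load_recent_items(now: datetime) -> list[dict]:
    """Gather every record collected within the last 24 hours."""
    items = []
    for path in COLLECTED_DIR.glob("*.json"):
        for record in json.loads(path.read_text(encoding="utf-8")):
            collected_at = datetime.fromisoformat(record["collected_at"])
            if now - collected_at <= WINDOW:
                items.append(record)
    return items


def build_digest(now: datetime) -> dict:
    """Group the day's items into the structured digest the generator consumes."""
    items = load_recent_items(now)
    return {
        "date": now.date().isoformat(),
        "themes": sorted({tag for item in items for tag in item.get("tags", [])}),
        "top_repos": [i for i in items if i.get("source") == "github_trending"][:10],
        "community": [i for i in items if i.get("source") in ("reddit", "hn", "rss")],
    }


if __name__ == "__main__":
    digest = build_digest(datetime.now(timezone.utc))
    Path("digest.json").write_text(json.dumps(digest, ensure_ascii=False, indent=2))
```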
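
And a sketch of the generation call. GitHub Models exposes an OpenAI-compatible endpoint that accepts the workflow's GITHUB_TOKEN as the API key; the base URL, model id, and the Portuguese system prompt below are assumptions based on the public docs at the time of writing, not the pipeline's actual prompt:

```python
import json
import os
from pathlib import Path

from openai import OpenAI  # GitHub Models speaks the OpenAI chat completions API

# The GITHUB_TOKEN that GitHub Actions injects doubles as the API key.
# Base URL and model id follow the GitHub Models docs at the time of writing;
# check the current docs, since both have changed over time.
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)

digest = json.loads(Path("digest.json").read_text(encoding="utf-8"))

# Hypothetical prompt: ask for a conversational daily episode in Portuguese.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "Você é o editor do Café com Dopamina. Escreva o episódio de hoje "
                "em português, em tom de conversa, a partir do digest fornecido."
            ),
        },
        {"role": "user", "content": json.dumps(digest, ensure_ascii=False)},
    ],
    temperature=0.7,
)

Path("episode.md").write_text(response.choices[0].message.content, encoding="utf-8")
```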

Technical Highlights

  • Cross-repo automation: One workflow in the data repo pushes content to the blog repo and opens a PR (sketched after this list)
  • Anti-detection scraping: LinkedIn data is collected via Playwright with a full stealth stack (a minimal version is sketched after this list)
  • SEO-first: Server-rendered Next.js, JSON-LD structured data, dynamic OG images via Satori
  • Human-in-the-loop: AI generates, but every episode is reviewed before it's published
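
A rough sketch of the cross-repo publishing step. The default GITHUB_TOKEN is scoped to the repo the workflow runs in, so writing to the blog repo needs a token with access there (a fine-grained PAT or GitHub App token); the repo name, paths, and secret name below are placeholders:

```python
import base64
import os

import requests

# Assumptions: CONTENT_PAT is a secret with write access to the blog repo, and the
# repo, branch, and path names here are placeholders, not the real ones.
API = "https://api.github.com"
BLOG_REPO = "owner/blog-repo"
HEADERS = {
    "Authorization": f"Bearer {os.environ['CONTENT_PAT']}",
    "Accept": "application/vnd.github+json",
}


def open_episode_pr(branch: str, path: str, markdown: str) -> str:
    """Create a branch on the blog repo, commit the episode, and open a PR for review."""
    # Branch off the current tip of main.
    main = requests.get(f"{API}/repos/{BLOG_REPO}/git/ref/heads/main", headers=HEADERS)
    main.raise_for_status()
    sha = main.json()["object"]["sha"]
    requests.post(
        f"{API}/repos/{BLOG_REPO}/git/refs",
        headers=HEADERS,
        json={"ref": f"refs/heads/{branch}", "sha": sha},
    ).raise_for_status()

    # Commit the generated episode file to the new branch.
    requests.put(
        f"{API}/repos/{BLOG_REPO}/contents/{path}",
        headers=HEADERS,
        json={
            "message": f"Add episode: {path}",
            "content": base64.b64encode(markdown.encode("utf-8")).decode("ascii"),
            "branch": branch,
        },
    ).raise_for_status()

    # Open the PR so a human can review before the merge triggers the Vercel deploy.
    pr = requests.post(
        f"{API}/repos/{BLOG_REPO}/pulls",
        headers=HEADERS,
        json={"title": f"New episode: {path}", "head": branch, "base": "main"},
    )
    pr.raise_for_status()
    return pr.json()["html_url"]
```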
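
And a stripped-down illustration of the scraping setup. The real pipeline uses a fuller stealth stack plus an authenticated session; the browser flag, user agent, and init script below are only the most common first-line evasions, shown as an assumption of what such a setup looks like:

```python
from playwright.sync_api import sync_playwright

# Illustrative user agent; the real value should match the launched browser version.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        # Stops Chromium from advertising that it is automation-controlled.
        args=["--disable-blink-features=AutomationControlled"],
    )
    context = browser.new_context(
        user_agent=USER_AGENT,
        viewport={"width": 1366, "height": 768},
        locale="en-US",
    )
    page = context.new_page()
    # Hide the webdriver flag that many bot detectors check first.
    page.add_init_script(
        "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
    )
    page.goto("https://www.linkedin.com/feed/")  # placeholder target
    print(page.title())
    browser.close()
```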

What I Learned

Building this taught me that the hard part isn't the AI — it's the data pipeline. Getting clean, structured data from diverse sources is 80% of the work. The LLM is just the last mile.