Information is the power of the 21st century, but curation is the key to focus. By building an automated newsletter generator, you transform the noisy web into a personalized stream of high-value intelligence.
1. The RSS Backbone
RSS (Really Simple Syndication) is the quiet engine of the internet. While social media algorithms decide what you see, RSS feeds give you direct, unfiltered access to a website's published content. Most major blogs, news outlets, and podcasts publish one. In n8n, the RSS Feed Trigger node acts as your sentry: it polls your configured feeds on a schedule, compares entries against stored state, and surfaces only the items that are genuinely new since the last run.
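To make the "only new items" idea concrete, here is a minimal sketch of that comparison in plain JavaScript. It is an illustration, not n8n's internal implementation: the `filterNewItems` helper and the use of `link` as the unique key are assumptions made for this example.

```javascript
// Minimal sketch: keep only items whose link has not been seen before.
// `filterNewItems` and the Set-based state are hypothetical, for illustration.
function filterNewItems(items, seenLinks) {
  const fresh = [];
  for (const item of items) {
    if (!seenLinks.has(item.link)) {
      seenLinks.add(item.link); // remember it so the next run skips it
      fresh.push(item);
    }
  }
  return fresh;
}

// Two consecutive "runs" that share the same stored state.
const seen = new Set();
const run1 = filterNewItems(
  [{ title: 'A', link: 'https://example.com/a' }],
  seen
);
const run2 = filterNewItems(
  [
    { title: 'A', link: 'https://example.com/a' },
    { title: 'B', link: 'https://example.com/b' },
  ],
  seen
);
console.log(run1.length, run2.map((item) => item.title)); // run2 holds only item B
```

If you ever need this logic yourself inside a Code node, the set of seen links would have to be persisted between executions, for example with n8n's workflow static data; the trigger node already handles this for you.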
The power of RSS for research automation is control. You define exactly which sources matter to your domain, and no recommendation algorithm influences what enters your pipeline. Over time, a well-curated RSS list becomes one of your most valuable professional assets.
Practically, you'll aggregate feeds from 5-20 sources into a single workflow. The n8n RSS node returns structured metadata for each entry: title, link, published date, and a short description. That description is rarely the full article, so you'll need a scraper for the rest.
// n8n RSS Feed Trigger (runs daily at 7am)
// Sources monitored:
const feeds = [
  'https://techcrunch.com/feed/',
  'https://feeds.arstechnica.com/arstechnica/index',
  'https://www.wired.com/feed/rss'
];
// Output per article (simplified):
const exampleItem = {
  title: 'OpenAI releases GPT-5...',
  link: 'https://techcrunch.com/2024/...',
  pubDate: '2024-03-15T08:30:00Z',
  contentSnippet: 'The new model achieves...'
};
2. Full-Text Scraping
The RSS snippet is rarely enough for meaningful summarization. It's usually 100-200 characters of teaser text. To get the full article body, your workflow must visit the URL and extract the content. This is full-text scraping.
In n8n, the HTTP Request node fetches the raw HTML. A Code node or the HTML Extract node then parses it to strip boilerplate: navigation menus, ads, footers, cookie banners. What you want is just the <article> or <main> tag content. Libraries like cheerio make this straightforward, provided your n8n instance allows the Code node to load external modules.
Why bother? Because the AI summarization step is only as good as its input. Feed it a 3,000-word article about semiconductor geopolitics and it produces a tight 5-sentence insight. Feed it a 200-character RSS teaser and it produces filler. Clean full-text is the difference between a useful digest and a worthless one.
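One way to act on this is to check how much text you actually extracted before spending an LLM call on it. The sketch below is an assumption-laden illustration: the `MIN_WORDS` threshold of 150 and the helper names are hypothetical, so tune them to your own sources.

```javascript
// Hypothetical threshold: treat anything under ~150 words as teaser-only.
const MIN_WORDS = 150;

// Count whitespace-separated words; an empty string counts as zero.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Decide whether the extracted text is long enough to summarize.
function isWorthSummarizing(text, minWords = MIN_WORDS) {
  return countWords(text) >= minWords;
}

const teaser = 'The new model achieves strong results on several benchmarks.';
const fullArticle = Array(400).fill('word').join(' ');
console.log(isWorthSummarizing(teaser), isWorthSummarizing(fullArticle));
```

Articles that fail the check can be skipped or flagged for a retry, rather than padded into a filler summary.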
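// n8n Code node ("Run Once for Each Item" mode): extract article body
// Assumes external modules are allowed and the page HTML is in json.html
const cheerio = require('cheerio');
const $ = cheerio.load($input.item.json.html);
// Remove noise, including script/style tags whose contents .text() would include
$('script, style, nav, footer, aside, .ads, .cookie-banner').remove();
// Extract main content from the first matching container to avoid duplicated text
const articleText = $('article, main, .post-content')
  .first()
  .text()
  .trim()
  .replace(/\s+/g, ' ');
return { json: { text: articleText } };
3. AI Summarization & Digest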
With clean full-text from each article, you send it to an LLM with a structured summarization prompt. The key is to enforce a template: Summary (2 sentences), Key Data Point (1 statistic or fact), Industry Impact (1 sentence). This makes every entry in your digest consistent and scannable, so readers can process 20 items in 5 minutes.
After summarizing all articles, you aggregate the results into a single HTML email using an HTML template. Group by topic, rank by relevance score (another LLM call), and send via Gmail or SendGrid on a weekly schedule.
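The grouping and ranking step can be sketched in a few lines of JavaScript. This is one possible shape, not a prescribed one: the `buildDigestHtml` and `escapeHtml` helpers and the entry fields (`topic`, `relevance`, `summary`) are assumptions chosen for this example.

```javascript
// Escape text before placing it in HTML so titles cannot break the markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a simple digest: one heading per topic, entries sorted by relevance.
function buildDigestHtml(entries) {
  const byTopic = new Map();
  for (const entry of entries) {
    if (!byTopic.has(entry.topic)) byTopic.set(entry.topic, []);
    byTopic.get(entry.topic).push(entry);
  }
  let html = '<h1>Weekly Digest</h1>';
  for (const [topic, items] of byTopic) {
    items.sort((a, b) => b.relevance - a.relevance); // highest relevance first
    html += `<h2>${escapeHtml(topic)}</h2><ul>`;
    for (const item of items) {
      html += `<li><a href="${escapeHtml(item.link)}">${escapeHtml(item.title)}</a>: ${escapeHtml(item.summary)}</li>`;
    }
    html += '</ul>';
  }
  return html;
}

// Sample entries (made up for illustration).
const digest = buildDigestHtml([
  { topic: 'AI', title: 'Low', link: 'https://example.com/1', summary: 'x', relevance: 2 },
  { topic: 'AI', title: 'High & mighty', link: 'https://example.com/2', summary: 'y', relevance: 9 },
  { topic: 'Chips', title: 'Fab news', link: 'https://example.com/3', summary: 'z', relevance: 5 },
]);
console.log(digest);
```

Escaping matters here because article titles come from third-party feeds and may contain characters such as `&` or `<`.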
The output is a digest that reads like it was hand-curated by someone who read everything. Except it cost you zero minutes of reading time and runs automatically every Friday morning before you open your laptop.
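// Summarization prompt
// Assumes the previous node output the extracted body as json.text
const articleText = $input.item.json.text;
const prompt = `
Summarize this article in a structured format.
Article:
${articleText}
Respond with JSON only, using exactly these keys:
{
  "summary": "2 sentence summary",
  "keyFact": "1 key statistic or fact",
  "impact": "1 sentence on industry implications"
}
`;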
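Models do not always return clean JSON, so it helps to parse the reply defensively before it reaches your digest. The following is a minimal sketch under stated assumptions: `parseSummary` is a hypothetical helper, and the three required keys mirror the prompt template above.

```javascript
// Parse the model's reply, tolerating extra prose around the JSON object.
// Returns the parsed object, or null if the reply is unusable.
function parseSummary(reply) {
  const start = reply.indexOf('{');
  const end = reply.lastIndexOf('}');
  if (start === -1 || end <= start) return null;
  let parsed;
  try {
    parsed = JSON.parse(reply.slice(start, end + 1));
  } catch (err) {
    return null; // malformed JSON: let the caller retry or skip the article
  }
  // Every field from the template must be a non-empty string.
  const required = ['summary', 'keyFact', 'impact'];
  const ok = required.every(
    (key) => typeof parsed[key] === 'string' && parsed[key].trim() !== ''
  );
  return ok ? parsed : null;
}

// Sample reply (made up) with stray text before the JSON.
const reply = 'Here you go:\n{"summary": "A. B.", "keyFact": "42%", "impact": "C."}';
console.log(parseSummary(reply));
console.log(parseSummary('not json')); // null
```

Returning `null` instead of throwing keeps one bad reply from failing the whole run; the workflow can filter out null results before building the email.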