The data you need is sitting right there on a webpage, but getting it into your workflow feels impossible. Product prices, contact information, article headlines, table data. You can see it in your browser, but extracting it programmatically has always required coding skills or expensive third-party tools.
The HTML node changes that. It transforms raw HTML into structured JSON data using CSS selectors, the same targeting system browsers use to style web pages. Combined with the HTTP Request node for fetching pages, you can scrape data from virtually any website without writing code.
The Web Scraping Challenge
Web scraping sounds simple until you try it:
- Websites structure their HTML differently, with no standard format
- The data you want is buried inside nested elements
- Class names and IDs vary between sites (and sometimes between page loads)
- Some content only appears after JavaScript executes
The HTML node handles the parsing side of this equation. Once you understand CSS selectors and the node’s configuration options, you can extract almost any visible text or attribute from a webpage.
What You’ll Learn
- How to use all three HTML node operations: extraction, template generation, and table conversion
- CSS selector fundamentals for targeting specific page elements
- Finding reliable selectors using browser developer tools
- Handling common extraction failures and edge cases
- Building maintainable scraping workflows that survive website changes
- Real-world examples for e-commerce, content aggregation, and report parsing
When to Use the HTML Node
The HTML node serves three distinct purposes. Understanding which operation you need prevents wasted time.
| Scenario | Operation | Why |
|---|---|---|
| Scrape product prices from a website | Extract HTML Content | CSS selectors target specific elements |
| Pull article titles and links from a news site | Extract HTML Content | Extract multiple values per page |
| Parse table data from an HTML report | Extract HTML Content | Tables are just nested HTML elements |
| Generate dynamic email templates | Generate HTML Template | Merge workflow data into HTML structure |
| Create HTML reports from JSON data | Convert to HTML Table | Automatically formats data as tables |
| Process HTML files from disk or email | Extract HTML Content | Works with binary HTML files too |
Rule of thumb: Use “Extract HTML Content” for web scraping and parsing. Use the other operations for generating HTML output from your workflow data.
When Not to Use the HTML Node
The HTML node has specific limitations:
| Limitation | Alternative |
|---|---|
| JavaScript-rendered content (SPAs) | Headless browser services or community nodes |
| Complex PDF parsing | Extract from File node or OCR services |
| API data (already JSON) | Process JSON directly with Edit Fields |
| Structured XML feeds | XML parsing nodes or Code node |
| Login-protected pages | HTTP Request with session cookies first |
Understanding the Three Operations
The HTML node offers three operations, each serving a different purpose.
Extract HTML Content
This is the primary web scraping operation. It parses HTML content and extracts specific data using CSS selectors.
Input: HTML content (from HTTP Request, file, or previous node)
Output: Extracted values as JSON properties
Use cases:
- Scraping prices, titles, descriptions from websites
- Extracting links and their text content
- Parsing structured data from HTML tables
- Pulling metadata from web pages
Generate HTML Template
This operation creates HTML output by merging workflow data into a template. You write HTML with n8n expressions embedded, and the node renders the final output.
Input: JSON data from previous nodes
Output: Rendered HTML string
Use cases:
- Creating dynamic email bodies
- Generating HTML reports
- Building HTML snippets for further processing
- Creating formatted output for webhooks
Quick example:
```html
<h1>Order Confirmation</h1>
<p>Hi {{ $json.customerName }},</p>
<p>Your order #{{ $json.orderId }} for {{ $json.itemCount }} items
totaling ${{ $json.total }} has been confirmed.</p>
```
The node replaces expressions with actual values from your workflow data.
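For example, given a hypothetical input item of `{ "customerName": "Ada", "orderId": 1042, "itemCount": 3, "total": 89.97 }`, the rendered output would be:
```html
<h1>Order Confirmation</h1>
<p>Hi Ada,</p>
<p>Your order #1042 for 3 items
totaling $89.97 has been confirmed.</p>
```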
Convert to HTML Table
This operation transforms JSON array data into an HTML table format automatically, without writing any HTML.
Input: JSON array (multiple items)
Output: HTML table string with headers based on JSON keys
Use cases:
- Converting spreadsheet-style data to HTML
- Creating simple HTML reports
- Formatting data for email or display
Quick example:
If your input is:
```json
[
  { "name": "Widget A", "price": "$10", "stock": 50 },
  { "name": "Widget B", "price": "$25", "stock": 12 }
]
```
The node outputs a complete HTML table with name, price, and stock columns, ready to embed in emails or reports.
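The generated markup looks roughly like this (the exact attributes and styling n8n applies may vary by version):
```html
<table>
  <thead>
    <tr><th>name</th><th>price</th><th>stock</th></tr>
  </thead>
  <tbody>
    <tr><td>Widget A</td><td>$10</td><td>50</td></tr>
    <tr><td>Widget B</td><td>$25</td><td>12</td></tr>
  </tbody>
</table>
```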
Your First HTML Extraction
Let’s build a complete scraping workflow step by step.
Step 1: Fetch the Web Page
First, use the HTTP Request node to retrieve the page’s HTML:
- Add an HTTP Request node to your workflow
- Set Method to GET
- Enter a URL (for testing, use `https://books.toscrape.com/`)
- Click Test step
The node returns the page’s HTML content in the response body.
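If you inspect the output item, the markup typically lands in a single `data` property, roughly like this (truncated for readability):
```json
{
  "data": "<!DOCTYPE html>\n<html lang=\"en-us\">\n<head>\n<title>All products | Books to Scrape</title>..."
}
```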
Step 2: Add the HTML Node
- Add an HTML node after HTTP Request
- Set Operation to “Extract HTML Content”
- For Source Data, select “JSON”
- Set JSON Property to the field containing your HTML (typically `data` from HTTP Request)
Step 3: Configure Extraction Values
Now define what to extract. Click Add Value in the Extraction Values section:
Extracting a page title:
| Setting | Value |
|---|---|
| Key | pageTitle |
| CSS Selector | h1 |
| Return Value | Text |
Extracting a product price:
| Setting | Value |
|---|---|
| Key | price |
| CSS Selector | .price_color |
| Return Value | Text |
Step 4: Test and Verify
Click Test step. The output should contain your extracted values:
```json
{
  "pageTitle": "All products",
  "price": "£51.77"
}
```
You can now use these values in subsequent nodes with expressions like `{{ $json.pageTitle }}`.
CSS Selectors: The Complete Guide
CSS selectors are patterns that identify HTML elements. The same selectors used to style webpages also work for extraction.
Finding Selectors with Browser DevTools
The fastest way to find the right selector:
- Open the webpage in Chrome (or any browser)
- Right-click the element you want to extract
- Select Inspect to open DevTools
- In the Elements panel, right-click the highlighted HTML
- Choose Copy > Copy selector
This gives you a precise selector for that element. However, auto-generated selectors are often overly specific. You may need to simplify them.
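For example, Copy selector might hand you a long position-based chain when a short class-based selector reaches the same element. The exact chain below is illustrative; the simplified version relies on the page marking prices with a `price_color` class, as books.toscrape.com does:
```css
/* Auto-generated: breaks if any wrapper element moves */
#default > div > div > div > section > div:nth-child(2) > ol > li:nth-child(1) > article > div.product_price > p.price_color

/* Simplified: targets the same element by its meaningful class */
.price_color
```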
Basic Selector Patterns
| Selector | Matches | Example |
|---|---|---|
| `tagname` | All elements of that type | `h1` matches all `<h1>` elements |
| `.classname` | Elements with that class | `.product-title` matches `<div class="product-title">` |
| `#idname` | Element with that ID | `#main-content` matches `<div id="main-content">` |
| `tag.class` | Tag with specific class | `p.description` matches `<p class="description">` |
| `parent child` | Descendant elements | `div p` matches `<p>` inside `<div>` |
| `parent > child` | Direct children only | `ul > li` matches immediate `<li>` children |
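To make these patterns concrete, here is a small hypothetical snippet with matching selectors noted in comments:
```html
<div id="main-content">                         <!-- #main-content -->
  <ul class="product-list">                     <!-- ul, .product-list -->
    <li class="product featured">               <!-- li.featured, ul > li -->
      <p class="description">A fine widget</p>  <!-- p.description, div p -->
    </li>
  </ul>
</div>
```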
Attribute Selectors
Target elements by their attributes:
| Selector | Matches | Use Case |
|---|---|---|
| `[attr]` | Has attribute | `[href]` matches all links |
| `[attr="value"]` | Exact attribute value | `[type="email"]` matches email inputs |
| `[attr^="start"]` | Attribute starts with | `[href^="https"]` matches HTTPS links |
| `[attr$="end"]` | Attribute ends with | `[href$=".pdf"]` matches PDF links |
| `[attr*="contains"]` | Attribute contains | `[class*="price"]` matches classes containing “price” |
Combining Selectors
Build precise selectors by combining patterns:
```css
/* Element with multiple classes */
div.product.featured

/* Element with class inside another element */
article.post h2.title

/* Element with specific attribute inside class */
.product-card a[href*="/product/"]

/* Multiple comma-separated selectors */
h1, h2, h3
```
Pseudo-Selectors (Position-Based)
Select elements by their position:
| Selector | Matches |
|---|---|
| `:first-child` | First child element |
| `:last-child` | Last child element |
| `:nth-child(n)` | nth child (1-indexed) |
| `:nth-child(odd)` | Odd-numbered children |
| `:nth-child(even)` | Even-numbered children |
Example: .product-list li:first-child selects the first product in a list.
Important: Some advanced pseudo-selectors like :nth-child(n+4) may not work as expected in n8n. Test thoroughly before relying on complex selectors.
CSS Selector Quick Reference
| Goal | Selector |
|---|---|
| All paragraphs | p |
| Element by ID | #header |
| Element by class | .product-name |
| All links | a |
| Links whose URL contains “product” | a[href*="product"] |
| First item in list | ul li:first-child |
| All table rows | table tr |
| Specific data attribute | [data-product-id] |
| Multiple selectors | h1, h2, h3 |
For comprehensive CSS selector documentation, see MDN’s CSS Selectors guide.
Extraction Configuration Deep Dive
Understanding every configuration option prevents extraction failures.
Source Data Options
| Option | When to Use |
|---|---|
| JSON | HTML is in a JSON property (most common with HTTP Request) |
| Binary | HTML is in a binary property (file upload, attachment) |
For HTTP Request responses, use JSON and specify the property name containing HTML (usually data).
Return Value Types
This critical setting determines what gets extracted:
| Return Value | Extracts | Example Output |
|---|---|---|
| Text | Text content only (no HTML tags) | "Product Name" |
| HTML | Inner HTML including child elements | "<span>Product</span> Name" |
| Attribute | Value of a specific attribute | "/products/123" (from href) |
Text is the default and works for most extractions. Use Attribute when you need link URLs, image sources, or data attributes.
Attribute extraction example:
| Setting | Value |
|---|---|
| CSS Selector | a.product-link |
| Return Value | Attribute |
| Attribute | href |
Extraction Options
| Option | Effect |
|---|---|
| Trim Values | Removes whitespace from start and end |
| Clean Up Text | Removes line breaks and condenses multiple spaces |
| Return Array | Returns all matches as array instead of first match only |
Return Array is essential when scraping lists. Without it, you only get the first matching element.
Example: To extract all product names on a page:
| Setting | Value |
|---|---|
| Key | productNames |
| CSS Selector | .product-name |
| Return Value | Text |
| Return Array | Enabled |
Output:
```json
{
  "productNames": ["Widget A", "Widget B", "Widget C"]
}
```
Multiple Extraction Values
Add multiple extraction values to pull different data in a single node:
| Key | CSS Selector | Return Value |
|---|---|---|
| `title` | `h1.product-title` | Text |
| `price` | `.current-price` | Text |
| `imageUrl` | `.product-image img` | Attribute (`src`) |
| `description` | `.product-description` | Text |
| `rating` | `.star-rating` | Attribute (`class`) |
All values appear in the same output object.
Real-World Scraping Examples
Example 1: E-commerce Product Scraping
Scenario: Extract product information from an online store.
Workflow:
Schedule Trigger → HTTP Request (product page) → HTML Extract → Edit Fields (clean data) → Airtable/Sheets
HTTP Request:
- URL: `https://store.example.com/products/widget-123`
- Method: GET
HTML Extraction Values:
| Key | CSS Selector | Return Value |
|---|---|---|
| `productName` | `h1.product-title` | Text |
| `currentPrice` | `.price-current` | Text |
| `originalPrice` | `.price-original` | Text |
| `availability` | `.stock-status` | Text |
| `productImage` | `.product-gallery img` | Attribute (`src`) |
Post-processing with Edit Fields:
```
// Clean price (remove currency symbol, convert to number)
{{ parseFloat($json.currentPrice.replace(/[^0-9.]/g, '')) }}

// Calculate discount percentage (run after both prices have been cleaned to numbers)
{{ Math.round((1 - $json.currentPrice / $json.originalPrice) * 100) }}%
```
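If you prefer to keep the cleanup in one place, a Code node (set to run once for all items) can do both steps. This is a sketch assuming the extraction keys from the table above:
```javascript
// Clean both prices and derive the discount in a single pass.
return $input.all().map(item => {
  const toNumber = s => parseFloat(String(s).replace(/[^0-9.]/g, ''));
  const current = toNumber(item.json.currentPrice);
  const original = toNumber(item.json.originalPrice);

  item.json.currentPrice = current;
  item.json.originalPrice = original;
  // Guard against a missing or zero original price
  item.json.discountPercent = original > 0
    ? Math.round((1 - current / original) * 100)
    : 0;
  return item;
});
```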
Example 2: News Article Aggregation
Scenario: Collect headlines and links from news sites for a daily digest.
Workflow:
Schedule Trigger → HTTP Request (news page) → HTML Extract → Split In Batches → Process Each
HTML Extraction:
| Key | CSS Selector | Return Value | Return Array |
|---|---|---|---|
| `headlines` | `article h2 a` | Text | Yes |
| `links` | `article h2 a` | Attribute (`href`) | Yes |
| `timestamps` | `article time` | Attribute (`datetime`) | Yes |
Processing the arrays:
Use a Code node to combine the parallel arrays:
```javascript
const items = $input.all();
const data = items[0].json;

// Combine parallel arrays into article objects
return data.headlines.map((headline, i) => ({
  json: {
    headline: headline,
    url: data.links[i],
    published: data.timestamps[i]
  }
}));
```
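If the three selectors can match different numbers of elements (say, an article without a `<time>` tag), the parallel arrays drift out of alignment. A defensive variant clamps to the shortest array:
```javascript
// Only combine up to the shortest array so one missing field
// can't shift every later article by one position.
const { headlines = [], links = [], timestamps = [] } = $input.first().json;
const count = Math.min(headlines.length, links.length, timestamps.length);

return Array.from({ length: count }, (_, i) => ({
  json: { headline: headlines[i], url: links[i], published: timestamps[i] }
}));
```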
Example 3: Table Data Extraction
Scenario: Parse pricing tables or data grids from web pages.
Workflow:
HTTP Request → HTML Extract (headers) → HTML Extract (rows) → Code (combine) → Output
Extracting table headers:
| Key | CSS Selector | Return Value | Return Array |
|---|---|---|---|
| `headers` | `table thead th` | Text | Yes |
Extracting table rows:
| Key | CSS Selector | Return Value | Return Array |
|---|---|---|---|
| `rowData` | `table tbody tr` | HTML | Yes |
Then use a Code node to parse row data into structured objects using the headers.
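A minimal sketch of that Code node, assuming the `headers` and `rowData` keys from the tables above and simple tables without nested markup in the cells:
```javascript
const { headers, rowData } = $input.first().json;

return rowData.map(rowHtml => {
  // Grab each <td>'s contents, then strip any remaining tags.
  const cells = [...rowHtml.matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)]
    .map(m => m[1].replace(/<[^>]+>/g, '').trim());

  // Zip the header names with the cell values.
  const row = {};
  headers.forEach((h, i) => { row[h] = cells[i] ?? null; });
  return { json: row };
});
```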
Example 4: Generating HTML Email Templates
Scenario: Create personalized HTML emails from workflow data.
Setup:
- Set Operation to “Generate HTML Template”
- Enter your HTML template with n8n expressions:
```html
<div style="font-family: Arial, sans-serif; max-width: 600px;">
  <h1>Hello {{ $json.firstName }}!</h1>
  <p>Your order #{{ $json.orderId }} has shipped.</p>
  <table style="width: 100%; border-collapse: collapse;">
    <tr>
      <td>Tracking Number:</td>
      <td>{{ $json.trackingNumber }}</td>
    </tr>
    <tr>
      <td>Estimated Delivery:</td>
      <td>{{ $json.estimatedDelivery }}</td>
    </tr>
  </table>
  <p>Thank you for your business!</p>
</div>
```
The output is a rendered HTML string ready for email nodes.
Handling JavaScript-Rendered Content
A critical limitation: the HTTP Request node fetches raw HTML before JavaScript executes. Modern websites using React, Vue, Angular, or other frameworks often render content client-side.
Symptoms of JavaScript-Rendered Content
- Your selector works in the browser but returns nothing in n8n
- Inspecting the HTTP Request output shows minimal HTML
- The page shows a loading spinner or “Enable JavaScript” message
- Content appears after a delay when you load the page manually
Solutions
1. Check for API endpoints
Many JavaScript sites fetch data from APIs. Open browser DevTools Network tab, filter by “Fetch/XHR”, and look for JSON responses. You may be able to call these APIs directly with HTTP Request.
2. Use headless browser services
Services like Browserless, ScrapingBee, or ScrapFly render pages in real browsers and return the final HTML.
3. Community nodes
Search the n8n community for nodes that support JavaScript rendering, such as those integrating with Puppeteer or Playwright.
4. External scraping APIs
Third-party scraping services handle JavaScript rendering and anti-bot measures for you.
For most static websites and server-rendered pages, the standard HTTP Request + HTML combination works perfectly.
CSS Selector Troubleshooting
When extraction fails, systematic debugging finds the problem.
Common Errors and Solutions
| Problem | Cause | Solution |
|---|---|---|
| Selector returns empty/null | Element doesn’t exist in raw HTML | Check if content is JavaScript-rendered |
| Selector works in browser, fails in n8n | Different HTML structure or JS-rendered | Compare HTTP Request output with browser source |
| Only first element extracted | Return Array disabled | Enable “Return Array” option |
| Wrong element selected | Selector too generic | Make selector more specific |
| Extraction includes unwanted text | Selector matches parent element | Target more specific child element |
| Special characters break selector | Unescaped characters in selector | Escape special characters or use attribute selector |
Dynamic Class Names (Next.js, React)
Modern frameworks often generate class names with random suffixes:
```html
<div class="ProductCard_container__a1B2c">
```
The `a1B2c` part changes on every deployment, breaking your selector.
Solutions:
- Use partial attribute matching:
```css
[class^="ProductCard_container"]
[class*="ProductCard_container"]
```
- Find stable identifiers:
Look for data- attributes, IDs, or semantic HTML that doesn’t change:
```css
[data-testid="product-card"]
article[itemtype*="Product"]
```
- Use structural selectors:
Target elements by their position in stable parent structures:
```css
.products-grid > div:first-child
main article:nth-child(2)
```
Spaces in Class Names
HTML elements can have multiple classes separated by spaces:
```html
<div class="product featured sale">
```
To match this element, use any single class:
```css
.product
.featured
.sale
```
Or combine them (element must have all):
```css
.product.featured.sale
```
Common mistake: using `.product featured` (with a space). The space makes it a descendant selector: it looks for a `<featured>` tag inside an element with class `product`, so it matches nothing.
Selector Maintenance Strategies
- Prefer IDs over classes: IDs are typically more stable
- Use data attributes: `[data-product-id]` is often added intentionally and stays stable
- Avoid generated class names: Skip classes that look like random strings
- Test with multiple pages: Ensure selectors work across different page variations
- Document your selectors: Add comments explaining what each selector targets
- Monitor for failures: Set up error handling to alert you when extractions fail
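For that last point, a simple guard in a Code node placed after the extraction will fail the execution loudly, so an error workflow can alert you. The `productNames` key here is a hypothetical output from an earlier HTML node:
```javascript
const { productNames } = $input.first().json;

// An empty result usually means the site changed and the selector broke.
if (!Array.isArray(productNames) || productNames.length === 0) {
  throw new Error('HTML extraction returned no products: check the CSS selector');
}
return $input.all();
```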
For debugging expression issues, our workflow debugger tool can help identify problems.
Pro Tips and Best Practices
1. Always Test Selectors in Browser First
Before configuring the HTML node:
- Open the target page in your browser
- Press F12 to open DevTools
- Go to Console tab
- Run `document.querySelectorAll('your-selector')` in the console
- Verify the correct elements are selected
This catches selector errors before they reach your workflow.
2. Compare Browser Source vs HTTP Request
Sometimes the browser shows different HTML than HTTP Request returns:
- View Page Source (Ctrl+U) shows the raw HTML, similar to HTTP Request
- DevTools Elements panel shows the DOM after JavaScript modifications
Always compare your selectors against the raw source, not the rendered DOM.
3. Handle Missing Data Gracefully
Not every page will have every element. Use expressions to handle missing values:
```
{{ $json.price || 'Price not available' }}
{{ $json.rating ?? 0 }}
```
4. Respect Rate Limits
When scraping multiple pages:
- Add Wait nodes between requests
- Use the Split In Batches node
- Check if the site has robots.txt restrictions
- Consider using our rate limiting strategies
5. Set Proper Headers to Avoid Blocks
Some websites block requests that lack browser-like headers. In your HTTP Request node, add these headers under Options > Headers:
| Header | Value |
|---|---|
| `User-Agent` | `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36` |
| `Accept` | `text/html,application/xhtml+xml` |
This mimics a real browser request and prevents many basic blocks.
6. Build Incrementally
Start with one extraction value. Test it. Add the next. This isolates problems and prevents debugging complex configurations.
7. Use Meaningful Keys
Instead of value1, value2, use descriptive keys like productPrice, productTitle. This makes downstream processing clearer.
8. Combine with Code Node for Complex Parsing
When CSS selectors alone aren’t enough, extract raw HTML and parse it in a Code node:
```javascript
const html = $json.rawHtml;

// Use regex for complex extraction
const priceMatch = html.match(/\$(\d+\.\d{2})/);
const price = priceMatch ? parseFloat(priceMatch[1]) : null;

return { price };
```
For expression validation, try our expression validator tool.
When to Get Help
Some scraping scenarios require specialized expertise:
- Anti-bot protection: Cloudflare, reCAPTCHA, or other blocking mechanisms
- Session-based content: Pages requiring login or complex cookies
- Large-scale scraping: Thousands of pages with rate limiting concerns
- Data transformation: Complex restructuring of scraped data
Our workflow development services can build production-ready scraping solutions. For architectural guidance, explore our n8n consulting services.
Frequently Asked Questions
Why does my CSS selector work in Chrome DevTools but return nothing in n8n?
This disconnect is almost always caused by JavaScript-rendered content. When you inspect a page in Chrome, you see the DOM after JavaScript has executed. When HTTP Request fetches the page, it gets the raw HTML before any JavaScript runs.
To verify: Press Ctrl+U in Chrome to view the raw source code. Search for the text or element you’re trying to extract. If it doesn’t exist in the source but appears in the DOM, the content is JavaScript-rendered.
Solutions:
- Check if the site has a public API that returns the data as JSON
- Use a headless browser service that executes JavaScript
- Look for community nodes that support JavaScript rendering
For static pages, ensure your selector exactly matches the HTML structure. Copy a selector from DevTools, but verify it against the raw source.
How do I scrape content that only appears after JavaScript loads?
The standard HTTP Request node cannot execute JavaScript. You have several options:
API discovery: Open DevTools Network tab, filter by “Fetch/XHR”, and watch the requests as the page loads. Many JavaScript sites load data from APIs that you can call directly.
Headless browser services: Services like Browserless, ScrapingBee, or Apify render pages in real browsers and return the final HTML. Use HTTP Request to call their APIs.
Server-side rendering detection: Some sites serve full HTML to search engine bots. Try adding a User-Agent header that mimics Googlebot.
The best solution depends on your specific target site. Start by investigating whether an API exists.
What is the difference between Text, HTML, and Attribute return values?
These options determine what data gets extracted from matched elements:
Text returns the visible text content only, stripping all HTML tags:
```html
<p>Hello <strong>world</strong></p>
```
Returns: `"Hello world"`
HTML returns the inner HTML including all child elements and tags:
```html
<p>Hello <strong>world</strong></p>
```
Returns: `"Hello <strong>world</strong>"`
Attribute returns the value of a specific attribute you specify:
```html
<a href="/products/123" class="link">View Product</a>
```
With Attribute set to `href`, returns: `"/products/123"`
Use Text for readable content (titles, descriptions, prices). Use Attribute for URLs, image sources, data attributes, or class names.
How do I handle websites where class names change with each deployment?
Modern frameworks like Next.js, Nuxt, and others often generate class names with hash suffixes (`Button_primary__x7Yz9`) that change when the site is rebuilt.
Strategy 1: Partial class matching
Use CSS attribute selectors that match the beginning of the class:
[class^="Button_primary"]
[class*="ProductCard_title"]
Strategy 2: Find stable identifiers
Look for elements that developers add intentionally:
- `data-testid` attributes (added for testing)
- `id` attributes
- Semantic HTML5 elements (`article`, `main`, `nav`)
- Schema.org markup (`itemtype`, `itemprop`)
Strategy 3: Structural selectors
If the page structure is stable even when classes change:
```css
.product-grid > div > h2
main > section:first-child p
```
Strategy 4: XPath alternative
For very complex cases, extract the HTML and use a Code node with a DOM parsing library to find elements by text content or position.
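On a self-hosted instance where external modules are allowed (for example, `NODE_FUNCTION_ALLOW_EXTERNAL=cheerio`), a Code node along these lines can find elements by text content. The `rawHtml` key and the "Widget" text are assumptions for illustration:
```javascript
const cheerio = require('cheerio');
const $ = cheerio.load($input.first().json.rawHtml);

// Select by text content instead of an unstable class name.
const title = $('h2')
  .filter((i, el) => $(el).text().includes('Widget'))
  .first()
  .text()
  .trim();

return [{ json: { title } }];
```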
Monitor your scrapers regularly, as websites can change structure at any time.
Can I extract multiple different elements with different selectors in one node?
Yes. The HTML node supports multiple extraction values in a single operation. Click Add Value to add additional extractions.
Each extraction value has its own:
- Key (output property name)
- CSS Selector
- Return Value type
- Options
All extracted values appear in the same output object:
```json
{
  "title": "Product Name",
  "price": "$29.99",
  "imageUrl": "/images/product.jpg",
  "description": "A great product..."
}
```
For extractions that return arrays (with “Return Array” enabled), each array is a separate property:
```json
{
  "titles": ["Product 1", "Product 2", "Product 3"],
  "prices": ["$10", "$20", "$30"],
  "links": ["/p/1", "/p/2", "/p/3"]
}
```
To combine these parallel arrays into structured objects, use a Code node after extraction. This pattern is covered in the real-world examples section above.