n8n HTML Node

Master the n8n HTML node for web scraping, data extraction, and template generation. Learn CSS selectors, troubleshoot common issues, and build production-ready scraping workflows.

The data you need is sitting right there on a webpage, but getting it into your workflow feels impossible. Product prices, contact information, article headlines, table data. You can see it in your browser, but extracting it programmatically has always required coding skills or expensive third-party tools.

The HTML node changes that. It transforms raw HTML into structured JSON data using CSS selectors, the same targeting system browsers use to style web pages. Combined with the HTTP Request node for fetching pages, you can scrape data from virtually any website without writing code.

The Web Scraping Challenge

Web scraping sounds simple until you try it:

  • Websites structure their HTML differently, with no standard format
  • The data you want is buried inside nested elements
  • Class names and IDs vary between sites (and sometimes between page loads)
  • Some content only appears after JavaScript executes

The HTML node handles the parsing side of this equation. Once you understand CSS selectors and the node’s configuration options, you can extract almost any visible text or attribute from a webpage.

What You’ll Learn

  • How to use all three HTML node operations: extraction, template generation, and table conversion
  • CSS selector fundamentals for targeting specific page elements
  • Finding reliable selectors using browser developer tools
  • Handling common extraction failures and edge cases
  • Building maintainable scraping workflows that survive website changes
  • Real-world examples for e-commerce, content aggregation, and report parsing

When to Use the HTML Node

The HTML node serves three distinct purposes. Understanding which operation you need prevents wasted time.

| Scenario | Operation | Why |
| --- | --- | --- |
| Scrape product prices from a website | Extract HTML Content | CSS selectors target specific elements |
| Pull article titles and links from a news site | Extract HTML Content | Extract multiple values per page |
| Parse table data from an HTML report | Extract HTML Content | Tables are just nested HTML elements |
| Generate dynamic email templates | Generate HTML Template | Merge workflow data into HTML structure |
| Create HTML reports from JSON data | Convert to HTML Table | Automatically formats data as tables |
| Process HTML files from disk or email | Extract HTML Content | Works with binary HTML files too |

Rule of thumb: Use “Extract HTML Content” for web scraping and parsing. Use the other operations for generating HTML output from your workflow data.

When Not to Use the HTML Node

The HTML node has specific limitations:

| Limitation | Alternative |
| --- | --- |
| JavaScript-rendered content (SPAs) | Headless browser services or community nodes |
| Complex PDF parsing | Extract from File node or OCR services |
| API data (already JSON) | Process JSON directly with Edit Fields |
| Structured XML feeds | XML parsing nodes or Code node |
| Login-protected pages | HTTP Request with session cookies first |

Understanding the Three Operations

The HTML node offers three operations, each serving a different purpose.

Extract HTML Content

This is the primary web scraping operation. It parses HTML content and extracts specific data using CSS selectors.

Input: HTML content (from HTTP Request, file, or previous node)

Output: Extracted values as JSON properties

Use cases:

  • Scraping prices, titles, descriptions from websites
  • Extracting links and their text content
  • Parsing structured data from HTML tables
  • Pulling metadata from web pages

Generate HTML Template

This operation creates HTML output by merging workflow data into a template. You write HTML with n8n expressions embedded, and the node renders the final output.

Input: JSON data from previous nodes

Output: Rendered HTML string

Use cases:

  • Creating dynamic email bodies
  • Generating HTML reports
  • Building HTML snippets for further processing
  • Creating formatted output for webhooks

Quick example:

<h1>Order Confirmation</h1>
<p>Hi {{ $json.customerName }},</p>
<p>Your order #{{ $json.orderId }} for {{ $json.itemCount }} items
   totaling ${{ $json.total }} has been confirmed.</p>

The node replaces expressions with actual values from your workflow data.
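
With hypothetical sample values filled in, the rendered output would look like:

<h1>Order Confirmation</h1>
<p>Hi Jane,</p>
<p>Your order #1042 for 3 items
   totaling $59.97 has been confirmed.</p>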

Convert to HTML Table

This operation transforms JSON array data into an HTML table format automatically, without writing any HTML.

Input: JSON array (multiple items)

Output: HTML table string with headers based on JSON keys

Use cases:

  • Converting spreadsheet-style data to HTML
  • Creating simple HTML reports
  • Formatting data for email or display

Quick example:

If your input is:

[
  { "name": "Widget A", "price": "$10", "stock": 50 },
  { "name": "Widget B", "price": "$25", "stock": 12 }
]

The node outputs a complete HTML table with name, price, and stock columns, ready to embed in emails or reports.
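
For illustration, the generated markup looks roughly like this (exact attributes and inline styles vary by n8n version):

<table>
  <thead>
    <tr><th>name</th><th>price</th><th>stock</th></tr>
  </thead>
  <tbody>
    <tr><td>Widget A</td><td>$10</td><td>50</td></tr>
    <tr><td>Widget B</td><td>$25</td><td>12</td></tr>
  </tbody>
</table>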

Your First HTML Extraction

Let’s build a complete scraping workflow step by step.

Step 1: Fetch the Web Page

First, use the HTTP Request node to retrieve the page’s HTML:

  1. Add an HTTP Request node to your workflow
  2. Set Method to GET
  3. Enter a URL (for testing, use https://books.toscrape.com/)
  4. Click Test step

The node returns the page’s HTML content in the response body.

Step 2: Add the HTML Node

  1. Add an HTML node after HTTP Request
  2. Set Operation to “Extract HTML Content”
  3. For Source Data, select “JSON”
  4. Set JSON Property to the field containing your HTML (typically data from HTTP Request)

Step 3: Configure Extraction Values

Now define what to extract. Click Add Value in the Extraction Values section:

Extracting a page title:

| Setting | Value |
| --- | --- |
| Key | pageTitle |
| CSS Selector | h1 |
| Return Value | Text |

Extracting a product price:

| Setting | Value |
| --- | --- |
| Key | price |
| CSS Selector | .price_color |
| Return Value | Text |

Step 4: Test and Verify

Click Test step. The output should contain your extracted values:

{
  "pageTitle": "All products",
  "price": "ÂŁ51.77"
}

You can now use these values in subsequent nodes with expressions like {{ $json.pageTitle }}.

CSS Selectors: The Complete Guide

CSS selectors are patterns that identify HTML elements. The same selectors used to style webpages also work for extraction.

Finding Selectors with Browser DevTools

The fastest way to find the right selector:

  1. Open the webpage in Chrome (or any browser)
  2. Right-click the element you want to extract
  3. Select Inspect to open DevTools
  4. In the Elements panel, right-click the highlighted HTML
  5. Choose Copy > Copy selector

This gives you a precise selector for that element. However, auto-generated selectors are often overly specific. You may need to simplify them.
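
For example, Copy selector might hand you the brittle first selector below (a hypothetical page nesting); the simplified version targets the same link and survives layout changes:

/* Auto-generated: tied to the exact page nesting */
#content > div:nth-child(3) > ul > li:nth-child(1) > article > h3 > a

/* Simplified: same element, far more resilient */
article h3 a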

Basic Selector Patterns

| Selector | Matches | Example |
| --- | --- | --- |
| tagname | All elements of that type | h1 matches all <h1> elements |
| .classname | Elements with that class | .product-title matches <div class="product-title"> |
| #idname | Element with that ID | #main-content matches <div id="main-content"> |
| tag.class | Tag with specific class | p.description matches <p class="description"> |
| parent child | Descendant elements | div p matches <p> inside <div> |
| parent > child | Direct children only | ul > li matches immediate <li> children |

Attribute Selectors

Target elements by their attributes:

| Selector | Matches | Use Case |
| --- | --- | --- |
| [attr] | Has attribute | [href] matches all links |
| [attr="value"] | Exact attribute value | [type="email"] matches email inputs |
| [attr^="start"] | Attribute starts with | [href^="https"] matches HTTPS links |
| [attr$="end"] | Attribute ends with | [href$=".pdf"] matches PDF links |
| [attr*="contains"] | Attribute contains | [class*="price"] matches classes containing “price” |

Combining Selectors

Build precise selectors by combining patterns:

/* Element with multiple classes */
div.product.featured

/* Element with class inside another element */
article.post h2.title

/* Element with specific attribute inside class */
.product-card a[href*="/product/"]

/* Multiple comma-separated selectors */
h1, h2, h3

Pseudo-Selectors (Position-Based)

Select elements by their position:

| Selector | Matches |
| --- | --- |
| :first-child | First child element |
| :last-child | Last child element |
| :nth-child(n) | nth child (1-indexed) |
| :nth-child(odd) | Odd-numbered children |
| :nth-child(even) | Even-numbered children |

Example: .product-list li:first-child selects the first product in a list.

Important: Some advanced pseudo-selectors like :nth-child(n+4) may not work as expected in n8n. Test thoroughly before relying on complex selectors.

CSS Selector Quick Reference

| Goal | Selector |
| --- | --- |
| All paragraphs | p |
| Element by ID | #header |
| Element by class | .product-name |
| All links | a |
| Links whose URL contains “product” | a[href*="product"] |
| First item in list | ul li:first-child |
| All table rows | table tr |
| Specific data attribute | [data-product-id] |
| Multiple selectors | h1, h2, h3 |

For comprehensive CSS selector documentation, see MDN’s CSS Selectors guide.

Extraction Configuration Deep Dive

Understanding every configuration option prevents extraction failures.

Source Data Options

| Option | When to Use |
| --- | --- |
| JSON | HTML is in a JSON property (most common with HTTP Request) |
| Binary | HTML is in a binary property (file upload, attachment) |

For HTTP Request responses, use JSON and specify the property name containing HTML (usually data).
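
A typical HTTP Request output looks like this (truncated), which is why the JSON Property field is usually set to data:

{
  "data": "<!DOCTYPE html><html><head>...</head><body>...</body></html>"
}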

Return Value Types

This critical setting determines what gets extracted:

| Return Value | Extracts | Example Output |
| --- | --- | --- |
| Text | Text content only (no HTML tags) | "Product Name" |
| HTML | Inner HTML including child elements | "<span>Product</span> Name" |
| Attribute | Value of a specific attribute | "/products/123" (from href) |

Text is the default and works for most extractions. Use Attribute when you need link URLs, image sources, or data attributes.

Attribute extraction example:

| Setting | Value |
| --- | --- |
| CSS Selector | a.product-link |
| Return Value | Attribute |
| Attribute | href |

Extraction Options

| Option | Effect |
| --- | --- |
| Trim Values | Removes whitespace from start and end |
| Clean Up Text | Removes line breaks and condenses multiple spaces |
| Return Array | Returns all matches as an array instead of the first match only |

Return Array is essential when scraping lists. Without it, you only get the first matching element.

Example: To extract all product names on a page:

| Setting | Value |
| --- | --- |
| Key | productNames |
| CSS Selector | .product-name |
| Return Value | Text |
| Return Array | Enabled |

Output:

{
  "productNames": ["Widget A", "Widget B", "Widget C"]
}

Multiple Extraction Values

Add multiple extraction values to pull different data in a single node:

| Key | CSS Selector | Return Value |
| --- | --- | --- |
| title | h1.product-title | Text |
| price | .current-price | Text |
| imageUrl | .product-image img | Attribute (src) |
| description | .product-description | Text |
| rating | .star-rating | Attribute (class) |

All values appear in the same output object.

Real-World Scraping Examples

Example 1: E-commerce Product Scraping

Scenario: Extract product information from an online store.

Workflow:

Schedule Trigger → HTTP Request (product page) → HTML Extract → Edit Fields (clean data) → Airtable/Sheets

HTTP Request:

  • URL: https://store.example.com/products/widget-123
  • Method: GET

HTML Extraction Values:

| Key | CSS Selector | Return Value |
| --- | --- | --- |
| productName | h1.product-title | Text |
| currentPrice | .price-current | Text |
| originalPrice | .price-original | Text |
| availability | .stock-status | Text |
| productImage | .product-gallery img | Attribute (src) |

Post-processing with Edit Fields:

// Clean price (remove currency symbol, convert to number)
{{ parseFloat($json.currentPrice.replace(/[^0-9.]/g, '')) }}

// Calculate discount percentage (assumes both prices were already
// cleaned to numbers in a previous step)
{{ Math.round((1 - $json.currentPrice / $json.originalPrice) * 100) }}%

Example 2: News Article Aggregation

Scenario: Collect headlines and links from news sites for a daily digest.

Workflow:

Schedule Trigger → HTTP Request (news page) → HTML Extract → Split In Batches → Process Each

HTML Extraction:

| Key | CSS Selector | Return Value | Return Array |
| --- | --- | --- | --- |
| headlines | article h2 a | Text | Yes |
| links | article h2 a | Attribute (href) | Yes |
| timestamps | article time | Attribute (datetime) | Yes |

Processing the arrays:

Use a Code node to combine the parallel arrays:

const items = $input.all();
const data = items[0].json;

// Combine parallel arrays into article objects
return data.headlines.map((headline, i) => ({
  json: {
    headline: headline,
    url: data.links[i],
    published: data.timestamps[i]
  }
}));

Example 3: Table Data Extraction

Scenario: Parse pricing tables or data grids from web pages.

Workflow:

HTTP Request → HTML Extract (headers) → HTML Extract (rows) → Code (combine) → Output

Extracting table headers:

| Key | CSS Selector | Return Value | Return Array |
| --- | --- | --- | --- |
| headers | table thead th | Text | Yes |

Extracting table rows:

| Key | CSS Selector | Return Value | Return Array |
| --- | --- | --- | --- |
| rowData | table tbody tr | HTML | Yes |

Then use a Code node to parse row data into structured objects using the headers.
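
A minimal sketch of that combine step, assuming the headers and rowData values from the extractions above arrive in the same item:

const { headers, rowData } = $input.first().json;

return rowData.map(rowHtml => {
  // Pull the text of each <td> cell with a simple regex
  const cells = [...rowHtml.matchAll(/<td[^>]*>(.*?)<\/td>/gs)]
    .map(m => m[1].replace(/<[^>]*>/g, '').trim());

  // Zip headers with cell values into one object per row
  const row = {};
  headers.forEach((h, i) => { row[h] = cells[i] ?? null; });
  return { json: row };
});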

Example 4: Generating HTML Email Templates

Scenario: Create personalized HTML emails from workflow data.

Setup:

  1. Set Operation to “Generate HTML Template”
  2. Enter your HTML template with n8n expressions:
<div style="font-family: Arial, sans-serif; max-width: 600px;">
  <h1>Hello {{ $json.firstName }}!</h1>
  <p>Your order #{{ $json.orderId }} has shipped.</p>
  <table style="width: 100%; border-collapse: collapse;">
    <tr>
      <td>Tracking Number:</td>
      <td>{{ $json.trackingNumber }}</td>
    </tr>
    <tr>
      <td>Estimated Delivery:</td>
      <td>{{ $json.estimatedDelivery }}</td>
    </tr>
  </table>
  <p>Thank you for your business!</p>
</div>

The output is a rendered HTML string ready for email nodes.

Handling JavaScript-Rendered Content

A critical limitation: the HTTP Request node fetches raw HTML before JavaScript executes. Modern websites using React, Vue, Angular, or other frameworks often render content client-side.

Symptoms of JavaScript-Rendered Content

  • Your selector works in the browser but returns nothing in n8n
  • Inspecting the HTTP Request output shows minimal HTML
  • The page shows a loading spinner or “Enable JavaScript” message
  • Content appears after a delay when you load the page manually

Solutions

1. Check for API endpoints

Many JavaScript sites fetch data from APIs. Open browser DevTools Network tab, filter by “Fetch/XHR”, and look for JSON responses. You may be able to call these APIs directly with HTTP Request.

2. Use headless browser services

Services like Browserless, ScrapingBee, or ScrapFly render pages in real browsers and return the final HTML.
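
As an illustration, fetching rendered HTML through Browserless looks roughly like this from an HTTP Request node (endpoint and auth are indicative; check the service’s current docs):

POST https://chrome.browserless.io/content?token=YOUR_API_TOKEN
Content-Type: application/json

{
  "url": "https://spa.example.com/products"
}

The response body is the fully rendered HTML, which you can feed into the HTML node as usual.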

3. Community nodes

Search the n8n community for nodes that support JavaScript rendering, such as those integrating with Puppeteer or Playwright.

4. External scraping APIs

Third-party scraping services handle JavaScript rendering and anti-bot measures for you.

For most static websites and server-rendered pages, the standard HTTP Request + HTML combination works perfectly.

CSS Selector Troubleshooting

When extraction fails, systematic debugging finds the problem.

Common Errors and Solutions

| Problem | Cause | Solution |
| --- | --- | --- |
| Selector returns empty/null | Element doesn’t exist in raw HTML | Check if content is JavaScript-rendered |
| Selector works in browser, fails in n8n | Different HTML structure or JS-rendered | Compare HTTP Request output with browser source |
| Only first element extracted | Return Array disabled | Enable “Return Array” option |
| Wrong element selected | Selector too generic | Make selector more specific |
| Extraction includes unwanted text | Selector matches parent element | Target a more specific child element |
| Special characters break selector | Unescaped characters in selector | Escape special characters or use an attribute selector |

Dynamic Class Names (Next.js, React)

Modern frameworks often generate class names with random suffixes:

<div class="ProductCard_container__a1B2c">

The a1B2c part changes on every deployment, breaking your selector.

Solutions:

  1. Use partial attribute matching:

[class^="ProductCard_container"]
[class*="ProductCard_container"]

  2. Find stable identifiers:

Look for data- attributes, IDs, or semantic HTML that doesn’t change:

[data-testid="product-card"]
article[itemtype*="Product"]

  3. Use structural selectors:

Target elements by their position in stable parent structures:

.products-grid > div:first-child
main article:nth-child(2)

Spaces in Class Names

HTML elements can have multiple classes separated by spaces:

<div class="product featured sale">

To match this element, use any single class:

.product
.featured
.sale

Or combine them (element must have all):

.product.featured.sale

Common mistake: Writing .product featured (with a space), which instead means “an element with class featured inside an element with class product”.

Selector Maintenance Strategies

  1. Prefer IDs over classes: IDs are typically more stable
  2. Use data attributes: [data-product-id] is often added intentionally and stable
  3. Avoid generated class names: Skip classes that look like random strings
  4. Test with multiple pages: Ensure selectors work across different page variations
  5. Document your selectors: Add comments explaining what each selector targets
  6. Monitor for failures: Set up error handling to alert you when extractions fail (a minimal check is sketched below)
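
For point 6, a lightweight guard is an IF node placed right after extraction, with a condition that routes empty results to a notification. A minimal expression sketch (the productName field is an assumption; substitute your own key):

{{ !$json.productName || $json.productName.length === 0 }}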

For debugging expression issues, our workflow debugger tool can help identify problems.

Pro Tips and Best Practices

1. Always Test Selectors in Browser First

Before configuring the HTML node:

  1. Open the target page in your browser
  2. Press F12 to open DevTools
  3. Go to Console tab
  4. Run: document.querySelectorAll('your-selector')
  5. Verify the correct elements are selected

This catches selector errors before they reach your workflow.
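
To preview exactly what a Text extraction would return, you can also map the matches in the console (a quick sketch; substitute your own selector):

// Logs the trimmed text of every matched element
[...document.querySelectorAll('.product-name')].map(el => el.textContent.trim())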

2. Compare Browser Source vs HTTP Request

Sometimes the browser shows different HTML than HTTP Request returns:

  • View Page Source (Ctrl+U) shows the raw HTML, similar to HTTP Request
  • DevTools Elements panel shows the DOM after JavaScript modifications

Always compare your selectors against the raw source, not the rendered DOM.

3. Handle Missing Data Gracefully

Not every page will have every element. Use expressions to handle missing values:

{{ $json.price || 'Price not available' }}
{{ $json.rating ?? 0 }}

4. Respect Rate Limits

When scraping multiple pages:

  • Add a Wait node between requests to space them out
  • Enable the HTTP Request node’s Batching option (under Options) to cap request frequency
  • Process long page lists in chunks with the Loop Over Items node

5. Set Proper Headers to Avoid Blocks

Some websites block requests that lack browser-like headers. In your HTTP Request node, add these headers under Options > Headers:

| Header | Value |
| --- | --- |
| User-Agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 |
| Accept | text/html,application/xhtml+xml |

This mimics a real browser request and prevents many basic blocks.

6. Build Incrementally

Start with one extraction value. Test it. Add the next. This isolates problems and prevents debugging complex configurations.

7. Use Meaningful Keys

Instead of value1, value2, use descriptive keys like productPrice, productTitle. This makes downstream processing clearer.

8. Combine with Code Node for Complex Parsing

When CSS selectors alone aren’t enough, extract raw HTML and parse it in a Code node:

const html = $json.rawHtml;

// Use regex for complex extraction
const priceMatch = html.match(/\$(\d+\.\d{2})/);
const price = priceMatch ? parseFloat(priceMatch[1]) : null;

return { price };

For expression validation, try our expression validator tool.

When to Get Help

Some scraping scenarios require specialized expertise:

  • Anti-bot protection: Cloudflare, reCAPTCHA, or other blocking mechanisms
  • Session-based content: Pages requiring login or complex cookies
  • Large-scale scraping: Thousands of pages with rate limiting concerns
  • Data transformation: Complex restructuring of scraped data

Our workflow development services can build production-ready scraping solutions. For architectural guidance, explore our n8n consulting services.

Frequently Asked Questions

Why does my CSS selector work in Chrome DevTools but return nothing in n8n?

This disconnect is almost always caused by JavaScript-rendered content. When you inspect a page in Chrome, you see the DOM after JavaScript has executed. When HTTP Request fetches the page, it gets the raw HTML before any JavaScript runs.

To verify: Press Ctrl+U in Chrome to view the raw source code. Search for the text or element you’re trying to extract. If it doesn’t exist in the source but appears in the DOM, the content is JavaScript-rendered.

Solutions:

  1. Check if the site has a public API that returns the data as JSON
  2. Use a headless browser service that executes JavaScript
  3. Look for community nodes that support JavaScript rendering

For static pages, ensure your selector exactly matches the HTML structure. Copy a selector from DevTools, but verify it against the raw source.

How do I scrape content that only appears after JavaScript loads?

The standard HTTP Request node cannot execute JavaScript. You have several options:

API discovery: Open DevTools Network tab, filter by “Fetch/XHR”, and watch the requests as the page loads. Many JavaScript sites load data from APIs that you can call directly.

Headless browser services: Services like Browserless, ScrapingBee, or Apify render pages in real browsers and return the final HTML. Use HTTP Request to call their APIs.

Server-side rendering detection: Some sites serve full HTML to search engine bots. Try adding a User-Agent header that mimics Googlebot.
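
For reference, the standard Googlebot User-Agent string is:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)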

The best solution depends on your specific target site. Start by investigating whether an API exists.

What is the difference between Text, HTML, and Attribute return values?

These options determine what data gets extracted from matched elements:

Text returns the visible text content only, stripping all HTML tags:

<p>Hello <strong>world</strong></p>

Returns: "Hello world"

HTML returns the inner HTML including all child elements and tags:

<p>Hello <strong>world</strong></p>

Returns: "Hello <strong>world</strong>"

Attribute returns the value of a specific attribute you specify:

<a href="/products/123" class="link">View Product</a>

With Attribute set to href, returns: "/products/123"

Use Text for readable content (titles, descriptions, prices). Use Attribute for URLs, image sources, data attributes, or class names.

How do I handle websites where class names change with each deployment?

Modern frameworks like Next.js, Nuxt, and others often generate class names with hash suffixes (Button_primary__x7Yz9) that change when the site is rebuilt.

Strategy 1: Partial class matching

Use CSS attribute selectors that match the beginning of the class:

[class^="Button_primary"]
[class*="ProductCard_title"]

Strategy 2: Find stable identifiers

Look for elements that developers add intentionally:

  • data-testid attributes (added for testing)
  • id attributes
  • Semantic HTML5 elements (article, main, nav)
  • Schema.org markup (itemtype, itemprop)

Strategy 3: Structural selectors

If the page structure is stable even when classes change:

.product-grid > div > h2
main > section:first-child p

Strategy 4: XPath alternative

For very complex cases, extract the HTML and use a Code node with a DOM parsing library to find elements by text content or position.
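
A sketch of that approach, assuming a self-hosted n8n where the cheerio module is allowed via NODE_FUNCTION_ALLOW_EXTERNAL and a previous node stored the page HTML in rawHtml:

const cheerio = require('cheerio');
const $ = cheerio.load($input.first().json.rawHtml);

// Find a price by its text content instead of an unstable class name
const price = $('span')
  .filter((i, el) => $(el).text().trim().startsWith('$'))
  .first()
  .text()
  .trim();

return [{ json: { price } }];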

Monitor your scrapers regularly, as websites can change structure at any time.

Can I extract multiple different elements with different selectors in one node?

Yes. The HTML node supports multiple extraction values in a single operation. Click Add Value to add additional extractions.

Each extraction value has its own:

  • Key (output property name)
  • CSS Selector
  • Return Value type
  • Options

All extracted values appear in the same output object:

{
  "title": "Product Name",
  "price": "$29.99",
  "imageUrl": "/images/product.jpg",
  "description": "A great product..."
}

For extractions that return arrays (with “Return Array” enabled), each array is a separate property:

{
  "titles": ["Product 1", "Product 2", "Product 3"],
  "prices": ["$10", "$20", "$30"],
  "links": ["/p/1", "/p/2", "/p/3"]
}

To combine these parallel arrays into structured objects, use a Code node after extraction. This pattern is covered in the real-world examples section above.
