Duplicate data silently sabotages your automations. Your CRM sync runs daily, but the same contacts get processed repeatedly. Your webhook fires multiple times for a single event. Your paginated API returns overlapping records between pages. Each duplicate wastes API calls, triggers redundant notifications, and corrupts your analytics.
The Remove Duplicates node solves this problem at two levels: it can eliminate duplicates within a single execution, and it can track items across multiple executions to ensure you never process the same record twice.
The Duplicate Data Problem
Duplicates creep into workflows from multiple sources:
- API pagination overlap: Many APIs return the same items across page boundaries, especially when records are added or removed between requests
- Webhook retries: Payment processors and notification services retry failed webhooks, triggering the same event multiple times
- Data sync conflicts: Pulling from multiple sources often yields the same records from each source
- Schedule timing issues: A scheduled workflow might catch the same new records on consecutive runs before they are marked as processed
Without deduplication, your downstream nodes process everything multiple times. Emails send twice. Database records duplicate. Reports show inflated numbers.
What the Remove Duplicates Node Does
The node offers three distinct operations for different deduplication needs (see the official n8n documentation for the complete reference):
- Remove items repeated within current input: Compares items within a single execution and keeps only unique ones
- Remove items processed in previous executions: Tracks items across workflow runs to skip previously seen data
- Clear deduplication history: Resets the memory of previously processed items
Each operation serves different use cases. Within-execution dedup handles API pagination. Cross-execution dedup handles recurring workflows that should only process new data.
What You’ll Learn
- When to use each of the three operations
- Field comparison strategies for identifying duplicates
- Cross-execution deduplication with history tracking
- Scope options for sharing history across nodes
- Common mistakes that cause deduplication failures
- Real-world patterns for webhooks, APIs, and data sync
Quick Reference
| Parameter | Options | Default |
|---|---|---|
| Operation | Within Input / Previous Executions / Clear History | Within Input |
| Compare | All Fields / All Fields Except / Selected Fields | All Fields |
| Keep Items Where | Value Is New / Higher / Later Date | Value Is New |
| Scope | Node / Workflow | Node |
| History Size | 1 to 1,000,000 | 10,000 |
When to Use the Remove Duplicates Node
Before configuring the node, understand when it applies versus other data transformation options.
| Scenario | Best Choice | Why |
|---|---|---|
| Remove duplicates within current data | Remove Duplicates | Built-in operation for single execution |
| Process only new items across runs | Remove Duplicates | Cross-execution history tracking |
| Find differences between two datasets | Compare Datasets | Four-branch output for different states |
| Keep items matching conditions | Filter | Conditional filtering, not deduplication |
| Combine duplicate items into aggregates | Aggregate | Grouping and summarizing |
| Join data from multiple sources | Merge | Combining streams, not removing duplicates |
Rule of thumb: Use Remove Duplicates when items should only appear once in your output. Use Compare Datasets when you need to know what changed between two datasets. Use Filter when you want to keep items based on conditions rather than uniqueness.
Understanding the Three Operations
The Remove Duplicates node provides three operations, each addressing a specific deduplication scenario.
Operation 1: Remove Items Repeated Within Current Input
This operation examines all items flowing into the node during a single execution. It identifies duplicates and keeps only the first occurrence of each unique item.
When to use:
- Paginated API responses with overlapping records
- Data merged from multiple sources in the same workflow
- Any scenario where duplicates exist within a single batch of data
Compare options:
| Option | Behavior | Best For |
|---|---|---|
| All Fields | Items must be identical across every field | Exact duplicates only |
| All Fields Except | Ignores specified fields during comparison | Excluding timestamps or metadata |
| Selected Fields | Only compares specified fields | Matching by ID regardless of other differences |
Configuration steps:
- Add the Remove Duplicates node to your workflow
- Set Operation to “Remove Items Repeated Within Current Input”
- Choose your Compare strategy
- If using “All Fields Except” or “Selected Fields”, enter the field names (comma-separated)
- Connect to subsequent nodes
Which duplicate gets kept:
The node keeps the first occurrence and removes subsequent duplicates. If you need to keep a specific version (such as the most complete record or the most recent), sort your data before the Remove Duplicates node using a Code node or the Sort node.
Operation 2: Remove Items Processed in Previous Executions
This operation maintains a history of previously processed items. When the workflow runs again, it compares incoming items against this history and only passes through items that have not been seen before.
When to use:
- Scheduled workflows that should only process new records
- Webhook handlers that might receive duplicate events
- Any recurring workflow where reprocessing the same data causes problems
Keep Items Where options:
| Option | Passes Through When | Use Case |
|---|---|---|
| Value Is New | The value has never been seen | Standard deduplication across runs |
| Value Is Higher than Any Previous Value | The current value exceeds all stored values | Incremental ID processing |
| Value Is a Date Later than Any Previous Date | The current date exceeds all stored dates | Date-based incremental processing |
Scope options:
- Node (default): History is stored independently for this specific Remove Duplicates node. Other nodes in the workflow have their own separate history.
- Workflow: History is shared across all Remove Duplicates nodes in the workflow that also use “Workflow” scope. Useful when multiple paths need to share deduplication state.
History Size parameter:
The node stores up to 10,000 items by default. When this limit is reached, the oldest items are removed (FIFO: first in, first out) to make room for new ones. Adjust this value in the node options if you need to track more items.
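To make the FIFO behavior concrete, here is a rough Code node sketch that emulates a bounded history with n8n's workflow static data. It is a conceptual model only, assuming each item carries an id field; it is not the node's actual implementation:
// Conceptual sketch of a bounded FIFO dedup history (not the node's internal code)
const staticData = $getWorkflowStaticData('node'); // persists between production executions
staticData.history = staticData.history || [];     // ordered list of ids already seen
const HISTORY_SIZE = 10000;                         // mirrors the node's default

const fresh = [];
for (const item of $input.all()) {
  const id = String(item.json.id); // assumes each item carries an id field
  if (!staticData.history.includes(id)) {
    staticData.history.push(id); // remember it
    fresh.push(item);            // and pass it through
  }
}

// When the history outgrows the limit, drop the oldest ids first (FIFO)
while (staticData.history.length > HISTORY_SIZE) {
  staticData.history.shift();
}

return fresh;
Workflow static data only persists for active (production) executions, which is one more reason the built-in node is usually the simpler choice.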
Operation 3: Clear Deduplication History
This operation wipes the stored history of previously processed items. After clearing, the node will treat all items as new again.
When to use:
- Resetting after a major data migration
- Starting fresh after fixing data quality issues
- Testing workflows that use cross-execution deduplication
Scope considerations:
The clear operation respects the scope setting. Clearing at “Node” scope only affects this node’s history. Clearing at “Workflow” scope affects all nodes sharing that workflow-level history.
Workflow pattern:
A common pattern uses a separate branch that triggers the clear operation:
[Manual Trigger] → [Remove Duplicates: Clear History]
This gives you a manual reset button without modifying the main workflow logic.
Your First Remove Duplicates Workflow
Walk through a practical example that demonstrates within-execution deduplication.
Step 1: Create Sample Data with Duplicates
Add a Code node to generate test data:
return [
  { json: { id: 1, name: "Taylor Swift", job: "Pop star" } },
  { json: { id: 2, name: "Ed Sheeran", job: "Singer-songwriter" } },
  { json: { id: 3, name: "Adele", job: "Singer-songwriter" } },
  { json: { id: 1, name: "Taylor Swift", job: "Pop star" } }, // duplicate
  { json: { id: 4, name: "Bruno Mars", job: "Singer-songwriter" } },
  { json: { id: 2, name: "Ed Sheeran", job: "Singer-songwriter" } }, // duplicate
];
Step 2: Add the Remove Duplicates Node
- Click + to add a node
- Search for “Remove Duplicates”
- Connect it to your Code node
Step 3: Configure for All Fields Comparison
- Set Operation to “Remove Items Repeated Within Current Input”
- Set Compare to “All Fields”
Step 4: Test the Node
Click Test step and examine the output. You should see 4 unique items instead of 6. The duplicate entries for Taylor Swift and Ed Sheeran have been removed.
Step 5: Try Selected Fields
Now change the configuration:
- Set Compare to “Selected Fields”
- Enter id in the Values to Compare field
Test again. This time, deduplication occurs based only on the id field, ignoring name and job. The result is the same in this case, but this approach handles scenarios where the same ID might have different metadata.
Field Comparison Strategies
Choosing the right comparison strategy determines which items are considered duplicates.
Comparing All Fields
The strictest option. Two items are duplicates only if every single field matches exactly.
Pros:
- Safest choice when you need exact duplicates removed
- No configuration required
Cons:
- Timestamps, metadata, or auto-generated fields cause false negatives
- Items that are logically the same but differ in any field pass through
Best for: Data that comes from a single source with consistent formatting.
Comparing Selected Fields
You specify which fields to compare. Items are duplicates if those specific fields match, regardless of other field values.
Example: Compare only by email field:
Values to Compare: email
Two contacts with the same email but different names or phone numbers are considered duplicates.
Multiple fields: Enter comma-separated field names:
Values to Compare: customer_id, order_date
Items must match on ALL specified fields to be considered duplicates.
Best for: Data with unique identifiers like IDs, emails, or composite keys.
Excluding Fields from Comparison
You specify which fields to ignore. All other fields are compared.
Example: Exclude timestamp and sync metadata:
Values to Skip: last_updated, sync_id, created_at
Two items with the same ID, name, and email but different timestamps are considered duplicates.
Best for: Data with auto-generated metadata fields that differ between otherwise identical records.
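For reference, here is roughly what that exclusion logic looks like if you rebuild it in a Code node; the last_updated, sync_id, and created_at names are just the example metadata fields from above:
// Deduplicate on every field except the listed metadata fields (illustrative sketch)
const IGNORE = ['last_updated', 'sync_id', 'created_at'];
const seen = new Set();
const unique = [];

for (const item of $input.all()) {
  // Build a comparison key from all fields except the ignored ones,
  // sorted so field order never changes the key
  const entries = Object.entries(item.json)
    .filter(([field]) => !IGNORE.includes(field))
    .sort();
  const key = JSON.stringify(entries);

  if (!seen.has(key)) {
    seen.add(key);
    unique.push(item);
  }
}

return unique;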
Cross-Execution Deduplication
The most powerful feature of the Remove Duplicates node is tracking items across multiple workflow executions. This ensures you process each unique item only once, regardless of how many times the workflow runs.
How History Tracking Works
When using “Remove Items Processed in Previous Executions”:
- The node maintains a store of previously seen values
- Each incoming item is checked against this store
- Items found in the store are filtered out
- New items are added to the store and passed through
The store persists across workflow executions, so the node remembers which items it has already processed between runs.
History Size and FIFO Behavior
The History Size parameter (default: 10,000) controls how many items the node remembers. When this limit is reached:
- The oldest items are removed to make room for new ones (FIFO: first in, first out)
- Items removed from history will be processed again if they reappear
Important: If a single batch of incoming items exceeds the history size, you will receive an error: “The number of items to be processed exceeds the maximum history size.” Either increase the history size or process data in smaller batches.
Scope: Node vs Workflow
Node scope (default):
- Each Remove Duplicates node maintains its own independent history
- Multiple nodes in the same workflow do not share history
- Clearing history affects only this node
Workflow scope:
- All Remove Duplicates nodes with “Workflow” scope share a single history
- Useful when different branches process different subsets of the same data
- Clearing history affects all nodes using workflow scope
Example use case for workflow scope:
[Webhook] → [Route by Type] → Branch A: [Remove Duplicates: Workflow Scope]
→ Branch B: [Remove Duplicates: Workflow Scope]
Both branches share history, so an item processed in Branch A will not be processed again in Branch B if the same webhook fires twice.
Value Comparison Options
Value Is New:
Standard deduplication. The node checks if the specified value has ever been seen. If not, the item passes through and the value is stored.
Value to Dedupe On: {{ $json.id }}
Value Is Higher than Any Previous Value:
For incremental ID processing. The item passes through only if its value is greater than all previously stored values.
Value to Dedupe On: {{ $json.customer_id }}
Keep Items Where: Value Is Higher than Any Previous Value
This is perfect for databases with auto-incrementing IDs. Each run processes only records with IDs higher than any previously seen.
Value Is a Date Later than Any Previous Date:
Same concept but for date fields. Only items with dates later than any stored date pass through.
Value to Dedupe On: {{ $json.created_at }}
Keep Items Where: Value Is a Date Later than Any Previous Date
Use this for time-series data or audit logs where you only want new entries.
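Conceptually, the higher-value and later-date modes track a single high-water mark rather than a set of seen values. A rough Code node sketch of the date variant, assuming a created_at field, looks like this:
// Conceptual sketch of "Value Is a Date Later than Any Previous Date" (not the node's code)
const staticData = $getWorkflowStaticData('node');
const lastSeen = staticData.lastDate ? new Date(staticData.lastDate) : null;

// Keep only items whose created_at is later than the stored high-water mark
const fresh = $input.all().filter((item) => {
  const date = new Date(item.json.created_at);
  return !lastSeen || date > lastSeen;
});

// Advance the high-water mark to the latest date just processed
for (const item of fresh) {
  const date = new Date(item.json.created_at);
  if (!staticData.lastDate || date > new Date(staticData.lastDate)) {
    staticData.lastDate = date.toISOString();
  }
}

return fresh;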
Common Mistakes and How to Fix Them
These issues cause the most confusion based on community discussions and support requests.
Mistake 1: Wrong Duplicate Gets Kept
Problem: The node keeps the first occurrence, but you want to keep the most complete record or the most recent version.
Example: You have two records for the same customer:
{ "id": 123, "name": "John", "state": "" }
{ "id": 123, "name": "John", "state": "Texas" }
If the empty-state record comes first, that is what gets kept.
Solution: Sort your data before the Remove Duplicates node. Use a Code node to sort records so the preferred version comes first:
const items = $input.all();

// Sort so records WITH a non-empty state come before records WITHOUT one
items.sort((a, b) => {
  const aHasState = Boolean(a.json.state && a.json.state.length > 0);
  const bHasState = Boolean(b.json.state && b.json.state.length > 0);
  // 1 - 0 = positive, so items that have a state sort ahead of those that do not
  return Number(bHasState) - Number(aHasState);
});

return items;
Now the complete record appears first and gets kept.
Mistake 2: History Size Overflow Error
Problem: You see the error “The number of items to be processed exceeds the maximum history size.”
Cause: A single batch of incoming items is larger than the configured history size.
Solutions:
- Increase history size: In node options, set a higher History Size value
- Process in batches: Use the Split In Batches node before Remove Duplicates to process smaller chunks
- Periodic history clear: Schedule a workflow to clear history when it grows too large
For very high volumes, consider using a database-based deduplication approach with the Code node and external storage.
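One hedged sketch of that pattern: a database node (hypothetically named "Get Processed IDs" here) returns the IDs you have already handled, and a Code node filters the incoming batch against them before the survivors are inserted and their IDs recorded:
// Filter incoming items against IDs already stored in your own database.
// "Get Processed IDs" is a hypothetical earlier node returning rows like { id: "123" }.
const processed = new Set(
  $('Get Processed IDs').all().map((row) => String(row.json.id))
);

const newItems = $input.all().filter(
  (item) => !processed.has(String(item.json.id))
);

// Downstream nodes insert newItems and write their IDs back to the database
return newItems;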
Mistake 3: Type Mismatches in Field Comparison
Problem: Records that should match are treated as different.
Cause: One source stores IDs as strings ("123"), another as numbers (123). Strict comparison sees these as different values.
Example:
// From API A
{ "customer_id": 123 }
// From API B
{ "customer_id": "123" }
These do not match despite having the same logical ID.
Solution: Normalize types before the Remove Duplicates node using an Edit Fields node:
// Expression to normalize to string
{{ String($json.customer_id) }}
Or use a Code node to standardize all fields.
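A minimal Code node version of that normalization, assuming the field is called customer_id, could be:
// Cast customer_id to a string on every item so 123 and "123" compare as equal
return $input.all().map((item) => {
  item.json.customer_id = String(item.json.customer_id);
  return item;
});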
Mistake 4: Expecting Cross-Execution Dedup Without Configuring It
Problem: Duplicates still pass through on subsequent workflow runs.
Cause: The default operation is “Remove Items Repeated Within Current Input” which only looks at the current execution.
Solution: Change the operation to “Remove Items Processed in Previous Executions” and configure:
- Set Value to Dedupe On to the unique identifier field
- Choose an appropriate Keep Items Where option
- Set the Scope if needed
Without this configuration, each execution starts fresh with no memory of previous runs.
Mistake 5: Scope Confusion When Clearing History
Problem: Clearing history in one node unexpectedly affects another node, or fails to affect nodes you expected.
Cause: Misunderstanding of Node vs Workflow scope.
Explanation:
- Node scope: Each node has its own isolated history. Clearing one node’s history does not affect others.
- Workflow scope: All nodes with workflow scope share history. Clearing affects all of them.
Best practice:
- Use Node scope (default) unless you specifically need shared history
- When using Workflow scope, document which nodes share history
- Test clear operations in a development environment first
Real-World Examples
Example 1: Webhook Retry Deduplication
Scenario: Your payment processor sends webhooks for successful payments, but sometimes retries the same webhook if acknowledgment is delayed.
Problem: Customer records get updated multiple times, emails send twice.
Solution:
[Webhook] → [Remove Duplicates: Cross-Execution] → [Process Payment]
Configuration:
- Operation: Remove Items Processed in Previous Executions
- Value to Dedupe On: {{ $json.payment_id }}
- Keep Items Where: Value Is New
- History Size: 50000 (adjust based on payment volume)
Each payment processes exactly once, regardless of webhook retries.
Example 2: API Pagination with Overlapping Results
Scenario: You fetch all products from an e-commerce API using pagination. The API sometimes returns the same product on consecutive pages.
Problem: Duplicate products appear in your database sync.
Solution:
[Loop Over Pages] → [HTTP Request] → [Remove Duplicates: Within Execution] → [Database Insert]
Configuration:
- Operation: Remove Items Repeated Within Current Input
- Compare: Selected Fields
- Values to Compare: sku
Products with the same SKU are deduplicated before database insertion.
Example 3: CRM Sync with Incremental IDs
Scenario: A scheduled workflow syncs new customers from your CRM every hour. The CRM uses auto-incrementing IDs.
Problem: You only want to process customers added since the last run.
Solution:
[Schedule Trigger] → [CRM: Get All Customers] → [Remove Duplicates: Higher Value] → [Process New Customers]
Configuration:
- Operation: Remove Items Processed in Previous Executions
- Value to Dedupe On: {{ $json.customer_id }}
- Keep Items Where: Value Is Higher than Any Previous Value
Each run processes only customers with IDs higher than any previously seen.
Example 4: Daily Report with New Entries Only
Scenario: A daily workflow generates a report of new support tickets. Tickets should only appear in one report.
Problem: If the same ticket is still open the next day, it should not appear again.
Solution:
[Schedule: Daily] → [Zendesk: Get Open Tickets] → [Remove Duplicates: Cross-Execution] → [Generate Report]
Configuration:
- Operation: Remove Items Processed in Previous Executions
- Value to Dedupe On: {{ $json.ticket_id }}
- Keep Items Where: Value Is New
- History Size: 100000
Each ticket appears in exactly one daily report.
Example 5: Multi-Source Data Consolidation
Scenario: You pull customer data from three different sources (CRM, billing, support) and need a unified list without duplicates.
Problem: The same customer exists in multiple systems with the same email but different record IDs.
Solution:
[CRM Data] ──┬──► [Merge: Append] → [Remove Duplicates] → [Unified Customer List]
[Billing] ───┤
[Support] ───┘
Configuration:
- Operation: Remove Items Repeated Within Current Input
- Compare: Selected Fields
- Values to Compare: email
Customers are deduplicated by email, keeping the first occurrence from whichever source is merged first.
Remove Duplicates vs JavaScript Deduplication
For complex deduplication logic, you might consider using a Code node instead. Here is when to use each approach.
Use the Remove Duplicates Node When:
- You need cross-execution history tracking
- Deduplication logic is straightforward (by fields)
- You want a no-code solution
- You need scope management (node vs workflow)
Use a Code Node When:
- You need custom comparison logic (fuzzy matching, partial field comparison)
- You want to keep a specific duplicate based on complex rules
- You need to count or report on duplicates
- You are already doing significant data transformation in code
JavaScript deduplication example:
// Get all items from the previous node
const items = $input.all();

// Track emails we've already seen (Set ensures uniqueness)
const seen = new Set();

// Store items that pass our dedup check
const unique = [];

for (const item of items) {
  // Normalize email: lowercase and remove whitespace
  // The ?. safely handles cases where email might be missing
  const key = item.json.email?.toLowerCase().trim();

  // Only keep this item if:
  // 1. The email exists (key is truthy)
  // 2. We haven't seen this email before
  if (key && !seen.has(key)) {
    seen.add(key); // Remember this email
    unique.push(item); // Keep this item
  }
}

// Return only the unique items
return unique;
This example uses the JavaScript Set object to track seen values efficiently. It normalizes email addresses (lowercase, trimmed) before comparison, which the Remove Duplicates node cannot do natively. For more advanced filtering patterns, see the MDN Array filter() documentation.
For expression help, try our expression validator tool.
Pro Tips and Best Practices
1. Sort Before Deduplicating
The node keeps the first occurrence. If you need the most complete or most recent record, sort your data first:
[Get Data] → [Code: Sort by completeness] → [Remove Duplicates]
This gives you control over which duplicate survives.
2. Use Specific Fields for Large Datasets
Comparing all fields is computationally expensive. For large datasets, always use “Selected Fields” with just the unique identifier:
Values to Compare: id
This significantly improves performance.
3. Monitor History Size in Production
If you use cross-execution deduplication, monitor how many items you process over time. When history approaches the limit, oldest items get removed and may be reprocessed.
Consider scheduling a history clear with notification:
[Schedule: Weekly] → [Remove Duplicates: Clear History] → [Slack: History Cleared]
4. Combine with Filter for Complex Logic
Sometimes you need both deduplication and filtering. Chain the nodes:
[Get Data] → [Filter: Active Only] → [Remove Duplicates: By ID] → [Process]
Filter first to reduce the number of items before deduplication.
5. Document Your Dedup Strategy
Add a sticky note explaining:
- Which fields are used for comparison
- Why that strategy was chosen
- Expected history size and cleanup schedule
This helps future you (or your team) maintain the workflow.
6. Test with Production-Like Data
Test your deduplication with realistic data volumes and duplicate patterns. Cover edge cases such as:
- All items are duplicates
- No items are duplicates
- First and last items are duplicates of each other
- Nested fields with null values
Use our workflow debugger tool to identify issues.
For complex deduplication requirements or high-volume scenarios, our workflow development services can help design robust solutions. For strategic architecture guidance, explore our consulting services.
Frequently Asked Questions
How does the Remove Duplicates node decide which duplicate to keep?
The node always keeps the first occurrence and removes subsequent duplicates. The order of items flowing into the node determines which one survives.
Need to keep a specific version? Sort your data before the Remove Duplicates node:
- Use a Code node to sort items so your preferred version appears first
- Sort by completeness score descending to keep the most complete record
- Sort by timestamp descending to keep the most recent version
See the n8n data transformation guide for sorting patterns.
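For example, a minimal Code node sort that keeps the most recent version, assuming an updated_at field, could look like:
// Sort newest first so the most recent record is the one Remove Duplicates keeps
const items = $input.all();
items.sort(
  (a, b) => new Date(b.json.updated_at) - new Date(a.json.updated_at)
);
return items;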
What happens when the history size limit is reached in cross-execution mode?
The node uses FIFO (first in, first out) behavior. When history reaches the configured limit (default 10,000), the oldest items are removed to make room for new ones.
Key implications:
- Very old items may be reprocessed if they reappear after falling out of history
- If a single batch exceeds the history size, you get an error
Solutions:
- Increase the History Size parameter
- Process data in smaller batches using Split In Batches
- Schedule periodic history clears and accept occasional reprocessing
Can I remove duplicates based on multiple fields?
Yes. When using “Selected Fields” comparison, enter comma-separated field names:
customer_id, order_date
Items are considered duplicates only if ALL specified fields match. This creates a composite key for comparison.
For complex nested fields:
- Flatten your data first using an Edit Fields node
- Or create a combined key field in a Code node
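A hedged sketch of that combined-key approach, with illustrative nested field names:
// Build a single dedup key from nested fields (illustrative field names)
return $input.all().map((item) => {
  item.json.dedup_key = [
    item.json.customer?.id,
    item.json.order?.date,
  ].join('|');
  return item;
});
Point Values to Compare (or Value to Dedupe On) at dedup_key afterwards.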
Test your field combinations using our expression validator.
How do I deduplicate both within a single execution AND across executions?
Chain two Remove Duplicates nodes in sequence:
[Get Data] → [Remove Duplicates: Within Input] → [Remove Duplicates: Cross Execution] → [Process]
Step 1: Use “Remove Items Repeated Within Current Input” to eliminate duplicates within the current batch.
Step 2: Use “Remove Items Processed in Previous Executions” to filter out items seen in previous runs.
This two-step approach handles both pagination duplicates and recurring workflow deduplication.
Does clearing history affect other workflows or just the current one?
It depends on your scope setting:
Node scope (default):
- Clearing affects only that specific Remove Duplicates node instance
- Other nodes in the same workflow are unaffected
- All nodes in other workflows are unaffected
Workflow scope:
- Clearing affects all Remove Duplicates nodes in that workflow using workflow scope
- Other workflows always have completely separate history
Best practice: When using workflow scope, document which nodes share history to avoid confusion during maintenance.