n8n Remove Duplicates Node

Master the n8n Remove Duplicates node to eliminate duplicate data. Learn within-execution and cross-execution deduplication, field comparison, history management, and real-world patterns.

Duplicate data silently sabotages your automations. Your CRM sync runs daily, but the same contacts get processed repeatedly. Your webhook fires multiple times for a single event. Your paginated API returns overlapping records between pages. Each duplicate wastes API calls, triggers redundant notifications, and corrupts your analytics.

The Remove Duplicates node solves this problem at two levels: it can eliminate duplicates within a single execution, and it can track items across multiple executions to ensure you never process the same record twice.

The Duplicate Data Problem

Duplicates creep into workflows from multiple sources:

  • API pagination overlap: Many APIs return the same items across page boundaries, especially when records are added or removed between requests
  • Webhook retries: Payment processors and notification services retry failed webhooks, triggering the same event multiple times
  • Data sync conflicts: Pulling from multiple sources often yields the same records from each source
  • Schedule timing issues: A scheduled workflow might catch the same new records on consecutive runs before they are marked as processed

Without deduplication, your downstream nodes process everything multiple times. Emails send twice. Database records duplicate. Reports show inflated numbers.

What the Remove Duplicates Node Does

The node offers three distinct operations for different deduplication needs (see the official n8n documentation for the complete reference):

  1. Remove items repeated within current input: Compares items within a single execution and keeps only unique ones
  2. Remove items processed in previous executions: Tracks items across workflow runs to skip previously seen data
  3. Clear deduplication history: Resets the memory of previously processed items

Each operation serves different use cases. Within-execution dedup handles API pagination. Cross-execution dedup handles recurring workflows that should only process new data.

What You’ll Learn

  • When to use each of the three operations
  • Field comparison strategies for identifying duplicates
  • Cross-execution deduplication with history tracking
  • Scope options for sharing history across nodes
  • Common mistakes that cause deduplication failures
  • Real-world patterns for webhooks, APIs, and data sync

Quick Reference

Parameter | Options | Default
Operation | Within Input / Previous Executions / Clear History | Within Input
Compare | All Fields / All Fields Except / Selected Fields | All Fields
Keep Items Where | Value Is New / Higher / Later Date | Value Is New
Scope | Node / Workflow | Node
History Size | 1 to 1,000,000 | 10,000

When to Use the Remove Duplicates Node

Before configuring the node, understand when it applies versus other data transformation options.

Scenario | Best Choice | Why
Remove duplicates within current data | Remove Duplicates | Built-in operation for single execution
Process only new items across runs | Remove Duplicates | Cross-execution history tracking
Find differences between two datasets | Compare Datasets | Four-branch output for different states
Keep items matching conditions | Filter | Conditional filtering, not deduplication
Combine duplicate items into aggregates | Aggregate | Grouping and summarizing
Join data from multiple sources | Merge | Combining streams, not removing duplicates

Rule of thumb: Use Remove Duplicates when items should only appear once in your output. Use Compare Datasets when you need to know what changed between two datasets. Use Filter when you want to keep items based on conditions rather than uniqueness.

Understanding the Three Operations

The Remove Duplicates node provides three operations, each addressing a specific deduplication scenario.

Operation 1: Remove Items Repeated Within Current Input

This operation examines all items flowing into the node during a single execution. It identifies duplicates and keeps only the first occurrence of each unique item.

When to use:

  • Paginated API responses with overlapping records
  • Data merged from multiple sources in the same workflow
  • Any scenario where duplicates exist within a single batch of data

Compare options:

Option | Behavior | Best For
All Fields | Items must be identical across every field | Exact duplicates only
All Fields Except | Ignores specified fields during comparison | Excluding timestamps or metadata
Selected Fields | Only compares specified fields | Matching by ID regardless of other differences

Configuration steps:

  1. Add the Remove Duplicates node to your workflow
  2. Set Operation to “Remove Items Repeated Within Current Input”
  3. Choose your Compare strategy
  4. If using “All Fields Except” or “Selected Fields”, enter the field names (comma-separated)
  5. Connect to subsequent nodes

Which duplicate gets kept:

The node keeps the first occurrence and removes subsequent duplicates. If you need to keep a specific version (such as the most complete record or the most recent), sort your data before the Remove Duplicates node using a Code node or the Sort node.

Operation 2: Remove Items Processed in Previous Executions

This operation maintains a history of previously processed items. When the workflow runs again, it compares incoming items against this history and only passes through items that have not been seen before.

When to use:

  • Scheduled workflows that should only process new records
  • Webhook handlers that might receive duplicate events
  • Any recurring workflow where reprocessing the same data causes problems

Keep Items Where options:

Option | Passes Through When | Use Case
Value Is New | The value has never been seen | Standard deduplication across runs
Value Is Higher than Any Previous Value | The current value exceeds all stored values | Incremental ID processing
Value Is a Date Later than Any Previous Date | The current date exceeds all stored dates | Date-based incremental processing

Scope options:

  • Node (default): History is stored independently for this specific Remove Duplicates node. Other nodes in the workflow have their own separate history.
  • Workflow: History is shared across all Remove Duplicates nodes in the workflow that also use “Workflow” scope. Useful when multiple paths need to share deduplication state.

History Size parameter:

The node stores up to 10,000 items by default. When this limit is reached, the oldest items are removed (FIFO: first in, first out) to make room for new ones. Adjust this value in the node options if you need to track more items.

Operation 3: Clear Deduplication History

This operation wipes the stored history of previously processed items. After clearing, the node will treat all items as new again.

When to use:

  • Resetting after a major data migration
  • Starting fresh after fixing data quality issues
  • Testing workflows that use cross-execution deduplication

Scope considerations:

The clear operation respects the scope setting. Clearing at “Node” scope only affects this node’s history. Clearing at “Workflow” scope affects all nodes sharing that workflow-level history.

Workflow pattern:

A common pattern uses a separate branch that triggers the clear operation:

[Manual Trigger] → [Remove Duplicates: Clear History]

This gives you a manual reset button without modifying the main workflow logic.

Your First Remove Duplicates Workflow

Walk through a practical example that demonstrates within-execution deduplication.

Step 1: Create Sample Data with Duplicates

Add a Code node to generate test data:

return [
  { json: { id: 1, name: "Taylor Swift", job: "Pop star" } },
  { json: { id: 2, name: "Ed Sheeran", job: "Singer-songwriter" } },
  { json: { id: 3, name: "Adele", job: "Singer-songwriter" } },
  { json: { id: 1, name: "Taylor Swift", job: "Pop star" } },  // duplicate
  { json: { id: 4, name: "Bruno Mars", job: "Singer-songwriter" } },
  { json: { id: 2, name: "Ed Sheeran", job: "Singer-songwriter" } },  // duplicate
];

Step 2: Add the Remove Duplicates Node

  1. Click + to add a node
  2. Search for “Remove Duplicates”
  3. Connect it to your Code node

Step 3: Configure for All Fields Comparison

  1. Set Operation to “Remove Items Repeated Within Current Input”
  2. Set Compare to “All Fields”

Step 4: Test the Node

Click Test step and examine the output. You should see 4 unique items instead of 6. The duplicate entries for Taylor Swift and Ed Sheeran have been removed.

Step 5: Try Selected Fields

Now change the configuration:

  1. Set Compare to “Selected Fields”
  2. Enter id in the Values to Compare field

Test again. This time, deduplication occurs based only on the id field, ignoring name and job. The result is the same in this case, but this approach handles scenarios where the same ID might have different metadata.

Field Comparison Strategies

Choosing the right comparison strategy determines which items are considered duplicates.

Comparing All Fields

The strictest option. Two items are duplicates only if every single field matches exactly.

Pros:

  • Safest choice when you need exact duplicates removed
  • No configuration required

Cons:

  • Timestamps, metadata, or auto-generated fields that differ between records cause false negatives
  • Items that are logically the same but differ in any single field both pass through

Best for: Data that comes from a single source with consistent formatting.

Comparing Selected Fields

You specify which fields to compare. Items are duplicates if those specific fields match, regardless of other field values.

Example: Compare only by email field:

Values to Compare: email

Two contacts with the same email but different names or phone numbers are considered duplicates.

Multiple fields: Enter comma-separated field names:

Values to Compare: customer_id, order_date

Items must match on ALL specified fields to be considered duplicates.

Best for: Data with unique identifiers like IDs, emails, or composite keys.

Excluding Fields from Comparison

You specify which fields to ignore. All other fields are compared.

Example: Exclude timestamp and sync metadata:

Values to Skip: last_updated, sync_id, created_at

Two items with the same ID, name, and email but different timestamps are considered duplicates.

Best for: Data with auto-generated metadata fields that differ between otherwise identical records.
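
Conceptually, “All Fields Except” compares a copy of each item with the skipped fields removed. A rough Code-node equivalent of the example above (a sketch of the idea, not the node’s actual implementation) could look like this:

// Code-node sketch: ignore last_updated, sync_id and created_at when comparing
const items = $input.all();
const skipFields = ['last_updated', 'sync_id', 'created_at'];

const seen = new Set();
const unique = [];

for (const item of items) {
  // Copy the item's JSON and drop the fields we want to ignore
  const copy = { ...item.json };
  for (const field of skipFields) delete copy[field];

  // Use the remaining fields as the comparison key.
  // Caveat: JSON.stringify is order-sensitive, so this assumes the
  // sources emit fields in a consistent order.
  const key = JSON.stringify(copy);

  if (!seen.has(key)) {
    seen.add(key);
    unique.push(item);
  }
}

return unique;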

Cross-Execution Deduplication

The most powerful feature of the Remove Duplicates node is tracking items across multiple workflow executions. This ensures you process each unique item only once, regardless of how many times the workflow runs.

How History Tracking Works

When using “Remove Items Processed in Previous Executions”:

  1. The node maintains an in-memory store of previously seen values
  2. Each incoming item is checked against this store
  3. Items found in the store are filtered out
  4. New items are added to the store and passed through

The store persists across workflow executions but is lost if n8n restarts (unless using a database-backed n8n installation).

History Size and FIFO Behavior

The History Size parameter (default: 10,000) controls how many items the node remembers. When this limit is reached:

  • The oldest items are removed to make room for new ones (FIFO: first in, first out)
  • Items removed from history will be processed again if they reappear

Important: If a single batch of incoming items exceeds the history size, you will receive an error: “The number of items to be processed exceeds the maximum history size.” Either increase the history size or process data in smaller batches.
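
The snippet below is a minimal mental model of that FIFO behavior in plain JavaScript; it only illustrates the eviction rule, not n8n’s actual storage code:

// Toy model of a bounded dedup history with FIFO eviction
const HISTORY_SIZE = 5;   // the node's real default is 10,000
const history = [];       // oldest key sits at index 0

function isNew(key) {
  if (history.includes(key)) return false;           // already seen, filter it out
  history.push(key);
  if (history.length > HISTORY_SIZE) history.shift(); // evict the oldest key
  return true;
}

['a', 'b', 'c', 'd', 'e', 'f'].forEach(isNew);        // 'a' is evicted when 'f' arrives
console.log(isNew('a'));  // true — 'a' fell out of history and would be reprocessed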

Scope: Node vs Workflow

Node scope (default):

  • Each Remove Duplicates node maintains its own independent history
  • Multiple nodes in the same workflow do not share history
  • Clearing history affects only this node

Workflow scope:

  • All Remove Duplicates nodes with “Workflow” scope share a single history
  • Useful when different branches process different subsets of the same data
  • Clearing history affects all nodes using workflow scope

Example use case for workflow scope:

[Webhook] → [Route by Type] → Branch A: [Remove Duplicates: Workflow Scope]
                            → Branch B: [Remove Duplicates: Workflow Scope]

Both branches share history, so an item processed in Branch A will not be processed again in Branch B if the same webhook fires twice.

Value Comparison Options

Value Is New:

Standard deduplication. The node checks if the specified value has ever been seen. If not, the item passes through and the value is stored.

Value to Dedupe On: $json.id

Value Is Higher than Any Previous Value:

For incremental ID processing. The item passes through only if its value is greater than all previously stored values.

Value to Dedupe On: $json.customer_id
Keep Items Where: Value Is Higher than Any Previous Value

This is perfect for databases with auto-incrementing IDs. Each run processes only records with IDs higher than any previously seen.
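
If you need this behavior plus logic the node cannot express, a Code node can track the high-water mark itself. The sketch below uses n8n’s workflow static data as the store; note that static data only persists for production (activated) executions, not manual test runs, and the field name customer_id is just an example:

// Code-node sketch: keep only items with an ID above the highest ID seen so far
const staticData = $getWorkflowStaticData('node');
const lastSeenId = staticData.lastSeenId ?? 0;

const items = $input.all();
const newItems = items.filter(item => Number(item.json.customer_id) > lastSeenId);

if (newItems.length > 0) {
  // Remember the new high-water mark for the next execution
  staticData.lastSeenId = Math.max(...newItems.map(i => Number(i.json.customer_id)));
}

return newItems;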

Value Is a Date Later than Any Previous Date:

Same concept but for date fields. Only items with dates later than any stored date pass through.

Value to Dedupe On: $json.created_at
Keep Items Where: Value Is a Date Later than Any Previous Date

Use this for time-series data or audit logs where you only want new entries.

Common Mistakes and How to Fix Them

These issues cause the most confusion based on community discussions and support requests.

Mistake 1: Wrong Duplicate Gets Kept

Problem: The node keeps the first occurrence, but you want to keep the most complete record or the most recent version.

Example: You have two records for the same customer:

{ "id": 123, "name": "John", "state": "" }
{ "id": 123, "name": "John", "state": "Texas" }

If the empty-state record comes first, that is what gets kept.

Solution: Sort your data before the Remove Duplicates node. Use a Code node to sort records so the preferred version comes first:

const items = $input.all();

// Sort so records WITH a state value come before records WITHOUT one
items.sort((a, b) => {
  const aHasState = a.json.state && a.json.state.length > 0 ? 1 : 0;
  const bHasState = b.json.state && b.json.state.length > 0 ? 1 : 0;
  return bHasState - aHasState;  // records with state (1) sort before records without (0)
});

return items;

Now the complete record appears first and gets kept.

Mistake 2: History Size Overflow Error

Problem: You see the error “The number of items to be processed exceeds the maximum history size.”

Cause: A single batch of incoming items is larger than the configured history size.

Solutions:

  1. Increase history size: In node options, set a higher History Size value
  2. Process in batches: Use the Split In Batches node before Remove Duplicates to process smaller chunks
  3. Periodic history clear: Schedule a workflow to clear history when it grows too large

For very high volumes, consider using a database-based deduplication approach with the Code node and external storage.

Mistake 3: Type Mismatches in Field Comparison

Problem: Records that should match are treated as different.

Cause: One source stores IDs as strings ("123"), another as numbers (123). Strict comparison sees these as different values.

Example:

// From API A
{ "customer_id": 123 }

// From API B
{ "customer_id": "123" }

These do not match despite having the same logical ID.

Solution: Normalize types before the Remove Duplicates node using an Edit Fields node:

// Expression to normalize to string
{{ String($json.customer_id) }}

Or use a Code node to standardize all fields.
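
A minimal Code-node sketch of that normalization step might look like the following (the field names customer_id and email are placeholders for whatever fields you compare on):

// Normalize types and formatting before the Remove Duplicates node
const items = $input.all();

for (const item of items) {
  // Force IDs to strings so 123 and "123" compare as equal
  if (item.json.customer_id !== undefined) {
    item.json.customer_id = String(item.json.customer_id);
  }
  // Lowercase and trim emails so casing and whitespace don't create "new" values
  if (typeof item.json.email === 'string') {
    item.json.email = item.json.email.toLowerCase().trim();
  }
}

return items;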

Mistake 4: Expecting Cross-Execution Dedup Without Configuring It

Problem: Duplicates still pass through on subsequent workflow runs.

Cause: The default operation is “Remove Items Repeated Within Current Input” which only looks at the current execution.

Solution: Change the operation to “Remove Items Processed in Previous Executions” and configure:

  1. Set Value to Dedupe On to the unique identifier field
  2. Choose an appropriate Keep Items Where option
  3. Set the Scope if needed

Without this configuration, each execution starts fresh with no memory of previous runs.

Mistake 5: Scope Confusion When Clearing History

Problem: Clearing history in one node unexpectedly affects another node, or fails to affect nodes you expected.

Cause: Misunderstanding of Node vs Workflow scope.

Explanation:

  • Node scope: Each node has its own isolated history. Clearing one node’s history does not affect others.
  • Workflow scope: All nodes with workflow scope share history. Clearing affects all of them.

Best practice:

  1. Use Node scope (default) unless you specifically need shared history
  2. When using Workflow scope, document which nodes share history
  3. Test clear operations in a development environment first

Real-World Examples

Example 1: Webhook Retry Deduplication

Scenario: Your payment processor sends webhooks for successful payments, but sometimes retries the same webhook if acknowledgment is delayed.

Problem: Customer records get updated multiple times, emails send twice.

Solution:

[Webhook] → [Remove Duplicates: Cross-Execution] → [Process Payment]

Configuration:

  • Operation: Remove Items Processed in Previous Executions
  • Value to Dedupe On: {{ $json.payment_id }}
  • Keep Items Where: Value Is New
  • History Size: 50000 (adjust based on payment volume)

Each payment processes exactly once, regardless of webhook retries.

Example 2: API Pagination with Overlapping Results

Scenario: You fetch all products from an e-commerce API using pagination. The API sometimes returns the same product on consecutive pages.

Problem: Duplicate products appear in your database sync.

Solution:

[Loop Over Pages] → [HTTP Request] → [Remove Duplicates: Within Execution] → [Database Insert]

Configuration:

  • Operation: Remove Items Repeated Within Current Input
  • Compare: Selected Fields
  • Values to Compare: sku

Products with the same SKU are deduplicated before database insertion.

Example 3: CRM Sync with Incremental IDs

Scenario: A scheduled workflow syncs new customers from your CRM every hour. The CRM uses auto-incrementing IDs.

Problem: You only want to process customers added since the last run.

Solution:

[Schedule Trigger] → [CRM: Get All Customers] → [Remove Duplicates: Higher Value] → [Process New Customers]

Configuration:

  • Operation: Remove Items Processed in Previous Executions
  • Value to Dedupe On: {{ $json.customer_id }}
  • Keep Items Where: Value Is Higher than Any Previous Value

Each run processes only customers with IDs higher than any previously seen.

Example 4: Daily Report with New Entries Only

Scenario: A daily workflow generates a report of new support tickets. Tickets should only appear in one report.

Problem: If the same ticket is still open the next day, it should not appear again.

Solution:

[Schedule: Daily] → [Zendesk: Get Open Tickets] → [Remove Duplicates: Cross-Execution] → [Generate Report]

Configuration:

  • Operation: Remove Items Processed in Previous Executions
  • Value to Dedupe On: {{ $json.ticket_id }}
  • Keep Items Where: Value Is New
  • History Size: 100000

Each ticket appears in exactly one daily report.

Example 5: Multi-Source Data Consolidation

Scenario: You pull customer data from three different sources (CRM, billing, support) and need a unified list without duplicates.

Problem: The same customer exists in multiple systems with the same email but different record IDs.

Solution:

[CRM Data] ──┬──► [Merge: Append] → [Remove Duplicates] → [Unified Customer List]
[Billing] ───┤
[Support] ───┘

Configuration:

  • Operation: Remove Items Repeated Within Current Input
  • Compare: Selected Fields
  • Values to Compare: email

Customers are deduplicated by email, keeping the first occurrence from whichever source is merged first.

Remove Duplicates vs JavaScript Deduplication

For complex deduplication logic, you might consider using a Code node instead. Here is when to use each approach.

Use the Remove Duplicates Node When:

  • You need cross-execution history tracking
  • Deduplication logic is straightforward (by fields)
  • You want a no-code solution
  • You need scope management (node vs workflow)

Use a Code Node When:

  • You need custom comparison logic (fuzzy matching, partial field comparison)
  • You want to keep a specific duplicate based on complex rules
  • You need to count or report on duplicates
  • You are already doing significant data transformation in code

JavaScript deduplication example:

// Get all items from the previous node
const items = $input.all();

// Track emails we've already seen (Set ensures uniqueness)
const seen = new Set();

// Store items that pass our dedup check
const unique = [];

for (const item of items) {
  // Normalize email: lowercase and remove whitespace
  // The ?. safely handles cases where email might be missing
  const key = item.json.email?.toLowerCase().trim();

  // Only keep this item if:
  // 1. The email exists (key is truthy)
  // 2. We haven't seen this email before
  if (key && !seen.has(key)) {
    seen.add(key);    // Remember this email
    unique.push(item); // Keep this item
  }
}

// Return only the unique items
return unique;

This example uses the JavaScript Set object to track seen values efficiently. It normalizes email addresses (lowercase, trimmed) before comparison, which the Remove Duplicates node cannot do natively. For more advanced filtering patterns, see the MDN Array filter() documentation.

For expression help, try our expression validator tool.

Pro Tips and Best Practices

1. Sort Before Deduplicating

The node keeps the first occurrence. If you need the most complete or most recent record, sort your data first:

[Get Data] → [Code: Sort by completeness] → [Remove Duplicates]

This gives you control over which duplicate survives.

2. Use Specific Fields for Large Datasets

Comparing all fields is computationally expensive. For large datasets, always use “Selected Fields” with just the unique identifier:

Values to Compare: id

This significantly improves performance.

3. Monitor History Size in Production

If you use cross-execution deduplication, monitor how many items you process over time. When history approaches the limit, oldest items get removed and may be reprocessed.

Consider scheduling a history clear with notification:

[Schedule: Weekly] → [Remove Duplicates: Clear History] → [Slack: History Cleared]

4. Combine with Filter for Complex Logic

Sometimes you need both deduplication and filtering. Chain the nodes:

[Get Data] → [Filter: Active Only] → [Remove Duplicates: By ID] → [Process]

Filter first to reduce the number of items before deduplication.

5. Document Your Dedup Strategy

Add a sticky note explaining:

  • Which fields are used for comparison
  • Why that strategy was chosen
  • Expected history size and cleanup schedule

This helps future you (or your team) maintain the workflow.

6. Test with Production-Like Data

Test your deduplication with realistic data volumes and duplicate patterns. Pay particular attention to edge cases such as:

  • All items are duplicates
  • No items are duplicates
  • First and last items are duplicates of each other
  • Nested fields with null values

Use our workflow debugger tool to identify issues.

For complex deduplication requirements or high-volume scenarios, our workflow development services can help design robust solutions. For strategic architecture guidance, explore our consulting services.

Frequently Asked Questions

How does the Remove Duplicates node decide which duplicate to keep?

The node always keeps the first occurrence and removes subsequent duplicates. The order of items flowing into the node determines which one survives.

Need to keep a specific version? Sort your data before the Remove Duplicates node:

  • Use a Code node to sort items so your preferred version appears first
  • Sort by completeness score descending to keep the most complete record
  • Sort by timestamp descending to keep the most recent version

See the n8n data transformation guide for sorting patterns.

What happens when the history size limit is reached in cross-execution mode?

The node uses FIFO (first in, first out) behavior. When history reaches the configured limit (default 10,000), the oldest items are removed to make room for new ones.

Key implications:

  • Very old items may be reprocessed if they reappear after falling out of history
  • If a single batch exceeds the history size, you get an error

Solutions:

  • Increase the History Size parameter
  • Process data in smaller batches using Split In Batches
  • Schedule periodic history clears and accept occasional reprocessing

Can I remove duplicates based on multiple fields?

Yes. When using “Selected Fields” comparison, enter comma-separated field names:

customer_id, order_date

Items are considered duplicates only if ALL specified fields match. This creates a composite key for comparison.

For complex nested fields:

  • Flatten your data first using an Edit Fields node
  • Or create a combined key field in a Code node
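
The combined-key approach might look like this in a Code node; the nested paths (customer.id, order.date) are illustrative, so substitute your own:

// Add a single dedup_key field so Remove Duplicates can compare on one value
const items = $input.all();

for (const item of items) {
  const customerId = item.json.customer?.id ?? '';
  const orderDate = item.json.order?.date ?? '';
  item.json.dedup_key = `${customerId}|${orderDate}`;
}

return items;

Then set Values to Compare to dedup_key in the Remove Duplicates node that follows.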

Test your field combinations using our expression validator.

How do I deduplicate both within a single execution AND across executions?

Chain two Remove Duplicates nodes in sequence:

[Get Data] → [Remove Duplicates: Within Input] → [Remove Duplicates: Cross Execution] → [Process]

Step 1: Use “Remove Items Repeated Within Current Input” to eliminate duplicates within the current batch.

Step 2: Use “Remove Items Processed in Previous Executions” to filter out items seen in previous runs.

This two-step approach handles both pagination duplicates and recurring workflow deduplication.

Does clearing history affect other workflows or just the current one?

It depends on your scope setting:

Node scope (default):

  • Clearing affects only that specific Remove Duplicates node instance
  • Other nodes in the same workflow are unaffected
  • All nodes in other workflows are unaffected

Workflow scope:

  • Clearing affects all Remove Duplicates nodes in that workflow using workflow scope
  • Other workflows always have completely separate history

Best practice: When using workflow scope, document which nodes share history to avoid confusion during maintenance.
