Duplicate data silently sabotages your automations. Your CRM sync runs daily, but the same contacts get processed repeatedly. Your webhook fires multiple times for a single event. Your paginated API returns overlapping records between pages. Each duplicate wastes API calls, triggers redundant notifications, and corrupts your analytics.
The Remove Duplicates node solves this problem at two levels: it can eliminate duplicates within a single execution, and it can track items across multiple executions to ensure you never process the same record twice.
The Duplicate Data Problem
Duplicates creep into workflows from multiple sources:
- API pagination overlap: Many APIs return the same items across page boundaries, especially when records are added or removed between requests
- Webhook retries: Payment processors and notification services retry failed webhooks, triggering the same event multiple times
- Data sync conflicts: Pulling from multiple sources often yields the same records from each source
- Schedule timing issues: A scheduled workflow might catch the same new records on consecutive runs before they are marked as processed
Without deduplication, your downstream nodes process everything multiple times. Emails send twice. Database records duplicate. Reports show inflated numbers.
What the Remove Duplicates Node Does
The node offers three distinct operations for different deduplication needs (see the official n8n documentation for the complete reference):
- Remove items repeated within current input: Compares items within a single execution and keeps only unique ones
- Remove items processed in previous executions: Tracks items across workflow runs to skip previously seen data
- Clear deduplication history: Resets the memory of previously processed items
Each operation serves different use cases. Within-execution dedup handles API pagination. Cross-execution dedup handles recurring workflows that should only process new data.
What You’ll Learn
- When to use each of the three operations
- Field comparison strategies for identifying duplicates
- Cross-execution deduplication with history tracking
- Scope options for sharing history across nodes
- Common mistakes that cause deduplication failures
- Real-world patterns for webhooks, APIs, and data sync
Quick Reference
| Parameter | Options | Default |
|---|---|---|
| Operation | Within Input / Previous Executions / Clear History | Within Input |
| Compare | All Fields / All Fields Except / Selected Fields | All Fields |
| Keep Items Where | Value Is New / Higher / Later Date | Value Is New |
| Scope | Node / Workflow | Node |
| History Size | 1 to 1,000,000 | 10,000 |
When to Use the Remove Duplicates Node
Before configuring the node, understand when it applies versus other data transformation options.
| Scenario | Best Choice | Why |
|---|---|---|
| Remove duplicates within current data | Remove Duplicates | Built-in operation for single execution |
| Process only new items across runs | Remove Duplicates | Cross-execution history tracking |
| Find differences between two datasets | Compare Datasets | Four-branch output for different states |
| Keep items matching conditions | Filter | Conditional filtering, not deduplication |
| Combine duplicate items into aggregates | Aggregate | Grouping and summarizing |
| Join data from multiple sources | Merge | Combining streams, not removing duplicates |
Rule of thumb: Use Remove Duplicates when items should only appear once in your output. Use Compare Datasets when you need to know what changed between two datasets. Use Filter when you want to keep items based on conditions rather than uniqueness.
Understanding the Three Operations
The Remove Duplicates node provides three operations, each addressing a specific deduplication scenario.
Operation 1: Remove Items Repeated Within Current Input
This operation examines all items flowing into the node during a single execution. It identifies duplicates and keeps only the first occurrence of each unique item.
When to use:
- Paginated API responses with overlapping records
- Data merged from multiple sources in the same workflow
- Any scenario where duplicates exist within a single batch of data
Compare options:
| Option | Behavior | Best For |
|---|---|---|
| All Fields | Items must be identical across every field | Exact duplicates only |
| All Fields Except | Ignores specified fields during comparison | Excluding timestamps or metadata |
| Selected Fields | Only compares specified fields | Matching by ID regardless of other differences |
Configuration steps:
- Add the Remove Duplicates node to your workflow
- Set Operation to “Remove Items Repeated Within Current Input”
- Choose your Compare strategy
- If using “All Fields Except” or “Selected Fields”, enter the field names (comma-separated)
- Connect to subsequent nodes
Which duplicate gets kept:
The node keeps the first occurrence and removes subsequent duplicates. If you need to keep a specific version (such as the most complete record or the most recent), sort your data before the Remove Duplicates node using a Code node or the Sort node.
Operation 2: Remove Items Processed in Previous Executions
This operation maintains a history of previously processed items. When the workflow runs again, it compares incoming items against this history and only passes through items that have not been seen before.
When to use:
- Scheduled workflows that should only process new records
- Webhook handlers that might receive duplicate events
- Any recurring workflow where reprocessing the same data causes problems
Keep Items Where options:
| Option | Passes Through When | Use Case |
|---|---|---|
| Value Is New | The value has never been seen | Standard deduplication across runs |
| Value Is Higher than Any Previous Value | The current value exceeds all stored values | Incremental ID processing |
| Value Is a Date Later than Any Previous Date | The current date exceeds all stored dates | Date-based incremental processing |
Scope options:
- Node (default): History is stored independently for this specific Remove Duplicates node. Other nodes in the workflow have their own separate history.
- Workflow: History is shared across all Remove Duplicates nodes in the workflow that also use “Workflow” scope. Useful when multiple paths need to share deduplication state.
History Size parameter:
The node stores up to 10,000 items by default. When this limit is reached, the oldest items are removed (FIFO: first in, first out) to make room for new ones. Adjust this value in the node options if you need to track more items.
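To make the FIFO behavior concrete, here is a rough Code node sketch that emulates a bounded history with n8n's workflow static data. It is a conceptual model only, assuming each item carries an id field; it is not the node's actual implementation:
// Conceptual sketch of a bounded FIFO dedup history (not the node's internal code)
const staticData = $getWorkflowStaticData('node'); // persists between production executions
staticData.history = staticData.history || [];     // ordered list of ids already seen
const HISTORY_SIZE = 10000;                         // mirrors the node's default

const fresh = [];
for (const item of $input.all()) {
  const id = String(item.json.id); // assumes each item carries an id field
  if (!staticData.history.includes(id)) {
    staticData.history.push(id); // remember it
    fresh.push(item);            // and pass it through
  }
}

// When the history outgrows the limit, drop the oldest ids first (FIFO)
while (staticData.history.length > HISTORY_SIZE) {
  staticData.history.shift();
}

return fresh;
Workflow static data only persists for active (production) executions, which is one more reason the built-in node is usually the simpler choice.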
Operation 3: Clear Deduplication History
This operation wipes the stored history of previously processed items. After clearing, the node will treat all items as new again.
When to use:
- Resetting after a major data migration
- Starting fresh after fixing data quality issues
- Testing workflows that use cross-execution deduplication
Scope considerations:
The clear operation respects the scope setting. Clearing at “Node” scope only affects this node’s history. Clearing at “Workflow” scope affects all nodes sharing that workflow-level history.
Workflow pattern:
A common pattern uses a separate branch that triggers the clear operation:
[Manual Trigger] → [Remove Duplicates: Clear History]
This gives you a manual reset button without modifying the main workflow logic.
Your First Remove Duplicates Workflow
Walk through a practical example that demonstrates within-execution deduplication.
Step 1: Create Sample Data with Duplicates
Add a Code node to generate test data:
return [
  { json: { id: 1, name: "Taylor Swift", job: "Pop star" } },
  { json: { id: 2, name: "Ed Sheeran", job: "Singer-songwriter" } },
  { json: { id: 3, name: "Adele", job: "Singer-songwriter" } },
  { json: { id: 1, name: "Taylor Swift", job: "Pop star" } }, // duplicate
  { json: { id: 4, name: "Bruno Mars", job: "Singer-songwriter" } },
  { json: { id: 2, name: "Ed Sheeran", job: "Singer-songwriter" } }, // duplicate
];
Step 2: Add the Remove Duplicates Node
- Click + to add a node
- Search for “Remove Duplicates”
- Connect it to your Code node
Step 3: Configure for All Fields Comparison
- Set Operation to “Remove Items Repeated Within Current Input”
- Set Compare to “All Fields”
Step 4: Test the Node
Click Test step and examine the output. You should see 4 unique items instead of 6. The duplicate entries for Taylor Swift and Ed Sheeran have been removed.
Step 5: Try Selected Fields
Now change the configuration:
- Set Compare to “Selected Fields”
- Enter id in the Values to Compare field
Test again. This time, deduplication occurs based only on the id field, ignoring name and job. The result is the same in this case, but this approach handles scenarios where the same ID might have different metadata.
Field Comparison Strategies
Choosing the right comparison strategy determines which items are considered duplicates.
Comparing All Fields
The strictest option. Two items are duplicates only if every single field matches exactly.
Pros:
- Safest choice when you need exact duplicates removed
- No configuration required
Cons:
- Timestamps, metadata, or auto-generated fields cause false negatives
- Items that are logically the same but differ in any field pass through
Best for: Data that comes from a single source with consistent formatting.
Comparing Selected Fields
You specify which fields to compare. Items are duplicates if those specific fields match, regardless of other field values.
Example: Compare only by email field:
Values to Compare: email
Two contacts with the same email but different names or phone numbers are considered duplicates.
Multiple fields: Enter comma-separated field names:
Values to Compare: customer_id, order_date
Items must match on ALL specified fields to be considered duplicates.
Best for: Data with unique identifiers like IDs, emails, or composite keys.
Excluding Fields from Comparison
You specify which fields to ignore. All other fields are compared.
Example: Exclude timestamp and sync metadata:
Values to Skip: last_updated, sync_id, created_at
Two items with the same ID, name, and email but different timestamps are considered duplicates.
Best for: Data with auto-generated metadata fields that differ between otherwise identical records.
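For reference, here is roughly what that exclusion logic looks like if you rebuild it in a Code node; the last_updated, sync_id, and created_at names are just the example metadata fields from above:
// Deduplicate on every field except the listed metadata fields (illustrative sketch)
const IGNORE = ['last_updated', 'sync_id', 'created_at'];
const seen = new Set();
const unique = [];

for (const item of $input.all()) {
  // Build a comparison key from all fields except the ignored ones,
  // sorted so field order never changes the key
  const entries = Object.entries(item.json)
    .filter(([field]) => !IGNORE.includes(field))
    .sort();
  const key = JSON.stringify(entries);

  if (!seen.has(key)) {
    seen.add(key);
    unique.push(item);
  }
}

return unique;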
Cross-Execution Deduplication
The most powerful feature of the Remove Duplicates node is tracking items across multiple workflow executions. This ensures you process each unique item only once, regardless of how many times the workflow runs.
How History Tracking Works
When using “Remove Items Processed in Previous Executions”:
- The node maintains a store of previously seen values
- Each incoming item is checked against this store
- Items found in the store are filtered out
- New items are added to the store and passed through
The store persists across workflow executions, so the node remembers which items it has already processed between runs.
History Size and FIFO Behavior
The History Size parameter (default: 10,000) controls how many items the node remembers. When this limit is reached:
- The oldest items are removed to make room for new ones (FIFO: first in, first out)
- Items removed from history will be processed again if they reappear
Important: If a single batch of incoming items exceeds the history size, you will receive an error: “The number of items to be processed exceeds the maximum history size.” Either increase the history size or process data in smaller batches.
Scope: Node vs Workflow
Node scope (default):
- Each Remove Duplicates node maintains its own independent history
- Multiple nodes in the same workflow do not share history
- Clearing history affects only this node
Workflow scope:
- All Remove Duplicates nodes with “Workflow” scope share a single history
- Useful when different branches process different subsets of the same data
- Clearing history affects all nodes using workflow scope
Example use case for workflow scope:
[Webhook] → [Route by Type] → Branch A: [Remove Duplicates: Workflow Scope]
→ Branch B: [Remove Duplicates: Workflow Scope]
Both branches share history, so an item processed in Branch A will not be processed again in Branch B if the same webhook fires twice.
Value Comparison Options
Value Is New:
Standard deduplication. The node checks if the specified value has ever been seen. If not, the item passes through and the value is stored.
Value to Dedupe On: {{ $json.id }}
Value Is Higher than Any Previous Value:
For incremental ID processing. The item passes through only if its value is greater than all previously stored values.
Value to Dedupe On: {{ $json.customer_id }}
Keep Items Where: Value Is Higher than Any Previous Value
This is perfect for databases with auto-incrementing IDs. Each run processes only records with IDs higher than any previously seen.
Value Is a Date Later than Any Previous Date:
Same concept but for date fields. Only items with dates later than any stored date pass through.
Value to Dedupe On: {{ $json.created_at }}
Keep Items Where: Value Is a Date Later than Any Previous Date
Use this for time-series data or audit logs where you only want new entries.
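Conceptually, the higher-value and later-date modes track a single high-water mark rather than a set of seen values. A rough Code node sketch of the date variant, assuming a created_at field, looks like this:
// Conceptual sketch of "Value Is a Date Later than Any Previous Date" (not the node's code)
const staticData = $getWorkflowStaticData('node');
const lastSeen = staticData.lastDate ? new Date(staticData.lastDate) : null;

// Keep only items whose created_at is later than the stored high-water mark
const fresh = $input.all().filter((item) => {
  const date = new Date(item.json.created_at);
  return !lastSeen || date > lastSeen;
});

// Advance the high-water mark to the latest date just processed
for (const item of fresh) {
  const date = new Date(item.json.created_at);
  if (!staticData.lastDate || date > new Date(staticData.lastDate)) {
    staticData.lastDate = date.toISOString();
  }
}

return fresh;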
Common Mistakes and How to Fix Them
These issues cause the most confusion based on community discussions and support requests.
Mistake 1: Wrong Duplicate Gets Kept
Problem: The node keeps the first occurrence, but you want to keep the most complete record or the most recent version.
Example: You have two records for the same customer:
{ "id": 123, "name": "John", "state": "" }
{ "id": 123, "name": "John", "state": "Texas" }
If the empty-state record comes first, that is what gets kept.
Solution: Sort your data before the Remove Duplicates node. Use a Code node to sort records so the preferred version comes first:
const items = $input.all();

// Sort so records WITH a non-empty state come before records WITHOUT one
items.sort((a, b) => {
  const aHasState = Boolean(a.json.state && a.json.state.length > 0);
  const bHasState = Boolean(b.json.state && b.json.state.length > 0);
  // 1 - 0 = positive, so items that have a state sort ahead of those that do not
  return Number(bHasState) - Number(aHasState);
});

return items;
Now the complete record appears first and gets kept.
Mistake 2: History Size Overflow Error
Problem: You see the error “The number of items to be processed exceeds the maximum history size.”
Cause: A single batch of incoming items is larger than the configured history size.
Solutions:
- Increase history size: In node options, set a higher History Size value
- Process in batches: Use the Split In Batches node before Remove Duplicates to process smaller chunks
- Periodic history clear: Schedule a workflow to clear history when it grows too large
For very high volumes, consider using a database-based deduplication approach with the Code node and external storage.
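One hedged sketch of that pattern: a database node (hypothetically named "Get Processed IDs" here) returns the IDs you have already handled, and a Code node filters the incoming batch against them before the survivors are inserted and their IDs recorded:
// Filter incoming items against IDs already stored in your own database.
// "Get Processed IDs" is a hypothetical earlier node returning rows like { id: "123" }.
const processed = new Set(
  $('Get Processed IDs').all().map((row) => String(row.json.id))
);

const newItems = $input.all().filter(
  (item) => !processed.has(String(item.json.id))
);

// Downstream nodes insert newItems and write their IDs back to the database
return newItems;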
Mistake 3: Type Mismatches in Field Comparison
Problem: Records that should match are treated as different.
Cause: One source stores IDs as strings ("123"), another as numbers (123). Strict comparison sees these as different values.
Example:
// From API A
{ "customer_id": 123 }
// From API B
{ "customer_id": "123" }
These do not match despite having the same logical ID.
Solution: Normalize types before the Remove Duplicates node using an Edit Fields node:
// Expression to normalize to string
{{ String($json.customer_id) }}
Or use a Code node to standardize all fields.
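A minimal Code node version of that normalization, assuming the field is called customer_id, could be:
// Cast customer_id to a string on every item so 123 and "123" compare as equal
return $input.all().map((item) => {
  item.json.customer_id = String(item.json.customer_id);
  return item;
});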
Mistake 4: Expecting Cross-Execution Dedup Without Configuring It
Problem: Duplicates still pass through on subsequent workflow runs.
Cause: The default operation is “Remove Items Repeated Within Current Input” which only looks at the current execution.
Solution: Change the operation to “Remove Items Processed in Previous Executions” and configure:
- Set Value to Dedupe On to the unique identifier field
- Choose an appropriate Keep Items Where option
- Set the Scope if needed
Without this configuration, each execution starts fresh with no memory of previous runs.
Mistake 5: Scope Confusion When Clearing History
Problem: Clearing history in one node unexpectedly affects another node, or fails to affect nodes you expected.
Cause: Misunderstanding of Node vs Workflow scope.
Explanation:
- Node scope: Each node has its own isolated history. Clearing one node’s history does not affect others.
- Workflow scope: All nodes with workflow scope share history. Clearing affects all of them.
Best practice:
- Use Node scope (default) unless you specifically need shared history
- When using Workflow scope, document which nodes share history
- Test clear operations in a development environment first
Real-World Examples
Example 1: Webhook Retry Deduplication
Scenario: Your payment processor sends webhooks for successful payments, but sometimes retries the same webhook if acknowledgment is delayed.
Problem: Customer records get updated multiple times, emails send twice.
Solution:
[Webhook] → [Remove Duplicates: Cross-Execution] → [Process Payment]
Configuration:
- Operation: Remove Items Processed in Previous Executions
- Value to Dedupe On: {{ $json.payment_id }}
- Keep Items Where: Value Is New
- History Size: 50000 (adjust based on payment volume)
Each payment processes exactly once, regardless of webhook retries.
Example 2: API Pagination with Overlapping Results
Scenario: You fetch all products from an e-commerce API using pagination. The API sometimes returns the same product on consecutive pages.
Problem: Duplicate products appear in your database sync.
Solution:
[Loop Over Pages] → [HTTP Request] → [Remove Duplicates: Within Execution] → [Database Insert]
Configuration:
- Operation: Remove Items Repeated Within Current Input
- Compare: Selected Fields
- Values to Compare: sku
Products with the same SKU are deduplicated before database insertion.
Example 3: CRM Sync with Incremental IDs
Scenario: A scheduled workflow syncs new customers from your CRM every hour. The CRM uses auto-incrementing IDs.
Problem: You only want to process customers added since the last run.
Solution:
[Schedule Trigger] → [CRM: Get All Customers] → [Remove Duplicates: Higher Value] → [Process New Customers]
Configuration:
- Operation: Remove Items Processed in Previous Executions
- Value to Dedupe On: {{ $json.customer_id }}
- Keep Items Where: Value Is Higher than Any Previous Value
Each run processes only customers with IDs higher than any previously seen.
Example 4: Daily Report with New Entries Only
Scenario: A daily workflow generates a report of new support tickets. Tickets should only appear in one report.
Problem: If the same ticket is still open the next day, it should not appear again.
Solution:
[Schedule: Daily] → [Zendesk: Get Open Tickets] → [Remove Duplicates: Cross-Execution] → [Generate Report]
Configuration:
- Operation: Remove Items Processed in Previous Executions
- Value to Dedupe On: {{ $json.ticket_id }}
- Keep Items Where: Value Is New
- History Size: 100000
Each ticket appears in exactly one daily report.
Example 5: Multi-Source Data Consolidation
Scenario: You pull customer data from three different sources (CRM, billing, support) and need a unified list without duplicates.
Problem: The same customer exists in multiple systems with the same email but different record IDs.
Solution:
[CRM Data] ──┬──► [Merge: Append] → [Remove Duplicates] → [Unified Customer List]
[Billing] ───┤
[Support] ───┘
Configuration:
- Operation: Remove Items Repeated Within Current Input
- Compare: Selected Fields
- Values to Compare: email
Customers are deduplicated by email, keeping the first occurrence from whichever source is merged first.
Remove Duplicates vs JavaScript Deduplication
For complex deduplication logic, you might consider using a Code node instead. Here is when to use each approach.
Use the Remove Duplicates Node When:
- You need cross-execution history tracking
- Deduplication logic is straightforward (by fields)
- You want a no-code solution
- You need scope management (node vs workflow)
Use a Code Node When:
- You need custom comparison logic (fuzzy matching, partial field comparison)
- You want to keep a specific duplicate based on complex rules
- You need to count or report on duplicates
- You are already doing significant data transformation in code
JavaScript deduplication example:
// Get all items from the previous node
const items = $input.all();

// Track emails we've already seen (Set ensures uniqueness)
const seen = new Set();

// Store items that pass our dedup check
const unique = [];

for (const item of items) {
  // Normalize email: lowercase and remove whitespace
  // The ?. safely handles cases where email might be missing
  const key = item.json.email?.toLowerCase().trim();

  // Only keep this item if:
  // 1. The email exists (key is truthy)
  // 2. We haven't seen this email before
  if (key && !seen.has(key)) {
    seen.add(key); // Remember this email
    unique.push(item); // Keep this item
  }
}

// Return only the unique items
return unique;
This example uses the JavaScript Set object to track seen values efficiently. It normalizes email addresses (lowercase, trimmed) before comparison, which the Remove Duplicates node cannot do natively. For more advanced filtering patterns, see the MDN Array filter() documentation.
For expression help, try our expression validator tool.
Pro Tips and Best Practices
1. Sort Before Deduplicating
The node keeps the first occurrence. If you need the most complete or most recent record, sort your data first:
[Get Data] → [Code: Sort by completeness] → [Remove Duplicates]
This gives you control over which duplicate survives.
2. Use Specific Fields for Large Datasets
Comparing all fields is computationally expensive. For large datasets, always use “Selected Fields” with just the unique identifier:
Values to Compare: id
This significantly improves performance.
3. Monitor History Size in Production
If you use cross-execution deduplication, monitor how many items you process over time. When history approaches the limit, oldest items get removed and may be reprocessed.
Consider scheduling a history clear with notification:
[Schedule: Weekly] → [Remove Duplicates: Clear History] → [Slack: History Cleared]
4. Combine with Filter for Complex Logic
Sometimes you need both deduplication and filtering. Chain the nodes:
[Get Data] → [Filter: Active Only] → [Remove Duplicates: By ID] → [Process]
Filter first to reduce the number of items before deduplication.
5. Document Your Dedup Strategy
Add a sticky note explaining:
- Which fields are used for comparison
- Why that strategy was chosen
- Expected history size and cleanup schedule
This helps future you (or your team) maintain the workflow.
6. Test with Production-Like Data
Test your deduplication with realistic data volumes and duplicate patterns. Cover edge cases such as:
- All items are duplicates
- No items are duplicates
- First and last items are duplicates of each other
- Nested fields with null values
Use our workflow debugger tool to identify issues.
For complex deduplication requirements or high-volume scenarios, our workflow development services can help design robust solutions. For strategic architecture guidance, explore our consulting services.
Frequently Asked Questions
How does the Remove Duplicates node decide which duplicate to keep?
The node always keeps the first occurrence and removes subsequent duplicates. The order of items flowing into the node determines which one survives.
Need to keep a specific version? Sort your data before the Remove Duplicates node:
- Use a Code node to sort items so your preferred version appears first
- Sort by completeness score descending to keep the most complete record
- Sort by timestamp descending to keep the most recent version
See the n8n data transformation guide for sorting patterns.
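For example, a minimal Code node sort that keeps the most recent version, assuming an updated_at field, could look like:
// Sort newest first so the most recent record is the one Remove Duplicates keeps
const items = $input.all();
items.sort(
  (a, b) => new Date(b.json.updated_at) - new Date(a.json.updated_at)
);
return items;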
What happens when the history size limit is reached in cross-execution mode?
The node uses FIFO (first in, first out) behavior. When history reaches the configured limit (default 10,000), the oldest items are removed to make room for new ones.
Key implications:
- Very old items may be reprocessed if they reappear after falling out of history
- If a single batch exceeds the history size, you get an error
Solutions:
- Increase the History Size parameter
- Process data in smaller batches using Split In Batches
- Schedule periodic history clears and accept occasional reprocessing
Can I remove duplicates based on multiple fields?
Yes. When using “Selected Fields” comparison, enter comma-separated field names:
customer_id, order_date
Items are considered duplicates only if ALL specified fields match. This creates a composite key for comparison.
For complex nested fields:
- Flatten your data first using an Edit Fields node
- Or create a combined key field in a Code node
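A hedged sketch of that combined-key approach, with illustrative nested field names:
// Build a single dedup key from nested fields (illustrative field names)
return $input.all().map((item) => {
  item.json.dedup_key = [
    item.json.customer?.id,
    item.json.order?.date,
  ].join('|');
  return item;
});
Point Values to Compare (or Value to Dedupe On) at dedup_key afterwards.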
Test your field combinations using our expression validator.
How do I deduplicate both within a single execution AND across executions?
Chain two Remove Duplicates nodes in sequence:
[Get Data] → [Remove Duplicates: Within Input] → [Remove Duplicates: Cross Execution] → [Process]
Step 1: Use “Remove Items Repeated Within Current Input” to eliminate duplicates within the current batch.
Step 2: Use “Remove Items Processed in Previous Executions” to filter out items seen in previous runs.
This two-step approach handles both pagination duplicates and recurring workflow deduplication.
Does clearing history affect other workflows or just the current one?
It depends on your scope setting:
Node scope (default):
- Clearing affects only that specific Remove Duplicates node instance
- Other nodes in the same workflow are unaffected
- All nodes in other workflows are unaffected
Workflow scope:
- Clearing affects all Remove Duplicates nodes in that workflow using workflow scope
- Other workflows always have completely separate history
Best practice: When using workflow scope, document which nodes share history to avoid confusion during maintenance.