n8n Compare Datasets Node
🔄
Flow Control Node

n8n Compare Datasets Node

Master the n8n Compare Datasets node for syncing databases, detecting changes, and reconciling data. Learn field matching, output branches, and real-world sync patterns.

Your CRM says you have 5,000 customers. Your billing system says 4,847. Your email platform lists 5,234. Which number is correct? More importantly, which customers exist in one system but not another? Which records have changed? Finding these answers manually takes hours of spreadsheet gymnastics and still leaves room for error.

The Compare Datasets node eliminates this problem. It takes two data streams, compares them field by field, and outputs exactly what you need: records that exist only in the first dataset, records unique to the second, records that match perfectly, and records that have changed.

The Data Sync Problem

Every business runs into data fragmentation eventually. Customer records live in your CRM. Payment history sits in your billing platform. Email engagement data resides in your marketing tool. Product inventory exists in your e-commerce backend.

When these systems fall out of sync, problems compound. Sales teams contact customers who already churned. Marketing sends promotions to people who already purchased. Support cannot find customer records that definitely exist somewhere.

Manual reconciliation does not scale. Exporting CSV files, running VLOOKUP formulas, and eyeballing differences works for 50 records. It fails catastrophically at 5,000.

What the Compare Datasets Node Does

The Compare Datasets node automates what would otherwise require custom scripting or tedious manual work. Given two input streams, it:

  • Identifies unique records in each dataset
  • Finds exact matches where records are identical
  • Detects changes where records share a key but differ in other fields
  • Outputs to four separate branches so you can handle each case differently

This enables powerful sync workflows: add new records, update changed ones, archive deleted ones, and skip unchanged records entirely.

What You’ll Learn

  • When to use Compare Datasets versus Merge, Filter, or other nodes
  • How the four output branches work and when each fires
  • Configuring field matching for simple and nested data
  • Node options that control comparison behavior
  • Real-world sync patterns for databases, CRMs, and APIs
  • Common mistakes that cause unexpected results and how to fix them

When to Use the Compare Datasets Node

The Compare Datasets node excels at specific tasks. Understanding when it applies versus other data nodes saves frustration.

ScenarioBest ChoiceWhy
Find records that exist in A but not BCompare DatasetsOutputs to “In A Only” branch
Find what changed between two versionsCompare DatasetsOutputs to “Different” branch
Combine two streams into oneMergeMerging, not comparing
Join customers with their ordersMergeEnrichment, not difference detection
Remove duplicates within one datasetRemove DuplicatesSingle stream deduplication
Filter items based on conditionsFilterConditional filtering, not comparison
Route items based on field valuesSwitchConditional routing

Rule of thumb: Use Compare Datasets when you need to know what is different between two datasets. Use Merge when you need to combine or join them. The key question: “Do I need the differences, or do I need the combined data?”

Understanding the Four Output Branches

The Compare Datasets node routes items to four distinct output branches based on comparison results. Understanding each branch is essential for building effective sync workflows.

Branch 1: In A Only

Items that exist in Input A but have no matching record in Input B. These are typically new records that need to be added to the second system.

Use case: Your CRM has customers that do not exist in your billing system yet. The “In A Only” branch contains those customers so you can create their billing accounts.

Branch 2: In B Only

Items that exist in Input B but have no matching record in Input A. These might be records that were deleted from the first system or added directly to the second.

Use case: Your billing system has customers that no longer exist in your CRM. The “In B Only” branch contains those orphaned records so you can investigate or archive them.

Branch 3: Same

Items that exist in both inputs and are completely identical across all compared fields. These records need no action.

Use case: Most of your customer records match perfectly between systems. The “Same” branch lets you skip processing for these unchanged records.

Branch 4: Different

Items that match on the key field but have differences in other fields. These are records that exist in both systems but have changed.

Use case: A customer updated their email address in your CRM but the billing system still has the old email. The “Different” branch contains this record so you can update the billing system.

Visual Example

Consider comparing customer lists from a CRM and billing system:

Input A (CRM):

[
  { "id": "C001", "name": "Alice", "email": "[email protected]" },
  { "id": "C002", "name": "Bob", "email": "[email protected]" },
  { "id": "C003", "name": "Carol", "email": "[email protected]" }
]

Input B (Billing):

[
  { "id": "C001", "name": "Alice", "email": "[email protected]" },
  { "id": "C002", "name": "Bob", "email": "[email protected]" },
  { "id": "C004", "name": "Dan", "email": "[email protected]" }
]

Comparing by id field produces:

BranchContentsExplanation
In A OnlyC003 (Carol)Exists in CRM, not in billing
In B OnlyC004 (Dan)Exists in billing, not in CRM
SameC001 (Alice)Identical in both systems
DifferentC002 (Bob)Same ID, different email

Configuring Field Matching

The core configuration determines how records are matched between inputs.

Basic Field Matching

  1. Open Input A Field and enter the field name from Input A to use for matching
  2. Open Input B Field and enter the corresponding field name from Input B
  3. These fields act as the “key” that identifies the same record across datasets

Common matching fields:

  • id or customer_id for customer records
  • email for user accounts
  • sku or product_id for inventory
  • order_number for transactions

Matching by Multiple Fields

Sometimes a single field is not unique enough. Click Add Fields to Match to specify additional matching criteria.

Example: Match orders by both customer_id AND order_date:

Fields to Match:
  Input A Field: customer_id    Input B Field: customerId
  Input A Field: order_date     Input B Field: orderDate

Records must match on ALL specified fields to be considered the same record.

Accessing Nested Fields

For nested data structures, use dot notation to access nested properties:

{
  "customer": {
    "details": {
      "id": "C001"
    }
  }
}

Set Input A Field to customer.details.id to match on the nested ID.

Important: This only works when Disable Dot Notation is turned off (the default). If your field names literally contain dots (like customer.id as a single key name), enable Disable Dot Notation.

Node Options Deep Dive

The Options section provides fine-grained control over comparison behavior. Expand Options to access these settings.

Fields to Skip Comparing

Enter field names that should be ignored during comparison. The node will not consider these fields when determining if records are “Same” or “Different.”

Use case: You want to match records by ID and compare most fields, but ignore the updated_at timestamp since it changes whenever you touch the record.

Fields to Skip Comparing: updated_at, last_sync_date

With these fields skipped, two records with the same ID and data but different timestamps route to “Same” instead of “Different.”

Disable Dot Notation

When enabled, dots in field names are treated literally instead of as nested property access.

Default (disabled): customer.email accesses { "customer": { "email": "..." } }

Enabled: customer.email accesses { "customer.email": "..." } (a single key with a dot in the name)

Enable this when your API returns field names containing literal dots.

Multiple Matches

Controls behavior when multiple records in one input match a single record in the other.

Include All Matches (default): Creates an output item for each matching pair. If Input A has one record and Input B has three matching records, you get three output items.

Include First Match Only: Only the first match produces an output item. Subsequent matches are ignored.

When to use each:

  • Use “Include All Matches” for many-to-many relationships where you need every combination
  • Use “Include First Match Only” when duplicates indicate data quality issues you want to suppress

Fuzzy Type Comparison

Some versions of the node offer tolerance for type differences. When enabled, the string "123" matches the number 123.

This helps when comparing data from different sources where one system stores IDs as strings and another as numbers.

Your First Compare Datasets Workflow

Let us build a simple comparison workflow step by step.

Step 1: Create Sample Data

Add two Code nodes to generate test data.

Code Node 1 (Dataset A):

return [
  { json: { id: 1, name: "Apple", price: 1.50 } },
  { json: { id: 2, name: "Banana", price: 0.75 } },
  { json: { id: 3, name: "Cherry", price: 2.00 } }
];

Code Node 2 (Dataset B):

return [
  { json: { id: 1, name: "Apple", price: 1.50 } },
  { json: { id: 2, name: "Banana", price: 0.99 } },
  { json: { id: 4, name: "Date", price: 3.50 } }
];

Step 2: Add the Compare Datasets Node

  1. Click + to add a node
  2. Search for “Compare Datasets”
  3. Add it to your canvas

Step 3: Connect Both Data Sources

  1. Connect Code Node 1 to the first input (Input A)
  2. Connect Code Node 2 to the second input (Input B)

The node requires both inputs to be connected before execution.

Step 4: Configure Field Matching

  1. Set Input A Field to id
  2. Set Input B Field to id

This compares records based on their id values.

Step 5: Test the Node

Click Test step and examine each output branch:

  • In A Only: { id: 3, name: "Cherry", price: 2.00 } (only in A)
  • In B Only: { id: 4, name: "Date", price: 3.50 } (only in B)
  • Same: { id: 1, name: "Apple", price: 1.50 } (identical)
  • Different: { id: 2, name: "Banana", price: 0.75 } with B version { id: 2, name: "Banana", price: 0.99 } (same ID, different price)

Step 6: Connect Downstream Nodes

Connect each branch to appropriate actions:

In A Only  → [Create in System B]
In B Only  → [Archive or Delete]
Same       → [Skip / Log Only]
Different  → [Update System B]

Real-World Examples

Example 1: Database Synchronization

Scenario: Keep a production database in sync with a reporting database.

Workflow:

[MySQL Production] ──┬──► [Compare Datasets] ──► In A Only  → [Insert to Reporting DB]
                     │                      ├──► In B Only  → [Delete from Reporting DB]
[MySQL Reporting] ───┘                      ├──► Same       → [No Action]
                                            └──► Different  → [Update Reporting DB]

Configuration:

  • Input A Field: record_id
  • Input B Field: record_id
  • Fields to Skip: last_modified, sync_timestamp

This ensures the reporting database mirrors production while ignoring metadata fields that naturally differ.

Example 2: CRM and Email Platform Reconciliation

Scenario: Ensure your email marketing platform has all CRM contacts with correct data.

Workflow:

[HubSpot Get Contacts] ──┬──► [Compare Datasets] ──► In A Only  → [Mailchimp Add Subscriber]
                         │                       ├──► In B Only  → [Mailchimp Archive]
[Mailchimp Get Members] ─┘                       ├──► Same       → [No Action]
                                                 └──► Different  → [Mailchimp Update]

Configuration:

  • Input A Field: email
  • Input B Field: email_address
  • Fields to Skip: list_id, mailchimp_id

Note the different field names between systems. Compare Datasets handles this by letting you specify different field names for A and B.

Example 3: Inventory Sync Between Platforms

Scenario: Sync product inventory between Shopify and a warehouse management system.

Workflow:

[Shopify Products] ──┬──► [Compare Datasets] ──► Different  → [If: Qty Different?] ──► [Update Shopify]
                     │                                                              └──► [Slack Alert: Price Changed]
[WMS Products] ──────┘

Configuration:

  • Input A Field: sku
  • Input B Field: product_sku

Connect the “Different” branch to an If node to handle quantity changes differently from price changes.

Example 4: Change Detection for Notifications

Scenario: Alert the team when customer data changes.

Workflow:

[Schedule Trigger: Daily] → [MySQL: Current Data] ──┬──► [Compare Datasets] ──► Different → [Slack: Changes Detected]
                           [MySQL: Yesterday Snapshot] ─┘

Store daily snapshots and compare against the current state. Any records in the “Different” branch represent changes that occurred since yesterday.

Example 5: Data Migration Validation

Scenario: Verify that data migrated correctly between systems.

Workflow:

[Old System Export] ──┬──► [Compare Datasets] ──► In A Only  → [Report: Missing in New]
                      │                       ├──► In B Only  → [Report: Extra in New]
[New System Export] ──┘                       ├──► Different  → [Report: Data Mismatch]
                                              └──► Same       → [Count for Success %]

After migration, run this workflow to generate a report showing any records that failed to migrate correctly.

Common Mistakes and How to Fix Them

Mistake 1: Confusing Input Order

Symptom: “In A Only” and “In B Only” seem swapped.

Cause: The node compares A against B directionally. Swapping which dataset connects to which input changes the output.

Example:

  • Input A: 1 record
  • Input B: 3 records that all match

Result: “Same” shows 3 items (B records matched). Swap the inputs and “Same” shows 1 item (A record matched).

Fix: Think of Input A as “source of truth” and Input B as “system to sync.” Items in “In A Only” need to be added to B. Items in “In B Only” might need removal from B.

Mistake 2: Using Field Values Instead of Field Keys

Symptom: No matches found despite identical data.

Cause: The field configuration expects the field name (key), not the field value.

Wrong:

Input A Field: {{ $json.email }}  // This is the VALUE

Correct:

Input A Field: email  // This is the KEY

Fix: Enter the literal field name as text, not an expression. Expressions in this field do not work as expected.

Mistake 3: Multiple Matches Causing Duplicates

Symptom: Output has more items than expected, with duplicates.

Cause: “Include All Matches” creates one output item per matching pair. If three records in B match one record in A, you get three output items.

Fix: Either:

  • Enable “Include First Match Only” if you want one-to-one matching
  • Use the Aggregate node after Compare Datasets to group results
  • Clean your data to remove duplicates before comparing

Mistake 4: Type Mismatches Without Fuzzy Compare

Symptom: Records that should match show as “In A Only” and “In B Only.”

Cause: One system stores IDs as strings ("123"), the other as numbers (123). Strict comparison sees these as different.

Example:

// Input A
{ "id": 123 }

// Input B
{ "id": "123" }

These do not match under strict comparison.

Fix: Enable fuzzy type comparison in options, or normalize data types before comparing using an Edit Fields node with expressions like {{ parseInt($json.id) }}.

Mistake 5: Expecting Symmetric Results

Symptom: Swapping Input A and Input B produces different counts.

Cause: The comparison is directional. “In A Only” count depends on how many A records have no B match. “In B Only” depends on the reverse.

Example:

  • A has 10 records, B has 100 records
  • 8 records match between systems
  • “In A Only” shows 2 (10 - 8)
  • “In B Only” shows 92 (100 - 8)

Swapping inputs reverses these numbers.

Fix: This is expected behavior. Design your workflow knowing which system is A and which is B. Document it clearly for future maintenance.

Mistake 6: Nested Data Without Dot Notation

Symptom: Nested field comparisons fail or match incorrectly.

Cause: Field names like customer.email need dot notation enabled (default). If disabled, it looks for a literal key named customer.email.

Fix: Ensure Disable Dot Notation is OFF when comparing nested fields. For data with literal dots in key names, turn it ON and restructure your data first.

Compare Datasets vs Merge: The Critical Difference

These nodes serve different purposes despite both handling two input streams.

Visual Comparison

Compare Datasets: Finding Differences

Stream A ──┬──► [Compare Datasets] ──► What's unique to A
           │                      ├──► What's unique to B
Stream B ──┘                      ├──► What's identical
                                  └──► What changed

Output: Four separate streams for different scenarios.

Merge: Combining Data

Stream A ──┬──► [Merge] ──► Combined/joined data

Stream B ──┘

Output: One stream with data from both inputs.

When to Use Each

QuestionAnswer Determines
Do I need to know what changed?Compare Datasets
Do I need to combine records with matching keys?Merge
Do I need records unique to each source?Compare Datasets
Do I need a SQL-like JOIN operation?Merge
Am I syncing data between systems?Compare Datasets
Am I enriching records with related data?Merge

Common Confusion

Users often try to use Merge to find differences. Merge can identify matches and non-matches through its Output Type options, but it does not distinguish between “same” and “different” the way Compare Datasets does.

Use Compare Datasets when you need all four categories (unique to A, unique to B, same, different). Use Merge when you need to join or combine data.

Pro Tips and Best Practices

1. Normalize Data Before Comparing

Use an Edit Fields node to standardize data before comparison:

// Normalize emails to lowercase
{{ $json.email.toLowerCase().trim() }}

// Standardize date formats
{{ DateTime.fromISO($json.date).toFormat('yyyy-MM-dd') }}

// Convert types consistently
{{ String($json.id) }}

This prevents false mismatches due to formatting differences.

2. Handle Empty Branches Gracefully

Not every workflow execution produces items in all four branches. Design downstream nodes to handle empty inputs:

  • Use Continue On Fail if downstream nodes error on empty input
  • Add an If node to check {{ $input.all().length > 0 }} before processing
  • Consider whether empty branches need logging or notification

3. Name Your Branches Clearly

Connect descriptive nodes or add sticky notes after each branch:

In A Only  → "New Records to Add"
In B Only  → "Orphaned Records"
Same       → "No Action Needed"
Different  → "Records to Update"

This makes workflows self-documenting and easier to debug.

4. Use Edit Fields to Standardize Field Names

When comparing systems with different field naming conventions:

[HubSpot] → [Edit Fields: Rename to standard] ──┬──► [Compare Datasets]

[Mailchimp] → [Edit Fields: Rename to standard] ─┘

Rename fields to match before comparing rather than juggling different Input A/B field names.

5. Consider Performance With Large Datasets

Compare Datasets holds both inputs in memory during comparison. For very large datasets:

  • Process in batches using the Split In Batches node
  • Filter to relevant records before comparing
  • Schedule during off-peak hours

Our workflow development services can help design efficient sync workflows for high-volume scenarios.

6. Test With Representative Data

Before deploying sync workflows:

  1. Test with small datasets first (10-20 records)
  2. Verify each branch produces expected results
  3. Test edge cases: empty inputs, all matches, no matches
  4. Run with production-like data volume before going live

Use our workflow debugger tool to troubleshoot unexpected behavior.

7. Chain With Update/Create Nodes

A typical sync pattern chains Compare Datasets with action nodes:

[Compare Datasets] → In A Only  → [Create Record in System B]
                   → Different  → [Update Record in System B]
                   → In B Only  → [If: Archive?] → [Archive in B] / [Delete from B]

For complex sync logic, consider our consulting services to design robust data pipelines.

Frequently Asked Questions

Why does my Compare Datasets output different item counts when I swap inputs?

The Compare Datasets node compares directionally.

  • “In A Only” contains records from Input A with no match in Input B
  • “In B Only” contains the reverse

When you swap inputs, you change which system is “A” and which is “B.”

Example: System A has 100 records, System B has 50, with 40 matching.

  • “In A Only” shows 60 records (100 - 40)
  • “In B Only” shows 10 records (50 - 40)

Swap the inputs and those numbers reverse. This is expected behavior.

Best practice: Design your workflow with a clear understanding of which system is A (typically the source of truth) and which is B (the system to sync).

How do I compare nested objects or arrays in the Compare Datasets node?

For nested objects: Use dot notation in your field configuration.

If your data looks like { "customer": { "details": { "id": "123" } } }, set Input A Field to customer.details.id. The node accesses the nested property for comparison.

For arrays: The comparison works on the entire array as a value. Two records with identical arrays match.

If you need to compare individual array elements:

  1. Use a Split Out node to flatten arrays into separate items
  2. Then run the comparison

Note: Comparing complex nested structures or arrays of objects may not work as expected. For complex data shapes, flatten the structure with an Edit Fields node before comparing.

What is the difference between “Same” and “Different” output branches?

Same branch:

  • Records that exist in both inputs
  • Completely identical across all compared fields (except those in Fields to Skip)
  • No updates needed for these records

Different branch:

  • Records that match on the key field(s) you specified
  • Have at least one other field that differs between Input A and Input B
  • These exist in both systems but have changed

For sync workflows, “Same” records need no action. “Different” records typically need an update operation.

The “Different” output includes both versions of the record, so you can access either the A version or B version in downstream nodes.

Can I compare datasets from different sources with different field names?

Yes. The Input A Field and Input B Field configurations accept different field names.

Example: Your CRM uses customer_email and your billing system uses email_address.

  • Set Input A Field to customer_email
  • Set Input B Field to email_address

The node matches records where these fields have the same value, regardless of different field names.

Pro tip: For cleaner workflows, add an Edit Fields node before Compare Datasets to rename fields to a consistent naming convention. This makes downstream processing easier since you do not need to handle two different field names.

How do I handle the case where one dataset is empty?

When one input is empty, the behavior is predictable:

  • Input A empty, Input B has records: All B records appear in “In B Only”
  • Input B empty, Input A has records: All A records appear in “In A Only”
  • Both scenarios: No records appear in “Same” or “Different” (matches require records in both inputs)

How to handle empty branches:

  • Use If nodes to check {{ $input.all().length > 0 }} before operations that would fail on empty input
  • Enable Continue On Fail on nodes that may receive empty input
  • Consider whether empty branches need logging or notification

Ready to Automate Your Business?

Tell us what you need automated. We'll build it, test it, and deploy it fast.

48-72 Hour Turnaround
Production Ready
Free Consultation

Create Your Free Account

Sign up once, use all tools free forever. We require accounts to prevent abuse and keep our tools running for everyone.

or

By signing up, you agree to our Terms of Service and Privacy Policy. No spam, unsubscribe anytime.