Your CRM says you have 5,000 customers. Your billing system says 4,847. Your email platform lists 5,234. Which number is correct? More importantly, which customers exist in one system but not another? Which records have changed? Finding these answers manually takes hours of spreadsheet gymnastics and still leaves room for error.
The Compare Datasets node eliminates this problem. It takes two data streams, compares them field by field, and outputs exactly what you need: records that exist only in the first dataset, records unique to the second, records that match perfectly, and records that have changed.
The Data Sync Problem
Every business runs into data fragmentation eventually. Customer records live in your CRM. Payment history sits in your billing platform. Email engagement data resides in your marketing tool. Product inventory exists in your e-commerce backend.
When these systems fall out of sync, problems compound. Sales teams contact customers who already churned. Marketing sends promotions to people who already purchased. Support cannot find customer records that definitely exist somewhere.
Manual reconciliation does not scale. Exporting CSV files, running VLOOKUP formulas, and eyeballing differences works for 50 records. It fails catastrophically at 5,000.
What the Compare Datasets Node Does
The Compare Datasets node automates what would otherwise require custom scripting or tedious manual work. Given two input streams, it:
- Identifies unique records in each dataset
- Finds exact matches where records are identical
- Detects changes where records share a key but differ in other fields
- Outputs to four separate branches so you can handle each case differently
This enables powerful sync workflows: add new records, update changed ones, archive deleted ones, and skip unchanged records entirely.
What You’ll Learn
- When to use Compare Datasets versus Merge, Filter, or other nodes
- How the four output branches work and when each fires
- Configuring field matching for simple and nested data
- Node options that control comparison behavior
- Real-world sync patterns for databases, CRMs, and APIs
- Common mistakes that cause unexpected results and how to fix them
When to Use the Compare Datasets Node
The Compare Datasets node excels at specific tasks. Understanding when it applies versus other data nodes saves frustration.
| Scenario | Best Choice | Why |
|---|---|---|
| Find records that exist in A but not B | Compare Datasets | Outputs to “In A Only” branch |
| Find what changed between two versions | Compare Datasets | Outputs to “Different” branch |
| Combine two streams into one | Merge | Merging, not comparing |
| Join customers with their orders | Merge | Enrichment, not difference detection |
| Remove duplicates within one dataset | Remove Duplicates | Single stream deduplication |
| Filter items based on conditions | Filter | Conditional filtering, not comparison |
| Route items based on field values | Switch | Conditional routing |
Rule of thumb: Use Compare Datasets when you need to know what is different between two datasets. Use Merge when you need to combine or join them. The key question: “Do I need the differences, or do I need the combined data?”
Understanding the Four Output Branches
The Compare Datasets node routes items to four distinct output branches based on comparison results. Understanding each branch is essential for building effective sync workflows.
Branch 1: In A Only
Items that exist in Input A but have no matching record in Input B. These are typically new records that need to be added to the second system.
Use case: Your CRM has customers that do not exist in your billing system yet. The “In A Only” branch contains those customers so you can create their billing accounts.
Branch 2: In B Only
Items that exist in Input B but have no matching record in Input A. These might be records that were deleted from the first system or added directly to the second.
Use case: Your billing system has customers that no longer exist in your CRM. The “In B Only” branch contains those orphaned records so you can investigate or archive them.
Branch 3: Same
Items that exist in both inputs and are completely identical across all compared fields. These records need no action.
Use case: Most of your customer records match perfectly between systems. The “Same” branch lets you skip processing for these unchanged records.
Branch 4: Different
Items that match on the key field but have differences in other fields. These are records that exist in both systems but have changed.
Use case: A customer updated their email address in your CRM but the billing system still has the old email. The “Different” branch contains this record so you can update the billing system.
Visual Example
Consider comparing customer lists from a CRM and billing system:
Input A (CRM):
[
{ "id": "C001", "name": "Alice", "email": "[email protected]" },
{ "id": "C002", "name": "Bob", "email": "[email protected]" },
{ "id": "C003", "name": "Carol", "email": "[email protected]" }
]
Input B (Billing):
[
{ "id": "C001", "name": "Alice", "email": "[email protected]" },
{ "id": "C002", "name": "Bob", "email": "[email protected]" },
{ "id": "C004", "name": "Dan", "email": "[email protected]" }
]
Comparing by id field produces:
| Branch | Contents | Explanation |
|---|---|---|
| In A Only | C003 (Carol) | Exists in CRM, not in billing |
| In B Only | C004 (Dan) | Exists in billing, not in CRM |
| Same | C001 (Alice) | Identical in both systems |
| Different | C002 (Bob) | Same ID, different email |
Configuring Field Matching
The core configuration determines how records are matched between inputs.
Basic Field Matching
- Open Input A Field and enter the field name from Input A to use for matching
- Open Input B Field and enter the corresponding field name from Input B
- These fields act as the “key” that identifies the same record across datasets
Common matching fields:
idorcustomer_idfor customer recordsemailfor user accountsskuorproduct_idfor inventoryorder_numberfor transactions
Matching by Multiple Fields
Sometimes a single field is not unique enough. Click Add Fields to Match to specify additional matching criteria.
Example: Match orders by both customer_id AND order_date:
Fields to Match:
Input A Field: customer_id Input B Field: customerId
Input A Field: order_date Input B Field: orderDate
Records must match on ALL specified fields to be considered the same record.
Accessing Nested Fields
For nested data structures, use dot notation to access nested properties:
{
"customer": {
"details": {
"id": "C001"
}
}
}
Set Input A Field to customer.details.id to match on the nested ID.
Important: This only works when Disable Dot Notation is turned off (the default). If your field names literally contain dots (like customer.id as a single key name), enable Disable Dot Notation.
Node Options Deep Dive
The Options section provides fine-grained control over comparison behavior. Expand Options to access these settings.
Fields to Skip Comparing
Enter field names that should be ignored during comparison. The node will not consider these fields when determining if records are “Same” or “Different.”
Use case: You want to match records by ID and compare most fields, but ignore the updated_at timestamp since it changes whenever you touch the record.
Fields to Skip Comparing: updated_at, last_sync_date
With these fields skipped, two records with the same ID and data but different timestamps route to “Same” instead of “Different.”
Disable Dot Notation
When enabled, dots in field names are treated literally instead of as nested property access.
Default (disabled): customer.email accesses { "customer": { "email": "..." } }
Enabled: customer.email accesses { "customer.email": "..." } (a single key with a dot in the name)
Enable this when your API returns field names containing literal dots.
Multiple Matches
Controls behavior when multiple records in one input match a single record in the other.
Include All Matches (default): Creates an output item for each matching pair. If Input A has one record and Input B has three matching records, you get three output items.
Include First Match Only: Only the first match produces an output item. Subsequent matches are ignored.
When to use each:
- Use “Include All Matches” for many-to-many relationships where you need every combination
- Use “Include First Match Only” when duplicates indicate data quality issues you want to suppress
Fuzzy Type Comparison
Some versions of the node offer tolerance for type differences. When enabled, the string "123" matches the number 123.
This helps when comparing data from different sources where one system stores IDs as strings and another as numbers.
Your First Compare Datasets Workflow
Let us build a simple comparison workflow step by step.
Step 1: Create Sample Data
Add two Code nodes to generate test data.
Code Node 1 (Dataset A):
return [
{ json: { id: 1, name: "Apple", price: 1.50 } },
{ json: { id: 2, name: "Banana", price: 0.75 } },
{ json: { id: 3, name: "Cherry", price: 2.00 } }
];
Code Node 2 (Dataset B):
return [
{ json: { id: 1, name: "Apple", price: 1.50 } },
{ json: { id: 2, name: "Banana", price: 0.99 } },
{ json: { id: 4, name: "Date", price: 3.50 } }
];
Step 2: Add the Compare Datasets Node
- Click + to add a node
- Search for “Compare Datasets”
- Add it to your canvas
Step 3: Connect Both Data Sources
- Connect Code Node 1 to the first input (Input A)
- Connect Code Node 2 to the second input (Input B)
The node requires both inputs to be connected before execution.
Step 4: Configure Field Matching
- Set Input A Field to
id - Set Input B Field to
id
This compares records based on their id values.
Step 5: Test the Node
Click Test step and examine each output branch:
- In A Only:
{ id: 3, name: "Cherry", price: 2.00 }(only in A) - In B Only:
{ id: 4, name: "Date", price: 3.50 }(only in B) - Same:
{ id: 1, name: "Apple", price: 1.50 }(identical) - Different:
{ id: 2, name: "Banana", price: 0.75 }with B version{ id: 2, name: "Banana", price: 0.99 }(same ID, different price)
Step 6: Connect Downstream Nodes
Connect each branch to appropriate actions:
In A Only → [Create in System B]
In B Only → [Archive or Delete]
Same → [Skip / Log Only]
Different → [Update System B]
Real-World Examples
Example 1: Database Synchronization
Scenario: Keep a production database in sync with a reporting database.
Workflow:
[MySQL Production] ──┬──► [Compare Datasets] ──► In A Only → [Insert to Reporting DB]
│ ├──► In B Only → [Delete from Reporting DB]
[MySQL Reporting] ───┘ ├──► Same → [No Action]
└──► Different → [Update Reporting DB]
Configuration:
- Input A Field:
record_id - Input B Field:
record_id - Fields to Skip:
last_modified,sync_timestamp
This ensures the reporting database mirrors production while ignoring metadata fields that naturally differ.
Example 2: CRM and Email Platform Reconciliation
Scenario: Ensure your email marketing platform has all CRM contacts with correct data.
Workflow:
[HubSpot Get Contacts] ──┬──► [Compare Datasets] ──► In A Only → [Mailchimp Add Subscriber]
│ ├──► In B Only → [Mailchimp Archive]
[Mailchimp Get Members] ─┘ ├──► Same → [No Action]
└──► Different → [Mailchimp Update]
Configuration:
- Input A Field:
email - Input B Field:
email_address - Fields to Skip:
list_id,mailchimp_id
Note the different field names between systems. Compare Datasets handles this by letting you specify different field names for A and B.
Example 3: Inventory Sync Between Platforms
Scenario: Sync product inventory between Shopify and a warehouse management system.
Workflow:
[Shopify Products] ──┬──► [Compare Datasets] ──► Different → [If: Qty Different?] ──► [Update Shopify]
│ └──► [Slack Alert: Price Changed]
[WMS Products] ──────┘
Configuration:
- Input A Field:
sku - Input B Field:
product_sku
Connect the “Different” branch to an If node to handle quantity changes differently from price changes.
Example 4: Change Detection for Notifications
Scenario: Alert the team when customer data changes.
Workflow:
[Schedule Trigger: Daily] → [MySQL: Current Data] ──┬──► [Compare Datasets] ──► Different → [Slack: Changes Detected]
[MySQL: Yesterday Snapshot] ─┘
Store daily snapshots and compare against the current state. Any records in the “Different” branch represent changes that occurred since yesterday.
Example 5: Data Migration Validation
Scenario: Verify that data migrated correctly between systems.
Workflow:
[Old System Export] ──┬──► [Compare Datasets] ──► In A Only → [Report: Missing in New]
│ ├──► In B Only → [Report: Extra in New]
[New System Export] ──┘ ├──► Different → [Report: Data Mismatch]
└──► Same → [Count for Success %]
After migration, run this workflow to generate a report showing any records that failed to migrate correctly.
Common Mistakes and How to Fix Them
Mistake 1: Confusing Input Order
Symptom: “In A Only” and “In B Only” seem swapped.
Cause: The node compares A against B directionally. Swapping which dataset connects to which input changes the output.
Example:
- Input A: 1 record
- Input B: 3 records that all match
Result: “Same” shows 3 items (B records matched). Swap the inputs and “Same” shows 1 item (A record matched).
Fix: Think of Input A as “source of truth” and Input B as “system to sync.” Items in “In A Only” need to be added to B. Items in “In B Only” might need removal from B.
Mistake 2: Using Field Values Instead of Field Keys
Symptom: No matches found despite identical data.
Cause: The field configuration expects the field name (key), not the field value.
Wrong:
Input A Field: {{ $json.email }} // This is the VALUE
Correct:
Input A Field: email // This is the KEY
Fix: Enter the literal field name as text, not an expression. Expressions in this field do not work as expected.
Mistake 3: Multiple Matches Causing Duplicates
Symptom: Output has more items than expected, with duplicates.
Cause: “Include All Matches” creates one output item per matching pair. If three records in B match one record in A, you get three output items.
Fix: Either:
- Enable “Include First Match Only” if you want one-to-one matching
- Use the Aggregate node after Compare Datasets to group results
- Clean your data to remove duplicates before comparing
Mistake 4: Type Mismatches Without Fuzzy Compare
Symptom: Records that should match show as “In A Only” and “In B Only.”
Cause: One system stores IDs as strings ("123"), the other as numbers (123). Strict comparison sees these as different.
Example:
// Input A
{ "id": 123 }
// Input B
{ "id": "123" }
These do not match under strict comparison.
Fix: Enable fuzzy type comparison in options, or normalize data types before comparing using an Edit Fields node with expressions like {{ parseInt($json.id) }}.
Mistake 5: Expecting Symmetric Results
Symptom: Swapping Input A and Input B produces different counts.
Cause: The comparison is directional. “In A Only” count depends on how many A records have no B match. “In B Only” depends on the reverse.
Example:
- A has 10 records, B has 100 records
- 8 records match between systems
- “In A Only” shows 2 (10 - 8)
- “In B Only” shows 92 (100 - 8)
Swapping inputs reverses these numbers.
Fix: This is expected behavior. Design your workflow knowing which system is A and which is B. Document it clearly for future maintenance.
Mistake 6: Nested Data Without Dot Notation
Symptom: Nested field comparisons fail or match incorrectly.
Cause: Field names like customer.email need dot notation enabled (default). If disabled, it looks for a literal key named customer.email.
Fix: Ensure Disable Dot Notation is OFF when comparing nested fields. For data with literal dots in key names, turn it ON and restructure your data first.
Compare Datasets vs Merge: The Critical Difference
These nodes serve different purposes despite both handling two input streams.
Visual Comparison
Compare Datasets: Finding Differences
Stream A ──┬──► [Compare Datasets] ──► What's unique to A
│ ├──► What's unique to B
Stream B ──┘ ├──► What's identical
└──► What changed
Output: Four separate streams for different scenarios.
Merge: Combining Data
Stream A ──┬──► [Merge] ──► Combined/joined data
│
Stream B ──┘
Output: One stream with data from both inputs.
When to Use Each
| Question | Answer Determines |
|---|---|
| Do I need to know what changed? | Compare Datasets |
| Do I need to combine records with matching keys? | Merge |
| Do I need records unique to each source? | Compare Datasets |
| Do I need a SQL-like JOIN operation? | Merge |
| Am I syncing data between systems? | Compare Datasets |
| Am I enriching records with related data? | Merge |
Common Confusion
Users often try to use Merge to find differences. Merge can identify matches and non-matches through its Output Type options, but it does not distinguish between “same” and “different” the way Compare Datasets does.
Use Compare Datasets when you need all four categories (unique to A, unique to B, same, different). Use Merge when you need to join or combine data.
Pro Tips and Best Practices
1. Normalize Data Before Comparing
Use an Edit Fields node to standardize data before comparison:
// Normalize emails to lowercase
{{ $json.email.toLowerCase().trim() }}
// Standardize date formats
{{ DateTime.fromISO($json.date).toFormat('yyyy-MM-dd') }}
// Convert types consistently
{{ String($json.id) }}
This prevents false mismatches due to formatting differences.
2. Handle Empty Branches Gracefully
Not every workflow execution produces items in all four branches. Design downstream nodes to handle empty inputs:
- Use Continue On Fail if downstream nodes error on empty input
- Add an If node to check
{{ $input.all().length > 0 }}before processing - Consider whether empty branches need logging or notification
3. Name Your Branches Clearly
Connect descriptive nodes or add sticky notes after each branch:
In A Only → "New Records to Add"
In B Only → "Orphaned Records"
Same → "No Action Needed"
Different → "Records to Update"
This makes workflows self-documenting and easier to debug.
4. Use Edit Fields to Standardize Field Names
When comparing systems with different field naming conventions:
[HubSpot] → [Edit Fields: Rename to standard] ──┬──► [Compare Datasets]
│
[Mailchimp] → [Edit Fields: Rename to standard] ─┘
Rename fields to match before comparing rather than juggling different Input A/B field names.
5. Consider Performance With Large Datasets
Compare Datasets holds both inputs in memory during comparison. For very large datasets:
- Process in batches using the Split In Batches node
- Filter to relevant records before comparing
- Schedule during off-peak hours
Our workflow development services can help design efficient sync workflows for high-volume scenarios.
6. Test With Representative Data
Before deploying sync workflows:
- Test with small datasets first (10-20 records)
- Verify each branch produces expected results
- Test edge cases: empty inputs, all matches, no matches
- Run with production-like data volume before going live
Use our workflow debugger tool to troubleshoot unexpected behavior.
7. Chain With Update/Create Nodes
A typical sync pattern chains Compare Datasets with action nodes:
[Compare Datasets] → In A Only → [Create Record in System B]
→ Different → [Update Record in System B]
→ In B Only → [If: Archive?] → [Archive in B] / [Delete from B]
For complex sync logic, consider our consulting services to design robust data pipelines.
Frequently Asked Questions
Why does my Compare Datasets output different item counts when I swap inputs?
The Compare Datasets node compares directionally.
- “In A Only” contains records from Input A with no match in Input B
- “In B Only” contains the reverse
When you swap inputs, you change which system is “A” and which is “B.”
Example: System A has 100 records, System B has 50, with 40 matching.
- “In A Only” shows 60 records (100 - 40)
- “In B Only” shows 10 records (50 - 40)
Swap the inputs and those numbers reverse. This is expected behavior.
Best practice: Design your workflow with a clear understanding of which system is A (typically the source of truth) and which is B (the system to sync).
How do I compare nested objects or arrays in the Compare Datasets node?
For nested objects: Use dot notation in your field configuration.
If your data looks like { "customer": { "details": { "id": "123" } } }, set Input A Field to customer.details.id. The node accesses the nested property for comparison.
For arrays: The comparison works on the entire array as a value. Two records with identical arrays match.
If you need to compare individual array elements:
- Use a Split Out node to flatten arrays into separate items
- Then run the comparison
Note: Comparing complex nested structures or arrays of objects may not work as expected. For complex data shapes, flatten the structure with an Edit Fields node before comparing.
What is the difference between “Same” and “Different” output branches?
Same branch:
- Records that exist in both inputs
- Completely identical across all compared fields (except those in Fields to Skip)
- No updates needed for these records
Different branch:
- Records that match on the key field(s) you specified
- Have at least one other field that differs between Input A and Input B
- These exist in both systems but have changed
For sync workflows, “Same” records need no action. “Different” records typically need an update operation.
The “Different” output includes both versions of the record, so you can access either the A version or B version in downstream nodes.
Can I compare datasets from different sources with different field names?
Yes. The Input A Field and Input B Field configurations accept different field names.
Example: Your CRM uses customer_email and your billing system uses email_address.
- Set Input A Field to
customer_email - Set Input B Field to
email_address
The node matches records where these fields have the same value, regardless of different field names.
Pro tip: For cleaner workflows, add an Edit Fields node before Compare Datasets to rename fields to a consistent naming convention. This makes downstream processing easier since you do not need to handle two different field names.
How do I handle the case where one dataset is empty?
When one input is empty, the behavior is predictable:
- Input A empty, Input B has records: All B records appear in “In B Only”
- Input B empty, Input A has records: All A records appear in “In A Only”
- Both scenarios: No records appear in “Same” or “Different” (matches require records in both inputs)
How to handle empty branches:
- Use If nodes to check
{{ $input.all().length > 0 }}before operations that would fail on empty input - Enable Continue On Fail on nodes that may receive empty input
- Consider whether empty branches need logging or notification