That PDF attachment your workflow just received? The data is trapped inside.
Invoices from email. Customer lists from spreadsheets. Reports downloaded from APIs. The Extract from File node cracks open these files and transforms binary data into JSON your workflow can actually use.
The Challenge
Binary data handling in n8n trips up even experienced users:
- Files seem to “disappear” between nodes
- Extraction returns empty results
- Error messages reference properties that don’t exist
These problems have real solutions. This guide covers every one of them.
What You’ll Learn
- How to extract data from 10+ file formats including PDF, Excel, CSV, and more
- Binary data fundamentals that prevent the most common n8n file errors
- Step-by-step troubleshooting for “binary file not found” and empty extraction issues
- Real workflow patterns for processing email attachments, form uploads, and API downloads
When to Use Extract from File
Before diving into configuration, understand when this node is the right choice:
| Scenario | Best Approach | Why |
|---|---|---|
| Parse CSV/Excel from API or email | Extract from File | Converts binary to JSON rows |
| Read PDF text content | Extract from File | Native PDF text extraction |
| Process form file uploads | Extract from File | Works with Form Trigger binary data |
| Parse complex PDFs with tables | AI vision models or OCR | Better table structure recognition |
| Read local JSON config files | Read/Write Files from Disk | Already outputs JSON directly |
| Transform JSON between nodes | Edit Fields node | No file conversion needed |
| Create files from JSON data | Convert to File node | Inverse operation of Extract from File |
Rule of thumb: Use Extract from File whenever you have binary file data that needs to become structured JSON. If your data is already JSON or you need advanced document intelligence, consider alternatives.
Understanding Binary Data in n8n
Before extracting anything, you need to understand how n8n handles files. This knowledge prevents the vast majority of extraction errors. For a deeper dive, see the official n8n binary data documentation.
What Is Binary Data?
In n8n, data comes in two forms:
- JSON data - Structured key-value pairs that nodes can read and manipulate directly
- Binary data - Raw file contents (images, PDFs, spreadsheets) stored as Base64-encoded strings separately from JSON
When you download a file via the HTTP Request node or receive an email attachment, the file content lives in a binary property, not in $json. You cannot access binary content with expressions like {{ $json.fileContent }}.
The Binary Property Convention
Binary data is stored under named properties. The default name is data, but it can be anything:
```javascript
// Typical item structure with binary data in n8n
{
  "json": {
    // JSON properties are accessible via $json expressions
    "fileName": "report.pdf",
    "size": 102400
  },
  "binary": {
    // Binary data lives here, separate from JSON
    // The property name "data" is the default, but can be anything
    "data": {
      "mimeType": "application/pdf",  // File type identifier
      "fileName": "report.pdf",       // Original filename
      "fileExtension": "pdf",         // Extension without the dot
      "data": "JVBERi0xLjQK..."       // Base64-encoded file contents
    }
  }
}
```
The Extract from File node reads from binary.data by default. If your binary property has a different name (like attachment or file), you must specify it in the node configuration.
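If you would rather normalize the data upstream than reconfigure every downstream node, a Code node can move the binary content to the default `data` key. This is a minimal sketch; the `renameBinaryProperty` helper is illustrative, and in a real Code node you would map `$input.all()` through it:

```javascript
// Move a binary property to a new key so downstream nodes find it under "data".
// Written as a plain function so it can run anywhere; in an n8n Code node you
// would apply it to each item from $input.all().
function renameBinaryProperty(item, from, to) {
  const { [from]: moved, ...rest } = item.binary || {};
  if (moved === undefined) return item; // nothing to rename
  return { json: item.json, binary: { ...rest, [to]: moved } };
}

// Example: an email attachment arrives under "attachment_0"
const item = {
  json: { fileName: "report.pdf" },
  binary: { attachment_0: { mimeType: "application/pdf", data: "JVBERi..." } }
};
const fixed = renameBinaryProperty(item, "attachment_0", "data");
```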
How Files Flow Through Workflows
Understanding this flow prevents the “where did my file go?” confusion:
- Source node creates binary data (HTTP Request, Gmail, Read Files from Disk)
- Binary data travels alongside JSON through connections
- Transform nodes (Set, Switch, IF) pass binary data through if configured correctly
- Extract from File reads the binary and outputs JSON
- Original binary data is gone after extraction unless explicitly preserved
Critical insight: Some nodes discard binary data by default. If you add an Edit Fields node between your file source and Extract from File, and you’re using “Set” mode instead of “Append”, your binary data disappears.
Supported File Formats
The Extract from File node handles 10 different formats, each with specific use cases. For the complete parameter reference, see the official n8n Extract from File documentation.
| Format | Operation | Best For | Special Considerations |
|---|---|---|---|
| CSV | Extract from CSV | Tabular data exports, database dumps | Configure delimiter if not comma |
| XLSX | Extract from XLSX | Modern Excel files | Can read specific sheets |
| XLS | Extract from XLS | Legacy Excel files | Older format, less common |
| PDF | Extract from PDF | Reports, invoices, documents | Only extracts text, not images |
| HTML | Extract from HTML | Web page content | Extracts structured data from tables |
| JSON | Extract from JSON | JSON files in binary form | Useful for binary JSON payloads |
| ICS | Extract from ICS | Calendar events | iCalendar format parsing |
| ODS | Extract from ODS | LibreOffice spreadsheets | Open document format |
| RTF | Extract from RTF | Rich text documents | Preserves basic formatting |
| Text | Extract from Text | Plain text files | Simple string output |
There’s also a special operation:
| Operation | Purpose |
|---|---|
| Convert Binary File to Base64 String | Converts binary to Base64 text for APIs requiring string input |
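Conceptually, this operation just exposes the Base64 encoding of the raw file bytes as a JSON field, as in this Node.js sketch (the `payload` shape is a hypothetical API request body, not an n8n output format):

```javascript
// Conceptual equivalent of "Convert Binary File to Base64 String" (Node.js sketch).
// n8n already stores binary content Base64-encoded; this operation surfaces that
// string in JSON so you can embed it in an API request body.
const fileBytes = Buffer.from('Hello, n8n!');      // stand-in for real file contents
const base64String = fileBytes.toString('base64'); // "SGVsbG8sIG44biE="
const payload = { fileName: 'hello.txt', content: base64String };
```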
Your First Extraction
Let’s walk through a complete example: downloading a CSV file and extracting its data.
Step 1: Add the HTTP Request Node
First, we need to get the file:
- Add an HTTP Request node to your workflow
- Set Method to GET
- Enter a CSV file URL (for testing, try https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv)
- Click Test step
The node returns binary data. In the output panel, you’ll see a “Binary” tab showing your file.
Step 2: Add Extract from File
- Add an Extract from File node after HTTP Request
- Set Operation to “Extract from CSV”
- Leave Binary Property as “data” (the default)
- Click Test step
Step 3: Use the Extracted Data
The output is now JSON. Each row becomes a separate item:
```json
{
  "row": {
    "0": "John",
    "1": "Doe",
    "2": "120 jefferson st.",
    "3": "Riverside",
    "4": "NJ",
    "5": "08075"
  }
}
```
If your CSV has headers and you enable the “Header Row” option, the output uses column names as keys:
```json
{
  "First": "John",
  "Last": "Doe",
  "Address": "120 jefferson st.",
  "City": "Riverside",
  "State": "NJ",
  "Zip": "08075"
}
```
Now you can access data with expressions like {{ $json.First }} in subsequent nodes.
Extracting from CSV Files
CSV (Comma-Separated Values) extraction is the most common use case. The format follows the RFC 4180 specification, though many real-world CSV files deviate from the standard. Here’s everything you need to know.
Configuration Options
| Option | Default | Purpose |
|---|---|---|
| Header Row | false | Treat first row as column names |
| Delimiter | Comma | Field separator (comma, semicolon, tab, etc.) |
| Include Empty Cells | false | Include empty values as empty strings |
Handling Different Delimiters
European CSV exports often use semicolons instead of commas. If your data looks wrong after extraction (all values in one column), check the delimiter:
- Open your CSV in a text editor
- Look at how values are separated
- Set the Delimiter option to match
Common delimiters:
- `,` - Comma (US standard)
- `;` - Semicolon (European standard)
- `\t` - Tab (TSV files)
- `|` - Pipe (some exports)
Encoding Issues
If extracted text shows strange characters, you have an encoding mismatch. The file might be UTF-16 or ISO-8859-1 instead of UTF-8. Unfortunately, the Extract from File node doesn’t have an encoding option. Workarounds:
- Convert the file encoding before importing to n8n
- Use a Code node to handle encoding manually
- Request the source system export in UTF-8
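As a sketch of the Code-node workaround, Node's Buffer can re-interpret the Base64 payload with a different charset. This assumes the source file is ISO-8859-1 (latin1); other encodings such as UTF-16 need `TextDecoder` or a package like iconv-lite:

```javascript
// Decode a Base64 binary payload as ISO-8859-1 instead of UTF-8 (a sketch).
// In an n8n Code node, b64 would come from $input.first().binary.data.data.
function decodeLatin1(b64) {
  const bytes = Buffer.from(b64, 'base64');
  return bytes.toString('latin1');
}

// "café" encoded as latin1 bytes, then Base64, to simulate a mis-encoded file
const sample = Buffer.from('caf\u00e9', 'latin1').toString('base64');
const text = decodeLatin1(sample); // "café"
```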
Common CSV Issues
Problem: All data appears in a single column
Cause: Wrong delimiter setting
Fix: Check your file and set the correct delimiter
Problem: First row of data is missing
Cause: Header Row is enabled but file has no headers
Fix: Disable the Header Row option
Problem: Numbers extracted as strings
Cause: CSV format is inherently text-based
Fix: Use Edit Fields or Code node to convert: {{ parseInt($json.amount) }}
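Since every CSV value arrives as a string, a small Code-node helper can coerce the numeric columns in one pass. A sketch with hypothetical field names:

```javascript
// Convert selected string fields of an extracted CSV row to numbers (a sketch).
// Commas are stripped first so values like "1,299.50" parse correctly.
function coerceNumbers(row, numericKeys) {
  const out = { ...row };
  for (const key of numericKeys) {
    const n = parseFloat(String(out[key]).replace(/,/g, ''));
    if (!Number.isNaN(n)) out[key] = n;
  }
  return out;
}

const row = coerceNumbers({ name: 'Widget', amount: '1,299.50', qty: '3' }, ['amount', 'qty']);
// row.amount === 1299.5, row.qty === 3, row.name stays a string
```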
Extracting from Excel Files
Excel files (XLS/XLSX) contain structured spreadsheet data with support for multiple sheets.
Basic Extraction
For simple spreadsheets with data starting in cell A1:
- Set Operation to “Extract from XLSX” (or XLS for older files)
- Leave defaults and test
Each row becomes a JSON item, columns become properties.
Working with Multiple Sheets
By default, n8n extracts from the first sheet. To extract from a specific sheet:
- Find the Sheet Name option
- Enter the exact sheet name (case-sensitive)
To extract from multiple sheets, you need multiple Extract from File nodes or a loop structure.
Reading Specific Ranges
The node extracts all data by default. For specific ranges:
- Use the Range option
- Enter Excel-style range notation:
A1:D100
This is useful for spreadsheets with metadata rows or multiple tables.
Excel-Specific Issues
Problem: Date values appear as numbers (like 45234)
Cause: Excel stores dates as serial numbers
Fix: Convert in a Code node:
```javascript
// Excel stores dates as "serial numbers" (days since January 1, 1900)
// This code converts that number to a proper JavaScript date

// Get the Excel serial number from your extracted data
const excelDate = $json.dateColumn;

// Convert to JavaScript date:
// - Subtract 25569 (days between 1900 and 1970, the JavaScript epoch)
// - Multiply by 86400 (seconds per day) and 1000 (milliseconds)
const jsDate = new Date((excelDate - 25569) * 86400 * 1000);

// Return the date in ISO format (e.g., "2024-03-15T00:00:00.000Z")
return { date: jsDate.toISOString() };
```
Problem: Formulas extracted instead of values
Cause: Unusual, but can happen with certain files
Fix: Open in Excel, copy-paste values only, re-save
Extracting from PDF Files
PDF extraction is powerful but has important limitations.
What Gets Extracted
The node extracts text content from PDFs. This includes:
- Paragraphs and headings
- Table text (but not table structure)
- Text in forms
It does not extract:
- Images or graphics
- Text embedded in images (scanned documents)
- Complex table layouts as structured data
Text-Based vs Image-Based PDFs
This distinction is critical:
Text-based PDFs (created digitally): Extraction works well. The output includes readable text you can process.
Image-based PDFs (scanned documents): Extraction returns nothing or garbage. The “text” is actually a picture of text.
Test by opening the PDF and trying to select/copy text. If you can’t select individual words, it’s image-based and requires OCR.
When PDF Extraction Isn’t Enough
For complex PDFs with tables, invoices, or forms, the basic text extraction often falls short. Consider these alternatives:
- AI Vision Models - Send PDF pages as images to multimodal AI models that can understand visual layout and table structure
- Cloud OCR Services - AWS Textract, Google Cloud Document AI, or Azure Form Recognizer provide structured extraction from complex documents
- Open-source OCR - Tools like Tesseract OCR can convert scanned documents to text
These require additional setup but handle complex documents the basic Extract from File node cannot. For understanding the PDF format itself, the Adobe PDF Reference provides technical details.
PDF Extraction Example
Trigger → HTTP Request (get PDF) → Extract from PDF → Code (parse text) → Output
The raw PDF text output is a single string. You typically need a Code node to parse specific information using regular expressions:
```javascript
// The extracted PDF text comes as a single string in $json.text
const text = $json.text;

// Use regex to find patterns in the text
// This pattern looks for "Invoice #:" followed by digits
const invoiceMatch = text.match(/Invoice #:\s*(\d+)/);
// If found, get the captured group (the digits); otherwise null
const invoiceNumber = invoiceMatch ? invoiceMatch[1] : null;

// Look for "Total:" followed by optional $ and numbers
const totalMatch = text.match(/Total:\s*\$?([\d,.]+)/);
// Convert the matched string to a number (remove all commas first)
const total = totalMatch ? parseFloat(totalMatch[1].replace(/,/g, '')) : null;

// Return the parsed data as structured JSON
return {
  invoiceNumber, // e.g., "12345"
  total,         // e.g., 1234.56
  rawText: text  // Keep original text for debugging
};
```
Common Input Sources
Extract from File works with any node that outputs binary data. Here are the most common sources:
HTTP Request (API Downloads)
For files available via URL:
HTTP Request → Extract from File
Configure HTTP Request:
- Method: GET
- URL: The file URL
- Response Format: File (automatic for binary content)
Read/Write Files from Disk
For local files (self-hosted n8n only):
Read/Write Files from Disk → Extract from File
Configure Read/Write Files:
- Operation: Read File(s) From Disk
- File Path: /path/to/your/file.csv
Google Drive / Cloud Storage
For files stored in cloud services:
Google Drive (Download) → Extract from File
Most cloud storage nodes have a “Download” operation that outputs binary data.
Email Attachments
For processing email attachments:
Gmail Trigger → Extract from File
Gmail outputs attachments as binary properties named attachment_0, attachment_1, etc. You must set the Binary Property field to match.
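When an email can carry any number of attachments, a Code node can discover the attachment property names instead of hard-coding `attachment_0`. A sketch (the `attachment_` prefix matches Gmail's naming convention):

```javascript
// List an item's binary properties that hold email attachments (a sketch).
// In an n8n Code node, you would pass $input.first().binary as the argument.
function listAttachmentKeys(binary) {
  return Object.keys(binary || {}).filter((k) => k.startsWith('attachment_'));
}

const keys = listAttachmentKeys({
  attachment_0: { mimeType: 'application/pdf' },
  attachment_1: { mimeType: 'image/png' }
});
// keys → ['attachment_0', 'attachment_1']
```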
Form Trigger File Uploads
For user-uploaded files:
Form Trigger → Extract from File
Important: Form Trigger file uploads require specific configuration. See the troubleshooting section below.
Troubleshooting Common Errors
These are the errors n8n users encounter most often with file extraction.
“Binary file ‘data’ not found” Error
This is the most common extraction error. It means the node cannot find binary data under the expected property name.
Causes:
- The property name doesn’t match (e.g., binary is named “attachment_0” but node expects “data”)
- Binary data was lost in a previous node
- The source node didn’t output binary data
Diagnostic steps:
- Click on the node before Extract from File
- Check the Output panel for a “Binary” tab
- Note the property name shown (data, attachment_0, file, etc.)
Fixes:
- If property name differs, update Binary Property in Extract from File
- If no binary tab exists, the issue is with your source node
- If using intermediate nodes (Set, Switch), check they preserve binary data
Binary Data Lost After Intermediate Nodes
You download a file, add a Set node to add some metadata, then Extract from File fails. The binary data disappeared.
Why this happens:
The Edit Fields node in “Set” mode replaces items entirely. Binary data is part of the item, so it gets discarded.
Fix 1: Use “Append” mode
In Edit Fields, change mode from “Set” to “Append”. This keeps existing data including binary.
Fix 2: Use a Code node to preserve binary
```javascript
// Get all incoming items (each item may have both JSON and binary data)
const items = $input.all();

// Transform each item while keeping the binary data intact
return items.map(item => ({
  json: {
    ...item.json,         // Spread operator: keep all existing JSON fields
    myNewField: "value"   // Add your new field
  },
  binary: item.binary     // IMPORTANT: Explicitly pass through binary data
}));
```
Fix 3: Restructure your workflow
Extract from the file first, then transform the JSON data:
Source → Extract from File → Edit Fields
Instead of:
Source → Edit Fields → Extract from File (fails)
Form Trigger Upload Issues
Form Trigger file uploads are notorious for extraction problems. Common symptoms:
- Binary property exists but has no actual data
- Error: “The item has no binary field”
- Extraction returns empty results
The underlying issue:
Form Trigger has known issues with binary data format. The uploaded file metadata exists, but the actual Base64 content may be missing or malformed.
Workarounds:
- Check your n8n version - Updates often fix Form Trigger binary handling
- Use a Code node to validate binary data exists:
```javascript
// Get all items from the Form Trigger
const items = $input.all();

// Loop through each item to validate it has binary data
for (const item of items) {
  // The ?. is "optional chaining" - safely access nested properties
  // If any part is undefined, the whole expression returns undefined
  if (!item.binary?.file?.data) {
    // Throw an error to stop the workflow with a clear message
    throw new Error('No binary data received from form');
  }
}

// If we get here, all items have valid binary data
return items;
```
- Alternative: Use webhook with multipart/form-data instead
The Webhook node sometimes handles file uploads more reliably than Form Trigger.
Empty or No Data Extracted
You run extraction successfully (no error), but the output is empty or contains no useful data.
For PDFs:
- The PDF may be image-based (scanned document)
- Try opening the PDF and selecting text - if you can’t, OCR is needed
- The PDF may be password-protected
For CSV/Excel:
- File may be empty
- Data may start on a row other than row 1
- Headers may be on a different row than expected
For all formats:
- File may be corrupted
- File extension may not match actual format (e.g., CSV saved as .xlsx)
- Encoding issues may cause content to be unreadable
Debugging approach:
Add a Code node after your source to inspect the binary data:
```javascript
// Get the first item's binary data (under the "data" property)
// The ?. prevents errors if binary or data doesn't exist
const binaryData = $input.first().binary?.data;

// Return diagnostic information about the binary data
return {
  exists: !!binaryData,                      // true if binaryData exists, false otherwise
  mimeType: binaryData?.mimeType,            // e.g., "application/pdf"
  fileName: binaryData?.fileName,            // e.g., "report.pdf"
  dataLength: binaryData?.data?.length || 0  // Length of Base64 string (0 if missing)
};
```
If dataLength is 0 or very small, the file transfer failed.
Real-World Workflow Examples
Example 1: Process Email Invoice Attachments
Scenario: Automatically extract data from PDF invoices received via email and log to a spreadsheet.
Gmail Trigger → IF (has attachment) → Extract from PDF → Code (parse invoice) → Google Sheets
Gmail Trigger configuration:
- Trigger on new emails matching a filter (e.g., from: [email protected])
- Attachments output as binary properties
IF node: Check that an attachment exists:
{{ $json.attachments?.length > 0 }}
Extract from File:
- Operation: Extract from PDF
- Binary Property: attachment_0
Code node (parse invoice text):
```javascript
// Get the extracted text from the PDF
const text = $json.text;

// Use regex patterns to find specific data in the invoice text
// Each .match() returns an array where [1] is the captured group, or null if not found
// The ?. safely handles null results, and ?.trim() removes whitespace
return {
  vendor: text.match(/From:\s*(.+)/)?.[1]?.trim(),        // Find "From: Company Name"
  invoiceNumber: text.match(/Invoice[#:\s]+(\w+)/i)?.[1], // Find "Invoice #123" or "Invoice: ABC"
  amount: text.match(/Total[:\s]*\$?([\d,.]+)/)?.[1],     // Find "Total: $1,234.56"
  rawText: text.substring(0, 500)                         // Keep first 500 chars for debugging
};
```
Example 2: Sync CSV Reports to Database
Scenario: Download a daily report CSV from an API and upsert records to a database.
Schedule Trigger → HTTP Request → Extract from CSV → Loop → Postgres (upsert)
HTTP Request:
- URL: API endpoint returning CSV
- Authentication as needed
Extract from CSV:
- Header Row: true (if CSV has headers)
- Delimiter: comma
Loop Over Items: The extraction creates one item per row. Connect directly to database node.
Postgres:
- Operation: Upsert
- Map columns from $json to database fields
Example 3: User Form File Processing
Scenario: Accept Excel uploads via a form and validate the data.
Form Trigger → Extract from XLSX → Code (validate) → IF (valid) → Process / Send Error
Form Trigger:
- Add a file upload field
- Note the field name (used as binary property)
Extract from XLSX:
- Binary Property: match your form field name
Code (validation):
```javascript
// Get all items (each row from the Excel file is one item)
const items = $input.all();
// Array to collect any validation errors we find
const errors = [];

// Loop through each row to validate the data
for (let i = 0; i < items.length; i++) {
  // Get the JSON data for this row
  const row = items[i].json;

  // Check if email exists and contains @
  if (!row.email || !row.email.includes('@')) {
    errors.push(`Row ${i + 1}: Invalid email`);
  }
  // Check if name exists and is at least 2 characters
  if (!row.name || row.name.length < 2) {
    errors.push(`Row ${i + 1}: Name too short`);
  }
}

// If we found any errors, return them with valid: false
if (errors.length > 0) {
  return { valid: false, errors };
}
// All rows passed validation
return { valid: true, rowCount: items.length };
```
Example 4: Batch Process Local Files
Scenario: Process all CSV files in a folder (self-hosted n8n).
Schedule Trigger → Read Files from Disk → Loop → Extract from CSV → Aggregate → Output
Read Files from Disk:
- File Selector: /data/imports/*.csv
- This returns multiple items if multiple files exist
Split In Batches (or Loop): Process files one at a time to avoid memory issues.
Extract from CSV: Processes each file’s binary data.
Aggregate: Combine all extracted rows if needed.
Pro Tips and Best Practices
1. Always Verify Binary Data Exists
Before extraction, confirm your source node outputs binary data. The quickest check is the Binary tab in the previous node's output panel. To guard at runtime, add an IF node before Extract from File with a condition that tests for binary content, and route the false branch to your error handling:

```javascript
// Expression for an IF node condition: true only when the item carries binary data
{{ Object.keys($binary ?? {}).length > 0 }}
```
2. Match Binary Property Names Exactly
The most common error source is mismatched property names. Check your source node’s output and copy the exact property name to Extract from File.
3. Handle Extraction Errors Gracefully
Wrap extraction in error handling for production workflows. Connect an Error Trigger node workflow to catch failures:
```
[Main Workflow] → Extract from File (may fail)
                        ↓ (on error)
[Error Workflow] → Error Trigger → Slack notification
```
4. Consider File Size Limits
Large files can cause memory issues, especially in n8n Cloud. For files over 10MB:
- Process in batches if possible (multiple smaller files)
- Use filesystem mode for binary storage (self-hosted)
- Consider external processing for very large files
5. Use Sub-Workflows for Reusable Extraction Logic
If you extract the same file type in multiple workflows, create a sub-workflow:
```
[Sub-Workflow]
Start → Extract from File → Code (standardize output) → End

[Main Workflows]
... → Execute Workflow (call sub-workflow) → ...
```
This centralizes your extraction configuration and makes updates easier.
6. Log Extraction Results for Debugging
During development, add a Set node after extraction to capture metadata:
```javascript
{
  // Count how many rows were extracted
  "extractedRowCount": {{ $items().length }},

  // Get the original filename from the HTTP Request node
  // $('Node Name') references a specific node's output
  "sourceFile": "{{ $('HTTP Request').item.binary.data.fileName }}",

  // Add timestamp using n8n's built-in $now variable
  "extractedAt": "{{ $now.toISO() }}"
}
```
This helps diagnose issues when extraction doesn’t return expected results.
For complex file processing workflows, our workflow development services can help you build robust, production-ready solutions. If you need guidance on architecture or best practices, explore our n8n consulting services.
Frequently Asked Questions
Why does my PDF extraction return empty text?
PDF extraction only works with text-based PDFs where the text is encoded as actual characters, not images.
If your PDF was created by scanning paper documents, the “text” is actually a picture of text. This requires OCR (Optical Character Recognition) to read.
Quick test: Open the PDF and try to select/copy text with your mouse. If you cannot select individual words, you need OCR.
OCR options:
- AI vision models to read PDF pages as images
- Cloud OCR services like AWS Textract or Google Cloud Document AI
- Open-source tools like Tesseract for self-hosted processing
How do I extract data from multiple sheets in an Excel file?
The Extract from File node extracts from one sheet at a time.
To process multiple sheets, you have two options:
Option 1: Multiple Extract nodes Use multiple Extract from File nodes in sequence, each configured for a different sheet name.
Option 2: Loop approach Create an array of sheet names, loop through them, and for each iteration use an expression to set the sheet name in Extract from File.
The extracted data from each sheet can then be aggregated or processed separately.
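For the loop approach, a Code node can emit one item per sheet name so a downstream Extract from File node can reference `{{ $json.sheetName }}` in its Sheet Name field. A sketch with hypothetical sheet names:

```javascript
// Emit one n8n item per sheet name, for looping over Excel sheets (a sketch).
function itemsForSheets(sheetNames) {
  return sheetNames.map((name) => ({ json: { sheetName: name } }));
}

// Hypothetical sheet names; replace with the ones in your workbook
const items = itemsForSheets(['Summary', 'Q1', 'Q2']);
// items[1].json.sheetName === 'Q1'
```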
Can I extract from password-protected files?
Short answer: No. The native Extract from File node does not support password-protected files.
Workarounds:
- Use a Code node with an appropriate library to decrypt first
- Pre-process the file externally to remove protection
- Set up an external decryption service
- Request unprotected exports from your data providers
Our workflow debugger tool can help diagnose issues if you’re unsure whether password protection is causing extraction failures.
How do I handle CSV files with different delimiters?
The Extract from File node has a Delimiter option specifically for CSV extraction.
Common delimiters:
- `,` (comma) for US-standard CSV
- `;` (semicolon) for European formats
- `\t` (tab) for TSV files
- `|` (pipe) for some database exports
How to check: Open your file in a text editor (not Excel) and look at how values are separated.
If your delimiter varies between files, use an expression to set it dynamically. For truly irregular files that mix delimiters or have complex quoting rules, a Code node with a CSV parsing library gives you full control.
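As a starting point for that Code-node route, here is a minimal line splitter that respects quoted fields. It is a sketch, not a full RFC 4180 parser (it ignores multi-line fields), and a library such as csv-parse is the safer choice for messy data:

```javascript
// Split one CSV line on a configurable delimiter, honoring double-quoted fields.
function splitCsvLine(line, delimiter = ',') {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      if (inQuotes && line[i + 1] === '"') { current += '"'; i++; } // escaped "" inside quotes
      else inQuotes = !inQuotes;                                    // toggle quoted state
    } else if (ch === delimiter && !inQuotes) {
      fields.push(current); // delimiter outside quotes ends the field
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const parts = splitCsvLine('Smith;"Berlin; Mitte";42', ';');
// parts → ['Smith', 'Berlin; Mitte', '42']
```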
Why does binary data disappear after my Switch node?
The Switch node itself doesn’t remove binary data. The issue is usually what happens after.
Common cause: Adding transform nodes (Edit Fields, Set) after the Switch that replace item data without preserving binary.
The fix: Ensure any Edit Fields nodes after Switch use “Append” mode rather than “Set” mode.
How to verify: Check the output panel’s Binary tab at any point to confirm binary data exists.
If binary is truly missing after Switch (rare), it may be a version-specific bug. Check the n8n community forum for similar reports or consider updating n8n.