n8n Extract from File Node

Master the n8n Extract from File node to parse PDFs, Excel, CSV, and more. Learn binary data handling, troubleshoot common errors, and build file processing workflows.

That PDF attachment your workflow just received? The data is trapped inside.

Invoices from email. Customer lists from spreadsheets. Reports downloaded from APIs. The Extract from File node cracks open these files and transforms binary data into JSON your workflow can actually use.

The Challenge

Binary data handling in n8n trips up even experienced users:

  • Files seem to “disappear” between nodes
  • Extraction returns empty results
  • Error messages reference properties that don’t exist

These problems have real solutions. This guide covers every one of them.

What You’ll Learn

  • How to extract data from 10+ file formats, including PDF, Excel, and CSV
  • Binary data fundamentals that prevent the most common n8n file errors
  • Step-by-step troubleshooting for “binary file not found” and empty extraction issues
  • Real workflow patterns for processing email attachments, form uploads, and API downloads

When to Use Extract from File

Before diving into configuration, understand when this node is the right choice:

| Scenario | Best Approach | Why |
| --- | --- | --- |
| Parse CSV/Excel from API or email | Extract from File | Converts binary to JSON rows |
| Read PDF text content | Extract from File | Native PDF text extraction |
| Process form file uploads | Extract from File | Works with Form Trigger binary data |
| Parse complex PDFs with tables | AI vision models or OCR | Better table structure recognition |
| Read local JSON config files | Read/Write Files from Disk | Already outputs JSON directly |
| Transform JSON between nodes | Edit Fields node | No file conversion needed |
| Create files from JSON data | Convert to File node | Inverse operation of Extract from File |

Rule of thumb: Use Extract from File whenever you have binary file data that needs to become structured JSON. If your data is already JSON or you need advanced document intelligence, consider alternatives.

Understanding Binary Data in n8n

Before extracting anything, you need to understand how n8n handles files. This knowledge prevents 90% of extraction errors. For a deeper dive, see the official n8n binary data documentation.

What Is Binary Data?

In n8n, data comes in two forms:

  1. JSON data - Structured key-value pairs that nodes can read and manipulate directly
  2. Binary data - Raw file contents (images, PDFs, spreadsheets) stored as Base64-encoded strings separately from JSON

When you download a file via the HTTP Request node or receive an email attachment, the file content lives in a binary property, not in $json. You cannot access binary content with expressions like {{ $json.fileContent }}.
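To make the split concrete, here is a minimal sketch of an item carrying both JSON and binary data (filenames and contents are made up). In a real Code node you would read the Base64 string from `$input.first().binary.data.data`:

```javascript
// Simulated n8n item: JSON fields and binary data live side by side.
// In a real workflow the source node (HTTP Request, Gmail, ...) builds this.
const item = {
  json: {
    fileName: 'hello.txt', // reachable via {{ $json.fileName }}
  },
  binary: {
    data: {
      mimeType: 'text/plain',
      fileName: 'hello.txt',
      fileExtension: 'txt',
      // File contents are Base64-encoded, NOT plain text
      data: Buffer.from('Hello, n8n!').toString('base64'),
    },
  },
};

// {{ $json.fileContent }} would find nothing — the content lives under binary
const decoded = Buffer.from(item.binary.data.data, 'base64').toString('utf8');
// decoded is now "Hello, n8n!"
```

The point of the sketch: the file's bytes never appear under `json`, which is why expressions against `$json` cannot reach them.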

The Binary Property Convention

Binary data is stored under named properties. The default name is data, but it can be anything:

// Typical item structure with binary data in n8n
{
  "json": {
    // JSON properties are accessible via $json expressions
    "fileName": "report.pdf",
    "size": 102400
  },
  "binary": {
    // Binary data lives here, separate from JSON
    // The property name "data" is the default, but can be anything
    "data": {
      "mimeType": "application/pdf",    // File type identifier
      "fileName": "report.pdf",          // Original filename
      "fileExtension": "pdf",            // Extension without the dot
      "data": "JVBERi0xLjQK..."          // Base64 encoded file contents
    }
  }
}

The Extract from File node reads from binary.data by default. If your binary property has a different name (like attachment or file), you must specify it in the node configuration.

How Files Flow Through Workflows

Understanding this flow prevents the “where did my file go?” confusion:

  1. Source node creates binary data (HTTP Request, Gmail, Read Files from Disk)
  2. Binary data travels alongside JSON through connections
  3. Transform nodes (Set, Switch, IF) pass binary data through if configured correctly
  4. Extract from File reads the binary and outputs JSON
  5. Original binary data is gone after extraction unless explicitly preserved

Critical insight: Some nodes discard binary data by default. If you add an Edit Fields node between your file source and Extract from File, and you’re using “Set” mode instead of “Append”, your binary data disappears.

Supported File Formats

The Extract from File node handles 10 different formats, each with specific use cases. For the complete parameter reference, see the official n8n Extract from File documentation.

| Format | Operation | Best For | Special Considerations |
| --- | --- | --- | --- |
| CSV | Extract from CSV | Tabular data exports, database dumps | Configure delimiter if not comma |
| XLSX | Extract from XLSX | Modern Excel files | Can read specific sheets |
| XLS | Extract from XLS | Legacy Excel files | Older format, less common |
| PDF | Extract from PDF | Reports, invoices, documents | Only extracts text, not images |
| HTML | Extract from HTML | Web page content | Extracts structured data from tables |
| JSON | Extract from JSON | JSON files in binary form | Useful for binary JSON payloads |
| ICS | Extract from ICS | Calendar events | iCalendar format parsing |
| ODS | Extract from ODS | LibreOffice spreadsheets | Open document format |
| RTF | Extract from RTF | Rich text documents | Preserves basic formatting |
| Text | Extract from Text | Plain text files | Simple string output |

There’s also a special operation:

| Operation | Purpose |
| --- | --- |
| Convert Binary File to Base64 String | Converts binary to Base64 text for APIs requiring string input |
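What this operation produces can be simulated directly: the binary contents become a plain Base64 string you can drop into a JSON payload for APIs that will not accept raw bytes. The payload shape below is hypothetical:

```javascript
// Simulate the Base64 conversion: file bytes in, plain string out
const fileBytes = Buffer.from('%PDF-1.4 example content');
const base64String = fileBytes.toString('base64');

// Typical use: embed the string in a JSON API body (field names are made up)
const payload = {
  filename: 'report.pdf',
  content: base64String,
};

// The receiving side can decode the string back to the original bytes
const roundTrip = Buffer.from(payload.content, 'base64').toString('utf8');
```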

Your First Extraction

Let’s walk through a complete example: downloading a CSV file and extracting its data.

Step 1: Add the HTTP Request Node

First, we need to get the file:

  1. Add an HTTP Request node to your workflow
  2. Set Method to GET
  3. Enter a CSV file URL (for testing, try https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv)
  4. Click Test step

The node returns binary data. In the output panel, you’ll see a “Binary” tab showing your file.

Step 2: Add Extract from File

  1. Add an Extract from File node after HTTP Request
  2. Set Operation to “Extract from CSV”
  3. Leave Binary Property as “data” (the default)
  4. Click Test step

Step 3: Use the Extracted Data

The output is now JSON. Each row becomes a separate item:

{
  "row": {
    "0": "John",
    "1": "Doe",
    "2": "120 jefferson st.",
    "3": "Riverside",
    "4": "NJ",
    "5": "08075"
  }
}

If your CSV has headers and you enable the “Header Row” option, the output uses column names as keys:

{
  "First": "John",
  "Last": "Doe",
  "Address": "120 jefferson st.",
  "City": "Riverside",
  "State": "NJ",
  "Zip": "08075"
}

Now you can access data with expressions like {{ $json.First }} in subsequent nodes.
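Because each row is its own item, downstream nodes operate row by row. As a quick sketch (field names taken from the header example above), a Code node could reshape the rows like this:

```javascript
// Simulated output of Extract from CSV with Header Row enabled:
// one item per row, column names as keys
const items = [
  { json: { First: 'John', Last: 'Doe', City: 'Riverside' } },
  { json: { First: 'Jane', Last: 'Smith', City: 'Camden' } },
];

// A downstream Code node can combine or rename fields per row
const labeled = items.map(item => ({
  json: {
    fullName: `${item.json.First} ${item.json.Last}`,
    city: item.json.City,
  },
}));
```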

Extracting from CSV Files

CSV (Comma-Separated Values) extraction is the most common use case. The format follows the RFC 4180 specification, though many real-world CSV files deviate from the standard. Here’s everything you need to know.

Configuration Options

| Option | Default | Purpose |
| --- | --- | --- |
| Header Row | false | Treat first row as column names |
| Delimiter | Comma | Field separator (comma, semicolon, tab, etc.) |
| Include Empty Cells | false | Include empty values as empty strings |

Handling Different Delimiters

European CSV exports often use semicolons instead of commas. If your data looks wrong after extraction (all values in one column), check the delimiter:

  1. Open your CSV in a text editor
  2. Look at how values are separated
  3. Set the Delimiter option to match

Common delimiters:

  • , - Comma (US standard)
  • ; - Semicolon (European standard)
  • \t - Tab (TSV files)
  • | - Pipe (some exports)

Encoding Issues

If extracted text shows strange characters, you have an encoding mismatch. The file might be UTF-16 or ISO-8859-1 instead of UTF-8. Unfortunately, the Extract from File node doesn’t have an encoding option. Workarounds:

  1. Convert the file encoding before importing to n8n
  2. Use a Code node to handle encoding manually
  3. Request the source system export in UTF-8
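Workaround 2 can look like the sketch below, which simulates a latin1-encoded file. In n8n you would start from the Base64 string in `$input.first().binary.data.data` and decode it with the correct encoding:

```javascript
// "Café" encoded as ISO-8859-1 (latin1): the é is the single byte 0xE9
const latin1Bytes = Buffer.from('Caf\xe9', 'latin1');
const base64 = latin1Bytes.toString('base64'); // how n8n stores binary content

// Decoding with the wrong encoding mangles the text
const wrong = Buffer.from(base64, 'base64').toString('utf8');   // "Caf�"
// Decoding with the actual source encoding recovers it
const right = Buffer.from(base64, 'base64').toString('latin1'); // "Café"
```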

Common CSV Issues

Problem: All data appears in a single column

Cause: Wrong delimiter setting

Fix: Check your file and set the correct delimiter


Problem: First row of data is missing

Cause: Header Row is enabled but file has no headers

Fix: Disable the Header Row option


Problem: Numbers extracted as strings

Cause: CSV format is inherently text-based

Fix: Use Edit Fields or Code node to convert: {{ parseInt($json.amount) }}
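For whole datasets, a Code node version of that fix might look like this (the `amount` column name is hypothetical):

```javascript
// Simulated CSV extraction output: numeric values arrive as strings
const items = [
  { json: { product: 'Widget', amount: '19.99' } },
  { json: { product: 'Gadget', amount: '250' } },
];

// Convert the numeric column in one pass, keeping other fields untouched
const converted = items.map(item => ({
  json: { ...item.json, amount: Number(item.json.amount) },
}));
```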

Extracting from Excel Files

Excel files (XLS/XLSX) contain structured spreadsheet data with support for multiple sheets.

Basic Extraction

For simple spreadsheets with data starting in cell A1:

  1. Set Operation to “Extract from XLSX” (or XLS for older files)
  2. Leave defaults and test

Each row becomes a JSON item, columns become properties.

Working with Multiple Sheets

By default, n8n extracts from the first sheet. To extract from a specific sheet:

  1. Find the Sheet Name option
  2. Enter the exact sheet name (case-sensitive)

To extract from multiple sheets, you need multiple Extract from File nodes or a loop structure.

Reading Specific Ranges

The node extracts all data by default. For specific ranges:

  1. Use the Range option
  2. Enter Excel-style range notation: A1:D100

This is useful for spreadsheets with metadata rows or multiple tables.

Excel-Specific Issues

Problem: Date values appear as numbers (like 45234)

Cause: Excel stores dates as serial numbers

Fix: Convert in a Code node:

// Excel stores dates as "serial numbers" (days since January 1, 1900)
// This code converts that number to a proper JavaScript date

// Get the Excel serial number from your extracted data
const excelDate = $json.dateColumn;

// Convert to JavaScript date:
// - Subtract 25569 (days between 1900 and 1970, the JavaScript epoch)
// - Multiply by 86400 (seconds per day) and 1000 (milliseconds)
const jsDate = new Date((excelDate - 25569) * 86400 * 1000);

// Return the date in ISO format (e.g., "2024-03-15T00:00:00.000Z")
return { date: jsDate.toISOString() };

Problem: Formulas extracted instead of values

Cause: Unusual, but can happen with certain files

Fix: Open in Excel, copy-paste values only, re-save

Extracting from PDF Files

PDF extraction is powerful but has important limitations.

What Gets Extracted

The node extracts text content from PDFs. This includes:

  • Paragraphs and headings
  • Table text (but not table structure)
  • Text in forms

It does not extract:

  • Images or graphics
  • Text embedded in images (scanned documents)
  • Complex table layouts as structured data

Text-Based vs Image-Based PDFs

This distinction is critical:

Text-based PDFs (created digitally): Extraction works well. The output includes readable text you can process.

Image-based PDFs (scanned documents): Extraction returns nothing or garbage. The “text” is actually a picture of text.

Test by opening the PDF and trying to select/copy text. If you can’t select individual words, it’s image-based and requires OCR.

When PDF Extraction Isn’t Enough

For complex PDFs with tables, invoices, or forms, the basic text extraction often falls short. Consider these alternatives:

  1. AI Vision Models - Send PDF pages as images to multimodal AI models that can understand visual layout and table structure
  2. Cloud OCR Services - AWS Textract, Google Cloud Document AI, or Azure Form Recognizer provide structured extraction from complex documents
  3. Open-source OCR - Tools like Tesseract OCR can convert scanned documents to text

These require additional setup but handle complex documents the basic Extract from File node cannot. For understanding the PDF format itself, the Adobe PDF Reference provides technical details.
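One way to decide at runtime whether to fall back to OCR is to check how much text the basic extraction actually produced. This is a heuristic sketch, not a definitive test, and the character threshold is an assumption you should tune for your documents:

```javascript
// Returns true when extracted text is too sparse to be a real text layer,
// suggesting an image-based (scanned) PDF that needs OCR instead
function needsOcr(extractedText, minChars = 20) {
  const meaningful = (extractedText || '').replace(/\s+/g, '');
  return meaningful.length < minChars;
}
```

In a Code node after Extract from PDF you could call `needsOcr($json.text)` and route to an OCR branch with an IF node.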

PDF Extraction Example

Trigger → HTTP Request (get PDF) → Extract from PDF → Code (parse text) → Output

The raw PDF text output is a single string. You typically need a Code node to parse specific information using regular expressions:

// The extracted PDF text comes as a single string in $json.text
const text = $json.text;

// Use regex to find patterns in the text
// This pattern looks for "Invoice #:" followed by digits
const invoiceMatch = text.match(/Invoice #:\s*(\d+)/);
// If found, get the captured group (the digits); otherwise null
const invoiceNumber = invoiceMatch ? invoiceMatch[1] : null;

// Look for "Total:" followed by optional $ and numbers
const totalMatch = text.match(/Total:\s*\$?([\d,.]+)/);
// Convert the matched string to a number (remove commas first)
const total = totalMatch ? parseFloat(totalMatch[1].replace(',', '')) : null;

// Return the parsed data as structured JSON
return {
  invoiceNumber,    // e.g., "12345"
  total,            // e.g., 1234.56
  rawText: text     // Keep original text for debugging
};

Common Input Sources

Extract from File works with any node that outputs binary data. Here are the most common sources:

HTTP Request (API Downloads)

For files available via URL:

HTTP Request → Extract from File

Configure HTTP Request:

  • Method: GET
  • URL: The file URL
  • Response Format: File (automatic for binary content)

Read/Write Files from Disk

For local files (self-hosted n8n only):

Read/Write Files from Disk → Extract from File

Configure Read/Write Files:

  • Operation: Read File(s) From Disk
  • File Path: /path/to/your/file.csv

Google Drive / Cloud Storage

For files stored in cloud services:

Google Drive (Download) → Extract from File

Most cloud storage nodes have a “Download” operation that outputs binary data.

Email Attachments

For processing email attachments:

Gmail Trigger → Extract from File

Gmail outputs attachments as binary properties named attachment_0, attachment_1, etc. You must set the Binary Property field to match.
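If an email can carry several attachments, one pattern is a Code node that fans them out into one item per file, renamed to the default `data` property so Extract from File finds each one. A sketch with a simulated Gmail item (filenames and contents are made up):

```javascript
// Simulated Gmail output: attachments land under attachment_0, attachment_1, ...
const item = {
  json: { subject: 'Invoices attached' },
  binary: {
    attachment_0: { fileName: 'a.pdf', mimeType: 'application/pdf', data: 'JVBERi0=' },
    attachment_1: { fileName: 'b.csv', mimeType: 'text/csv', data: 'YSxiLGM=' },
  },
};

// One output item per attachment, each under the default "data" property
const results = Object.entries(item.binary)
  .filter(([name]) => name.startsWith('attachment_'))
  .map(([name, file]) => ({
    json: { ...item.json, sourceProperty: name },
    binary: { data: file }, // rename so Extract from File's default works
  }));
```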

Form Trigger File Uploads

For user-uploaded files:

Form Trigger → Extract from File

Important: Form Trigger file uploads require specific configuration. See the troubleshooting section below.

Troubleshooting Common Errors

These are the errors n8n users encounter most often with file extraction.

“Binary file ‘data’ not found” Error

This is the most common extraction error. It means the node cannot find binary data under the expected property name.

Causes:

  1. The property name doesn’t match (e.g., binary is named “attachment_0” but node expects “data”)
  2. Binary data was lost in a previous node
  3. The source node didn’t output binary data

Diagnostic steps:

  1. Click on the node before Extract from File
  2. Check the Output panel for a “Binary” tab
  3. Note the property name shown (data, attachment_0, file, etc.)

Fixes:

  1. If property name differs, update Binary Property in Extract from File
  2. If no binary tab exists, the issue is with your source node
  3. If using intermediate nodes (Set, Switch), check they preserve binary data

Binary Data Lost After Intermediate Nodes

You download a file, add a Set node to add some metadata, then Extract from File fails. The binary data disappeared.

Why this happens:

The Edit Fields node in “Set” mode replaces items entirely. Binary data is part of the item, so it gets discarded.

Fix 1: Use “Append” mode

In Edit Fields, change mode from “Set” to “Append”. This keeps existing data including binary.

Fix 2: Use a Code node to preserve binary

// Get all incoming items (each item may have both JSON and binary data)
const items = $input.all();

// Transform each item while keeping the binary data intact
return items.map(item => ({
  json: {
    ...item.json,        // Spread operator: keep all existing JSON fields
    myNewField: "value"  // Add your new field
  },
  binary: item.binary    // IMPORTANT: Explicitly pass through binary data
}));

Fix 3: Restructure your workflow

Extract from the file first, then transform the JSON data:

Source → Extract from File → Edit Fields

Instead of:

Source → Edit Fields → Extract from File (fails)

Form Trigger Upload Issues

Form Trigger file uploads are notorious for extraction problems. Common symptoms:

  • Binary property exists but has no actual data
  • Error: “The item has no binary field”
  • Extraction returns empty results

The underlying issue:

Form Trigger has known issues with binary data format. The uploaded file metadata exists, but the actual Base64 content may be missing or malformed.

Workarounds:

  1. Check your n8n version - Updates often fix Form Trigger binary handling
  2. Use a Code node to validate binary data exists:
// Get all items from the Form Trigger
const items = $input.all();

// Loop through each item to validate it has binary data
for (const item of items) {
  // The ?. is "optional chaining" - safely access nested properties
  // If any part is undefined, the whole expression returns undefined
  if (!item.binary?.file?.data) {
    // Throw an error to stop the workflow with a clear message
    throw new Error('No binary data received from form');
  }
}

// If we get here, all items have valid binary data
return items;
  3. Alternative: Use a Webhook node with multipart/form-data instead

The Webhook node sometimes handles file uploads more reliably than Form Trigger.

Empty or No Data Extracted

You run extraction successfully (no error), but the output is empty or contains no useful data.

For PDFs:

  • The PDF may be image-based (scanned document)
  • Try opening the PDF and selecting text - if you can’t, OCR is needed
  • The PDF may be password-protected

For CSV/Excel:

  • File may be empty
  • Data may start on a row other than row 1
  • Headers may be on a different row than expected

For all formats:

  • File may be corrupted
  • File extension may not match actual format (e.g., CSV saved as .xlsx)
  • Encoding issues may cause content to be unreadable

Debugging approach:

Add a Code node after your source to inspect the binary data:

// Get the first item's binary data (under the "data" property)
// The ?. prevents errors if binary or data doesn't exist
const binaryData = $input.first().binary?.data;

// Return diagnostic information about the binary data
return {
  exists: !!binaryData,                    // true if binaryData exists, false otherwise
  mimeType: binaryData?.mimeType,          // e.g., "application/pdf"
  fileName: binaryData?.fileName,          // e.g., "report.pdf"
  dataLength: binaryData?.data?.length || 0 // Length of Base64 string (0 if missing)
};

If dataLength is 0 or very small, the file transfer failed.

Real-World Workflow Examples

Example 1: Process Email Invoice Attachments

Scenario: Automatically extract data from PDF invoices received via email and log to a spreadsheet.

Gmail Trigger → IF (has attachment) → Extract from PDF → Code (parse invoice) → Google Sheets

Gmail Trigger configuration:

  • Trigger on new emails matching a filter (e.g., from: [email protected])
  • Attachments output as binary properties

IF node: Check that an attachment exists:

{{ $json.attachments?.length > 0 }}

Extract from File:

  • Operation: Extract from PDF
  • Binary Property: attachment_0

Code node (parse invoice text):

// Get the extracted text from the PDF
const text = $json.text;

// Use regex patterns to find specific data in the invoice text
// Each .match() returns an array where [1] is the captured group, or null if not found
// The ?. safely handles null results, and ?.trim() removes whitespace
return {
  vendor: text.match(/From:\s*(.+)/)?.[1]?.trim(),           // Find "From: Company Name"
  invoiceNumber: text.match(/Invoice[#:\s]+(\w+)/i)?.[1],    // Find "Invoice #123" or "Invoice: ABC"
  amount: text.match(/Total[:\s]*\$?([\d,.]+)/)?.[1],        // Find "Total: $1,234.56"
  rawText: text.substring(0, 500)                             // Keep first 500 chars for debugging
};

Example 2: Sync CSV Reports to Database

Scenario: Download a daily report CSV from an API and upsert records to a database.

Schedule Trigger → HTTP Request → Extract from CSV → Loop → Postgres (upsert)

HTTP Request:

  • URL: API endpoint returning CSV
  • Authentication as needed

Extract from CSV:

  • Header Row: true (if CSV has headers)
  • Delimiter: comma

Loop Over Items: The extraction creates one item per row. Connect directly to database node.

Postgres:

  • Operation: Upsert
  • Map columns from $json to database fields

Example 3: User Form File Processing

Scenario: Accept Excel uploads via a form and validate the data.

Form Trigger → Extract from XLSX → Code (validate) → IF (valid) → Process / Send Error

Form Trigger:

  • Add a file upload field
  • Note the field name (used as binary property)

Extract from XLSX:

  • Binary Property: match your form field name

Code (validation):

// Get all items (each row from the Excel file is one item)
const items = $input.all();

// Array to collect any validation errors we find
const errors = [];

// Loop through each row to validate the data
for (let i = 0; i < items.length; i++) {
  // Get the JSON data for this row
  const row = items[i].json;

  // Check if email exists and contains @
  if (!row.email || !row.email.includes('@')) {
    errors.push(`Row ${i + 1}: Invalid email`);
  }

  // Check if name exists and is at least 2 characters
  if (!row.name || row.name.length < 2) {
    errors.push(`Row ${i + 1}: Name too short`);
  }
}

// If we found any errors, return them with valid: false
if (errors.length > 0) {
  return { valid: false, errors };
}

// All rows passed validation
return { valid: true, rowCount: items.length };

Example 4: Batch Process Local Files

Scenario: Process all CSV files in a folder (self-hosted n8n).

Schedule Trigger → Read Files from Disk → Loop → Extract from CSV → Aggregate → Output

Read Files from Disk:

  • File Selector: /data/imports/*.csv
  • This returns multiple items if multiple files exist

Split In Batches (or Loop): Process files one at a time to avoid memory issues.

Extract from CSV: Processes each file’s binary data.

Aggregate: Combine all extracted rows if needed.

Pro Tips and Best Practices

1. Always Verify Binary Data Exists

Before extraction, confirm your source node outputs binary data. A short Code node can fail fast with a clear message when binary is missing:

// Fail fast if the expected binary property is missing
if (!$input.first().binary?.data) {
  throw new Error('No binary data on incoming item');
}

// Otherwise pass all items through unchanged
return $input.all();

2. Match Binary Property Names Exactly

The most common error source is mismatched property names. Check your source node’s output and copy the exact property name to Extract from File.

3. Handle Extraction Errors Gracefully

Wrap extraction in error handling for production workflows. Connect an Error Trigger node workflow to catch failures:

[Main Workflow] → Extract from File (may fail)
                  ↓ (on error)
[Error Workflow] → Error Trigger → Slack notification

4. Consider File Size Limits

Large files can cause memory issues, especially in n8n Cloud. For files over 10MB:

  • Process in batches if possible (multiple smaller files)
  • Use filesystem mode for binary storage (self-hosted)
  • Consider external processing for very large files

5. Use Sub-Workflows for Reusable Extraction Logic

If you extract the same file type in multiple workflows, create a sub-workflow:

[Sub-Workflow]
Start → Extract from File → Code (standardize output) → End

[Main Workflows]
... → Execute Workflow (call sub-workflow) → ...

This centralizes your extraction configuration and makes updates easier.
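The "standardize output" step is where the sub-workflow earns its keep: every caller receives the same shape no matter which source format produced the rows. A sketch with hypothetical field names:

```javascript
// Rows as different sources might emit them (Excel headers vs. API keys)
const items = [
  { json: { First: 'John', Last: 'Doe', Zip: '08075' } },
  { json: { first_name: 'Jane', last_name: 'Smith', zip: '08102' } },
];

// Normalize to one canonical shape for every caller
const standardized = items.map(item => ({
  json: {
    firstName: item.json.First ?? item.json.first_name ?? null,
    lastName: item.json.Last ?? item.json.last_name ?? null,
    postalCode: item.json.Zip ?? item.json.zip ?? null,
  },
}));
```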

6. Log Extraction Results for Debugging

During development, add a Set node after extraction to capture metadata:

{
  // Count how many rows were extracted
  "extractedRowCount": {{ $items().length }},

  // Get the original filename from the HTTP Request node
  // $('Node Name') references a specific node's output
  "sourceFile": "{{ $('HTTP Request').item.binary.data.fileName }}",

  // Add timestamp using n8n's built-in $now variable
  "extractedAt": "{{ $now.toISO() }}"
}

This helps diagnose issues when extraction doesn’t return expected results.

For complex file processing workflows, our workflow development services can help you build robust, production-ready solutions. If you need guidance on architecture or best practices, explore our n8n consulting services.

Frequently Asked Questions

Why does my PDF extraction return empty text?

PDF extraction only works with text-based PDFs where the text is encoded as actual characters, not images.

If your PDF was created by scanning paper documents, the “text” is actually a picture of text. This requires OCR (Optical Character Recognition) to read.

Quick test: Open the PDF and try to select/copy text with your mouse. If you cannot select individual words, you need OCR.

OCR options:

  • AI vision models to read PDF pages as images
  • Cloud OCR services like AWS Textract or Google Cloud Document AI
  • Open-source tools like Tesseract for self-hosted processing

How do I extract data from multiple sheets in an Excel file?

The Extract from File node extracts from one sheet at a time.

To process multiple sheets, you have two options:

Option 1: Multiple Extract nodes Use multiple Extract from File nodes in sequence, each configured for a different sheet name.

Option 2: Loop approach Create an array of sheet names, loop through them, and for each iteration use an expression to set the sheet name in Extract from File.

The extracted data from each sheet can then be aggregated or processed separately.
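The loop approach starts with a Code node that emits one item per sheet, so the Extract from File node inside the loop can reference `{{ $json.sheetName }}` in its Sheet Name field. Sheet names below are made up:

```javascript
// One item per sheet we want to extract
const sheetNames = ['Q1', 'Q2', 'Q3', 'Q4'];
const items = sheetNames.map(name => ({ json: { sheetName: name } }));
```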

Can I extract from password-protected files?

Short answer: No. The native Extract from File node does not support password-protected files.

Workarounds:

  • Use a Code node with an appropriate library to decrypt first
  • Pre-process the file externally to remove protection
  • Set up an external decryption service
  • Request unprotected exports from your data providers

Our workflow debugger tool can help diagnose issues if you’re unsure whether password protection is causing extraction failures.

How do I handle CSV files with different delimiters?

The Extract from File node has a Delimiter option specifically for CSV extraction.

Common delimiters:

  • , (comma) for US-standard CSV
  • ; (semicolon) for European formats
  • \t (tab) for TSV files
  • | (pipe) for some database exports

How to check: Open your file in a text editor (not Excel) and look at how values are separated.

If your delimiter varies between files, use an expression to set it dynamically. For truly irregular files that mix delimiters or have complex quoting rules, a Code node with a CSV parsing library gives you full control.
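When the delimiter varies, you can sniff it from the file's first line before extraction. This is a naive heuristic sketched as a standalone function; it ignores quoted fields, which is exactly the case where a proper CSV parsing library takes over:

```javascript
// Pick the candidate separator that appears most often in the first line
function detectDelimiter(firstLine) {
  const candidates = [',', ';', '\t', '|'];
  let best = ',';
  let bestCount = 0;
  for (const delimiter of candidates) {
    const count = firstLine.split(delimiter).length - 1;
    if (count > bestCount) {
      best = delimiter;
      bestCount = count;
    }
  }
  return best;
}
```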

Why does binary data disappear after my Switch node?

The Switch node itself doesn’t remove binary data. The issue is usually what happens after.

Common cause: Adding transform nodes (Edit Fields, Set) after the Switch that replace item data without preserving binary.

The fix: Ensure any Edit Fields nodes after Switch use “Append” mode rather than “Set” mode.

How to verify: Check the output panel’s Binary tab at any point to confirm binary data exists.

If binary is truly missing after Switch (rare), it may be a version-specific bug. Check the n8n community forum for similar reports or consider updating n8n.
