Streaming Large Files
Process files bigger than your RAM. It's easier than you think.
When to Stream vs Collect
Always stream when:
- File is larger than available RAM (or even close to it)
- You don't need all data at once
- Processing can be done incrementally (line-by-line, record-by-record)
- You want to start processing immediately without waiting for full download/read
- Memory efficiency is important
Consider collecting when:
- File is small (< 100MB) and fits comfortably in memory
- You need random access to data
- You need to process data multiple times
- You need to sort or aggregate all data before processing
- Memory is not a concern
Memory/Speed Tradeoffs:
- Streaming: Constant memory (~64KB buffer), processes data as it arrives, no random access
- Collecting: Memory equal to the file size, everything available immediately with random access, but you must wait for the full load
Example decision:
// 10GB log file - MUST stream
for await (const line of read("huge.log").lines) {
if (line.includes("ERROR")) console.log(line);
}
// 1MB config file - can collect
const config = await read("config.json").lines.collect();
const parsed = JSON.parse(config.join("\n"));
// 500MB data file - stream if processing once
const sum = await read("numbers.txt")
.lines
.map(line => parseFloat(line))
.reduce((a, b) => a + b, 0);
The Problem
You have a 10GB log file. Loading it into memory crashes your program:
// ❌ Crashes with large files
const content = await Deno.readTextFile("huge.log");
const lines = content.split("\n");
The Solution
Stream it, one line at a time:
import { read } from "jsr:@j50n/proc@0.23.3";
// ✅ Constant memory, any file size
for await (const line of read("huge.log").lines) {
if (line.includes("ERROR")) {
console.log(line);
}
}
How Streaming Works
Instead of loading everything:
- Read a chunk (buffer)
- Process it
- Discard it
- Repeat
Memory usage stays constant, no matter how big the file.
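Here is roughly what that loop looks like if you write it by hand against Deno's low-level file API. It's a sketch of the idea, not how you'd normally use proc: read(...).lines gives you the same constant-memory behavior and handles the line splitting for you. The file name and the 64KB buffer size are placeholders.
const file = await Deno.open("huge.log", { read: true });
const buffer = new Uint8Array(64 * 1024); // fixed-size buffer, reused for every chunk
try {
  while (true) {
    const bytesRead = await file.read(buffer); // read the next chunk
    if (bytesRead === null) break; // end of file
    const chunk = buffer.subarray(0, bytesRead);
    // ...process the chunk, then move on; nothing accumulates...
  }
} finally {
  file.close();
}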
Real Examples
Count Lines in Huge File
const count = await read("10gb-file.txt").lines.count();
console.log(`${count} lines`);
Uses ~constant memory, even for 10GB.
Find Pattern in Large File
const matches = await read("huge.log")
.lines
.filter(line => line.includes("ERROR"))
.take(10) // Stop after 10 matches
.collect();
Stops reading once it finds 10 matches. Efficient!
Process CSV File
const data = await read("huge-data.csv")
.lines
.drop(1) // Skip header
.map(line => {
const [id, name, value] = line.split(",");
return { id, name, value: parseFloat(value) };
})
.filter(row => row.value > 100)
.collect();
Aggregate Large Dataset
const sum = await read("numbers.txt")
.lines
.map(line => parseFloat(line))
.reduce((acc, n) => acc + n, 0);
Compressed Files
Stream compressed files without extracting:
const lineCount = await read("huge.log.gz")
.transform(new DecompressionStream("gzip"))
.lines
.count();
Decompresses on the fly; the full uncompressed file is never held in memory.
Multiple Files
Process multiple large files:
const files = ["log1.txt", "log2.txt", "log3.txt"];
for (const file of files) {
const errors = await read(file)
.lines
.filter(line => line.includes("ERROR"))
.count();
console.log(`${file}: ${errors} errors`);
}
Streaming Transformations
Chain transformations, all streaming:
const result = await read("data.txt")
.lines
.map(line => line.trim())
.filter(line => line.length > 0)
.map(line => line.toUpperCase())
.filter(line => line.startsWith("ERROR"))
.collect();
Each line flows through all transformations before the next line is read.
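To see that ordering for yourself, here's a small sketch that runs the same kind of pipeline over an in-memory array (the three sample strings are made up) with logging in each stage:
import { enumerate } from "jsr:@j50n/proc@0.23.3";

const sample = ["alpha", "beta", "gamma"]; // stand-in for lines from a file

await enumerate(sample)
  .map((line) => {
    console.log(`map:    ${line}`);
    return line.toUpperCase();
  })
  .filter((line) => {
    console.log(`filter: ${line}`);
    return line.length > 4;
  })
  .forEach((line) => console.log(`output: ${line}`));
Because evaluation is pull-based, you should see "alpha" pass through map and filter before "beta" is touched, rather than all the map logs printing first.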
Writing Large Files
Transform a file and write the result. This version collects the processed lines and writes them in one call, which works when the processed output fits in memory (a streaming alternative follows the example):
import { concat } from "jsr:@j50n/proc@0.23.3";
const processed = await read("input.txt")
.lines
.map(line => line.toUpperCase())
.map(line => new TextEncoder().encode(line + "\n"))
.collect();
await Deno.writeFile("output.txt", concat(processed));
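If the output itself is too big to buffer, write each line as it's produced instead of collecting first. This sketch uses Deno's writable file stream directly rather than a proc API; the file names and the toUpperCase transform are placeholders.
const outFile = await Deno.open("output.txt", {
  write: true,
  create: true,
  truncate: true,
});
const writer = outFile.writable.getWriter();
const encoder = new TextEncoder();

try {
  for await (const line of read("input.txt").lines) {
    await writer.write(encoder.encode(line.toUpperCase() + "\n"));
  }
} finally {
  await writer.close(); // closing the writer also closes the file
}
Memory stays constant on both the read side and the write side.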
Performance Tips
Use take() for Early Exit
// Stops reading after 100 matches
const first100 = await read("huge.txt")
.lines
.filter(predicate)
.take(100)
.collect();
Don't Collect Unless Needed
// ❌ Loads everything into memory
const lines = await read("huge.txt").lines.collect();
for (const line of lines) process(line);
// ✅ Streams
for await (const line of read("huge.txt").lines) {
process(line);
}
Use Concurrent Processing
Process multiple files in parallel:
import { enumerate } from "jsr:@j50n/proc@0.23.3";
const results = await enumerate(files)
.concurrentMap(async (file) => {
return await read(file).lines.count();
}, { concurrency: 3 })
.collect();
Memory Usage
Streaming uses constant memory:
// File size: 10GB
// Memory used: ~64KB (buffer size)
await read("10gb-file.txt")
.lines
.forEach(line => process(line));
Real-World Example
Analyze a year of logs:
const errorsByDay = await read("year-of-logs.txt")
.lines
.filter(line => line.includes("ERROR"))
.map(line => {
const date = line.match(/\d{4}-\d{2}-\d{2}/)?.[0];
return date;
})
.filter(date => date !== undefined)
.reduce((acc, date) => {
acc[date] = (acc[date] || 0) + 1;
return acc;
}, {});
// Show top 10 error days
Object.entries(errorsByDay)
.sort((a, b) => b[1] - a[1])
.slice(0, 10)
.forEach(([date, count]) => {
console.log(`${date}: ${count} errors`);
});
Processes gigabytes of logs with minimal memory.
Common Patterns
Filter and Count
const count = await read("file.txt")
.lines
.filter(predicate)
.count();
Transform and Save
const output = await read("input.txt")
.lines
.map(transform)
.map(line => new TextEncoder().encode(line + "\n"))
.collect();
await Deno.writeFile("output.txt", concat(output));
Aggregate Data
const stats = await read("data.txt")
.lines
.reduce((acc, line) => {
// Update statistics
return acc;
}, initialStats);
Next Steps
- Concurrent Processing - Process multiple files in parallel
- Performance Optimization - Make it faster
- Decompressing Files - Work with compressed files