pdf

NAME

pdf — PDF creation, manipulation, and text extraction in SLICC

DESCRIPTION

SLICC provides a PDF toolkit that runs entirely in-browser via WebAssembly. Agents can read, create, merge, split, rotate, burst, and extract text from PDF files without Python or native binaries. The primary commands are pdftk (manipulation and extraction) and create-pdf (generation from scratch). Page-to-image conversion is available via convert, but only on tray runtimes.

PDFTK

pdftk is the main PDF toolkit. It uses positional syntax — input file(s) come before the operation.

Synopsis

pdftk <input.pdf> <operation> [options]
pdftk A=first.pdf B=second.pdf cat A B output merged.pdf

Page Ranges

Page ranges are 1-based. The keyword end refers to the last page.
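
As an illustration of the 1-based numbering and the end keyword, the sketch below resolves a range string into its first and last page numbers for a known page count. The helper name resolve_range is hypothetical and not part of pdftk; it only mirrors the rule stated above.

```shell
# Hypothetical helper (not part of pdftk): resolve a range such as
# "4-end" into "first last" page numbers, given the total page count.
resolve_range() {
  range=$1
  total=$2
  first=${range%%-*}   # text before the first "-" (whole string if none)
  last=${range##*-}    # text after the last "-" (whole string if none)
  if [ "$first" = "end" ]; then first=$total; fi
  if [ "$last" = "end" ]; then last=$total; fi
  echo "$first $last"
}

resolve_range 4-end 10   # pages 4 through 10 of a 10-page file
resolve_range 3 10       # a single page: first and last are both 3
```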

Extract Text

Dump the UTF-8 text content of every page in a PDF.

pdftk /mnt/file.pdf dump_data_utf8

Returns the full text content per page, suitable for piping into further analysis.

Extract Metadata

Print metadata including page count, title, author, and other info fields.

pdftk /mnt/file.pdf dump_data

Output includes NumberOfPages and InfoKey/InfoValue pairs.
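
To show how those fields might be consumed, here is a minimal sketch that reads a dump_data report from standard input. The sample text is an assumption modeled on the upstream pdftk layout (InfoBegin, InfoKey, InfoValue, NumberOfPages); SLICC's actual output may differ in detail. The helper names page_count and info_value are hypothetical, and the sketch assumes awk is available in the shell.

```shell
# Sample report, assumed to follow the upstream pdftk layout.
sample='InfoBegin
InfoKey: Title
InfoValue: Quarterly Report
InfoBegin
InfoKey: Author
InfoValue: Jane Doe
NumberOfPages: 12'

# Hypothetical helper: print the page count from a report on stdin.
page_count() {
  awk -F': ' '$1 == "NumberOfPages" { print $2 }'
}

# Hypothetical helper: print the InfoValue that follows a given InfoKey.
info_value() {
  awk -F': ' -v key="$1" '
    $1 == "InfoKey" { current = $2 }
    $1 == "InfoValue" && current == key {
      print substr($0, index($0, ": ") + 2)
    }
  '
}

printf '%s\n' "$sample" | page_count
printf '%s\n' "$sample" | info_value Title
```

With a real file, the same helpers would read from the command itself, for example: pdftk /mnt/file.pdf dump_data | page_count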

Merge PDFs

Combine multiple PDFs into a single file by assigning handle labels.

pdftk A=/mnt/a.pdf B=/mnt/b.pdf C=/mnt/c.pdf cat A B C output /shared/merged.pdf

Handles (A, B, C…) allow you to interleave pages from different sources:

pdftk A=/mnt/report.pdf B=/mnt/appendix.pdf cat A1-5 B A6-end output /shared/combined.pdf

Split — Extract Page Ranges

Extract specific pages or ranges from a PDF.

# Extract pages 2–5
pdftk /mnt/file.pdf cat 2-5 output /shared/pages2to5.pdf

# Extract a single page
pdftk /mnt/file.pdf cat 3 output /shared/page3.pdf

# Extract from page 4 to the end
pdftk /mnt/file.pdf cat 4-end output /shared/from4.pdf

Split — Burst Into Individual Pages

Split every page into its own file. Use %02d or %03d for zero-padded numbering.

pdftk /mnt/file.pdf burst output /shared/page_%02d.pdf

Creates /shared/page_01.pdf, /shared/page_02.pdf, etc.
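
The numbering pattern appears to follow printf conventions, as the file names above suggest. Under that assumption, the sketch below previews the names a pattern would produce. The helper name burst_name is hypothetical and does not call pdftk.

```shell
# Hypothetical helper: expand a printf-style pattern for one page number.
# $1 = pattern containing %02d or %03d, $2 = 1-based page number
burst_name() {
  printf "$1\n" "$2"
}

# %02d pads to two digits; %03d pads to three.
for n in 1 2 10; do
  burst_name '/shared/page_%02d.pdf' "$n"
done
burst_name '/shared/page_%03d.pdf' 7
```

Two-digit padding keeps names sorted correctly up to 99 pages; %03d is the safer choice for longer documents.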

Rotate Pages

Rotate pages by 90°, 180°, or 270°. Directions: right (90° CW), left (90° CCW), down (180°).

# Rotate all pages 90° clockwise
pdftk /mnt/file.pdf rotate 1-end right output /shared/rotated.pdf

# Rotate only page 3
pdftk /mnt/file.pdf rotate 3 right output /shared/rotated.pdf

# Rotate pages 1–5 upside-down
pdftk /mnt/file.pdf rotate 1-5 down output /shared/flipped.pdf
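
The three direction keywords correspond to the three clockwise angles listed above. A small sketch of that mapping follows; the helper name rotation_degrees is hypothetical and is not something pdftk provides.

```shell
# Hypothetical helper: map a direction keyword to clockwise degrees.
rotation_degrees() {
  case "$1" in
    right) echo 90 ;;
    down)  echo 180 ;;
    left)  echo 270 ;;   # 90 degrees counter-clockwise is 270 clockwise
    *)     echo "unknown direction: $1" >&2; return 1 ;;
  esac
}

rotation_degrees left
```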

CREATE-PDF

Generate a new PDF document from scratch using pdf-lib. Produces a formatted US Letter (612×792pt) document with headers, footers, and wrapped text.

Synopsis

create-pdf [title] [--output=/shared/out.pdf]

If called with no arguments, creates a sample document. The title argument sets both the document title and the filename (slugified).
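
The exact slug rule is not specified here. As a rough sketch, assuming the title is lowercased and each run of non-alphanumeric characters becomes a single hyphen, the derived filename could be previewed as follows. The helper name slugify is hypothetical, and the sketch assumes tr and sed are available.

```shell
# Hypothetical slug rule (an assumption, not create-pdf's documented
# behavior): lowercase the title, collapse runs of non-alphanumeric
# characters to "-", and trim hyphens from both ends.
slugify() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

slugify "Q2 Sales Summary"
```

Passing --output explicitly avoids relying on the derived name.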

Output Format

Programmatic Use

For custom PDF generation, use pdf-lib directly in a .jsh script:

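// Load pdf-lib from the esm.sh CDN
var lib = await import('https://esm.sh/pdf-lib');
var doc = await lib.PDFDocument.create();
// Add a US Letter page (612×792pt)
var page = doc.addPage([612, 792]);
// Embed one of the standard PDF fonts
var font = await doc.embedFont(lib.StandardFonts.Helvetica);
// Coordinates are in points, measured from the bottom-left corner of the page
page.drawText('Hello, world!', { x: 72, y: 700, size: 18, font: font });
// save() resolves to the document bytes (a Uint8Array)
var bytes = await doc.save();
await writeFile('/shared/hello.pdf', bytes);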

READING PDFs

Agents read PDF content by extracting text, then processing the result. The standard workflow:

# 1. Extract text from the PDF
pdftk /mnt/upload.pdf dump_data_utf8

# 2. Get page count and metadata
pdftk /mnt/upload.pdf dump_data

For visual inspection or OCR-like tasks, convert a page to an image and view it:

# Convert page 1 to PNG (convert uses 0-based page indices, so page 1 is [0])
convert /mnt/file.pdf[0] /shared/page1.png
open /shared/page1.png --view

Note: convert requires a tray runtime and is not available in browser-only mode. OCR and PDF form filling are not currently available.
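
Because pdftk counts pages from 1 and convert counts from 0, the index shift is easy to get wrong. The sketch below builds the convert command line for a 1-based page number without running it. The helper name page_to_png_cmd and the output naming scheme are hypothetical.

```shell
# Hypothetical helper: print the convert command for a 1-based page number.
page_to_png_cmd() {
  pdf=$1
  page=$2               # 1-based, as used by pdftk
  index=$((page - 1))   # 0-based, as used by convert
  printf 'convert %s[%d] /shared/page%d.png\n' "$pdf" "$index" "$page"
}

page_to_png_cmd /mnt/file.pdf 1
page_to_png_cmd /mnt/file.pdf 4
```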

EXAMPLES

Merge Monthly Reports

# Combine Q1 reports into a single document
pdftk A=/mnt/jan.pdf B=/mnt/feb.pdf C=/mnt/mar.pdf cat A B C output /shared/q1-report.pdf
open /shared/q1-report.pdf --download

Extract Text for Analysis

# Extract all text, then search for keywords
pdftk /mnt/contract.pdf dump_data_utf8 | grep -i "termination"

Split a Large Document

# Extract the executive summary (pages 1–3)
pdftk /mnt/whitepaper.pdf cat 1-3 output /shared/executive-summary.pdf

# Burst into individual pages for review
pdftk /mnt/whitepaper.pdf burst output /shared/wp_page_%02d.pdf

Fix Landscape Pages

# Rotate pages 4–6, which were scanned sideways
pdftk /mnt/scan.pdf rotate 4-6 left output /shared/fixed-scan.pdf

Create a Report and Deliver It

# Generate a titled PDF
create-pdf "Q2 Sales Summary" --output=/shared/q2-sales-summary.pdf
open /shared/q2-sales-summary.pdf --download

Combine and Reorder

# Take cover from one file, body from another, appendix from a third
pdftk A=/mnt/cover.pdf B=/mnt/body.pdf C=/mnt/appendix.pdf \
  cat A1 B C output /shared/final-document.pdf

NOTES

SEE ALSO