html-to-markdown

NAME

html-to-markdown - convert HTML to Markdown

SYNOPSIS

html-to-markdown [OPTION]... [FILE]
command | html-to-markdown

DESCRIPTION

Converts HTML to Markdown using the turndown library. Reads from FILE, or from standard input when no FILE is given, and writes Markdown to standard output. Commonly fed by curl so that the agent can read web pages as plain text.

OPTIONS

-b, --bullet=CHAR       Bullet character for unordered lists (-, +, or *)
-c, --code=FENCE        Fence style for code blocks (``` or ~~~)
-r, --hr=STRING         String for horizontal rules (default: ---)
    --heading-style=STYLE
                        'atx' for # headings (default), 'setext' for underlined headings
    --help              Display help and exit
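
The options above can be combined in a single invocation. A minimal sketch, assuming a hypothetical helper named to_md_styled; the flags are the ones listed above, while the values chosen here are only illustrative:

```shell
# Hypothetical helper: convert one file with a fixed set of style options.
# Every flag comes from the OPTIONS list; the values are examples only.
to_md_styled() {
    html-to-markdown \
        --bullet='*' \
        --code='~~~' \
        --hr='***' \
        --heading-style=setext \
        "$1"
}
```

Usage: to_md_styled page.html > page.md. Quoting the values keeps the shell from expanding * or ~ before the tool sees them.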

SUPPORTED ELEMENTS

Headings (h1-h6)    → # Markdown headings
Paragraphs (p)      → Plain text separated by blank lines
Links (a)           → [text](url)
Images (img)        → ![alt](src)
Bold/Strong         → **text**
Italic/Em           → _text_
Code (code, pre)    → `inline` or fenced blocks
Lists (ul, ol, li)  → - or 1. items
Blockquotes         → > quoted text
Horizontal rules    → ---
Tables              → | col | col | pipe tables
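
To check how the mappings above come out in practice, a small fragment can be piped through the tool. A minimal sketch, assuming a hypothetical helper named sample_html; the fragment is hand-written for illustration, and the exact output depends on the options in effect:

```shell
# Hypothetical helper: print a small HTML fragment that exercises several
# of the supported elements (heading, paragraph, link, bold, list, blockquote).
sample_html() {
    cat <<'EOF'
<h2>Release notes</h2>
<p>See the <a href="https://example.com/docs">docs</a> for <strong>breaking</strong> changes.</p>
<ul><li>First item</li><li>Second item</li></ul>
<blockquote>Quoted text</blockquote>
EOF
}
```

Usage: sample_html | html-to-markdown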

USAGE PATTERNS

# Convert a URL to readable markdown
curl -s https://example.com | html-to-markdown

# Convert a local file
html-to-markdown page.html

# Pipe from any command
cat file.html | html-to-markdown

EXAMPLES

# Scrape a docs page and save as markdown
curl -s https://docs.example.com/api | html-to-markdown > api-docs.md

# Quick inline conversion
echo '<h1>Hello</h1><p>World</p>' | html-to-markdown

# Fetch and grep for specific content
curl -s https://example.com/changelog | html-to-markdown | grep -F -A5 "v2.0"

# Use with custom bullet style
curl -s https://example.com | html-to-markdown --bullet="*"
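
With curl -s alone, an HTTP error page is passed down the pipe and converted like any other page. A minimal sketch of a stricter fetch, assuming a hypothetical helper named fetch_md; it uses curl's -f (fail on HTTP errors), -S (show errors), and -L (follow redirects) flags and downloads to a temporary file before converting:

```shell
# Hypothetical helper: download a URL, then convert it only if the
# download succeeded. Variables are global because POSIX sh has no "local".
fetch_md() {
    fetch_md_tmp=$(mktemp) || return 1
    if curl -fsSL "$1" > "$fetch_md_tmp"; then
        # Pass the downloaded page as FILE, as in the SYNOPSIS.
        html-to-markdown "$fetch_md_tmp"
        fetch_md_status=$?
    else
        fetch_md_status=$?
        echo "fetch_md: could not download $1" >&2
    fi
    rm -f "$fetch_md_tmp"
    return "$fetch_md_status"
}
```

Usage: fetch_md https://docs.example.com/api > api-docs.md. A non-zero exit status means the download or the conversion failed.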

NOTES

Use html-to-markdown when you need text content from a web page without full browser automation. It strips navigation, scripts, and styling, leaving readable prose. For pages that require JavaScript rendering or interaction, use playwright-cli snapshot instead.

SEE ALSO

curl, playwright-cli snapshot