Documentation Index

Fetch the complete documentation index at: https://docs.docex.dev/llms.txt

Use this file to discover all available pages before exploring further.

Docex

Agentic vision and OCR infrastructure. Route any image or PDF through the optimal AI pipeline — with automatic fallback, cost optimization, and structured output.
Docex accepts a file (image or PDF) plus a user prompt, classifies the task, selects an execution strategy, runs it through one or more AI providers, and returns structured JSON with confidence scores and usage-based billing.

What makes Docex different

  • Task-agnostic pipeline — Not limited to “documents.” Analyze any image or PDF for any purpose: security scanning, data extraction, content moderation, visual QA.
  • Provider abstraction — Automatic routing across Anthropic, OpenRouter, Mistral OCR, and more with intelligent fallback if a provider fails.
  • Dynamic billing — Pay a transparent markup on the actual upstream AI cost. No flat rates, no hidden fees.
  • Agent-first onboarding — The entire setup flow (docex setup) is designed for AI agents, with human approval only at the payment boundary.
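
The fallback and billing bullets above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: the provider functions, the `ProviderError` type, and the 20% markup are placeholders chosen for illustration, not Docex's actual routing logic or rates.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a (hypothetical) provider that cannot complete a task."""


@dataclass
class ProviderResult:
    provider: str
    output: dict
    upstream_cost_usd: float  # what the upstream AI provider charged


Provider = Callable[[bytes, str], ProviderResult]


def run_with_fallback(providers: Sequence[Provider], file: bytes, prompt: str) -> ProviderResult:
    """Try each provider in order and return the first successful result."""
    errors = []
    for provider in providers:
        try:
            return provider(file, prompt)
        except ProviderError as exc:
            errors.append(exc)  # remember the failure, then try the next candidate
    raise ProviderError(f"all {len(providers)} providers failed: {errors}")


def charged_usd(upstream_cost_usd: float, markup: float) -> float:
    """Apply a proportional markup to the upstream cost (the rate is an assumption)."""
    return round(upstream_cost_usd * (1 + markup), 6)


# Two stand-in providers: the first always fails, the second succeeds.
def flaky_provider(file: bytes, prompt: str) -> ProviderResult:
    raise ProviderError("upstream timeout")


def stable_provider(file: bytes, prompt: str) -> ProviderResult:
    return ProviderResult("stable", {"text": "hello"}, upstream_cost_usd=0.01)


result = run_with_fallback([flaky_provider, stable_provider], b"%PDF-", "Extract the text")
print(result.provider, charged_usd(result.upstream_cost_usd, markup=0.20))
```

Because the charge is computed from the cost reported by whichever provider actually succeeded, a fallback changes the bill only through the upstream cost, not through a separate fee.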

How it works

1. Upload: Your app requests a presigned upload URL and uploads the file directly to storage.
2. Run: You call POST /v1/runs with the upload ID, a prompt, and optional schema or workflow hints.
3. Route: The worker classifies the task, selects the optimal provider strategy, and estimates cost.
4. Execute: The executor runs the strategy. If a provider fails, it falls back to the next candidate automatically.
5. Receive: You poll or receive a structured run envelope with result, confidence, usage, and chargedUsd.
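
From the client's side, the run and receive steps might look like the sketch below. Only `POST /v1/runs` and the envelope fields `result`, `confidence`, `usage`, and `chargedUsd` come from this page. The `GET /v1/runs/{id}` polling path, the request field names (`uploadId`, `prompt`, `schema`), the `id` and `status` fields, and the status values are assumptions made for illustration; the API Reference has the real contract. The upload step is omitted (an upload ID is assumed to exist already), and the HTTP layer is passed in as a function so the sketch runs without network access.

```python
import time
from typing import Callable, Optional

# transport(method, path, body) -> parsed JSON; a stand-in for a real HTTP client.
Transport = Callable[[str, str, Optional[dict]], dict]

TERMINAL_STATUSES = ("completed", "failed")  # assumed status values


def build_run_request(upload_id: str, prompt: str, schema: Optional[dict] = None) -> dict:
    """Build the body for POST /v1/runs. Field names are assumed, not confirmed."""
    body = {"uploadId": upload_id, "prompt": prompt}
    if schema is not None:
        body["schema"] = schema
    return body


def run_and_wait(transport: Transport, upload_id: str, prompt: str,
                 schema: Optional[dict] = None,
                 interval_s: float = 1.0, max_polls: int = 30) -> dict:
    """Create a run, then poll until it reaches a terminal status."""
    run = transport("POST", "/v1/runs", build_run_request(upload_id, prompt, schema))
    polls = 0
    while run.get("status") not in TERMINAL_STATUSES:
        if polls >= max_polls:
            raise TimeoutError(f"run {run.get('id')} did not finish after {max_polls} polls")
        time.sleep(interval_s)
        run = transport("GET", f"/v1/runs/{run['id']}", None)  # hypothetical polling path
        polls += 1
    return run


# A fake transport with canned responses, so the flow can be exercised offline.
def fake_transport(method: str, path: str, body: Optional[dict]) -> dict:
    if method == "POST":
        return {"id": "run_123", "status": "queued"}
    return {
        "id": "run_123",
        "status": "completed",
        "result": {"total": "42.00"},
        "confidence": 0.97,
        "usage": {"pages": 1},
        "chargedUsd": 0.012,
    }


envelope = run_and_wait(fake_transport, "upl_123", "Extract the invoice total", interval_s=0)
print(envelope["result"], envelope["confidence"], envelope["chargedUsd"])
```

Swapping `fake_transport` for a function that wraps a real HTTP client and adds authentication would leave `run_and_wait` unchanged.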

Quickstart

Run docex setup, approve in the browser, and make your first API call in minutes.

API Reference

Full reference for uploads, runs, polling, and error handling.

SDK & CLI

Install docexdev, configure your workspace, and run vision analysis from code or terminal.

Integrations

Prompts and scaffolding for Codex, Claude Code, Cursor, and OpenCode agents.