Telegram: capture document/file uploads #24

Closed
opened 2026-07-05 07:48:49 +08:00 by weiwen · 0 comments
Owner

What to build

Let evie ingest Telegram document uploads so pi can read and act on files (summarize a PDF, ingest a markdown note, etc.). Today only text and photos are handled; documents are silently ignored.

When the user sends a document (with optional caption):

  • Pre-check file_size; if it exceeds the Bot API getFile hard cap (20 MB), reply with a friendly "file too large" message and skip the download (respecting expose_errors semantics for the wording).
  • Download to a temp directory under the document's original filename+extension (e.g. /tmp/<rand>/report.pdf), sanitizing the filename against path traversal. The temp guard is held alive for the whole pi turn (so pi can copy the file into the vault if asked), mirroring how photos work today.
  • If the document's mime type is image/* (an image sent uncompressed as a file), route it through the existing vision/ImageContent base64 path instead of the file-path path.
  • Otherwise inject [Document file: <path>] into the message; the caption (if present) is the user's instruction. pi's read tool / a configured pi extension does all parsing — evie pulls in no parsing crates and stays a dumb pass-through.
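
The decision order in the bullets above could be sketched roughly as follows. This is a minimal illustration, not evie's actual code: `DocumentRoute`, `route_document`, `compose_message`, and `MAX_DOWNLOAD_BYTES` are hypothetical names, the 20 MB cap is assumed to mean 20 * 1024 * 1024 bytes, and placing the caption on the line after the marker is also an assumption. `file_size` and `mime_type` are modelled as `Option` because both are optional fields on a Bot API document.

```rust
/// Download cap cited in this issue (20 MB).
/// Assumption: "MB" is read as 1024 * 1024 bytes.
const MAX_DOWNLOAD_BYTES: u64 = 20 * 1024 * 1024;

/// Hypothetical outcome of inspecting an incoming document.
#[derive(Debug, PartialEq)]
enum DocumentRoute {
    /// Over the cap: reply with a friendly message and skip the download.
    TooLarge,
    /// `image/*` sent as a file: reuse the existing vision/base64 path.
    Vision,
    /// Anything else: download, then inject a file-path marker.
    FilePath,
}

/// Decides how to handle a document from its metadata alone.
fn route_document(file_size: Option<u64>, mime_type: Option<&str>) -> DocumentRoute {
    // The size check runs first so nothing oversized is ever downloaded.
    if file_size.map_or(false, |size| size > MAX_DOWNLOAD_BYTES) {
        return DocumentRoute::TooLarge;
    }
    match mime_type {
        Some(mime) if mime.to_ascii_lowercase().starts_with("image/") => DocumentRoute::Vision,
        _ => DocumentRoute::FilePath,
    }
}

/// Builds the text handed to pi: the marker, then the caption if there is one.
fn compose_message(path: &str, caption: Option<&str>) -> String {
    let marker = format!("[Document file: {path}]");
    match caption.map(str::trim).filter(|c| !c.is_empty()) {
        Some(caption) => format!("{marker}\n{caption}"),
        None => marker,
    }
}
```

A missing `file_size` is treated as "not too large" here, so the download itself would still need to handle a refusal from Telegram.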

This slice generalizes the temp guard from a single NamedTempFile to also cover a temp-dir-with-original-name guard. No protocol change, no new config.
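
One possible shape for the generalized guard is an enum with a file variant and a directory variant, both cleaned up on drop. The sketch below uses only the standard library so that it stands alone; `TempGuard` and `dir_with_file` are hypothetical names, and the pid-plus-timestamp directory name is a stand-in. Real code would more likely wrap the `tempfile` crate's `NamedTempFile` and `TempDir`, which pick unpredictable names and already delete on drop.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical guard: owns whatever must be deleted when the pi turn ends.
enum TempGuard {
    /// A single temp file (the photo case today).
    File(PathBuf),
    /// A temp directory holding one file under its original name.
    Dir { dir: PathBuf, file: PathBuf },
}

impl TempGuard {
    /// Creates a fresh directory under `base` and writes `bytes` to
    /// `<dir>/<file_name>`. `file_name` is assumed to be sanitized already.
    fn dir_with_file(base: &Path, file_name: &str, bytes: &[u8]) -> io::Result<Self> {
        // Stand-in for a random suffix; predictable, so illustration only.
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos())
            .unwrap_or(0);
        let dir = base.join(format!("evie-doc-{}-{}", std::process::id(), nanos));
        // `create_dir` fails if the directory already exists.
        fs::create_dir(&dir)?;
        let file = dir.join(file_name);
        if let Err(err) = fs::write(&file, bytes) {
            let _ = fs::remove_dir_all(&dir);
            return Err(err);
        }
        Ok(TempGuard::Dir { dir, file })
    }

    /// Path to hand to pi in the `[Document file: <path>]` marker.
    fn path(&self) -> &Path {
        match self {
            TempGuard::File(path) => path.as_path(),
            TempGuard::Dir { file, .. } => file.as_path(),
        }
    }
}

impl Drop for TempGuard {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because drop cannot fail.
        let _ = match self {
            TempGuard::File(path) => fs::remove_file(path),
            TempGuard::Dir { dir, .. } => fs::remove_dir_all(dir),
        };
    }
}
```

Keeping the guard value alive for the duration of the turn is what keeps the file on disk; dropping it removes the file or the whole directory.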

Acceptance criteria

  • Sending a text-family document (md/txt/csv/json/code) lets pi read it via a [Document file: <path>] marker whose path bears the original filename+extension
  • The caption, when present, is passed as the accompanying instruction
  • A document larger than 20 MB is rejected with a friendly message and no download attempt
  • An image/* document is handled via the vision path, identically to a photo
  • Filenames are sanitized so a malicious file_name cannot escape the temp directory
  • The temp file/dir is retained for the full pi turn and cleaned up afterward
  • Existing text and photo handling is unaffected
  • Unit test(s) cover the document branch, the image-doc routing, the size cap, and filename sanitization
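
For the sanitization criterion, one conservative approach is to keep only the last path component of `file_name` and fall back to a fixed name when nothing usable remains. The function below is a sketch under those assumptions; `sanitize_file_name` and the `"document"` fallback are hypothetical, and the real rules (length limits, reserved names) may differ.

```rust
/// Reduces an untrusted Telegram `file_name` to a single safe path component.
/// Hypothetical helper; the fallback name is an assumption.
fn sanitize_file_name(raw: Option<&str>) -> String {
    const FALLBACK: &str = "document";
    let raw = raw.unwrap_or("");
    // Keep only the last component, treating both `/` and `\` as separators.
    let last = raw
        .rsplit(|c: char| c == '/' || c == '\\')
        .next()
        .unwrap_or("");
    // Drop control characters (including NUL), then surrounding whitespace.
    let cleaned: String = last.chars().filter(|c| !c.is_control()).collect();
    match cleaned.trim() {
        // Empty names and the special directory entries are never usable.
        "" | "." | ".." => FALLBACK.to_string(),
        name => name.to_string(),
    }
}
```

With this sketch, `../../etc/passwd` becomes `passwd`, and a bare `..` or a missing name becomes `document`, so joining the result onto the temp directory cannot leave it.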

Blocked by

None - can start immediately

Reference
weiwen/evie#24