Stream Telegram responses live as pi generates them #1

Closed
opened 2026-07-04 16:37:52 +08:00 by weiwen · 1 comment
Owner

Problem Statement

When I ask Evie a question through Telegram, I stare at a "typing…" indicator and dead air for the entire time pi is working, then the whole answer lands at once as a wall of text. For anything longer than a trivial lookup this feels unresponsive — I can't tell whether the bot is making progress, and I can't start reading the answer until it's completely finished. The pi process is already producing the answer incrementally, but none of that progress reaches me.

Solution

Evie streams the response into Telegram as pi produces it. Instead of buffering the entire answer and sending it at the end, Evie sends one message immediately and edits it in place as text arrives, so I watch the answer build up live — the same experience as a chat app typing a reply. Long answers roll across multiple messages seamlessly. The final result is properly formatted MarkdownV2, exactly as today.

User Stories

  1. As a Telegram user, I want to see the response begin appearing within a second or two of sending my question, so that I know the bot is working and not stuck.
  2. As a Telegram user, I want the answer to grow in place as pi generates it, so that I can start reading before it's complete.
  3. As a Telegram user, I want the streamed message to update at a comfortable cadence (not flickering many times per second), so that it's readable while it streams.
  4. As a Telegram user, I want the final message to be formatted with proper MarkdownV2 (bold, code blocks, links), so that the finished answer looks exactly as polished as it does today.
  5. As a Telegram user, I want long answers that exceed Telegram's message-size limit to continue seamlessly into a new message, so that I never lose the tail of a response.
  6. As a Telegram user, I want each message in a long answer to be sealed at a natural paragraph boundary, so that continuations don't split mid-sentence.
  7. As a Telegram user, I want the streaming to never leave a message in a broken half-formatted state, so that partial markup never shows up as raw symbols in the finished answer.
  8. As a Telegram user, I want the periodic "typing…" indicator to go away once real streamed text is appearing, so that I'm not shown a redundant activity hint.
  9. As a Telegram user sending a message that produces a very short answer, I want it to still work correctly (a single message, formatted), so that streaming doesn't regress simple lookups.
  10. As a Telegram user, I want a rendering failure on the final formatted result to fall back gracefully, so that I still see something rather than a silent drop.
  11. As a Telegram user with expose_errors on, I want to see the actual conversion error if final formatting fails, so that I can diagnose formatting bugs.
  12. As a Telegram user with expose_errors off, I want a friendly "something went wrong" message if final formatting fails, and I want my session preserved (not reset), so that a cosmetic rendering bug doesn't cost me my conversation context.
  13. As a Telegram user, I want /clear and error paths to behave exactly as before, so that streaming only changes the happy-path rendering.
  14. As an HTTP API consumer, I want my interface to keep returning the complete raw markdown as it does today, so that streaming is a Telegram-only concern and doesn't change the API contract.
  15. As a developer, I want the pi RPC parser to expose the incremental message_update deltas it currently discards, so that streaming has a source of live text.
  16. As a developer, I want the streaming pagination/rendering logic to be a pure, unit-testable unit, so that rolling-seal and formatting behavior can be verified without a live Telegram connection or a running pi.
  17. As a developer, I want the throttling and Telegram edit I/O kept as thin glue over the tested seams, so that the hard-to-test surface stays minimal.
  18. As a Telegram user, I want redundant edits suppressed when the rendered text hasn't changed, so that Telegram doesn't reject "message is not modified" and waste API calls.

Implementation Decisions

RPC delta parser (pi.rs). The RPC parser currently folds the entire JSONL event stream into a single final String, reading only agent_end and discarding every message_update. It will be extended to emit an ordered sequence of parsed events — a text Delta per message_update, and a Final carrying the complete text from agent_end. It remains a pure function over the line stream so existing JSONL-fixture tests continue to drive it. This is the fix for "the bot does not actually stream today."

Transport change (pi.rssession.rs → Telegram handler). PiProcess::send_message and SessionManager::send_message stop returning Result<String> and instead surface deltas to the caller (via a channel or sink) plus a terminal result. This is plumbing, exercised through the two pure seams rather than being a new test surface. The HTTP interface continues to consume the terminal complete text and returns raw markdown unchanged — streaming is Telegram-only.

Streaming paginator (telegram/markdown.rs). The render-once send_in_chunks is replaced by a stateful paginator. Fed the growing accumulated raw-markdown buffer, it decides: (a) the rendered MarkdownV2 text for the current message, (b) when to seal the current message at a paragraph boundary and open a new one, and (c) whether the render is unchanged since the last emit (skip). It is pure and synchronous.

MarkdownV2 rendering via telegram-markdown-v2 crate. Partial-buffer rendering uses the telegram-markdown-v2 crate's convert_with_strategy, called on the accumulated buffer per emit. Because it is parser-based, any truncated prefix of the stream renders to valid MarkdownV2 with no dangling delimiters — this replaces the hand-rolled escape_markdown_v2 escaper. UnsupportedTagsStrategy::Escape is used so tables and raw HTML render as literal text and content is never silently dropped (a notes bot must not lose information).

Rolling seal on overflow. When the rendered length of the current message approaches Telegram's 4096-char cap, the paginator finalizes it at the last paragraph boundary, a new placeholder message is opened, and streaming continues into it — repeating for arbitrarily long answers. The cap is measured against convert() output length, not the raw markdown length, since escaping changes the count.

Throttle and edit loop (telegram/mod.rs). The Telegram handler sends one placeholder message on the first delta, then issues editMessageText at most once per ~1.5s, driven by the paginator's decisions. Edits whose rendered output is byte-identical to the previous emit are skipped (avoids Telegram's "message is not modified" 400). The existing periodic typing-indicator task is removed from the streamed path — the live edits are the activity signal.

Error handling on the terminal render. convert() is trusted on intermediate edits (a failed intermediate edit costs at most one stale frame). On the final buffer, if convert() errors: with expose_errors on, the conversion error is shown to the user; with expose_errors off, a generic "Something went wrong" message is shown. In neither case is the session reset — pi succeeded and only rendering failed, so context is preserved. This diverges from the existing pi/session error paths, which do reset.

Trust assumption to verify. Whether convert() is total over truncated mid-stream input (never errors on an arbitrary prefix) is undocumented. Intermediate edits proceed without a net; the terminal-render fallback above is the guard. A quick test feeding truncated prefixes of a sample answer should confirm totality before the streaming loop depends on it.

Testing Decisions

Good tests here assert external behavior — the sequence of parsed events, and the sequence of rendering/pagination decisions — not internal structure. They avoid a live Telegram connection or a running pi; both seams are pure functions fed literal inputs.

RPC delta parser (pi.rs). Extend the existing read_agent_response test suite (12 tests today feeding JSONL string fixtures) to assert the ordered event sequence: multiple message_update lines yield the expected Deltas in order, agent_end yields the Final, interleaved/other/invalid events are handled as before, and a stream that ends without agent_end still errors. Prior art: the existing test_read_agent_response_* tests.

Streaming paginator (telegram/markdown.rs). Test by feeding a sequence of accumulated-buffer snapshots and asserting the decision sequence: a short answer produces one message with correctly-rendered MarkdownV2; a buffer crossing the 4096 rendered-length cap seals the current message at a paragraph boundary and opens a new one; unchanged buffers between emits produce a skip; tables/raw HTML render as escaped literal text. Prior art: the existing split_at_paragraph and escape_markdown_v2 unit tests.

Totality check. A test that feeds every truncated prefix of a representative markdown answer to convert() and asserts it never errors — de-risking the trust-no-net decision on intermediate edits.

The throttle timer and the sendMessage/editMessageText I/O in the Telegram handler are thin glue over these seams and are validated by manual/integration exercise, not unit tests.

Out of Scope

  • Richer input (voice/photo/multimodal). Deferred — no multimodal path exists in the pi RPC (Prompt carries only text) and it forces a dependency/values decision about Evie's local-only character. Parked in ~/notes/evie.md.
  • Interaction affordances (setMyCommands, /help, inline keyboards / /clear confirmation). Deferred and parked.
  • Proactive push (scheduled digests/reminders). Deferred — fights the pull-only, request-response model.
  • HTTP API streaming. The HTTP interface keeps its current behavior (complete raw markdown, no streaming, no splitting).
  • Streaming of pi thinking or tool-call events. Only assistant text deltas are streamed to the user; thinking/tool events remain logged as today.
  • Configurability of the throttle interval or 4096 cap. Fixed constants, matching how the existing cleanup interval is fixed.

Further Notes

  • This feature is the highest-leverage item from a grilling session on expanding Evie's use of the Telegram Bot API; the other candidates were deferred to a TODO list in ~/notes/evie.md.
  • Reflow/flicker is inherent to parser-based partial rendering: text that renders as a literal ** at one emit will "snap" into bold once its closing delimiter streams in. This is accepted as the cost of live formatting.
  • The telegram-markdown-v2 crate is a new dependency; adopting it also removes the bespoke escape_markdown_v2 implementation, so net module complexity in markdown.rs may decrease.
  • The domain term "Message Accumulator" (CONTEXT.md) already describes the per-session buffer that collects message_update text — this feature finally uses it for its stated streaming purpose rather than only for end-of-response chunking.
## Problem Statement When I ask Evie a question through Telegram, I stare at a "typing…" indicator and dead air for the entire time `pi` is working, then the whole answer lands at once as a wall of text. For anything longer than a trivial lookup this feels unresponsive — I can't tell whether the bot is making progress, and I can't start reading the answer until it's completely finished. The `pi` process is already producing the answer incrementally, but none of that progress reaches me. ## Solution Evie streams the response into Telegram as `pi` produces it. Instead of buffering the entire answer and sending it at the end, Evie sends one message immediately and edits it in place as text arrives, so I watch the answer build up live — the same experience as a chat app typing a reply. Long answers roll across multiple messages seamlessly. The final result is properly formatted MarkdownV2, exactly as today. ## User Stories 1. As a Telegram user, I want to see the response begin appearing within a second or two of sending my question, so that I know the bot is working and not stuck. 2. As a Telegram user, I want the answer to grow in place as `pi` generates it, so that I can start reading before it's complete. 3. As a Telegram user, I want the streamed message to update at a comfortable cadence (not flickering many times per second), so that it's readable while it streams. 4. As a Telegram user, I want the final message to be formatted with proper MarkdownV2 (bold, code blocks, links), so that the finished answer looks exactly as polished as it does today. 5. As a Telegram user, I want long answers that exceed Telegram's message-size limit to continue seamlessly into a new message, so that I never lose the tail of a response. 6. As a Telegram user, I want each message in a long answer to be sealed at a natural paragraph boundary, so that continuations don't split mid-sentence. 7. As a Telegram user, I want the streaming to never leave a message in a broken half-formatted state, so that partial markup never shows up as raw symbols in the finished answer. 8. As a Telegram user, I want the periodic "typing…" indicator to go away once real streamed text is appearing, so that I'm not shown a redundant activity hint. 9. As a Telegram user sending a message that produces a very short answer, I want it to still work correctly (a single message, formatted), so that streaming doesn't regress simple lookups. 10. As a Telegram user, I want a rendering failure on the final formatted result to fall back gracefully, so that I still see something rather than a silent drop. 11. As a Telegram user with `expose_errors` on, I want to see the actual conversion error if final formatting fails, so that I can diagnose formatting bugs. 12. As a Telegram user with `expose_errors` off, I want a friendly "something went wrong" message if final formatting fails, and I want my session preserved (not reset), so that a cosmetic rendering bug doesn't cost me my conversation context. 13. As a Telegram user, I want `/clear` and error paths to behave exactly as before, so that streaming only changes the happy-path rendering. 14. As an HTTP API consumer, I want my interface to keep returning the complete raw markdown as it does today, so that streaming is a Telegram-only concern and doesn't change the API contract. 15. As a developer, I want the `pi` RPC parser to expose the incremental `message_update` deltas it currently discards, so that streaming has a source of live text. 16. As a developer, I want the streaming pagination/rendering logic to be a pure, unit-testable unit, so that rolling-seal and formatting behavior can be verified without a live Telegram connection or a running `pi`. 17. As a developer, I want the throttling and Telegram edit I/O kept as thin glue over the tested seams, so that the hard-to-test surface stays minimal. 18. As a Telegram user, I want redundant edits suppressed when the rendered text hasn't changed, so that Telegram doesn't reject "message is not modified" and waste API calls. ## Implementation Decisions **RPC delta parser (`pi.rs`).** The RPC parser currently folds the entire JSONL event stream into a single final `String`, reading only `agent_end` and discarding every `message_update`. It will be extended to emit an ordered sequence of parsed events — a text `Delta` per `message_update`, and a `Final` carrying the complete text from `agent_end`. It remains a pure function over the line stream so existing JSONL-fixture tests continue to drive it. This is the fix for "the bot does not actually stream today." **Transport change (`pi.rs` → `session.rs` → Telegram handler).** `PiProcess::send_message` and `SessionManager::send_message` stop returning `Result<String>` and instead surface deltas to the caller (via a channel or sink) plus a terminal result. This is plumbing, exercised through the two pure seams rather than being a new test surface. The HTTP interface continues to consume the terminal complete text and returns raw markdown unchanged — streaming is Telegram-only. **Streaming paginator (`telegram/markdown.rs`).** The render-once `send_in_chunks` is replaced by a stateful paginator. Fed the growing accumulated raw-markdown buffer, it decides: (a) the rendered MarkdownV2 text for the current message, (b) when to seal the current message at a paragraph boundary and open a new one, and (c) whether the render is unchanged since the last emit (skip). It is pure and synchronous. **MarkdownV2 rendering via `telegram-markdown-v2` crate.** Partial-buffer rendering uses the `telegram-markdown-v2` crate's `convert_with_strategy`, called on the accumulated buffer per emit. Because it is parser-based, any truncated prefix of the stream renders to valid MarkdownV2 with no dangling delimiters — this replaces the hand-rolled `escape_markdown_v2` escaper. `UnsupportedTagsStrategy::Escape` is used so tables and raw HTML render as literal text and content is never silently dropped (a notes bot must not lose information). **Rolling seal on overflow.** When the *rendered* length of the current message approaches Telegram's 4096-char cap, the paginator finalizes it at the last paragraph boundary, a new placeholder message is opened, and streaming continues into it — repeating for arbitrarily long answers. The cap is measured against `convert()` output length, not the raw markdown length, since escaping changes the count. **Throttle and edit loop (`telegram/mod.rs`).** The Telegram handler sends one placeholder message on the first delta, then issues `editMessageText` at most once per ~1.5s, driven by the paginator's decisions. Edits whose rendered output is byte-identical to the previous emit are skipped (avoids Telegram's "message is not modified" 400). The existing periodic typing-indicator task is removed from the streamed path — the live edits are the activity signal. **Error handling on the terminal render.** `convert()` is trusted on intermediate edits (a failed intermediate edit costs at most one stale frame). On the final buffer, if `convert()` errors: with `expose_errors` on, the conversion error is shown to the user; with `expose_errors` off, a generic "Something went wrong" message is shown. In neither case is the session reset — `pi` succeeded and only rendering failed, so context is preserved. This diverges from the existing pi/session error paths, which do reset. **Trust assumption to verify.** Whether `convert()` is *total* over truncated mid-stream input (never errors on an arbitrary prefix) is undocumented. Intermediate edits proceed without a net; the terminal-render fallback above is the guard. A quick test feeding truncated prefixes of a sample answer should confirm totality before the streaming loop depends on it. ## Testing Decisions Good tests here assert external behavior — the sequence of parsed events, and the sequence of rendering/pagination decisions — not internal structure. They avoid a live Telegram connection or a running `pi`; both seams are pure functions fed literal inputs. **RPC delta parser (`pi.rs`).** Extend the existing `read_agent_response` test suite (12 tests today feeding JSONL string fixtures) to assert the ordered event sequence: multiple `message_update` lines yield the expected `Delta`s in order, `agent_end` yields the `Final`, interleaved/other/invalid events are handled as before, and a stream that ends without `agent_end` still errors. Prior art: the existing `test_read_agent_response_*` tests. **Streaming paginator (`telegram/markdown.rs`).** Test by feeding a sequence of accumulated-buffer snapshots and asserting the decision sequence: a short answer produces one message with correctly-rendered MarkdownV2; a buffer crossing the 4096 rendered-length cap seals the current message at a paragraph boundary and opens a new one; unchanged buffers between emits produce a skip; tables/raw HTML render as escaped literal text. Prior art: the existing `split_at_paragraph` and `escape_markdown_v2` unit tests. **Totality check.** A test that feeds every truncated prefix of a representative markdown answer to `convert()` and asserts it never errors — de-risking the trust-no-net decision on intermediate edits. The throttle timer and the `sendMessage`/`editMessageText` I/O in the Telegram handler are thin glue over these seams and are validated by manual/integration exercise, not unit tests. ## Out of Scope - **Richer input (voice/photo/multimodal).** Deferred — no multimodal path exists in the `pi` RPC (`Prompt` carries only text) and it forces a dependency/values decision about Evie's local-only character. Parked in `~/notes/evie.md`. - **Interaction affordances** (`setMyCommands`, `/help`, inline keyboards / `/clear` confirmation). Deferred and parked. - **Proactive push** (scheduled digests/reminders). Deferred — fights the pull-only, request-response model. - **HTTP API streaming.** The HTTP interface keeps its current behavior (complete raw markdown, no streaming, no splitting). - **Streaming of `pi` thinking or tool-call events.** Only assistant text deltas are streamed to the user; thinking/tool events remain logged as today. - **Configurability of the throttle interval or 4096 cap.** Fixed constants, matching how the existing cleanup interval is fixed. ## Further Notes - This feature is the highest-leverage item from a grilling session on expanding Evie's use of the Telegram Bot API; the other candidates were deferred to a TODO list in `~/notes/evie.md`. - Reflow/flicker is inherent to parser-based partial rendering: text that renders as a literal `**` at one emit will "snap" into bold once its closing delimiter streams in. This is accepted as the cost of live formatting. - The `telegram-markdown-v2` crate is a new dependency; adopting it also removes the bespoke `escape_markdown_v2` implementation, so net module complexity in `markdown.rs` may decrease. - The domain term "Message Accumulator" (CONTEXT.md) already describes the per-session buffer that collects `message_update` text — this feature finally uses it for its stated streaming purpose rather than only for end-of-response chunking.
Author
Owner

Implemented — commit 0d6b15a9

Streaming is built, tested, and reviewed. Not yet merged (the main bookmark hasn't been advanced), so this is ready for human review/merge.

What shipped

  • pi.rs parses message_update and forwards each new snapshot over a channel, still returning the agent_end text as the authoritative final response.
  • telegram/markdown.rsrender (via the telegram-markdown-v2 crate, Escape strategy) + paginate (prefix-stable split into ≤4096 rendered UTF-16-unit pages, paragraph-boundary seal with hard-split fallback). Replaces the hand-rolled MarkdownV2 escaper.
  • telegram/mod.rs — immediate placeholder, throttled editMessageText (~1/1.5s) with no-op-edit suppression, rolling seal into new messages on overflow, best-effort intermediate + final edits, expose_errors-aware final error handling that never resets the session on a render failure.
  • HTTP interface unchanged (discards snapshots, returns complete raw markdown).

Deviations from the PRD (decisions made during implementation)

  1. message_update is a cumulative snapshot, not a delta. Verified against a live pi run: each update's text is a superset of the previous, growing 0 → full. So the buffer is replaced with the latest snapshot, not appended (appending would duplicate text). The PRD/US15 and CONTEXT.md said "delta"; CONTEXT.md has been corrected.
  2. Typing indicator kept, not removed. US8 was read as "remove the periodic typing indicator." Per follow-up feedback, the indicator now runs for the whole turn (thinking + streaming) and stops when the final message is delivered — alongside the immediate placeholder. This supersedes US8's "remove" wording.

De-risked

  • The PRD-flagged unknown — whether convert() is total over truncated input — is confirmed: tests/convert_totality.rs feeds every truncated prefix of a realistic answer (mid-code-fence, mid-link, mid-table) and none error. Intermediate edits run without a per-edit net on that basis.

Tests (24 passing)

  • Parser: ordered/deduped snapshot sequence + final; user-update filtering (pi.rs).
  • Paginator: short answer, overflow seal, escaping, prefix-stability, hard-split (markdown.rs).
  • Totality: per-prefix convert() (tests/convert_totality.rs).

Review

/code-review run: Standards clean (no violations introduced; naming conforms to the CONTEXT.md _Avoid_ lists), Spec faithful. Both Spec findings fixed — intermediate edit errors are now best-effort (no longer orphan the turn), and error notices land on the most recent message.

Notes for merge

  • Advance the branch with jj bookmark set main -r <feature-commit>.
  • Pre-existing, out of scope: code uses tg-/api- chat-ID prefixes while CONTEXT.md/DESIGN.md document tg:/api: (colons).
## Implemented — commit `0d6b15a9` Streaming is built, tested, and reviewed. Not yet merged (the `main` bookmark hasn't been advanced), so this is ready for human review/merge. ### What shipped - **`pi.rs`** parses `message_update` and forwards each new snapshot over a channel, still returning the `agent_end` text as the authoritative final response. - **`telegram/markdown.rs`** — `render` (via the `telegram-markdown-v2` crate, `Escape` strategy) + `paginate` (prefix-stable split into ≤4096 **rendered** UTF-16-unit pages, paragraph-boundary seal with hard-split fallback). Replaces the hand-rolled MarkdownV2 escaper. - **`telegram/mod.rs`** — immediate `…` placeholder, throttled `editMessageText` (~1/1.5s) with no-op-edit suppression, rolling seal into new messages on overflow, best-effort intermediate + final edits, `expose_errors`-aware final error handling that never resets the session on a render failure. - **HTTP interface unchanged** (discards snapshots, returns complete raw markdown). ### Deviations from the PRD (decisions made during implementation) 1. **`message_update` is a cumulative snapshot, not a delta.** Verified against a live `pi` run: each update's text is a superset of the previous, growing `0 → full`. So the buffer is **replaced** with the latest snapshot, not appended (appending would duplicate text). The PRD/US15 and CONTEXT.md said "delta"; CONTEXT.md has been corrected. 2. **Typing indicator kept, not removed.** US8 was read as "remove the periodic typing indicator." Per follow-up feedback, the indicator now runs for the **whole turn** (thinking + streaming) and stops when the final message is delivered — alongside the immediate placeholder. This supersedes US8's "remove" wording. ### De-risked - The PRD-flagged unknown — whether `convert()` is total over truncated input — is **confirmed**: `tests/convert_totality.rs` feeds every truncated prefix of a realistic answer (mid-code-fence, mid-link, mid-table) and none error. Intermediate edits run without a per-edit net on that basis. ### Tests (24 passing) - Parser: ordered/deduped snapshot sequence + final; user-update filtering (`pi.rs`). - Paginator: short answer, overflow seal, escaping, prefix-stability, hard-split (`markdown.rs`). - Totality: per-prefix `convert()` (`tests/convert_totality.rs`). ### Review `/code-review` run: **Standards** clean (no violations introduced; naming conforms to the CONTEXT.md `_Avoid_` lists), **Spec** faithful. Both Spec findings fixed — intermediate edit errors are now best-effort (no longer orphan the turn), and error notices land on the most recent message. ### Notes for merge - Advance the branch with `jj bookmark set main -r <feature-commit>`. - Pre-existing, out of scope: code uses `tg-`/`api-` chat-ID prefixes while CONTEXT.md/DESIGN.md document `tg:`/`api:` (colons).
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
weiwen/evie#1
No description provided.