Stream Telegram responses live as pi generates them #1
Reference: weiwen/evie#1
Problem Statement
When I ask Evie a question through Telegram, I stare at a "typing…" indicator and dead air for as long as pi is working, and then the whole answer lands at once as a wall of text. For anything longer than a trivial lookup this feels unresponsive: I can't tell whether the bot is making progress, and I can't start reading the answer until it's completely finished. The pi process is already producing the answer incrementally, but none of that progress reaches me.

Solution
Evie streams the response into Telegram as pi produces it. Instead of buffering the entire answer and sending it at the end, Evie sends one message immediately and edits it in place as text arrives, so I watch the answer build up live, the same experience as a chat app typing a reply. Long answers roll across multiple messages seamlessly. The final result is properly formatted MarkdownV2, exactly as today.

User Stories
- pi generates it, so that I can start reading before it's complete.
- expose_errors on, I want to see the actual conversion error if final formatting fails, so that I can diagnose formatting bugs.
- expose_errors off, I want a friendly "something went wrong" message if final formatting fails, and I want my session preserved (not reset), so that a cosmetic rendering bug doesn't cost me my conversation context.
- /clear and error paths to behave exactly as before, so that streaming only changes the happy-path rendering.
- pi RPC parser to expose the incremental message_update deltas it currently discards, so that streaming has a source of live text.
- pi.

Implementation Decisions
RPC delta parser (pi.rs). The RPC parser currently folds the entire JSONL event stream into a single final String, reading only agent_end and discarding every message_update. It will be extended to emit an ordered sequence of parsed events: a text Delta per message_update, and a Final carrying the complete text from agent_end. It remains a pure function over the line stream, so the existing JSONL-fixture tests continue to drive it. This is the fix for "the bot does not actually stream today."

Transport change (pi.rs → session.rs → Telegram handler). PiProcess::send_message and SessionManager::send_message stop returning Result<String> and instead surface deltas to the caller (via a channel or sink) plus a terminal result. This is plumbing, exercised through the two pure seams rather than being a new test surface. The HTTP interface continues to consume the terminal complete text and returns raw markdown unchanged; streaming is Telegram-only.

Streaming paginator (telegram/markdown.rs). The render-once send_in_chunks is replaced by a stateful paginator. Fed the growing accumulated raw-markdown buffer, it decides: (a) the rendered MarkdownV2 text for the current message, (b) when to seal the current message at a paragraph boundary and open a new one, and (c) whether the render is unchanged since the last emit (skip). It is pure and synchronous.

MarkdownV2 rendering via the telegram-markdown-v2 crate. Partial-buffer rendering uses the crate's convert_with_strategy, called on the accumulated buffer at each emit. Because the crate is parser-based, any truncated prefix of the stream renders to valid MarkdownV2 with no dangling delimiters. This replaces the hand-rolled escape_markdown_v2 escaper. UnsupportedTagsStrategy::Escape is used so tables and raw HTML render as literal text and content is never silently dropped (a notes bot must not lose information).

Rolling seal on overflow. When the rendered length of the current message approaches Telegram's 4096-character cap, the paginator seals it at the last paragraph boundary, a new placeholder message is opened, and streaming continues into it; this repeats for arbitrarily long answers. The cap is measured against the convert() output length, not the raw markdown length, since escaping changes the count (every literal . or ! in the raw text, for example, gains a backslash).

Throttle and edit loop (telegram/mod.rs). The Telegram handler sends one placeholder message on the first delta, then issues editMessageText at most once per ~1.5 s, driven by the paginator's decisions. Edits whose rendered output is byte-identical to the previous emit are skipped, which avoids Telegram's "message is not modified" 400 error. The existing periodic typing-indicator task is removed from the streamed path; the live edits are the activity signal.

Error handling on the terminal render.
convert() is trusted on intermediate edits (a failed intermediate edit costs at most one stale frame). If convert() errors on the final buffer: with expose_errors on, the conversion error is shown to the user; with expose_errors off, a generic "Something went wrong" message is shown. In neither case is the session reset: pi succeeded and only rendering failed, so context is preserved. This diverges from the existing pi/session error paths, which do reset.

Trust assumption to verify. Whether convert() is total over truncated mid-stream input (never errors on an arbitrary prefix) is undocumented. Intermediate edits proceed without a net; the terminal-render fallback above is the guard. A quick test feeding truncated prefixes of a sample answer should confirm totality before the streaming loop depends on it.

Testing Decisions
Good tests here assert external behavior, namely the sequence of parsed events and the sequence of rendering/pagination decisions, not internal structure. They avoid a live Telegram connection or a running pi; both seams are pure functions fed literal inputs.

RPC delta parser (pi.rs). Extend the existing read_agent_response test suite (12 tests today, feeding JSONL string fixtures) to assert the ordered event sequence: multiple message_update lines yield the expected Deltas in order, agent_end yields the Final, interleaved/other/invalid events are handled as before, and a stream that ends without agent_end still errors. Prior art: the existing test_read_agent_response_* tests.

Streaming paginator (telegram/markdown.rs). Test by feeding a sequence of accumulated-buffer snapshots and asserting the decision sequence: a short answer produces one message with correctly rendered MarkdownV2; a buffer crossing the 4096 rendered-length cap seals the current message at a paragraph boundary and opens a new one; unchanged buffers between emits produce a skip; tables and raw HTML render as escaped literal text. Prior art: the existing split_at_paragraph and escape_markdown_v2 unit tests.

Totality check. A test that feeds every truncated prefix of a representative markdown answer to convert() and asserts it never errors, de-risking the decision to run intermediate edits without a net.

The throttle timer and the sendMessage/editMessageText I/O in the Telegram handler are thin glue over these seams and are validated by manual/integration exercise, not unit tests.

Out of Scope
- pi RPC (Prompt carries only text) and it forces a dependency/values decision about Evie's local-only character. Parked in ~/notes/evie.md.
- setMyCommands, /help, inline keyboards / /clear confirmation). Deferred and parked.
- pi thinking or tool-call events. Only assistant text deltas are streamed to the user; thinking/tool events remain logged as today.

Further Notes
- ~/notes/evie.md.
- ** at one emit will "snap" into bold once its closing delimiter streams in. This is accepted as the cost of live formatting.
- telegram-markdown-v2 crate is a new dependency; adopting it also removes the bespoke escape_markdown_v2 implementation, so net module complexity in markdown.rs may decrease.
- message_update text; this feature finally uses it for its stated streaming purpose rather than only for end-of-response chunking.

Implemented: commit 0d6b15a9
Streaming is built, tested, and reviewed. It is not yet merged (the main bookmark hasn't been advanced), so this is ready for human review and merge.

What shipped
- pi.rs parses message_update and forwards each new snapshot over a channel, still returning the agent_end text as the authoritative final response.
- telegram/markdown.rs: render (via the telegram-markdown-v2 crate, Escape strategy) + paginate (prefix-stable split into ≤4096 rendered UTF-16-unit pages, paragraph-boundary seal with hard-split fallback). Replaces the hand-rolled MarkdownV2 escaper.
- telegram/mod.rs: an immediate "…" placeholder, throttled editMessageText (about one edit per 1.5 s) with no-op-edit suppression, rolling seal into new messages on overflow, best-effort intermediate and final edits, and expose_errors-aware final error handling that never resets the session on a render failure.

Deviations from the PRD (decisions made during implementation)
- message_update is a cumulative snapshot, not a delta. Verified against a live pi run: each update's text is a superset of the previous one, growing from empty to the full answer. So the buffer is replaced with the latest snapshot rather than appended to (appending would duplicate text). The PRD/US15 and CONTEXT.md said "delta"; CONTEXT.md has been corrected.

De-risked
That convert() is total over truncated input is confirmed: tests/convert_totality.rs feeds every truncated prefix of a realistic answer (mid-code-fence, mid-link, mid-table) and none errors. Intermediate edits run without a per-edit net on that basis.

Tests (24 passing)
- pi.rs).
- markdown.rs).
- convert() (tests/convert_totality.rs).

Review
/code-review run: Standards clean (no violations introduced; naming conforms to the CONTEXT.md "Avoid" lists), Spec faithful. Both Spec findings were fixed: intermediate edit errors are now best-effort (they no longer orphan the turn), and error notices land on the most recent message.

Notes for merge
- jj bookmark set main -r <feature-commit>.
- tg-/api- chat-ID prefixes while CONTEXT.md/DESIGN.md document tg:/api: (colons).
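For the RPC delta parser (Implementation Decisions), updated for the snapshot finding under Deviations from the PRD, a minimal std-only sketch of the pure fold might look like this. It assumes each JSONL line has already been decoded into a hypothetical RpcLine enum (the real pi.rs parses JSON, which is omitted here). RpcLine, ParsedEvent, parse_events, and apply_snapshot are illustrative names, not Evie's actual types. Dropping repeated identical snapshots and stopping at the first agent_end are also assumptions of the sketch.

```rust
/// Hypothetical, already-decoded form of one JSONL line from pi's RPC stream.
#[derive(Debug, Clone, PartialEq)]
pub enum RpcLine {
    /// A message_update carrying the cumulative assistant text so far.
    MessageUpdate(String),
    /// An agent_end carrying the authoritative final text.
    AgentEnd(String),
    /// Any other event (thinking, tool calls, ...), ignored by this fold.
    Other,
}

/// Events handed to the streaming side (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum ParsedEvent {
    /// Latest cumulative snapshot: the consumer replaces its buffer with it.
    Snapshot(String),
    /// Complete text from agent_end.
    Final(String),
}

/// Pure fold over the decoded line stream.
/// Emits only snapshots that differ from the previous one, and errors if the
/// stream ends without an agent_end.
pub fn parse_events<I: IntoIterator<Item = RpcLine>>(lines: I) -> Result<Vec<ParsedEvent>, String> {
    let mut events = Vec::new();
    let mut last = String::new();
    for line in lines {
        match line {
            RpcLine::MessageUpdate(text) => {
                if text != last {
                    last = text.clone();
                    events.push(ParsedEvent::Snapshot(text));
                }
            }
            RpcLine::AgentEnd(text) => {
                events.push(ParsedEvent::Final(text));
                return Ok(events);
            }
            RpcLine::Other => {}
        }
    }
    Err("stream ended without agent_end".to_string())
}

/// Consumer side: overwrite (never append) the buffer with each snapshot,
/// because every message_update already contains all earlier text.
pub fn apply_snapshot(buffer: &mut String, snapshot: &str) {
    buffer.clear();
    buffer.push_str(snapshot);
}
```

Appending "He" and then "Hello" would yield "HeHello". Replacing yields "Hello", which is why the buffer is overwritten.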
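The paginate step listed under What shipped (rendered-length cap in UTF-16 units, paragraph-boundary seal, hard-split fallback) could look roughly like the sketch below. The renderer is passed in as a closure so the function stays pure and testable. escape_all is a hypothetical stand-in for the crate's convert() that simply backslash-escapes every MarkdownV2 special character, and paginate and seal_point are illustrative names. The sketch assumes that rendered length grows monotonically with the raw prefix, which is what keeps earlier pages stable as the buffer grows.

```rust
/// Telegram counts message length in UTF-16 code units.
pub fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Hypothetical stand-in for the crate's convert(): prefix every MarkdownV2
/// special character (and the backslash itself) with a backslash.
pub fn escape_all(raw: &str) -> String {
    const SPECIAL: &str = "_*[]()~`>#+-=|{}.!\\";
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        if SPECIAL.contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Byte offset at which to seal the current page: the last "\n\n" whose
/// rendered prefix fits `cap`, else the longest char-boundary prefix that
/// fits (always at least one character, so the caller makes progress).
fn seal_point<F: Fn(&str) -> String>(raw: &str, cap: usize, render: &F) -> usize {
    let fits = |end: usize| utf16_len(&render(&raw[..end])) <= cap;
    if let Some(end) = raw
        .match_indices("\n\n")
        .map(|(i, _)| i)
        .filter(|&i| i > 0 && fits(i))
        .last()
    {
        return end;
    }
    // Hard-split fallback.
    let mut best = raw.char_indices().nth(1).map_or(raw.len(), |(i, _)| i);
    for (i, _) in raw.char_indices().skip(1) {
        if fits(i) {
            best = i;
        } else {
            break;
        }
    }
    best
}

/// Split accumulated raw markdown into rendered pages of at most `cap` UTF-16 units.
pub fn paginate<F: Fn(&str) -> String>(raw: &str, cap: usize, render: F) -> Vec<String> {
    let mut pages = Vec::new();
    let mut rest = raw;
    while utf16_len(&render(rest)) > cap {
        let cut = seal_point(rest, cap, &render);
        pages.push(render(&rest[..cut]));
        // Drop the paragraph separator so the next page starts with content.
        rest = rest[cut..].trim_start_matches('\n');
    }
    if !rest.is_empty() || pages.is_empty() {
        pages.push(render(rest));
    }
    pages
}
```

Because the cap is checked against the rendered output, escape-heavy text seals earlier than its raw length suggests. For example, "a.b.c.d" is 7 raw characters but 10 rendered ones.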
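The Throttle and edit loop rules (at most one edit per ~1.5 s, with byte-identical renders skipped) can be kept as a small pure state machine, leaving only the timer and the Telegram calls in the handler. EditThrottle and EditDecision are hypothetical names. Injecting `now` instead of reading the clock inside is a design assumption that keeps the logic unit-testable.

```rust
use std::time::{Duration, Instant};

/// What the handler should do with the latest rendered page (hypothetical).
#[derive(Debug, PartialEq)]
pub enum EditDecision {
    /// Call editMessageText with the new render.
    Edit,
    /// Render is byte-identical to the last edit: skip (avoids "message is not modified").
    SkipUnchanged,
    /// Too soon after the last edit: skip this snapshot.
    SkipThrottled,
}

/// Remembers when and what was last sent.
pub struct EditThrottle {
    interval: Duration,
    last_sent: Option<(Instant, String)>,
}

impl EditThrottle {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_sent: None }
    }

    /// Decide for a freshly rendered page at time `now`; records it if an edit is due.
    pub fn decide(&mut self, now: Instant, rendered: &str) -> EditDecision {
        if let Some((at, text)) = &self.last_sent {
            if text == rendered {
                return EditDecision::SkipUnchanged;
            }
            if now.duration_since(*at) < self.interval {
                return EditDecision::SkipThrottled;
            }
        }
        self.last_sent = Some((now, rendered.to_string()));
        EditDecision::Edit
    }
}
```

This sketch simply drops a throttled snapshot. The handler therefore still has to send the latest render when the final event arrives, whatever the interval (skipping it only if it is byte-identical to the last edit). Otherwise the last few tokens of an answer could be lost.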
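The rules under Error handling on the terminal render reduce to a small decision table, encoded below. FinalOutcome and final_outcome are hypothetical names. The exact wording of the exposed error, and the assumption that this notice is sent as plain text rather than parsed as MarkdownV2, are illustrative choices, not taken from Evie.

```rust
use std::fmt::Display;

/// What the handler does after the final render attempt (hypothetical).
#[derive(Debug, PartialEq)]
pub struct FinalOutcome {
    /// Text for the most recent message (assumed to be sent as plain text on error).
    pub text: String,
    /// Whether to reset the pi session: always false here, since pi succeeded.
    pub reset_session: bool,
}

/// Map the final convert() result and the expose_errors setting to an outcome.
pub fn final_outcome<E: Display>(render: Result<String, E>, expose_errors: bool) -> FinalOutcome {
    let text = match render {
        Ok(rendered) => rendered,
        // expose_errors on: show the actual conversion error for diagnosis.
        Err(e) if expose_errors => format!("Formatting failed: {e}"),
        // expose_errors off: generic notice only.
        Err(_) => "Something went wrong".to_string(),
    };
    // Only rendering failed, so the conversation context is preserved either way.
    FinalOutcome { text, reset_session: false }
}
```

Unlike the existing pi/session error paths, no branch here resets the session.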
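The totality check from Testing Decisions (shipped as tests/convert_totality.rs) amounts to calling the converter on every prefix of a sample that ends on a UTF-8 char boundary. The sketch below is generic over the converter so it runs without the crate. first_failing_prefix is a hypothetical helper. In the real test the closure would wrap the telegram-markdown-v2 conversion call, whose exact signature is not reproduced here.

```rust
/// Feed every char-boundary prefix of `sample` (including "" and the full text)
/// to `convert`; return the byte length and error of the first prefix that fails.
pub fn first_failing_prefix<F, E>(sample: &str, convert: F) -> Option<(usize, E)>
where
    F: Fn(&str) -> Result<String, E>,
{
    // char_indices yields every valid cut point; append the full length.
    let cuts = sample
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(sample.len()));
    for end in cuts {
        if let Err(e) = convert(&sample[..end]) {
            return Some((end, e));
        }
    }
    None
}
```

Cutting only at char boundaries matters: slicing a &str in the middle of a multi-byte character such as "é" or "😀" would panic before the converter was ever exercised.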