Per-session locking: stop holding the map lock across a pi turn #33

Open
opened 2026-07-05 13:10:46 +08:00 by weiwen · 0 comments
Owner

What to build

SessionManager::send_message takes the session-map write lock and holds it across the entire pi turn, so a status check, the idle-cleanup task, or a message on another chat/interface all block for the full duration of an in-flight answer — defeating the RwLock that was chosen for concurrent reads. Restructure so the map lock is held only briefly (lookup, or spawn-and-insert) and each turn holds only its own session's lock.

Scoped single-user (ADR-0001), but the dual-interface overlap and status-during-turn cases are documented use cases, so the fix is worthwhile.

Decision from grilling:

map:  RwLock<HashMap<String, Arc<Slot>>>
Slot: { process: tokio::Mutex<PiProcess>, last_activity: AtomicI64 /* epoch secs */ }
  • The map lock is taken briefly to get-or-create the slot. On a miss, PiProcess::spawn is awaited under the map lock (session creation is infrequent), the slot is inserted, its Arc cloned, then the lock is dropped. The turn then locks only slot.process.
  • last_activity lives outside the turn lock as an atomic, so get_status and cleanup_idle read it under just the brief map read lock — they never block on a running turn.
  • On a turn error, re-acquire the map write lock and remove the slot, guarded by Arc::ptr_eq so a concurrently-replaced session is not clobbered.

Acceptance criteria

  • Session map stores Arc<Slot>; the turn's send_message holds only the per-session lock, not the map lock.
  • get_status and cleanup_idle read last_activity without acquiring the per-session turn lock (verified: status returns while a turn is simulated in flight).
  • Session creation still spawns exactly one pi process per chat; crash cleanup removes the crashed slot and is guarded against clobbering a replacement.
  • The single-chat back-to-back path behaves unchanged.
  • Existing session tests pass; just check green.

Blocked by

None - can start immediately.

Soft advisory: overlaps #31 in the session manager (session-id minting). Grabbing this after #31 avoids a merge collision — not a hard blocker.

## What to build `SessionManager::send_message` takes the session-map write lock and holds it across the **entire** pi turn, so a status check, the idle-cleanup task, or a message on another chat/interface all block for the full duration of an in-flight answer — defeating the `RwLock` that was chosen for concurrent reads. Restructure so the map lock is held only briefly (lookup, or spawn-and-insert) and each turn holds only its own session's lock. Scoped single-user (ADR-0001), but the dual-interface overlap and status-during-turn cases are documented use cases, so the fix is worthwhile. Decision from grilling: ``` map: RwLock<HashMap<String, Arc<Slot>>> Slot: { process: tokio::Mutex<PiProcess>, last_activity: AtomicI64 /* epoch secs */ } ``` - The map lock is taken briefly to get-or-create the slot. On a miss, `PiProcess::spawn` is awaited **under the map lock** (session creation is infrequent), the slot is inserted, its `Arc` cloned, then the lock is dropped. The turn then locks only `slot.process`. - `last_activity` lives **outside** the turn lock as an atomic, so `get_status` and `cleanup_idle` read it under just the brief map read lock — they never block on a running turn. - On a turn error, re-acquire the map write lock and remove the slot, guarded by `Arc::ptr_eq` so a concurrently-replaced session is not clobbered. ## Acceptance criteria - [ ] Session map stores `Arc<Slot>`; the turn's `send_message` holds only the per-session lock, not the map lock. - [ ] `get_status` and `cleanup_idle` read `last_activity` without acquiring the per-session turn lock (verified: status returns while a turn is simulated in flight). - [ ] Session creation still spawns exactly one pi process per chat; crash cleanup removes the crashed slot and is guarded against clobbering a replacement. - [ ] The single-chat back-to-back path behaves unchanged. - [ ] Existing session tests pass; `just check` green. ## Blocked by None - can start immediately. _Soft advisory: overlaps #31 in the session manager (session-id minting). Grabbing this after #31 avoids a merge collision — not a hard blocker._
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
weiwen/evie#33
No description provided.