Bug Description
When the configured Gemini API key expires (or is invalid), the worker enters an infinite retry loop with no circuit breaker or backoff. This burns through API quota with 400 errors indefinitely until the worker is manually killed.
Impact
Over 6 days with an expired key, my worker generated ~77,000 failed requests (a 0.02% success rate on the Gemini API dashboard). The `pending_messages` queue accumulated 25K+ failed entries, and the SQLite DB bloated to 564MB.
Reproduction
- Configure `CLAUDE_MEM_PROVIDER=gemini` with a valid API key
- Let the worker run and accumulate some pending messages
- Revoke the API key or let it expire
- Observe: the worker retries every 4-14 seconds per session, indefinitely
Log Evidence
```
[ERROR] [SDK] [session-603] Session generator failed {provider=Fg} Gemini API error: 400 - {
  "error": {
    "code": 400,
    "message": "API key expired. Please renew the API key.",
    "status": "INVALID_ARGUMENT",
    "details": [{ "reason": "API_KEY_INVALID" }]
  }
}
[INFO] [SYSTEM] [session-603] Pending work remains after generator exit, restarting with fresh AbortController {pendingCount=172}
[INFO] [SYSTEM] [session-603] Starting generator (pending-work-restart) using Fg
```
This cycle repeats every ~5-14 seconds per session, with 13 active sessions running concurrently.
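Because each of the 13 sessions restarts on its own, a per-session check would still let every session keep calling the API. One option is a single circuit breaker shared by all sessions and consulted before each pending-work restart. The sketch below is a minimal illustration under assumptions: the `CircuitBreaker` class, its threshold and cooldown values, and the injected clock are hypothetical and are not taken from the claude-mem codebase.

```typescript
// Hypothetical breaker shared by all session generators in one worker process.
// Not part of claude-mem; names, thresholds and the injected clock are assumptions.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  /** Ask before starting or restarting a session generator. */
  canAttempt(): boolean {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) return false;
      this.state = "half-open"; // cooldown over: allow exactly one probe
      return true;
    }
    // While half-open, other sessions wait for the probe's outcome.
    return this.state === "closed";
  }

  recordSuccess(): void {
    this.state = "closed";
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures++;
    // A failed probe, or too many failures in a row, (re)opens the breaker.
    if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
      this.trip();
    }
  }

  /** Open immediately, e.g. on a non-retryable auth error. */
  trip(): void {
    this.state = "open";
    this.openedAt = this.now();
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

In this sketch, a session would call `canAttempt()` at the point where the log reports "Pending work remains after generator exit" and skip the restart while it returns false, so all 13 sessions go quiet together instead of each hitting the API every few seconds.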
Expected Behavior
- A 400 `API_KEY_INVALID` error should be treated as a fatal, non-retryable error
- The worker should stop retrying after detecting this error class, log a clear message, and either:
  - Pause processing until the key is updated (preferred), or
  - Mark all pending messages as failed and shut down gracefully
- At minimum, implement exponential backoff with a max retry cap for all 4xx errors
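The Gemini error body in the log already identifies this case without relying on the status code alone: `error.details[].reason` is `API_KEY_INVALID`. Below is a minimal sketch of detecting that reason and pausing the whole worker (the preferred option above). It assumes the raw response body is available as a string wherever the session generator catches the error. `isInvalidApiKey` and `ProviderPause` are hypothetical names, not existing claude-mem code.

```typescript
// Shape of the Gemini error body seen in the log; only the fields read here.
interface GeminiErrorBody {
  error?: {
    code?: number;
    status?: string;
    details?: Array<{ reason?: string }>;
  };
}

/** True when a 4xx response carries reason API_KEY_INVALID. */
function isInvalidApiKey(httpStatus: number, rawBody: string): boolean {
  if (httpStatus < 400 || httpStatus >= 500) return false;
  let body: GeminiErrorBody | null;
  try {
    body = JSON.parse(rawBody) as GeminiErrorBody | null;
  } catch {
    return false; // not JSON: leave it to status-code handling elsewhere
  }
  const details = body?.error?.details;
  if (!Array.isArray(details)) return false;
  return details.some((d) => d?.reason === "API_KEY_INVALID");
}

/** Hypothetical worker-wide pause: provider calls stop, pending rows stay queued. */
class ProviderPause {
  private reason: string | null = null;

  pause(reason: string): void {
    // Log once for the whole worker instead of once per session restart.
    if (this.reason === null) {
      console.error(`[worker] Provider paused: ${reason}. Update the API key to resume.`);
    }
    this.reason = reason;
  }

  resume(): void {
    this.reason = null;
  }

  isPaused(): boolean {
    return this.reason !== null;
  }
}
```

Pausing leaves the pending rows intact, so processing could resume once the key is replaced. How the worker would learn about a new key (settings reload, restart) is left open here.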
Current Behavior
- No circuit breaker exists
- No distinction is made between retryable (429, 503) and non-retryable (400, 401, 403) errors
- The `retry_count` field on `pending_messages` is never incremented (always 0)
- Sessions retry indefinitely via the `pending-work-restart` generator pattern
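Because `retry_count` stays at 0, the restart logic never learns that a message is hopeless. The sketch below shows the missing bookkeeping using an in-memory stand-in for a `pending_messages` row. Only `retry_count` comes from this issue. The `id` and `status` fields, the `"failed"` status value, and `MAX_ATTEMPTS = 5` are assumptions. In the real worker, the increment would presumably be an UPDATE on the SQLite row.

```typescript
// Hypothetical in-memory stand-in for a pending_messages row.
// Only retry_count comes from the issue; id, status and "failed" are assumptions.
type MessageStatus = "pending" | "failed";

interface PendingMessage {
  id: number;
  retryCount: number; // mirrors pending_messages.retry_count
  status: MessageStatus;
}

const MAX_ATTEMPTS = 5; // assumed cap on failed attempts per message

/**
 * Record one failed attempt for a message.
 * Returns true if it may be retried, false once it is marked failed.
 */
function recordAttemptFailure(msg: PendingMessage, maxAttempts: number = MAX_ATTEMPTS): boolean {
  msg.retryCount += 1;
  if (msg.retryCount >= maxAttempts) {
    msg.status = "failed";
    return false;
  }
  return true;
}

/** Only still-pending messages should justify a pending-work restart. */
function hasRetryableWork(queue: readonly PendingMessage[]): boolean {
  return queue.some((m) => m.status === "pending");
}
```

With something like this in place, the "Pending work remains" check could count only rows still in the pending state, so a queue full of exhausted messages would no longer trigger restarts.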
Environment
- claude-mem version: 10.5.5
- Provider: gemini (`gemini-2.5-flash-lite`)
- OS: macOS
Suggested Fix
Classify HTTP status codes into retryable vs non-retryable:
- Retryable: 429 (rate limit), 500, 502, 503, 504 → exponential backoff with max retries
- Non-retryable: 400 (bad request), 401 (unauthorized), 403 (forbidden) → fail immediately, log error, stop session processing
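The classification above, combined with capped exponential backoff, can be sketched as a small retry wrapper. This sketch rests on several assumptions. Errors are taken to surface as a hypothetical `HttpError` carrying the status. Any status outside the retryable list is treated as fatal, a conservative choice meant to stop quota burn. Delays use "full jitter" (a random delay between 0 and the capped exponential ceiling), which is a swapped-in choice and not something the issue specifies. `sleep` is injectable so the logic can be exercised without real timers.

```typescript
// Hypothetical error type carrying the HTTP status of a failed provider call.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
    Object.setPrototypeOf(this, HttpError.prototype); // keep instanceof working on older targets
  }
}

// Retryable set from the issue; everything else is treated as fatal (assumption).
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504]);

function classifyStatus(status: number): "retryable" | "fatal" {
  return RETRYABLE_STATUSES.has(status) ? "retryable" : "fatal";
}

interface RetryOptions {
  maxRetries: number; // retries after the first attempt
  baseDelayMs: number;
  maxDelayMs: number;
  sleep?: (ms: number) => Promise<void>;
}

/** Full-jitter delay for 0-based retry `attempt`: random in [0, min(max, base * 2^attempt)). */
function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number, random: () => number = Math.random): number {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

async function withRetry<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // 400/401/403 (and non-HTTP errors) fail immediately: no retry, no sleep.
      if (!(err instanceof HttpError) || classifyStatus(err.status) === "fatal") throw err;
      // Retryable, but the budget is spent: give up.
      if (attempt >= opts.maxRetries) throw err;
      await sleep(backoffDelay(attempt, opts.baseDelayMs, opts.maxDelayMs));
    }
  }
}
```

With this shape, a single message costs at most `maxRetries + 1` calls. The jitter also keeps concurrent sessions from retrying in lockstep after a shared outage.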