The cloud frontier still leads on raw quality: gpt-4.1-mini tops this edition at 96%. But the real story is local. Running fully on‑device, Gemma‑4 E4B (only about 4 billion effective parameters) scores 89%, level with gemini-2.5-flash, and unlike Gemini it passes every must‑have gate. That is privacy for 7 points of quality against the frontier and none against Gemini. The bigger cost is speed: on a MacBook Air (M3) it takes about 33 s per turn versus about 4.8 s for Gemini, roughly 7× slower. That is tolerable, because analysis runs in the background and you keep writing while it works.
- **89% on‑device.** Gemma‑4 E4B passes all 5 must‑haves, against 96% for the cloud frontier.
- **~7× slower than Gemini** (~33 s vs ~4.8 s per turn), but it runs in the background.
- **Nothing leaves your machine.** No API keys, no upload: the whole novel stays local.
| # | Model | Provider | Quality | Speed (s/turn) | Must‑haves passed | Verdict |
|---|---|---|---|---|---|---|
| 1 | gpt-4.1-mini | openai | 96% | 2.7 | 5/5 | pass |
| 2 | gemini-2.5-flash | gemini | 89% | 4.8 | 4/5 | flag |
| 3 | google/gemma-4-e4b | lmstudio (on‑device) | 89% | 33 | 5/5 | pass |
Every model analyses the same four chapters of Dracula through TalePal's full pipeline — character extraction, role classification, plot phases, quotes. The output is then graded against a hand‑built golden reference: the answer key a careful human reader would write.
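The sketch below shows one way a weighted score against a golden reference could be computed. It assumes each pipeline stage produces a set of comparable items, and it scores each stage by F1 overlap with the answer key. The stage names, the `WEIGHTS` values, the F1 matching, and the example data are all hypothetical illustrations. None of it is TalePal's published scoring.

```python
# Hypothetical per-stage weights; TalePal's actual weights are not stated in the text.
WEIGHTS: dict[str, float] = {
    "characters": 0.3,
    "roles": 0.3,
    "plot_phases": 0.2,
    "quotes": 0.2,
}


def f1(predicted: set[str], golden: set[str]) -> float:
    """F1 overlap between a model's items and the golden reference for one stage."""
    if not predicted and not golden:
        return 1.0  # nothing expected, nothing produced
    hits = len(predicted & golden)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(golden)
    return 2 * precision * recall / (precision + recall)


def weighted_score(output: dict[str, set[str]], golden: dict[str, set[str]]) -> float:
    """Weighted sum of per-stage F1 scores in [0, 1]; an omitted stage counts as an empty set."""
    return sum(
        weight * f1(output.get(stage, set()), golden[stage])
        for stage, weight in WEIGHTS.items()
    )


# Hypothetical golden reference and model output for the opening chapters.
golden = {
    "characters": {"Jonathan Harker", "Count Dracula"},
    "roles": {"Jonathan Harker:protagonist", "Count Dracula:antagonist"},
    "plot_phases": {"journey", "captivity"},
    "quotes": {"Welcome to my house!"},
}
output = {**golden, "roles": {"Jonathan Harker:protagonist"}}  # misses the antagonist
print(f"{weighted_score(output, golden):.0%}")  # prints "90%"
```

The score alone only dips to 90% when the villain's role is missed. That shortfall is why the must-have gates sit on top of it.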
Must‑have gates sit on top of the weighted score; the table counts five, and two of them require the protagonist and the antagonist to be identified correctly. A model can write beautifully and still be flagged if it misses the villain. Every attributed quote is also verified against the source text, so invented lines are caught rather than rewarded.
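The gate-and-verify logic can be sketched as follows. This is an illustration under stated assumptions, not TalePal's code:

- Only the two gates the text names (protagonist and antagonist) are modelled.
- A quote counts as verified when it appears verbatim in the source after folding case, whitespace, and curly quotes.
- `normalise`, `verify_quotes`, `gate_verdict`, and the example data are hypothetical.

```python
import re

# Curly quotation marks folded to their straight equivalents before matching.
_QUOTE_FOLD = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def normalise(text: str) -> str:
    """Fold curly quotes, collapse runs of whitespace, and lower-case."""
    return re.sub(r"\s+", " ", text.translate(_QUOTE_FOLD)).strip().lower()


def verify_quotes(quotes: list[str], source: str) -> dict[str, bool]:
    """True for each quote found verbatim (after normalising) in the source text."""
    haystack = normalise(source)
    return {quote: normalise(quote) in haystack for quote in quotes}


def gate_verdict(predicted: dict[str, str], golden: dict[str, str]) -> tuple[str, list[str]]:
    """'pass' only if every must-have role matches the golden reference, else 'flag'.

    The weighted score plays no part here: one failed gate is enough to flag a model.
    Only the two gates named in the text are checked (hypothetical simplification).
    """
    failed = [role for role in ("protagonist", "antagonist")
              if predicted.get(role) != golden.get(role)]
    return ("pass" if not failed else "flag"), failed


# Hypothetical example data.
golden_roles = {"protagonist": "Jonathan Harker", "antagonist": "Count Dracula"}
print(gate_verdict({"protagonist": "Jonathan Harker"}, golden_roles))
# prints ('flag', ['antagonist'])

source = "Welcome to my house!  Enter freely and of your own will!"
print(verify_quotes(["Welcome to my house!", "The night is young."], source))
# prints {'Welcome to my house!': True, 'The night is young.': False}
```

Exact substring matching is deliberately strict: a paraphrased quote fails just like an invented one. A real checker might choose to tolerate small differences such as elisions.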
Speed is measured separately, as the average response time per chat turn. Local and cloud models are timed from the same machine, so their latencies are directly comparable.
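Here is a minimal timing sketch that assumes a blocking client. `send_turn` is a hypothetical stand-in for whatever call sends one chat turn to a local or cloud model and returns once the full reply has arrived. The last lines reproduce the "roughly 7×" figure from the averages in the table.

```python
import time
from statistics import mean
from typing import Callable


def average_seconds_per_turn(send_turn: Callable[[str], str], prompts: list[str]) -> float:
    """Mean wall-clock seconds per chat turn across the given (non-empty) prompts."""
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        send_turn(prompt)  # assumed to block until the whole response is back
        latencies.append(time.perf_counter() - start)
    return mean(latencies)


# Comparing the averages reported in the table: 33 s vs 4.8 s per turn.
local_s, gemini_s = 33.0, 4.8
print(f"{local_s / gemini_s:.1f}x slower")  # prints "6.9x slower", i.e. roughly 7x
```

`time.perf_counter` is used rather than `time.time` because it is monotonic and high-resolution, which suits measuring short intervals.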