TalePal Model Benchmark

lmstudio google/gemma-4-e4b

Pros: no hallucinated quotes · protagonist + antagonist correct · all 5 must-haves · no training-data leak
Cons: duplicate: Peter Hawkins = Mr. Peter Hawkins · mentioned-only as present: Peter Hawkins (role=supporting) · chapter-occurrence 72.7% · over-classifies main (6 vs 2)
Overall Rating: 89/100

Rubric breakdown: the four weighted components of the overall score

Each rubric is scored independently, then weighted into the overall rating at the top. The weights are deliberately simple and will be tuned as we learn. See the comparison page for what each rubric tests.

rubric                    weight  score
Hallucination resistance  30%     100
Core correctness          30%     100
Coverage                  20%     73
Cleanliness               20%     70
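
As a rough check, the overall rating can be reproduced from the four rubric rows. The sketch below assumes the overall score is a plain weighted average rounded to the nearest integer; the rounding rule and the names `RUBRICS` and `overall_score` are illustrative and not taken from the benchmark code.

```python
# Rubric name -> (weight in percent, score out of 100), copied from the rows above.
RUBRICS = {
    "Hallucination resistance": (30, 100),
    "Core correctness": (30, 100),
    "Coverage": (20, 73),
    "Cleanliness": (20, 70),
}


def overall_score(rubrics):
    """Weighted average of rubric scores; weights are percentages summing to 100."""
    total_weight = sum(weight for weight, _ in rubrics.values())
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}, expected 100")
    # Integer arithmetic first, so the only float step is the final division.
    weighted = sum(weight * score for weight, score in rubrics.values())
    return weighted / 100


raw = overall_score(RUBRICS)
print(raw, round(raw))  # 88.6 89
```

Under these assumptions the unrounded value is 88.6, which matches the reported 89 after rounding.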

Quality (text-anchored)

1/2

The two metrics whose truth is verifiable by string-matching against the source chapters: quote authenticity (did every attributed quote actually appear?) and chapter-occurrence (did the model place each character in the right chapters?).

Quote authenticity 45/45 100.0%
Chapter-occurrence 8/11 72.7%
findings
all attributed quotes verified in source
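
A minimal sketch of how a string-matching check like quote authenticity might work. Everything here is assumed for illustration: the function names, the normalization (lowercasing, straightening curly quotes, collapsing whitespace), and the toy chapter text are not taken from the benchmark itself.

```python
import re


def normalize(text):
    """Lowercase, straighten curly quotes, and collapse runs of whitespace."""
    for curly, straight in (("\u2018", "'"), ("\u2019", "'"), ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_authenticity(quotes, chapters):
    """Return (verified, total); a quote is verified if it is a substring of any chapter."""
    haystacks = [normalize(body) for body in chapters.values()]
    verified = sum(any(normalize(q) in h for h in haystacks) for q in quotes)
    return verified, len(quotes)


def percent(hits, total):
    """Percentage with one decimal place, as shown in the report."""
    return round(100 * hits / total, 1)


# Toy data, invented for this example (not the benchmark's source text).
chapters = {1: "The traveller wrote:  \u201cThe road was long\u201d and slept.", 2: "He woke at dawn."}
quotes = ['"The road was long"', "The road was short"]

print(quote_authenticity(quotes, chapters))  # (1, 2)
print(percent(45, 45), percent(8, 11))       # 100.0 72.7
```

Normalizing both sides before matching keeps the check strict about wording while tolerating typographic differences between the model output and the source.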

Must-have gate

5/5

A fixed list of facts that must be correct because downstream chat features assume them. A single failure here flags the model regardless of overall score.

harker_protagonist 1/1 Harker primaryRole=protagonist
dracula_antagonist 1/1 Dracula primaryRole=antagonist
min_4_characters 1/1 10 characters
harker_all_chapters 1/1 Harker in chapters 1-4
three_women_supporting 1/1 three women / weird sisters extracted
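
The gate logic described above can be sketched as follows. The check names are the five listed here; the `GateCheck` type, the `evaluate_gate` function, and the shape of the returned report are hypothetical, chosen only to show that a single failure flags the model whatever its overall score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCheck:
    name: str
    passed: bool


def evaluate_gate(checks, overall_score):
    """Summarize the gate; any failed check flags the model, independent of score."""
    failed = [c.name for c in checks if not c.passed]
    return {
        "gate": f"{len(checks) - len(failed)}/{len(checks)}",
        "overall": overall_score,
        "flagged": bool(failed),
        "failed": failed,
    }


NAMES = (
    "harker_protagonist",
    "dracula_antagonist",
    "min_4_characters",
    "harker_all_chapters",
    "three_women_supporting",
)

# This run: every check passed.
report = evaluate_gate([GateCheck(n, True) for n in NAMES], overall_score=89)
print(report["gate"], report["flagged"])  # 5/5 False

# Hypothetical run with one failure: flagged even though the score is unchanged.
broken = [GateCheck(n, n != "min_4_characters") for n in NAMES]
print(evaluate_gate(broken, overall_score=89)["flagged"])  # True
```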

Counts (model vs gold)

2/2

How many characters the model extracted, split by role, compared to gold. Both protagonist and antagonist roles must be detected; the counts give an at-a-glance sense of over- or under-extraction.

Protagonist detected 1/1 Harker
Antagonist detected 1/1 Dracula
            model  gold  Δ
characters  10     4     +6
main        6      2     +4
supporting  4      2     +2
findings
mentioned-only as present: Peter Hawkins (role=supporting)
duplicate: Peter Hawkins = Mr. Peter Hawkins
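
For illustration, the count deltas and the duplicate finding could be derived roughly like this. The honorific list, the `canonical` and `find_duplicates` helpers, and the idea that duplicates are detected by stripping titles are assumptions; the benchmark may use a different rule.

```python
import re
from collections import defaultdict

# Assumed, deliberately short list of titles to ignore when comparing names.
HONORIFIC = re.compile(r"^(?:mrs|mr|ms|miss|dr)\.?\s+")


def canonical(name):
    """Lowercase a name and drop one leading honorific, if present."""
    return HONORIFIC.sub("", name.strip().lower())


def find_duplicates(names):
    """Group names that collapse to the same canonical form."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


def deltas(model, gold):
    """Per-row difference, model minus gold."""
    return {key: model[key] - gold[key] for key in gold}


# Counts copied from the table above.
model_counts = {"characters": 10, "main": 6, "supporting": 4}
gold_counts = {"characters": 4, "main": 2, "supporting": 2}

print(deltas(model_counts, gold_counts))
# {'characters': 6, 'main': 4, 'supporting': 2}
print(find_duplicates(["Peter Hawkins", "Mr. Peter Hawkins", "Dracula"]))
# [['Peter Hawkins', 'Mr. Peter Hawkins']]
```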