Legal Task Benchmarks
Benchmark Methodology
Task Selection
A carefully curated set of legal tasks representing real-world scenarios, including contract analysis, case outcome prediction, legal research, and document drafting.
Evaluation Criteria
A panel of IP attorneys scores each model's output for each task on accuracy, legal reasoning, citation quality, and practical applicability.
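To make the rubric concrete, here is a minimal sketch of how a single attorney's scores might be recorded and aggregated. The field names, the 1-5 scale, and the unweighted averaging are illustrative assumptions, not the panel's actual scoring instrument.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskScore:
    """Hypothetical scoring record mirroring the four criteria above."""
    accuracy: float                 # factual/legal correctness, assumed 1-5 scale
    legal_reasoning: float          # soundness of the legal argument, 1-5
    citation_quality: float         # correct and relevant authorities, 1-5
    practical_applicability: float  # usability as real work product, 1-5

    def overall(self) -> float:
        # Unweighted mean across the four criteria (an assumption;
        # the actual panel may weight criteria differently).
        return mean([
            self.accuracy,
            self.legal_reasoning,
            self.citation_quality,
            self.practical_applicability,
        ])


# Example: one attorney's scores for a single contract-analysis task.
score = TaskScore(accuracy=4, legal_reasoning=3.5,
                  citation_quality=4, practical_applicability=3)
print(f"Overall: {score.overall():.2f}")
```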
Testing Environment
All models are tested on identical hardware (RTX 4090, 64 GB RAM) with standardized prompts and temperature settings to ensure a fair comparison.
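For readers who want a sense of what a standardized setup looks like in practice, here is a minimal sketch using Hugging Face transformers. The specific temperature, token budget, and prompt are illustrative assumptions, not the benchmark's published configuration; only the model ID is a real published checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Swapped per run for each model under test (e.g., the Llama 2 7B checkpoint).
MODEL_ID = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # fp16 fits a 7B model on a single RTX 4090
    device_map="auto",
)

prompt = "Identify the indemnification obligations in the clause below:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Holding sampling settings constant across all models is what keeps
# the comparison fair; the values here are assumed, not the benchmark's.
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2,  # low temperature for more deterministic legal output
    top_p=0.95,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Pinning dtype, device, and sampling parameters in one place means any score difference between runs can be attributed to the model rather than the harness.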
Latest Updates
New Benchmark: Mistral 7B vs. Llama 2 7B
Added a comprehensive comparison of the two leading 7B-parameter models on contract analysis tasks.
Posted: June 15, 2023
Methodology Update
Refined our evaluation criteria for legal research tasks to better assess citation accuracy and relevance.
Posted: May 28, 2023