LLM4Law

Legal LLM Benchmarking

Quantitative analysis of local LLMs' performance on legal tasks

Model      Developer    Version      Contract Analysis   Case Prediction   Legal Research   Document Drafting   Overall
Llama 2    Meta         7B-chat      78%                 65%               52%              82%                 —
Mistral    Mistral AI   7B-v0.1      85%                 78%               68%              88%                 —
GPT4All    Nomic AI     Falcon-7B    72%                 70%               58%              80%                 —

Task Selection

Carefully curated legal tasks representing real-world scenarios including contract analysis, case prediction, legal research, and document drafting.
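
Below is a minimal, illustrative sketch of how one benchmark item could be represented. The project does not publish its task schema, so the class and field names here (LegalTask, category, prompt, reference_answer) are assumptions made purely for illustration.

```python
# Hypothetical task schema -- not the project's actual data format.
from dataclasses import dataclass
from enum import Enum


class TaskCategory(str, Enum):
    CONTRACT_ANALYSIS = "contract_analysis"
    CASE_PREDICTION = "case_prediction"
    LEGAL_RESEARCH = "legal_research"
    DOCUMENT_DRAFTING = "document_drafting"


@dataclass
class LegalTask:
    """One benchmark item: a real-world scenario plus the material a grader needs."""
    task_id: str
    category: TaskCategory
    prompt: str             # scenario text shown to the model
    reference_answer: str   # attorney-written reference used during grading


# Invented example item, for illustration only
example = LegalTask(
    task_id="ca-001",
    category=TaskCategory.CONTRACT_ANALYSIS,
    prompt="Identify the indemnification obligations in the clause below: ...",
    reference_answer="The supplier indemnifies the buyer against third-party IP claims ...",
)
```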

Evaluation Criteria

Each task is evaluated on accuracy, legal reasoning, citation quality, and practical applicability by a panel of IP attorneys.
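
As a rough illustration, the snippet below shows one way such rubric ratings could be turned into a single percentage. The four dimensions come from the description above, but the 0-5 scale and the equal weighting are assumptions, not the project's published scoring method.

```python
# Hedged sketch: equal-weight aggregation of panel rubric ratings (assumed 0-5 scale).
from statistics import mean

RUBRIC = ("accuracy", "legal_reasoning", "citation_quality", "practical_applicability")


def task_score(panel_ratings: list[dict[str, int]], max_points: int = 5) -> float:
    """Average each attorney's rubric ratings, then average across the panel; return a percentage."""
    per_attorney = [mean(r[dim] for dim in RUBRIC) for r in panel_ratings]
    return 100.0 * mean(per_attorney) / max_points


# Three attorneys rating one response (illustrative numbers)
ratings = [
    {"accuracy": 4, "legal_reasoning": 4, "citation_quality": 3, "practical_applicability": 5},
    {"accuracy": 5, "legal_reasoning": 4, "citation_quality": 4, "practical_applicability": 4},
    {"accuracy": 4, "legal_reasoning": 3, "citation_quality": 4, "practical_applicability": 4},
]
print(f"{task_score(ratings):.0f}%")  # -> 80%
```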

Testing Environment

All models were tested on identical hardware (RTX 4090, 64 GB RAM) with standardized prompts and temperature settings to allow a fair comparison.
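
The page does not name the inference stack, so the following is only a sketch of what a standardized run configuration might look like, assuming llama-cpp-python as the runtime; the temperature, top-p, and token-limit values are placeholders rather than the benchmark's actual settings.

```python
# Illustrative harness: identical sampling settings applied to every model under test.
# llama-cpp-python and the specific values below are assumptions, not the project's setup.
from llama_cpp import Llama

GENERATION_SETTINGS = {
    "temperature": 0.1,  # kept low and identical across models
    "top_p": 0.9,
    "max_tokens": 1024,
}

SYSTEM_PROMPT = "You are a legal assistant. Answer precisely and cite authority where relevant."


def run_task(model_path: str, task_prompt: str) -> str:
    """Load a local GGUF model and generate a completion with the shared settings."""
    llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1, seed=42)  # offload layers to the GPU
    full_prompt = f"{SYSTEM_PROMPT}\n\n{task_prompt}\n"
    out = llm(full_prompt, **GENERATION_SETTINGS)
    return out["choices"][0]["text"]
```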

New Benchmark: Mistral 7B vs. Llama 2 7B

Added a comprehensive comparison between the two leading 7B-parameter models on contract analysis tasks.

Posted: June 15, 2023

Methodology Update

Refined our evaluation criteria for legal research tasks to better assess citation accuracy and relevance; an illustrative sketch of one possible citation-accuracy check appears after this entry.

Posted: May 28, 2023
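
For the citation-accuracy part of that update, here is a rough, purely illustrative sketch of one way to compare the citations in a model's answer against an attorney-compiled reference list; the regex and the precision/recall framing are assumptions, not the published rubric, and relevance would still require human review.

```python
# Hedged sketch: citation-accuracy check against a reference list (not the project's actual method).
import re

# Very rough pattern for U.S.-style citations, e.g. "410 U.S. 113" or "17 U.S.C. § 107"
CITATION_RE = re.compile(r"\b\d{1,4}\s+[A-Z][\w.]*\s*§?\s*\d{1,5}\b")


def citation_scores(model_answer: str, reference_citations: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of the citations found in the model's answer."""
    found = {c.strip() for c in CITATION_RE.findall(model_answer)}
    if not found:
        return 0.0, 0.0
    hits = found & reference_citations
    precision = len(hits) / len(found)
    recall = len(hits) / len(reference_citations) if reference_citations else 0.0
    return precision, recall


answer = "The fair-use factors in 17 U.S.C. § 107 govern this analysis."
print(citation_scores(answer, {"17 U.S.C. § 107"}))  # -> (1.0, 1.0)
```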
