Dataset Benchmarks
Compare state-of-the-art model performance on FAMuS and SEAMuS tasks
Source Validation Performance (FAMuS)
Binary classification: Does the source document describe the same event as the report?
| Model | Setting | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
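The accuracy, precision, recall, and F1 columns follow the standard binary-classification definitions, treating "same event" as the positive class. Below is a minimal scoring sketch, assuming gold and predicted labels are encoded as 0/1 and using scikit-learn rather than the official FAMuS evaluation script; the label values are invented for illustration.

```python
# Minimal sketch of source-validation scoring (illustrative labels,
# not the official FAMuS evaluation script).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

gold = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical gold labels (1 = same event)
pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model predictions

accuracy = accuracy_score(gold, pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    gold, pred, average="binary", pos_label=1
)
print(f"Accuracy={accuracy:.3f}  P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}")
```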
Argument Extraction Performance (FAMuS)
Cross-document argument extraction from report and source documents
| Model | Setting | Precision | Recall | F1 |
|---|---|---|---|---|
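Argument extraction is scored against the gold role annotations. The sketch below illustrates set-style precision, recall, and F1 over (role, span) pairs under an exact-match assumption; the official FAMuS evaluation may use a different matching criterion, and the event instance shown is invented.

```python
# Simplified argument-extraction scorer: each prediction and gold
# annotation is a (role, argument_span_text) pair, and credit requires
# an exact match on both. Illustrative only.
from collections import Counter

def score_arguments(predicted, gold):
    pred_counts, gold_counts = Counter(predicted), Counter(gold)
    # Count each (role, span) pair at most as often as it appears in gold.
    matched = sum(min(c, gold_counts[item]) for item, c in pred_counts.items())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical example for a single event instance.
predicted = [("Attacker", "the rebels"), ("Place", "Aleppo")]
gold = [("Attacker", "the rebels"), ("Victim", "civilians"), ("Place", "Aleppo")]
print(score_arguments(predicted, gold))  # (1.0, 0.667, 0.8)
```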
Report and Cross-Document Summarization (SEAMuS)
Event-centric summarization performance across multiple evaluation metrics
| Model | Setting | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore | CR | A | F |
|---|---|---|---|---|---|---|---|---|
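The ROUGE and BERTScore columns are reference-based summary metrics. The sketch below shows how they are commonly computed with the `rouge-score` and `bert-score` packages, which may differ from the benchmark's own evaluation setup; the example texts are invented, and the CR, A, and F columns are not reproduced here.

```python
# Sketch of the ROUGE and BERTScore columns using common open-source
# packages (not necessarily the SEAMuS evaluation scripts).
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Rebels attacked a government checkpoint in Aleppo on Monday."
prediction = "A rebel group attacked a checkpoint in Aleppo."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, prediction)
print({name: round(s.fmeasure, 3) for name, s in rouge.items()})

# BERTScore compares contextual embeddings; the first call downloads a model.
P, R, F1 = bert_score([prediction], [reference], lang="en")
print(f"BERTScore F1 = {F1.mean().item():.3f}")
```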