Dataset Benchmarks
Compare state-of-the-art model performance on FAMuS and SEAMuS tasks
Source Validation Performance (FAMuS)
Binary classification: Does the source document describe the same event as the report?
| Model | Setting | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
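The accuracy, precision, recall, and F1 columns follow the standard binary-classification definitions, treating "same event" as the positive class. Below is a minimal scoring sketch, assuming gold and predicted labels are encoded as 0/1 and using scikit-learn rather than the official FAMuS evaluation script; the label values are invented for illustration.

```python
# Minimal sketch of source-validation scoring (illustrative labels,
# not the official FAMuS evaluation script).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

gold = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical gold labels (1 = same event)
pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model predictions

accuracy = accuracy_score(gold, pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    gold, pred, average="binary", pos_label=1
)
print(f"Accuracy={accuracy:.3f}  P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}")
```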
Argument Extraction Performance (FAMuS)
Cross-document argument extraction from report and source documents
| Model | Setting | Precision | Recall | F1 |
|---|---|---|---|---|
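Argument extraction is scored against the gold role annotations. The sketch below illustrates set-style precision, recall, and F1 over (role, span) pairs under an exact-match assumption; the official FAMuS evaluation may use a different matching criterion, and the event instance shown is invented.

```python
# Simplified argument-extraction scorer: each prediction and gold
# annotation is a (role, argument_span_text) pair, and credit requires
# an exact match on both. Illustrative only.
from collections import Counter

def score_arguments(predicted, gold):
    pred_counts, gold_counts = Counter(predicted), Counter(gold)
    # Count each (role, span) pair at most as often as it appears in gold.
    matched = sum(min(c, gold_counts[item]) for item, c in pred_counts.items())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical example for a single event instance.
predicted = [("Attacker", "the rebels"), ("Place", "Aleppo")]
gold = [("Attacker", "the rebels"), ("Victim", "civilians"), ("Place", "Aleppo")]
print(score_arguments(predicted, gold))  # (1.0, 0.667, 0.8)
```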
Report and Cross-Document Summarization (SEAMuS)
Event-centric summarization performance across multiple evaluation metrics
| Model | Setting | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore | CR | A | F |
|---|---|---|---|---|---|---|---|---|
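The ROUGE and BERTScore columns are reference-based summary metrics. The sketch below shows how they are commonly computed with the `rouge-score` and `bert-score` packages, which may differ from the benchmark's own evaluation setup; the example texts are invented, and the CR, A, and F columns are not reproduced here.

```python
# Sketch of the ROUGE and BERTScore columns using common open-source
# packages (not necessarily the SEAMuS evaluation scripts).
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Rebels attacked a government checkpoint in Aleppo on Monday."
prediction = "A rebel group attacked a checkpoint in Aleppo."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, prediction)
print({name: round(s.fmeasure, 3) for name, s in rouge.items()})

# BERTScore compares contextual embeddings; the first call downloads a model.
P, R, F1 = bert_score([prediction], [reference], lang="en")
print(f"BERTScore F1 = {F1.mean().item():.3f}")
```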