
Dataset Benchmarks

Compare state-of-the-art model performance on FAMuS and SEAMuS tasks



Source Validation Performance

Binary classification: Does the source document describe the same event as the report?

Columns: Model | Setting | Accuracy | Precision | Recall | F1
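For reference, the binary-classification metrics in this table can be computed with standard tooling. The sketch below is illustrative only: it assumes scikit-learn and made-up gold/predicted labels, not the benchmark's official evaluation script.

```python
# Minimal sketch: source-validation scoring as binary classification.
# Labels are hypothetical; 1 = the source describes the same event as the report.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

gold = [1, 0, 1, 1, 0]   # hypothetical gold labels
pred = [1, 0, 0, 1, 1]   # hypothetical model predictions

accuracy = accuracy_score(gold, pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    gold, pred, average="binary", pos_label=1
)
print(f"Acc={accuracy:.3f}  P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}")
```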

Argument Extraction Performance

Cross-document argument extraction from report and source documents

Columns: Model | Setting | Precision | Recall | F1
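The argument-level scores follow the usual precision/recall/F1 pattern over predicted versus gold arguments. The sketch below is a simplified exact-match version over hypothetical (role, span) pairs; the official benchmark scorer may match arguments more leniently (e.g. by head word or coreference).

```python
# Simplified exact-match scorer for argument extraction, assuming each example's
# gold and predicted arguments are sets of (role, argument string) pairs.
def argument_prf(gold_args: set[tuple[str, str]], pred_args: set[tuple[str, str]]):
    matched = len(gold_args & pred_args)
    precision = matched / len(pred_args) if pred_args else 0.0
    recall = matched / len(gold_args) if gold_args else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical example: one exact match out of two predictions and two gold arguments.
gold = {("Perpetrator", "the militia"), ("Victim", "two journalists")}
pred = {("Perpetrator", "the militia"), ("Victim", "journalists")}
print(argument_prf(gold, pred))  # (0.5, 0.5, 0.5)
```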

Report / Cross-Document Summarization

Event-centric summarization performance across multiple evaluation metrics

Columns: Model | Setting | ROUGE-1 (R1) | ROUGE-2 (R2) | ROUGE-L (RL) | BERTScore (BS) | CR | A | F
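The surface metrics in this table (assuming R1/R2/RL denote ROUGE-1/2/L and BS denotes BERTScore) can be reproduced with the rouge_score and bert_score packages, as in the sketch below; the CR, A, and F columns are benchmark-specific and not covered here. The reference and prediction strings are made up for illustration.

```python
# Minimal sketch: ROUGE and BERTScore for one predicted event summary,
# assuming the rouge_score and bert_score packages are installed.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "A magnitude 6.2 earthquake struck central Italy, killing nearly 300 people."
prediction = "An earthquake in central Italy killed almost 300 people."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, prediction)
print({name: round(s.fmeasure, 3) for name, s in rouge.items()})

P, R, F1 = bert_score([prediction], [reference], lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```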