About

CFinBench is a comprehensive evaluation benchmark designed to assess the financial knowledge of large language models (LLMs) in a Chinese context. The benchmark is structured around four primary categories: Financial Subject, Financial Qualification, Financial Practice, and Financial Law. These categories respectively test whether models grasp basic financial knowledge, can pass the necessary financial certification exams, can fulfill practical financial roles, and comply with financial laws and regulations. CFinBench comprises 99,100 questions spanning 43 subcategories and three question types: single-choice, multiple-choice, and judgment.
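To make the taxonomy and question types concrete, here is a minimal Python sketch of how a benchmark item might be represented and scored. The field names and the exact-match scoring rule are illustrative assumptions, not the schema of the released dataset.

```python
from dataclasses import dataclass

# Hypothetical record layout; the released dataset's actual schema may differ.
@dataclass
class Question:
    category: str      # one of: Financial Subject, Financial Qualification,
                       # Financial Practice, Financial Law
    subcategory: str   # one of the 43 subcategories
    question_type: str # "single-choice", "multiple-choice", or "judgment"
    prompt: str
    answer: set[str]   # gold option labels, e.g. {"A"} or {"A", "C"}

def is_correct(question: Question, prediction: set[str]) -> bool:
    """Exact-match scoring: multiple-choice requires all and only the gold
    options; single-choice and judgment reduce to the same comparison."""
    return prediction == question.answer

# Example usage with a made-up judgment question.
q = Question(
    category="Financial Law",
    subcategory="Securities Law",
    question_type="judgment",
    prompt="Insider trading is prohibited under Chinese securities law. (True/False)",
    answer={"True"},
)
print(is_correct(q, {"True"}))  # True
```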

Fifty representative LLMs, including GPT4 and several Chinese-oriented models, were evaluated on the benchmark. GPT4 and some Chinese-oriented models lead the evaluation, but the highest recorded average accuracy is only 60.16%, underscoring how challenging CFinBench is. All data and evaluation code have been released; see the Announcement below.
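As a sketch of how an average-accuracy figure like the 60.16% above could be aggregated from per-category results, consider the snippet below. Whether the paper macro-averages over categories (as here) or micro-averages over all questions is an assumption, and the numbers are made up for illustration.

```python
def average_accuracy(results: dict[str, tuple[int, int]]) -> float:
    """Macro-average over categories; each value is (num_correct, num_total)."""
    per_category = [correct / total for correct, total in results.values()]
    return sum(per_category) / len(per_category)

# Made-up counts for illustration only (not results from the paper).
print(average_accuracy({
    "Financial Subject": (700, 1000),
    "Financial Qualification": (600, 1000),
    "Financial Practice": (550, 1000),
    "Financial Law": (556, 1000),
}))  # ~0.6015
```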

Announcement

2024/07/06 The paper is available on arXiv (arXiv:2407.02301).
2024/06/20 The dataset has been released.
2024/06/16 The evaluation code has been open-sourced.
2024/06/12 All data and evaluation code are coming soon.

Citation

@article{nie2024cfinbench,
  title={CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models},
  author={Nie, Ying and Yan, Binwei and Guo, Tianyu and Liu, Hao and Wang, Haoyu and He, Wei and Zheng, Binfan and Wang, Weihao and Li, Qiang and Sun, Weijian and others},
  journal={arXiv preprint arXiv:2407.02301},
  year={2024}
}