arXiv:2501.02031v1 Announce Type: new
Abstract: As the impact of global climate change intensifies, corporate carbon emissions have become a focal point of global attention. In response to issues such as the lag in climate change knowledge updates within large language models, the lack of specialization and accuracy in traditional augmented generation architectures for complex problems, and the high cost and time consumption of sustainability report analysis, this paper proposes CarbonChat: Large Language Model-based corporate carbon emission analysis and climate knowledge Q&A system, aimed at achieving precise carbon emission analysis and policy understanding.First, a diversified index module construction method is proposed to handle the segmentation of rule-based and long-text documents, as well as the extraction of structured data, thereby optimizing the parsing of key information.Second, an enhanced self-prompt retrieval-augmented generation architecture is designed, integrating intent recognition, structured reasoning chains, hybrid retrieval, and Text2SQL, improving the efficiency of semantic understanding and query conversion.Next, based on the greenhouse gas accounting framework, 14 dimensions are established for carbon emission analysis, enabling report summarization, relevance evaluation, and customized responses.Finally, through a multi-layer chunking mechanism, timestamps, and hallucination detection features, the accuracy and verifiability of the analysis results are ensured, reducing hallucination rates and enhancing the precision of the responses.
Source link
lol