
Gaku Morio
Research & Development Division
Hitachi America, Ltd.
Introduction
Industrial activities, encompassing construction, production, transportation, and operations, are major contributors to climate change. Addressing this global issue requires transparent corporate pledges and genuine efforts aligned with global and local climate policies and goals. Some corporations, however, whether intentionally or not, fall short of meaningful action and sometimes project misleading impressions of their climate change commitments. Policymakers and investors, who seek to regulate or support corporations, are increasingly looking for objective, evidence-based insights to guide their decisions. The current landscape of corporate messaging and activities on climate change, however, is opaque due to the sheer volume of data (including corporate reports, promotional material, and environmental databases) and the limited resources available for comprehensive analysis.
Given the promising capabilities of artificial intelligence (AI), and in particular natural language processing (NLP), in analyzing documents, can these challenges be addressed? In this article, I'd like to outline how NLP can be used to analyze corporate messages and activities related to climate change, drawing on several case studies from my research as a visiting scholar at Stanford University.
Benchmarking NLP models for climate policy engagement
To analyze corporate messaging on climate change, we first need to understand how companies engage with climate policies at a macro level. Evidence of corporate climate policy engagement is typically found in lengthy reports and legal disclosures. To address the lack of structured methods for analyzing such data, a co-author at Stanford and I constructed a benchmark dataset comprising over 10,000 corporate-related documents [1], using data from LobbyMap [2], a platform that categorizes corporate stances on climate policies. To process the documents, we developed a structured extraction pipeline using PDF parsing with optical character recognition (OCR) and sentence alignment techniques. This pipeline allowed us to label each document with 13 climate policy categories (e.g., renewable energy, carbon pricing, emissions regulations), apply a five-point stance scale (ranging from strongly supporting to opposing), and record pointers to the evidence pages within the document.
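As a rough illustration of the kind of record the pipeline produces, the sketch below shows how a labeled document might be represented in Python. The field names, stance values, and example contents are simplified assumptions for illustration and do not reproduce the exact benchmark schema.

```python
from dataclasses import dataclass, field
from typing import List

# Five-point stance scale, roughly following the article's description
# (the value names here are illustrative, not the benchmark's exact labels).
STANCES = [
    "strongly_supporting",
    "supporting",
    "no_position_or_mixed",
    "not_supporting",
    "opposing",
]

@dataclass
class LabeledDocument:
    """One corporate document labeled for climate policy engagement."""
    doc_id: str
    company: str
    policy_category: str          # one of the 13 categories, e.g. "carbon pricing"
    stance: str                   # one of STANCES
    evidence_pages: List[int] = field(default_factory=list)  # pages supporting the label
    page_texts: List[str] = field(default_factory=list)      # OCR'd text per page

# Example record (contents are made up for illustration).
example = LabeledDocument(
    doc_id="doc-0001",
    company="Example Energy Corp.",
    policy_category="carbon pricing",
    stance="not_supporting",
    evidence_pages=[4, 17],
)
```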

We evaluated the performance of several pre-trained language models within our proposed pipeline (Figure 1). Among them, Longformer-large [3] achieved an F1-score of 53.6% for evidence page detection and 31.5% for stance classification. While these results are promising, they also highlight the challenges current models face in capturing nuanced messaging within lengthy and complex documents. Nevertheless, this work marks a major step forward in leveraging NLP models to detect corporate stances on climate policy.
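For readers who want to experiment with long-document models such as Longformer, the minimal sketch below loads a pre-trained Longformer-large checkpoint for a five-class stance task with the Hugging Face transformers library. It is a generic starting point, not our exact benchmark setup, and the input text is an assumption for illustration.

```python
# Minimal sketch: Longformer-large with a five-class classification head
# (generic starting point, not the exact benchmark training setup).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "allenai/longformer-large-4096"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=5,  # five-point stance scale
)

# Longformer accepts long inputs, up to 4,096 tokens per document.
document_text = "Example page text extracted from a corporate report..."
inputs = tokenizer(
    document_text,
    truncation=True,
    max_length=4096,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits
predicted_stance = int(torch.argmax(logits, dim=-1))
print(predicted_stance)  # index into the five-point stance scale
```

In practice, the newly initialized classification head would need to be fine-tuned on labeled documents before its predictions are meaningful.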
Extracting key details from sustainability reports using ReportParse
Analyzing corporate engagement with climate policy provides a high-level overview, and the next logical step is to extract detailed information on climate commitments and actions. Sustainability, ESG, and CSR reports often contain this information; however, these documents can be lengthy (sometimes exceeding 100 pages) and often have unstructured, complex layouts, which makes it challenging for researchers to efficiently access and analyze the relevant data.

To address this, my colleagues and I developed ReportParse [4], a Python-based tool designed to automate the extraction of sustainability-related information by integrating document layout analysis with NLP models (Figure 2).
ReportParse enables researchers to apply customized text extraction or layout parsing methods to process reports. The tool organizes the extracted content into structured elements such as titles, paragraphs, and tables, while applying third-party NLP models to label key information (e.g., emissions targets and climate risk disclosures). This enables detailed analysis of corporate climate-related commitments and activities.
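To give a flavor of this kind of workflow, the sketch below combines generic PDF text extraction with a Hugging Face text-classification pipeline to label paragraphs from a report. It illustrates the general pattern that ReportParse automates rather than its actual API, and the classifier name is a placeholder that would need to be replaced with a real sustainability-domain model.

```python
# Illustrative sketch of the general pattern ReportParse automates:
# extract text elements from a report, then label them with an NLP model.
# This does NOT use ReportParse's actual API; the model name below is a
# placeholder assumption.
from pdfminer.high_level import extract_text
from transformers import pipeline

def extract_paragraphs(pdf_path: str) -> list[str]:
    """Very rough paragraph segmentation from raw PDF text."""
    raw = extract_text(pdf_path)
    return [p.strip() for p in raw.split("\n\n") if len(p.strip()) > 50]

# Placeholder model name; substitute a real sustainability-domain classifier.
classifier = pipeline("text-classification", model="some-org/esg-topic-classifier")

paragraphs = extract_paragraphs("sustainability_report_2023.pdf")
for para in paragraphs[:20]:
    result = classifier(para, truncation=True)[0]
    print(result["label"], round(result["score"], 3), para[:80])
```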
Identifying misleading climate narratives in social media advertising
While corporate reports provide valuable information, they represent only part of the picture, and a more comprehensive analysis is possible. Beyond formal reports, companies can shape public perception, for example through social media messaging, and in sectors such as oil and gas this may include misleading narratives intended to maintain support for the industry.
To explore whether NLP models or large language models (LLMs) can automatically classify corporate advertisements, we used an existing dataset [6] containing manually labelled ads from fossil fuel companies. This dataset identifies seven types of climate obstruction narratives, such as “Clean gas as a climate solution,” “Energy independence through domestic oil,” and “Fossil fuels create jobs.”

My colleagues and I fine-tuned RoBERTa-large [7] on the dataset, achieving an F1-score of 71.4% [5].
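As a rough sketch of this kind of fine-tuning, the snippet below sets up RoBERTa-large for a seven-class narrative classification task with the Hugging Face Trainer. It is a generic recipe rather than our exact experimental configuration, and the training texts, labels, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: fine-tuning RoBERTa-large for seven-class narrative
# classification with Hugging Face (generic recipe, not our exact setup).
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL_NAME = "roberta-large"
NUM_NARRATIVES = 7  # e.g. "Clean gas as a climate solution", ...

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_NARRATIVES
)

# Toy training data standing in for the labeled ad texts.
train_ds = Dataset.from_dict({
    "text": ["Natural gas powers a cleaner tomorrow.",
             "Domestic oil keeps our energy independent."],
    "label": [0, 1],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_ds = train_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="narrative-clf",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=1e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  tokenizer=tokenizer)
trainer.train()
```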
To further explore the potential of LLMs, we tested GPT-4 Turbo as an annotator (Table 1).

We found that the model trained on the GPT-annotated data with only 30 human-annotated examples outperformed RoBERTa-large trained on the full human annotations, which was quite remarkable. This finding highlights the potential of LLMs to significantly reduce annotation costs and enable efficient large-scale monitoring of climate obstruction narratives in corporate advertisements on social media platforms.
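The snippet below sketches how an LLM can be used as an annotator for this task through the OpenAI chat completions API: each ad text is sent with a prompt listing the candidate narratives and asking for a single label. The prompt wording, narrative list, and parsing are simplified assumptions for illustration, not our exact annotation protocol.

```python
# Sketch of LLM-based annotation via the OpenAI API (simplified; the prompt
# and narrative list are illustrative, not our exact annotation protocol).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

NARRATIVES = [
    "Clean gas as a climate solution",
    "Energy independence through domestic oil",
    "Fossil fuels create jobs",
    # ... remaining narrative types from the dataset
]

def annotate_ad(ad_text: str) -> str:
    prompt = (
        "You label fossil fuel company advertisements with one climate "
        "obstruction narrative. Choose exactly one label from this list:\n"
        + "\n".join(f"- {n}" for n in NARRATIVES)
        + f"\n\nAdvertisement:\n{ad_text}\n\nLabel:"
    )
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(annotate_ad("Our natural gas projects are building a cleaner future."))
```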
Conclusion
As ESG investment and green bonds gain momentum, most sectors are being called on to clearly indicate their positions and actions on climate change. Companies are producing an increasing volume of climate-related disclosures across various channels. At the same time, stakeholders such as consumers, banks, governments, and auditors want access to transparent information and objective data.
In this article, I discussed how NLP can be used to investigate corporate engagement with climate policies, commitments, and advertising strategies. While I recognize that many challenges remain, this research marks an important step towards the systematic and fair evaluation of corporate climate actions. Moving forward, my research colleagues and I aim to expand this work by collecting more corporate climate-related messages and data in a comprehensive and reliable way, and to develop technology and use cases in other areas of sustainability beyond climate change.
Acknowledgements
I would like to thank my research colleagues at Stanford with whom this collaborative research was conducted. I would also like to acknowledge and thank the National Institute of Advanced Industrial Science and Technology (AIST) for the use of their computational resource, the AI Bridging Cloud Infrastructure (ABCI), in the experiments.
References
[1] G. Morio and C.D. Manning. 2023. An NLP Benchmark Dataset for Assessing Corporate Climate Policy Engagement. NeurIPS 2023 Datasets and Benchmarks Track.
[2] http://lobbymap.org/
[3] I. Beltagy, M.E. Peters, and A. Cohan. 2020. Longformer: The long-document transformer. CoRR, abs/2004.05150.
[4] G. Morio, S.Y. In, J. Yoon, H. Rowlands, and C.D. Manning. 2024. ReportParse: A Unified NLP Tool for Extracting Document Structure and Semantics of Corporate Sustainability Reporting. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24).
[5] H. Rowlands, G. Morio, D. Tanner, and C.D. Manning. 2024. Predicting Narratives of Climate Obstruction in Social Media Advertising. In Findings of the Association for Computational Linguistics: ACL 2024.
[6] F. Holder, S. Mirza, N.N. Lee, J. Carbone, and R.E. McKie. 2023. Climate obstruction and Facebook advertising: how a sample of climate obstruction organizations use social media to disseminate discourses of delay. Climatic Change, 176(2):16.
[7] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.