A Textual Analysis of US Corporate Social Responsibility Reports

Peter Clarkson, Jordan Ponn*, Gordon Richardson#, Frank Rudzicz%, Albert Tsang&, and Jingjing Wang#

UQ Business School, University of Queensland and the Beedie School of Business, Simon Fraser University

* Department of Computer Science, University of Toronto

# Rotman School of Management, University of Toronto, Toronto, ON

% Li Ka Shing Knowledge Institute, St Michael’s Hospital and Surgical Safety Technologies Incorporated and Department of Computer Science, University of Toronto and Vector Institute for Artificial Intelligence

& School of Accounting and Finance, Hong Kong Polytechnic University

The flourishing of investor research services that measure firms’ performance on corporate social responsibility (CSR) testifies to the importance investors and stakeholders place on that performance. In response to stakeholder demand, companies are increasingly issuing voluntary CSR disclosures. But these voluntary disclosures are subject to only limited regulatory guidance and oversight, so how much trust can we place in them? And do professional assessments of CSR performance truly consider all the information available – not just what is intentionally disclosed, but what CSR reports unintentionally reveal through the style and quantity of their disclosure?


The authors started by using machine-learning techniques to train a model to recognize features of CSR reports from firms whose performance was already known to be good or poor. This let evidence, rather than the researchers’ preconceptions, drive the analysis of which features indicated a good or poor “performance type”. “We let the predictive model tell us which features are useful in predicting CSR performance type, rather than imposing ex-ante assumptions,” they wrote.
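To make the idea of letting labelled data pick the useful features more concrete, here is a minimal sketch in Python. It is not the authors' model, which this summary does not describe. It assumes a simple filter-style ranking: each feature is scored by the gap between the "good" and "poor" group means, scaled by the feature's overall spread. The function name `rank_features`, the feature name `avg_word_len`, and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def rank_features(rows, labels):
    """Rank feature names by the standardized gap between group means.

    Illustrative stand-in only; the study's actual model is not
    described in this summary.
    """
    scores = {}
    for name in rows[0]:
        good = [r[name] for r, y in zip(rows, labels) if y == "good"]
        poor = [r[name] for r, y in zip(rows, labels) if y == "poor"]
        # Fall back to 1.0 when every value is identical (zero spread).
        spread = pstdev(good + poor) or 1.0
        scores[name] = abs(mean(good) - mean(poor)) / spread
    # Highest score first: the feature that best separates the groups.
    return sorted(scores, key=scores.get, reverse=True)


# Made-up toy data: one dict of features per report.
rows = [
    {"n_words": 9000, "avg_word_len": 5.1},
    {"n_words": 8500, "avg_word_len": 4.9},
    {"n_words": 2000, "avg_word_len": 5.0},
    {"n_words": 2500, "avg_word_len": 5.2},
]
labels = ["good", "good", "poor", "poor"]

ranking = rank_features(rows, labels)
print(ranking)
```

In this toy data the report length separates the two groups far more cleanly than the average word length, so it ranks first. A real analysis would use many more reports and a held-out test set.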


They then tested 466 commonly analyzed linguistic features and found that two simple measures – the number of words and the number of sentences in a report – were 81% accurate in predicting whether a firm had good or poor CSR performance.
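The two length measures are easy to compute. The sketch below shows one naive way to do so in Python, assuming that words are runs of letters, digits, or apostrophes and that sentences end with a period, exclamation mark, or question mark followed by whitespace. The summary does not say how the authors tokenized their reports, so the function `length_features` and its rules are illustrative assumptions.

```python
import re


def length_features(text: str) -> dict:
    """Return naive word and sentence counts for a report's text."""
    # Words: runs of letters, digits, or apostrophes (an assumption).
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Sentences: split after ., !, or ? when followed by whitespace.
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [p for p in pieces if p]
    return {"n_words": len(words), "n_sentences": len(sentences)}


sample = (
    "We cut emissions by ten percent. "
    "Our suppliers signed a new code of conduct! "
    "What comes next?"
)
features = length_features(sample)
print(features)
```

Abbreviations such as "U.S." or "Inc." would be miscounted as sentence boundaries by this rule, so a production pipeline would normally rely on a dedicated sentence tokenizer.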


Other stylistic features of CSR reports also indicate good or poor performance. By expanding their considerations to the top 50 linguistic features that their modelling found were associated with CSR performance, the researchers were able to predict good or poor performance with 96% accuracy: an increase of 15 percentage points over looking at length alone. They observed that “good CSR performers are generally more advanced in their writing”, “more sociable, friendly and cooperative”, and exhibit features suggesting greater ambition, achievement, and level of sophistication, consistent with their proactive CSR strategies.


The researchers also confirmed that linguistic analysis offers “incremental value” to investors when incorporated into valuation models, above what is offered by the measures of CSR performance used by ASSET4, an investment research firm that specializes in assessing environmental, social and governance (ESG) factors. Use of linguistic analysis would “expand the information set that can be used by professional investor/rating services such as ASSET4, KLD, MSCI and Trucost, and indeed by public company investors and CSR stakeholders”.


Financial analysts and investors could also use this kind of analysis to assess CSR disclosures when valuing private firms and initial public offerings whose CSR performance is not covered by investor research services. The valuation analysis also yielded an interesting finding: “good CSR performers could potentially enhance their valuation premium arising from proactive CSR strategies by using plain English in their CSR reports.” The reason is that while longer reports are associated with higher valuations, complex reports are associated with lower ones. This result reinforces findings from other studies on readability in financial reporting, which suggest that complex statements increase perceived uncertainty. Conversely, by issuing CSR reports that are simple and easy to understand, firms could inspire greater confidence among investors and stakeholders.


The full paper can be found at