Linguistic Metrics for Patent Disclosure: Evidence from University Versus Corporate Patents
This paper proposes a novel approach to measuring disclosure in patent applications using algorithms from computational linguistics. Borrowing methods from the literature on second language acquisition, we analyze core linguistic features of 40,949 U.S. applications in three patent categories related to nanotechnology, batteries, and electricity from 2000 to 2019. We expect universities to disclose their inventions more fully than corporations, either because their incentives differ or because patent attorneys can draw on different source documents. Relying on this expectation, we confirm the relevance and usefulness of the linguistic measures by showing that university patents are more readable. Combining the multiple measures using principal component analysis, we find that the gap in disclosure is 0.4 standard deviations, with a wider gap among top applicants. Our results do not change after accounting for the heterogeneity of inventions by controlling for cited-patent fixed effects. We also explore whether one pathway by which corporate patents become less readable is the use of multiple examples to mask the “best mode” of inventions. By confirming that computational linguistic measures are useful indicators of the readability of patents, we suggest that the disclosure function of patents can be explored empirically in a way that has not previously been feasible.
This research uses data from the Lens (https://www.lens.org/), and we are grateful to Richard Jefferson and Aaron Ballagh for their constructive comments and data support. For thoughtful discussions, we thank Lesley Millar-Nicholson (Director at Technology Licensing Office, MIT) and Timothy Oyer (President at Wolf Greenfield Intellectual Property Law), as well as seminar participants at Queensland University of Technology. We thank Hamish Macintosh and Yi Wang for their technical support. This research is funded by Australian Research Council Discovery Grant DP180103856. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.