[MGMT] Look through the website’s glasses: using big data for classifying and describing Italian innovative SMEs

Innovation is one of the main drivers of long-run economic growth. Defining and measuring innovative effort and output at the firm level is of utmost importance for the implementation, monitoring and evaluation of a vast array of industrial, innovation and technology policies. However, measuring innovation accurately and in a timely way is difficult, because traditional indicators rely either on companies’ R&D expenditures, which are often unreliable, or on survey data, which are costly and time-consuming to collect.

This project aims to develop an alternative way of measuring and monitoring innovation based on firm websites, building new indicators that could complement existing ones, overcome their weaknesses, and possibly uncover new classifications of innovative businesses.

We study Italian SMEs through balance-sheet data from Bureau van Dijk’s Orbis (offline data) and information from their own websites (online data). By scraping corporate websites, we collect several features, including text content (as a bag of words) and technology and complexity (as the underlying HTML code, the presence of tags, words or hyperlinks, etc.). Using machine learning techniques, the study aims to uncover new patterns in the data, and its results will be benchmarked against conventional indicators.
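As an illustration of the kind of per-page features mentioned above, the following is a minimal sketch that turns one already-downloaded HTML page into a bag of words, tag counts and a hyperlink count. It assumes the raw HTML has been fetched separately and uses only Python’s standard library (`html.parser`, `collections.Counter`, `re`). The class name `WebsiteFeatureParser`, the function `extract_features` and the keyword list are hypothetical and are not the project’s actual pipeline.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class WebsiteFeatureParser(HTMLParser):
    """Hypothetical helper: collects visible text, tag counts and links from one page."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()  # how often each HTML tag appears
        self.links = []              # href values of <a> tags
        self._text_chunks = []       # visible text fragments
        self._skip_depth = 0         # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that a visitor would see, not code or CSS.
        if self._skip_depth == 0:
            self._text_chunks.append(data)

    @property
    def text(self):
        return " ".join(self._text_chunks)


def extract_features(html, keywords=("innovazione", "innovation", "ricerca", "research")):
    """Hypothetical feature extractor; the default keyword list is illustrative only."""
    parser = WebsiteFeatureParser()
    parser.feed(html)
    parser.close()
    # Unicode letters only (handles accented Italian words), lowercased.
    tokens = re.findall(r"[^\W\d_]+", parser.text.lower())
    bag_of_words = Counter(tokens)
    return {
        "bag_of_words": bag_of_words,                      # text content
        "n_tags": sum(parser.tag_counts.values()),         # page complexity proxy
        "distinct_tags": len(parser.tag_counts),
        "n_links": len(parser.links),
        "keyword_hits": {k: bag_of_words[k] for k in keywords},
    }


# Usage on a tiny made-up page.
sample = """<html><head><style>p {color: red}</style></head>
<body><h1>Innovazione</h1><p>Ricerca e innovazione per le PMI.</p>
<a href="/brevetti">Brevetti</a><script>var x = 1;</script></body></html>"""
features = extract_features(sample)
print(features["keyword_hits"], features["n_tags"], features["n_links"])
```

In a real setting, one such feature dictionary per firm would be joined with its Orbis balance-sheet record before any machine learning step; tag counts are only one possible proxy for website complexity.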