Text and Image Mining Methods for Business Research and Education
December 18, 2020
Many consumer-firm interactions have moved from offline to online, transforming platforms and marketplaces such as Google, Amazon, and Facebook into crucial touchpoints of the customer journey. Nearly 70% of consumers rely on social media to resolve customer service issues (Ahmed, 2017), and 58% of them read reviews before choosing a restaurant (Gatherup, 2018). In addition, firms actively connect with consumers by starting conversations (e.g., tweeting about a live event) or responding to online consumer complaints. Image-driven social media platforms (e.g., Instagram) are gaining unprecedented relevance: 71% of businesses use them to promote services, and 75% of consumers engage with those promotions (Clarke, 2019). Along with the increase in digital interactions, an unprecedented amount of unstructured data, mainly text and images, has disrupted business research.
Unlike structured data (i.e., numbers), which has been the main source of information that businesses have analyzed over the past decades, unstructured data arises when customers tell stories to connect with other customers and/or firms, rather than from solicited customer feedback surveys predefined by organizations. Unveiling insights such as brand sentiment and audience segmentation from these text- and image-driven stories requires methods and tools from natural language processing, computer vision, and machine learning. The present article describes the state of the art in the text and image mining fields, discusses the key objectives and methods for business research, and explains the implications for business education.
Over the last five years, the usability and popularity of text and image analytics (aka text and image mining) methods have grown exponentially. In fact, recent business literature includes several step-by-step guides in marketing (Berger et al. 2020), management (McKenny et al. 2018), retailing (Dekimpe 2020), and business in general (Schwenzow et al. 2020) that should help researchers use unstructured data for business insight. In addition, the global text mining market is expected to grow at a rate of 18.1%, from 4.75 billion in 2019 to 16.85 billion in 2027 (Reports and Data 2020), and the global image recognition (aka image mining) market was valued at USD 27.3 billion in 2019 and is expected to register growth of 18.8% from 2020 to 2027 (Reports and Data 2020).
To keep pace with these changes, it is important to understand the key objectives and methods of text and image mining. We can identify three main objectives (Villarroel Ordenes and Zhang 2020). The first is the operationalization of an observed or predefined construct. In this case, a manager might be interested in the consumer sentiment or trust derived from words and images in social media. The second is the identification of unobserved constructs or clusters. Examples include managers who want to learn the most relevant topics discussed in online reviews (e.g., hotel features), or who want to identify different types of service experience (e.g., tangible vs. experiential) by clustering consumer images in social media. The third objective is the identification of relationships among features in text or images. Researchers might be interested in the words that most frequently co-occur with a given brand (e.g., a car brand), or in uncovering relations between the text and images that brands use in social media.
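As a minimal illustration of the third objective, the sketch below counts which words appear most often in the same post as a brand name. The posts, the brand name `acme`, the stopword list, and the helper `cooccurring_words` are all hypothetical and invented for this example; a real study would use a proper tokenizer and a much larger corpus.

```python
import re
from collections import Counter


def cooccurring_words(posts, brand, stopwords=frozenset()):
    """Count words that appear in the same post as `brand` (case-insensitive)."""
    brand = brand.lower()
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z']+", post.lower())
        if brand in tokens:
            # Count each word once per post so that long posts do not dominate.
            counts.update(set(tokens) - {brand} - stopwords)
    return counts


# Hypothetical social media posts (illustrative data only).
posts = [
    "My Acme car is reliable and cheap",
    "Acme service was slow but the car is reliable",
    "Loved the food at the new place",
]
stop = frozenset({"my", "is", "and", "was", "but", "the", "at"})

counts = cooccurring_words(posts, "Acme", stop)
print(counts.most_common(2))
```

Posts that do not mention the brand are ignored, so the resulting counts describe only the vocabulary surrounding that brand.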
Each of the aforementioned objectives can be accomplished by one method or a combination of methods. Consumer sentiment can be measured with lexicon-based approaches, machine learning, deep learning, ensembles, and transformers (Heitmann et al. 2020). There are also several methods for identifying topics in customer reviews, such as Latent Dirichlet Allocation, Correlated Topic Models, and Structural Topic Models (Grewal et al. 2020), each making different assumptions that might improve model fit in a business context. A broad range of methods is likewise available for identifying relations between entities or constructs. For example, to evaluate the similarity between brands, researchers could use cosine or word-embedding distance measures between the words used to describe those brands (Netzer et al. 2012). In image mining, applications such as Amazon Rekognition can identify objects or actions in images with high accuracy (e.g., humans, smiles, logos). In addition, advances in deep learning and neural networks have contributed to the development of customized algorithms that identify the image types or motifs a researcher might be interested in (e.g., rugged brands; Liu, Dzyabura, and Mizik 2020).
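To make the brand-similarity example concrete, here is a minimal sketch of cosine similarity computed on simple word counts. The brand names and descriptive words are invented for illustration, and a bag-of-words representation is an assumption of this sketch, not necessarily the representation used in the cited studies.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical words consumers use to describe three brands (illustrative only).
brand_words = {
    "brand_a": Counter("reliable cheap reliable family".split()),
    "brand_b": Counter("reliable family safe".split()),
    "brand_c": Counter("fast luxury sporty".split()),
}

sim_ab = cosine_similarity(brand_words["brand_a"], brand_words["brand_b"])
sim_ac = cosine_similarity(brand_words["brand_a"], brand_words["brand_c"])
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

Brands described with overlapping vocabulary score closer to 1, while brands with no shared words score 0. Word-embedding distances follow the same logic but replace raw counts with dense vectors, so that related words (e.g., "cheap" and "affordable") also contribute to similarity.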
These developments in text and image mining methods have created a pressing need to cover them in business analytics curricula. Such courses face the challenge of balancing advanced technical capabilities in programming (e.g., R, Python, KNIME) and statistics with empirical demonstrations based on real business data. A particular difficulty in achieving this balance is implementing applied projects that stimulate students to find the right business questions (exploratory, causal, or predictive?), suitable text and image data to answer these questions (e.g., social media, brand forums, or online reviews?), and the most efficient set of methods (e.g., supervised learning, unsupervised learning, or both?) and visualizations (e.g., which type of graph or figure?) to provide business insight. Without a curriculum that promotes student development in these three areas, future managers are likely to struggle to cut through the clutter of Big Data.