Data Science and Interactive Visualization Tools for the Analysis of Qualitative Evidence
- 1st Edition - July 1, 2025
- Author: Manuel González Canché
- Language: English
- Paperback ISBN: 978-0-443-21961-0
- eBook ISBN: 978-0-443-21960-3
Data Science and Interactive Visualization Tools for the Analysis of Qualitative Evidence empowers qualitative and mixed methods researchers in the data science movement by offering no-code, cost-free software access so that they can apply cutting-edge and innovative methods to synthesize qualitative data. The book builds on the idea that qualitative and mixed methods researchers should not have to learn to code to benefit from rigorous open-source, cost-free software that uses artificial intelligence, machine learning, and data visualization tools, just as people do not need to know C++ or TypeScript to benefit from Microsoft Word.
The real barrier is the hundreds of lines of R code required to apply these concepts to researchers' databases. By removing the coding-proficiency hurdle, this book empowers their research endeavors and helps them become active members of, and contributors to, the applied data science community. The book offers a comprehensive explanation of data science and machine learning methodologies, along with software tools to implement these techniques without any coding. It addresses the need for innovative tools that let researchers tap into the insights of cutting-edge data science with no computer-language literacy requirements.
- Provides access to no-code software that enables the dynamic analysis of knowledge production and democratizes access to data science tools
- Discusses analytic frameworks that overcome aggregation bias in text classification via machine learning
- Covers the integration of qualitative and quantitative methods in a fully equal-status mixed methods design
Computer science, data science, and data analysis researchers in academia and industry; researchers in advanced mixed methods, qualitative, and quantitative seminars for graduate students in the social sciences
Part I. Introduction to Data Science and Interactive Visualization Tools for the Analysis of Qualitative Evidence
1. Truly Equal-Status Mixed Methods Design (TESM2D)
1. Qual Versus Quant Is No Longer?
2. Truly Equal-Status Mixed Methods Design (TESM2D)
3. Relevance of Data Science and Interactive Visualizations in the Birth of TESM2D
1. How can data science and interactive visualizations be used to synthesize qualitative evidence?
2. Relevance of no-code or low-code analytic approaches for TESM2D
4. TESM2D and Its Connection with Data Science Democratization
2. Textual and Relational Data (TRD)
1. The Almost Overwhelming Availability of Textual and Relational Data (TRD)
2. Importance of Data Science Techniques to Make Sense of TRD
3. Integrating Spatial Contextualization in the Study of TRD
3. Digital Ethnography, Data Science, and Ethical Considerations
1. Digital Ethnography and Data Science for Qualitative Evidence Analyses
2. What Ethical Considerations Arise When Applying Data Science Tools to the Analysis of TRD?
- Guidelines for Informed Consent
- Repercussions for Institutional Review Boards
- Protecting Privacy and Confidentiality
1. Approaches to Analyzing TRD
2. Software Introduction for Tools to Analyze TRD
3. Description of Data Sources for Replication and Reproducibility Purposes
4. How to Read This Book and Its Standalone Chapters
5. Link Between Network Modeling and Relational Data
6. Link Between Text Classification and Textual Data
7. Integration of Spatial Context into Data Storytelling
Part II. Network Modeling Frameworks
This suite of frameworks relies on network analysis methods to retrieve the mathematical structure embedded in qualitative and relational data. These frameworks provide researchers with dynamic and/or interactive visualizations highlighting central topics and actors, as well as software tools designed to highlight the processes and contexts wherein qualitative evidence emerged, including hypothesis tests via Monte Carlo simulations.
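For readers curious about what a Monte Carlo hypothesis test on a network looks like in practice, the sketch below (plain Python with networkx and invented data, not the book's no-code software) compares an observed network's clustering against simulated random networks of the same size and density.

```python
# Minimal sketch, not the book's software: a Monte Carlo test of whether an
# observed network is more clustered than random networks of the same size.
import networkx as nx

# Hypothetical edge list: pairs of participants linked by shared coded responses
edges = [("Ana", "Ben"), ("Ben", "Carla"), ("Carla", "Ana"),
         ("Carla", "Dan"), ("Dan", "Eva"), ("Eva", "Ben")]
G = nx.Graph(edges)
observed = nx.average_clustering(G)

sims = []
for _ in range(1000):
    # Random graph with the same number of nodes and edges as the observed one
    R = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges())
    sims.append(nx.average_clustering(R))

# One-sided p-value: share of simulations at least as clustered as the observed network
p_value = sum(s >= observed for s in sims) / len(sims)
print(f"observed clustering = {observed:.3f}, Monte Carlo p = {p_value:.3f}")
```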
5. Network Analysis of Qualitative Data (NAQD)
NAQD analyzes the mathematical structure contained in participants’ coded responses (labels assigned to their transcribed responses), identifies influential actors and coded responses via centrality measures, and detects similar concerns based on group ascription or participants’ roles (via quadratic assignment correlations). Additionally, the procedure identifies participants’ network communities given their shared responses. Outputs include fully interactive HTML visualizations, downloadable databases with community ascription, and distributions of Monte Carlo simulations with likelihood quantiles.
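As a rough illustration of the kind of structure NAQD works with, here is a hedged Python sketch using networkx and hypothetical coded responses: it builds a two-mode participant-by-code network, computes degree centrality, and detects communities of participants who share responses. It is not the NAQD software, and the quadratic assignment correlations and Monte Carlo outputs are omitted.

```python
# Minimal sketch with hypothetical data; not the NAQD tool itself.
import networkx as nx
from networkx.algorithms import bipartite, community

# Participant -> codes (labels) assigned to their transcribed responses
coded = {"P1": ["belonging", "cost"], "P2": ["cost", "advising"],
         "P3": ["belonging", "advising"], "P4": ["cost"]}

B = nx.Graph()
B.add_nodes_from(coded, bipartite=0)                                      # participants
B.add_nodes_from({c for cs in coded.values() for c in cs}, bipartite=1)   # codes
B.add_edges_from((p, c) for p, cs in coded.items() for c in cs)

# Influential actors and coded responses via degree centrality
centrality = bipartite.degree_centrality(B, list(coded))
print(sorted(centrality.items(), key=lambda kv: -kv[1])[:3])

# Communities of participants based on shared responses (one-mode projection)
P = bipartite.weighted_projected_graph(B, list(coded))
for group in community.greedy_modularity_communities(P, weight="weight"):
    print(sorted(group))
```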
6. Graphical Retrieval and Analysis of Temporal Information Systems (GRATIS)
GRATIS analyzes the chronological/temporal evolution of information provided by research participants or retrieved from document analyses/essays in qualitative studies. GRATIS makes it possible to observe the simultaneous evolution of information across all research participants, even when data collection did not happen at the same time or in the same space. This analysis is achieved via global time stamps. The framework also identifies the relevance of actors and coded responses and renders dynamic, fully interactive HTML visualizations.
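One way to picture a shared timeline is the short pandas sketch below; this is an interpretation for illustration only (not the GRATIS implementation or its time-stamp scheme), in which utterances from sessions recorded on different days are re-indexed by elapsed minutes from each session's start so their evolution can be compared on one clock.

```python
# Illustrative sketch only; GRATIS and its global time stamps may work differently.
import pandas as pd

rows = [  # hypothetical coded utterances from two interviews held weeks apart
    {"session": "A", "time": "2024-03-01 10:00", "code": "cost"},
    {"session": "A", "time": "2024-03-01 10:12", "code": "advising"},
    {"session": "B", "time": "2024-04-15 14:03", "code": "cost"},
    {"session": "B", "time": "2024-04-15 14:20", "code": "belonging"},
]
df = pd.DataFrame(rows)
df["time"] = pd.to_datetime(df["time"])

# Elapsed minutes since each session's own start: a shared, comparable timeline
start = df.groupby("session")["time"].transform("min")
df["elapsed_min"] = (df["time"] - start).dt.total_seconds() / 60

print(df.sort_values("elapsed_min")[["session", "elapsed_min", "code"]])
```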
7. Visual Evolution, Replay, and Integration of Temporal Analytic Systems (VERITAS)
VERITAS analyzes the evolution of information in focus group interactions occurring in the same space (including virtual spaces such as video calls) and in real time. Analyses are separated into (a) evolution of message or information exchanges among participants (actor to actor) and (b) evolution of coded responses over time (actor to coded responses). This framework renders dynamic and interactive HTML visualizations.
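To make the replay idea concrete, the following hypothetical Python sketch (not VERITAS, which renders interactive HTML) builds cumulative snapshots of a speaker-to-addressee network from a time-stamped focus group transcript and reports who is driving the exchange at each cutoff.

```python
# Hypothetical sketch of cumulative actor-to-actor snapshots over session time.
import networkx as nx

exchanges = [  # (minute, speaker, addressee) from a coded focus group transcript
    (1, "Moderator", "P1"), (2, "P1", "P2"), (4, "P2", "P1"),
    (6, "P3", "Moderator"), (9, "P1", "P3"),
]

for cutoff in (3, 6, 9):  # snapshot times, in minutes into the session
    G = nx.DiGraph((s, a) for t, s, a in exchanges if t <= cutoff)
    out_deg = dict(G.out_degree())  # who has addressed others most so far
    print(f"t <= {cutoff} min: edges = {G.number_of_edges()}, out-degree = {out_deg}")
```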
8. Relational Frameworks for Data Mining and Data Retrieval via Co-authorship Networks (CN)
This chapter offers a framework to model relationships among units, which may be used as a map to detect influential and peripheral players. Specifically, we offer an example that takes data from SCOPUS to detect the most influential co-authors on the topic “ChatGPT.” This network may offer systematic and innovative approaches to conducting literature reviews efficiently by identifying central authors who may be collaborating across different communities of thought, as revealed by their publication records. Co-authorship networks (CN) may also be integrated with the results of Machine Driven Literature Classification (MDLC).
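A bare-bones version of this workflow can be sketched as follows; the file name, the "Authors" column, and the semicolon separator are assumptions about a Scopus-style CSV export, not a specification of the chapter's tool.

```python
# Assumption-laden sketch: build a co-authorship network from a CSV export
# whose "Authors" column lists each paper's authors separated by semicolons.
from itertools import combinations
import pandas as pd
import networkx as nx

df = pd.read_csv("scopus_chatgpt_export.csv")  # hypothetical file name

G = nx.Graph()
for authors in df["Authors"].dropna():
    names = sorted({a.strip() for a in authors.split(";") if a.strip()})
    for a, b in combinations(names, 2):
        # Weight edges by the number of papers two authors share
        weight = G.get_edge_data(a, b, {"weight": 0})["weight"] + 1
        G.add_edge(a, b, weight=weight)

# Most central co-authors: candidates for anchoring a literature review
top = sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1])[:10]
print(top)
```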
Part III. Machine Driven Text Classification and Statistical Modeling Frameworks
This second suite of frameworks and software tools offers qualitative and mixed methods researchers cutting-edge methods to synthesize qualitative evidence in three main areas: code or label identification in written or transcribed data typically employed in qualitative studies; classification of document analyses, with main applications in (but not limited to) systematic literature reviews; and the closing of open-ended questions in survey research. Open-ended questions allow survey respondents to express their views in their own words and provide information about processes or reasons typically absent in quantitative studies, yet they may become difficult or impossible to analyze manually given time and/or financial constraints.
9. Latent Code Identification (LACOID)
LACOID identifies latent codes (topics) in qualitatively gathered data such as interviews, focus groups, essays, media posts, and ethnographic observations. In addition to all identified latent codes along with the original texts, LACOID outputs include dynamic HTML visualizations of each latent code and the frequencies of its constituent words.
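The underlying idea, latent codes as statistical topics, can be illustrated with a tiny scikit-learn sketch over invented excerpts; LACOID itself requires no code, and its modeling choices may differ.

```python
# Tiny illustration of latent codes as topics; not the LACOID application.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # hypothetical interview excerpts
    "tuition costs and loans made me consider leaving",
    "my advisor helped me plan courses and stay enrolled",
    "loans and work hours made tuition hard to cover",
    "advising meetings helped me feel supported on campus",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_words = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"latent code {k}: {', '.join(top_words)}")
```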
10. Machine Driven Classification of Open-ended Responses (MDCOR)
MDCOR closes open-ended questions in surveys by providing procedures for detecting themes and assessing the optimal number of latent topics in thousands of survey responses. Outputs include access to the original text of open-ended classified responses and HTML summaries of each topic.
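One common heuristic for choosing the number of topics, held-out perplexity, is sketched below with hypothetical responses; MDCOR's own procedure for assessing the optimal number of topics may rely on different criteria.

```python
# Sketch of one heuristic (held-out perplexity); not necessarily MDCOR's criterion.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import train_test_split

responses = [  # hypothetical open-ended answers; real surveys would have thousands
    "cost of tuition and fees", "helpful academic advising",
    "campus felt welcoming", "loans and living costs",
    "my advisor answered my questions", "sense of belonging in class",
    "tuition is too expensive", "the advising office was supportive",
]
X = CountVectorizer(stop_words="english").fit_transform(responses)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

for k in (2, 3, 4):  # candidate numbers of latent topics (lower perplexity is better)
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X_train)
    print(f"k = {k}: held-out perplexity = {lda.perplexity(X_test):.1f}")
```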
11. Machine Driven Literature Classification (MDLC)
MDLC identifies latent topics or themes in document analyses, including (but not limited to) systematic literature reviews. This framework allows for the assessment of the optimal number of topics in a set of documents and/or research articles. Outputs will allow access to the classified list of documents and latent codes along with dynamic HTML summaries.
Part IV. Integration of Network and Text Classification Analyses
12. In what instances should or could we integrate the analyses and frameworks described in Parts II and III?
1. Pros and Cons
2. Best Practices
13. Incorporating Spatial Context for Data Storytelling: GeoStoryTelling
1. Data Storytelling and the Academic Research Process
2. Digital Ethnography and Geographical Information Systems (GIS)
3. Multimedia Tools to Share Stories
4. GIS and Data Storytelling: GeoStoryTelling
14. Sentiment Network Modeling
1. Descriptive Modeling Tools for Text Analysis
2. Sentiment Analysis
3. Integrating Sentiment Analysis with Relational Thinking
4. Integrating Sentiment Analysis with Classified Topics Used in MDCOR, LACOID, and MDLC
15. Closing Thoughts and Future Work
- No. of pages: 250
- Language: English
- Edition: 1
- Published: July 1, 2025
- Imprint: Morgan Kaufmann
- Paperback ISBN: 9780443219610
- eBook ISBN: 9780443219603
Manuel González Canché
Dr. Manuel S. González Canché is an Associate Professor in the Policy, Organization, Leadership, and Systems Division of the University of Pennsylvania, where he holds a tenured appointment. Dr. González Canché also serves as affiliated faculty with the Human Development and Quantitative Methods division and the International Educational Development Program. In addition, he is a senior scholar in the Alliance for Higher Education and Democracy. In his research, Dr. González Canché employs econometric, quasi-experimental, spatial statistical, and visualization methods for big and geocoded data, including geographical information systems, representation of real-world networks, and text-mining techniques. In related work, he aims to harness the mathematical power of network analysis to find structure in written content. He is developing an analytic method (Network Analysis of Qualitative Data) that blends quantitative, mathematical, and qualitative principles to analyze text data. He is also developing geographical network analyses that merge network principles and spatial econometrics to model spatial dependence of outcome variables before making inferential claims. Dr. González Canché currently teaches PhD courses that rely heavily on computer programming. The no-code tools included in this book have translated into grant funding and peer-reviewed publications in The Journal of Mixed Methods Research, The International Journal of Qualitative Methods, Expert Systems with Applications, and Methodological Innovations. Additionally, he has been offering professional development workshops for the American Educational Research Association. Dr. González Canché has a PhD in Higher Education Policy with cognates in Sociology, Economics, and Biostatistics from the University of Arizona.
Affiliations and expertise
Associate Professor, University of Pennsylvania, USA