Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications brings together all the information, tools and methods a professional will need to efficiently use text mining applications and statistical analysis.
Winner of a 2012 PROSE Award in Computing and Information Sciences from the Association of American Publishers, this book is a comprehensive how-to reference that shows readers how to conduct text mining and statistically analyze the results. In addition to providing an in-depth examination of core text mining and link detection tools, methods, and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection through real-world example tutorials in fields as varied as corporate management, finance, business intelligence, genomics research, and counterterrorism.
The world contains an unimaginably vast amount of digital information, and that amount is growing ever faster. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime, and so on. Managed well, textual data can be used to unlock new sources of economic value, provide fresh insights into science, and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text it contains falls further behind, the value of text mining for information retrieval and search will increase dramatically.
In one comprehensive resource, this book provides complete coverage of statistical analysis and text mining applications for professionals, practitioners, researchers, and upper-level undergraduate and graduate students who need to learn how to do text mining rapidly and incorporate it into information distillation and, in turn, good decision making.
Dedication
Endorsements for Practical Text Mining & Statistical Analysis for Non-structured Text Data Applications
Foreword 1
Foreword 2
Foreword 3
Acknowledgments
Preface
About the Authors
Introduction
Building the Workshop Manual
Communication
The Structure of this Book
Part I: Basic Text Mining Principles
Part II: Tutorials
Part III: Advanced Topics
Tutorials
Why Did We Write This Book?
What Are the Benefits of Text Mining?
Blast Off!
References
List of Tutorials by Guest Authors
Part I: Basic Text Mining Principles
Chapter 1. The History of Text Mining
Preamble
The Roots of Text Mining: Information Retrieval, Extraction, and Summarization
Information Extraction and Modern Text Mining
Major Innovations in Text Mining since 2000
The Development of Enabling Technology in Text Mining
Emerging Applications in Text Mining
Sentiment Analysis and Opinion Mining
IBM’s Watson: An “Intelligent” Text Mining Machine?
What’s Next?
Postscript
References
Chapter 2. The Seven Practice Areas of Text Analytics
Preamble
What is Text Mining?
The Seven Practice Areas of Text Analytics
Five Questions for Finding the Right Practice Area
The Seven Practice Areas in Depth
Interactions between the Practice Areas
Scope of This Book
Summary
Postscript
References
Chapter 3. Conceptual Foundations of Text Mining and Preprocessing Steps
Preamble
Introduction
Syntax versus Semantics
The Generalized Vector-Space Model
Preprocessing Text
Creating Vectors from Processed Text
Summary
Postscript
Reference
Chapter 4. Applications and Use Cases for Text Mining
Preamble
Why Is Text Mining Useful?
Extracting “Meaning” from Unstructured Text
Summarizing Text
Common Approaches to Extracting Meaning
Extracting Information through Statistical Natural Language Processing
Statistical Analysis of Dimensions of Meaning
Beyond Statistical Analysis of Word Frequencies: Parsing and Analyzing Syntax
Review
Improving Accuracy in Predictive Modeling
Using Statistical Natural Language Processing to Improve Lift
Using Dictionaries to Improve Prediction
Identifying Similarity and Relevance by Searching
Part of Speech Tagging and Entity Extraction
Summary
Postscript
References
Chapter 5. Text Mining Methodology
Preamble
Text Mining Applications
Cross-Industry Standard Process for Data Mining (CRISP-DM)
Example 1: An Exploratory Literature Survey Using Text Mining
Postscript
References
Chapter 6. Three Common Text Mining Software Tools
Preamble
Introduction
IBM SPSS Modeler Premium
SAS Text Miner
About the Scenarios in This SAS Section
Tips for Text Mining
STATISTICA Text Miner
Summary: STATISTICA Text Miner
Postscript
Part II: Introduction to the Tutorial and Case Study Section of This Book
Introduction
Reference
Tutorial AA. Case Study: Using the Social Share of Voice to Predict Events That Are about to Happen
Analysis
Summary
Tutorial BB. Mining Twitter for Airline Consumer Sentiment
Introduction
What Is R?
Loading Data into R
The twitteR Package
Extracting Text from Tweets
The plyr Package
Estimating Sentiment
Loading the Opinion Lexicon
Implementing Our Sentiment Scoring Algorithm
Algorithm Sanity Check
data.frames Hold Tabular Data
Scoring the Tweets
Repeat for Each Airline
Compare the Score Distributions
Ignore the Middle
Compare with ACSI’s Customer Satisfaction Index
Scrape the ACSI Website
Compare Twitter Results with ACSI Scores
Graph the Results
Notes and Acknowledgments
References
Tutorial A. Using STATISTICA Text Miner to Monitor and Predict Success of Marketing Campaigns Based on Social Media Data
Introduction
The Key Issue
Step 1: Collecting Data
Step 2: Monitoring the Situation
Step 3: Creating Predictive Models
Step 4: Performing a “What-If” Analysis of the Marketing Campaigns
Step 5: Performing Sentiment Analysis
Summary
Tutorial B. Text Mining Improves Model Performance in Predicting Airplane Flight Accident Outcome
Introduction
The Data
Text Mining the Data
Text Mining Results
Data Preparation
Using Text Mining Results to Build Predictive Models
Tutorial C. Insurance Industry: Text Analytics Adds “Lift” to Predictive Models with STATISTICA Text and Data Miner
Introduction
Data Description
Part A: Comparing the Lift of Predictive Models with and without Text Mining
Boosted Trees (without Text Material)
Boosted Trees Adding the Text Mining Variables
How to Merge Graphs
Part B: Enterprise Deployment
Summary
Tutorial D. Analysis of Survey Data for Establishing the “Best Medical Survey Instrument” Using Text Mining
Introduction
The Analysis
Summary
Tutorial E. Analysis of Survey Data for Establishing “Best Medical Survey Instrument” Using Text Mining: Central Asian (Russian Language) Study Tutorial 2: Potential for Constructing Instruments That Have Increased Validity
Introduction
The Analysis
Summary
Tutorial F. Using eBay Text for Predicting ATLAS Instrumental Learning
Introduction
Examining the Data by Types
Summary
Reference
Tutorial G. Text Mining for Patterns in Children’s Sleep Disorders Using STATISTICA Text Miner
Setting Up the Analysis
Reviewing Results
Summary
Tutorial H. Extracting Knowledge from Published Literature Using RapidMiner
Introduction
Motivation
A Brief Introduction to RapidMiner
Text Analytics in RapidMiner
Starting a New Process
Summary
Reference
Tutorial I. Text Mining Speech Samples: Can the Speech of Individuals Diagnosed with Schizophrenia Differentiate Them from Unaffected Controls?
Introduction
Objectives
Case Study: The Steps Used to Prepare the Data
Results and Analysis
Summary
References
Tutorial J. Text Mining Using STM™, CART®, and TreeNet® from Salford Systems: Analysis of 16,000 iPod Auctions on eBay
Installing the Salford Text Miner
Comments on the Challenge
Tutorial K. Predicting Micro Lending Loan Defaults Using SAS® Text Miner
Introduction
About SAS® Text Miner
Project Overview
Preparing the Data and Setting Up the Diagram
Creating a New Project
Registering the Table
Creating a New Diagram
Text Filter Node
Text Topic Node
Creating the Text Mining Flow
Inserting the Data
Understanding Text Parsing
Synonyms and Multiterm Words
Defining Topics
Other Uses of the Interactive Topic Viewer
Making the Predictive Model
Final Results
Viewing the Reports
Text Only Decision Tree
All Variable Text and Relational
Conclusion
Tutorial L. Opera Lyrics: Text Analytics Compared by the Composer and the Century of Composition—Wagner versus Puccini
Tutorial M. Case Study: Sentiment-Based Text Analytics to Better Predict Customer Satisfaction and Net Promoter® Score Using IBM® SPSS® Modeler
Introduction
Business Objectives
Case Study
Creating New Categories and Adding Missing Descriptors
Results and Analysis
Summary
References
Tutorial N. Case Study: Detecting Deception in Text with Freely Available Text and Data Mining Tools
Introduction
General Architecture for Text Engineering
Linguistic Inquiry and Word Count
Working with General Architecture for Text Engineering and Linguistic Inquiry and Word Count Output
Summary
References
Tutorial O. Predicting Box Office Success of Motion Pictures with Text Mining
Introduction
Analysis
Summary
References
Tutorial P. A Hands-On Tutorial of Text Mining in PASW: Clustering and Sentiment Analysis Using Tweets from Twitter
Introduction
Objective
Case Study
Categorization
Cluster Analysis
Analyzing Text Links
Additional Settings
Summary
Tutorial Q. A Hands-On Tutorial on Text Mining in SAS®: Analysis of Customer Comments for Clustering and Predictive Modeling
Introduction
Objective
Case Study
Summary
References
Tutorial R. Scoring Retention and Success of Incoming College Freshmen Using Text Analytics
Introduction
Part I. Predictive Modeling Using Only the Numeric Variables
Part II. Text Mining and Text Variables’ Word Frequencies and Concepts
Tutorial S. Searching for Relationships in Product Recall Data from the Consumer Product Safety Commission with STATISTICA Text Miner
Specifying the Analysis
Reviewing the Results
Tutorial T. Potential Problems That Can Arise in Text Mining: Example Using NALL Aviation Data
Introduction
Spelling Errors
Example: Finding Spelling Errors in Text Miner
Combine Words
Misspellings as Synonyms
Unexpected Terms
Example: Finding Unexpected Terms
Different File Types
Summary
Tutorial U. Exploring the Unabomber Manifesto Using Text Miner
Introduction
Summarizing the Text
Searching for Trends with Pronouns
References
Tutorial V. Text Mining PubMed: Extracting Publications on Genes and Genetic Markers Associated with Migraine Headaches from PubMed Abstracts
Tutorial W. Case Study: The Problem with the Use of Medical Abbreviations by Physicians and Health Care Providers
The Present Problem in the Use of Medical Abbreviations by Physicians and Health Care Providers
TJC (JCAHO) “Do Not Use” Abbreviations
Additional Abbreviations, Acronyms, and Symbols
Using the “Text Mining Project” Format of STATISTICA Text Miner
Using TextMiner3.dbs
Conclusion
Intervention Training Needed
References
Tutorial X. Classifying Documents with Respect to “Earnings” and Then Making a Predictive Model for the Target Variable Using Decision Trees, MARSplines, Naïve Bayes Classifier, and K-Nearest Neighbors with STATISTICA Text Miner
Introduction: Automatic Text Classification
Data File with File References
Specifying the Analysis
Processing the Data Analysis
Saving the Extracted Word Frequencies to the Input File
Initial Feature Selection
General Classification and Regression Trees
K-Nearest Neighbors Modeling
Conclusion
Reference
Tutorial Y. Case Study: Predicting Exposure of Social Messages: The Bin Laden Live Tweeter
Introduction
Analysis
Summary
Tutorial Z. The InFLUence Model: Web Crawling, Text Mining, and Predictive Analysis with 2010–2011 Influenza Guidelines—CDC, IDSA, WHO, and FMC
Abstract
Web Crawling and Text Mining of CDC Documents on FLU
Feature Selection
MARSplines Interactive Module Modeling
Boosted Trees
Naïve Bayes Modeling
K-Nearest Neighbors
Part III: Advanced Topics
Chapter 7. Text Classification and Categorization
Preamble
Introduction
Defining a Classification Problem
Feature Creation
Text Classification Algorithms
Combining Evidence
Evaluating Text Classifiers
Hierarchical Text Classification
Text Classification Applications
Summary
Postscript
References
Chapter 8. Prediction in Text Mining: The Data Mining Algorithms of Predictive Analytics
Preamble
Introduction
The Power of Simple Descriptive Statistics, Graphics, and Visual Text Mining
Visual Data Mining
Predictive Modeling (Supervised Learning)
Statistical Models versus General Predictive Modeling
Clustering (Unsupervised Learning)
Singular Value Decomposition, Principal Components Analysis, and Dimension Reduction
Association and Link Analysis
Summary
Postscript
References
Chapter 9. Entity Extraction
Preamble
Introduction
Text Features for Entity Extraction
Strategies for Entity Extraction
Choosing an Entity Extraction Approach
Evaluating Entity Extraction
Summary
Postscript
References
Chapter 10. Feature Selection and Dimensionality Reduction
Preamble
Introduction
Feature Selection
Feature Selection Approaches
Dimensionality Reduction
Linear Dimensionality Reduction Approaches
Postscript
References
Chapter 11. Singular Value Decomposition in Text Mining
Preamble
Introduction
Redundancy in Text
Dimensions of Meaning: Latent Semantic Indexing
The Math of Singular Value Decomposition
Graphical Representations and Simple Examples
Singular Value Decomposition in Equation Form
Singular Value Decomposition and Principal Components Analysis Eigenvalues
Some Practical Considerations
Extracting Dimensions
Subjective Methods: Reviewing Graphs
Analytical Methods: Building Models for Dimensions
Useful Analyses Based on Singular Value Decomposition Scores
Cluster Analysis
Predictive Modeling
When SVD Is Not Useful
Summary
Postscript
References
Chapter 12. Web Analytics and Web Mining
Preamble
Web Analytics
The Value of Web Analytics
The Future of Web Analytics and Web Mining
Postscript
References
Chapter 13. Clustering Words and Documents
Preamble
Introduction
Clustering Algorithms
Clustering Documents
Clustering Words
Cluster Visualization
Summary
Postscript
References
Chapter 14. Leveraging Text Mining in Property and Casualty Insurance
Preamble
Introduction
Property and Casualty Insurance as a Business
Analytics Opportunities in the Insurance Life Cycle
Driving Business Value Using Text Mining
Summary
Postscript
References
Chapter 15. Focused Web Crawling
Preamble
Introduction
The Focused Crawling Process
The Opportunities and Challenges of Mining the Web
Topic Hierarchies for Focused Crawling
Training the Document Classifier
Capturing User Feedback
Summary
Postscript
References
Chapter 16. The Future of Text and Web Analytics
Text Analytics and Text Mining
The Pros and Cons of Commercial Software versus Open Source Software
The Future of Text Mining
The Future of Web Analytics
Multisession Pathing
Integration of Web Analytics with Standard BI Tools
Attribution across Multiple Sessions
The Future: What Does It Hold?
New Areas That May Use Text Analytics in the Future
IBM Watson
Summary
References
IBM-Watson References
Chapter 17. Summary
Why Are You Reading This Chapter?
Our Perspective for Applying Text Mining Technology
Part I: Background and Theory
What Is Text Mining?
What Tools Can I Use?
Part II: The Text Mining Laboratory—28 Tutorials
Part III: Advanced Topics
Outlines of Chapters 7–15
Glossary
Index
How to Use the Data Sets and the Text Mining Software on the DVD or on Links for Practical Text Mining
I Data Sets for the Tutorials in Practical Text Mining
II SAS Text Miner Software
III Salford Systems Software, Including a New Text Miner Module Made for this Book (30-Day Free Trial Available)
IV STATISTICA Text Miner Software (30-Day Free Trial on the DVD That Accompanies This Book)
GM
JE
AF
TH
RN
DD