
Text Information Retrieval Systems
- 3rd Edition - December 19, 2006
- Imprint: Academic Press
- Authors: Charles T. Meadow, Bert R. Boyce, Donald H. Kraft, Carol L. Barry
- Language: English
- eBook ISBN: 978-0-08-046903-4

This will be the third edition of the highly successful ‘Text Information Retrieval Systems’. The book's purpose is to teach people who will be searching or designing text retrieval systems how the systems work. For designers, it covers problems they will face and reviews currently available solutions to provide a basis for more advanced study. For searchers, it describes why such systems work as they do. The book is primarily about computer-based retrieval systems, but the principles apply to nonmechanized ones as well. It covers the nature of information, how it is organized for use by a computer, how search functions are carried out, and some of the theory underlying these functions. It also discusses the interaction between user and system and how retrieved items, users, and complete systems are evaluated. A limited knowledge of mathematics and of computing is assumed.
This third edition will be updated to include coverage of the WWW and current search engines. In many cases, examples of non-web searching will be replaced with web-based illustrations. Interfaces, the various features available to assist searchers, and areas in which search assistance is not available will also be covered. In addition, the book will have a web dimension, which will include relevant material available online, to be used in conjunction with the text.
*Follow-up to the award-winning 2nd Edition
*Focuses on computer-based systems, but the basic principles can be applied to any information-seeking context
Designers, web searchers, academics, students
Contents
Preface
1 Introduction
1.1 What Is Information?
1.2 What Is Information Retrieval?
1.3 How Does Information Retrieval Work?
1.3.1 The User Sequence
1.3.2 The Database Producer Sequence
1.3.3 System Design and Functioning
1.3.4 Why the Process Is Not Perfect
1.4 Who Uses Information Retrieval?
1.4.1 Information Specialists
1.4.2 Subject Specialist End Users
1.4.3 Non-Subject Specialist End Users
1.5 What Are the Problems in IRS Design and Use?
1.5.1 Design
1.5.2 Understanding User Behavior
1.6 A Brief History of Information Retrieval
1.6.1 Traditional Information Retrieval Methods
1.6.2 Pre-computer IR Systems
1.6.3 Special Purpose Computer Systems
1.6.4 General Purpose Computer Systems
1.6.5 Online Database Services
1.6.6 The World Wide Web
Recommended Reading
2 Data, Information, and Knowledge
2.1 Introduction
2.2 Definitions
2.2.1 Data
2.2.2 Information
2.2.3 News
2.2.4 Knowledge
2.2.5 Intelligence
2.2.6 Meaning
2.2.7 Wisdom
2.3 Metadata
2.4 Knowledge Base
2.5 Credence, Justified Belief, and Point of View
2.6 Summary
3 Representation of Information
3.1 Information to Be Represented
3.2 Types of Representation
3.2.1 Natural Language
3.2.2 Restricted Natural Language
3.2.3 Artificial Language
3.2.4 Codes, Measures, and Descriptors
3.2.5 Mathematical Models of Text
3.3 Characteristics of Information Representations
3.3.1 Discriminating Power
3.3.2 Identification of Similarity
3.3.3 Descriptiveness
3.3.4 Ambiguity
3.3.5 Conciseness
3.4 Relationships Among Entities and Attribute Values
3.4.1 Hierarchical Codes
3.4.2 Measurements
3.4.3 Nominal Descriptors
3.4.4 Inflected Language
3.4.5 Full Text
3.4.6 Explicit Pointers and Links
3.5 Summary
4 Attribute Content and Values
4.1 Types of Attribute Symbols
4.1.1 Numbers
4.1.2 Character Strings—Names
4.1.3 Other Character Strings
4.2 Class Relationships
4.2.1 Hierarchical Classification
4.2.2 Network Relationships
4.2.3 Class Membership—Binary, Probabilistic, or Fuzzy
4.3 Transformation of Values
4.3.1 Transformation of Words by Stemming
4.3.2 Sound-Based Transformation of Words
4.3.3 Transformation of Words by Meaning
4.3.4 Transformation of Graphics
4.3.5 Transformation of Sound
4.4 Uniqueness of Values
4.5 Ambiguity of Attribute Values
4.6 Indexing of Text
4.7 Control of Vocabulary
4.7.1 Elements of Control
4.7.2 Dissemination of Controlled Vocabularies
4.8 The Importance of Point of View
4.9 Summary
5 Models of Virtual Data Structure
5.1 The Concept of Models of Data
5.2 Basic Data Elements and Structures
5.2.1 Scalar Variables and Constants
5.2.2 Vector Variables
5.2.3 Structures
5.2.4 Arrays
5.2.5 Tuples
5.2.6 Relations
5.2.7 Text
5.3 The Common Structural Models
5.3.1 The Linear Sequential Model
5.3.2 The Relational Model
5.3.3 Hierarchical and Network Models
5.4 Applications of the Basic Models
5.4.1 Hypertext
5.4.2 Spreadsheet Files
5.5 The Entity-Relationship Model
5.6 Summary
6 The Physical Structure of Data
6.1 Introduction to Physical Structures
6.2 Record Structures and Their Effects
6.2.1 Basic Structures
6.2.2 Space-Time and Transaction Rate
6.3 Basic Concepts of File Structure
6.3.1 The Order of Records
6.3.2 Finding Records
6.4 Organizational Methods
6.4.1 Sequential Files
6.4.2 Index-File Structures
6.4.3 Lists
6.4.4 Trees
6.4.5 Direct-Access Structures
6.5 Parsing of Data Elements
6.5.1 Phrase Parsing
6.5.2 Word Parsing
6.5.3 Word and Phrase Parsing
6.6 Combination Structures
6.6.1 Nested Indexes
6.6.2 Direct Structure with Chains
6.6.3 Indexed Sequential Access Method
6.7 Summary
7 Querying the Information Retrieval System
7.1 Introduction
7.2 Language Types
7.3 Query Logic
7.3.1 Sets and Subsets
7.3.2 Relational Statements
7.3.3 Boolean Query Logic
7.3.4 Ranked and Fuzzy Sets
7.3.5 Similarity Measures
7.4 Functions Performed
7.4.1 Connect to a Remote IRS
7.4.2 Select Database
7.4.3 Search the Inverted File or Thesaurus
7.4.4 Create a Subset of the Database
7.4.5 Search for Strings
7.4.6 Analyze a Set
7.4.7 Sort, Display, and Format Records
7.4.8 Handle the Unstructured Record
7.4.9 Download
7.4.10 Order Documents
7.4.11 Save, Recall, and Edit Searches
7.4.12 Current Awareness Search
7.4.13 Cost Summary
7.4.14 Terminate a Session
7.5 The Basis for Charging for Searches
8 Interpretation and Execution of Query Statements
8.1 Problems of Query Language Interpretation
8.1.1 Parsing Command Language
8.1.2 Parsing Natural Language
8.1.3 Processing Menu Choices
8.2 Executing Retrieval Commands
8.2.1 Database Selection
8.2.2 Inverted File Search
8.2.3 Set or Subset Creation
8.2.4 Truncation and Universal Characters
8.2.5 Left-Hand Truncation
8.3 Executing Record Analysis and Presentation Commands
8.3.1 Set Analysis Functions
8.3.2 Display, Format, and Sort
8.3.3 Offline Printing
8.4 Executing Other Commands
8.4.1 Ordering
8.4.2 Save, Recall, and Edit Searches
8.4.3 Current Awareness
8.4.4 Cost Summation and Billing
8.4.5 Terminate a Session
8.5 Feedback to Users and Error Messages
8.5.1 Response to Command Errors
8.5.2 Set-Size Indication
8.5.3 Record Display
8.5.4 Set Analysis
8.5.5 Cost
8.5.6 Help
9 Text Searching
9.1 The Special Problems of Text Searching
9.1.1 Note on Terminology and Symbols
9.1.2 The Semantic Web
9.2 Some Characteristics of Text and Their Applications
9.2.1 Components of Text
9.2.2 Significant Words—Indexing
9.2.3 Significant Sentences—Abstracting
9.2.4 Measures of Complete Texts
9.3 Command Language for Text Searching
9.3.1 Set Membership Statements
9.3.2 Word or String Occurrence Statements
9.3.3 Proximity Statements
9.3.4 Web Based Text Search
9.4 Term Weighting
9.4.1 Indexing with Weights
9.4.2 Automated Assignment of Weights
9.4.3 Improving Weights
9.5 Word Association Techniques
9.5.1 Dictionaries and Thesauri
9.5.2 Mini-Thesauri
9.5.3 Word Co-occurrence Statistics
9.5.4 Stemming and Conflation
9.6 Text or Record Association Techniques
9.6.1 Similarity Measures
9.6.2 Clustering
9.6.3 Signature Matching
9.6.4 Discriminant Methods
9.7 Other Processes with Words of a Text
9.7.1 Stop Words
9.7.2 Replacement of Words with Roots or Associated Words
9.7.3 Varying Significance as a Function of Frequency
9.7.4 Comments on the Computation of the Strength of Document Association
10 System-Computed Relevance and Ranking
10.1 The Retrieval Status Value (rsv)
10.2 Ranking
10.3 Methods of Evaluating the rsv
10.3.1 The Vector Space Model
10.3.2 The Probabilistic Model
10.3.3 The Extended Boolean Model
10.4 The rsv in Operational Retrieval
11 Search Feedback and Iteration
11.1 Basic Concepts of Feedback and Iteration
11.2 Command Sequences
11.3 Information Available as Feedback
11.3.1 File or Database Selection
11.3.2 Terms Search or Browsing
11.3.3 Record Search or Set Formation
11.3.4 Record Display and Browsing
11.3.5 Record Acquisition
11.3.6 Requests for Information about the Retrieval System
11.3.7 Establishing Communication Parameters
11.3.8 Trends over Sequences and Cycles
11.4 Adjustments in the Search
11.4.1 Improve Term Selection
11.4.2 Improve Set Formation Logic
11.4.3 Improve Final Set Size
11.4.4 Improve Precision, Recall, or Total Utility
11.5 Feedback from User to System
12 Multi-Database Searching and Mapping
12.1 Basic Concepts
12.2 Multi-database Search
12.2.1 The Nature of Duplicate Records
12.2.2 Detection of Duplicates
12.2.3 Scanning Multiple Databases
12.3 Mapping
12.4 Value of Mapping
13 Search Strategy
13.1 The Nature of Searching Reconsidered
13.1.1 Known Item Search
13.1.2 Specific Information Search
13.1.3 General Information Search
13.1.4 Exploration of the Database
13.2 The Nature of Search Strategy
13.2.1 Search Objective
13.2.2 General Plan of Operation
13.2.3 The Essential Information Elements of a Search
13.2.4 Specific Plan of Operation
13.3 Types of Strategies
13.3.1 Categorizing by Objective
13.3.2 Categorizing by Plan of Operation
13.4 Tactics
13.4.1 Monitoring Tactics
13.4.2 File Structure Tactics
13.4.3 Search Formulation Tactics
13.4.4 Term Tactics
13.5 Summary
14 The Information Retrieval System Interface
14.1 General Model of Message Flow
14.2 Sources of Ambiguity
14.3 The Role of a Search Intermediary
14.3.1 Establishing the Information Need
14.3.2 Development of a Search Strategy
14.3.3 Translation of the Need Statement into a Query
14.3.4 Interpretation and Evaluation of Output
14.3.5 Search Iteration Within the Strategic Plan
14.3.6 Change of Strategy when Necessary
14.3.7 Help Using an IRS
14.4 Automated Search Mediation
14.4.1 Early Development
14.4.2 Fully Automatic Intermediary Functions
14.4.3 Interactive Intermediary Functions
14.5 The User Interface as a Component of All Systems
14.6 The User Interface in Web Search Engines
15 A Sampling of Information Retrieval Systems
15.1 Introduction
15.2 Dialog
15.2.1 A Command Language Using Boolean Logic
15.2.2 Target
15.2.3 DIALOGWeb— A Web Adaptation
15.3 AltaVista
15.3.1 Default Query Entry Form
15.3.2 Advanced Search Form
15.4 Google
15.4.1 The Web Crawler
15.4.2 Searching
15.4.3 Google Advanced Search
15.5 PubMed
15.6 EBSCO Host
15.7 Summary
16 Measurement and Evaluation
16.1 Basics of Measurement
16.1.1 The Data Manager
16.1.2 The Query Manager
16.1.3 The Query Composition Process
16.1.4 Deriving the Information Need
16.1.5 The Database
16.1.6 Users
16.2 Relevance, Value, and Utility
16.2.1 Relevance as Relatedness
16.2.2 Aspects of Value
16.2.3 Relevance as Utility
16.2.4 Retaining Two Separate Relevance Measures
16.2.5 The Relevance Measurement Scale
16.2.6 Taking the Measurements
16.2.7 Questions About Relevance as a Measure
16.3 Measures Based on Relevance
16.3.1 Precision (Pr)
16.3.2 Recall (Re)
16.3.3 Relationship of Recall and Precision
16.3.4 Overall Effectiveness Measures Based on Re and Pr
16.4 Measures of Process
16.4.1 Query Translation
16.4.2 Errors in a Query Statement
16.4.3 Average Time per Command or per User Decision
16.4.4 Elapsed Time of a Search
16.4.5 Number of Commands or Steps in a Search
16.4.6 Cost of a Search
16.4.7 Size of Final Set Formed
16.4.8 Number of Records Reviewed by the User
16.4.9 Patterns of Language Use
16.4.10 Measures of Rank Order
16.5 Measures of Outcome
16.5.1 Precision
16.5.2 Recall
16.5.3 Efficiency
16.5.4 Overall User Evaluation
16.6 Measures of Environment
16.6.1 Database Record Selection
16.6.2 Record Content
16.6.3 Measures of Users
16.7 Conclusion
Bibliography
Index
Charles T. Meadow
Charles T. Meadow is professor emeritus, University of Toronto, and has been visiting professor at the Universities of North Carolina and the West Indies. He edited the Journal of the American Society for Information Science and the Canadian Journal of Information Science and was president of the Canadian Association for Information Science. He received the Research Award and shared the Annual Information Science Book Award from ASIS&T.
Affiliations and expertise
University of Toronto, Ontario, Canada
Bert R. Boyce
Bert Boyce has been an Information System Research Analyst for the Information Systems Office at the Library of Congress; a faculty member and acting Dean of the School of Library and Information Science, University of Missouri, Columbia, Missouri; and Dean of the School of Library and Information Science, Louisiana State University, where he is now Professor and Dean Emeritus. He is currently Editor of the Academic Press Library and Information Science Series. He received the ASIS&T Outstanding Information Science Teacher Award in 1989 and has shared the Annual Information Science Book Award from ASIS&T.
Affiliations and expertise
Louisiana State University, Baton Rouge, U.S.A.
Donald H. Kraft
Donald Kraft is professor at LSU and Distinguished Visiting Professor at the U.S. Air Force Academy. He is a fellow of IEEE and AAAS and editor of the Journal of the American Society for Information Science and Technology. He received the Research Award and the Watson Davis Award, shared the Annual Information Science Book Award from ASIS&T, and received the LSU Distinguished Faculty Award.
Affiliations and expertise
Louisiana State University, Baton Rouge, U.S.A.
Carol L. Barry
Carol Barry is associate professor in the School of Library and Information Science, Louisiana State University. She has received the Best JASIS Paper Award, 1995; the LSU Alumni Association Teaching Award, 1995; and the American Society for Information Science Doctoral Forum Award, 1993. She is associate editor of JASIS&T, a member of the Board of ASIS&T, and a member of the LSU Faculty Senate, of which she was vice president in 2000-2001. She has authored or co-authored over 30 research papers.
Affiliations and expertise
Associate Professor at Louisiana State University, USA.