Data Mining

Practical Machine Learning Tools and Techniques, Second Edition

  • 2nd Edition - June 8, 2005
  • Latest edition
  • Authors: Ian H. Witten, Eibe Frank
  • Language: English

Description

Data Mining, Second Edition, describes data mining techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references.

The highlights of this new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; and much more.

This text is designed for information systems practitioners, programmers, consultants, developers, information technology managers, and specification writers, as well as professors and students of graduate-level data mining and machine learning courses.

Key features

  • Algorithmic methods at the heart of successful data mining, including tried-and-true techniques as well as leading-edge methods
  • Performance improvement techniques that work by transforming the input or output

Readership

Information systems practitioners, programmers, consultants, developers, information technology managers, and specification writers, as well as professors and students of graduate-level data mining and machine learning courses

Table of contents

Preface

1. What’s it all about?

1.1 Data mining and machine learning

1.2 Simple examples: the weather problem and others

1.3 Fielded applications

1.4 Machine learning and statistics

1.5 Generalization as search

1.6 Data mining and ethics

1.7 Further reading

2. Input: Concepts, instances, attributes

2.1 What’s a concept?

2.2 What’s in an example?

2.3 What’s in an attribute?

2.4 Preparing the input

2.5 Further reading

3. Output: Knowledge representation

3.1 Decision tables

3.2 Decision trees

3.3 Classification rules

3.4 Association rules

3.5 Rules with exceptions

3.6 Rules involving relations

3.7 Trees for numeric prediction

3.8 Instance-based representation

3.9 Clusters

3.10 Further reading

4. Algorithms: The basic methods

4.1 Inferring rudimentary rules

4.2 Statistical modeling

4.3 Divide-and-conquer: constructing decision trees

4.4 Covering algorithms: constructing rules

4.5 Mining association rules

4.6 Linear models

4.7 Instance-based learning

4.8 Clustering

4.9 Further reading

5. Credibility: Evaluating what’s been learned

5.1 Training and testing

5.2 Predicting performance

5.3 Cross-validation

5.4 Other estimates

5.5 Comparing data mining schemes

5.6 Predicting probabilities

5.7 Counting the cost

5.8 Evaluating numeric prediction

5.9 The minimum description length principle

5.10 Applying MDL to clustering

5.11 Further reading

6. Implementations: Real machine learning schemes

6.1 Decision trees

6.2 Classification rules

6.3 Extending linear models

6.4 Instance-based learning

6.5 Numeric prediction

6.6 Clustering

6.7 Bayesian networks

7. Transformations: Engineering the input and output

7.1 Attribute selection

7.2 Discretizing numeric attributes

7.3 Some useful transformations

7.4 Automatic data cleansing

7.5 Combining multiple models

7.6 Using unlabeled data

7.7 Further reading

8. Moving on: Extensions and applications

8.1 Learning from massive datasets

8.2 Incorporating domain knowledge

8.3 Text and Web mining

8.4 Adversarial situations

8.5 Ubiquitous data mining

8.6 Further reading

Part II: The Weka machine learning workbench

9. Introduction to Weka

9.1 What’s in Weka?

9.2 How do you use it?

9.3 What else can you do?

10. The Explorer

10.1 Getting started

10.2 Exploring the Explorer

10.3 Filtering algorithms

10.4 Learning algorithms

10.5 Meta-learning algorithms

10.6 Clustering algorithms

10.7 Association-rule learners

10.8 Attribute selection

11. The Knowledge Flow interface

11.1 Getting started

11.2 Knowledge Flow components

11.3 Configuring and connecting the components

11.4 Incremental learning

12. The Experimenter

12.1 Getting started

12.2 Simple setup

12.3 Advanced setup

12.4 The Analyze panel

12.5 Distributing processing over several machines

13. The command-line interface

13.1 Getting started

13.2 The structure of Weka

13.3 Command-line options

14. Embedded machine learning

15. Writing new learning schemes

References

Index

Review quotes

“This book presents this new discipline in a very accessible form: both as a text to train the next generation of practitioners and researchers, and to inform lifelong learners like myself. Witten and Frank have a passion for simple and elegant solutions. They approach each topic with this mindset, grounding all concepts in concrete examples, and urging the reader to consider the simple techniques first, and then progress to the more sophisticated ones if the simple ones prove inadequate. If you have data that you want to analyze and understand, this book and the associated Weka toolkit are an excellent way to start.” —From the foreword by Jim Gray, Microsoft Research

“It covers cutting-edge, data mining technology that forward-looking organizations use to successfully tackle problems that are complex, highly dimensional, chaotic, non-stationary (changing over time), or plagued by. The writing style is well-rounded and engaging without subjectivity, hyperbole, or ambiguity. I consider this book a classic already!” —Dr. Tilmann Bruckhaus, StickyMinds.com

Product details

  • Edition: 2
  • Latest edition
  • Published: July 13, 2005
  • Language: English

About the authors

Ian H. Witten

Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography.
Affiliations and expertise
Computer Science Department, University of Waikato, New Zealand

Eibe Frank

Eibe Frank lives in New Zealand with his Samoan spouse and two lovely boys, but originally hails from Germany, where he received his first degree in computer science from the University of Karlsruhe. He moved to New Zealand to pursue his Ph.D. in machine learning under the supervision of Ian H. Witten and joined the Department of Computer Science at the University of Waikato as a lecturer on completion of his studies. He is now a professor at the same institution. As an early adopter of the Java programming language, he laid the groundwork for the Weka software described in this book. He has contributed a number of publications on machine learning and data mining to the literature and has refereed for many conferences and journals in these areas.
Affiliations and expertise
Computer Science Department, University of Waikato, New Zealand