
Transportation Big Data
Theory and Methods
- 1st Edition - November 29, 2024
- Imprint: Elsevier
- Authors: Zhiyuan Liu, Ziyuan Gu, Pan Liu
- Language: English
- Paperback ISBN: 978-0-443-33891-5
- eBook ISBN: 978-0-443-33892-2

Transportation Big Data: Theory and Methods focuses on the theory and methods of big data analysis in transportation. Big data is now a key topic in transportation because the volume of data has increased exponentially with the growth in traffic (across all modes) and in the number of detectors. This book provides a structured analysis of the commonly used methods for handling transportation big data, supported by a wealth of transportation engineering examples and accompanying code. It offers a concise yet comprehensive description of the key techniques and important tools in transportation big data analysis.
- Covers big data applications in transportation engineering in real-world scenarios
- Shows how to select different machine learning algorithms for processing, analyzing, and modeling transportation data
- Provides an overview of the fundamental concepts of machine learning and how classical algorithms can be applied to transportation-related problems
- Provides an overview of Python’s basic syntax and commonly used modules, enabling practical data analysis and modeling tasks using Python
Intended for practitioners in the transportation sector who seek to learn and apply big data and machine learning techniques to solve transportation problems.
- Title of Book
- Cover image
- Title page
- Table of Contents
- Copyright
- Preface
- Chapter 1. Introduction
- 1.1 Aims and scopes
- 1.1.1 Background
- 1.1.2 Features of this book
- 1.1.3 Objectives of this book
- 1.1.4 Content summary
- 1.2 Foundations of this book
- 1.2.1 What is data mining?
- 1.2.2 Typical process for data analysis and modeling
- 1.2.2.1 Data selection
- 1.2.2.2 Data preprocessing
- 1.2.2.3 Exploratory data analysis
- 1.2.2.4 Data modeling
- 1.2.3 Introduction to types of traffic data
- 1.2.3.1 Static data
- 1.2.3.2 Fixed detector data
- 1.2.3.3 Mobile detector data
- 1.2.3.4 Operational data
- 1.3 Introduction to the example datasets
- 1.4 Introduction to chapters
- Chapter 2. Data analysis in Python
- 2.1 Setting up the Python environment
- 2.1.1 Basic Python environment configuration
- 2.1.2 Anaconda environment configuration
- 2.1.3 Common interactive tools - Jupyter Notebook
- 2.2 Python basics
- 2.2.1 Basic data types
- 2.2.1.1 Integer
- 2.2.1.2 Floating point number
- 2.2.1.3 Character string
- 2.2.1.4 Boolean value
- 2.2.1.5 Null
- 2.2.2 Variables and assignment
- 2.2.2.1 String operators
- 2.2.2.2 Frequently used functions for string processing
- 2.2.3 Indentation and comments
- 2.2.3.1 Indentation
- 2.2.3.2 Single-line comment
- 2.2.3.3 Multiline comments
- 2.3 Container in Python
- 2.3.1 List
- 2.3.1.1 The definition of list
- 2.3.1.2 Taking and slicing lists
- 2.3.1.3 Addition and deletion of elements
- 2.3.1.4 List generation
- 2.3.2 Tuple
- 2.3.3 Dict
- 2.3.3.1 Definition of dictionary
- 2.3.3.2 Dictionary values
- 2.3.3.3 Addition and deletion of dictionary elements
- 2.3.3.4 Taking out the dictionary keys
- 2.3.4 Set
- 2.3.4.1 Definition of set
- 2.3.4.2 Addition and deletion of collection elements
- 2.3.4.3 Operations on sets
- 2.4 Control flow statements
- 2.4.1 Conditional statements
- 2.4.2 Loop
- 2.4.2.1 Two types of loops
- 2.4.2.2 Loop control
- 2.5 Function definition and invocation
- 2.5.1 Function definition
- 2.5.2 Function invocation
- 2.6 Exception handling
- 2.7 Anonymous functions
- 2.8 Python modules
- 2.8.1 Module usage
- 2.8.1.1 Module retrieval
- 2.8.1.2 Module installation
- 2.8.1.3 Module usage
- 2.8.2 Introduction to Python standard library
- 2.8.2.1 Math module
- 2.8.2.2 Time module
- 2.8.2.3 Random module
- 2.8.3 Introduction to NumPy
- 2.8.3.1 Installing NumPy
- 2.8.3.2 Ndarray object
- 2.8.3.3 Indexing and slicing of multidimensional arrays
- 2.8.3.4 Array operations
- 2.8.3.5 Random number generation
- 2.8.3.6 Further knowledge
- 2.8.4 Introduction to pandas
- 2.8.4.1 Installing pandas
- 2.8.4.2 DataFrame data type
- 2.8.4.3 File reading and writing
- 2.8.4.4 Column operations
- 2.8.4.5 Grouping operations
- 2.8.4.6 Data concatenation
- 2.8.4.7 Statistical functions in pandas
- 2.8.4.8 Advanced statistical functions
- 2.8.5 Introduction to matplotlib
- 2.8.5.1 Matplotlib installation
- 2.8.5.2 Introduction to matplotlib
- 2.8.5.3 Introduction to seaborn
- 2.8.6 Introduction to scikit-learn
- 2.8.6.1 Scikit-learn installation
- 2.8.6.2 Introduction to scikit-learn
- 2.8.6.3 Data preprocessing
- 2.8.6.4 Common models
- 2.8.7 Introduction to TensorFlow
- 2.8.7.1 TensorFlow introduction
- 2.8.7.2 TensorFlow 2 installation
- 2.8.7.3 Colaboratory usage
- 2.8.7.4 Introduction to TensorFlow high-level API—Keras
- 2.9 Chapter conclusion
- 2.10 Chapter exercises
- Chapter 3. Data preprocessing and exploratory data analysis
- 3.1 Data preprocessing
- 3.1.1 Data quality analysis
- 3.1.1.1 Missing value check
- 3.1.1.2 Anomaly check
- 3.1.2 Handling of missing values
- 3.1.2.1 Deleting records
- 3.1.2.2 Imputation of missing values
- 3.1.3 Handling of outliers
- 3.1.4 Data standardization
- 3.1.4.1 Min-max normalization
- 3.1.4.2 Zero-mean normalization
- 3.1.4.3 Decimal scaling normalization
- 3.2 Basics of spatiotemporal data analysis
- 3.2.1 Spatial coordinate system transformation
- 3.2.2 Spatiotemporal unit partitioning
- 3.2.3 Spatiotemporal feature extraction
- 3.2.3.1 Calculation of individual features
- 3.2.3.2 Individual feature aggregation
- 3.2.4 Ride-hailing trajectory data after gridding
- 3.3 Exploratory data analysis
- 3.3.1 Analysis of data distribution characteristics
- 3.3.2 Statistical analysis
- 3.3.2.1 Measures of central tendency
- 3.3.2.2 Measures of dispersion
- 3.3.3 Comparative analysis
- 3.3.4 Cyclical analysis
- 3.3.5 Correlation analysis
- 3.3.5.1 Direct plot of scatter plots
- 3.3.5.2 Plotting a scatter plot matrix
- 3.3.5.3 Calculating the correlation coefficient
- 3.4 Chapter conclusion
- 3.5 Chapter exercises
- Chapter 4. Data visualization
- 4.1 Setting chart elements
- 4.1.1 Axis configuration
- 4.1.2 Grid configuration
- 4.1.3 Text configuration in the plot
- 4.1.4 Adding legends and annotations
- 4.1.5 Adjusting figure size, resolution, and saving
- 4.2 Common visualization charts
- 4.2.1 Line plots
- 4.2.2 Bar charts
- 4.2.3 Boxplots
- 4.2.4 Pie charts
- 4.2.5 Histogram
- 4.2.6 Scatter plots
- 4.2.7 Faceted graphs
- 4.3 Fundamentals of interactive data visualization
- 4.3.1 Introduction to scatter plots and interactive operations
- 4.3.2 Overlay of interactive graphs
- 4.3.3 Interactive faceted charts
- 4.3.4 Saving interactive plots
- 4.4 Basics of point-line network plotting
- 4.4.1 Generating graphs in NetworkX
- 4.4.2 Network analysis for graphs
- 4.4.3 Drawing point-line networks
- 4.5 Chapter conclusion
- 4.6 Chapter exercises
- Chapter 5. Machine learning basics
- 5.1 Fundamental concepts of machine learning
- 5.1.1 Common concepts
- Samples and features
- Data sets (training, validation, and test sets)
- Features and feature space
- Labeled data and unlabeled data
- Model
- Hypothesis space
- Hyperparameters
- 5.1.2 The subject of machine learning
- 5.1.3 The main task of machine learning
- 5.2 The introduction of tasks in machine learning
- 5.2.1 Regression
- 5.2.2 Classification
- 5.2.3 Clustering
- 5.2.4 Dimensionality reduction
- 5.3 The basic process of machine learning
- 5.4 Model evaluation and selection
- 5.4.1 Error and loss function
- Error
- Loss function
- The expected risk and empirical risk of an error
- 5.4.2 Performance metrics of classification problem
- Accuracy
- Precision
- Recall
- P–R curves and ROC curves
- F1-score
- 5.4.3 Underfitting and overfitting
- The introduction to underfitting and overfitting
- Bias and variance
- Learning curve
- 5.5 The methods for improving model performance
- 5.5.1 Regularization
- 5.5.2 Cross-validation
- S-fold cross-validation
- Leave-one-out cross-validation
- 5.5.3 Conclusion for improving model performance
- 5.6 Techniques in machine learning training
- 5.6.1 Methods to improve model performance using data
- Artificially augmenting the dataset
- Resample
- Rescale
- Data transformation
- 5.6.2 The techniques of feature selection
- 5.6.3 The methods of hyperparameters tuning
- Grid search
- Random search
- Bayesian optimization
- Other hyperparameter tuning methods
- 5.7 Chapter conclusion
- 5.8 Chapter exercises
- Chapter 6. Linear models
- 6.1 Linear regression
- 6.1.1 Fundamental concepts
- Guidance 1: Data generation
- 6.1.2 Least squares method
- Guidance 2: Data splitting and standardization
- Guidance 3: Learning linear regression
- 6.1.3 Multiple linear regression
- Guidance 4: Model evaluation and validation
- Guidance 5: Model visualization
- 6.1.4 Maximum likelihood estimation
- 6.1.5 Overfitting in linear regression
- Overfitting phenomenon
- Ridge regression
- 6.1.6 Application of linear regression
- Variable selection
- Fitting using the linear model
- Prediction using linear model
- 6.2 Logistic regression
- 6.2.1 Generalized linear model
- 6.2.2 Logistic regression
- 6.2.3 Logistic regression parameter learning rules
- 6.2.4 Logistic regression loss: Cross-entropy
- Batch gradient descent algorithm
- Stochastic gradient descent algorithm
- Mini-batch SGD algorithm
- 6.2.5 Multiclass problems
- Building multiple binary classifiers [18]
- Softmax classification
- 6.2.6 Example application of logistic regression
- Dataset introduction
- Data preprocessing
- Model training
- Visualization of classification results
- Analysis of classification accuracy
- Summary
- 6.3 Chapter conclusion
- 6.4 Chapter exercises
- Chapter 7. Support vector machine
- 7.1 Basic concepts in support vector machine
- 7.1.1 Linear separability and hyperplanes
- 7.1.2 Margins and support vectors
- 7.2 Linearly separable support vector machine
- 7.2.1 Basic form of support vector machine
- 7.2.2 Dual problem of linearly separable support vector machine
- 7.3 Soft margin linear support vector machine
- 7.3.1 Hard margin and soft margin
- 7.3.2 Dual problem of the soft margin linear support vector machine
- 7.4 Nonlinear support vector machine
- 7.4.1 Kernel trick
- 7.4.2 Commonly used kernel functions
- 7.4.2.1 Linear kernel
- 7.4.2.2 Polynomial kernel
- 7.4.2.3 Gaussian kernel
- 7.4.2.4 Sigmoid kernel
- 7.5 Solving support vector machine models
- 7.5.1 Sequential minimal optimization algorithm
- 7.5.2 Bivariate quadratic programming
- 7.5.3 Choice of optimization parameters
- 7.6 Example application of SVM model
- 7.6.1 Dataset import
- 7.6.2 Exploratory analysis of the dataset
- 7.6.3 Model training
- 7.6.4 Summary
- 7.7 Chapter conclusion
- 7.8 Chapter exercises
- Chapter 8. Decision tree
- 8.1 Decision tree model
- 8.1.1 The structure of decision tree
- 8.1.2 Feature selection
- 8.1.2.1 Information gain
- 8.1.2.2 Information gain ratio
- 8.1.2.3 Gini index
- 8.2 Decision tree generation
- 8.2.1 ID3 algorithm based on information gain
- 8.2.1.1 ID3 tree construction rules
- 8.2.1.2 ID3 algorithm procedure
- 8.2.2 C4.5 algorithm based on information gain ratio
- 8.2.2.1 C4.5 tree construction rules
- 8.2.2.2 Discretization of continuous variables
- 8.2.3 Classification and regression trees model
- 8.2.3.1 Branching rules for CART classification trees
- 8.2.3.2 CART classification trees algorithm procedure
- 8.2.3.3 CART regression tree
- 8.2.4 Decision boundary
- 8.3 Decision tree pruning
- 8.3.1 The introduction of pruning
- 8.3.2 Pessimistic error pruning
- 8.3.3 Complexity pruning method
- 8.4 The application examples of decision tree
- 8.4.1 Introduction to the dataset
- 8.4.2 Modeling preprocessing
- 8.4.3 Model training
- 8.4.4 Classification accuracy and visualization
- 8.4.5 Conclusion
- 8.5 Conclusion
- 8.6 Exercises
- Chapter 9. Clustering analysis
- 9.1 General modeling process of cluster analysis
- 9.1.1 Modeling principles
- 9.1.2 Performance metrics of cluster analysis algorithms
- 9.1.3 Similarity calculation
- 9.2 K-means clustering algorithms
- 9.2.1 Algorithm process
- 9.2.2 Objective function
- 9.2.3 Algorithm variant: K-medoids
- 9.2.4 Example application of the K-means algorithm
- 9.2.4.1 Dataset introduction
- 9.2.4.2 Model training
- 9.2.4.3 Cluster results analysis and visualization
- 9.3 Gaussian mixture clustering
- 9.3.1 Basic concepts
- 9.3.1.1 Derivation of key parameters
- 9.3.2 EM algorithm procedure
- 9.3.3 Example application of Gaussian mixture clustering algorithm
- 9.4 Hierarchical clustering algorithm
- 9.4.1 Basic idea of the algorithm
- 9.4.2 Algorithm procedure
- 9.4.3 An example application of hierarchical clustering algorithm
- 9.5 DBSCAN clustering method based on density
- 9.5.1 Basic concepts
- 9.5.2 Algorithm procedure
- 9.5.3 Algorithm variant: ST-DBSCAN
- 9.5.4 Example 9.4: Using the DBSCAN algorithm to determine grid congestion levels
- 9.6 Chapter conclusion
- 9.7 Chapter exercises
- Chapter 10. Ensemble learning
- 10.1 Classification of ensemble learning
- 10.2 Boosting
- 10.2.1 Introduction to boosting
- 10.2.2 AdaBoost
- 10.2.3 Boosting trees
- 10.2.4 Gradient boosting trees
- 10.3 Bagging
- 10.3.1 Introduction to bagging
- 10.3.2 Random forest
- 10.4 Diversity of individual learners
- 10.5 Example applications of ensemble learning models
- 10.5.1 Introduction to the data set
- 10.5.2 Model preprocessing
- 10.5.3 Model training
- 10.5.4 Model accuracy comparison and visualization
- 10.6 Chapter conclusion
- 10.7 Chapter exercises
- Chapter 11. Artificial neural networks
- 11.1 Basic structure of neural networks
- 11.1.1 Basic concepts of the neuron model
- 11.1.2 Generalized representation of the neuron model
- 11.1.3 Multilayer neural network models
- 11.1.4 Neural network fitting ability
- 11.2 Activation functions and forward propagation
- 11.2.1 Activation function
- 11.2.1.1 Sigmoid function
- 11.2.1.2 Tanh function
- 11.2.1.3 ReLU function
- 11.2.1.4 Softmax function
- 11.2.2 Forward propagation
- 11.2.3 Forward propagation example
- 11.3 Backpropagation
- 11.4 Solutions to common problems in neural networks
- 11.4.1 Gradient vanishing and gradient explosion
- 11.4.2 Normalization and standardization of data
- 11.4.3 Overfitting
- 11.4.3.1 Early-stopping
- 11.4.3.2 Dropout
- 11.5 Application of neural networks to transportation problems
- 11.5.1 Introduction to the dataset
- 11.5.2 Read data and generate features
- 11.5.3 Data set partitioning and normalization
- 11.5.4 Building neural networks
- 11.5.5 Model training and validation
- 11.5.6 Summary and reflections
- 11.6 Chapter conclusion
- 11.7 Chapter exercises
- Chapter 12. Deep learning
- 12.1 Convolutional neural network
- 12.1.1 Convolution operation
- 12.1.2 Convolutional layer
- 12.1.3 Downsampling layer
- 12.1.4 Architectural design
- 12.1.4.1 Sequential design of convolution–pooling
- 12.1.4.2 Multiconvolutional kernel design
- 12.1.4.3 Skip/shortcut connection
- 12.1.5 Application to ride-hailing traffic prediction
- 12.1.5.1 Application scenario analysis
- 12.1.5.2 Data preprocessing
- 12.1.5.3 Model building
- 12.1.5.4 Model training and performance evaluation
- 12.2 Recurrent neural network
- 12.2.1 Basic structure
- 12.2.2 Backpropagation
- 12.2.3 Gates
- 12.2.4 Application to ride-hailing traffic prediction
- 12.2.4.1 Modeling preprocessing
- 12.2.4.2 Model establishment
- 12.2.4.3 Model training and performance evaluation
- 12.3 Graph neural networks
- 12.3.1 Graph convolution
- 12.3.2 Attention mechanism
- 12.3.3 Gating mechanism
- 12.3.4 Residual connections
- 12.4 Chapter conclusion
- 12.5 Chapter exercises
- Notation
- Index
- No. of pages: 454
Zhiyuan Liu
Dr. Zhiyuan (Terry) Liu is a Professor at the School of Transportation at Southeast University, China. He obtained his PhD degree from the National University of Singapore, Singapore. His research interests lie in the intersection and integration of transportation system analysis, big data analytics, and machine learning methods. He has published more than 100 papers in these areas.
Affiliations and expertise
Professor, Southeast University, China
Ziyuan Gu
Dr. Ziyuan (Frank) Gu is an Associate Professor at the School of Transportation at Southeast University, China. He obtained his PhD degree from the University of New South Wales Sydney, Australia. His research interests include data-driven transportation system analysis and machine learning-assisted traffic simulation and optimization. He has published over 40 papers in these areas.
Affiliations and expertise
School of Transportation, Southeast University, China
Pan Liu
Dr. Pan Liu is a Professor at the School of Transportation at Southeast University, China. He obtained his PhD degree from the University of South Florida, Tampa, USA. He has authored or co-authored over 100 papers in prestigious transportation journals. His research interests include transportation big data analysis, traffic operations and safety, and intelligent transportation systems.
Affiliations and expertise
Southeast University