Mastering Data Analysis with R
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Chapter 1. Hello Data!
Loading text files of a reasonable size
Benchmarking text file parsers
Loading a subset of text files
Loading data from databases
Importing data from other statistical systems
Loading Excel spreadsheets
Summary
Chapter 2. Getting Data from the Web
Loading datasets from the Internet
Other popular online data formats
Reading data from HTML tables
Scraping data from other online sources
R packages to interact with data source APIs
Summary
Chapter 3. Filtering and Summarizing Data
Drop needless data
Aggregation
Running benchmarks
Summary functions
Summary
Chapter 4. Restructuring Data
Transposing matrices
Filtering data by string matching
Rearranging data
dplyr versus data.table
Computing new variables
Merging datasets
Reshaping data in a flexible way
The evolution of the reshape packages
Summary
Chapter 5. Building Models (authored by Renata Nemeth and Gergely Toth)
The motivation behind multivariate models
Linear regression with continuous predictors
Model assumptions
How well does the line fit in the data?
Discrete predictors
Summary
Chapter 6. Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)
The modeling workflow
Logistic regression
Models for count data
Summary
Chapter 7. Unstructured Data
Importing the corpus
Cleaning the corpus
Visualizing the most frequent words in the corpus
Further cleanup
Analyzing the associations among terms
Some other metrics
The segmentation of documents
Summary
Chapter 8. Polishing Data
The types and origins of missing data
Identifying missing data
By-passing missing values
Getting rid of missing data
Filtering missing data before or during the actual analysis
Data imputation
Extreme values and outliers
Using robust methods
Summary
Chapter 9. From Big to Small Data
Adequacy tests
Principal Component Analysis
Factor analysis
Principal Component Analysis versus Factor Analysis
Multidimensional Scaling
Summary
Chapter 10. Classification and Clustering
Cluster analysis
Latent class models
Discriminant analysis
Logistic regression
Machine learning algorithms
Summary
Chapter 11. Social Network Analysis of the R Ecosystem
Loading network data
Centrality measures of networks
Visualizing network data
Further network analysis resources
Summary
Chapter 12. Analyzing Time-series
Creating time-series objects
Visualizing time-series
Seasonal decomposition
Holt-Winters filtering
Autoregressive Integrated Moving Average models
Outlier detection
More complex time-series objects
Advanced time-series analysis
Summary
Chapter 13. Data Around Us
Geocoding
Visualizing point data in space
Finding polygon overlays of point data
Plotting thematic maps
Rendering polygons around points
Satellite maps
Interactive maps
Alternative map designs
Spatial statistics
Summary
Chapter 14. Analyzing the R Community
R Foundation members
R package maintainers
The R-help mailing list
Analyzing overlaps between our lists of R users
The number of R users in social media
R-related posts in social media
Summary
Appendix A. References
General good readings on R
Chapter 1 – Hello Data!
Chapter 2 – Getting Data from the Web
Chapter 3 – Filtering and Summarizing Data
Chapter 4 – Restructuring Data
Chapter 5 – Building Models (authored by Renata Nemeth and Gergely Toth)
Chapter 6 – Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)
Chapter 7 – Unstructured Data
Chapter 8 – Polishing Data
Chapter 9 – From Big to Small Data
Chapter 10 – Classification and Clustering
Chapter 11 – Social Network Analysis of the R Ecosystem
Chapter 12 – Analyzing Time-series
Chapter 13 – Data Around Us
Chapter 14 – Analyzing the R Community
Index