ISBN: 978-1-108-96478-4

Jonas Vestby



Organizing and processing data is a fundamental skill for social scientists. However, it is often glossed over in introductory classes, which favor topics related to data collection or data analysis. Proper data management is a superpower that facilitates data analysis, makes data collection much easier and less error-prone, and contributes to the whole scientific community by creating data structures that are easy to work with. This book provides a good and thoughtful introduction for students and social scientists who want to learn how to work more systematically with data processing. All examples in the book are in R (and SQL calls from R) using the RStudio IDE. The main benefits of the book are that it provides a starting point and examples that are familiar to social scientists and that it explains ideas in an accessible manner. The book is not a comprehensive learning program for data management, however. It is particularly focused on how to structure and store tabular information (to optimize for data integrity), as opposed to how to efficiently organize data for statistical analysis. For the latter, I would have expected discussions of array-based formats (TIFF, netCDF/HDF), column-wise formats such as Parquet, and in-memory versus disk-storage solutions. The book also introduces only basic SQL building blocks, not high-level abstractions or ways to plan relational databases. On the other hand, it does introduce vector-based spatial data, network graphs, and ways to work with text corpora in separate chapters. A reader would quickly want to learn more about many of the topics introduced in the book. Here, the book could have benefited from giving pointers to further reading.