Statistics Using Stata: An Integrative Approach

8 Nov 2016

Sharon Lawner Weinberg

With many decades of combined experience teaching undergraduate students at a liberal arts university and graduate students at a large research university, drawn from a variety of disciplines including education, psychology, health, and policy analysis, we know firsthand how varied the mathematical backgrounds of students who enroll in an introductory statistics course can be. It is clear to us that for students to become engaged in learning statistics, an introductory book on the subject needs to be informal and accessible in its presentation, yet rigorous, so that students are given a solid foundation upon which to build more advanced expertise in statistics if they so desire. It also needs to be constructed in a way that reflects statistical practice, with ample worked-out examples embedded in each chapter along with output from the statistical software package used to generate the answers.

To be true to statistical practice, we knew the book had to convey the important idea that more than one method of analysis typically is needed to address a research question, and that a characterization of results, in both tabular and graphical formats, is critical to any data analytic plan. To leave students with the understanding that statistics is an integrated and cohesive set of tools used together to uncover the story contained in the numbers, we knew we had to highlight the interrelatedness of methods wherever possible and use a series of real data sets to motivate discussions of how to select the appropriate methods of analysis in any given situation.

We knew that, to be easily accessible, the data sets used throughout the text would need to be available on the textbook’s website. Two of the real data sets used repeatedly in both worked-out examples and end-of-chapter exercises are large: one contains 48 variables and 500 cases from the education discipline; the other contains 49 variables and nearly 4,500 cases from the health discipline. By posing interesting questions about variables in these large, real data sets (e.g., Is there a gender difference in eighth graders’ expected incomes at age 30?), we knew we would be able to take a more meaningful and contextual approach to introducing statistical methods and to engage students more actively in the learning process. We also believed that the repeated use of these data sets would help create a more cohesive presentation of statistics, one that links different methods of analysis to each other and avoids the perception that statistics is a confusing array of separate and distinct methods with no bearing on, or relationship to, one another.

Statistics Using Stata combines the teaching of statistical concepts with instruction in the popular Stata software package. It closely aligns Stata commands with numerous examples based on the real data sets, enabling students to develop a deep understanding of statistics in a way that reflects statistical practice. Capitalizing on the fact that Stata has both a menu-driven “point and click” interface and a command syntax interface, the text guides students from the comfortable “point and click” environment to the beginnings of statistical programming. In this way, it also eases the transition to programming in other languages, such as R. To further enhance learning, the text’s website provides online resources, including complete solutions to exercises, PowerPoint slides, and Stata syntax files (Do-files) for each chapter. These Do-files contain the Stata code the authors used to generate all figures and worked-out examples in each chapter, giving students the opportunity to review and adapt the code on their own to solve new problems and so reinforce their programming skills.

In addition to data transformations, diagnostic tools for analyzing model fit, the logic of null hypothesis testing, assessing the magnitude of effects, interaction and its interpretation in two-way analysis of variance and multiple regression, and non-parametric statistics, the book needed to offer even more comprehensive coverage of essential topics not covered in other introductory statistics textbooks. We knew this was necessary to give instructors flexibility in curriculum planning and to provide students with more advanced material that prepares them for future work. These additional topics include robust methods of estimation based on resampling using the bootstrap, regression to the mean, the weighted mean, Simpson’s Paradox, counterfactuals and other topics in research design, and data workflow management using the Stata Do-file.

The book, consisting of 17 chapters, is intended for use in a one- or two-semester introductory applied statistics course for the behavioral, social, or health sciences at either the graduate or undergraduate level. It can also be used as a reference text. It emphasizes conceptual understanding through an exploration of both the mathematical principles underlying statistical methods and their real-world applications, and it is not intended for readers who wish to acquire a more theoretical understanding of mathematical statistics. Put another way, the book begins with modern approaches to Exploratory Data Analysis (EDA) and descriptive statistics, and then covers material similar to that found in an introductory mathematical statistics text, such as one for undergraduates in mathematics and the physical sciences, but without the calculus and linear algebra. Instead, it is grounded in data examples. Thus, theoretical probability distributions, the Law of Large Numbers, sampling distributions, and the Central Limit Theorem are all covered, but in the context of solving practical and interesting problems.

We believe that Statistics Using Stata captures what we set out to do and hope that its readers believe the same!