The post Candy and Crime first appeared on Fifteen Eighty Four | Cambridge University Press.

The article describes a longitudinal study of 17,415 people. The researchers noted whether participants self-reported that they had been convicted of a violent crime by the time they were 34 years old and whether or not they had eaten candy daily when they were 10 years old. They also collected data when the participants were 5 years old to classify their early development and their parents’ style of parenting. Overall, 69% of respondents who were violent by the age of 34 reported that they had eaten candy daily during childhood, compared with 42% of those who were non-violent. It is important to note that only 81 (less than 0.5%) of the children in this study became violent offenders by the time they were 34 years old. Newspaper summaries of the study indicated that eating candy as a child caused people to grow up to be violent criminals.

We use this article to make the following points that are related to the chapter.

- When dichotomous variables are involved in a correlation, the coding determines the sign of that correlation.

Based on the percentages, we know that 69% of respondents who were violent by the age of 34 reported that they had eaten candy daily during childhood. By comparison, candy was eaten daily by 42% of those who were non-violent. Suppose we had created two variables based on these percentages: VIOLENT (coded 1 = had committed a violent crime by age 34 and 2 = had not committed a violent crime by age 34) and CANDY (coded 1 = had eaten candy daily at age 10 and 2 = had not eaten candy daily at age 10). What would the sign of the correlation between them have been? People who had been violent tended to have eaten candy daily (both coded 1), and those who had not been violent tended not to have eaten candy daily (both coded 2). Low scores on one variable therefore tend to go with low scores on the other, and high with high, so we would expect a positive correlation. In fact, if we calculate the Pearson correlation on the percentages, we have *r* = .26. If the coding had been reversed for one of the variables, we would expect the sign, but not the magnitude, of the correlation to change.
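The sign flip can be checked with a short Stata sketch. It assumes, purely for illustration, that the reported percentages are counts in two groups of 100 people, stored as frequency weights in a hypothetical variable `freq`; `violent` and `candy` follow the 1/2 coding described above. With this setup the correlation comes out at about .27, close to the .26 reported above (the small difference may come from how the percentages were entered).

```stata
* Hypothetical data: the reported percentages treated as counts
* in two groups of 100 people (an assumption for illustration).
clear
input violent candy freq
1 1 69
1 2 31
2 1 42
2 2 58
end

* Coding from the text: 1 = violent / ate candy daily, 2 = not.
correlate violent candy [fweight = freq]
display "r = " %5.3f r(rho)

* Reverse the coding of CANDY only (1 <-> 2) and recompute.
generate candy_rev = 3 - candy
correlate violent candy_rev [fweight = freq]
display "r = " %5.3f r(rho)
```

The second `correlate` returns the same magnitude with a negative sign, because `3 - candy` simply swaps the codes 1 and 2, a linear transformation with a negative slope.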

- When creating a clustered bar graph of the results based on the given percentages, the choice of axes is important.

Because the 69% is a percentage of the respondents who were violent by the age of 34 (and the 42% a percentage of those who were not), the horizontal axis needs to indicate whether or not the person was violent: violence status defines the denominator of each given percentage, and the bars within each group then show the percentages who did and did not eat candy daily. The graph itself and the related Stata commands are given below.

    label define violent 1 "violent" 2 "non-violent"
    label values violent violent
    graph bar [fweight = freq], over(chocolate) asyvars percentage over(violent) ///
        blabel(total) ytitle(percent violent) ///
        legend(label(1 "Did Not Eat Chocolate") label(2 "Ate Chocolate"))
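To see which variable supplies the denominator, one can tabulate the same kind of data with row percentages. The sketch below again treats the reported percentages as counts in two hypothetical groups of 100 (an assumption, not the study's data) and uses the 1/2 coding from earlier; the variable names `candy` and `freq` and the label names are illustrative.

```stata
* Hypothetical data built from the reported percentages
* (groups of 100 assumed; 1 = yes, 2 = no for both variables).
clear
input violent candy freq
1 1 69
1 2 31
2 1 42
2 2 58
end
label define violent_lbl 1 "violent" 2 "non-violent"
label define candy_lbl 1 "ate candy daily" 2 "did not"
label values violent violent_lbl
label values candy candy_lbl

* Row percentages condition on VIOLENT: each row sums to 100%.
tabulate violent candy [fweight = freq], row
```

Each row of the resulting table sums to 100%, with 69% and 42% in the "ate candy daily" column, which shows that these figures are percentages of the violent and non-violent groups, respectively.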

- The results are different when based on the percentages versus the frequencies.

Recall that the correlation between the two variables based on the percentages was *r* = .26. The authors indicated that only 81 (less than 0.5%) of the children in this study became violent offenders by the time they were 34 years old. That means that, of the 81 people who were violent offenders, about 56 (69%) ate candy daily and 25 did not. Of the 17,334 people who were not violent offenders, about 7,280 (42%) ate candy daily and 10,054 did not. The correlation between the two variables based on the frequencies is *r* = .04. Once the scarcity of violent offenders is taken into account, we see that there is little or no relationship between the two variables.
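The same calculation on the frequencies can be sketched in Stata by entering the approximate counts given above as frequency weights. The variable names `violent`, `candy`, and `freq` and the 1/2 coding are assumptions carried over from the earlier discussion.

```stata
* Approximate counts from the text: 81 violent offenders
* (56 ate candy daily, 25 did not) and 17,334 non-offenders
* (7,280 ate candy daily, 10,054 did not).
clear
input violent candy freq
1 1 56
1 2 25
2 1 7280
2 2 10054
end

* Coding: 1 = violent / ate candy daily, 2 = not.
correlate violent candy [fweight = freq]
display "r = " %5.3f r(rho)
```

With the 17,415 cases weighted this way, `correlate` reports a correlation of about .037, which rounds to the *r* = .04 given in the text.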

- Causal conclusions have been drawn from observational data, and this is problematic.

Although *The Mirror* probably sold more newspapers with the headline “Lots of sweets makes kids thuggish adults,” the study results do not support the claim that changing a person’s candy consumption at age 10 would change his or her violent behavior at age 34. For example, extremely permissive parenting could be associated with high candy consumption and could also produce adults with less self-control who are more likely to commit violent crimes. In that scenario, changing just one aspect of permissive parenting, such as limiting candy consumption, would probably do little to change the likelihood of committing violent crimes. When we ask our students whether, *based on the results of this study*, they would advise their parents to limit the candy consumption of their younger siblings, many have a hard time limiting themselves to the results of the study. They talk about how candy causes cavities, is not healthy, and should be avoided.
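The permissive-parenting scenario can be made concrete with made-up numbers. In the hypothetical table below (the frequencies are invented for illustration and are not from the study), candy and violence are unrelated within each parenting group, yet pooling the groups produces a positive correlation because permissive parenting raises the rates of both. The variables `permissive`, `candy`, `violent`, and `freq` are illustrative names coded 0/1; as noted earlier, the coding affects only the sign of a correlation.

```stata
* Hypothetical frequencies: within each parenting group, the
* violence rate is the same for candy eaters and non-eaters.
clear
input permissive candy violent freq
1 1 1 70
1 1 0 630
1 0 1 30
1 0 0 270
0 1 1 6
0 1 0 294
0 0 1 14
0 0 0 686
end

* Pooled over both parenting groups: a positive correlation.
correlate candy violent [fweight = freq]

* Within each parenting group: the correlation is zero.
by permissive, sort: correlate candy violent [fweight = freq]
```

The pooled correlation is about .07, while within each parenting group it is zero: in this made-up scenario, candy has no association with violence once parenting is held fixed.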

The post Statistics Using Stata: An Integrative Approach first appeared on Fifteen Eighty Four | Cambridge University Press.

To be true to statistical practice, we knew the book had to convey the important idea that more than one method of analysis is typically needed to address a research question, and that characterizing results in both tabular and graphical formats is critical to any data-analytic plan. To leave students with the notion that statistics is an integrated and cohesive set of tools used together to uncover the story contained in the numbers, we knew we had to highlight the interrelatedness of methods wherever possible and to use a series of real data sets to motivate discussions of how to select the appropriate methods of analysis in a given situation.

We knew that, to be easily accessible, the data sets used throughout the text would need to be on the textbook’s website. Two of the real data sets used repeatedly in both worked-out examples and end-of-chapter exercises are large: one, from the education discipline, contains 48 variables and 500 cases; the other, from the health discipline, contains 49 variables and nearly 4,500 cases. By posing interesting questions about variables in these large, real data sets (e.g., Is there a gender difference in eighth graders’ expected incomes at age 30?), we knew we would be able to take a more meaningful and contextual approach to introducing statistical methods and to engage students more actively in the learning process. We also believed that the repeated use of these data sets would help create a more cohesive presentation of statistics, one that links different methods of analysis to each other and avoids the perception that statistics is an often-confusing array of separate and distinct methods with no bearing on or relationship to one another.

*Statistics Using Stata* combines the teaching of statistical concepts with learning to use the popular Stata software package. It closely aligns Stata commands with numerous examples based on the real data sets, enabling students to develop a deep understanding of statistics in a way that reflects statistical practice. Capitalizing on the fact that Stata has both a menu-driven “point and click” interface and a program syntax interface, the text guides students effectively from the comfortable “point and click” environment to the beginnings of statistical programming. As such, it provides an easy transition to programming in other languages, such as R. To further enhance learning, online resources are provided on the text’s website, including complete solutions to exercises, PowerPoint slides, and Stata syntax (Do-files) for each chapter. These Do-files contain the Stata code the authors used to generate all figures and worked-out examples in each chapter, and they give students an opportunity to review and adapt the code on their own to solve new problems, reinforcing their programming skills.

In addition to covering data transformations, diagnostic tools for the analysis of model fit, the logic of null hypothesis testing, assessing the magnitude of effects, interaction and its interpretation in two-way analysis of variance and multiple regression, and non-parametric statistics, we knew the book needed to provide even more comprehensive coverage of essential topics not covered by other introductory statistics textbooks, both to give instructors flexibility in curriculum planning and to provide students with more advanced material to prepare them for future work. These additional topics include robust methods of estimation based on resampling using the bootstrap, regression to the mean, the weighted mean, Simpson’s Paradox, counterfactuals and other topics in research design, and data workflow management using the Stata Do-file.

The book, consisting of 17 chapters, is intended for use in a one- or two-semester introductory applied statistics course for the behavioral, social, or health sciences at either the graduate or undergraduate level. It can also be used as a reference text. It emphasizes conceptual understanding through an exploration of both the mathematical principles underlying statistical methods and real-world applications, and it is not intended for readers who wish to acquire a more theoretical understanding of mathematical statistics. To offer another perspective, the book may be described as one that begins with modern approaches to Exploratory Data Analysis (EDA) and descriptive statistics and then covers material similar to what is found in an introductory mathematical statistics text, such as one for undergraduates in math and the physical sciences, but stripped of calculus and linear algebra and grounded instead in data examples. Thus, theoretical probability distributions, the Law of Large Numbers, sampling distributions, and the Central Limit Theorem are all covered, but in the context of solving practical and interesting problems.

We believe that *Statistics Using Stata* captures what we set out to do and hope that its readers believe the same!
