Subject: summary of responses (Re: terminology for variables in regression)

From: Tim Hesterberg

Date: Mon, 02 Dec 2002 13:18:52 -0800

To: isostat@oberlin.edu

(oops; delete previous mail, this version adds one more reply)

Thank you all for your responses to my query on terminology for variables in regression. Below is a summary of the responses; overwhelmingly in favor of
    response instead of dependent
    predictor/explanatory instead of independent

Two other terms were mentioned as alternatives to response: outcome (2 people), criterion (2 people)

Tim Hesterberg

From: David Moore <dsmoore@stat.purdue.edu>

As you note, all my books use ``explanatory variable'' and ``response variable.'' The comment in IPS (similar in the others) says

    You will often see explanatory variables called {\bf independent variables} and response variables called {\bf dependent variables}. The idea behind this language is that response variables depend on explanatory variables. Because the words ``independent'' and ``dependent'' have other meanings in statistics that are unrelated to the explanatory-response distinction, we prefer to avoid those words.

My books are the best-sellers above the two-year-college text level. Some of the better new books are written by people who have taught from my texts and follow the same terminology: Jessica Utts (Seeing Through Statistics, Mind On Statistics) and Wild and Seber in their new and good text (Chance Encounters: A First Course in Data Analysis and Inference). The Samuels/Witmer ``Statistics for the Life Sciences'' (my favorite biostat book) also uses explanatory/response.

And for what it's worth, the index entry for ``Dependent variable'' in Chambers/Hastie Statistical Models in S says ``See response.''

Both sets of terms are widely used and recognized. Although change comes slowly, I think the clearer terms are spreading in texts, especially texts that are ``modern'' in flavor and so ought to appeal to Splus folk.

From: "Allan Rossman" <arossman@calpoly.edu>

I prefer the terminology that you suggest. That's what Workshop Statistics uses.

From: Cyndy Long <LONG_C@palmer.edu>

I often use "outcome" variable (typically teaching health professionals), sometimes "response" variable. I prefer "explanatory" variable to "predictor" variable. And, I certainly advocate for the elimination of the terms "dependent" and "independent" in this context.

From: Johanna Hardin <Jo.Hardin@pomona.edu>

I totally agree. I use explanatory and response instead of dependent / independent. The book "The Statistical Sleuth" uses the newer terms as do Moore's books and Jessica Utts' books. The terms are much cleaner and the students understand them better.

From: "Steve C. Wang" <scwang@swarthmore.edu>

> I'm fighting a battle here at Insightful (S-PLUS) to avoid the terminology
>     dependent variable
>     independent variables
> for variables in a regression, because "dependent" and "independent" have other meanings in statistics. Note that "independent" variables need not be independent (and almost never are). To me this terminology is needlessly confusing, particularly for students.

I dislike this terminology, for exactly the reasons you cite.

> I'm pushing for
>     response variable
>     explanatory variables or predictor variables

> What terminology do you use with your students?

I usually use "response variable" and "predictor". Or just Y and X.

> What terminology do the books you use prefer (what books)?

I use Moore and McCabe, so response/explanatory.

From: Brian Jersky <jersky@SONOMA.EDU>

Most biostats books (e.g., Pagano and Gauvreau) I've seen use predictor/response, as do I.

From: "Katherine Halvorsen" <Khalvors@email.smith.edu>

I'm using Moore and McCabe and response/predictor terminology.

From: "Douglas M. Andrews" <dandrews@wittenberg.edu>

> What terminology do you use with your students?

Response/explanatory.

> What terminology do the books you use prefer (what books)?

I use Moore's "Basic Practice of Statistics" and Rossman's "Workshop Statistics" -- two of the stat ed standards -- and they both use response/explanatory.

P.S. Another reason to eschew the dep/indep lingo: Referring to Y as the "dependent" variable suggests that Y depends on X, which for many people implies a causal association, which of course need not be the case, even if there's a strong association.

From: Bharath <rbharath@colby.edu>

I use response or predictand for Y and predictor(s) or control variables for X's. The book I used last semester, Kachigan: Statistical Analysis, uses the terminology of predictor variables and very explicitly discusses the confusion caused by talking of "independent variables" and uses the latter term only to refer to variables which do not covary. For Y, Kachigan uses the term "criterion variable".

From: "Christopher J. Lacke" <lacke@rowan.edu>

The following use "explanatory" and "response"
    Ramsey and Schafer
    Samuels and Witmer
    Schork and Remington
    Weiss
    Peck, Olsen, and Devore

I also took a cursory glance at some regression books, where I saw both terminologies used.

From: "Lachenbruch, Peter" <lachenbruch@cber.FDA.gov>

I prefer response and predictor. For many years, I would get confused about independent and dependent variables (maybe that's the wrong confession to make...)

From: Karla Ballman <Ballman.Karla@mayo.edu>

Although I am not teaching undergraduates any more, I am still teaching stats courses through the Mayo Graduate school. I agree with you about using the terms independent and dependent variables. Both in my teaching and in my other work with clinicians, we never use these terms (these just are not meaningful to the MDs). We use the terms response and outcome for Y and explanatory and predictor for X.

To be honest, I prefer the terms outcome (Y) and explanatory (X). Actually, for the Y variable, I really am indifferent between the terms response and outcome. I tend to favor outcome since in our setting, that is generally what the Y variable is measuring. For the X variable I tend to avoid the use of predictor. The reason is that many of our studies are observational and as such, we can only hope to establish association. However, the MDs always try to push the interpretations of the results more towards causation. For some reason, the use of predictor to describe the X variables makes them want to say that knowing X you can predict Y and make the immediate leap to causation. They tend not to do this as much when I refer to the Xs as explanatory.

From: Albyn Jones <jones@reed.edu>

I use "response" and "explanatory", and explicitly discourage the use of "independent" and "dependent" for exactly the reason you give.

A quickie pseudo-random sample of texts within reach yields:
    Weisberg "Applied Linear Regression" 2nd ed uses "response" and "predictor"
    Hastie and Tibshirani (GAMs) use "response" and "predictor"
    Ramsey & Schafer ("The Statistical Sleuth") use "response" and "explanatory"
    McCullagh & Nelder (1st ed) often use "covariate" for `X'.
    Venables and Ripley (MASS) tend to use "response" and "explanatory" but not exclusively. I found a passage where the text reads "the response (dependent variable) is..."

From: jeff witmer <jeff.witmer@oberlin.edu>

The terminology I prefer is exactly what you are pushing for:

    response variable
    explanatory variables or predictor variables

This is what I use with my students and what appears in the book I use (but since I wrote the book, that should only count as one vote, not two!)

From: "Annette Gourgey" <statsense@rcn.com>

I come from educational measurement and we always used predictor and criterion, to emphasize that there isn't necessarily causation. I learned from Pedhazur's Multiple Regression in Behavioral Research. I use these terms in my business stats classes even though our text (Statistics for Managers by Levine et al.) uses independent and dependent, and explain why I use them.