Correlation Is Not Causation

Yes, we have all (likely) heard this … possibly so much that we forget to consider it and its implications. But there’s more.

I must credit the professor of my graduate school Introductory Statistics course for this perfect and simple example to get us started in improving our critical thinking:

  1. Stan owns an ice cream store at the beach.
  2. Stan remembered from recent history that there have been shark attacks at this beach.
  3. Stan decided to track data regarding shark attacks and his ice cream sales, for marketing purposes.
  4. At the end of Stan’s current fiscal year, Stan examines the data he collected over the years and finds that during the weeks when there were shark attacks, he had spikes in ice cream sales.
  5. Stan concludes that shark attacks are causing increased ice cream sales for him.

Does this seem like reasonable thinking?

This example does help us see that there is a correlation between ice cream sales and shark attacks. That is a reasonable conclusion. But there might be an unsettling feeling within you, telling you that “shark attacks” shouldn’t be causing spikes in ice cream sales. But maybe … since people like looking at proverbial train wrecks, including shark attacks, shark attacks are causing spikes in ice cream sales. If we think more about this, we might realize that authorities don’t allow people in the water when there is a shark in the water, especially after an attack. Might this, perhaps, be driving people to Stan’s ice cream store? Those might be some of the thoughts going through your head.

Reasonable thinking would suggest a few things. First, the number of people at the beach on the day of a shark attack would be the same whether or not there was an attack. So, the number of people who would have gone to the beach on that day wouldn’t change. Perhaps if the authorities do make people get out and stay out of the water, the people would need something to do and might go get ice cream. This also seems like it might be reasonable, but is it good, critical thinking?

Let’s look further.

Perhaps there would be more people out of the water, well, all of them … but is that an important factor? The number of people at the beach is going to be the same, or is it? An attack might cause people to go home or to another recreational area or beach, reducing the number of people available to go to Stan’s ice cream store. Either way, isn’t it possible or even probable that the same number of people will be going to Stan’s ice cream store that day, regardless of a shark attack?

Here is the sticky part. If we work through it as part of our critical thinking, we improve our understanding of the situation.

What variables is Stan using to understand his annual ice cream sales? Stan is using two variables: ice cream sales volume and the total number of shark attacks for each week. Assuming Stan is using correct data analysis skills, the correlation he found appears accurate. But is his assessment of the situation, that shark attacks cause the sales spikes, valid?

No.

What Stan is missing is frequently also missing from a substantial number, if not a great majority, of stories from news sources, posts from bloggers, and what people post on social media. In Stan’s situation, he is not using enough variables and, more specifically, not enough of the relevant variables.

While this seems to be better reasoning in Stan’s situation, the information we are likely to receive and (re-)post in social media frequently does not use this level of thinking. This level of critical thinking can be hard, because the missing variables can be convoluted, hidden, or even intentionally disguised by carefully selected language and grammar … and appeals to your emotion.

SUBSTANTIVE KNOWLEDGE

In research, we consider a concept called “substantive knowledge.” Simply, this just means that the researcher has previous experience and knowledge regarding a subject matter that influences this researcher’s decision to include specific variables to help understand the situation the researcher is considering. The researcher will include these variables throughout the observation and collection of data. After that data collection phase is over, the researcher begins the analysis.

If the situation is simple (which is not likely, but a researcher might consider it to be simple, anyway), then the data analysis will consider the variables as individual contributors and (hopefully) in combination. (Yes, I am greatly oversimplifying this process because the details are not important here.) In this data analysis process (if done correctly), the contributions of some variables and variable combinations will be found unlikely to be due to randomness (be found to be “statistically significant”), and the other variables and variable combinations will “drop out” (essentially meaning that the analysis found no good evidence that they contribute to the situation).

Back to Stan’s ice cream sales situation

Suppose that, from his experience of operating his ice cream eatery over the years, Stan had been noticing some connection between his ice cream sales and the shark attacks, but he just wasn’t sure. His experience would be considered “substantive knowledge” to help him determine what variables to consider. In other words, he is not choosing a variable, such as the fluctuating price of wheat futures, that is outside his area of specialized experience. [This is actually an important point, and I will return to it further below.] Stan works at a beach, and Stan is only including shark attacks that happen within a reasonable range of his location.

Is Stan considering all the important variables?

If Stan spent some extra time and used good, critical thinking skills, could Stan have thought of other variables that would connect the shark attacks to the ice cream sales? In research, we would use the term “mediate,” meaning to serve as a middle variable that necessarily sits between the other variable and the “dependent variable” (the “dependent variable” is the variable we are trying to understand). Stan ought to be looking for a variable or variables that mediate (are between) the shark attacks and Stan’s spikes in ice cream sales (Stan’s “ice cream sales” is Stan’s “dependent variable”).

For example, while Stan certainly can include the variable of the number of shark attacks per week (or even per day), by spending that extra time to think and be critical, Stan could also include variables such as:

  • daily temperature
  • weekends/holidays
  • dates and times of events in the local area (and the number of people that come)
  • average number of people in the water per day
  • daily numbers of people who bought tickets to use the beach
  • typical food availability for sharks (migratory patterns of smaller fish)

This list can continue if Stan really puts effort into understanding his situation.

I never did any actual data collection or data analysis for Stan’s make-believe scenario, but some of those variables do seem like excellent candidates for being mediating variables. Also, some of those other variables probably just do better at explaining Stan’s ice cream sales spikes than shark attacks do. If Stan analyzed a model (equation) that includes this second set of variables, the contribution of these new variables may cause the variable “number of shark attacks” to drop out.
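To make the idea of a variable “dropping out” a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are invented (not collected from any real ice cream store), the make-believe world is built so that temperature raises both the chance of an attack and the sales, and the helper names (`cross_sum`, `slope_one`, `slopes_two`) are my own. It fits one model with shark attacks alone and one with shark attacks plus temperature.

```python
import random

def cross_sum(x, y):
    """Sum of products of deviations from the means (n times the covariance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def slope_one(x, y):
    """Least-squares slope of y on a single predictor x."""
    return cross_sum(x, y) / cross_sum(x, x)

def slopes_two(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 together (2x2 normal equations)."""
    s11, s22, s12 = cross_sum(x1, x1), cross_sum(x2, x2), cross_sum(x1, x2)
    s1y, s2y = cross_sum(x1, y), cross_sum(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Invented data, NOT Stan's: temperature raises both the chance of an attack
# and the sales, and attacks never affect sales in this made-up world.
rng = random.Random(0)
temps, attacks, sales = [], [], []
for _ in range(520):                      # ten years of weeks (assumed)
    temp = rng.uniform(15.0, 35.0)        # weekly mean temperature, deg C
    p_attack = (temp - 15.0) / 100.0      # assumed daily chance of an attack
    n_attacks = sum(rng.random() < p_attack for _ in range(7))
    temps.append(temp)
    attacks.append(float(n_attacks))
    sales.append(200.0 + 30.0 * temp + rng.gauss(0.0, 60.0))

# Model 1: sales explained by shark attacks alone (Stan's two-variable view).
shark_only = slope_one(attacks, sales)
# Model 2: sales explained by shark attacks and temperature together.
shark_adjusted, temp_coef = slopes_two(attacks, temps, sales)

print(f"sales per attack, attacks alone:        {shark_only:8.1f}")
print(f"sales per attack, temperature included: {shark_adjusted:8.1f}")
print(f"sales per degree, attacks included:     {temp_coef:8.1f}")
```

In this invented world, the shark-attack coefficient looks large when it is the only variable and shrinks toward zero once temperature is in the model. That only shows what “dropping out” can look like; it says nothing about what real data would show.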

Applying critical thinking skills and substantive knowledge, we can see that there is a very high potential for daily temperature to be a strong contributor, not just to ice cream sales, but also to the number of people at the beach and in the water. The greater the number of people in the water, the more likely sharks will become curious and attracted to potential (slow-moving) food (mistaking people for other animals) … and “attack.” Here, we see that daily temperature sits behind both shark attacks and ice cream sales. (Strictly speaking, researchers call a shared cause like this a “confounding” variable rather than a mediating one; a mediating variable sits on the path between two others, as the number of people in the water does between temperature and shark attacks.) Thus, the contributing variable to Stan’s ice cream sales spikes … the causal factor … would be the daily temperature. Through critical thinking in Stan’s situation, it is reasonable that shark attacks would correlate with ice cream sales, but we also see that a more likely causal factor is the daily temperature.
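Another way to check this kind of reasoning is to compare like with like: look at attack days and non-attack days only within days of similar temperature. The sketch below does that in Python. Again, the data are invented and the numbers (the attack probabilities, the sales formula, the 2-degree bands) are assumptions chosen only to illustrate the idea; the world is built so that attacks have no effect on sales at all.

```python
import random

def mean(values):
    return sum(values) / len(values)

def split_sales(days):
    """Return (sales on attack days, sales on other days)."""
    attack = [sale for _, had_attack, sale in days if had_attack]
    calm = [sale for _, had_attack, sale in days if not had_attack]
    return attack, calm

# Invented daily records (temperature, attack happened?, sales), NOT real data.
# Hot days bring more swimmers, so an attack is more likely; sales depend on
# temperature only.
rng = random.Random(1)
days = []
for _ in range(5000):
    temp = rng.uniform(15.0, 35.0)
    had_attack = rng.random() < 0.02 + 0.01 * (temp - 15.0)
    sale = 200.0 + 30.0 * temp + rng.gauss(0.0, 60.0)
    days.append((temp, had_attack, sale))

# Stan's view: all days lumped together.
attack_sales, calm_sales = split_sales(days)
raw_gap = mean(attack_sales) - mean(calm_sales)

# Like with like: group days into 2-degree temperature bands, compute the gap
# inside each band, then average the gaps weighted by band size.
bands = {}
for day in days:
    bands.setdefault(int((day[0] - 15.0) // 2), []).append(day)

weighted_sum, total_weight = 0.0, 0
for band_days in bands.values():
    band_attack, band_calm = split_sales(band_days)
    if band_attack and band_calm:          # skip bands missing one group
        weighted_sum += (mean(band_attack) - mean(band_calm)) * len(band_days)
        total_weight += len(band_days)
banded_gap = weighted_sum / total_weight

print(f"sales gap, all days lumped together: {raw_gap:7.1f}")
print(f"sales gap, within temperature bands: {banded_gap:7.1f}")
```

Lumped together, attack days appear to sell much more ice cream. Within bands of similar temperature, the gap mostly disappears, because in this made-up world the attack days were simply the hotter days.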

“Substantive Knowledge” and Critical Thinking

Earlier, I mentioned that “substantive knowledge” matters for critical thinking.

People who write content for news, blogs, status updates/comments in social media, etc. tend to have college degrees. At least, the general idea is that the greater the financial investment in the project/entity/organization/etc., the more the organization tries to establish and maintain a reporter’s/blogger’s reputability through higher levels of academic achievement.

Did you ever think about the difference between a typical news anchor (“talking head”) and the person explaining to you the weather?

Well, it’s typical for the weather person to have a degree in a specialized field (Meteorology), yet the “talking head” will frequently have a degree in Media Communications. For newspaper articles, the writer will likely have a degree in Journalism, usually with an undergraduate minor in a more specialized topic. But a degree in Media Communications is more about how to present ideas in palatable ways … so people will be more likely to pay attention. And undergraduate degrees are more about giving students general ideas, while people who advance to graduate degrees gain stronger skills and more critical knowledge in more specific areas. People with a degree in Business Marketing are similar to those with a degree in Media Communications in that the intent is to generate people’s interest. Their goal tends not to be discussing the various sides and positions, especially the ones that run against what the seller needs in order to generate income (income that pays the people in the Marketing Division or the “talking heads”).

When it comes to information on the Internet, independent journalists/bloggers usually do not have to meet standards or knowledge requirements. The more of a “journalist” the presenter is, the less likely the person will have the “substantive knowledge” that makes the person reputable in a specialized area. People with specialized knowledge tend to be more informed (having that “substantive knowledge”), but don’t tend to put information on the Internet. But the opposite isn’t exactly a good “rule of thumb,” either. Many people “blog” for different reasons, and they will have different experiences to support (or not support) their claims or information. I have seen people write on the Internet, on LinkedIn, etc., making claims about topics related to their professional work that also happen to be areas of my specialization … frequently within autism. I would read statements from people, professionals, even people with advanced degrees up to Ph.D., that were inaccurate (based on existing research). I would even write in these discussions, citing research papers that showed why they were incorrect. (That never went well, though I would receive nods of support from other readers.)

It is up to you to be a critical thinker and delve into the person’s background, level of academic achievement in the specified field, years of experience … years of training … the quality of the training and the trainer … and even more. All situations are going to be different. Can you find this information? Are they transparent? Are their statements about themselves true? Is their information reliable?

When I was doing my graduate studies at the University of Pennsylvania Graduate School of Education, I learned quickly to examine who wrote the article/research study, at which university they were teaching/researching, who their department colleagues were, what the reputation of that department was, where the author got her/his education pedigree, what the person’s doctoral dissertation topic was … all this information was available to me, at Penn. I capitalized on it. I also considered what sources the author cited as references. I knew or learned which ones were important and needed, and I expected the author to do a sufficient background investigation (the “literature review”). I also looked at the author’s ability to write coherently, simply and effectively. And yes, I always carried a red pen that I would use to “correct” the mistakes when I became too frustrated because there were so many. Usually, I stopped reading those articles. I wasn’t being judgmental. I was being critical. I wanted the best information. Oh, and I also looked at how many other academic papers cited the article of interest.

Some may say that this is excessive. To most people, this would probably be true. Mediocre and inadequate information seems to be enough for many people, as long as the author’s writing or vlogging appeals to them.

But those daily 10 … 12 … 18 hours of studying have made me as informed as I happen to be.

From whom do you want to get the information you use to make your life better?

What did we learn?

  • Critical thinking is not simple, automatic, inborn, or even likely to come from someone with a typical undergraduate degree. It involves a specialized effort, using specialized techniques … and does demand a capacity for sophisticated thinking processes. We need to know what we are doing … or at least be able to determine whom we ought to be asking.
  • Two (or more) variables/conditions that move together more than random chance would explain are correlated, but that does not necessarily mean there is a causal relationship between them.
  • Having “substantive knowledge and experiences” is highly relevant to determining whether a variable or set of variables should be included in the investigation.
  • Missing important variables, or considering variables in the wrong sequence, can lead to inaccurate conclusions from data analysis.

Don’t just believe what you read because you found it on the Internet … from a source you “believe” to be reputable. Citing just a single source can be an indicator of inadequacy in reporting … citing many sources can artificially inflate the reader’s perception that the information is reliable.

Be careful how you listen and read.
