Friday, September 13, 2013

Using Contingency Tables and Segmented Bar Graphs to Determine Association

For my Inferential Probability and Statistics class, I am beginning to bounce back and forth between gathering data and organizing data. Students used sampling methods to collect information on cars in our school parking lot. The data are primarily categorical in nature. From preliminary discussions I saw that the classes were solid in creating pie graphs and bar graphs for their data. To help push their thinking about how to analyze data, I focused on making use of contingency tables and segmented bar graphs.

It is easy to get students started on contingency tables. I created a 3 x 3 table on the board, labeling the vertical boxes on the left as Male, Female, and Total (see table below). Across the top boxes I used the labels Jeans, No Jeans, and Total. I then took a quick poll of boys and girls as to whether they were wearing denim jeans or not. Voila, a contingency table.

Next we worked through calculating percentages of total, row percentages, and column percentages. I like to use different color markers for this so it is easier to reference percentage types. From here, it is an easy matter to begin discussing marginal and conditional distributions, to compare these distributions, and to use these to have students begin to think about variables being dependent or independent.
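For anyone who wants to check the arithmetic away from the whiteboard, here is a minimal sketch in Python of the three percentage types. The counts are made up for illustration (they are not my class's actual poll), and the function names are simply ones I chose for this example.

```python
# Hypothetical counts, invented for illustration; not the actual class poll.
counts = {
    "Male": {"Jeans": 8, "No Jeans": 4},
    "Female": {"Jeans": 6, "No Jeans": 2},
}


def table_totals(counts):
    """Return (row totals, column totals, grand total) for a table of counts."""
    row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
    col_totals = {}
    for cells in counts.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return row_totals, col_totals, sum(row_totals.values())


def percent_of_total(counts):
    """Each cell as a percentage of the grand total."""
    _, _, grand = table_totals(counts)
    return {r: {c: 100 * n / grand for c, n in cells.items()}
            for r, cells in counts.items()}


def row_percents(counts):
    """Each cell as a percentage of its row total."""
    row_totals, _, _ = table_totals(counts)
    return {r: {c: 100 * n / row_totals[r] for c, n in cells.items()}
            for r, cells in counts.items()}


def column_percents(counts):
    """Each cell as a percentage of its column total."""
    _, col_totals, _ = table_totals(counts)
    return {r: {c: 100 * n / col_totals[c] for c, n in cells.items()}
            for r, cells in counts.items()}


# Print each percentage table, rounded to one decimal place.
for name, table in [("Percent of total", percent_of_total(counts)),
                    ("Row percents", row_percents(counts)),
                    ("Column percents", column_percents(counts))]:
    print(name)
    for row, cells in table.items():
        print("  ", row, {c: round(p, 1) for c, p in cells.items()})
```

The row percentages are the conditional distributions of jeans wearing within each gender, while the row and column totals divided by the grand total give the marginal distributions.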

It is surprising how difficult it is for students to get independence and dependence straight in their minds. They are so used to thinking of independent variables and dependent variables from a mathematical perspective that it is difficult for them to shift gears. Rather than get into a formal look at independence and dependence (such as with probabilities), I simply ask them to consider if the marginal and conditional distributions are tracking along similarly. For example, if the class is 60% male, then it should be reasonable to assume that we would see 60% of jean wearers being male. If the marginal and conditional distributions are "close" then the two variables are independent.

On the other hand, if we see marked differences between the marginal and conditional distributions, then the two variables are dependent. Knowing that a person has a certain characteristic, such as being female, alters my perception of how likely that person is to wear jeans.
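The comparison of marginal and conditional distributions can be sketched in Python as follows. Both tables use hypothetical counts that I invented for illustration: the first is built so that 60% of the class and 60% of jean wearers are male, and the second is a made-up light/dark hair and eye table. The function names are my own, and the sketch only reports the largest gap in percentage points; deciding whether a gap counts as "close" remains a judgment call.

```python
def marginal_row_percents(counts):
    """Marginal distribution of the row variable, in percent."""
    row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
    grand = sum(row_totals.values())
    return {row: 100 * total / grand for row, total in row_totals.items()}


def conditional_row_percents(counts, col):
    """Distribution of the row variable among cases in column `col`, in percent."""
    col_total = sum(cells[col] for cells in counts.values())
    return {row: 100 * cells[col] / col_total for row, cells in counts.items()}


def largest_gap(counts):
    """Largest gap, in percentage points, between the marginal distribution
    and any of the conditional distributions."""
    marginal = marginal_row_percents(counts)
    columns = next(iter(counts.values())).keys()
    return max(
        abs(conditional_row_percents(counts, col)[row] - marginal[row])
        for col in columns
        for row in counts
    )


# Hypothetical counts, invented for illustration only.
jeans = {
    "Male": {"Jeans": 9, "No Jeans": 6},
    "Female": {"Jeans": 6, "No Jeans": 4},
}
hair_eyes = {
    "Light hair": {"Light eyes": 8, "Dark eyes": 2},
    "Dark hair": {"Light eyes": 3, "Dark eyes": 12},
}

print("Jeans table, largest gap:", round(largest_gap(jeans), 1))
print("Hair/eye table, largest gap:", round(largest_gap(hair_eyes), 1))
```

In the first table the conditional distributions match the marginal one exactly, which is the "tracking along similarly" pattern. In the second, the share of light-haired people among the light-eyed is far from their share of the whole class.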

I conducted another quick poll for hair color and eye color. I like to keep it simple, so I just broke these into light and dark categories. These two variables are very much dependent, so the marginal and conditional distributions will show differences that everyone can readily see.

At this point, I have lots of data that students can work with. We used the class survey data to gauge association between political leaning and gender. I have a worksheet that looks at highest level of education completed and whether or not the person is a smoker. This practice allows students to create contingency tables and segmented bar graphs from the data. Discussing what they create lets students see many examples and allows me to point out strengths to emulate and weaknesses to avoid.
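A segmented bar graph is just the row percentages drawn as one full-width bar per group. As a rough sketch, here is a text-only version in Python; the counts are again hypothetical, and `segmented_bar` is a helper I made up for this example rather than anything from a plotting library.

```python
def segmented_bar(cells, symbols, width=40):
    """Draw one bar of `width` characters, split in proportion to the counts.

    `cells` maps category -> count; `symbols` supplies one character per category.
    Rounding is applied to the running total so every bar has the same length.
    """
    total = sum(cells.values())
    bar, used, running = "", 0, 0
    for (category, n), symbol in zip(cells.items(), symbols):
        running += n
        end = round(width * running / total)
        bar += symbol * (end - used)
        used = end
    return bar


# Hypothetical counts, invented for illustration only.
counts = {
    "Male": {"Jeans": 8, "No Jeans": 4},
    "Female": {"Jeans": 6, "No Jeans": 2},
}

print("# = Jeans, . = No Jeans")
for group, cells in counts.items():
    print(f"{group:<8}{segmented_bar(cells, '#.')}")
```

Because every bar is the same length, differences in where the segments break are easy to spot, and those differences are what suggest an association.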

With all of this practice under their belts, they can now turn to working with the car data that they collected. I have them formulate a question about association and then create a contingency table and segmented bar graph to assess the association. It is easy for students to lose sight of why they are doing this work. As they grind through the data, double checking their counts and calculating percentages, they can forget why they are putting all this effort into working with the data. I reminded the class numerous times not to lose sight of the question they were addressing. We are not creating tables and graphs simply to make the data look nice; we are doing it to understand relationships that may or may not exist in the data.

This work took up nearly 90 minutes. For the next class we looked at what students created. The class presentations are helpful because they allow me to focus on things that are done well and point out issues that need attention. They also enable students to see and hear how to communicate their results. Finally, they provide a forum to view data through multiple lenses, hopefully broadening students' perspective on how to analyze data.

The presentations went well. Below is one example of the results presented in class. The group decided there appeared to be an association and the class concurred.


I was pleased with the results presented and the classes indicated they felt comfortable working with contingency tables.
