Wednesday, April 24, 2013

IPS - Day 51

After reviewing the work that students produced on their chocolate experiment results and reading through more of the car analysis reports that were turned in, I concluded that students were not grasping what statistical analysis is all about. I determined that I needed to refocus attention on why we proceed in a specific manner with statistical analysis. I also had to break them out of thinking about data analysis the way it is taught in a math class rather than the way it is approached in a statistics class.

To begin, I looked at the idea of hypotheses and questions of interest. Students hadn't attempted to create hypotheses for the article from last class. I asked students to consider what questions could be posed using the article as a basis.

This led some students to think about the situation and provide questions that could be posed. To encourage others to participate, I asked to hear from someone who had not spoken up yet today. This resulted in a broader group of students who contributed to the discussion.

By and large, the questions students formulated were appropriate, but they all assumed there was a difference in the data. For example, they posed questions about why there was a difference, what caused the difference, or what the trend over time was. All of these are good questions, provided that a difference actually existed.

I pointed this out. Before we could investigate the questions they had developed, we needed to address whether or not a difference existed.

The null hypothesis for this becomes

     H0: There is no difference between the percent of private schools and the percent of championships won by private schools.

The alternate hypothesis becomes

     Ha: The percent of championships won by private schools is greater than the percent of private schools.

We discussed that the null hypothesis is a statement that we ultimately hope to prove wrong. I then presented the class with two problems. The problems provided scenarios for which the class needed to develop null and alternative hypotheses.

There were definite struggles as students tried to identify what they wanted to examine and what would be the proposition that they were trying to disprove. I had many conversations around the question of whether the statement used for the null hypothesis should reflect what they wanted to prove (a common tendency) or if that should be the alternative. As groups were winding down their efforts, we shared out thinking as a class.

The first example was actually a two-tail alternative. For the null hypothesis, most groups went with the percent being equal to a specific value. A couple of people wanted to say that it should be greater than or equal to a value. I left these as two options for the null hypothesis and moved on to the alternative hypothesis.

For the alternative hypothesis, some students wanted to use less than, some wanted to use more than, and one wanted to use around. This was a good discussion. I asked students if the problem statement indicated if we were interested in differences in one direction or the other. They concluded that, no, it didn't matter. This meant we could look at differences either greater than or less than the specific value. This is precisely what a two-tail analysis involves, so the appropriate alternative hypothesis is not equal to the value.

For the second problem, students proceeded more confidently. Almost everyone came up with a null hypothesis of equal to a value, while a few looked at greater than or equal to the value. For the alternative, everyone agreed that it should reflect values less than the null hypothesis value. The problem was a lower, one-tail analysis, so this was correct. I pointed out that traditionally, the null hypothesis uses only an equality. For a one-tail test such as this, the null hypothesis becomes equivalent to "equals or more".
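The one-tail versus two-tail distinction can be sketched in a few lines of Python. The hypothesized proportion, sample size, and observed count below are invented for illustration; they are not the values from the problems we worked in class:

```python
import random

# Illustrative numbers only (not the actual classroom problems):
# H0: p = 0.5, sample of n = 100 with 62 "successes" observed.
p0, n, observed = 0.5, 100, 62

# Simulate what random samples look like if H0 is true.
random.seed(1)
sims = [sum(random.random() < p0 for _ in range(n)) for _ in range(10_000)]

# One-tail (upper): how often do simulations reach the observed count or more?
upper = sum(s >= observed for s in sims) / len(sims)

# Two-tail: count simulated results at least as far from the hypothesized
# center in either direction.
center = p0 * n
two_tail = sum(abs(s - center) >= abs(observed - center) for s in sims) / len(sims)

print(upper, two_tail)
```

The two-tail probability counts extreme results in both directions, which is why it can never be smaller than the corresponding one-tail probability.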

I then put up the slide that discussed the big idea of statistical inference. I told them that we needed to view data as statisticians and not look at it as taught in math class. In math classes, students are taught to gather their data, calculate means, compare values and be done. I told the class that looking at just the mean is misleading. I pointed out that if my head was placed in a freezer and my butt was placed in a heated oven, then, on average, my temperature is fine. The point is we need to look at the distribution of values that we can expect to see and what random events should look like under our assumptions.

I used the car analysis as a basis and walked students through what we did and why.

  1. We created a hypothesis about the percentage of cars present in the parking lot; this was our null hypothesis.
  2. The default alternative hypothesis would be that the percentages were not the same as we hypothesized.
  3. We ran simulations using our hypothesized values. [I asked students why we did this but they couldn't really articulate the reason.] These values establish what random samples should look like under our null hypothesis.
  4. A histogram of the simulation results shows the distribution of results we should expect from random samples. The histogram establishes the hypothesized model and can be used to calculate the probability that specific values, or values more extreme, would appear.
  5. We then create a sampling plan that will generate a random sample. We calculate a sample statistic from our random sample. The question now is how well our random sample fits the model.
  6. The histogram we created from the simulations is used to determine where the sample statistic fits. We can use this to calculate the probability of seeing such an extreme value by looking within our simulation data to see the number of simulation results that equal the sample statistic or are more extreme.
  7. We then draw a conclusion about the null hypothesis: either the sample statistic is consistent with the model, and therefore we have no evidence against the null hypothesis, or the probability of seeing our sample statistic is so small that we can only conclude that our null hypothesis is not correct.
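Steps 3 through 7 can be sketched in Python. The hypothesized percentage, sample size, and observed count below are made up for illustration (the class's actual car data differed, and the real analysis involved several car categories rather than one):

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical numbers for illustration (not the class's actual data):
# H0: 30% of cars in the lot are trucks. We observe 24 trucks in a
# sample of 60 cars.
p_truck, n, observed = 0.30, 60, 24

# Step 3: simulate many random samples under the null hypothesis.
sims = [sum(random.random() < p_truck for _ in range(n)) for _ in range(5_000)]

# Step 4: tally the simulated counts; these tallies are the bar heights
# of the histogram we would plot.
hist = Counter(sims)

# Step 6: probability of a result as extreme as ours or more so.
p_value = sum(s >= observed for s in sims) / len(sims)

# Step 7: a small p-value casts doubt on the null hypothesis.
print(p_value)
```

The only moving parts are the hypothesized proportion, the simulated samples, and a count of how many simulations are at least as extreme as what we observed.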

After going through this and checking for understanding, I told the class that the car analysis report and their chocolate experiment analysis should follow this statistical analysis process and reporting.

We then turned to the chocolate analysis. In this situation, we cannot run simulations since we are dealing with measurement data. There are no random digits that can represent different melting times.

I asked the class what the null hypothesis would be for the chocolate experiment. The response was that the null hypothesis would be that all the chocolates melt at the same rate. The alternative would be that the chocolates melt at different rates.

I asked the class to think about what we could do to get an idea of what random samples should look like. After some discussion in their groups, one student thought we should make use of the null hypothesis. There was some discussion about how this might work. Some groups started to drift off this idea. I told the class I like the idea; we just needed to think about how to make use of the null hypothesis.

The null hypothesis states that all the chocolates melt at the same rate. This means we can treat every chocolate the same. We can put all the times together, mix them up, and then randomly split them into appropriately sized groups. I illustrated this process by writing the first four melting times in our data set for each chocolate type. This gave me 12 values, which I numbered from 1 through 12. I then had students generate random numbers. The first four unique values identified the items for my first chocolate type, the second four unique random values identified my second chocolate group, and the remaining four items made up my third chocolate type.
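One round of this shuffle-and-deal process looks like the following in Python. The melting times are invented for illustration; the class's actual values differed:

```python
import random

random.seed(7)

# Hypothetical melting times in seconds (illustrative, not the class
# data): four times for each of three chocolate types, pooled together.
times = [62, 71, 58, 66, 74, 80, 69, 77, 55, 60, 63, 59]

# Under H0 every chocolate melts at the same rate, so any time could
# belong to any group: shuffle the pooled values and deal them back
# into three groups of four.
shuffled = times[:]
random.shuffle(shuffled)
groups = [shuffled[0:4], shuffled[4:8], shuffled[8:12]]
print(groups)
```

Nothing is added or removed in the shuffle; the same 12 values simply land in new groups, exactly as the null hypothesis says they could.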

If we repeat this process over and over, ideally thousands of times, we start to see what melting times for random groups look like. Of course, this is not practical to do without software. Fortunately, at the NCTM annual conference in Denver, I sat in on a presentation demonstrating free software that can perform this re-sampling technique. I briefly demonstrated how the software works using the values I had on the board.
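Repeating the redistribution many times amounts to a small permutation test, which can be sketched as follows. The melting times are invented, and the statistic here (the spread between the largest and smallest group mean) is one reasonable choice, not necessarily the one the software uses:

```python
import random

random.seed(3)

# Hypothetical melting times (illustrative, not the class data).
group_a = [62, 71, 58, 66]
group_b = [74, 80, 69, 77]
group_c = [55, 60, 63, 59]
pooled = group_a + group_b + group_c

def mean(xs):
    return sum(xs) / len(xs)

# Statistic: spread between the largest and smallest group mean.
def spread(a, b, c):
    means = [mean(a), mean(b), mean(c)]
    return max(means) - min(means)

observed = spread(group_a, group_b, group_c)

# Random redistribution: shuffle the pooled times, re-deal them into
# three groups of four, and recompute the statistic many times.
count = 0
trials = 5_000
for _ in range(trials):
    random.shuffle(pooled)
    if spread(pooled[0:4], pooled[4:8], pooled[8:12]) >= observed:
        count += 1

p_value = count / trials
print(p_value)
```

If random redistributions rarely produce a spread as large as the one we observed, the data are hard to square with "all chocolates melt at the same rate."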

I asked students to access the software and play around with it, as we will be using it to analyze data from explorations we will make in the last few weeks of class. I posted links to the Java program launcher and to the executable jar file on my web site with instructions on installation.

I wrapped up class by formally introducing the idea of re-sampling and, more specifically, random redistribution. My plan is to move into the computer lab next class and have students use the software to assist in analyzing their chocolate experiment data.

Visit the class summary for a student's perspective and to view the lesson slides.
