Today we started looking at data collection issues. The focus of this class was data integrity and preliminary analysis. The goal was for students to consider the quality of their data and to understand what they already know about data analysis.
I used the survey data collected on the first day. To start things off, I briefly reviewed the questions asked and then showed them the data set. I had keyed in the data exactly as it was written down and had made one keystroke error in the gasoline prices by omitting a decimal point.
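A missing decimal point like the one above is the kind of error a simple range check can surface. Here is a minimal sketch, assuming made-up gas prices and an assumed plausible range of $1 to $10 per gallon; neither the values nor the `flag_suspect` helper come from the class data set.

```python
# Hypothetical gas prices in dollars per gallon. The value 359 stands in
# for a price keyed without its decimal point (3.59). Not the class data.
gas_prices = [3.59, 3.65, 359, 3.49, 3.72]

# Assumed plausible range for a price per gallon; adjust to fit the data.
LOW, HIGH = 1.00, 10.00


def flag_suspect(values, low=LOW, high=HIGH):
    """Return (index, value) pairs that fall outside the plausible range."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


suspects = flag_suspect(gas_prices)
print(suspects)
```

A check like this only flags values for a person to look at. Deciding whether to correct or delete a value still takes judgment about what the respondent most likely meant.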
Students looked through the data and we corrected all of the suspect values except one, which we had to delete as invalid. We then moved to a discussion of categorical versus quantitative data. I asked students to classify each of the nine variables into one of the two columns. Everyone agreed on the classifications except for shoe size.
Shoe size is always an interesting discussion. It boils down to the fact that shoe size measures shoe length, even though the measurement units are discrete.
The next topic was summarizing the data. I asked students what they would say about the data if someone walked into the room. I asked them to focus on two columns of data: political leaning and amount paid for gas. Students worked on calculating means and percent distributions.
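The two summaries students worked on can be sketched in a few lines. The values below are invented for illustration and are not the survey responses; `percent_distribution` is a hypothetical helper name.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses, not the actual class survey data.
political_leaning = [
    "liberal", "moderate", "conservative", "moderate",
    "liberal", "moderate", "liberal", "moderate",
]
gas_paid = [3.59, 3.65, 3.49, 3.72, 3.55, 3.60, 3.45, 3.75]


def percent_distribution(values):
    """Return the percent of responses falling in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * n / total for category, n in counts.items()}


# Categorical variable: summarize with a percent distribution.
dist = percent_distribution(political_leaning)
print(dist)

# Quantitative variable: summarize with a mean.
avg = mean(gas_paid)
print(round(avg, 2))
```

The split mirrors the categorical versus quantitative discussion: percentages summarize the categories, and a mean only makes sense for the numeric column.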
I then asked students who had made a graph. Not one graph was made. I told them the first rule of statistics is to make a graph. I told them the second rule was to make a graph. I told them the third rule was to make a graph. A student asked what the fourth rule was. I told them the fourth rule was to see rules 1-3.
We talked about appropriate graphs for political leanings. The conclusion was that a bar graph or a pie graph would be appropriate. One student had made a bar graph during the discussion, and I was able to use it to highlight the characteristics of a bar graph: bars don't touch, order of categories doesn't matter, labeling, and the like.
We then moved to gas prices. Students responded that a scatter plot could be made. I pointed out that you need two variables to make a scatter plot and this class focuses on univariate data. Students mentioned making a bar graph. I pointed out that bar graphs are for categorical data. Students mentioned histograms. They then remembered stem-and-leaf plots and box plots.
I briefly discussed the characteristics of a histogram: bars touch, the x-axis is a scale where the order matters, the bins must all be the same width, there is labeling, and the like.
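To make the equal-width requirement concrete, here is a small sketch that counts values into bins of the same width and prints a text histogram. The prices are invented, and they are stored in cents so the bin arithmetic stays in whole numbers; `histogram_counts` is a hypothetical helper, not something used in class.

```python
# Hypothetical gas prices in cents per gallon (359 means $3.59).
prices_cents = [359, 365, 349, 372, 355, 360, 345, 358]


def histogram_counts(values, start, width, n_bins):
    """Count values into n_bins adjacent bins, each `width` wide.

    Bin i covers start + i*width up to, but not including,
    start + (i+1)*width, so the bins touch and never overlap.
    """
    counts = [0] * n_bins
    for v in values:
        idx = (v - start) // width
        if not 0 <= idx < n_bins:
            raise ValueError(f"{v} falls outside the binned range")
        counts[idx] += 1
    return counts


start, width, n_bins = 340, 10, 4
counts = histogram_counts(prices_cents, start, width, n_bins)

# Text histogram: one row per bin, in order along the price scale.
for i, c in enumerate(counts):
    lo = start + i * width
    print(f"${lo / 100:.2f} to ${(lo + width) / 100:.2f} | {'#' * c}")
```

Because every bin is the same width, the bar heights can be compared directly, and the rows have to stay in numeric order, unlike the categories of a bar graph.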
For homework, I asked students to pick one categorical and one quantitative variable. For the two selected variables, they are to create appropriate graphs and summarize their results.
Visit the class summary for a student's perspective and to view the lesson slides.
Monday, March 4, 2013