Objectives

1. Learn to calculate descriptive statistics using Microsoft Excel and SPSS.

Analyze the descriptive statistics spatially and temporally.

Interpret statistical results in written responses.

Background:

All questions in this assignment are to be answered in your write-up. All material is to be word-processed and any required tables or figures must be appropriately labelled (title, axes, etc.) and referenced in the text of your write-up. Your mark on this assignment is based on your work, your answers, and your presentation of both. Reading Chapters 2 & 3 in your textbook, and reviewing the material covered in the Week 2 Lecture PowerPoint is recommended before attempting this assignment and questions.

Descriptive Statistics

In this assignment you will be working with a time-series data set depicting the changing proportion of immigrant population for 26 urban centres in Canada. The assignment is meant to highlight the value of the descriptive measures in a geographic context. Quite often, geographers are interested in looking at changing patterns over time, and these measures provide a way of exploring the data values that exhibit these changes. The data set is provided separately on OWL as an Excel spreadsheet. In this lab you will be working with Excel and SPSS.

Assignment

This assignment is going to introduce you to several valuable skills for working with data sets in Excel and SPSS. In particular, we will be making extensive use of Excel’s capability for inputting and calculating formulas.

The cells in an Excel workbook can be filled with numbers, text, or formulas. Consider the following example:

A B C D
1 Hours Worked Rate Pay
2 50 5.50
3 46 6.50
4 45 6.55
5 38 7.25

HerewearetryingtocalculatethePaybasedonHoursWorkedandtheRate.Thepowerofthe spreadsheetliesinthefactthatwecaninputaformulatodothiscalculationonce(cellC2)and thencopyandpaste (or drag) itdownthecolumntocalculatethevaluesforeachobservation.

Try this:

In Excel, copy and paste the content of the above table (from cells A1:C:5) into an empty Excel spreadsheet.

Type “=A2*B2” into cell C2 (without the quotation marks). When you hit enter/return (or click on the green check next to the formula bar) this cell should change to 275 – as the calculation replaces the formula.

Copy and paste the formula from C2 into cells C3:C5 (or you can click on Cell C2, and then grab the little square in the bottom right of the cell and drag the cell down to C5. This should apply that formula to the three lines below.

Tips:

You don’t need to actually type A2 or B2, you can click on the cell(s) you want to include in the formula. So, you could just type “=” in cell C2, then click on cell A2, type “*” and then click on cell B2.

Each formula needs to begin with “=” and can use * for multiplication, / for division, + or – for arithmetic.

Use^(shift-6 onthekeyboard)to raisesomethingto apower(e.g.to squarePayin theabovewewouldenter“=(A2*B2)^2”.

To take the square root of a formula, raise it to the power of 0.5 (^0.5)

Wealsomustuseparentheses()toensurethata calculationtakesplaceintheproperorder.

To reference an absolute cell location (one that doesn’t change as you drag or copy/paste your formula), use the $ before the column and row (e.g., $A$2 locks that cell into the formula). This is useful for calculations where one item remains constant – otherwise when you copy and paste the formula it changes cell numbers automatically.

Go here for Microsoft’s help page for formulas, or use the Help menu.

In this assignment, you are going to use Excel to calculate: Mean, Standard Deviation, Coefficient of Variation, Skewness, and Kurtosis using the formulas from your lecture notes and textbook. Additionally, you will be using SPSS to verify your results.

You will note that there are some commonly used formulas and shortcuts – take advantage of this fact as you proceed through the assignment!

Excel Instruction Steps:

Save the file Lab 2 Data 2210.xlsx to your computer.

Obtain the 2016 data for the 26 cities on the spreadsheet from the Statistics Canada website. Create a new data column (Column F, title it “2016”) and then enter the 2016 data from the column labeled [Proportion (%) – Immigrant population] that matches the cities in the spreadsheet provided. You will not enter data for all the cities on the website – just those cities listed on the Excel spreadsheet provided.

You are going to focus on the 2011 data values (for now). The first task is to calculate the Mean ( or x-bar) immigrant population proportion. To complete this you need to determine the number of observations and the sum of values.

In cell A29, input “Sum”. In cell E29, input the following: “=sum(E2:E27)”. This provides the sum of the data values for the 2011 proportion of foreign-born residents variable.

In cell E32, input the following: “=count(E2:E27). This value is “n”, the number of observations for the variable.

In cell E33, calculate the Mean () using a formula. The mean is the sum of the data values divided by the number of observations for the variable.

You can also calculate mean using an Excel formula by typing “=average” followed by the cell range in parentheses as a formula (e.g. “=average(E2:E27)”.

You now must prepare to calculate the Sample Standard Deviation (s).

Label columns G, H, I, and J “(X – Xbar)”, “(X – Xbar)^2”, “(X – Xbar)^3”, and “(X

– Xbar)^4” respectively.

Insert a formula in cell G2, under “(X – Xbar)”, that will calculate that observation’s deviation from the mean. Use the $ symbol when entering the cell with the mean (cell F33) into your formula. This will look like this: “=E2-$E$33“. Ensure that your answer makes sense. The dollar signs keep the formula referencing cell E33 even as you move the formula to other cells in the E column in the next step.

Click on G2 after you have entered the formula. You could copy and paste this formula into all the cells below it; however, also try hovering the cursor over the lower, right corner of G2. When a solid cross appears, click and drag down to the last cell for which you want to calculate this value (G27). You should be rewarded with a column full of deviations.

Insert a formula in cell H2, under “(X – Xbar)^2”, that will calculate the square of the values in column E. Repeat the copy and paste procedure to calculate the squared deviations for every observation of the 2011 data. To have Excel square a value from a cell you’d use the formula “=L3^2”. That would yield a value of L3 squared in the cell where you’ve entered the formula.

In order to help with later calculations, calculate the cubed deviations (for skewness) and deviations raised to the power of 4 (for kurtosis) in the remaining two columns (columns I and J). In step d. above you used ^2 to square a value, you can use ^3, ^4, etc. to raise a value to the third, fourth, etc. powers. In the skewness and kurtosis columns be sure to use the mean deviation value from column G for your formulas. Drag these down or copy/paste for each of the 26 cities.

You are now going to sum up deviation calculations for the last three columns of data you just created, (X – Xbar)^2 to (X – Xbar)^4. Move to E29, where you originally calculated the sum for the 2011 data. Hit “ctrl-c” to copy the formula in that cell. Move to cell G29. Click and hold the left mouse button and drag it to the right, highlighting the cells in that row that correspond to columns G through J and release. Hit “ctrl-v” to paste the formula in those cells. You should be rewarded with the sum of each of those columns.

In the cells below the one labelled “Mean”, you will notice the following labels: Standard Deviation, Coefficient of Variation, Skewness, and Kurtosis. Move to the empty cell beside the Standard Deviation under 2011 (E34).

You have now created all the components you need to calculate the standard deviation for the 2011 Proportion of immigrants variable. Check the equation you will need (**Is this sample data or population data?**) from the lecture notes or textbook, and input the formula into this cell (E34). Don’t type in the terms of the equation, but refer instead to the cell addresses where the numbers have been calculated when you make your formula. Remember, you can take the square root of something by raising it to the power of 0.5 [or use the sqrt() function in Excel]. Also, if you widen the column, you will see more of the calculated value.

For this step, you will be calculating Coefficient of Variation (CV) for the 2011 data. Again, move to the blank cell adjacent to the appropriate label (E35). Refer to the proper formula from lecture and input it in this cell, referring to cell addresses rather than values when you create your formula.

Calculate Skewness and Kurtosis, again referring to the proper formulas and placing them in the appropriate cells (E36 and E37). Remember, no numbers – cell addresses only when creating your formula. The formulas you need for these were provided in lecture and are also in the textbook.

One of the advantages of working in the spreadsheet is that we can change data values, and all the formulas we have input will change accordingly. We are going to take advantage of this fact to quickly calculate our descriptive statistics for the three remaining years of data.

Move to the cell A1. Click and hold the left mouse button and drag out to highlight everything in this worksheet. Copy this highlighted area.

On the bottom of the Excel window you’ll see a + in a grey tab, click on this; a new blank worksheet should appear. Click on cell A1 in this new blank worksheet. When you “Paste” (or ctrl-v), the data and work from your original worksheet should be staring back out at you.

Move to the column labelled “1991”. Click and hold the left mouse button and drag down to highlight the data in this column (B1:B27). Hit “ctrl-c” to make a copy. You are now going to replace the 2011 column of data with this one [make sure you’re in your newly created worksheet]. Move to the cell where the “2011” label is. Hit “ctrl-v” to paste the 1991 data over the 2011 data. With any luck, you should immediately notice that everything has been recalculated. Move to the bottom and check the values for each statistic which should hopefully be new. You can relabel this new worksheet “1991 results” by clicking on the worksheet label (Sheet1) and typing the new title.

Move back to the worksheet labelled “base data” (little tab at the bottom) and begin the procedure again at a). This time, copy the original data to another new worksheet and transfer the values from the 1996 column to the 2011 column. Your result will then be the calculation of descriptive statistics for the 1996 data. Follow this same procedure for the 2001 and 2016 data.

You should now have multiple worksheets: base data, 1991 results, 1996 results, 2001 results, and 2016 results. [Your 2011 results will be on your base data worksheet]

Your last task is to create a summary table which contains the five descriptive statistics as calculated for each different year. Pick an empty spot on the “base data” sheet and copy and paste your statistics into a properly labelled table like this:

1991

1996

2001

2011

2016

Mean

Std. Dev.

CV

Skewness

Kurtosis

Caution: currently, all those calculated values you see on the screen are formulas. If you copy and paste them somewhere, they will lose their cell references and no longer work (you’ll see the message “#REF!”. Follow these instructions to copy the VALUES ONLY:

In “base data” sheet, copy the column of statistics you calculated, Mean through Kurtosis. NOW: move to the area where you are creating the summary table. Then right mouse and select “Paste Special”. A selection box should appear, and under the column of options click on the “Values” option. The values you calculated should be pasted and you now see that instead of formulas, the cells contain actual numbers. [you can also find “Paste Special” under the Edit menu].

Repeat this procedure copying the statistics from the 1991 results, 1996 results, 2001 results, and 2016 results worksheets to the summary table you are creating.

You can increase or reduce the number of digits after the decimal by highlighting the cells and clicking on either the “increase decimal” button () or the “decrease decimal” button ( ) on the Home menu ribbon. You should have two decimal places for all your numbers in the table.

SPSS Instruction Steps:

You have calculated descriptive statistics using Excel, now you can use statistical software (SPSS) to do the same thing.

Open the “SPSS Statistics” program. At the welcome screen select “New Dataset” in the upper lefthand box. Then click “Close” in the bottom right of the window.

You now have what looks like a spreadsheet (like Excel), but this spreadsheet has two “views”. The first view is the “Data View” (see the tab at the bottom of the spreadsheet. This view is nearly identical to Excel as it shows your data. Click on the other view tab (“Variable View”). The Variable View sheet is where you indicate important traits of your data such as the name of your variables, the measurement scale (ordinal, nominal, scale), number of decimal places, type of data, etc. You can see the traits you can enter along the top of the “Variable View” window.

From the “base data” worksheet in your Excel file, copy the cells from A2 to F27 (the city names and data values for proportion of immigrants for the five years). Paste these values into SPSS starting in the top-left cell on the “Data View” sheet. [Do not include the labels for the years (1991, 1996, etc.) in your copy selection – just the data].

You’ll see that SPSS named your columns (variables) as VAR0001, VAR0002, etc. We want to tell SPSS about this data. So, click on “Variable View” and replace VAR0001 with City, VAR0002 with Imm_1991, VAR0003 with Imm_1996, and so on. You can’t have spaces or just numbers in your variable names.

You can now tell SPSS what type of data is in each column. The city names are “nominal” data. The immigration proportions are “scale” level data (SPSS treats “enumeration”, “interval” and “ratio” data as “scale”). In the “Measure” column on the Variable View page select “Nominal” for the first variable, and “Scale” for the proportion columns. Now click back to your “Data View” page and you’ll see the columns have the proper names for your variables. Your SPSS “Data View” should look something like this:

To determine the Descriptive Statistics values for each numeric variable go to the Analyze menu, choose Descriptive Statistics Descriptives… The dialog box will give you a list of your numeric variables on the left, with an empty box on the right. To calculate the descriptive statistics of your variables, drag or arrow all five the variables to the right box. [You don’t see the City variable in this list because it’s a nominal variable, and it’s not possible to calculate descriptive statistics for nominal variables)]. To choose the descriptive statistics you want SPSS to calculate, click on the “Options…” button. Choose the same measures of central tendency and dispersion you calculated in Excel (CV is not an option here however). Then click “Continue” on this Options window, and then “OK” on the main Descriptives dialog box.

SPSS then opens a new window – an Output window. This is where SPSS places all your analyses, graphs, etc. – not on the spreadsheet like Excel. You can copy the table by right-mouse selecting the table and pasting it into MS Word.

To make a histogram of the data click on the Graphs menu in SPSS and select “Chart Builder…”. Select “Histogram” from the white rectangular box on the bottom left of the window, and then drag the left-most graph icon into the big white box at the top-centre of the Chart Builder window. It should look like this:

To make a histogram of the 1991 data, drag the Imm_1991 variable to the X-axis of the histogram template. You can then click OK on the bottom of the dialog box and SPSS will generate your histogram. If you want to set the parameters for the width or number of classes (bin width, number of bins) you can alter that in the “Element Properties” window that is to the right of your Chart Builder window. To do that you’d select Bar1 in the top of that window and then click on “Set Parameters”. You can alter the look of the X and Y axes too if you select those in the top of the window and then click on “Set Parameters”.

To facilitate comparisons between the histograms for the five census dates, you should assure that the histograms all have the same x-axis and bin sizes. If you look at the descriptive statistics output you can see the minimum and maximum values for each census. It is probably best if the x-axis for all the histograms range from 0 to 50 (to accommodate the cases in the high 40s in the 2011 and 2016 censuses. To create bins with round numbers, you should have a bin width of 10, or five bins. You can remake the first histogram using these guidelines.

Once you’ve clicked OK in the dialog box, your histogram will appear in the Output file. You can double click on the histogram and edit its appearance.

Make histograms for all five years of data. These can also be copied and pasted into MS Word like you did above for the descriptive statistics table.

Questions:

The questions in this series are based in part on the “Summary Table of Descriptive Statistics” you created and the histograms.

The mean and standard deviation are absolute descriptive measures, whose values are a direct function of the magnitude of the data. Briefly explain the changes to these measures over time for the proportion of immigrant residents in the 26 cities AND speculate on the reasons for these changes. (5 marks)

Similarly, explain the differences you observe over time in the relative descriptive measures – coefficient of variation, skewness, and kurtosis. [Given that Excel and SPSS use different formulas to calculate skewness and kurtosis, use your Excel results to answer this question]. (3 marks)

From looking at the data and the descriptive measures, briefly describe a spatial pattern of the proportion of foreign-born urban residents in each year and the changes in this pattern through each time period. (2 marks)

Where are the highest and lowest proportions of foreign-born urban residents in each year? Provide a hypothesis, which may form the basis of a more detailed research project. (5 marks).

Discuss the relationship between the histograms and the descriptive measures explored in this lab. (5 marks)

Deliverables to be uploaded to OWL:

Excel file with each worksheet and your “Summary Table of Descriptive Statistics” from Step 8 in the Excel instructions above (30 marks)

MS Word file with a copy of the Descriptive Statistics table from SPSS for the five census dates (10 marks), five histograms (40 marks), and written answers to the questions above (20 marks total).