Geography Statistics Guide

Basic Statistical Skills

Mean

When to use: To find the mathematical average of a continuous dataset (e.g., average river depth, mean annual temperature, average pebble size).

Advantages

Uses every value in the dataset, making it mathematically precise.
Essential for higher-level statistics like Standard Deviation.

Disadvantages

Can be significantly skewed by extreme values or anomalies.
Can produce a value that is impossible in reality for count data (e.g., 2.4 pedestrians), so the result may need rounding or careful interpretation.

Median

When to use: To find the middle value of an ordered dataset. Ideal when data is heavily skewed (e.g., income levels, property values).

Advantages

Unaffected by extreme outliers.
Gives a more representative 'typical' value than the mean when the data is skewed or contains significant anomalies.

Disadvantages

Ignores the specific values of most data points, focusing only on their ranked position.
Requires the dataset to be ordered, which can be time-consuming for large samples.
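
The contrast between the mean and the median in a skewed dataset can be sketched in a few lines of Python. The house prices below are invented purely for illustration; the only library used is Python's built-in statistics module.

```python
from statistics import mean, median

# Hypothetical house prices (in thousands) for one street.
# The last value is a deliberate extreme outlier.
prices = [180, 195, 200, 210, 225, 240, 950]

mean_price = mean(prices)      # uses every value, so the outlier pulls it upward
median_price = median(prices)  # middle value of the ordered list

print(f"Mean:   {mean_price:.1f}")
print(f"Median: {median_price}")
```

Here the single outlier lifts the mean well above six of the seven prices, while the median stays at the middle house.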

Mode

When to use: To identify the most frequently occurring value or category. Best suited for nominal (categorical) data, like land use types or dominant ethnic groups.

Advantages

The only measure of central tendency applicable to categorical data.
Unaffected by extreme numerical outliers.

Disadvantages

A dataset may have no mode or multiple modes (bimodal/multimodal), leading to ambiguity.
Does not consider all data points in its calculation.
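
As a rough illustration of both the categorical use and the ambiguity problem, the sketch below returns every value that shares the highest frequency. The helper name modes and both sample datasets are made up for this example.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical land-use survey along a transect (categorical data).
land_use = ["retail", "residential", "retail", "office", "retail", "residential"]
print(modes(land_use))      # a single clear mode

# Hypothetical pedestrian counts where two values are equally common (bimodal).
pedestrians = [12, 15, 15, 18, 18, 21]
print(modes(pedestrians))   # two modes, so the result is ambiguous
```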

Statistics using Ranges

Range

When to use: To quickly calculate the difference between the highest and lowest values in a dataset.

Advantages

Simple and quick to calculate.
Provides an immediate, though basic, understanding of the data's total spread.

Disadvantages

Extremely sensitive to outliers as it only uses the two most extreme values.
Provides no information about the distribution of data between the extremes.

Interquartile Range (IQR)

When to use: To measure the spread of the middle 50% of the data. Often used to compare data distributions and identify potential outliers (e.g., using box plots).

Advantages

Eliminates the influence of extreme upper and lower outliers.
Effective for comparing the internal spread of different datasets.

Disadvantages

Ignores 50% of the dataset (the lowest 25% and highest 25%).
More complex to calculate than the standard range.
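
The difference between the range and the IQR can be seen in a small Python sketch. It assumes the "median of halves" convention for quartiles (for an odd number of values, the overall median is left out of both halves); other conventions exist and give slightly different answers. The pebble measurements and the quartiles helper are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Lower and upper quartiles using the median-of-halves convention.

    Assumption: with an odd number of values, the overall median is
    excluded from both halves. Other conventions differ slightly.
    """
    if len(values) < 2:
        raise ValueError("Need at least two values to find quartiles")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]     # values below the median
    upper = ordered[-half:]    # values above the median
    return median(lower), median(upper)

# Hypothetical pebble long-axis lengths (mm); 140 is an anomaly.
pebbles = [22, 25, 27, 30, 31, 34, 36, 38, 41, 45, 140]

data_range = max(pebbles) - min(pebbles)
q1, q3 = quartiles(pebbles)
iqr = q3 - q1

print(f"Range: {data_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

The anomaly dominates the range but has no effect on the IQR, which only reflects the middle half of the data.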

Standard Deviation

When to use: To measure how far, on average, data points lie from the mean. A small SD indicates data that is clustered tightly around the mean (so the mean is a reliable summary); a large SD indicates data that is widely spread out.

Advantages

Shows how much values vary around the mean.
Uses every value in the dataset, so it is less dominated by a single extreme value than the range (although outliers still inflate it).

Disadvantages

More laborious to calculate by hand than the range or IQR.
Most meaningful for roughly symmetrical (normally distributed) data; it is harder to interpret for strongly skewed or irregular data.
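
A minimal Python sketch of the calculation, assuming the population form of the formula (dividing by n; some courses divide by n - 1 for a sample instead). The river depths and the function name standard_deviation are invented for illustration.

```python
from math import sqrt
from statistics import mean

def standard_deviation(values):
    """Population SD: square root of the mean squared deviation from the mean."""
    m = mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / len(values))

# Hypothetical river depths (cm) at two sites with the same mean of 50.
site_a = [48, 49, 50, 51, 52]   # tightly clustered around the mean
site_b = [20, 35, 50, 65, 80]   # widely spread around the mean

print(f"Site A SD: {standard_deviation(site_a):.2f}")
print(f"Site B SD: {standard_deviation(site_b):.2f}")
```

Both sites share the same mean, so only the SD reveals that site B is far more variable.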

Statistical Tests & Advanced Skills

Spearman's Rank

When to use: To measure the strength and direction of a relationship between two variables. Ideal for testing hypotheses such as "Does pebble size decrease with distance from the source?" or "Is there a correlation between a town's deprivation index and distance from the CBD?".

Advantages

Provides a precise numerical value (-1 to +1) indicating the strength and direction of a correlation.
Can be used with ranked data and does not rely on a normal data distribution, so it can clarify a relationship that is hard to judge from a scatter graph alone.

Disadvantages

Manual ranking of large datasets can be laborious and prone to error.
Correlation does not imply causation; it cannot prove that one variable causes a change in the other.
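
The calculation can be sketched in Python using the commonly taught formula r_s = 1 - 6 x (sum of d squared) / (n x (n squared - 1)), where d is the difference between each pair of ranks. This is an illustrative sketch: the fieldwork figures and function names are hypothetical, and the simplified formula is only exact when there are no tied ranks. The result would still need checking against a critical values table to judge significance.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1          # position of the first occurrence
        ties = ordered.count(v)               # how many values are tied
        ranks.append(first + (ties - 1) / 2)  # mean of the tied positions
    return ranks

def spearman_rank(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference in ranks."""
    if len(x) != len(y):
        raise ValueError("Both variables need the same number of values")
    n = len(x)
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Hypothetical data: distance from source (km) and mean pebble size (mm).
distance = [1, 3, 5, 8, 12, 15, 20, 24]
pebble_size = [92, 85, 88, 70, 61, 64, 40, 35]

r_s = spearman_rank(distance, pebble_size)
print(f"r_s = {r_s:.3f}")
```

A value close to -1 indicates a strong negative relationship: in this made-up dataset, pebble size falls as distance from the source increases.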

Mann-Whitney U Test

When to use: To test for a significant difference between two independent sets of data that are not normally distributed. It is the non-parametric equivalent of the t-test. For example, "Is there a significant difference in the Environmental Quality Score between two different residential streets?"

Advantages

Effective for skewed data as it does not require a normal distribution.
Clearly determines if the difference between two data sets is statistically significant.

Disadvantages

The manual calculation is lengthy and can be prone to human error.
Accuracy can be reduced with very small sample sizes (typically below 5).
Can only be used to test the difference between two data sets.
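
To show where the U value comes from, the sketch below ranks both samples together and applies U = n1 x n2 + n1 x (n1 + 1) / 2 - R1, where R1 is the sum of ranks for the first sample. The scores and the function name mann_whitney_u are hypothetical; in practice the smaller U value is compared with a critical value from a published table, which is not reproduced here.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two U values for two independent samples.

    Both samples are ranked together (tied values share the mean rank).
    """
    combined = sorted(list(sample_a) + list(sample_b))

    def rank(v):
        first = combined.index(v) + 1
        ties = combined.count(v)
        return first + (ties - 1) / 2

    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(rank(v) for v in sample_a)
    u_a = n_a * n_b + n_a * (n_a + 1) / 2 - rank_sum_a
    u_b = n_a * n_b - u_a      # the two U values always sum to n_a * n_b
    return min(u_a, u_b)

# Hypothetical Environmental Quality Scores for two residential streets.
street_a = [12, 15, 17, 18, 21, 23]
street_b = [19, 22, 24, 26, 27, 30]

u = mann_whitney_u(street_a, street_b)
print(f"U = {u}")
```

The smaller the U value, the less the two sets of scores overlap.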

Chi-Squared Test

When to use: To compare observed categorical data with expected data to see if there is a significant association between them. For example, "Is there a significant difference in the preferred shopping location (e.g., city centre, retail park) between different age groups?".

Advantages

Useful for testing hypotheses that involve categorical data.

Disadvantages

Sensitive to sample size: it becomes unreliable when expected frequencies are small (a common rule of thumb is at least 5 in each category).
It only indicates whether a relationship exists (whether it is significant), not how strong that relationship is.
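
A minimal sketch of the calculation for a table of observed counts, where each expected count is taken as row total x column total / grand total and the statistic is the sum of (observed - expected) squared / expected. The questionnaire figures and the function name chi_squared are invented; the result would then be compared with a critical value for the given degrees of freedom.

```python
def chi_squared(observed):
    """Chi-squared statistic and degrees of freedom for a table of counts.

    Expected count for each cell = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical questionnaire results.
# Rows: age groups (under 30, 30 and over); columns: city centre, retail park.
observed = [
    [30, 20],
    [10, 40],
]

statistic, dof = chi_squared(observed)
print(f"chi-squared = {statistic:.2f}, degrees of freedom = {dof}")
```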
