Mean

When to use: To find the mathematical average of a continuous dataset (e.g., average river depth, mean annual temperature, average pebble size).

Advantages

  • Uses every value in the dataset, so no information is discarded.
  • Essential for higher-level statistics like Standard Deviation.

Disadvantages

  • Can be significantly skewed by extreme values or anomalies.
  • Can produce values that are impossible for discrete counts (e.g., 2.4 pedestrians), so the result may need rounding or careful interpretation.

Median

When to use: To find the middle value of an ordered dataset. Ideal when data is heavily skewed (e.g., income levels, property values).

Advantages

  • Unaffected by extreme outliers, providing a more representative value in skewed datasets.
  • More accurately reflects the 'typical' value when the data contains significant anomalies.

Disadvantages

  • Ignores the specific values of most data points, focusing only on their ranked position.
  • Requires the dataset to be ordered, which can be time-consuming for large samples.
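
A short Python sketch can make the contrast between mean and median concrete. The house prices below are invented for illustration (in thousands of pounds), with one deliberately extreme value; only the standard `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical house prices in thousands of pounds; the last value is an anomaly.
prices = [180, 195, 210, 220, 235, 1500]

mean_price = mean(prices)      # uses every value, so the 1500 pulls it upward
median_price = median(prices)  # midpoint of the two middle ranked values

print(f"mean = {mean_price:.1f}, median = {median_price:.1f}")
```

With this made-up sample the mean lands well above five of the six prices, while the median stays among the typical values, which is why the median is usually preferred for skewed data.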

Mode

When to use: To identify the most frequently occurring value or category. Best suited for nominal (categorical) data, like land use types or dominant ethnic groups.

Advantages

  • The only measure of central tendency applicable to categorical data.
  • Unaffected by extreme numerical outliers.

Disadvantages

  • A dataset may have no mode or multiple modes (bimodal/multimodal), leading to ambiguity.
  • Does not consider all data points in its calculation.
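
The ambiguity of multiple modes can be seen in a small Python sketch. The land use survey below is invented for illustration; `multimode` from the standard `statistics` module (Python 3.8 or later) returns every value tied for most frequent.

```python
from statistics import multimode

# Hypothetical land use recorded at six survey points.
land_use = ["residential", "retail", "residential",
            "industrial", "retail", "open space"]

# multimode lists all categories that share the highest frequency.
modes = multimode(land_use)

print(modes)
```

Here two categories tie, so the sample is bimodal and there is no single "most common" land use to report.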

Range

When to use: To quickly calculate the difference between the highest and lowest values in a dataset.

Advantages

  • Simple and quick to calculate.
  • Provides an immediate, though basic, understanding of the data's total spread.

Disadvantages

  • Extremely sensitive to outliers as it only uses the two most extreme values.
  • Provides no information about the distribution of data between the extremes.

Interquartile Range (IQR)

When to use: To measure the spread of the middle 50% of the data. Often used to compare data distributions and identify potential outliers (e.g., using box plots).

Advantages

  • Eliminates the influence of extreme upper and lower outliers.
  • Effective for comparing the internal spread of different datasets.

Disadvantages

  • Ignores 50% of the dataset (the lowest 25% and highest 25%).
  • More complex to calculate than the standard range.
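
The difference between range and IQR can be sketched in Python as follows. The river depths are invented, with one anomalous reading, and the helper `quartiles` is a hypothetical name. Quartile conventions vary between textbooks and software; this sketch assumes the "median of each half" method, leaving out the middle value when the count is odd.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(ordered), median(upper)

# Hypothetical river depths in cm; 95 is an anomalous reading.
depths_cm = [12, 15, 17, 18, 21, 22, 24, 26, 95]

data_range = max(depths_cm) - min(depths_cm)  # uses only the two extremes
q1, q2, q3 = quartiles(depths_cm)
iqr = q3 - q1                                 # spread of the middle 50%

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

In this sample the single anomalous reading inflates the range, while the IQR is unchanged by it.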

Standard Deviation

When to use: To measure how far data points typically lie from the mean. A small SD indicates data is clustered tightly around the mean (so the mean is a more reliable summary); a large SD indicates data is widely spread out.

Advantages

  • Shows how much values vary around the mean.
  • Uses every value, so it is less distorted by a single extreme value than the range, although outliers still inflate it.

Disadvantages

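  • More time-consuming to calculate by hand than the range or IQR.
  • Most meaningful for roughly symmetrical (normally distributed) data; for skewed or irregular data it can give a misleading picture of spread.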
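
As a rough illustration of the calculation, the Python sketch below works through the population formula, SD = sqrt(sum((x - mean)^2) / n), and checks it against `pstdev` from the standard `statistics` module. The two sites and their values are invented, and `population_sd` is a hypothetical helper name. Some courses divide by n - 1 instead (the sample version, `statistics.stdev`), so check which form your specification expects.

```python
from math import sqrt
from statistics import mean, pstdev

def population_sd(values):
    """Square root of the mean squared deviation from the mean."""
    centre = mean(values)
    squared_deviations = [(x - centre) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / len(values))

# Hypothetical pebble sizes (mm) at two sites with the same mean.
site_a = [20, 21, 19, 20, 20]   # tightly clustered
site_b = [5, 35, 12, 30, 18]    # widely spread

sd_a = population_sd(site_a)
sd_b = population_sd(site_b)

print(f"site A: mean = {mean(site_a)}, SD = {sd_a:.2f}")
print(f"site B: mean = {mean(site_b)}, SD = {sd_b:.2f}")
```

Both sites share the same mean, so only the SD reveals that the second site is far more variable.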

Spearman's Rank

When to use: To measure the strength and direction of a relationship between two variables. Ideal for testing hypotheses such as "Does pebble size decrease with distance from the source?" or "Is there a correlation between a town's deprivation index and distance from the CBD?".

Advantages

  • Provides a precise numerical value (-1 to +1) indicating the strength and direction of a correlation.
  • Can be used with ranked data and does not rely on a normal distribution, so it can clarify relationships that are hard to judge from a scatter graph alone.

Disadvantages

  • Manual ranking of large datasets can be laborious and prone to error.
  • Correlation does not imply causation; it cannot prove that one variable causes a change in the other.
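
The ranking procedure can be sketched in Python using the usual formula r_s = 1 - (6 × sum of d²) / (n(n² - 1)), where d is the difference between the two ranks of each pair. The data are invented, the function names are hypothetical, and the sketch assumes there are no tied values (ties need averaged ranks, which are not handled here).

```python
def ranks(values):
    """Rank values from 1 (largest) to n; assumes there are no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank coefficient from the sum of squared rank differences."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical river survey: distance from source and mean pebble size.
distance_km = [1, 3, 5, 8, 12, 15]
pebble_mm = [92, 80, 85, 60, 41, 30]

r_s = spearman(distance_km, pebble_mm)
print(f"r_s = {r_s:.3f}")
```

A value close to -1 suggests a strong negative relationship in this sample, but the coefficient still has to be compared with a critical value for the sample size before the result can be called significant.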

Mann-Whitney U Test

When to use: To test for a significant difference between two independent sets of data that are not normally distributed. It is the non-parametric equivalent of the t-test. For example, "Is there a significant difference in the Environmental Quality Score between two different residential streets?"

Advantages

  • Effective for skewed data as it does not require a normal distribution.
  • Clearly determines if the difference between two data sets is statistically significant.

Disadvantages

  • The manual calculation is lengthy and can be prone to human error.
  • Accuracy can be reduced with very small sample sizes (typically below 5).
  • Can only be used to test the difference between two data sets.
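
The manual calculation can be sketched in Python as below. The scores are invented and the function names are hypothetical. The sketch ranks both samples together (tied values share the mean of their positions), sums the ranks for each sample, and applies U = n_a × n_b + n(n + 1)/2 - R to each sample in turn; the smaller U is the test statistic.

```python
def average_ranks(values):
    """Map each value to its rank (1 = smallest); ties share the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1   # first position occupied by v
        count = ordered.count(v)       # how many times v appears
        rank_of[v] = first + (count - 1) / 2
    return rank_of

def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two U values for two independent samples."""
    rank_of = average_ranks(sample_a + sample_b)
    r_a = sum(rank_of[v] for v in sample_a)
    r_b = sum(rank_of[v] for v in sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
    u_b = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
    return min(u_a, u_b)

# Hypothetical Environmental Quality Scores from two residential streets.
street_a = [12, 15, 9, 14, 11, 13]
street_b = [7, 10, 6, 8, 10, 5]

u = mann_whitney_u(street_a, street_b)
print(f"U = {u}")
```

The difference is treated as significant when U is less than or equal to the critical value for the two sample sizes at the chosen significance level, which has to be looked up in a table.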

Chi-Squared Test

When to use: To compare observed frequencies in categories with the frequencies that would be expected if there were no association, to see whether the difference is significant. For example, "Is there a significant difference in the preferred shopping location (e.g., city centre, retail park) between different age groups?".

Advantages

  • Useful for testing hypotheses about categorical data, using frequency counts.

Disadvantages

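  • Sensitive to sample size: it becomes unreliable when expected frequencies are small (commonly taken as below 5), and it must be calculated from raw counts rather than percentages.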
  • It only indicates whether a relationship exists (whether it is significant), not how strong that relationship is.
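
A minimal Python sketch of the calculation for a table of counts is given below. The survey counts are invented and `chi_squared` is a hypothetical helper name. Each expected frequency is taken as (row total × column total) / grand total, the statistic is the sum of (O - E)² / E over all cells, and the degrees of freedom are (rows - 1) × (columns - 1).

```python
def chi_squared(observed):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected in this cell if there were no association.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_sq, dof

# Hypothetical survey counts: rows are age groups,
# columns are preferred location (city centre, retail park).
observed = [
    [30, 10],   # under 30
    [20, 20],   # 30 to 59
    [10, 30],   # 60 and over
]

chi_sq, dof = chi_squared(observed)
print(f"chi-squared = {chi_sq:.2f}, degrees of freedom = {dof}")
```

The statistic is then compared with the critical value for those degrees of freedom at the chosen significance level; a larger calculated value indicates a significant association.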