# Analysing and Interpreting Data

## Aim of this topic

To provide officer-level guidance to help Tasmanian Government agencies apply appropriate statistical techniques and thinking to identify key messages in data.

## Getting started – a process for analysing data

The data analysis process is about turning data into information. Following the steps below gives you a systematic approach to analysing data, which helps ensure your analysis is accurate and that the conclusions you draw from the data are appropriate.

- Identify the issues or questions you require information about, specify objectives and formulate expectations.
- Identify appropriate data sources.
- Determine appropriate analytical techniques and undertake data analysis.
- Assess the results against the objectives and expectations.

## Statistical measures and descriptive techniques to summarise quantitative data

There are many ways to summarise and present data so that it can be used more readily and usefully as part of an evidence base for decision making. Quantitative data analysis techniques assist with making sense of and communicating your data to others by organising, summarising and exploring your results.

### Graphical analysis

Graphical analysis is a useful way to gain an instant picture of the distribution of data and identify any relationships that require further investigation. Patterns in data can be discerned more easily when displayed in graphs, and a range of graphical techniques can be used to present data in a pictorial format, for example column graphs, row graphs, dot graphs and line graphs.

One way of summarising data is to produce a frequency distribution table or graph. A frequency distribution table groups data into categories, showing the number of observations in each category. These categories are referred to as classes. Once the class frequencies have been produced, the distribution can be represented graphically by a column, row, dot or line graph. It may also be appropriate to plot relative frequencies, which show the percentage of the population within each class interval and so allow groups of different sizes to be compared directly.
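As an illustration of the frequency distribution described above, the following is a minimal Python sketch. The ages, the ten-year class width and the function names `age_class` and `frequency_table` are assumptions made for this example only; they do not come from any particular data set.

```python
from collections import Counter

# Hypothetical ages of 12 survey respondents (illustrative data only).
ages = [23, 37, 45, 31, 52, 29, 41, 36, 58, 33, 47, 26]


def age_class(age, width=10):
    """Return the class label for an age, e.g. 37 falls in '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def frequency_table(values):
    """Return rows of (class, frequency, relative frequency in percent)."""
    counts = Counter(age_class(v) for v in values)
    total = len(values)
    return [
        (label, n, round(100 * n / total, 1))
        for label, n in sorted(counts.items())
    ]


table = frequency_table(ages)
for label, n, pct in table:
    print(f"{label}: {n} observations ({pct}%)")
```

The relative frequency column is what allows this group of 12 people to be compared with a group of a different size.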

### Summary statistical measures

Calculating summary statistics will assist you to understand the distribution of the data. These summary measures are useful for comparing information and are more precise than graphical analysis. Summary statistics assist you to develop an understanding of:

- **The centre of a set of data:** this is important as we often want to know what the central value is for the sample or population. The mean, median and mode are useful measures of central location. However, these measures of location cannot tell the whole story about the distribution of the data. It is possible for two data sets to have the same mean but vastly different distributions, so you should also analyse the amount of variability in the data.
- **The variability or spread of the data:** the range, inter-quartile range, standard deviation and variance are useful measures of the variability or spread of the data.
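The point that two data sets can share a mean yet differ widely in spread can be sketched with Python's standard `statistics` module. The two groups below are made-up values chosen for illustration, and `summarise` is a hypothetical helper name. The mode is left out because no value repeats in these small samples.

```python
import statistics

# Two hypothetical data sets with the same mean but different spread.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]


def summarise(values):
    """Return common measures of centre and spread for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,                            # inter-quartile range
        "variance": statistics.variance(values),   # sample variance
        "std_dev": statistics.stdev(values),       # sample standard deviation
    }


summary_a = summarise(group_a)
summary_b = summarise(group_b)
for name, summary in (("Group A", summary_a), ("Group B", summary_b)):
    print(name, summary)
```

Both groups have a mean and median of 50, but every measure of spread is far larger for the second group. The sketch uses the sample versions of variance and standard deviation; `statistics.pvariance` and `statistics.pstdev` are the population equivalents.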

There is also a range of analytical techniques that can help you gain a deeper understanding of the data. These include analysing change over time, comparing groups, comparing like with like, and examining relationships between variables. Modelling techniques such as linear regression, logistic regression and time series analysis may be used to explore these relationships.

## Issues to consider when interpreting data

When interpreting data there are a number of issues that need to be considered to ensure data is used correctly. These may impact on your analysis and the conclusions you may draw.

Some useful questions to consider when interpreting data include:

- Has the original question been answered?
- Do the results meet expectations? Do they make sense?
- What are the main conclusions? Are there other interpretations?
- Is the supporting data of sufficient quality? How current is it? How was it collected? Can the results be supported statistically, i.e. are they statistically significant?

The process of data interpretation can be quite complex, depending on the questions you are seeking to answer, and in some instances the answers will not be clear cut. Your analysis may provide you with the basis for describing what happened, but there may be many possible reasons why it occurred. It is important not to consider the issue in isolation but to think about the interrelationships between social, economic and environmental factors. You may need to seek clarification through further analysis and research to ensure the conclusions you draw are accurate.

### Correlation and causation

Understanding the extent to which one variable relates to another is often the objective of data analysis. For example, is there a relationship between a person’s education level and their health outcomes? Relationships between variables can be explored by examining correlation and causality. A sound understanding of correlation and causality can enable more targeted policies and programs to be developed to bring about desired outcomes.

Two or more variables are considered to be related (or correlated), in a statistical context, if their values change together: as the value of one variable increases or decreases, the value of the other variable also changes, either in the same direction or in the opposite direction.

For example, the two variables "hours worked" and "income earned" are related if an increase in hours worked is associated with an increase in income earned. The relationship can also run in the opposite direction: for the two variables "price" and "purchasing power", as the price of goods increases, a person's ability to buy those goods decreases (assuming a constant income).

Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the value of the other variable.
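One common correlation measure is Pearson's correlation coefficient, which ranges from -1 (a perfect relationship in the opposite direction) to +1 (a perfect relationship in the same direction). The sketch below computes it for the two examples given earlier. All figures are invented for illustration, `pearson_r` is a hypothetical function name, and other correlation measures exist.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two paired series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(ss_x * ss_y)


# Hypothetical weekly hours worked and income earned for five people.
hours_worked = [10, 20, 30, 40, 50]
income_earned = [310, 590, 930, 1180, 1520]

# Hypothetical prices and the units a fixed income of 100 can buy.
price = [1.0, 2.0, 4.0, 5.0, 10.0]
units_affordable = [100.0, 50.0, 25.0, 20.0, 10.0]

r_positive = pearson_r(hours_worked, income_earned)
r_negative = pearson_r(price, units_affordable)
print(f"hours worked vs income earned: r = {r_positive:.2f}")
print(f"price vs units affordable:     r = {r_negative:.2f}")
```

The first coefficient is close to +1 and the second is negative, matching the direction of each relationship. Neither number says anything about which variable, if either, causes the change in the other.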

Causation indicates that one event is the result of another event, i.e. there is a causal relationship, for example between smoking and lung cancer. This is also referred to as cause and effect. Further information can be found at the Australian Bureau of Statistics.

## Where can I access data?

There are many sources of data available, but careful consideration should be given to choosing the right data for the intended purpose. Choosing a data set that is not appropriate can lead to inaccurate conclusions being drawn. Data is increasingly available from a number of sources, and can be accessed electronically from the internet or in publication format. Published data may be available through libraries (public and university), government departments, community groups, newspapers, books, journals and abstracts. The following resources are various sources of data.

### Resources

### Commonly used economic and population data

## How does data quality affect analysis and interpretation?

Ensuring that data is fit for purpose is an important consideration when selecting data sets for use in analysis. A data set that is not fit for purpose may limit the conclusions that can be drawn and the usefulness of findings from the analysis.

The Australian Bureau of Statistics' Data Quality Framework outlines seven areas to consider when assessing whether a data source is fit for purpose, as well as suggested questions to help assess each area. The Framework is designed to meet a number of purposes, including designing data products and collections that are fit for purpose and assessing existing products and collections.

Some useful questions to consider when assessing whether an existing data source is fit for use in analysis may include:

- **Institutional environment**: Is the organisation supplying the data impartial/objective? Under what authority do they collect the information?
- **Relevance**: How well does the data align with data requirements? Is it collected for the population, area and time period of interest?
- **Timeliness**: How current is the data?
- **Accuracy**: How reliable is the data and are there any limits on how it should be used?
- **Coherence**: How well does the data align with other sources? Are populations and data items of interest measured consistently?
- **Interpretability**: How readily is the data understood? Is information available to help understand potential errors in the data and concepts measured?
- **Accessibility**: How readily can the data be obtained? Is it publicly available?
