
Global Challenges Initiative

The Cambridge Strategic Research Initiative for the Sustainable Development Goals (SDGs)


 

NERC-affiliated short course: Big Data in environmental biology

This course is endorsed by the Natural Environment Research Council (NERC), and is run by the Department for Continuing Education at the University of Oxford. It aims to impart an in-depth knowledge of current analytical techniques in several important statistical and industry-relevant areas. The course is based wholly on R and features comprehensive, real-life projects and case studies, all demonstrated by expert facilitators.
When: Feb 19, 2018 09:00 AM to Feb 22, 2018 05:00 PM
Where: Department for Continuing Education, University of Oxford, Oxford


A free, NERC-funded, four-day postgraduate skills training course on analysing very large environmental and ecological data-sets with advanced multivariate statistics (including big data analysis). It aims to provide integrated training in statistical approaches for industry-specific applications in ecology and environmental science, relevant to both student research and employability needs.

A basic level of R is assumed, but further R skills will be taught using RStudio and a range of R packages. Students will learn how to carry out advanced regression and ordination/dimensionality reduction analyses on their own laptops, as well as gaining experience in mining Big Data-sets and creating personalised data visualisations in the widely recommended graphics system ggplot2.

Throughout the course, students will carry out real-life projects built around four case studies, ensuring practical, hands-on experience.

Case studies:

  1. Using PCA to find locally important variables in the Earth System Data Cube
  2. Deforestation data and governmental responses in NE Cambodia
  3. Citizen science data: methods and analysis
  4. Knowledge discovery through the Earth System Data Cube

The course fills a widening gap in skills provision in the current UK research landscape. The ability to analyse and interpret data underpins all ecological research and decision making, and is a core skill valued not only by academics but also by industrial employers such as institutes, NGOs and consultants.

The course programme is as follows:

  • Day 1: A compressed revision of prior knowledge in the morning, including a seminar on Big Data use in business, followed by intensive sessions on data visualisation in the afternoon.
  • Day 2: Applies multivariate ordination approaches to various kinds of data and includes an in-depth case study.
  • Day 3: Presents Generalised Linear Model (GLM) and Generalised Linear Mixed Model (GLMM) techniques, with two further case studies.
  • Day 4: Includes a session on data mining followed by a case study and a business perspectives presentation, before the course concludes with a student data workshop in the afternoon.

The learning outcomes for each day are:

Day 1:

  • Working with R and the RStudio development environment: import, analysis and basic visualisation of data.
  • Knowledge of traditional statistical testing is assumed. However, ordination and simple linear models will be briefly summarised.
  • An overview of the use of big data in different environmental business sectors to provide context for the examples introduced throughout the course.
  • Data visualisation, including basic plotting in R and lattice and panel plots, leading up to the Grammar of Graphics approach to data visualisation and the ggplot2 package. A working knowledge of GIS-related spatial data visualisation on maps, and of interactive web-based plotting widgets using R Shiny.

Day 2:

  • Understand the difference between supervised and unsupervised methods.
  • An introduction to some ordination/dimensionality reduction methods (PCA, MDS, Isomap, Autoencoders, t-SNE).
  • How to measure the quality of an embedding and how to choose the right parametrization.
  • A basic overview of how to use NetCDF to manage large data-sets (NCO editing tools, visualisation tools).
  • Case study 1: Finding locally important variables in the earth system using the Earth System Data Cube and PCA.

Day 3:

  • Understand strategies for dealing with spatial and temporal data including GLMs, repeated measures analysis and mixed-effects models (GLMMs).
  • Revision of error structure issues in ecological data (binomial, Poisson, Gaussian) and the use of canonical link functions.
  • Case study 2: Deforestation data and governmental responses during the development of the Seima Protection Forest REDD+ project in NE Cambodia.
  • Case study 3: Opportunities and pitfalls of gathering large data-sets using citizen scientists. A brief general introduction to citizen science followed by an example of analysing ant mating flights with GLMMs from 13,000 observations across the UK.

Day 4:

  • An overview of data mining and exploring Big Data.
  • Case study 4: The Earth System Data Cube as an example of a Big Data resource. Access to the cube is via an online JupyterHub, with analysis through Jupyter scripts and examples based on R scripts on the Jupyter portal.
  • Business perspectives presentation: The potential for big data, including key business questions around innovation and commercialisation of big data projects. Client perspectives from government and business sectors.
  • Student data workshop: Students are invited to spend the afternoon either continuing to work with the example data-sets and case studies provided earlier in the course or applying the techniques they have learned to their own data, with help and advice from the course tutors.

The facilitators and tutors for this programme are:

  • Dr Thomas Hesselberg, Departmental Lecturer in Environmental Science and Course Director for the Postgraduate Certificate in Ecological Survey Techniques at OUDCE. He has more than 10 years of experience teaching and conducting research in invertebrate biology and behavioural ecology using a variety of statistical methods; his recent work includes a study of ant nuptial flights based on citizen science.
  • Dr Toby Marthews, a researcher at the Centre for Ecology & Hydrology (CEH). Originally trained in maths and physics, he has extensive experience of ecological experimental design and analysis including RAINFOR plot census work in Peru and Malaysia, tropical forest research in Panama, geospatial analysis using ArcGIS, modelling the hydrology of African wetlands and the use of large-scale Land Surface Models. Toby has written an online Beginners’ R course.
  • Mr Guido Kraemer, a researcher at the Max Planck Institute in Jena, Germany (MPI). He holds an MSc in Ecology from the University of Jena. His current PhD project focuses on the development of a multivariate system state indicator for the Earth System. An expert in R, Julia, and dimensionality reduction, Guido teaches at the International Max Planck Research School for Global Biogeochemical Cycles and has authored three R packages relevant to this course: dimRed (A Framework for Dimensionality Reduction in R), coRanking (Assessing the Quality of a Dimensionality Reduction) and DRR (Dimensionality Reduction via Regression).
  • Mr Keith Binding has twenty years’ experience of providing commercialisation training and coaching support via Research Council initiatives and of developing supporting innovation guidance and materials. Originally a commercial lawyer, Keith is Director of Spirit Consulting, providing market evaluation, commercialisation and business development support for early-stage businesses from both the public and private sectors, with a particular focus on environmental/sustainability and data analytics ventures.

 

For more information on this programme, and how to apply, please see the course webpage here.

The Global Challenges Initiative is a Strategic Research Initiative of the University of Cambridge that aims to enhance the contribution of its research towards addressing global challenges and achieving the Sustainable Development Goals (SDGs) by 2030.
