Code Club :)

Otho Mantegazza

2019-03-11

Why Data Analysis for Biologists?

As a biologist, you’ll often find yourself dealing with data.

Your experiments, be they phenotype measurements, gene expression analyses, or genomic comparisons, will produce data.

To understand your results and to explain them to others, you’ll have to be able to make sense of those data.

Why Data Analysis in R

You don’t have to use R: pick the software that suits your needs, use several tools, and feel free to switch.

R has some advantages:

  • It’s professional and open source,
  • It has great packages for data manipulation and visualization,
  • It has powerful tools for communicating your results,
  • It has clear, extensive and user friendly documentation,
  • It is powerful and easy to use.

Learn by doing

TidyTuesday

Check out TidyTuesday, a weekly social data project on Twitter.

You can learn a lot just by looking at what other people are doing!

If you also publish your own TidyTuesday contributions regularly, you’ll learn how to analyze data in R alongside a friendly and welcoming community.

Resources

Bookdown

Bookdown has wonderful books on data analysis in R that you can consult openly online.

You might want to start with R for Data Science.

Load your data into R

Readr

If you want to analyze data in R, first you have to load them.

You can do it with readr and readxl. These two packages cover two very common data formats: rectangular text data (csv, tsv) and Excel spreadsheets.
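As a minimal sketch of both packages, the example below reads a small csv file and an Excel sheet. The plant measurements are made up for illustration and written to a temporary file so the code runs on its own; the Excel workbook is one of the example files that ships with readxl.

```r
library(readr)
library(readxl)

# Hypothetical measurements, written to a temporary csv file
# so that this example does not depend on any file of yours.
csv_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(plant = c("a", "b", "c"), height_cm = c(10.2, 12.5, 9.8)),
  csv_path
)

# Read the csv file back into R as a tibble
heights <- read_csv(csv_path, show_col_types = FALSE)

# readxl ships with example workbooks; this one has an "iris" sheet
xlsx_path <- readxl_example("datasets.xlsx")
excel_sheets(xlsx_path)                      # list the sheets in the workbook
iris_xl <- read_excel(xlsx_path, sheet = "iris")
```

With your own data, replace `csv_path` and `xlsx_path` with the paths to your files.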

Other kinds of data

You can load many kinds of data into R. For most of them you can find manuals and how-to guides online.

Plot - Visualize your data

If you know how to visualize your data in graphs and plots, you’re halfway there. You can use plots both to explore your data and to communicate your results.

You can use ggplot2, an R package for data visualization. You can find guides on how to use it on its website or in this chapter of r4ds.
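To give an idea of what ggplot2 code looks like, here is a small sketch of a scatter plot. It uses `iris`, a dataset built into R, rather than any data from this presentation.

```r
library(ggplot2)

# Map columns of the data to the axes and to colour,
# then add a layer of points.
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point()

p
```

You build a plot by adding layers with `+`; swapping `geom_point()` for another geom, such as `geom_boxplot()`, changes the type of plot.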

Use Plots to Explore and to Communicate

Keep in mind that you’ll use plots for at least two reasons:

  1. Explore: You can use plots to explore your data. Exploratory plots should be produced quickly, at the expense of detail.

  2. Communicate: You can use plots to communicate your results to others. Plots for communication should be detailed and clear to everybody.
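The difference between the two uses can be sketched with the same data plotted twice: once quickly, once with labels and a cleaner theme. The example uses R's built-in `iris` dataset, and the title and axis wording are only illustrative choices.

```r
library(ggplot2)

# Exploratory: one line, default labels, good enough for a first look
p_explore <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point()

# For communication: readable labels, units, a title, and groups in colour
p_communicate <- ggplot(
  iris,
  aes(x = Sepal.Length, y = Petal.Length, colour = Species)
) +
  geom_point(size = 2, alpha = 0.8) +
  labs(
    title = "Petal length increases with sepal length",
    x = "Sepal length (cm)",
    y = "Petal length (cm)",
    colour = "Species"
  ) +
  theme_minimal()

p_explore
p_communicate
```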

Unleash your Inner Designer

Plots and data visualizations are real works of design.

If you combine technical accuracy with aesthetic care, your plots become pleasant to look at and easy to understand.

Learn from the best

Wrangle - Manipulate your data

Sometimes you have to manipulate your datasets: you might want to filter them, or to change, remove, or add columns.

You can manipulate your data with dplyr. Learn how to use dplyr here or with this chapter of r4ds.

You might want to filter your data, summarize them, or mutate them by making new columns.
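These three operations can be sketched with dplyr as follows. The example uses R's built-in `iris` dataset; the threshold and the new column name `petal_ratio` are arbitrary choices for illustration.

```r
library(dplyr)

# Filter: keep only the rows that satisfy a condition
long_sepals <- iris %>%
  filter(Sepal.Length > 6)

# Mutate: add a new column computed from existing ones
with_ratio <- iris %>%
  mutate(petal_ratio = Petal.Length / Petal.Width)

# Summarize: collapse each group to one row of summary statistics
species_means <- iris %>%
  group_by(Species) %>%
  summarise(mean_petal_length = mean(Petal.Length))

species_means
```

The pipe `%>%` passes the result of one step to the next, so you can chain several of these verbs in a single expression.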

Or much more

Source

I made this presentation with the R markdown implementation of reveal.js.

The source code is here.