Getting Data

How to access public gene expression data

If you prefer, you can watch the video for this lesson instead of reading the text.

The first thing you will need is data!

Scientists can upload data from their experiments to a website called the Gene Expression Omnibus (GEO). Anyone (including you) can visit the website and download the data.

In this course, we will do an example project on COVID-19. So, the first step will be to get gene expression data on COVID-19 infections.

The link above should take you to the Gene Expression Omnibus website.

In the search bar, type in COVID-19 and click Search.

This searches the GEO database for datasets matching your search term (feel free to replace "COVID-19" with a different disease that you would like to research).

The results page lists a lot of datasets. You can filter them by organism (human, mouse, etc.), type of study, age, and many other factors.
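If you would rather search from code than from the website, the search and filter steps above can be sketched in Python. This is only a rough sketch, not part of the original lesson: it assumes NCBI's public E-utilities `esearch` endpoint with the `gds` (GEO DataSets) database, and the function names `build_geo_query`, `build_search_url`, and `search_geo` are made up for this example. The `[Organism]` and `[Entry Type]` tags play the same role as the filters on the results page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint (assumed here; the lesson itself uses the website).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_query(term, organism=None, series_only=True):
    """Combine a search term with optional GEO DataSets field filters."""
    parts = [term]
    if organism:
        # Same idea as the organism filter on the results page.
        parts.append(f"{organism}[Organism]")
    if series_only:
        # Keep only whole studies (GEO "series" records).
        parts.append("gse[Entry Type]")
    return " AND ".join(parts)


def build_search_url(query, max_results=20):
    """Build the full request URL for a GEO DataSets search."""
    params = {"db": "gds", "term": query, "retmax": max_results, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def search_geo(query, max_results=20):
    """Run the search (needs internet access) and return the matching record IDs."""
    with urlopen(build_search_url(query, max_results)) as response:
        result = json.load(response)
    return result["esearchresult"]["idlist"]


# Building the query and URL does not touch the network.
query = build_geo_query("COVID-19", organism="Homo sapiens")
url = build_search_url(query)
print(query)
print(url)
```

Calling `search_geo(query)` sends the request and returns a list of record IDs. As with the search bar, you can swap "COVID-19" for any other disease.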

Feel free to browse through the datasets and read the titles.

I chose this dataset for our research:

Please visit the page.

You will see the following page. This page is called the "Accession Display". It summarizes the dataset and what it is about. You can see details such as the title, the day it was submitted, which lab it came from, and more.

For example, this particular dataset is fairly new. It was submitted on January 14, 2021 by scientists from Zhejiang University in China.

The important details to look at on any Accession Display page are:

1. Title
2. Summary
3. Overall Design

You should always read these carefully to make sure the dataset is what you are looking for.
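The same three fields can also be pulled out with code. GEO offers each series as a SOFT-format text file, in which header lines look like `!Series_title = ...`. Below is a minimal sketch of reading those lines. The sample text is a placeholder written for this example (the accession and every value in it are invented, not taken from the dataset in this lesson), and `parse_series_header` is a hypothetical helper name.

```python
from collections import defaultdict

# Placeholder text in the style of a GEO SOFT series header.
# The accession and all field values are invented for illustration.
SAMPLE_SOFT = """\
^SERIES = GSE000000
!Series_title = Example title
!Series_summary = First paragraph of the summary.
!Series_summary = Second paragraph of the summary.
!Series_overall_design = Example description of the samples and groups.
!Series_submission_date = Jan 01 2000
"""

# The three fields the lesson says to read carefully.
KEY_FIELDS = ("title", "summary", "overall_design")


def parse_series_header(soft_text):
    """Collect '!Series_<name> = <value>' lines into a dict of name -> text."""
    fields = defaultdict(list)
    for line in soft_text.splitlines():
        if not line.startswith("!Series_") or " = " not in line:
            continue  # skip lines that are not series header fields
        label, value = line.split(" = ", 1)
        name = label[len("!Series_"):]
        fields[name].append(value.strip())
    # A field can appear on several lines (e.g. a long summary); join them.
    return {name: " ".join(values) for name, values in fields.items()}


header = parse_series_header(SAMPLE_SOFT)
for name in KEY_FIELDS:
    print(f"{name}: {header.get(name, '(missing)')}")
```

Printing just these three fields for several datasets is a quick way to compare them before deciding which one to download.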

In our case, this dataset contains microarray data on the transcriptome (RNA) of peripheral blood mononuclear cells from COVID-19 patients and healthy people.

This means that the scientists who did this experiment measured how strongly each RNA was expressed in blood cells taken from each person. Their purpose was to determine the differences in gene expression between people with COVID-19 and people without COVID-19.

This research has implications for treatment, for diagnosis, and for helping doctors understand the effects of COVID-19 on our bodies. Take diagnosis as an example: if we wanted to design a COVID-19 test, we could use this data to identify RNA molecules that are more common in the blood of people with COVID-19 than in the blood of healthy people. Then, COVID-19 testers could use a microarray to check whether a person has high levels of those RNA molecules and thus could potentially have COVID-19.
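To make the idea of "RNA molecules that are more common in people with COVID-19" concrete, here is a toy sketch in Python. All of the numbers and RNA names are invented for illustration and do not come from the dataset. The sketch compares the average level of each RNA in the two groups using a log2 fold change, and keeps the RNAs that are at least twice as high in the COVID-19 group. The cutoff of 1.0 is an assumption chosen for the example.

```python
import math
from statistics import mean

# Invented expression levels for three hypothetical RNAs (three people per group).
covid = {
    "RNA_A": [400, 380, 420],
    "RNA_B": [50, 55, 45],
    "RNA_C": [20, 30, 25],
}
healthy = {
    "RNA_A": [100, 90, 110],
    "RNA_B": [52, 48, 50],
    "RNA_C": [100, 110, 90],
}


def log2_fold_change(case_values, control_values):
    """log2 of (average level in cases / average level in controls)."""
    return math.log2(mean(case_values) / mean(control_values))


fold_changes = {
    rna: log2_fold_change(covid[rna], healthy[rna]) for rna in covid
}

# A log2 fold change of 1.0 means "twice as high in the COVID-19 group".
CUTOFF = 1.0
candidates = [rna for rna, lfc in fold_changes.items() if lfc >= CUTOFF]

for rna, lfc in fold_changes.items():
    print(f"{rna}: log2 fold change = {lfc:+.1f}")
print("Higher in COVID-19:", candidates)
```

Here only `RNA_A` passes the cutoff, so it would be the one to look at for a test. A real analysis uses many more samples and also checks that a difference is statistically significant rather than due to chance.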

If you don't know what RNA or microarrays are, please read the "Background Knowledge" page to get up to speed.

In the next lesson, we will go over how to actually analyze the data.