Next Steps

Where to go from here
  1. Instead of comparing COVID-19 patients to controls, try comparing severe COVID-19 to mild COVID-19. You could find genes that are linked to worse outcomes, which could help doctors decide on treatments. As you may remember, the dataset we used in this course includes both severe and mild COVID-19 samples.

  2. Find your own dataset! The Gene Expression Omnibus (GEO) hosts thousands of public gene expression datasets. You can find one on virtually any disease you want, from Alzheimer's to pancreatic cancer to diabetes.

  3. Check out these papers I wrote for examples and inspiration (abstracts at the bottom of this page).

  4. Certificate: To get your certificate, complete this feedback form: https://forms.gle/NBv6Gi7pbQqreLGu7
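
If you want to try idea 1 (severe versus mild), here is a minimal Python sketch of one way to start. It is not the exact method from this course: it assumes your expression values are already log2-transformed, and the gene names, sample IDs, and numbers are made up for illustration. It ranks genes by Welch's t-statistic and reports the log2 fold change for each gene.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups with possibly unequal variances.

    Assumes each group has at least two samples and that the two groups
    do not both have zero variance (otherwise this divides by zero).
    """
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def compare_groups(expression, severe_ids, mild_ids):
    """Return (gene, log2 fold change, t) tuples, largest |t| first.

    expression: dict mapping gene -> {sample_id: log2 expression value}
    """
    results = []
    for gene, values in expression.items():
        severe = [values[s] for s in severe_ids]
        mild = [values[s] for s in mild_ids]
        # Values are already log2, so a difference of means is a log2 fold change.
        log2_fc = mean(severe) - mean(mild)
        results.append((gene, log2_fc, welch_t(severe, mild)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical toy data: two genes, three severe (s*) and three mild (m*) samples.
expression = {
    "GENE_A": {"s1": 8.0, "s2": 8.4, "s3": 8.2, "m1": 6.0, "m2": 6.3, "m3": 6.1},
    "GENE_B": {"s1": 5.0, "s2": 5.2, "s3": 4.9, "m1": 5.1, "m2": 5.0, "m3": 5.0},
}
ranked = compare_groups(expression, ["s1", "s2", "s3"], ["m1", "m2", "m3"])
for gene, log2_fc, t in ranked:
    print(f"{gene}: log2FC={log2_fc:.2f}, t={t:.2f}")
```

On real data you would also want p-values and a multiple-testing correction. For example, `scipy.stats.ttest_ind` with `equal_var=False` gives Welch p-values, which you can then adjust across all genes.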

Identification of Blood-based Biomarkers for Early Stage Parkinson’s Disease
Parkinson’s disease (PD) affects millions of people worldwide and causes symptoms such as bradykinesia and disrupted speech. Parkinson’s disease is known to be characterized by the mass death of dopaminergic neurons in the substantia nigra region. In the status quo, PD is often diagnosed at late stages because obvious motor symptoms appear after the disease has progressed far. It is advantageous to diagnose PD before the onset of motor symptoms because treatments are often more effective at early stages. While motor symptoms usually manifest when over 50% of dopaminergic neurons in the substantia nigra are already lost, molecular signatures of PD may be present at early stages in patient blood. This study aimed to analyze several gene expression studies’ data for commonly differentially expressed genes (DEGs) in the blood of early stage PD patients. 147 DEGs were identified in at least two out of three datasets and passed cut-off criteria. A protein interaction network for the DEGs was constructed and various tools were used to identify network characteristics and hub genes. PANTHER analysis revealed that the biological process “cellular response to glucagon stimulus” was overrepresented by almost 21 times among the DEGs and “lymphocyte differentiation” by 5.98 times. Protein catabolic processes and protein kinase functions were also overrepresented. ESR1, CD19, SMAD3, FOS, CXCR5, and PRKACA may be potential biomarkers and warrant further study. Overall, the findings of the present study provide insights on molecular mechanisms of PD and provide greater confidence on which genes are differentially expressed in PD. The results also are additional evidence for the role of the immune system in PD, a topic that is gaining interest in the PD research community.

Competing interests: The authors have declared no competing interest.
Clinical trial: This study was using pre-existing gene expression data available freely from the NCBI Gene Expression Omnibus.
Funding: No funding was received.
Ethics: IRB approval was not needed.
Data availability: The data is publicly available from the respective Gene Expression Omnibus studies from which they were retrieved. I do not claim credit for the data. Other researchers obtained that data and credit goes to them (study accession pages are linked).
  https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6613
  https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse54536
  https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse72267
Supplemental STRING network: https://version-11-0b.string-db.org/cgi/network?networkId=bns9q9ZNmuqR
List of 147 Differentially Expressed Genes: https://docs.google.com/spreadsheets/d/1kAj7B2oXeNSK-Bha7Xo14IRv_Z1xTGCcdwOxRYdWem8/edit?usp=sharing
Venn Diagram Results: https://docs.google.com/spreadsheets/d/1g4-k2lGj78hG1rLhMsQbK77MQ-TBe9QQNAADhLfFDlI/edit?usp=sharing
www.medrxiv.org
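
The Parkinson's abstract above keeps only genes that were differentially expressed in at least two of three datasets. As a rough illustration of that filtering step (not the code used in the paper), the Python sketch below counts how many datasets each gene appears in. The gene names and dataset contents are placeholders, not real results.

```python
from collections import Counter


def genes_in_at_least(deg_sets, min_datasets=2):
    """Return genes that appear in at least `min_datasets` of the given DEG lists."""
    # set(degs) guards against a gene being listed twice within one dataset.
    counts = Counter(gene for degs in deg_sets for gene in set(degs))
    return sorted(gene for gene, n in counts.items() if n >= min_datasets)


# Hypothetical DEG lists from three datasets (placeholder gene names).
dataset_1 = {"GENE_A", "GENE_B", "GENE_C"}
dataset_2 = {"GENE_B", "GENE_C", "GENE_D"}
dataset_3 = {"GENE_C", "GENE_E"}

shared = genes_in_at_least([dataset_1, dataset_2, dataset_3])
print(shared)  # ['GENE_B', 'GENE_C']
```

Requiring agreement across datasets is one way to gain confidence that a gene is not just noise from a single study.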
Integrative Bioinformatics Analysis Identifies Noninvasive miRNA Biomarkers for Lung Cancer
Non-small cell lung cancer (NSCLC), a subtype of lung cancer, affects millions of people. While chemotherapy and other treatments have improved, the 5-year survival rate of NSCLC patients is still only 21%. Early diagnosis is essential for increasing survival as treatments have higher effectiveness at earlier stages of NSCLC. Noninvasive blood-based liquid biopsy tests for NSCLC may be useful for diagnosis and prognosis. MicroRNA (miRNA) and messenger RNA present in blood can serve as biomarkers for such tests. The present study identified 13 miRNAs that are underexpressed in the tissue and blood of NSCLC patients using Gene Expression Omnibus data. Following Kaplan-Meier analysis, miR-140-3p, miR-29c, and miR-199a were selected as candidate biomarkers and demonstrated statistically significant prognostic power. An ROC analysis of miR-140-3p expression between NSCLC patients and controls had an area under curve value of 0.85. Functional enrichment analysis of the miRNA target genes revealed several overrepresented pathways relevant to cancer. Eight target genes were hub genes of the protein-protein interaction network and possessed significant prognostic power. A combination of IL6, SNAI1, and CDK6 achieved a hazard ratio of 1.4 with p < 0.001. These biomarkers are especially valuable because they can be identified in blood and reflect the tumor state. Since all miRNAs were underexpressed in both tissue and blood, detecting expression of a biomarker miRNA in blood provides information on its expression in tissue as well. These miRNAs may be useful biomarkers for NSCLC prognostic and diagnostic tests and should be further studied.

Competing interests: The authors have declared no competing interest.
Clinical trial: As this study only used publicly available data and was computational, no clinical trial ID was necessary.
Funding: No funding was received.
Ethics: No IRB approval was necessary.
Data availability: Data is freely and publicly available from the Gene Expression Omnibus.
  https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE137140
  https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94536
  https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53882
  https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE93300
Abbreviations:
  NSCLC: Non-small cell lung cancer
  LUAD: Lung adenocarcinoma
  LUSC: Squamous cell lung cancer
  miRNA: microRNA
  DEMiRNA: Differentially expressed microRNA
www.medrxiv.org
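
The lung cancer abstract above reports an ROC area under the curve (AUC). If you are wondering what that number means, one way to think about it is as the chance that a randomly chosen patient gets a higher "score" than a randomly chosen control. The Python sketch below computes AUC that way. The expression values are invented for illustration and are not from the paper. Because the miRNAs in the abstract are underexpressed in patients, the sketch uses negative expression as the score, so lower expression counts as more patient-like.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs where the case scores higher.

    Ties count as half a win. This pairwise definition gives the same value
    as the area under the ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical miRNA expression values (made up for this example).
patients = [4.1, 4.8, 5.0, 5.9]
controls = [5.5, 6.0, 6.2, 6.8]

# Negate expression so that lower expression means a higher score.
auc = roc_auc([-x for x in patients], [-x for x in controls])
print(auc)  # 0.9375
```

An AUC of 0.5 means the biomarker is no better than a coin flip, and 1.0 means it separates the two groups perfectly.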

Remember, teens can do research too!