Bioinformatics

Course dates: 11-13 December 2017

Location: Streatham Campus, University of Exeter

Course open to: All years

Course organiser: David Studholme

Course fee for students not funded through the DTP: £200

Course registration: https://nercdtp.onlinesurveys.ac.uk/bioinformatics-december-2017

Registration deadline: 1 December 2017

Objectives: Students will be able to navigate, analyse and manipulate files in a Unix/Linux file system and automate repetitive tasks using scripts. Students will be able to communicate and collaborate effectively with bioinformaticians in the handling, modelling and analysis of large-scale biological data. Students will gain experience of bioinformatics tools widely used for handling modern DNA sequencing data.

Composition: Students work through the tasks at their own pace and ask the instructors for help when they get stuck or something is unclear. They will work on a Linux server, bio-ugserver01.ex.ac.uk or bio-ugserver02.ex.ac.uk, which they will access from their laptops either through a virtual desktop (using the X2Go software) or through a simple text-only interface (using the PuTTY software). The instructors will help configure X2Go/PuTTY so that students can log in to the server.

Section 1: Getting started

Section 2: Introduction to Unix
  • Understand how to use any Unix-based file-system
  • Be able to manipulate files on any Unix-based system
  • Feel comfortable performing basic scripting operations
Section 3: Mapping sequence reads against a reference genome
  • Interpret FASTQ quality metrics
  • Remove poor quality data
  • Trim adaptor/contaminant sequences from FASTQ data
  • Count the number of reads before and after trimming and quality control
  • Align reads to a reference sequence using BWA to produce a SAM (Sequence Alignment/Map) file
  • Convert the SAM file to BAM (Binary Alignment/Map) format
  • Identify and select high-quality SNPs and indels using SAMtools
  • Identify missing or truncated genes with respect to the reference genome
  • Identify SNPs which overlap with known coding regions
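As a rough illustration of counting reads before and after trimming and quality control, the sketch below counts the records in a FASTQ file. It assumes the common four-line record layout (header, sequence, "+" separator, quality string) with no line wrapping and no blank lines; the file names are hypothetical placeholders, not files provided on the course server.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming 4 lines per record (no wrapping).

    Files ending in .gz are opened as gzip-compressed text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Hypothetical file names: raw reads and the same sample after trimming.
    for name in ("sample_raw.fastq.gz", "sample_trimmed.fastq.gz"):
        if Path(name).exists():
            print(name, count_fastq_reads(name))
```

Comparing the two counts gives the number of reads discarded by trimming and filtering. Reading the file line by line keeps memory use constant even for large sequencing runs.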
Section 4a: De novo assembly
  • Extract reads which do not map to the reference sequence
  • Assemble these reads de novo using SPAdes
  • Generate summary statistics for the assembly
  • Identify potential genes within the assembly
  • Search for matches within the NCBI database via BLAST and against the Pfam database
  • Visualize the taxonomic distribution of BLAST hits
  • Perform gene prediction and annotation using RAST
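A widely reported assembly summary statistic is N50: after sorting contigs from longest to shortest, it is the length of the contig at which the cumulative length first reaches at least half of the total assembly length. The minimal sketch below computes contig count, total length, longest contig and N50 from a FASTA file. It is an illustration only; dedicated assembly-assessment tools report these and many more figures. The path `spades_output/contigs.fasta` is a hypothetical example.

```python
from pathlib import Path


def read_fasta_lengths(path):
    """Return the sequence length of each record in a FASTA file."""
    lengths = []
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the running total (longest first)
    first reaches at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_summary(lengths):
    """Basic summary statistics for a list of contig lengths."""
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "N50": n50(lengths),
    }


if __name__ == "__main__":
    # Hypothetical SPAdes output path; adjust to your own run.
    fasta = Path("spades_output/contigs.fasta")
    if fasta.exists():
        print(assembly_summary(read_fasta_lengths(fasta)))
```

Comparing the running total with half the total using `2 * running >= total` avoids floating-point division while giving the same result.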
Section 4b: De novo assembly
  • Assemble reads de novo using SPAdes
  • Generate summary statistics for the assembly
  • Understand how to incorporate long PacBio reads into the assembly
  • Identify open reading frames within the assembly
  • Search for matches within the NCBI database via BLAST and against the Pfam database
  • Visualize species distribution of potential matches
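To show what "identifying open reading frames" means in practice, here is a deliberately naive six-frame scan. It assumes the standard genetic code (start codon ATG; stop codons TAA, TAG, TGA) and an arbitrary, hypothetical minimum length of 100 codons. Real gene finders use much richer models, so treat this as a sketch of the idea rather than a substitute for them.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Naive six-frame ORF scan (ATG ... stop, standard genetic code).

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, and refer to the forward sequence for '+' and to the
    reverse complement for '-'. The stop codon is included, and min_codons
    counts it. Internal ATGs are not reported as separate ORFs, and ORFs
    that run off the end of the sequence are ignored.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs
```

For example, `find_orfs("ATGAAATAG", min_codons=3)` reports a single three-codon ORF on the forward strand in frame 0.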
Section 5: Transcriptomics (RNA-seq)
  • Perform a splice-aware alignment of RNA-seq data against a reference
  • Count the number of reads that fall within each gene
  • Perform a differential expression analysis of a dataset using DESeq2
  • Produce simple visualizations of patterns of gene expression
  • Pick out the most highly regulated genes from a dataset
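One simple way to pick out the most strongly regulated genes is to keep those below an adjusted p-value cutoff and rank them by absolute log2 fold change. The sketch below does this on a DESeq2 results table exported to CSV. It assumes the DESeq2 result column names `log2FoldChange` and `padj`, plus a gene ID column whose name (`gene`) is a hypothetical choice. Rows that R wrote as `NA` are skipped. The cutoff of 0.05 and the top-20 limit are illustrative defaults, not course recommendations.

```python
import csv
import math


def top_regulated_genes(path, padj_cutoff=0.05, n=20, id_column="gene"):
    """Return up to n (gene, log2FoldChange, padj) tuples with padj below
    the cutoff, sorted by absolute log2 fold change (largest first).

    Assumes a CSV export of a DESeq2 results table with 'log2FoldChange'
    and 'padj' columns and a gene ID column (hypothetical name 'gene').
    """
    hits = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            try:
                lfc = float(row["log2FoldChange"])
                padj = float(row["padj"])
            except ValueError:
                continue  # 'NA' or empty value written by R
            if math.isnan(lfc) or math.isnan(padj):
                continue
            if padj < padj_cutoff:
                hits.append((row[id_column], lfc, padj))
    # Rank up- and down-regulated genes together by effect size.
    hits.sort(key=lambda hit: abs(hit[1]), reverse=True)
    return hits[:n]
```

Ranking by absolute fold change treats up- and down-regulated genes alike. Splitting `hits` by the sign of the fold change gives separate up and down lists.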
