SEQanswers

Old 01-24-2012, 08:08 AM   #1
ccard28
Member
 
Location: Rhode Island

Join Date: Jan 2012
Posts: 20
New to RNA-Seq: Help obtaining sequencing summary needed.

We recently submitted our first RNA-Seq sample from the lab and were directed to use Galaxy to align the reads to the bovine genome and obtain our transcript profile. I have little experience working with this kind of data and am trying to figure out which tool to use to get a basic summary of our read alignment to the reference genome. The goal is to have raw numbers for total sequenced fragments, uniquely mapped fragments, fragments mapped to annotated exons, fragments mapped to annotated genes (exons + introns), etc. Can anyone point me in the right direction? Thank you in advance for the help; I am just getting started in the RNA-Seq world.
Old 01-24-2012, 08:16 AM   #2
TonyBrooks
Senior Member
 
Location: London

Join Date: Jun 2009
Posts: 298

http://main.g2.bx.psu.edu/u/jeremy/p...lysis-exercise

There is a tutorial on this.
Old 01-24-2012, 09:02 AM   #3
turnersd
Senior Member
 
Location: Charlottesville, VA

Join Date: May 2011
Posts: 112

This tutorial is fantastic, but I'm not sure it answers the OP's question, which is geared more toward post-alignment analytics. I've often wondered what else I should look at besides running samtools flagstat or Picard's EstimateLibraryComplexity.jar. (I'm not even really sure how to interpret the output from EstimateLibraryComplexity.)
Old 01-24-2012, 10:03 AM   #4
ccard28
Member
 
Location: Rhode Island

Join Date: Jan 2012
Posts: 20

I am definitely looking more for post-alignment information. I just want to find out how many of my reads map to exons vs. introns, and how many align to annotated vs. novel junctions: a general overview of how many reads align to different features within the genome. This may be a very basic task, but my lack of experience working with the data is proving troublesome. I have looked at our data before, aligned it, and done some Cufflinks work on which transcripts are expressed at various levels based on FPKM. However, I need to backtrack and get these general statistics on what my reads align to in the genome (exons, introns, annotated junctions, novel junctions, annotated exons, annotated genes, etc.).
Old 01-24-2012, 10:29 AM   #5
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

ever-seq is pretty good for this kind of thing
Old 01-25-2012, 12:36 AM   #6
arvid
Senior Member
 
Location: Berlin

Join Date: Jul 2011
Posts: 156

You can use intersectBed from BEDtools to count alignments overlapping various genomic features (anything described in GFF or BED format). You can set a minimum-overlap threshold to exclude reads that only barely overlap a feature. This is not really feasible for splice junctions, however.
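
As a rough illustration of the overlap-threshold idea, here is a minimal sketch in plain awk rather than BEDtools, so it runs without installing anything. It counts reads for which at least half of the read lies inside a single exon. The two tiny BED files, the read and exon names, and the 0.5 threshold are all made up for the example. On real data intersectBed is the better choice, and its -f option (minimum overlap as a fraction of the A feature) plays the role of min_frac here.

```shell
# Toy data (hypothetical). BED is 0-based, half-open: chrom, start, end, name
tmpdir=$(mktemp -d)
printf 'chr1\t100\t200\texonA\nchr1\t300\t400\texonB\n' > "$tmpdir/exons.bed"
printf 'chr1\t90\t140\tread1\nchr1\t190\t240\tread2\nchr1\t310\t360\tread3\nchr2\t100\t150\tread4\n' > "$tmpdir/reads.bed"

# Count reads for which at least min_frac of the read lies inside one exon.
# The exon file is held in memory and scanned for every read:
# fine for a toy example, far too slow for a whole genome.
exonic=$(awk -F'\t' -v min_frac=0.5 '
    NR == FNR { chrom[NR] = $1; start[NR] = $2 + 0; stop[NR] = $3 + 0; n = NR; next }
    {
        rs = $2 + 0; re = $3 + 0
        for (i = 1; i <= n; i++) {
            if (chrom[i] != $1) continue
            s = (rs > start[i]) ? rs : start[i]   # overlap start
            e = (re < stop[i])  ? re : stop[i]    # overlap end
            if (e > s && (e - s) / (re - rs) >= min_frac) { hits++; break }
        }
    }
    END { print hits + 0 }
' "$tmpdir/exons.bed" "$tmpdir/reads.bed")

total=$(wc -l < "$tmpdir/reads.bed" | tr -d ' ')
echo "exonic reads: $exonic of $total"
rm -r "$tmpdir"
```

In this toy data, read2 overlaps exonA by only 10 of its 50 bases and is therefore not counted, which is the kind of minimal overlap the threshold is meant to filter out.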
Old 01-25-2012, 08:53 AM   #7
mmreich
Junior Member
 
Location: California

Join Date: Jan 2010
Posts: 1
RNA-SeQC for post-alignment metrics

There is a GATK-based package called RNA-SeQC that provides a large number of post-alignment metrics along the lines of what you are looking for, including total, unique, and duplicate reads, mapped reads and mapped unique reads, rRNA reads, strand specificity, GC bias, correlation to a reference sequence, and many coverage metrics.

It is available as a standalone piece of software and as a module on the GenePattern public server at http://genepattern.broadinstitute.org. More information is at https://confluence.broadinstitute.or...Tools/RNA-SeQC .

Michael
Old 03-30-2012, 09:26 AM   #8
aparna
Member
 
Location: USA

Join Date: Feb 2009
Posts: 15

Hi Michael,
Have you used RNA-SeQC? I am getting an error and am wondering if you know how to fix it.

Thanks,
Old 04-01-2012, 07:15 PM   #9
Nomijill
Member
 
Location: Southwest Florida

Join Date: Sep 2009
Posts: 24

At CLC bio we have several tutorials for RNA-Seq analysis. They are a great way for a beginner to understand the various steps required, and you will also get detailed reports of your mappings. This requires downloading a trial version of the software, which is free for at least two weeks, and you can use it to analyze your own RNA-Seq samples. Let me know if you need more guidance.
Old 04-03-2012, 03:19 AM   #10
swaraj
Member
 
Location: Naples, Italy

Join Date: Feb 2012
Posts: 50

I have found Picard tools very useful for generating post-alignment statistics.

1. Get read mapping statistics
java -Xmx10000m -jar picard-tools-1.58/BamIndexStats.jar I=sorted.bam > sorted.stats

2. Get the quality score distribution of the aligned reads
java -Xmx10000m -jar picard-tools-1.58/QualityScoreDistribution.jar I=sorted.bam O=qualstats CHART=qualstats.pdf

3. Get RNA-Seq mapped read metrics
java -Xmx10000m -jar picard-tools-1.58/CollectRnaSeqMetrics.jar STRAND_SPECIFICITY=NONE REF_FLAT=annotation.refflat CHART_OUTPUT=graph.pdf INPUT=sorted.bam OUTPUT=RNA_seq.stats

Getting a refFlat file covering all genes of an organism can be tricky, but it can be built with these simple commands:

a) Download a GTF annotation file for all genes of the organism
b) Convert the GTF into refFlat format

gtfToGenePred -genePredExt annotation.gtf tmp

awk 'BEGIN{FS="\t"};{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' tmp > annotation.refflat

The gtfToGenePred software is available at http://hgdownload.cse.ucsc.edu/admin/exe/
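
To sanity-check the column reordering without running gtfToGenePred, here is a small sketch that applies the same awk command to a single made-up genePredExt record (the transcript name TX1, gene name GENE1, and coordinates are invented). It assumes the usual 15-column genePredExt layout, in which column 12 (name2) holds the gene name, and checks that the result has 11 columns with the gene name first, as refFlat expects.

```shell
# One hypothetical genePredExt record (15 tab-separated columns):
# name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds
# score name2 cdsStartStat cdsEndStat exonFrames
line=$(printf 'TX1\tchr1\t+\t100\t900\t150\t850\t2\t100,500,\t300,900,\t0\tGENE1\tcmpl\tcmpl\t0,1,')

# Same reordering as in the post: name2 (gene name) first, then columns 1 to 10
refflat=$(printf '%s\n' "$line" | awk 'BEGIN{FS="\t"};{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}')

# Count the output columns and extract the first one
ncols=$(printf '%s\n' "$refflat" | awk -F'\t' '{print NF}')
first=$(printf '%s\n' "$refflat" | cut -f1)
echo "columns: $ncols, gene name: $first"
```

If the first column of a real annotation.refflat turns out to be empty, the GTF probably lacks gene names for those records, and the gene-level metrics may be affected.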

I hope this is helpful.
Old 05-09-2012, 10:37 PM   #11
October
Junior Member
 
Location: Singapore

Join Date: May 2012
Posts: 2
RNA-SeQC

RNA-SeQC seems to be a nice piece of software, but I haven't seen much conversation about it on here yet.

So far, I have been able to get the read metrics on a single sample but have been unable to get the 'text-delimited description of samples and their bams' list into the correct format so that it can be recognized. Has anyone else run into this problem and/or know what the format of the .list file should look like?

Thanks so much, I'm a big fan of all the people on here who take the time to answer noob questions.
Old 05-10-2012, 06:48 AM   #12
aparna
Member
 
Location: USA

Join Date: Feb 2009
Posts: 15

This is what it should look like:

Sample ID Bam File Notes
Y903GFAZ Y903GFAZ.bam Y903GFAZ
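
The columns are presumably separated by tabs, which do not survive copy and paste from a forum post. A minimal sketch for writing such a file with explicit tab characters follows; the file name samples.list is made up, and the sample ID and BAM name are simply the ones from the example above.

```shell
# Hypothetical file name; columns are separated by real tab characters
sample_file=samples.list
printf 'Sample ID\tBam File\tNotes\n' > "$sample_file"
printf '%s\t%s\t%s\n' Y903GFAZ Y903GFAZ.bam Y903GFAZ >> "$sample_file"

# Verify that every line has exactly three tab-separated fields
bad_lines=$(awk -F'\t' 'NF != 3' "$sample_file" | wc -l | tr -d ' ')
echo "lines without exactly 3 fields: $bad_lines"
```

A non-zero count from the last command usually means that spaces were typed where tabs were expected.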

Aparna
Old 05-14-2012, 12:44 AM   #13
October
Junior Member
 
Location: Singapore

Join Date: May 2012
Posts: 2

Thank you Aparna, that format works perfectly!

Have you been able to get the coverage output? I am running the newest version (v1.1.6) but have been unable to generate anything other than the read metric files.