We recently submitted our lab's first RNA-Seq sample and were directed to use Galaxy to align the reads to the bovine genome and obtain our transcript profile. I have little experience working with this kind of data and am trying to figure out which tool to use to get a basic summary of our read alignment to the reference genome. The goal is to have raw numbers on total sequenced fragments, uniquely mapped fragments, fragments mapped to annotated exons, fragments mapped to annotated genes (exons + introns), etc. Can anyone point me in the right direction? Thank you in advance for the help; I'm just getting started in the RNA-Seq world.
-
This tutorial is fantastic, but I'm not sure the tutorial answers the OP's question, which is more geared toward post-alignment analytics. I've often wondered what else I should look at besides running samtools flagstat or EstimateLibraryComplexity.jar (picard). (I'm not even really sure how to interpret the output from EstimateLibraryComplexity.)
-
I am definitely looking more for post-alignment information. I want to find out how many of my reads map to exons versus introns, how many align to annotated versus novel junctions, and, more generally, how many align to each kind of feature in the genome. This may be a very basic task, but my lack of experience with the data is proving troublesome. In the past I have aligned our data and then used Cufflinks to see which transcripts are expressed at various levels based on FPKM. Now I need to backtrack and get general statistics on what my reads are aligning to in the genome (exons, introns, annotated junctions, novel junctions, annotated genes, etc.).
-
You can use intersectBed from BEDtools to check and count alignments overlapping various genomic features (anything described by GFF or BED) - you could set thresholds for overlap to reduce minimally overlapping reads... Not really feasible for splice junctions, however...
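To make the overlap-threshold idea concrete, here is a minimal sketch of the counting logic in plain awk. It is a brute-force stand-in for illustration only, not how BEDtools works internally; the function name `count_overlapping` and the toy intervals are made up. On real data the equivalent would be along the lines of `intersectBed -u -f 0.5 -a reads.bed -b exons.bed | wc -l`, where `-f` is the minimum overlap as a fraction of each read.

```shell
#!/usr/bin/env bash
# count_overlapping READS.bed FEATURES.bed [MIN_FRACTION]
# Hypothetical helper: prints "<reads overlapping a feature><TAB><total reads>".
# A read counts once if it overlaps any feature on the same chromosome by at
# least 1 bp and by >= MIN_FRACTION of its own length (default 0).
# Brute force (every read against every feature), so toy inputs only.
count_overlapping() {
    awk -v minfrac="${3:-0}" 'BEGIN { FS = "\t" }
        # First file: load features per chromosome
        NR == FNR { n[$1]++; fs[$1, n[$1]] = $2; fe[$1, n[$1]] = $3; next }
        # Second file: test each read against the features on its chromosome
        {
            total++
            len = $3 - $2
            for (i = 1; i <= n[$1]; i++) {
                s = ($2 > fs[$1, i]) ? $2 : fs[$1, i]
                e = ($3 < fe[$1, i]) ? $3 : fe[$1, i]
                ov = e - s
                if (ov > 0 && ov >= minfrac * len) { hit++; break }
            }
        }
        END { printf "%d\t%d\n", hit, total }' "$2" "$1"
}

# Made-up BED intervals (0-based starts, half-open), not real annotation
tmp=$(mktemp -d)
printf 'chr1\t100\t200\nchr1\t500\t600\n' > "$tmp/exons.bed"
printf 'chr1\t90\t140\nchr1\t190\t240\nchr1\t300\t350\nchr2\t100\t150\n' > "$tmp/reads.bed"

count_overlapping "$tmp/reads.bed" "$tmp/exons.bed"        # any overlap
count_overlapping "$tmp/reads.bed" "$tmp/exons.bed" 0.5    # at least half the read
```

Running the same count against an exon file and a whole-gene file gives the exon versus exon+intron numbers asked about above; as noted, it says nothing about splice junctions.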
-
RNASeQC for post-alignment metrics
There is a GATK-based package called RNASeQC that provides a large number of post-alignment metrics along the lines of what you are looking for, including total, unique, duplicate reads, mapped reads and mapped unique reads, rRNA reads, strand specificity, GC bias, correlation to a reference sequence, and many coverage metrics.
It is available as a standalone piece of software and as a module on the GenePattern public server at http://genepattern.broadinstitute.org. More information is at https://confluence.broadinstitute.or...Tools/RNA-SeQC .
Michael
-
At CLC bio we have several tutorials for RNASeq analysis. They are a great way for a beginner to understand the various steps required, and you will also get detailed reports of your mappings. This will require that you download a trial version of the software, but it is free for at least two weeks, and you can also analyze your RNASeq samples. Let me know if you need more guidance.
-
I have found Picard tools very useful for generating post-alignment statistics:
1. Get read map statistics
java -Xmx10000m -jar picard-tools-1.58/BamIndexStats.jar I=sorted.bam > sorted.stats
2. Get quality score of aligned read statistics
java -Xmx10000m -jar picard-tools-1.58/QualityScoreDistribution.jar I=sorted.bam O=qualstats CHART=qualstats.pdf
3. Get RNAseq mapped reads metrics
java -Xmx10000m -jar picard-tools-1.58/CollectRnaSeqMetrics.jar STRAND_SPECIFICITY=NONE REF_FLAT=annotation.refflat CHART_OUTPUT=graph.pdf INPUT=sorted.bam OUTPUT=RNA_seq.stats
Getting a refFlat file covering all genes of an organism can be tricky, but one can be built from a GTF annotation in two steps:
a) Download gtf annotation file for all genes of the organism
b) Convert gtf into refflat
gtfToGenePred -genePredExt annotation.gtf tmp
awk 'BEGIN{FS="\t"};{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' tmp > annotation.refflat
The gtfToGenePred software is available at http://hgdownload.cse.ucsc.edu/admin/exe/
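As a quick sanity check of the column reordering in step b), the same awk logic can be run on a single made-up genePredExt record. This is only a sketch: the function name `genepred_to_refflat` and the example transcript are hypothetical, and the column check is an addition that is not part of the original command.

```shell
#!/usr/bin/env bash
# Reorder genePredExt columns (as written by gtfToGenePred -genePredExt)
# into refFlat order: geneName, name, chrom, strand, txStart, txEnd,
# cdsStart, cdsEnd, exonCount, exonStarts, exonEnds.
# Hypothetical wrapper around the awk one-liner above; it also warns about
# lines with fewer than 12 columns, where the gene name ($12) would be missing.
genepred_to_refflat() {
    awk 'BEGIN { FS = OFS = "\t" }
         NF < 12 { print "line " NR ": expected at least 12 columns, got " NF > "/dev/stderr"; bad = 1; next }
         { print $12, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 }
         END { exit bad }' "$@"
}

# Made-up record for illustration only (not real annotation)
example_refflat=$(printf 'TX1\tchr1\t+\t100\t900\t150\t850\t2\t100,600,\t300,900,\t0\tGENE1\tcmpl\tcmpl\t0,1,\n' | genepred_to_refflat)
printf '%s\n' "$example_refflat"
```

The output should have 11 tab-separated columns with the gene name first, which is the layout CollectRnaSeqMetrics reads from REF_FLAT.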
I hope this is helpful.
-
RNAseQC
RNAseQC seems to be a nice piece of software but I haven't seen much conversation about it on here yet.
So far, I have been able to get the read metrics on a single sample but have been unable to get the 'text-delimited description of samples and their bams' list into the correct format so that it can be recognized. Has anyone else run into this problem and/or know what the format of the .list file should look like?
Thanks so much, I'm a big fan of all the people on here who take the time to answer noob questions.