SEQanswers

Old 07-21-2011, 05:50 AM   #1
mgibson
Junior Member
 
Location: US

Join Date: Jun 2011
Posts: 4
Default TopHat/Bowtie - number of reads aligned

Sorry if this is a really obvious question, but I am new to the analysis of sequencing data. (Also sorry if this is posted in the wrong forum.) We are aligning an Illumina paired-end RNA-Seq run to the hg19 human genome. I want to know how many of the reads aligned to the genome, and I am not sure I am looking in the right place. For each lane, there are 5 files in the logs directory that I think might be helpful:

bowtie.left_kept_reads.fixmap.log
# reads processed: 51079805
# reads with at least one reported alignment: 29895367 (58.53%)
# reads that failed to align: 21144050 (41.39%)
# reads with alignments suppressed due to -m: 40388 (0.08%)
Reported 38111199 alignments to 1 output stream(s)

bowtie.left_kept_reads_seg1.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 4892206 (23.14%)
# reads that failed to align: 16209674 (76.66%)
# reads with alignments suppressed due to -m: 42170 (0.20%)
Reported 8921307 alignments to 1 output stream(s)

bowtie.left_kept_reads_seg2.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 5032085 (23.80%)
# reads that failed to align: 16048266 (75.90%)
# reads with alignments suppressed due to -m: 63699 (0.30%)
Reported 9308464 alignments to 1 output stream(s)

bowtie.left_kept_reads_seg3.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 4938783 (23.36%)
# reads that failed to align: 16146855 (76.37%)
# reads with alignments suppressed due to -m: 58412 (0.28%)
Reported 9092214 alignments to 1 output stream(s)

bowtie.left_kept_reads_seg4.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 3500454 (16.56%)
# reads that failed to align: 17621660 (83.34%)
# reads with alignments suppressed due to -m: 21936 (0.10%)
Reported 5527529 alignments to 1 output stream(s)

(There are the duplicate files for the right kept reads, which I know should be dealt with in the same way...)

Obviously the first file is the initial alignment. The next 4 seem to be mapping the reads that were unmapped during the first pass (given the reads processed in each is the same as the reads unmapped in the first file). From the run data, I am also assuming that these originally unmapped reads are mapped to junctions?

58% alignment isn't very good, but if I add the reads aligned in the 4 seg files, the total alignment is 94% - is this actually correct to do though?

I also want to know how many of the reads map to junction sites. Am I correct in thinking the 4 seg files are mapping reads to junctions? If so, the number of reads mapping to junction sites seems really high (35% of all reads processed). If not, is there somewhere else I can find this data?
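
The arithmetic behind the 94% and 35% figures can be reproduced with a minimal Python sketch. It assumes the log text has exactly the layout pasted above; `parse_bowtie_log` is a hypothetical helper name, not part of Bowtie or TopHat. Every seg log processes the same 21144050 reads, so one read may be counted in more than one of them, and the naive sum should be read as an upper bound at most, not as a confirmed alignment rate.

```python
import re

# Summary of the first log, copied from the post above.
FIRST_PASS_LOG = """\
# reads processed: 51079805
# reads with at least one reported alignment: 29895367 (58.53%)
# reads that failed to align: 21144050 (41.39%)
# reads with alignments suppressed due to -m: 40388 (0.08%)
Reported 38111199 alignments to 1 output stream(s)
"""

# "At least one reported alignment" counts from the seg1..seg4 logs above.
SEG_ALIGNED = [4892206, 5032085, 4938783, 3500454]


def parse_bowtie_log(text):
    """Extract the four read counts from a Bowtie summary log (hypothetical helper)."""
    patterns = {
        "processed": r"# reads processed: (\d+)",
        "aligned": r"# reads with at least one reported alignment: (\d+)",
        "failed": r"# reads that failed to align: (\d+)",
        "suppressed": r"# reads with alignments suppressed due to -m: (\d+)",
    }
    return {key: int(re.search(pat, text).group(1)) for key, pat in patterns.items()}


stats = parse_bowtie_log(FIRST_PASS_LOG)

# Percentage of all processed reads aligned in the first pass.
first_pass_pct = 100 * stats["aligned"] / stats["processed"]

# Naive sum over the seg logs; the same read can appear in several of them.
seg_sum = sum(SEG_ALIGNED)
seg_pct = 100 * seg_sum / stats["processed"]
naive_total_pct = 100 * (stats["aligned"] + seg_sum) / stats["processed"]

print(f"first pass: {first_pass_pct:.2f}%")
print(f"seg logs (naive sum): {seg_pct:.2f}%")
print(f"naive total: {naive_total_pct:.2f}%")
```

The three categories in the first log (aligned, failed, suppressed) add up to the number of reads processed, which is a quick sanity check that the counts were copied correctly.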

Thanks for any help/advice you all can give me!
Old 07-23-2011, 09:06 PM   #2
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

A simple way to get the data you want and a bit more would be to use bamtools or Picard. bamtools has the stats command and Picard has several different commands for extracting various metrics from your bam/sam files.
Old 09-01-2011, 08:33 AM   #3
townway
Member
 
Location: Rockville

Join Date: May 2009
Posts: 40
Default

How can I calculate the number of reads mapped to junctions with bamtools or Picard?

A simple way I can think of is to count the split reads, but I am not sure that is the right way.

Thank you

Old 09-01-2011, 08:57 AM   #4
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

You can use the bamtools filter command to get all reads with an "N" in the CIGAR string. An N operation marks a skipped region of the reference, which in RNA-Seq alignments indicates a split read.
Old 09-01-2011, 10:50 AM   #5
townway
Member
 
Location: Rockville

Join Date: May 2009
Posts: 40
Default

Would you mind telling me which version of bamtools you used?
The current one does not seem to have a CIGAR filter; these are the only general filters it lists:
Thank you

General Filters:
  -alignmentFlag <int>     keep reads with this *exact* alignment flag (for more detailed queries, see below)
  -insertSize <int>        keep reads with insert size that matches pattern
  -mapQuality <[0-255]>    keep reads with map quality that matches pattern
  -name <string>           keep reads with name that matches pattern
  -queryBases <string>     keep reads with motif that matches pattern
  -tag <TAG:VALUE>         keep reads with this key=>value pair

Old 09-01-2011, 11:29 AM   #6
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

The current version of Bamtools does have the filter by cigar string option.
You can read a manual here:
https://github.com/pezmaster31/bamto...mTools-1.0.pdf
Get the current version here:
https://github.com/pezmaster31/bamtools

Your bamtools command will be something like:

Code:
bamtools filter -in reads.bam -out split_reads.bam -script cigarN.script
Your cigarN.script should look like this:
Code:
{
        "cigar" : "*N*"
}
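
As a cross-check that does not depend on the bamtools version, the same idea (look for an "N" in the CIGAR string) can be sketched in a few lines of Python working on SAM text, for example the output of `samtools view reads.bam`. This is a minimal sketch under stated assumptions, not the bamtools implementation: `count_spliced` is a hypothetical helper name, the three sample records are made up for illustration, and the counts are per alignment record, so a multi-mapped read is counted once per reported alignment.

```python
def count_spliced(sam_lines):
    """Return (spliced_records, mapped_records) for an iterable of SAM text lines."""
    spliced = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        cigar = fields[5]
        if flag & 0x4 or cigar == "*":    # unmapped read or no CIGAR available
            continue
        mapped += 1
        if "N" in cigar:                  # N = skipped reference region (split read)
            spliced += 1
    return spliced, mapped


# Made-up records for illustration: one unspliced, one spliced, one unmapped.
SAMPLE = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t255\t20M500N30M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

spliced, mapped = count_spliced(SAMPLE)
print(f"{spliced} of {mapped} mapped records contain an N in the CIGAR string")
```

To run it on real data, the same function can be fed `sys.stdin` instead of `SAMPLE` and the SAM text piped in.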
Old 09-02-2011, 06:32 AM   #7
vbiaudet
Member
 
Location: Paris

Join Date: Apr 2011
Posts: 13
Default unmapped reads with TopHat and mismatch

We also work with RNA-Seq, on the plant genome Arabidopsis, and for us the majority of unmapped reads are not due to intron junctions but to errors at the ends of the sequences (our reads are 100 bases long, and the last ten or twenty bases contain mismatches). We know that the parameters for bowtie are very strict: no more than 3 mismatches are allowed. So are you sure that the majority of your unmapped reads were due to exon/intron junctions?

Bye, VB

Old 10-22-2011, 08:04 PM   #8
rahulvarma
Junior Member
 
Location: india

Join Date: Aug 2011
Posts: 9
Default

Hi,

I have finished a eukaryotic de novo transcriptome assembly (Illumina platform, 30 million paired-end reads). Now I would like to check its accuracy, so I decided to map the reads back against the assembly using bowtie. When I use a mapping quality of 100 (--mapq 100), I get the following results:

reads processed: 47 million
reads with at least one reported alignment: 4 million (9%)
reads that failed to align: 21%
reads with alignments suppressed due to -m: 69%
reported: 4 million

When I don't use --mapq 100, I get the following values:

reads processed: 47 million
reads with at least one reported alignment: 37 million (78%)
reads that failed to align: 21%
reads with alignments suppressed due to -m: 69%
reported: 37 million

I am a bit confused about how to set the mapping quality when mapping RNA-Seq reads against a transcriptome assembly. Can anyone advise me? Thanks in advance!