**pbluescript** · 07-23-2011, 09:06 PM

A simple way to get the data you want and a bit more would be to use bamtools or Picard. bamtools has the stats command and Picard has several different commands for extracting various metrics from your bam/sam files.

**townway** · 09-01-2011, 08:33 AM

How can I calculate the number of reads mapped to junctions with bamtools or Picard?

A simple way I can think of is to count the split reads, but I am not sure that is the right way.

Thank you

**pbluescript** · 09-01-2011, 08:57 AM

You can use the Bamtools filter option to get all reads with an "N" in the cigar string. This indicates a split read.

**townway** · 09-01-2011, 10:50 AM

Would you mind telling me which version of bamtools you used?
The current one does not seem to have a CIGAR filter.
Thank you

General Filters:
-alignmentFlag <int> keep reads with this *exact* alignment flag (for more detailed queries, see below)
-insertSize <int> keep reads with insert size that matches pattern
-mapQuality <[0-255]> keep reads with map quality that matches pattern
-name <string> keep reads with name that matches pattern
-queryBases <string> keep reads with motif that matches pattern
-tag <TAG:VALUE> keep reads with this key=>value pair

**pbluescript** · 09-01-2011, 11:29 AM

The current version of Bamtools does have the filter by cigar string option.
You can read a manual here:

https://github.com/pezmaster31/bamtools/wiki/Tutorial_Toolkit_BamTools-1.0.pdf

Get the current version here:

https://github.com/pezmaster31/bamtools

C++ API & command-line toolkit for working with BAM data - pezmaster31/bamtools

Your bamtools command will be something like:

Code:

bamtools filter -in reads.bam -out split_reads.bam -script cigarN.script

Your cigarN.script should look like this:

Code:

{
        "cigar" : "*N*"
}
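
As a cross-check on the filter above, the same idea (an "N" operation in the CIGAR string marks a split read) can be sketched in plain Python over SAM-formatted text. This is only an illustrative sketch, not part of bamtools: the function name `count_spliced` and the sample records are made up for the example, and it assumes standard tab-separated SAM lines with the CIGAR in column 6.

```python
def count_spliced(lines):
    """Count SAM records whose CIGAR contains an N (skipped region) operation.

    Returns (alignment records seen, records with N, distinct reads with N).
    A read is identified by its name plus the first/second-in-pair flag bits,
    so multiple alignments of the same read are counted once.
    """
    records = 0
    spliced_records = 0
    spliced_reads = set()
    for line in lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        records += 1
        qname, flag, cigar = fields[0], int(fields[1]), fields[5]
        if "N" in cigar:
            spliced_records += 1
            mate = flag & 0xC0            # 0x40 = first in pair, 0x80 = second
            spliced_reads.add((qname, mate))
    return records, spliced_records, len(spliced_reads)


# Hypothetical records, for illustration only.
sample = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50,
    "r2\t0\tchr1\t500\t3\t20M100N30M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50,
    "r2\t256\tchr2\t900\t3\t25M200N25M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50,
]
print(count_spliced(sample))
```

To use it on real data, the lines could come from the text output of `samtools view reads.bam`, for example by passing `sys.stdin` to `count_spliced`. Counting distinct reads instead of records matters when the aligner reports more than one alignment per read.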

**vbiaudet** · 09-02-2011, 06:32 AM

unmapped reads with TopHat and mismatch

We also work with RNA-Seq, in the plant genome Arabidopsis, and for us the majority of unmapped reads are due not to intron junctions but to errors at the ends of the sequences (our reads are 100 bases long, and the last ten or twenty bases contain mismatches). We know that the Bowtie parameters are very strict: no more than 3 mismatches are allowed. So are you sure that the majority of unmapped reads were due to exon/intron junctions?

Bye, VB

Originally posted by mgibson

Sorry if this is a really obvious question, but I am new to the analysis of sequencing data. (Also sorry if this is posted in the wrong forum.) We are aligning an Illumina paired-end RNA Seq run to the hg19 human genome. I want to know how many of the reads aligned to the genome and I am not sure I am looking in the right place. For each lane, there are 5 files in the logs directory that I think might be helpful:

bowtie.left_kept_reads.fixmap.log
# reads processed: 51079805
# reads with at least one reported alignment: 29895367 (58.53%)
# reads that failed to align: 21144050 (41.39%)
# reads with alignments suppressed due to -m: 40388 (0.08%)
Reported 38111199 alignments to 1 output stream(s)

bowtie.left_kept_reads_seg1.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 4892206 (23.14%)
# reads that failed to align: 16209674 (76.66%)
# reads with alignments suppressed due to -m: 42170 (0.20%)
Reported 8921307 alignments to 1 output stream(s)

bowtie.left_kept_reads_seg2.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 5032085 (23.80%)
# reads that failed to align: 16048266 (75.90%)
# reads with alignments suppressed due to -m: 63699 (0.30%)
Reported 9308464 alignments to 1 output stream(s)

bowtie.left_kept_reads_seg3.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 4938783 (23.36%)
# reads that failed to align: 16146855 (76.37%)
# reads with alignments suppressed due to -m: 58412 (0.28%)
Reported 9092214 alignments to 1 output stream(s)

bowtie.left_kept_reads_seg4.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 3500454 (16.56%)
# reads that failed to align: 17621660 (83.34%)
# reads with alignments suppressed due to -m: 21936 (0.10%)
Reported 5527529 alignments to 1 output stream(s)

(There are the duplicate files for the right kept reads, which I know should be dealt with in the same way...)

Obviously the first file is the initial alignment. The next 4 seem to be mapping the reads that were unmapped during the first pass (given the reads processed in each is the same as the reads unmapped in the first file). From the run data, I am also assuming that these originally unmapped reads are mapped to junctions?

58% alignment isn't very good, but if I add the reads aligned in the 4 seg files, the total alignment is 94% - is this actually correct to do though?

I also want to know how many of the reads map to junction sites - am I correct in thinking the 4 seg files are mapping reads to junctions? This seems like a really high number of reads mapping to junction sites if this is the case (35%). If not, is there somewhere else I can find this data?

Thanks for any help/advice you all can give me!
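
The arithmetic behind the 58%, 35% and 94% figures in the quoted question can be reproduced with a short script. This is a sketch only: the helper `parse_bowtie_log` is hypothetical, the line format is copied from the logs quoted above, and the bounds assume that the four seg files hold segments of the same 21144050 unaligned reads, so one read may be counted in more than one of them.

```python
import re

# Summary lines as they appear in the Bowtie logs quoted in the question.
PATTERNS = {
    "processed": r"# reads processed: (\d+)",
    "aligned": r"# reads with at least one reported alignment: (\d+)",
    "failed": r"# reads that failed to align: (\d+)",
    "suppressed": r"# reads with alignments suppressed due to -m: (\d+)",
}


def parse_bowtie_log(text):
    """Pull the four read counts out of one Bowtie summary block."""
    counts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counts[key] = int(match.group(1))
    return counts


first_pass = parse_bowtie_log("""\
# reads processed: 51079805
# reads with at least one reported alignment: 29895367 (58.53%)
# reads that failed to align: 21144050 (41.39%)
# reads with alignments suppressed due to -m: 40388 (0.08%)
""")

# "At least one reported alignment" counts from the seg1..seg4 logs.
segment_aligned = [4892206, 5032085, 4938783, 3500454]

total = first_pass["processed"]
first_pct = 100 * first_pass["aligned"] / total

# Naive sum used in the question: first pass plus all four segment counts.
naive_pct = 100 * (first_pass["aligned"] + sum(segment_aligned)) / total

# Under the stated assumption, the number of distinct reads with any aligned
# segment lies between the largest single count and the capped sum.
fewest = max(segment_aligned)
most = min(sum(segment_aligned), first_pass["failed"])
low_pct = 100 * (first_pass["aligned"] + fewest) / total
high_pct = 100 * (first_pass["aligned"] + most) / total

print(f"first pass: {first_pct:.2f}%")
print(f"naive sum:  {naive_pct:.2f}%")
print(f"bounds:     {low_pct:.2f}% to {high_pct:.2f}%")
```

If that assumption holds, the summed figure is better read as an upper limit than as the alignment rate, and a read with one aligned segment is not necessarily reported as mapped in the end. Counting reads in the final alignment file, for example with the bamtools stats command suggested earlier in the thread, avoids the ambiguity.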

**rahulvarma** · 10-22-2011, 08:04 PM

Hi,

I have finished a eukaryotic de novo transcriptome assembly (Illumina platform) from 30 million paired-end reads. Now I would like to check its accuracy, so I decided to map the reads back against the assembly using Bowtie. When I use mapping quality 100 (--mapq 100), I get the following results:

reads processed: 47 million
reads with at least one reported alignment: 4 million (9%)
reads that failed to align: 21%
reads with alignments suppressed due to -m: 69%
reported: 4 million

When I don't use --mapq 100, I get the following values:

reads processed: 47 million
reads with at least one reported alignment: 37 million (78%)
reads that failed to align: 21%
reads with alignments suppressed due to -m: 69%
reported: 37 million

I am a bit confused about how to set the mapping quality when mapping RNA-seq reads against a transcriptome assembly. Can anyone advise me? Thanks in advance!

TopHat/Bowtie - number of reads aligned