Good to know, thanks!
-
Hi all,
I am having similar problems filtering my bowtie2 output, although not as severe as the originally reported numbers. I mapped reads back to a de novo assembled transcriptome with the following settings:
--all --end-to-end --score-min L,-0.1,-0.1 --no-discordant --no-mixed
44223325 reads; of these:
44223325 (100.00%) were paired; of these:
7691237 (17.39%) aligned concordantly 0 times
25521175 (57.71%) aligned concordantly exactly 1 time
11010913 (24.90%) aligned concordantly >1 times
82.61% overall alignment rate
I am aware that the --all setting leaves the MAPQ without much meaning, so using it to select uniquely mapped reads is not possible. I could redo the analysis to avoid this problem, but would rather continue working with the same dataset to keep things consistent. I have found no mention of this setting causing any other problems, though.
Is there something I am missing, or did my settings do something unexpected?
Best,
Jan Philip
Comment
-
The presence of an XS auxiliary tag doesn't mean that an alignment isn't unique (n.b., "unique" isn't really a useful term, there's a reason for MAPQ scores). bowtie2 should only count an alignment as not unique if the XS and AS scores are the same.
Note that you're typically best off simply filtering by MAPQ score.
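To see how the XS and AS tags actually compare in a given file, a tally along these lines may help. This is only a rough sketch: the function name `count_xs_vs_as` is made up for illustration, and it assumes an uncompressed SAM file with the usual bowtie2 `AS:i:` and `XS:i:` tags in the optional columns.

```shell
# Tally alignment records by how the XS:i tag compares with the AS:i tag.
# Reads SAM text on stdin; header lines (starting with "@") are skipped.
# count_xs_vs_as is a hypothetical helper name, not part of bowtie2.
count_xs_vs_as() {
    awk -F '\t' '
        /^@/ { next }
        {
            a_score = ""; x_score = ""
            # Optional tags start at column 12 of a SAM record.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^AS:i:/) a_score = substr($i, 6)
                if ($i ~ /^XS:i:/) x_score = substr($i, 6)
            }
            if (a_score == "")                   n["no_AS"]++
            else if (x_score == "")              n["no_XS"]++
            else if (x_score + 0 == a_score + 0) n["XS_equals_AS"]++
            else                                 n["XS_differs"]++
        }
        END { for (k in n) print k "\t" n[k] }
    ' | sort
}
```

Running `count_xs_vs_as < my_filename.sam` prints one line per category with its count. Records without an AS tag (typically unaligned reads) are counted separately as `no_AS`.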
Comment
-
Thank you for the swift reply. However, I do not fully understand how uniqueness is defined, if not by the fact that a given read maps to only one location (given the set score thresholds, etc.). I understand that if the first location a read maps to is significantly better than the alternative locations (by MAPQ score, for example), it could probably also be considered unique in some respect.
However, if it is true that uniquely mapped reads can also have an XS tag, I would have expected the number of reads without it to be significantly lower, not higher, than the number reported by bowtie. So I am still pretty puzzled by my results.
I will try to have a look at your posts regarding the MAPQ scores and seriously consider redoing my analyses.
Comment
-
Therein lies the problem, there is no single definition of "uniqueness". There are multiple incompatible definitions. Further, if we relax the --score-min settings enough then by some definitions there will never be any unique alignments. This is why MAPQ is a generally more useful concept and you'd be better served just forgetting about the term "unique" in this context.
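Filtering on MAPQ instead of on "uniqueness" could look roughly like the sketch below. The helper name `filter_by_mapq` and the threshold of 10 in the usage line are assumptions for illustration only; a sensible cutoff depends on the aligner settings (and, as noted earlier in the thread, MAPQ carries less meaning when --all is used).

```shell
# Keep SAM header lines plus alignment records whose MAPQ (column 5)
# is at least the given threshold. Reads SAM on stdin, writes SAM on stdout.
# filter_by_mapq is a hypothetical helper name.
filter_by_mapq() {
    min_mapq="$1"
    awk -F '\t' -v q="$min_mapq" '/^@/ || ($5 + 0 >= q + 0)'
}
```

For example, `filter_by_mapq 10 < my_filename.sam > my_filename.mapq10.sam` (the file names are placeholders).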
Comment
-
Thanks for the help, I will try to figure out how to best solve the issue for my experiments.
I found this, which might be of interest to others trying to understand how bowtie2 assigns scores: link. There are also some interesting thoughts on uniqueness discussed in this and an older blog post.
Comment
-
Hi all,
This worked for me, but I don't know if it is a general solution. If you set the -k parameter in Bowtie2 to >=2, the name of any read that aligns to more than one location should appear at least twice in your SAM file. You can use that to remove reads that appear more than once in the file my_filename.sam. This way you don't have to understand how Bowtie sets tags and flags.
prefix="my_filename"
# Collect read names that occur more than once among the alignment records (header lines start with "@").
grep -v "^@" "$prefix.sam" | cut -f 1 | sort | uniq -d > "$prefix.toremove"
# Drop every line that contains one of those names as a whole word (fixed-string match).
grep -vwF -f "$prefix.toremove" "$prefix.sam" > "$prefix.unique.sam"
rm "$prefix.toremove"
Last edited by keo; 03-30-2017, 07:18 PM.
Comment
-
Originally posted by dpryan:
The presence of an XS auxiliary tag doesn't mean that an alignment isn't unique (n.b., "unique" isn't really a useful term, there's a reason for MAPQ scores). bowtie2 should only count an alignment as not unique if the XS and AS scores are the same.
Note that you're typically best off simply filtering by MAPQ score.
MAPQ=39 ... AS:i:0 XS:i:0
39 seems like an arbitrary value. In my case, the lines that don't have an XS score have a score of 42.
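One way to check how values like 39 and 42 are distributed in a particular file is to tabulate MAPQ against the presence of an XS tag. The sketch below assumes the "score" discussed here is the MAPQ column of the SAM file; `mapq_by_xs` is a made-up helper name.

```shell
# Count aligned records per MAPQ value, split by whether an XS:i tag is present.
# Output columns: MAPQ, has_XS (yes/no), count. Reads SAM text on stdin.
# mapq_by_xs is a hypothetical helper name.
mapq_by_xs() {
    awk -F '\t' '
        /^@/ { next }
        # Skip unaligned records: bit 0x4 of the FLAG column is set.
        int($2 / 4) % 2 == 1 { next }
        {
            has_xs = "no"
            for (i = 12; i <= NF; i++)
                if ($i ~ /^XS:i:/) has_xs = "yes"
            n[$5 "\t" has_xs]++
        }
        END { for (k in n) print k "\t" n[k] }
    ' | sort -k1,1n -k2,2
}
```

Running `mapq_by_xs < my_filename.sam` gives one row per (MAPQ, XS present) combination, which shows directly which MAPQ values occur with and without an XS tag in that dataset.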
Comment