Good to know, thanks!
-
Hi all,
I am having similar problems filtering my bowtie2 output, although not as severe as the originally reported numbers. I mapped reads back to a de novo assembled transcriptome with the following settings:
--all --end-to-end --score-min L,-0.1,-0.1 --no-discordant --no-mixed
44223325 reads; of these:
44223325 (100.00%) were paired; of these:
7691237 (17.39%) aligned concordantly 0 times
25521175 (57.71%) aligned concordantly exactly 1 time
11010913 (24.90%) aligned concordantly >1 times
82.61% overall alignment rate
I am aware that the --all setting leaves MAPQ without much meaning, so using it to filter out uniquely mapped reads is not possible. I could redo the analysis to avoid this problem, but would rather continue working with the same dataset to keep things consistent. I have not seen this setting mentioned as causing any other problems, though.
Is there something I am missing, or did my settings do something unexpected?
Best,
Jan Philip
-
The presence of an XS auxiliary tag doesn't mean that an alignment isn't unique (n.b., "unique" isn't really a useful term, there's a reason for MAPQ scores). bowtie2 should only count an alignment as not unique if the XS and AS scores are the same.
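To make the AS/XS comparison concrete, here is a minimal awk sketch (wrapped in a shell function) that sorts mapped SAM records into three groups: no XS:i tag, AS greater than XS, and XS greater than or equal to AS. The function name `classify_as_xs` is hypothetical. It counts records, not reads, so with --all a multi-mapping read contributes one record per reported alignment. The last group uses >= rather than == so that any record whose best alternative is at least as good ends up there.

```shell
# Count mapped SAM records by how AS:i (this alignment's score) compares with
# XS:i (best score among other alignments found). classify_as_xs is hypothetical.
classify_as_xs() {
  awk -F '\t' '
    /^@/ { next }                            # skip header lines
    int($2 / 4) % 2 == 1 { next }            # skip unmapped records (FLAG 0x4)
    {
      as = ""; xs = ""
      for (i = 12; i <= NF; i++) {           # optional tags start at column 12
        if ($i ~ /^AS:i:/) as = substr($i, 6)
        if ($i ~ /^XS:i:/) xs = substr($i, 6)
      }
      if (xs == "") n["no_xs"]++
      else if (as + 0 > xs + 0) n["as_gt_xs"]++
      else n["xs_ge_as"]++
    }
    END {
      printf "no_xs\t%d\nas_gt_xs\t%d\nxs_ge_as\t%d\n", n["no_xs"], n["as_gt_xs"], n["xs_ge_as"]
    }
  ' "$1"
}
```

Usage, e.g.: classify_as_xs my_filename.sam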
Note that you're typically best off simply filtering by MAPQ score.
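A minimal sketch of MAPQ filtering on a plain-text SAM file with awk. The usual tool is `samtools view -q INT`, which skips alignments with MAPQ below INT; this version only assumes a SAM file and awk. The function name `filter_mapq` and the threshold of 10 in the usage line are illustrative assumptions, not a recommended cutoff. As noted earlier in the thread, MAPQ carries little meaning for output produced with --all.

```shell
# Keep header lines plus records whose MAPQ (column 5) is at least $1.
# filter_mapq is a hypothetical helper; $2 is the input SAM file, output goes to stdout.
filter_mapq() {
  awk -F '\t' -v min="$1" '/^@/ || $5 + 0 >= min + 0' "$2"
}
```

Usage, e.g.: filter_mapq 10 my_filename.sam > my_filename.q10.sam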
-
Thank you for the swift reply. However, I do not fully understand how uniqueness is defined, if not by the fact that a given read maps to only one location (given the chosen score thresholds, etc.). I understand that if the first location a read maps to is significantly better than the alternative locations (e.g., by MAPQ score), it could probably also be considered unique in some respect.
However, if it is true that uniquely mapped reads can also have an XS tag, I would have expected the number of reads without it to be significantly lower, not higher, than the number reported by bowtie2. So I am still pretty puzzled by my results.
I will try to have a look at your posts regarding the MAPQ scores and seriously consider redoing my analyses.
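One thing worth checking when comparing XS counts against the bowtie2 summary: the summary counts read pairs, whereas a SAM file has one record per mate and, with --all, one per reported alignment. Below is a hedged awk sketch that counts distinct pairs (by QNAME) whose mapped first-mate records never carry an XS:i tag, so its unit is closer to the summary's. The function name `count_pairs_without_xs` is hypothetical, and this count is not claimed to equal any specific line of the bowtie2 summary.

```shell
# Count distinct read pairs (QNAME) whose mapped first-mate records (FLAG 0x40
# set, 0x4 unset) never carry an XS:i tag. count_pairs_without_xs is hypothetical.
count_pairs_without_xs() {
  awk -F '\t' '
    /^@/ { next }                                            # skip header lines
    int($2 / 64) % 2 == 0 || int($2 / 4) % 2 == 1 { next }   # first mate, mapped only
    {
      has_xs = 0
      for (i = 12; i <= NF; i++) if ($i ~ /^XS:i:/) has_xs = 1
      if (has_xs) with_xs[$1] = 1
      else without_xs[$1] = 1
    }
    END {
      n = 0
      for (q in without_xs) if (!(q in with_xs)) n++        # no record of this pair had XS
      print n
    }
  ' "$1"
}
```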
-
Therein lies the problem: there is no single definition of "uniqueness". There are multiple incompatible definitions. Further, if we relax the --score-min settings enough, then by some definitions there will never be any unique alignments. This is why MAPQ is generally a more useful concept, and you'd be better served just forgetting about the term "unique" in this context.
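For the --score-min point: bowtie2's linear function L,a,b sets the minimum valid alignment score to a + b × read length, so the L,-0.1,-0.1 used above gives -10.1 for a 100 bp read. A small sketch of that arithmetic follows; the helper name `score_min` and the read lengths are illustrative assumptions. With bowtie2's default mismatch penalty of at most 6 in end-to-end mode, -10.1 leaves room for one high-quality mismatch but not two, and loosening a and b widens the set of alignments that count as valid.

```shell
# Minimum valid alignment score for a bowtie2 linear --score-min function
# L,a,b, computed as a + b * read_length (score_min is a hypothetical helper).
score_min() {
  awk -v a="$1" -v b="$2" -v len="$3" 'BEGIN { printf "%.1f\n", a + b * len }'
}

# Thresholds for L,-0.1,-0.1 at a few example read lengths
for len in 50 100 150; do
  echo "read length $len: minimum score $(score_min -0.1 -0.1 "$len")"
done
```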
-
Thanks for the help, I will try to figure out how to best solve the issue for my experiments.
I found this, which might be of interest to others trying to understand how bowtie2 assigns scores: link. There are also some interesting thoughts on uniqueness discussed in this and an older blog post.
-
Hi all,
This worked for me, but I don't know if it is a general solution. If you set the -k parameter in Bowtie2 to >= 2, every read with more than one alignment should have its name appear at least twice in your SAM file. You can use that to remove reads that appear more than once in the file my_filename.sam. This way you don't have to understand how Bowtie2 sets tags and flags.
prefix="my_filename"
# Pass 1 counts alignment records per read name (QNAME, column 1; header lines skipped);
# pass 2 keeps header lines plus records whose read name occurred exactly once.
# Caveat: paired-end mates share a QNAME, so this version assumes single-end reads.
awk -F '\t' 'NR == FNR { if ($0 !~ /^@/) n[$1]++; next }
  /^@/ || n[$1] == 1' "$prefix.sam" "$prefix.sam" > "$prefix.unique.sam"
Last edited by keo; 03-30-2017, 07:18 PM.
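Filtering on repeated read names treats the two mates of a pair as duplicates, because both mates share a QNAME. A hedged paired-end variant keys each record by QNAME plus the "first in pair" FLAG bit (0x40), so a pair reported at a single location contributes one record per key. The function name `keep_single_placement_pe` is hypothetical; unmapped records also occur once per mate and therefore pass through, so combine this with FLAG filtering if you want to drop them.

```shell
# Paired-end variant of the duplicate-name filter: key = QNAME + mate bit (0x40).
# Pass 1 counts records per key; pass 2 keeps headers and keys seen exactly once.
# keep_single_placement_pe is a hypothetical helper; output goes to stdout.
keep_single_placement_pe() {
  awk -F '\t' '
    NR == FNR {                                      # pass 1: count records per mate
      if ($0 !~ /^@/) n[$1 SUBSEP (int($2 / 64) % 2)]++
      next
    }
    /^@/ { print; next }                             # pass 2: keep header lines
    n[$1 SUBSEP (int($2 / 64) % 2)] == 1             # keep mates with one record
  ' "$1" "$1"
}
```

Usage, e.g.: keep_single_placement_pe my_filename.sam > my_filename.single.sam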
-
Originally posted by dpryan:
The presence of an XS auxiliary tag doesn't mean that an alignment isn't unique (n.b., "unique" isn't really a useful term, there's a reason for MAPQ scores). bowtie2 should only count an alignment as not unique if the XS and AS scores are the same.
Note that you're typically best off simply filtering by MAPQ score.
MAPQ=39 ... AS:i:0 XS:i:0
39 seems like an arbitrary value. In my case, the lines that don't have an XS tag have a MAPQ of 42.
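To see which MAPQ values were actually assigned with and without an XS:i tag, a hedged awk sketch can cross-tabulate the two. It prints one line per (XS presence, MAPQ) combination with its record count; the function name `mapq_by_xs` is hypothetical, and unmapped records are skipped.

```shell
# Cross-tabulate MAPQ (column 5) against presence of an XS:i tag for mapped
# SAM records. Output columns: xs|no_xs, MAPQ, count. mapq_by_xs is hypothetical.
mapq_by_xs() {
  awk -F '\t' '
    /^@/ { next }                                  # skip header lines
    int($2 / 4) % 2 == 1 { next }                  # skip unmapped records (FLAG 0x4)
    {
      has_xs = "no_xs"
      for (i = 12; i <= NF; i++) if ($i ~ /^XS:i:/) has_xs = "xs"
      count[has_xs "\t" $5]++
    }
    END { for (k in count) print k "\t" count[k] }
  ' "$1" | sort -k1,1 -k2,2n
}
```

Usage, e.g.: mapq_by_xs my_filename.sam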