SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Tophat uniquely mapped reads mrfox Bioinformatics 2 05-23-2013 05:58 AM
not uniquely mapped reads unidodo RNA Sequencing 2 04-22-2011 02:07 PM
Uniquely mapped reads with bowtie mapper Bioinformatics 2 11-22-2010 10:44 PM
DEGseq and uniquely mapped reads PFS Bioinformatics 1 07-14-2010 09:37 AM
cufflinks and non-uniquely mapped reads clariet Bioinformatics 1 05-08-2010 11:13 AM

Reply
 
Thread Tools
Old 02-24-2009, 12:38 PM   #1
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default Uniquely mapped short reads (MAQ)

I searched for this discussion but could not find one.

I would like to use only uniquely mapped reads for SNP calling, etc in MAQ. By uniquely mapped I mean, the top hit/best alignment is unique and no other position on reference maps with equally good score. I would want to use only these reads to do SNP calling and further analysis.. I believe its doable using MAQ, but not sure.

Next, I would like to keep track of what reads were dumped because of non-unique mapping, apart from the regular dumped reads found using MAQ's -u option.
bioinfosm is offline   Reply With Quote
Old 02-24-2009, 04:27 PM   #2
frozenlyse
Senior Member
 
Location: Australia

Join Date: Sep 2008
Posts: 136
Default

if i recall correctly maq by default reports non-unique reads; it just gives them a mapping quality of 0
frozenlyse is offline   Reply With Quote
Old 03-10-2009, 09:43 AM   #3
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

True, is there a direct way to use just the uniquely mapping reads in order to do SNP calling..
bioinfosm is offline   Reply With Quote
Old 03-17-2009, 08:34 AM   #4
digitonin
Junior Member
 
Location: USA

Join Date: Mar 2009
Posts: 6
Default Finding unique reads

Has anyone been able to work around this? Essentially, I would like MAQ to assemble unique reads only. If it is not able to do this (which I think is an easy modification), how do I view the mapping qualities? I would probably just throw those away with scores of 0 and map the rest again from start.

Any help would be appreciated. Thanks
digitonin is offline   Reply With Quote
Old 03-17-2009, 08:47 AM   #5
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

The ELAND/Bowtie type aligners tell your for each read how many times they aligned to your reference sequence. So you could align with one of thsoe programs first, and then sift the output, pulling out the read info on lines where the read hits only one time. Then give only those reads to MAQ.

You would have to align twice, which is kind of silly, but it would work.
swbarnes2 is offline   Reply With Quote
Old 03-17-2009, 03:48 PM   #6
frozenlyse
Senior Member
 
Location: Australia

Join Date: Sep 2008
Posts: 136
Default

i think you will be better off using bowtie for the alignment step - bowtie is nicer and faster anyway and can automatically filter out non-unique reads (use the -m 1 option)

then split your bowtie map file into 2 million line chunks and use bowtie-maqconvert to convert into maq map files, then merge into a single maq map file to go on and perform downstream maq applications

something along these lines
Code:
mkdir splitout
split -l 2000000 -d bowtie.map splitout/bowtie.map
for files in splitout/*
    do bowtie-maqconvert "$files" "$files".maq maq_reference.bfa;
done
maq mapmerge maq.map splitout/*.maq
frozenlyse is offline   Reply With Quote
Old 03-17-2009, 04:02 PM   #7
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

It's easy enough to exclude non-unique reads from MAQ by using the builtin features e.g.

for assembly use

Code:
maq assemble -q 10 ...
Where 10 is your quality score filter.
zee is offline   Reply With Quote
Old 03-18-2009, 12:45 PM   #8
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

I am slightly confused about how the SNP calls are being made etc, and where the exclusion occurs on using maq assemble -q 10

I ran the same data with and without -q 10, all other commands remaining same. The results are interestingly different.

The maq map alignments are different, but the number of final SNPs reported are way different - 2797 when using -q 10, and 1751 otherwise (1411 common)

Can someone walk me through these numbers...

Without -q 10
Code:
Lane1
-- 135 potential soa-indels pass the filter.
-- == statmap report ==
-- # single end (SE) reads: 6655133
-- # mapped SE reads: 1676926 (/ 6655133 = 25.19%)

Lane2
-- 101 potential soa-indels pass the filter.
-- == statmap report ==
-- # single end (SE) reads: 6925588
-- # mapped SE reads: 2213553 (/ 6925588 = 31.96%)
With -q 10
Code:
Lane1
-- 131 potential soa-indels pass the filter.
-- == statmap report ==
-- # single end (SE) reads: 6655133
-- # mapped SE reads: 1676926 (/ 6655133 = 25.19%)

Lane2
-- 101 potential soa-indels pass the filter.
-- == statmap report ==
-- # single end (SE) reads: 6925588
-- # mapped SE reads: 2213553 (/ 6925588 = 31.96%)
Code:
$ diff regular.maq.aln q10.map.aln
1,2c1,2
< 30PERAAXX_300164-25:1:52:1644:1221    MLH1    2       -       0       0       99      99      99      0       0       1       0       36      TTGGCTGAAGGCACTTCCGTTGAGCATCTAGACG
TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:80:1159:720     MLH1    2       -       0       0       96      96      96      0       0       1       0       36      TtGGCTGAAGGCaCTtCCGTTGAGCATCTAGACG
TT    @0IIIIIBIIII#II5IIIIIIIIIIIIIIIIIII>
---
> 30PERAAXX_300164-25:1:52:1644:1221    MLH1    2       -       0       0       98      98      98      0       0       1       0       36      TTGGCTGAAGGCACTTCCGTTGAGCATCTAGACG
TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:80:1159:720     MLH1    2       -       0       0       95      95      95      0       0       1       0       36      TtGGCTGAAGGCaCTtCCGTTGAGCATCTAGACG
TT    @0IIIIIBIIII#II5IIIIIIIIIIIIIIIIIII>

34,44c34,44
< 30PERAAXX_300164-25:1:33:693:1787     MLH1    14      -       0       0       99      99      99      0       0       1       0       36      ACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:41:347:1531     MLH1    14      -       0       0       99      99      99      0       0       1       0       36      ACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:67:393:1068     MLH1    14      -       0       0       99      99      99      0       0       1       0       36      ACTTCCGTTGAGCATCTAGaCGTTTCCTTGGCTC
TT    HIIHIIIIIIIIIIIIIII3IIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:70:959:1492     MLH1    14      -       0       0       99      99      99      0       0       1       0       36      aCTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
TT    2IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:93:1678:647     MLH1    14      -       0       0       96      96      96      0       0       1       0       36      aCTTcCGTTGAGCATCTAGACGTTTCCTTGGCTC
TT    +III6IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:4:220:1034      MLH1    15      -       0       0       99      99      99      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:9:249:1562      MLH1    15      -       0       0       99      99      99      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:12:1505:614     MLH1    15      -       0       0       96      96      96      0       0       1       0       36      CttCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    I81IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:13:368:1753     MLH1    15      -       0       0       99      99      99      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:15:9:1341       MLH1    15      -       0       0       99      99      99      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
< 30PERAAXX_300164-25:1:19:639:103      MLH1    15      -       0       0       99      99      99      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
---
> 30PERAAXX_300164-25:1:33:693:1787     MLH1    14      -       0       0       98      98      98      0       0       1       0       36      ACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:41:347:1531     MLH1    14      -       0       0       98      98      98      0       0       1       0       36      ACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:67:393:1068     MLH1    14      -       0       0       98      98      98      0       0       1       0       36      ACTTCCGTTGAGCATCTAGaCGTTTCCTTGGCTC
TT    HIIHIIIIIIIIIIIIIII3IIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:70:959:1492     MLH1    14      -       0       0       98      98      98      0       0       1       0       36      aCTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
TT    2IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:93:1678:647     MLH1    14      -       0       0       95      95      95      0       0       1       0       36      aCTTcCGTTGAGCATCTAGACGTTTCCTTGGCTC
TT    +III6IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:4:220:1034      MLH1    15      -       0       0       98      98      98      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:9:249:1562      MLH1    15      -       0       0       98      98      98      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:12:1505:614     MLH1    15      -       0       0       95      95      95      0       0       1       0       36      CttCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    I81IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:13:368:1753     MLH1    15      -       0       0       98      98      98      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:15:9:1341       MLH1    15      -       0       0       98      98      98      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
> 30PERAAXX_300164-25:1:19:639:103      MLH1    15      -       0       0       98      98      98      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
bioinfosm is offline   Reply With Quote
Old 03-30-2009, 11:05 AM   #9
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

I realize that there is a -q option that one may use with maq pileup as well.
Would a repeat mapped read mapping to different locations with same score (mapping quality or the 7th column of reads.mapview is 0) be used to do SNP calling?

Thanks
bioinfosm is offline   Reply With Quote
Reply

Tags
align, maq, snp, solexa, unique

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:56 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO