-   -   maq for 454 data? (http://seqanswers.com/forums/showthread.php?t=6008)

litali 07-18-2010 06:42 AM

maq for 454 data?
 
Hi,
Is it possible to use Maq for 454 data? Which input files are needed? If not, is there anything similar for 454?

Naujv 07-22-2010 05:34 PM

Hi litali,

I'm not sure it's the right thing to do. The MAQ website FAQ section actually says: "Maq maps short reads to the reference and calls the genotypes from the alignment. It is specifically designed for Illumina-Solexa/AB-SOLiD reads, not for 454 or capillary ones." Personally, I'd like to use it too, since I'm not too hot on the Roche software.

If you want to run it and see for yourself, you can convert your sff files into sanger fastq (several methods below), make a reference fasta file, and follow the commands shown at the MAQ website (there's an easyrun).

(a) Join the .qual and .fna files into FASTQ: http://seqanswers.com/forums/showthread.php?t=2775
(b) Convert SFF directly into FASTQ: there's also sff2fastq on GitHub.

The output files came out the same with either method.
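
For anyone curious what joining a .fna/.qual pair into Sanger FASTQ involves, here is a minimal Python sketch. It is not the script from the linked thread and not sff2fastq. It assumes both files list the same reads in the same order, that the .qual file holds space-separated integer Phred scores, and the function names are made up for illustration.

```python
def parse_records(lines):
    """Yield (read_name, body_lines) from FASTA-style text (.fna or .qual)."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, body
            # Keep only the read ID, dropping any description after it.
            name, body = line[1:].split()[0], []
        elif line:
            body.append(line)
    if name is not None:
        yield name, body


def fna_qual_to_fastq(fna_lines, qual_lines):
    """Pair .fna and .qual records and return Sanger (Phred+33) FASTQ text.

    Assumption: both inputs contain the same reads in the same order.
    Note that zip() stops silently at the shorter input.
    """
    out = []
    pairs = zip(parse_records(fna_lines), parse_records(qual_lines))
    for (name, seq_lines), (qual_name, score_lines) in pairs:
        if name != qual_name:
            raise ValueError(f"read order differs: {name} vs {qual_name}")
        seq = "".join(seq_lines)
        scores = [int(s) for s in " ".join(score_lines).split()]
        if len(scores) != len(seq):
            raise ValueError(f"{name}: {len(seq)} bases, {len(scores)} scores")
        # Sanger FASTQ stores each score as the character chr(score + 33);
        # scores are capped at 93 so the result stays printable ASCII.
        qual = "".join(chr(min(s, 93) + 33) for s in scores)
        out.append(f"@{name}\n{seq}\n+\n{qual}\n")
    return "".join(out)


if __name__ == "__main__":
    fna = [">read1 length=4", "ACGT"]
    qual = [">read1 length=4", "40 30 20 10"]
    print(fna_qual_to_fastq(fna, qual), end="")
```

With real data you would pass open file handles instead of the in-memory lists, for example `fna_qual_to_fastq(open("reads.fna"), open("reads.qual"))`, where the file names are placeholders.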

Naujv 07-23-2010 09:09 AM

Hi litali,

I just tried using maq on my 454 data. What I'm seeing in the alignment output (maq mapview all.map > $someoutputfile) is that my sequences are being cut off at 34 nt.

krobison 07-23-2010 11:45 AM

Try bwasw (a mode of bwa)

jgibbons1 07-23-2010 12:28 PM

I'm pretty sure MAQ can only map reads 63bp or smaller.

Naujv 07-23-2010 01:59 PM

krobison, thanks! I appreciate your input.

Took your advice and tried bwasw, but sort of ran into a problem with my alignments. My CIGAR strings have "S" in them. I found another post that suggests there may be a problem with the CIGAR?

If you have time, I would like your thoughts (and others') on using bwasw with a small set of reference sequences (not a whole genome and not whole chromosomes). My reference is made up of 100 non-overlapping sequences in FASTA format.

nilshomer 07-23-2010 03:34 PM

Quote:

Originally Posted by Naujv (Post 22306)
krobison, thanks! I appreciate your input.

Took your advice and tried bwasw, but sort of ran into a problem with my alignments. My CIGAR strings have "S" in them. I found another post that suggests there may be a problem with the CIGAR?

If you have time, I would like your thoughts (and others') on using bwasw with a small set of reference sequences (not a whole genome and not whole chromosomes). My reference is made up of 100 non-overlapping sequences in FASTA format.

The "S" character indicates soft-clipping, which is described in the SAM specification. If you still think it is a bug, could you post the SAM record in question?

Naujv 07-23-2010 05:14 PM

Nils, thanks for the help! I'm new (as in today new) to bwa. It may not be an error or bug, though when I tried to look for where the alignment is, I couldn't find it. The sequence looks like one big ugly repeat, so maybe this is spurious. Maybe you can help me understand what the 40 in the line is? Mapping quality? Where does the line between good and bad lie?

GKTESVC03GKDWH 16 ref|NG_023054.1|:5000-113024 77792 40 46S48M143S * 0 0 ATTCCATTCCATTCCATTCGGTTTNAACGGTATTCCAATCGATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCTTTCCATTCCATTACGGATGATTCCATTCCATTGCATTCCATTCCATTCCATTCCCCTGTACTCGGGTTGATTCCATTCCATTCCATTCCAATCCATGCCATTCCACTCGTGTTGATTCCATTCTTTCCATTCCATTCAAGTTGATTCCATTCCAT .199;:992131111:.,.--,,,!--.--17995566999:=BBABBBBBDDDAAA????DAAAAADBBBAA>=<900000..22:;9;;<62444444<<==>=>>>>>AB===A?????DDDDFFDDFFFF;;99<[email protected]@<<44488ABBBBDDDFFF[email protected]A????FCCDDHF
AS:i:44 XS:i:0 XF:i:2 XE:i:6 XN:i:0

nilshomer 07-23-2010 10:23 PM

Quote:

Originally Posted by Naujv (Post 22314)
Nils, thanks for the help! I'm new (as in today new) to bwa. It may not be an error or bug, though when I tried to look for where the alignment is, I couldn't find it. The sequence looks like one big ugly repeat, so maybe this is spurious. Maybe you can help me understand what the 40 in the line is? Mapping quality? Where does the line between good and bad lie?

GKTESVC03GKDWH 16 ref|NG_023054.1|:5000-113024 77792 40 46S48M143S * 0 0 ATTCCATTCCATTCCATTCGGTTTNAACGGTATTCCAATCGATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCTTTCCATTCCATTACGGATGATTCCATTCCATTGCATTCCATTCCATTCCATTCCCCTGTACTCGGGTTGATTCCATTCCATTCCATTCCAATCCATGCCATTCCACTCGTGTTGATTCCATTCTTTCCATTCCATTCAAGTTGATTCCATTCCAT .199;:992131111:.,.--,,,!--.--17995566999:=BBABBBBBDDDAAA????DAAAAADBBBAA>=<900000..22:;9;;<62444444<<==>=>>>>>AB===A?????DDDDFFDDFFFF;;99<[email protected]@<<44488ABBBBDDDFFF[email protected]A????FCCDDHF
AS:i:44 XS:i:0 XF:i:2 XE:i:6 XN:i:0

Have a real close read of the SAM specification. You will be going back to this quite a bit. The 5th column is the PHRED-scaled mapping quality. Looking at the CIGAR field (6th column), "46S48M143S", there seem to be 48 bases aligned to your reference (M covers both matches and mismatches), with the first 46 and last 143 soft-clipped.
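
To make those three columns concrete, here is a small Python sketch that decodes the flag, mapping quality, and CIGAR of the record above. It is only an illustration under the definitions in the SAM specification (MAPQ = -10 log10 of the probability that the mapping position is wrong; flag bit 0x10 means the read is reverse-complemented); the function names are invented for this example and are not part of bwa or samtools.

```python
import re

# One CIGAR element is a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read (SEQ) or of the reference.
QUERY_OPS = set("MIS=X")
REF_OPS = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Rebuild the string to make sure nothing was silently skipped.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def summarize_cigar(cigar):
    """Return read length, reference span, and soft-clipped base count."""
    ops = parse_cigar(cigar)
    return {
        "query_length": sum(n for n, op in ops if op in QUERY_OPS),
        "reference_span": sum(n for n, op in ops if op in REF_OPS),
        "soft_clipped": sum(n for n, op in ops if op == "S"),
    }


def mapq_to_error_probability(mapq):
    """Convert a PHRED-scaled mapping quality to P(position is wrong)."""
    return 10 ** (-mapq / 10)


def is_reverse_strand(flag):
    """True if SAM flag bit 0x10 (read reverse-complemented) is set."""
    return bool(flag & 0x10)


if __name__ == "__main__":
    # Values taken from the record posted above.
    print(summarize_cigar("46S48M143S"))
    # {'query_length': 237, 'reference_span': 48, 'soft_clipped': 189}
    print(mapq_to_error_probability(40))
    print(is_reverse_strand(16))  # True
```

So for this read, only 48 of 237 bases are placed on the reference, the read maps to the reverse strand, and a mapping quality of 40 corresponds to roughly a 1 in 10,000 chance that the reported position is wrong. Where to draw the line between good and bad depends on the application, so treat any cutoff as a judgment call.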

Naujv 07-24-2010 12:20 AM

Thank you! Going through the 2008 MAQ paper now.

geschickten 07-25-2010 11:02 PM

Hi All,

We have a version of MAQ that works with 125 bp Illumina reads, and we also have a version that works with 454 data. It's not publicly available. If anybody is interested, please send me a request at [email protected]. Thanks.

