SEQanswers

Old 10-01-2008, 06:06 PM   #1
qqcandy
Member
 
Location: los angeles

Join Date: Sep 2008
Posts: 15
SNP/Mutation Detection Using Illumina Paired-end Data

This website is really useful and I've found answers to many questions I had by just reading the existing threads.

However, I have not found an answer to this question: which program or pipeline should I use to map Illumina paired-end data (about 50 bp per end, genomic) to a genome in order to find SNPs/mutations?

From what I've read in the thread "Software packages for next gen sequence analysis", which is excellent btw, it seems there is no package that detects SNPs while also taking advantage of the distance information from the paired-end data. If that is the case, should I use a package that does paired-end mapping (e.g. novocraft) with a loose cut-off, and then feed the hits into a SNP detection package, e.g. ssahaSNP, to refine the data? Would I miss a significant amount of SNP information with this alternative?

Any comments or suggestions are appreciated! Thanks in advance ~
Old 10-03-2008, 07:30 AM   #2
spirit
Member
 
Location: Canada

Join Date: Feb 2008
Posts: 11
Default

I think MAQ can handle Illumina paired-end data and output SNPs. Maybe I misunderstand what you mean by "detects SNP while also taking advantage of the distance information from the paired-end data". I took it to mean mapping the reads as pairs, then using the mapping results to extract SNPs. Or do you want the SNPs on the two reads of one pair to be correlated in some way? I am adding a SNP detection part to ZOOM that extracts SNP information from the alignment and assembly results, so could you give me more information? Thanks.
Old 10-03-2008, 08:18 AM   #3
qqcandy
Member
 
Location: los angeles

Join Date: Sep 2008
Posts: 15
Default

Thank you for your reply!

Your first understanding is right: mapping reads as pairs, then using the mapping results to extract SNPs.

Does MAQ do the mapping first and then the SNP detection, i.e. two filtering steps before the final SNP result?

Is there any software that can do the two at the same time, i.e. put both in the same model and so give a single p-value for each SNP detected? Is that possible at all?
Old 10-03-2008, 10:00 AM   #4
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Hi qqcandy,

I have written software that uses MAQ paired alignments to do SNP calling - though maq now also has processes to do the same thing.

I don't know of any process that provides a p-value for SNPs, but you could calculate it yourself with the information available.
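
One way such a calculation could look, as a minimal sketch only: treat every non-reference read at a position as a possible sequencing error with a fixed per-base error rate, and ask how likely it is to see at least that many by chance (a one-sided binomial tail). The function name `snp_p_value` and the 1% default error rate are assumptions made for illustration; none of the tools in this thread are known to work this way.

```python
from math import comb


def snp_p_value(alt_reads, depth, error_rate=0.01):
    """One-sided binomial tail: P(X >= alt_reads), X ~ Binomial(depth, error_rate).

    alt_reads  -- reads showing the non-reference base at this position
    depth      -- all reads covering the position
    error_rate -- assumed per-base sequencing error rate (hypothetical default)
    """
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must be between 0 and depth")
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(alt_reads, depth + 1)
    )


# Example: 5 non-reference reads out of 20 covering reads.
print(snp_p_value(5, 20))
```

A small p-value only says the non-reference reads are unlikely to be pure sequencing error under this simple model; it ignores mapping errors and per-base quality scores.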

As for wanting to do alignment and SNP calling at one time, I only know of one piece of software that does that (Slider), but I consider that to be a terrible design decision, and I would stay away from any software that tries to do everything in one step. (The opportunity for bugs to be present increases dramatically as the complexity of the software increases.)
__________________
The more you know, the more you know you don't know. —Aristotle
Old 10-03-2008, 11:56 AM   #5
spirit
Member
 
Location: Canada

Join Date: Feb 2008
Posts: 11
Default

Yes, MAQ takes two steps to reach the final SNP results. I agree with apfejes that it's not a good idea to do alignment and SNP calling at the same time: you need to map all possible reads to a position before you can tell whether that position is a SNP candidate, based on information such as the frequency of the different nucleotides at the position and the quality of the mapped reads. ZOOM will also adopt the two-step approach.
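
To illustrate why the alignment has to be finished first, here is a rough sketch of the second step: piling up already-aligned reads into per-position nucleotide counts. It assumes ungapped alignments given as hypothetical `(start, sequence)` pairs with 0-based start coordinates; it is not the internal format of MAQ or ZOOM.

```python
from collections import Counter, defaultdict


def pileup(alignments):
    """Count the bases observed at each reference position.

    alignments -- iterable of (start, sequence) pairs for reads that are
                  already aligned without gaps (an assumption of this sketch)
    Returns a dict mapping position -> Counter of bases seen there.
    """
    columns = defaultdict(Counter)
    for start, sequence in alignments:
        for offset, base in enumerate(sequence):
            columns[start + offset][base] += 1
    return columns


# Three made-up reads overlapping around position 2.
columns = pileup([(0, "ACG"), (1, "CGT"), (2, "TT")])
for position in sorted(columns):
    print(position, dict(columns[position]))
```

Only once every read has contributed to a column can the counts at that position be compared, which is the point made above.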
Old 10-03-2008, 01:34 PM   #6
qqcandy
Member
 
Location: los angeles

Join Date: Sep 2008
Posts: 15
Default

Thanks a lot for the suggestions!

Now I understand that getting the alignment while assessing SNPs is not a good idea. It is better to align the reads first, then take all the reads that align to a region to assess SNPs.

As far as I know, there is software that gives a p-value for each reported SNP given the alignment. It was designed for EST analysis, but I think it can also be applied to short-read analysis. (Both come with a score for each nucleotide in the read.)

http://www.nature.com/ng/journal/v26...g1000_233.html

There is another related question: if the organism is diploid, the reference has C at a position, and the sample has G on one copy and T on the other, how does the SNP detection program (e.g. the one in MAQ) deal with it? Can the program report two SNPs at that position for the sample? (Assuming we have enough reads for both G and T.)
Old 10-03-2008, 01:38 PM   #7
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

There are likely many software packages out there that do SNP calling - and each one will be different. In fact, you can write your own SNP caller in a few days, or modify one of the available ones out there to do what you want.

Whether you get a p-value or whether the software can do multiple snps at the same location depends heavily on what application you choose to use.

The one I've written doesn't give a p-value, but does call multiple snps at one location. However, by calling multiple snps at one location, you start to wander into muddy waters. What if you have a trisomy, or duplication event? It becomes less and less clear what the real answer is.
Old 10-03-2008, 02:10 PM   #8
spirit
Member
 
Location: Canada

Join Date: Feb 2008
Posts: 11
Default

I think the MAQ paper takes the diploid problem into account. However, I haven't tried it, so I have no idea what its output looks like.

You can also try apfejes's software. Hi apfejes, what is the name of your software? I didn't find it on your blog. Is it included in the FindPeaks package?

The SNP detection part of the next release of ZOOM will output SNPs automatically, including the diploid ones. For now, if you want to write your own SNP caller based on the alignment results, ZOOM's output may be helpful, because ZOOM outputs a frequency file together with the mapping results and the assembled consensus. The frequency file records the frequency of each nucleotide at each position. It looks like this:

position  A  C   G   T   deletion  insertion  coverage
33113     0  1   54  2   0         0          0
4192402   0  43  0   53  0         0          0

Then you can write a Perl script that decides what the SNP is by comparing the frequencies.
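
As a rough illustration of that frequency comparison (in Python here rather than Perl), the sketch below parses rows in the layout shown above and reports every allele that passes a minimum count and a minimum fraction of the reads, so a site with two well-supported alleles yields both. The thresholds, the function names, and the reference bases in the example are all assumptions; the frequency file itself does not contain a reference base. Depth is taken as the sum of the four base counts because the coverage column is 0 in the sample rows.

```python
BASES = ("A", "C", "G", "T")


def parse_frequency_line(line):
    """Split one row of the frequency file into (position, base counts)."""
    fields = line.split()
    position = int(fields[0])
    counts = dict(zip(BASES, (int(value) for value in fields[1:5])))
    return position, counts


def call_alleles(counts, min_count=3, min_fraction=0.2):
    """Return the bases supported by enough reads (hypothetical thresholds)."""
    depth = sum(counts.values())
    if depth == 0:
        return []
    return sorted(
        base
        for base, n in counts.items()
        if n >= min_count and n / depth >= min_fraction
    )


def call_snps(counts, ref_base, **thresholds):
    """Called alleles that differ from the (externally supplied) reference base."""
    return [b for b in call_alleles(counts, **thresholds) if b != ref_base]


rows = [
    "33113 0 1 54 2 0 0 0",
    "4192402 0 43 0 53 0 0 0",
]
made_up_reference = {33113: "A", 4192402: "C"}  # not from the thread

for row in rows:
    position, counts = parse_frequency_line(row)
    print(position, call_alleles(counts), call_snps(counts, made_up_reference[position]))
```

With these thresholds the single C and the two Ts at position 33113 are treated as noise, while both C and T are kept at position 4192402.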

Last edited by spirit; 10-03-2008 at 02:13 PM.
Old 10-03-2008, 02:38 PM   #9
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Hi spirit,

My software is part of the Vancouver Short Read Analysis package (sourceforge), and is undergoing a lot of development right now.

The SNP caller itself is in pretty good shape, but has been tailored to take advantage of WTSS data, using a Maq alignment pipeline developed by another graduate student here. However, it would be relatively easy to make that part optional, which would allow the package to be used for general-purpose SNP calling. Since SNP callers are relatively easy to write, I wasn't expecting much interest in this, but I'm happy to make the minor changes required if someone intends to use it.

The only caveat is that anyone who'd like to try it would need to download the source and compile it. It's only two commands ("svn checkout <path/trunk>", and "ant buildall"), but I know the command line can be scary for some people.

I'll do a file release and a manual for it in the near future, but that would obviously be accelerated if there's interest.

In the meantime, you can also let me know if you want a feature list, as it's a pretty "full featured" snp caller.

Cheers.

Last edited by apfejes; 10-03-2008 at 02:39 PM. Reason: add in one last sentence.
Old 10-03-2008, 03:29 PM   #10
qqcandy
Member
 
Location: los angeles

Join Date: Sep 2008
Posts: 15
Default

Quote:
Originally Posted by spirit View Post
The frequency file records the frequency of each nucleotide at each position. It looks like this:

position  A  C   G   T   deletion  insertion  coverage
33113     0  1   54  2   0         0          0
4192402   0  43  0   53  0         0          0

Then you can write a Perl script that decides what the SNP is by comparing the frequencies.
For more sophisticated SNP detection (although in most cases it is not needed), it may be useful to have the quality score for each call. E.g. for position 33113, what is the score for the one C, and what are the scores for the 54 Gs? In this case the situation is clear, but if there are 2 Cs and 3 Gs, the score of each call can make a big difference. It would be nice to have that form of output too.
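
To make the 2 Cs versus 3 Gs case concrete, here is a small sketch of one simple heuristic: sum the Phred quality scores of the calls supporting each base instead of just counting them. The quality values are made up for illustration, and summing Phred scores is a simplification chosen for this example, not the method of any caller mentioned in this thread.

```python
from collections import defaultdict


def error_probability(phred):
    """Convert a Phred score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def quality_support(calls):
    """Sum Phred scores per base.

    calls -- iterable of (base, phred_score) pairs observed at one position
    """
    support = defaultdict(int)
    for base, phred in calls:
        support[base] += phred
    return dict(support)


# Two high-quality Cs against three low-quality Gs (hypothetical scores).
calls = [("C", 40), ("C", 40), ("G", 5), ("G", 5), ("G", 5)]
support = quality_support(calls)
print(support)
print("best supported base:", max(support, key=support.get))
```

By raw count G wins 3 to 2, but with these scores the two Cs carry far more weight, which is why per-call quality output would matter for borderline positions.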
Old 10-03-2008, 03:31 PM   #11
qqcandy
Member
 
Location: los angeles

Join Date: Sep 2008
Posts: 15
Default

Just found this, which may be useful to people who are interested in SNP calling with short-read data:

Conversion of Novoalign to Maq's .map format (calling SNPs and Indels)

http://www.novocraft.com/wiki/tiki-v...desc&forumId=1

"At present most people are used to using the maq (http://maq.sourceforge.net) tools for alignment, SNP calling, assembly, etc. We plan to develop our own flavour of these but in the meantime it's possible to get more 'good' alignments than maq using novoalign, and then presumably use the maq tools to call assemble reads/call SNPs from novo* alignments."
Old 10-06-2008, 01:39 PM   #12
spirit
Member
 
Location: Canada

Join Date: Feb 2008
Posts: 11
Default

Yes, that's right, and that is why we are developing the new SNP detector for ZOOM. The SNP detection part takes into account many factors that affect how likely a position is to be a true SNP, such as the quality score at the mismatch position and the mapping probability of the read...

Everybody is rushing to do better !!



Quote:
Originally Posted by qqcandy View Post
For more sophisticated SNP detection (although in most cases it is not needed), it may be useful to have the quality score for each call. E.g. for position 33113, what is the score for the one C, and what are the scores for the 54 Gs? In this case the situation is clear, but if there are 2 Cs and 3 Gs, the score of each call can make a big difference. It would be nice to have that form of output too.

Last edited by spirit; 10-06-2008 at 01:47 PM.
Old 10-08-2008, 08:44 AM   #13
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

I've already heard of 4 or more good SNP callers. What we try to do with novoalign is simply maximize the number of good read alignments.
IMO the best way to validate a SNP caller is to use experimental validation and not all of us are in a position to do that.
I expect people to introduce more flavours of aligners/SNP detectors especially with the advent of longer read lengths and better sequencing protocols.
Old 01-12-2009, 10:08 AM   #14
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96
Default

Has anyone used Bowtie for alignment and then done SNP calling with MAQ? Is it possible to combine the two, to take advantage of the speed of Bowtie and the functions of MAQ?
Old 07-24-2009, 08:43 AM   #15
AnamikaDarwin
Member
 
Location: Boston

Join Date: Nov 2008
Posts: 26
Question

Quote:
Originally Posted by qqcandy View Post

There is another related question: if the organism is diploid, the reference has C at a position, and the sample has G on one copy and T on the other, how does the SNP detection program (e.g. the one in MAQ) deal with it? Can the program report two SNPs at that position for the sample? (Assuming we have enough reads for both G and T.)


Hello qqcandy,

I came across your posting from 10 months ago and I am wondering if you have resolved the issue? If yes, how did you do it?

I am using maq to find SNPs and for the most part get only 2 alleles called (on my diploid organism), but a certain percentage of calls have 3 alleles, where the third allele is for the most part far less frequent than the major and minor alleles.

Thanks,
Anamika
All times are GMT -8.

