SEQanswers

08-08-2011, 12:28 AM   #1
moriah
Member
 
Location: Israel

Join Date: Dec 2010
Posts: 23
Editing FASTA, reference base in SNP calling with samtools

A couple of questions. I have a FASTA file of hg18, about 3 GB in size, and I'm afraid that samtools pileup can't read it correctly because it contains both lowercase and uppercase letters. Is there a tool that can open such a large file so I can edit it, or is there something that can convert all lowercase letters to uppercase?

The second question is about samtools pileup. I generate a raw.pileup file and filter SNPs with the R Bioconductor package Rsamtools, using the following commands:
library(Rsamtools)  # provides readPileup()
system("samtools pileup -f genome.fasta output.sorted.bam > raw.pileup")
snps <- readPileup("raw.pileup", variant="SNP")
write.table(as.data.frame(snps), "snps.xls")

I get an .xls file in which the "referenceBase" column is filled with "*" instead of A/C/T/G.

Does anyone know what might be the problem?

Thanks in advance!
08-08-2011, 10:40 AM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Maybe sed can replace all the lowercase letters with uppercase?
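For anyone who would rather not write a sed one-liner, here is a minimal Python sketch of the same idea. It streams the file line by line, so a 3 GB FASTA never has to fit in memory, and it leaves the `>` header lines alone so the sequence names stay exactly as they were. The function name `uppercase_fasta` and the demo input are made up for illustration.

```python
import io


def uppercase_fasta(src, dst):
    """Copy FASTA text from src to dst, uppercasing sequence lines only."""
    for line in src:
        if line.startswith(">"):
            dst.write(line)          # header line: keep the name untouched
        else:
            dst.write(line.upper())  # sequence line: acgtn -> ACGTN


if __name__ == "__main__":
    # Tiny in-memory demo; a real run would pass two open file handles.
    demo = io.StringIO(">chr1 test\nacgtNNac\nGGtt\n")
    out = io.StringIO()
    uppercase_fasta(demo, out)
    print(out.getvalue(), end="")
```

On a real genome you would call it with something like `open("hg18.fa")` and `open("hg18.upper.fa", "w")` (hypothetical file names), then index the new file with `samtools faidx` before running pileup against it.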

I haven't used the R package, but in general, if you are not getting the right reference letter in the pileup, it's because pileup didn't properly connect the reference FASTA you gave it on the command line with the reference the reads were aligned to. Mismatched names (or possibly odd names with spaces and unusual characters) might do that. A failed samtools indexing step will also cause pileup to behave like that; having the genome as one long unbroken string is one reason that step fails.
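One way to check the name-mismatch possibility is to compare the sequence names in the FASTA with the `@SQ` names in the BAM header. The sketch below assumes you have dumped the header to text with `samtools view -H output.sorted.bam`, and that the sequence name is the first whitespace-delimited token after `>`. The helper names are hypothetical, not part of any package.

```python
def fasta_names(fasta_lines):
    """Return sequence names: first token after '>' on each header line."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            names.append(parts[0] if parts else "")
    return names


def sam_header_names(header_lines):
    """Return the SN: values from the @SQ lines of a SAM/BAM text header."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def compare_names(fasta_lines, header_lines):
    """Return (names only in the BAM header, names only in the FASTA)."""
    fa = set(fasta_names(fasta_lines))
    bam = set(sam_header_names(header_lines))
    return sorted(bam - fa), sorted(fa - bam)


if __name__ == "__main__":
    fasta = [">chr1 some description\n", "ACGT\n", ">chr2\n", "GGCC\n"]
    header = ["@HD\tVN:1.0\tSO:coordinate\n",
              "@SQ\tSN:chr1\tLN:4\n",
              "@SQ\tSN:2\tLN:4\n"]
    only_bam, only_fasta = compare_names(fasta, header)
    print("only in BAM header:", only_bam)
    print("only in FASTA:", only_fasta)
```

If either list is non-empty (for example `2` in the BAM against `chr2` in the FASTA), the alignment was probably made against a differently named reference than the one passed to `-f`.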
08-10-2011, 12:11 AM   #3
moriah
Member
 
Location: Israel

Join Date: Dec 2010
Posts: 23

Thanks for your answer!

I get the right letter in the pileup file, but when I use the R package to determine the SNPs I get "*" instead of the reference nucleotides.

What other tool can extract the SNPs from the pileup file?

Thanks!
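As a rough way to pull candidate SNPs straight out of the pileup text without the R package, here is a naive Python sketch. It assumes the plain six-column pileup layout (chromosome, position, reference base, depth, read bases, qualities), which is what the command in post #1 (no `-c`) should write. It flags positions where one non-reference base makes up a large share of the reads. The thresholds `min_depth` and `min_alt_fraction` and both function names are illustrative assumptions; this is simple counting, not a statistical SNP caller.

```python
from collections import Counter


def count_bases(ref, read_bases):
    """Count base calls in a pileup read-bases string ('.'/',' = reference)."""
    counts = Counter()
    i, n = 0, len(read_bases)
    while i < n:
        c = read_bases[i]
        if c == "^":
            i += 2  # read start marker plus its mapping-quality character
        elif c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            j = i + 1
            while j < n and read_bases[j].isdigit():
                j += 1
            length = int(read_bases[i + 1:j]) if j > i + 1 else 0
            i = j + length
        elif c in ".,":
            counts[ref.upper()] += 1
            i += 1
        else:
            if c.upper() in ("A", "C", "G", "T", "N"):
                counts[c.upper()] += 1
            i += 1  # anything else ('$', '*') is skipped
    return counts


def candidate_snps(pileup_lines, min_depth=4, min_alt_fraction=0.5):
    """Return (chrom, pos, ref, alt, alt_count, depth) for likely SNP sites."""
    hits = []
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[2] == "*":
            continue  # malformed line, or a line with no usable reference base
        chrom, pos, ref = fields[0], int(fields[1]), fields[2].upper()
        counts = count_bases(ref, fields[4])
        depth = sum(counts.values())
        alts = [(cnt, base) for base, cnt in counts.items()
                if base not in (ref, "N")]
        if depth < min_depth or not alts:
            continue
        alt_count, alt = max(alts)  # most frequent non-reference base
        if alt_count / depth >= min_alt_fraction:
            hits.append((chrom, pos, ref, alt, alt_count, depth))
    return hits


if __name__ == "__main__":
    demo = ["chr1\t100\tA\t5\t..GgG\tIIIII",
            "chr1\t101\tC\t4\t.,.,\tIIII"]
    for hit in candidate_snps(demo):
        print(*hit, sep="\t")
```

For a real file you would pass an open handle, e.g. `candidate_snps(open("raw.pileup"))`, and tune the two thresholds to your coverage.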
All times are GMT -8.

