![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Variant Frequency Calculation (non-SNP)? | dmtruong | Bioinformatics | 0 | 12-27-2011 06:53 PM |
PubMed: Resequencing of 200 human exomes identifies an excess of low-frequency non-sy | Newsbot! | Literature Watch | 1 | 12-02-2010 08:04 AM |
Software for variant frequency | sasignor | Bioinformatics | 4 | 04-30-2010 01:31 PM |
samtools pileup for low frequency alleles? | greigite | Bioinformatics | 0 | 03-02-2010 04:43 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Singapore Join Date: Nov 2010
Posts: 30
|
![]()
Hi all,
we've developed a generic, sensitive and fast method (quite a mouthful but it's true, see for yourself ![]() LoFreq takes a samtools [m]pileup (via file or pipe) as input and produces a simple list of variants including associated frequencies, p-values (use for filtering!), base-counts, strand-bias info etc. The format is self-explanatory, plain CSV at the moment, but I will add VCF-support soon. For now, if your data consists of several chromosomes, you have to run it once per chromosome. An example call would look like this: Code:
samtools mpileup -f your-ref.fa your-bam.bam | lofreq_snpcaller.py -b 1 -o snp-out.txt -v The code is accessible from https://github.com/andreaswilm/LoFreq (requires Python 2.6 or Python 2.7). I'd be happy to receive any type of feedback! Cheers, Andreas |
![]() |
![]() |
![]() |
#2 |
Member
Location: Raleigh, NC Join Date: Nov 2008
Posts: 51
|
![]()
Hi Andreas
Does this tool handle indels as well as SNPs? Thanks Mark |
![]() |
![]() |
![]() |
#3 |
Member
Location: Singapore Join Date: Nov 2010
Posts: 30
|
![]()
Hi Mark,
I'm afraid it doesn't at the moment. I'm not sure, but I think FreeBayes might be able to handel indels. Andreas |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: Boston, MA Join Date: Nov 2010
Posts: 100
|
![]()
Hi Andreas,
Do I understand the following correctly. If I have a reference fasta with two segments (from a virus), do I need to split those two segments into separate fasta files and run them each separately? |
![]() |
![]() |
![]() |
#5 |
Member
Location: Singapore Join Date: Nov 2010
Posts: 30
|
![]()
Hi,
No need to split the fasta file. Just run the Samtools/LoFreq combo once for each fragment, each time providing the corresponding fragment name to samtools, so that the [m]pileup is created for one fragment at a time. The reason is the following: The current version of LoFreq only prints SNV coordinates, without a chromosome/sequence name. In order to avoid a mixup you better run the analysis separately per chromosome. We'll release a version that can handle this properly and produces standard vcf output in the coming weeks. Let me know if I can help further, Andreas |
![]() |
![]() |
![]() |
#6 |
Member
Location: Singapore Join Date: Nov 2010
Posts: 30
|
![]()
Hi all,
please note: we've moved the project over to Sourceforge and also updated to a new version (much faster, lots of bug-fixes etc) Andreas Last edited by me_myself_andI; 07-12-2012 at 01:31 AM. |
![]() |
![]() |
![]() |
#7 |
Member
Location: Singapore Join Date: Nov 2010
Posts: 30
|
![]()
Hi all,
we've released version 0.3.1, which is much easier to use. The most visible changes are: Samtools is now called internally, support for regions (bed), chromosome awareness (overdue) and a "somatic" (SNVs unique to one sample) SNV calling pipeline script. The paper is now accessible as well, see http://nar.oxfordjournals.org/cgi/co...St&keytype=ref Andreas |
![]() |
![]() |
![]() |
#8 |
Junior Member
Location: Singapore Join Date: Jan 2013
Posts: 3
|
![]()
Hi Andreas,
I have installed LoFreq but have some issues with samtools. Our system-wide copy of samtools is outdated and so I have 0.1.18 installed in one of my user folders. I use an alias as well as exported it's path to my .bashrc, so when I run samtools from the command line, it uses the 0.1.18 version. However, when I run lofreq_snpcaller.py, it appears to be reverting to the outdated version as it can't find the 'depth' command. Is there a way to tell the script the path to my updated samtools copy? Cheers, WK |
![]() |
![]() |
![]() |
#9 |
Member
Location: Singapore Join Date: Nov 2010
Posts: 30
|
![]()
Hi WK,
an alias won't work, because as LoFreq will use a system call to execute samtools. It therefore will use samtools found in the first path mentioned in your PATH variable. Can you make sure that the directory for 0.1.18 comes before any other installation? In other words, make sure that after removing the alias, a simple samtools call will execute the right version. The next version has samtools build-in, so quirks like this can't happen. Andreas |
![]() |
![]() |
![]() |
#10 |
Junior Member
Location: Singapore Join Date: Jan 2013
Posts: 3
|
![]()
Hi Andreas,
Thanks for that. I managed to get it to work by inserting my samtools path in front of the PATH variable so that it gets looked at first. WK |
![]() |
![]() |
![]() |
#11 |
Junior Member
Location: Singapore Join Date: Jan 2013
Posts: 3
|
![]()
Hi Andreas,
Another question.. I am running the lofreq_uniq_pipeline.py script on a tumour-normal whole exome pair that has been realigned and re calibrated by GATK. I get numerous warnings about mismatches between base count and coverage value WARNING [2013-01-22 15:12:50,484]: Mismatch between number of bases (= 749) and samtools coverage value (= 750). Ins/del events: 0/0. Cleaned base_str is.... Is this expected? WK |
![]() |
![]() |
![]() |
#12 |
Member
Location: Singapore Join Date: Nov 2010
Posts: 30
|
![]()
Hi Wengkhong,
it's not expected, but you can safely ignore those messages. It's actually a bug in one of the routines checking the integrity of the data. The data is fine, so don't worry. This is fixed in the latest version 0.5.0 which I uploaded last week. As a bonus: this version also integrates mapping quality. Andreas |
![]() |
![]() |
![]() |
#13 |
Member
Location: Southeast Asia Join Date: Nov 2008
Posts: 34
|
![]()
Hi Andreas,
I've been getting result which is puzzling when doing a viral quasispecies analysis with LoFreq . When I ran lofreq_snpcaller.py with the -Q filter, I get less variants with the default base quality of 3 than when I ran with 20. Perhaps I'm missing something here, I thought by filtering out lower quality bases, I should get more reliable and less variants? In addition, I'm also getting different number of variants using versions 0.4 and 0.5. This leads me to the next question of which parameters to use for filtering the raw variants calls. Should I ignore all those with AF values below 0.05? Thank you. ZL |
![]() |
![]() |
![]() |
#14 |
Member
Location: Singapore Join Date: Nov 2010
Posts: 30
|
![]()
Hi ZL,
when you change the Q parameter, you filter all bases, i.e. reference and non-reference bases. This is not necessarily what you want. Qualities are built into LoFreq's model. By filtering too harshly you introduce unnecessary biases. Low quality bases are not a problem per se for LoFreq; low quality means higher error probability and therefore higher chance of seeing a random error (i.e. a SNV becomes less likely). All this is part of the model. I wouldn't play with the default parameters unless you have good reason to do so. Regarding the changes between 0.4 and 0.5: By default the newer version builds mapping quality into the model as well by combining base and mapping qualities. You can switch this off with --dont-join-mapq-and-baseq. Again, I usually wouldn't mess with the defaults unless you know your mapping qualities are completely off. In addition, the automatic Bonferroni settings (for p-value filtering) have been slightly changed in the new version. For filtering recommendations have a look at the Wiki. If the qualities in your BAM file were calibrated with GATK, then we the only recommended filtering steps would be a strand-bias and coverage filter. No need to filter based on frequency (we've in-vitro validated SNVs down to 0.5%(!) and in silico the resolution can go much lower depending on the coverage etc), unless you have prior knowledge. Without GATK recalibration you might see a few spurious SNVs at the lower frequency range. Andreas |
![]() |
![]() |
![]() |
#15 |
Junior Member
Location: San Diego Join Date: Dec 2013
Posts: 6
|
![]()
Hi Andreas,
I used SRMA to perform local realignment so indells can be detected. However, Lofreq only generates null bam file after SRMA treatment. Is there a fix to this situation? I can see my INDEL clearly from looking at my bam file on IGV. Thanks, Joe |
![]() |
![]() |
![]() |
#16 |
Member
Location: Singapore Join Date: Nov 2010
Posts: 30
|
![]()
Hi Joe,
I'm sorry, but currently available versions of LoFreq can only predict substitutions, but not indels (see earlier post in this thread). Version 2.1 of LoFreq will be able to predict indels and should be available within the next two months. Andreas |
![]() |
![]() |
![]() |
#17 |
Junior Member
Location: San Diego Join Date: Dec 2013
Posts: 6
|
![]()
Thanks, Andrea! LoFreq rocks.
|
![]() |
![]() |
![]() |
Tags |
diploid, haploid, samtools, snp calling, variant calling |
Thread Tools | |
|
|