Hi,
I have two questions about Dindel, please.
I am using Dindel to identify short indels from Illumina next-generation sequencing data generated in a custom pull-down experiment. I used the exons of 34 genes to design a library of 60 bp baits (a custom library from Agilent), pulled down genomic DNA from diseased and normal samples with this library, and sequenced it on an Illumina HiSeq.
Because the pull-down library is small, I have very good coverage of my target regions. When I calculate the coverage per bait, this is an example of what I get (most 60 bp baits have more than 10,000 reads):
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0   10460   14220   13680   17590   30920
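For reference, the summary above is a six-number summary of the per-bait read counts. A minimal sketch of the same calculation in Python, assuming the read count for each bait is already available as a list; the function name and the example counts are hypothetical and are not my real data:

```python
from statistics import mean, quantiles


def coverage_summary(read_counts):
    """Return min, quartiles, mean and max of per-bait read counts."""
    counts = sorted(read_counts)
    # method="inclusive" treats the data as the full population of baits
    # and interpolates linearly between neighbouring values.
    q1, median, q3 = quantiles(counts, n=4, method="inclusive")
    return {
        "min": counts[0],
        "q1": q1,
        "median": median,
        "mean": mean(counts),
        "q3": q3,
        "max": counts[-1],
    }


# Hypothetical per-bait read counts, for illustration only.
example_counts = [0, 9500, 12000, 15000, 21000]
for name, value in coverage_summary(example_counts).items():
    print(name, value)
```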
I used dindel-1.01. I reduced the BAM file to my target regions with the samtools view command, and indexed the reference and BAM files.
Stage 1 ran OK:
dindel --analysis getCIGARindels --bamFile sample.bam \
--outputFile sample.dindel_output --ref ref.fa
Stage 2 (realignment windows) ran OK using makeWindows.py. This step produced 18 windows.
Q1: Stage 3 issue (first question)
dindel-1.01 --analysis indels --doDiploid --bamFile / --ref / --varFile / --libFile / --outputFile /
This stage causes problems, possibly because of my high coverage. This is the kind of error I get:
##skipped Chr9 6032047 reason: error_above_read_count_threshold
##skipped Chr9 6032047 reason: error_above_read_count_threshold
I have seen that there is a read-count option, --maxRead, which is currently set to 10000. Is that too small for very high coverage data? Is it OK to increase it?
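To see how many positions are affected, the skipped messages can be tallied by reason. A minimal sketch, assuming every message has the same layout as the two lines quoted above (`##skipped <chrom> <pos> reason: <reason>`); the function name is hypothetical and I have not checked whether Dindel writes other message formats:

```python
from collections import Counter


def tally_skipped(lines):
    """Count '##skipped' messages per reason and collect the unique positions."""
    reasons = Counter()
    positions = set()
    for line in lines:
        if not line.startswith("##skipped"):
            continue
        fields = line.split()
        # Assumed layout: ##skipped <chrom> <pos> reason: <reason>
        if len(fields) < 5 or fields[3] != "reason:":
            continue
        chrom, pos, reason = fields[1], int(fields[2]), fields[4]
        reasons[reason] += 1
        positions.add((chrom, pos))
    return reasons, positions


# The two messages quoted above.
log_lines = [
    "##skipped Chr9 6032047 reason: error_above_read_count_threshold",
    "##skipped Chr9 6032047 reason: error_above_read_count_threshold",
]
reasons, positions = tally_skipped(log_lines)
print(dict(reasons))
print(sorted(positions))
```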
Q2: Another question. I have two sets of data: a diseased sample and a non-diseased sample from the same individual. Is there a way to use the non-diseased sample as a reference, so that indels also present in it are ignored in the diseased sample? (I hope I am making sense.)
Many thanks
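To make Q2 concrete, this is roughly what I have in mind: call indels in each sample separately, then keep only the ones seen in the diseased sample and not in the normal one. A minimal sketch, assuming the calls for each sample are available as VCF-style tab-separated lines (CHROM, POS, ID, REF, ALT, ...); the function names and the records are hypothetical, and this is not a Dindel feature as far as I know:

```python
def read_indels(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys for indel records in VCF-style lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete record
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        for allele in alt.split(","):
            if len(allele) != len(ref):  # keep indels only
                keys.add((chrom, pos, ref, allele))
    return keys


def disease_only(disease_lines, normal_lines):
    """Return indels called in the diseased sample but not in the normal one."""
    return sorted(read_indels(disease_lines) - read_indels(normal_lines))


# Hypothetical records, for illustration only.
disease = ["Chr9\t6032047\t.\tAT\tA", "Chr9\t6040000\t.\tG\tGCC"]
normal = ["Chr9\t6032047\t.\tAT\tA"]
for indel in disease_only(disease, normal):
    print(indel)
```

I realise exact matching like this would miss the same indel reported at a slightly different position in the two samples, and says nothing about whether the normal sample had enough coverage at that site.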