04-30-2020, 04:30 AM   #3
Originally Posted by GenoMax
It may be best to use "" (to look at in-line help) to introduce the mutations after you generate the reads with You will get a VCF file of changes.

I think using "local=t" is causing the alignment issues. It is meant for local alignments when you expect errors at end of reads.
Thank you for your ultra-fast reply, GenoMax! I have a question about your comment. Did you mean to use "" before generating the reads? I actually tried that:
1) First mutate my original sequence:
./ in=reference.fa out=reference_Mut.fa vcf=Variants.vcf subrate=0.01 indelrate=0.005 maxindel=3 overwrite=t
The VCF file indeed looks great!
2) Then generate reads based on the new sequence, without any additional flags (only min quality):
./ ref=reference_Mut.fa out1=Sample1_R1.fastq out2=Sample1_R2.fastq coverage=10000 paired=t minq=12 -Xmx10g
3) Map the reads to the mutated reference (without the "local" option):
./ ref=reference_Mut.fa in1=Sample1_R1.fastq in2=Sample1_R2.fastq covstats=Covstats.cov
This gives a 99.905% mapping rate, which makes sense; I assume the small shortfall comes from some quality filter.
4) Map the reads to the original reference:
./ ref=reference.fa in1=Sample1_R1.fastq in2=Sample1_R2.fastq covstats=Covstats.cov
This gives a 99.875% mapping rate, which also makes sense.

My follow-up question is then: can I safely assume that the FASTQ files I'm generating with the method above contain the variants at a rate of 100%, i.e. that every read covering a variant site carries the variant?

If that assumption is correct, what would be the best way to control the variant frequency? I'm thinking along these lines:
1) Generate perfect reads from the unmutated reference.
2) Generate perfect reads from the mutated reference.
3) Mix the two read sets in various proportions to get the desired variant frequency and write a new set of files.
Is that logic sound, or is there already a way to do this?
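
To make the mixing idea in step 3 concrete, here is a minimal Python sketch. It is not an existing tool option, only an illustration under stated assumptions: both paired read sets hold the same number of pairs, records are plain 4-line FASTQ, and R1 record i pairs with R2 record i. The function names (read_fastq, mix_pairs) and the mut_fraction parameter are made up for this example. For each pair slot it takes the pair from the mutated set with probability mut_fraction and otherwise from the unmutated set, so the total number of pairs stays the same. If both sets were generated at the same coverage, the expected variant frequency at each site should be roughly mut_fraction. Reads are not renamed, and read-name collisions between the two sets are not checked.

```python
import itertools
import random


def read_fastq(path):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped 4-line records)."""
    with open(path) as handle:
        while True:
            record = tuple(itertools.islice(handle, 4))
            if len(record) < 4:
                break
            yield record


def mix_pairs(ref_r1, ref_r2, mut_r1, mut_r2, out_r1, out_r2,
              mut_fraction, seed=0):
    """Write a mixed paired FASTQ set.

    Each pair slot is filled from the mutated set with probability
    mut_fraction, otherwise from the unmutated set. Assumes both sets have
    the same number of pairs; extra pairs in the longer set are ignored.
    Returns (pairs taken from unmutated set, pairs taken from mutated set).
    """
    if not 0.0 <= mut_fraction <= 1.0:
        raise ValueError("mut_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so a mix can be reproduced
    kept_ref = 0
    kept_mut = 0
    with open(out_r1, "w") as o1, open(out_r2, "w") as o2:
        slots = zip(read_fastq(ref_r1), read_fastq(ref_r2),
                    read_fastq(mut_r1), read_fastq(mut_r2))
        for r1_ref, r2_ref, r1_mut, r2_mut in slots:
            if rng.random() < mut_fraction:
                o1.writelines(r1_mut)
                o2.writelines(r2_mut)
                kept_mut += 1
            else:
                o1.writelines(r1_ref)
                o2.writelines(r2_ref)
                kept_ref += 1
    return kept_ref, kept_mut
```

A call such as mix_pairs("Unmut_R1.fastq", "Unmut_R2.fastq", "Sample1_R1.fastq", "Sample1_R2.fastq", "Mix20_R1.fastq", "Mix20_R2.fastq", 0.2) would aim for about 20% variant reads; the Unmut_* and Mix20_* file names are hypothetical. Choosing per slot rather than concatenating two subsamples keeps R1 and R2 in sync and leaves total coverage unchanged.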

Thank you in advance, I really appreciate it!
DPalachan