Seqanswers Leaderboard Ad

**GenoMax** · 04-30-2020, 03:02 AM

It may be best to use "mutate.sh" (see its in-line help) to introduce the mutations ~~after you generate the reads with randomreads.sh~~. You will get a VCF file of the changes.

I think using "local=t" is causing the alignment issues. It is meant for local alignment, when you expect errors at the ends of reads.

**DPalachan** · 04-30-2020, 04:30 AM

> Originally posted by GenoMax
>
> It may be best to use "mutate.sh" (see its in-line help) to introduce the mutations after you generate the reads with randomreads.sh. You will get a VCF file of the changes.
>
> I think using "local=t" is causing the alignment issues. It is meant for local alignment, when you expect errors at the ends of reads.

Thank you for your ultra-fast reply, GenoMax! I have a question about your comment. Did you mean to use "mutate.sh" before generating the reads? I actually tried that:
1) First mutate my original sequence:

```bash
./mutate.sh in=reference.fa out=reference_Mut.fa vcf=Variants.vcf subrate=0.01 indelrate=0.005 maxindel=3 overwrite=t
```

The VCF file indeed looks super!

2) Then generate reads from the mutated sequence, with no additional flags apart from minimum quality:

```bash
./randomreads.sh ref=reference_Mut.fa out1=Sample1_R1.fastq out2=Sample1_R2.fastq coverage=10000 paired=t minq=12 -Xmx10g
```

3) Map the reads to the mutated reference (without the "local" option):

```bash
./bbmap.sh ref=reference_Mut.fa in1=Sample1_R1.fastq in2=Sample1_R2.fastq covstats=Covstats.cov
```

This yields a 99.905% mapping rate, which makes sense; I assume the remainder is lost to some quality filter.
4) Map the reads to the original reference.

```bash
./bbmap.sh ref=reference.fa in1=Sample1_R1.fastq in2=Sample1_R2.fastq covstats=Covstats.cov
```

This yielded a 99.875% mapping rate, which also makes sense.
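To check the Variants.vcf from step 1 beyond eyeballing it, one option is to count records by type. This is a minimal sketch with a hypothetical helper (`summarize_vcf` is not a BBTools command); it assumes an uncompressed VCF with one ALT allele per record and classifies each record only by comparing REF and ALT lengths, so a multi-base substitution is counted as SUB.

```bash
# Hypothetical helper: count VCF records as substitutions, insertions or
# deletions by comparing the lengths of REF (column 4) and ALT (column 5).
# Assumes an uncompressed, tab-separated VCF with a single ALT per record.
summarize_vcf() {
  awk -F'\t' '
    /^#/ { next }                  # skip meta and header lines
    {
      r = length($4); a = length($5)
      if (r == a)     sub_n++      # same length: substitution
      else if (a > r) ins_n++      # ALT longer than REF: insertion
      else            del_n++      # ALT shorter than REF: deletion
    }
    END { printf "SUB\t%d\nINS\t%d\nDEL\t%d\n", sub_n, ins_n, del_n }
  ' "$1"
}

# Usage:
# summarize_vcf Variants.vcf
```

With subrate=0.01 and indelrate=0.005 you would expect roughly twice as many SUB records as INS and DEL combined, if mutate.sh applies those rates per base as the names suggest.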

My follow-up question, then: can I safely assume that the fastq files generated with the method above carry each variant at 100% frequency, i.e. every read covering a variant position shows the variant?

If that assumption is correct, what would be the best way to control the variant frequency? I'm thinking along these lines:
1) Generate perfect reads from the un-mutated reference.
2) Generate perfect reads from the mutated sequence.
3) Mix the two file sets in various proportions to get the desired effect, producing a new set of files.
Is that logic sound, or is there already a way to do that?
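Step 3 of that plan can be sketched with standard tools. This is an illustration under assumptions, not a BBTools feature: `mix_pairs` and the file names are hypothetical, records are assumed to be plain 4-line FASTQ with mates in the same order in R1 and R2, and taking the first N pairs is only a fair sample if the files are not sorted by position (randomreads.sh output should be in random order, but check yours).

```bash
# Hypothetical helper: build a mixed paired-end set from the first n_mut
# pairs of the mutated-reference files and the first n_ref pairs of the
# original-reference files. Assumes 4-line FASTQ records.
mix_pairs() {
  local mut1=$1 mut2=$2 ref1=$3 ref2=$4 n_mut=$5 n_ref=$6 out1=$7 out2=$8
  # N read pairs = 4*N lines in each of R1 and R2.
  { head -n $((4 * n_mut)) "$mut1"; head -n $((4 * n_ref)) "$ref1"; } > "$out1"
  { head -n $((4 * n_mut)) "$mut2"; head -n $((4 * n_ref)) "$ref2"; } > "$out2"
}

# Usage (80% of pairs from the mutated reference, 20% from the original):
# mix_pairs Mut_R1.fastq Mut_R2.fastq Ref_R1.fastq Ref_R2.fastq \
#           800000 200000 Mix_R1.fastq Mix_R2.fastq
```

Because the same number of records is taken from each R1/R2 file in the same order, mates stay paired in the output.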

Thank you in advance, I really appreciate it!

**GenoMax** · 04-30-2020, 05:54 AM

Perhaps I am missing a subtle point, but since you can control mutation types and rates with mutate.sh, would it not be better to go with just that data (#2 in the list above)? If you mix reads from the mutated and un-mutated references, you can't be sure that the fraction of mutations will stay at the same level in the new pool.

**DPalachan** · 04-30-2020, 06:26 AM

Hi GenoMax! Yes, you are right: the fraction of mutations may change in the new pool, but, based on the mixing percentages (in my scenario), I'll be able to control the level of mutation per position. So, say at position 100, I could go from 100% MUT to 80% MUT by mixing my mutated-reference fastq and original-reference fastq at 80/20 percent of reads, respectively.
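The arithmetic behind that 80/20 example, and one reason a mix may not hit its target exactly, can be written down directly. Assuming both read sets cover the site uniformly and every read from the mutated set carries the variant, the expected fraction is f = n_mut / (n_mut + n_ref), and the observed fraction at a site of depth d scatters around f with a binomial standard deviation of roughly sqrt(f(1 - f)/d). `expected_vaf` is a hypothetical helper, and the binomial model is an approximation.

```bash
# Hypothetical helper: expected per-site variant fraction after mixing
# n_mut mutated-set reads with n_ref original-set reads, plus the
# approximate binomial standard deviation at depth d.
expected_vaf() {
  awk -v m="$1" -v r="$2" -v d="$3" 'BEGIN {
    f  = m / (m + r)              # share of reads coming from the mutated set
    sd = sqrt(f * (1 - f) / d)    # binomial SD of the observed fraction
    printf "%.3f %.3f\n", f, sd
  }'
}

# Usage:
# expected_vaf 80 20 10000   # prints "0.800 0.004"
```

So at 10,000x depth an 80/20 mix should land near 80% ± 0.4% per site under this model; the spread grows as depth falls, and uneven coverage or mapping bias near indels would add deviation the model does not capture.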

**GenoMax** · 04-30-2020, 06:31 AM

You could try it out and see if it works (I have a hunch it won't be perfect).

If that does not work acceptably, then I suggest running mutate.sh multiple times with new values to create new datasets.
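Running mutate.sh several times with new values is a simple loop. This sketch reuses only the flags from the mutate.sh command earlier in the thread and varies `subrate`; `sweep_mutate`, `run_or_echo`, the `DRY_RUN` variable and the output naming scheme are hypothetical, and by default the commands are only printed so they can be checked before anything runs.

```bash
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run_or_echo() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
}

# Hypothetical sweep: one mutate.sh run per substitution rate, writing a
# separate mutated reference and VCF for each rate.
sweep_mutate() {
  local ref=$1 rate
  shift
  for rate in "$@"; do
    run_or_echo ./mutate.sh in="$ref" out="reference_Mut_${rate}.fa" \
      vcf="Variants_${rate}.vcf" subrate="$rate" indelrate=0.005 maxindel=3 overwrite=t
  done
}

# Usage:
# sweep_mutate reference.fa 0.005 0.01 0.02              # print only
# DRY_RUN=0 sweep_mutate reference.fa 0.005 0.01 0.02    # actually run
```

Each mutated reference can then go through the same randomreads.sh and bbmap.sh steps as above.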

**DPalachan** · 04-30-2020, 06:50 AM

Thank you, I'll try it and report back!

BBmap generation and mapping of artificial paired-end fastq