**krobison** · 12-16-2009, 07:37 AM

There are some programs out there; see this previous thread:

Simulated Dataset of Solexa - SEQanswers

http://seqanswers.com/forums/showthread.php?t=806

This is my personal one; it needs some more work (it doesn't generate quality information, and the errors are evenly distributed along the read), but it is useful as a baseline.

#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Statistics::Distrib::Normal;

# quick & dirty Illumina read simulator
# Keith Robison. Infinity Pharmaceuticals Inc

# heavily modified from http://wiki.bioinformatics.ucdavis.e...asta_reference

## several flaws in original modified
## 1) proper FASTA reading via Bioperl
## 2) handles multiple input sequences correctly
## 3) reads can come from either strand
## 4) depth of reads set with constant instead of fixed number of reads

my $insertMean=200;
my $insertSd=20;
my $dist = Statistics::Distrib::Normal->new(mu => $insertMean,
                                            sigma => $insertSd);

my $coverage=40;
my $readLen=50;
my $bases="ATCG";
my $stem="reads";
if (@ARGV && $ARGV[0] eq '-s') # -s stem where stem will be the beginning of output file names
{
    shift(@ARGV); $stem=shift(@ARGV);
}
open(FWD,'>',"$stem.1.fasta") or die "Cannot write $stem.1.fasta: $!";
open(REV,'>',"$stem.2.fasta") or die "Cannot write $stem.2.fasta: $!";
my $sourceSeq=0;
foreach my $arg(@ARGV)
{
    my $seqFile=Bio::SeqIO->new(-file=>$arg,-format=>"Fasta");
    while (my $seq=$seqFile->next_seq())
    {
        $sourceSeq++;
        my $seqLen=$seq->length;
        my $seqStr=$seq->seq; # fetch the sequence string once, not once per read
        # number of fragments; each one yields a forward and a reverse read
        my $nReadsToGenerate=int($seqLen/$readLen * $coverage);
        my @fragmentSizes=$dist->rand($nReadsToGenerate);

        for (my $i=0; $i<scalar(@fragmentSizes); $i++) # start at 0 so no fragment is skipped
        {
            my $readId=$i+1; # read names stay 1-based
            my $fragmentSize=int($fragmentSizes[$i]); # normal deviates are not integers
            # skip fragments too short for a read or longer than the source sequence
            next if $fragmentSize<$readLen || $fragmentSize>$seqLen;
            my $pos=int(rand($seqLen-$fragmentSize+1)); # every valid start is equally likely
            my $fragment=substr($seqStr,$pos,$fragmentSize);
            if (rand()>=0.5) # reverse complement
            {
                $fragment=reverse($fragment);
                $fragment=~tr/ATCG/TAGC/;
            }
            for (my $j=0; $j<$readLen; $j++)
            {
                if (rand()<0.001) # error in the forward read
                {
                    substr($fragment,$j,1)=substr($bases,int(rand(4)),1);
                }
                if (rand()<0.001) # error in the reverse read; -1 keeps the index inside the fragment
                {
                    substr($fragment,$fragmentSize-1-$j,1)=substr($bases,int(rand(4)),1);
                }
            }

            print FWD ">$sourceSeq-$readId\n",substr($fragment,0,$readLen),"\n";
            my $revRead=reverse(substr($fragment,$fragmentSize-$readLen,$readLen));
            $revRead=~tr/ATCG/TAGC/;
            print REV ">$sourceSeq-$readId\n",$revRead,"\n";
        }
    }
}
close(FWD);
close(REV);
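
The two gaps mentioned above (no quality information, errors spread evenly along the read) could be approached roughly as in the sketch below. It is not part of the original script: the helper names `error_prob`, `phred_char` and `add_errors` are hypothetical, and the linear ramp from 0.1% error at the 5' end to 2% at the 3' end is an illustrative assumption, not a calibrated Illumina error model. The only standard pieces are the Phred relation Q = -10*log10(p) and the Phred+33 character encoding used in Sanger-style FASTQ; the cap at Q41 is also an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers; none of these exist in the simulator above.

# Error probability at 0-based position $pos in a read of length $len,
# rising linearly from $pStart (5' end) to $pEnd (3' end).
# The linear shape and its endpoints are assumptions for illustration.
sub error_prob {
    my ($pos, $len, $pStart, $pEnd) = @_;
    return $pStart if $len < 2;
    return $pStart + ($pEnd - $pStart) * $pos / ($len - 1);
}

# Phred+33 quality character for an error probability p,
# using Q = -10*log10(p), rounded and clamped to 0..41 (assumed cap).
sub phred_char {
    my ($p) = @_;
    return chr(41 + 33) if $p <= 0;      # avoid log(0)
    my $q = int(-10 * log($p) / log(10) + 0.5);
    $q = 0  if $q < 0;
    $q = 41 if $q > 41;
    return chr($q + 33);
}

# Introduce substitutions into a read and build the matching quality string.
sub add_errors {
    my ($read, $pStart, $pEnd) = @_;
    my $len  = length $read;
    my $qual = '';
    for my $j (0 .. $len - 1) {
        my $p = error_prob($j, $len, $pStart, $pEnd);
        if (rand() < $p) {
            # choose a base that differs from the current one,
            # so every error event really changes the read
            my @alt = grep { $_ ne substr($read, $j, 1) } qw(A C G T);
            substr($read, $j, 1) = $alt[int(rand(scalar @alt))];
        }
        $qual .= phred_char($p);
    }
    return ($read, $qual);
}

# Example: one 40-base read written as a FASTQ record.
my ($read, $qual) = add_errors('ACGT' x 10, 0.001, 0.02);
print "\@read1\n$read\n+\n$qual\n";
```

Here the quality string reports the probability that was actually used to inject errors, so a downstream tool that trusts the qualities sees a self-consistent data set. Swapping the linear ramp for an empirical per-cycle profile would only require changing `error_prob`.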


In silico data sets from BACs for GAII Illumina
