
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
in silico restriction digest of Illumina reads (fastq file) | berthold | Bioinformatics | 2 | 05-13-2018 11:54 PM
mRNA-seq illumina GAII - microRNAs? | vebaev | RNA Sequencing | 3 | 09-16-2011 12:15 AM
public data sets | muchomaas | Bioinformatics | 2 | 06-08-2010 01:48 AM
sff_extract: combining data from 454 Flx and Titanium data sets | agroster | Bioinformatics | 7 | 01-14-2010 10:19 AM
Assembling pooled BACs from 454 data | westerman | Bioinformatics | 2 | 01-22-2009 05:23 AM

12-16-2009, 03:43 AM   #1
Junior Member
Location: the Netherlands

Join Date: May 2008
Posts: 5
In silico data sets from BACs for GAII Illumina

Hi there! I am currently planning a next-gen sequencing project on the Illumina GAII, but before I dive in, I have one question.

Basically, I want to do targeted sequencing of a 300 kb region.
The region is notorious for its copy-number variation and could easily vary in size by 100 kb.

So far, one conventional BAC clone of about 190 kb has been paired-end Sanger sequenced. Basically, I want to know whether alignment software can handle this repeat-filled region.

My question is the following: is there a software program to which I can give the 190 kb sequence as input, and which then does an in silico cutting/shearing similar to what would happen during the actual experiment? (It would be helpful if variable fragment sizes could be analysed.)

From these fragments, an in silico mate-paired data set is then generated and fed back into an alignment program. Basically, a new contig is generated, and this is then compared with the actual input sequence.

Any help or suggestions are much appreciated! Thanks
Posted by CG&R
12-16-2009, 06:37 AM   #2
Senior Member
Location: Boston area

Join Date: Nov 2007
Posts: 747

There are some programs out there; see the previous thread.

This is my personal one; it needs some more work (it doesn't generate quality information, and the errors are evenly distributed along the read), but it is useful as a baseline.

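use strict;
use warnings;
use Bio::SeqIO;
use Statistics::Distrib::Normal;

# quick & dirty Illumina read simulator
# Keith Robison. Infinity Pharmaceuticals Inc

# heavily modified from http://wiki.bioinformatics.ucdavis.e...asta_reference

## several flaws in original modified
## 1) proper FASTA reading via Bioperl
## 2) handles multiple input sequences correctly
## 3) reads can come from either strand
## 4) depth of reads set with constant instead of fixed number of reads

# insert (fragment) sizes are drawn from a normal distribution
my $insertMean=200;
my $insertSd=20;
my $dist = Statistics::Distrib::Normal->new(mu => $insertMean,
                                            sigma => $insertSd);

my $coverage=40;
my $readLen=50;
my $errorRate=0.001;  # per-base substitution rate, same at every read position
my $bases="ATCG";
my $stem="reads";
if (@ARGV && $ARGV[0] eq '-s') # -s stem where stem will be the beginning of output file names
{
    shift(@ARGV); $stem=shift(@ARGV);
}

# one FASTA file per mate; the _1.fa/_2.fa suffixes are an assumption
open(FWD, '>', "${stem}_1.fa") or die "cannot write ${stem}_1.fa: $!";
open(REV, '>', "${stem}_2.fa") or die "cannot write ${stem}_2.fa: $!";

my $sourceSeq=0;
foreach my $arg (@ARGV)
{
    my $seqFile=Bio::SeqIO->new(-file=>$arg,-format=>"Fasta");
    while (my $seq=$seqFile->next_seq())
    {
        $sourceSeq++;                 # numbers the input sequences in read names
        my $template=$seq->seq;
        my $seqLen=$seq->length;
        # number of read pairs; each pair adds 2*$readLen bases,
        # so the base-level depth is roughly twice $coverage
        my $nReadsToGenerate=int($seqLen/$readLen * $coverage);
        my @fragmentSizes=$dist->rand($nReadsToGenerate);

        for (my $i=0; $i<scalar(@fragmentSizes); $i++)
        {
            my $fragmentSize=int($fragmentSizes[$i]+0.5);
            # skip fragments shorter than one read or longer than the template
            next if ($fragmentSize<$readLen || $fragmentSize>$seqLen);
            my $pos=int(rand($seqLen-$fragmentSize+1));
            my $fragment=substr($template,$pos,$fragmentSize);
            if (rand()>=0.5) # reverse complement
            {
                $fragment=reverse($fragment);
                $fragment=~tr/ACGTacgt/TGCAtgca/;
            }
            # substitution errors; a replacement base may equal the original
            for (my $j=0; $j<$readLen; $j++)
            {
                if (rand()<$errorRate)  # position $j of the forward read
                {
                    substr($fragment,$j,1)=substr($bases,int(rand(4)),1);
                }
                if (rand()<$errorRate)  # position $j of the reverse read
                {
                    substr($fragment,$fragmentSize-1-$j,1)=substr($bases,int(rand(4)),1);
                }
            }

            print FWD ">$sourceSeq-$i\n",substr($fragment,0,$readLen),"\n";
            # the mate is the reverse complement of the fragment's other end
            my $revRead=reverse(substr($fragment,$fragmentSize-$readLen,$readLen));
            $revRead=~tr/ACGTacgt/TGCAtgca/;
            print REV ">$sourceSeq-$i\n",$revRead,"\n";
        }
    }
}
close(FWD);
close(REV);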
Posted by krobison
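The script above leaves two gaps that the post itself names: no quality information, and errors spread evenly along the read. As a minimal sketch of one way to close them (not part of the original script), the core-Perl subroutines below assume a hypothetical error profile that rises linearly from the first to the last base, encode each position's error probability as a Phred score (Q = -10*log10(p), capped at 40, ASCII offset 33), and emit a FASTQ record. The names error_profile, phred_char and to_fastq, the linear profile and the Q40 cap are illustrative assumptions, not a measured GAII error model.

```perl
use strict;
use warnings;

# Hypothetical position-dependent error model: probability rises linearly
# from $pStart at the first base to $pEnd at the last base (an assumption,
# not a measured GAII profile). Requires $readLen >= 2.
sub error_profile {
    my ($readLen, $pStart, $pEnd) = @_;
    return map { $pStart + ($pEnd - $pStart) * $_ / ($readLen - 1) }
        0 .. $readLen - 1;
}

# Phred quality Q = -10*log10(p), rounded, capped at 40 (assumed cap) and
# encoded with ASCII offset 33. $p must be > 0.
sub phred_char {
    my ($p) = @_;
    my $q = int(-10 * log($p) / log(10) + 0.5);
    $q = 40 if $q > 40;
    return chr($q + 33);
}

# Introduce substitutions per position using the profile, always switching
# to a different base, and return a four-line FASTQ record.
sub to_fastq {
    my ($name, $read, @p) = @_;
    my @b = split //, $read;
    for my $j (0 .. $#b) {
        if (rand() < $p[$j]) {
            my @alt = grep { $_ ne $b[$j] } qw(A C G T);
            $b[$j] = $alt[int(rand(@alt))];
        }
    }
    my $qual = join '', map { phred_char($_) } @p;
    return "\@$name\n" . join('', @b) . "\n+\n$qual\n";
}

# Example: one 50 bp read with the error rate rising from 0.1% to 1%.
my @profile = error_profile(50, 0.001, 0.01);
print to_fastq("1-0", "ACGT" x 12 . "AC", @profile);
```

With this profile the quality string runs from '?' (Q30) at the first base down to '5' (Q20) at the last. Unlike the script above, a substitution here always changes the base, so the realised error rate matches the profile.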


All times are GMT -8.
