JackieBadger 06-20-2012 03:55 PM

MiSeq Amplicon: grouping indexed FASTQs by cluster

I apologize in advance if this is common knowledge or a repeat question.

I am new to Illumina and will be getting data from the MiSeq very soon.
We are paired-end sequencing thousands of indexed/barcoded amplicons, each of which may contain between 5 and 30 variants of the target region. It just so happens that our target region is ~200 bp, so we can cover the whole sequence with a little overlap by using paired ends. However, we need to be able to sort each FASTQ and group the reads by cluster, so as to isolate and align the sequences from each individual variant within a sample. Allocating reads to unique clusters is important to avoid building contigs between similar yet phylogenetically distinct variants (i.e. chimeras) within each sample.

Is there a program that will group the sequences within each FASTQ and align the sequences within each cluster, so as to join the paired ends and create our full-length target contig?
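
To make the request concrete, here is a rough sketch of the per-sample step, not a finished tool. It assumes that the two reads of a pair overlap at their 3' ends, that an ungapped suffix/prefix comparison is good enough to find that overlap, and that "grouping" can start from pooling identical merged sequences. The function names (merge_pair, group_by_sequence) and the thresholds (min_overlap, max_mismatch_frac) are made up for illustration. Because each contig is built from a single read pair only, it cannot splice reads from two different variants together.

```python
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.0):
    """Join one read pair into a single contig, or return None.

    The reverse read is reverse-complemented, then the longest ungapped
    overlap between the end of `fwd` and the start of that sequence is
    accepted if its mismatch fraction is within `max_mismatch_frac`.
    Mismatching bases in the overlap are taken from the forward read.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    rev_rc = reverse_complement(rev)
    longest = min(len(fwd), len(rev_rc))
    for olap in range(longest, min_overlap - 1, -1):
        tail, head = fwd[-olap:], rev_rc[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olap:
            return fwd + rev_rc[olap:]
    return None


def group_by_sequence(pairs, min_overlap=10, max_mismatch_frac=0.0):
    """Merge every (fwd, rev) pair and count identical merged contigs.

    Returns a Counter mapping contig sequence -> number of read pairs.
    Pairs whose reads cannot be merged are skipped.
    """
    groups = Counter()
    for fwd, rev in pairs:
        merged = merge_pair(fwd, rev, min_overlap, max_mismatch_frac)
        if merged is not None:
            groups[merged] += 1
    return groups
```

With the default max_mismatch_frac of 0.0 the overlap has to match exactly, so in practice a sequencing error in the overlap would cause a pair to be dropped; I assume a real tool would use the quality scores here, and would also cluster near-identical contigs rather than only exact duplicates.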

If this could then be implemented in batch so as to run through all FASTQ files and produce a new file containing contigs, that would be desirable.
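
For the batch part, something along these lines is what I have in mind. This is only a sketch: it assumes each sample comes as two files whose names differ only in an "_R1_" / "_R2_" tag, that the files are plain or gzipped four-line FASTQ, and that both files list the reads in the same order. The helper names (find_pairs, read_fastq, iter_read_pairs) are hypothetical.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fastq(path):
    """Yield (read_id, sequence, quality), assuming 4-line FASTQ records."""
    with open_text(path) as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break  # end of file (or a blank line)
            seq = handle.readline().rstrip()
            handle.readline()  # skip the '+' separator line
            qual = handle.readline().rstrip()
            # Keep only the part of the header shared by both mates.
            read_id = header[1:].split()[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id, seq, qual


def find_pairs(directory):
    """Return sorted (R1, R2) path tuples for files that have both mates.

    Assumes the naming convention '<sample>_R1_<suffix>.fastq[.gz]'.
    """
    pairs = []
    for r1 in sorted(Path(directory).glob("*_R1_*.fastq*")):
        # Swap only the last '_R1_' so sample names containing it survive.
        r2 = r1.with_name("_R2_".join(r1.name.rsplit("_R1_", 1)))
        if r2.exists():
            pairs.append((r1, r2))
    return pairs


def iter_read_pairs(r1_path, r2_path):
    """Yield (read_id, fwd_seq, rev_seq) for mates in matching order."""
    mates = zip(read_fastq(r1_path), read_fastq(r2_path))
    for (id1, seq1, _), (id2, seq2, _) in mates:
        if id1 != id2:
            raise ValueError(f"read order differs: {id1} vs {id2}")
        yield id1, seq1, seq2
```

Looping over find_pairs("some_run_directory") and passing each pair of paths to iter_read_pairs would then give the per-sample read pairs to merge and group, with one output file of contigs written per sample.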

I'm thinking that if this does not exist, then someone with the know-how could fairly easily write some code to perform this function?

Any insight would be much appreciated.



