SEQanswers

Old 06-09-2013, 09:34 AM   #1
impeachme2
Junior Member
 
Location: Florida

Join Date: Apr 2012
Posts: 6
Recommendations for aligning short fragments from Illumina PE

Hello,

I've recently performed Illumina PE (2x75) sequencing of enzymatically-fragmented genomic DNA, where the fragments going into the library prep ranged from 20 to 400 bp. I would like to align these reads to a reference genome to obtain locations and insert sizes, while discarding all reads that are not paired.

Based on this workflow, we assume (to use bowtie2 manual terminology and diagrams) that:

1) Mates may 'overlap' each other

Code:
Mate 1:    GCAGATTATATGAGTCAGCTACGATATTGTT
Mate 2:                               TGTTTGGGGTGACACATTACGCGTCTTTGAC
Reference: GCAGATTATATGAGTCAGCTACGATATTGTTTGGGGTGACACATTACGCGTCTTTGAC

2) Mates may 'contain' each other

Code:
Mate 1:    GCAGATTATATGAGTCAGCTACGATATTGTTTGGGGTGACACATTACGC
Mate 2:                               TGTTTGGGGTGACACATTACGC
Reference: GCAGATTATATGAGTCAGCTACGATATTGTTTGGGGTGACACATTACGCGTCTTTGAC


Mate 1:                   CAGCTACGATATTGTTTGGGGTGACACATTACGC
Mate 2:                      CTACGATATTGTTTGGGGTGAC
Reference: GCAGATTATATGAGTCAGCTACGATATTGTTTGGGGTGACACATTACGCGTCTTTGAC

3) Mates may 'dovetail' each other

Code:
Mate 1:                 GTCAGCTACGATATTGTTTGGGGTGACACATTACGC
Mate 2:            TATGAGTCAGCTACGATATTGTTTGGGGTGACACAT                   
Reference: GCAGATTATATGAGTCAGCTACGATATTGTTTGGGGTGACACATTACGCGTCTTTGAC

I would like to recover alignments and accurate insert sizes in all of these cases, including for the 20-bp fragments, whose mates may fall into any of the situations above.
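
To make the three cases concrete, here is a minimal Python sketch that labels a mate pair from its reference coordinates. It is a simplified illustration of the definitions, not bowtie2's actual implementation; the names `Aln`, `classify_pair`, and `outer_span` are hypothetical, and it assumes mate 1 is the forward-strand mate and that intervals are 0-based and half-open.

```python
from typing import NamedTuple


class Aln(NamedTuple):
    """Half-open reference interval [start, end) covered by one mate."""
    start: int
    end: int


def classify_pair(fwd: Aln, rev: Aln) -> str:
    """Label a forward/reverse mate pair by how the two alignments relate.

    Simplified reading of the overlap/contain/dovetail definitions;
    precedence between the cases is an assumption of this sketch.
    """
    # No shared reference bases at all.
    if fwd.end <= rev.start or rev.end <= fwd.start:
        return "separate"
    # One alignment lies entirely within the other (identical spans count).
    if (fwd.start <= rev.start and rev.end <= fwd.end) or (
        rev.start <= fwd.start and fwd.end <= rev.end
    ):
        return "contain"
    # The reverse mate begins upstream of the forward mate's start.
    if rev.start < fwd.start:
        return "dovetail"
    return "overlap"


def outer_span(fwd: Aln, rev: Aln) -> int:
    """Leftmost-to-rightmost reference span covered by the pair."""
    return max(fwd.end, rev.end) - min(fwd.start, rev.start)
```

Using offsets read off the diagrams above, `classify_pair(Aln(0, 31), Aln(27, 58))` corresponds to case 1, `classify_pair(Aln(0, 49), Aln(27, 49))` to case 2, and `classify_pair(Aln(13, 49), Aln(8, 44))` to case 3.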

After clipping 3' adapters and low-quality ends, and filtering reads of low quality, I've attempted to align these reads with bowtie2:

Code:
bowtie2 --dovetail --no-mixed --no-discordant --no-unal -x reference -1 mates1.fastq -2 mates2.fastq -S aligned.sam

The resulting SAM file contains only paired alignments. It includes 'insert sizes' for the 20+ bp fragments I am interested in, but also a very large population of 'insert sizes' equal to the read length itself (75 bp). Importantly, no large population of ~75-bp molecules went into the library prep.
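
For reference, the distribution can be tabulated with something like the Python sketch below. It assumes standard tab-separated SAM records, reads the observed template length from the TLEN column (field 9), and counts each pair once via the first-in-pair flag; the function name `tlen_histogram` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def tlen_histogram(sam_lines: Iterable[str]) -> Counter:
    """Count absolute TLEN values, one count per aligned pair."""
    counts: Counter = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & 0x4 or flag & 0x900:
            continue
        # Count each pair once, using the first-in-pair record (0x40).
        if not flag & 0x40:
            continue
        counts[abs(int(fields[8]))] += 1
    return counts
```

Typical use would be `with open("aligned.sam") as fh: hist = tlen_histogram(fh)`, followed by `hist.most_common(10)` to see which insert sizes dominate.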

It seems that this population of ~75-bp insert sizes is an alignment artifact. What can I do to test or resolve this? I cannot find another alignment program that explicitly states that it can handle mates that dovetail or contain each other.