SEQanswers

Old 09-13-2012, 08:28 AM   #1
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381
Orientation of mate-pairs

It seems our MiSeq mate-paired data isn't orientated as 5'--3' 3'--5'

In a sequence that would be
5'-------------3'
We see

5'----- 3'-----

Additionally each FASTQ file is mixed with both mate pair ends.
Is there a tool which will reverse one of these fastq files to get

5'----- -----3'

??

Thanks,
J
Old 09-13-2012, 12:55 PM   #2
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

How did you prepare your library? What file are you looking at? What application are these being used for?

Do you really mean mate pairs, which in the Illumina and Ion Torrent worlds are prepared by circularizing DNA and removing a large portion of it, or paired ends (the more conventional reads generated from Nextera or TruSeq libraries with no circularization step)?

Standard FASTQ files from paired end data report each read in its forward orientation. FLASH is an excellent tool for merging them if they overlap. If they don't overlap, I think in general you are better off leaving them separate (not reverse-complementing one & adding a string of N), as most tools can use that more profitably.

Depending on how you prepared your template, the reads may or may not be expected to be oriented relative to some reference. An aligner (e.g. Bowtie, BWA) will align them in either direction to the reference.
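
If you do decide to reverse-complement one read of a pair, here is a minimal Python sketch of that step. It is not taken from any particular tool; the function names are made up for illustration. The one detail that is easy to miss is that the quality string has to be reversed along with the bases so each score stays with its base.

```python
# Translation table for DNA complement, covering N and lowercase bases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_complement_record(header, seq, qual):
    """Reverse-complement one FASTQ record.

    The quality string is reversed (not complemented) so that every
    score still lines up with the base it belongs to.
    """
    return header, reverse_complement(seq), qual[::-1]


# Illustrative record only; not real data.
header, seq, qual = reverse_complement_record("@read1", "ACGTTN", "IIHHG#")
print(header, seq, qual)
```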
Old 09-14-2012, 08:34 AM   #3
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

Sorry, yes paired-ends, and directionality was not incorporated into the library prep.

The paired ends are either end of a short PCR amplicon.
Reverse complementing the reverse (second) sequence would work; however, because no directionality was employed, both FASTQ files contain 5'--3' and 3'--5' sequences. Essentially what I need to do is separate sequences based on priming sequence (i.e. forward and reverse). Then I can reverse complement all reverse 3'--5' sequences.

Is there a tool to filter out sequences based on a known string of bases?

Cheers,

J
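
For reference, separating reads by a known leading string can be sketched in a few lines of Python. This is only an illustration: the primer sequences and reads below are invented, the parser assumes a plain four-lines-per-record FASTQ, and the match is exact, so any read with a sequencing error in the primer lands in the unassigned bin. It also assumes the read starts with the primer; if a barcode comes first, it would need trimming beforehand.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual


def split_by_primer(records, fwd_primer, rev_primer):
    """Bin records as 'fwd', 'rev' or 'unassigned' by exact primer prefix."""
    bins = {"fwd": [], "rev": [], "unassigned": []}
    for rec in records:
        seq = rec[1]
        if seq.startswith(fwd_primer):
            bins["fwd"].append(rec)
        elif seq.startswith(rev_primer):
            bins["rev"].append(rec)
        else:
            bins["unassigned"].append(rec)
    return bins


# Hypothetical primers and reads, for illustration only.
FWD_PRIMER = "ACGTAC"
REV_PRIMER = "TTGCAA"
demo = io.StringIO(
    "@r1\nACGTACGGGG\n+\nIIIIIIIIII\n"
    "@r2\nTTGCAACCCC\n+\nIIIIIIIIII\n"
    "@r3\nGGGGGGGGGG\n+\nIIIIIIIIII\n"
)
bins = split_by_primer(read_fastq(demo), FWD_PRIMER, REV_PRIMER)
for name, recs in bins.items():
    print(name, [r[0] for r in recs])
```

The reads in the 'rev' bin are the ones that would then be reverse complemented.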
Old 09-14-2012, 08:45 AM   #4
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

Do you think a simple Perl script using grep could do this?
Old 09-14-2012, 10:37 AM   #5
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

I'm not sure that you need to bother. bwa will just align them; I don't think it will care if some of them are not oriented right.

But that result is very strange. I'd stop and make sure that there isn't some serious error in your experiment or analysis, because it is not normal to have a whole lot of reads both in the same orientation.
Old 09-14-2012, 07:20 PM   #6
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

There are a number of tools for dealing with amplicon sequencing, though often with fusion primers (sequencing adapters built into primers, so everything is oriented). E.g. PANDA.

If it is a short amplicon, then FLASH will merge the reads but won't help you with orienting.

Trying to use grep/Perl regexp for this is problematic, as you will probably have errors in the reads; aligners tolerate these. But since you have two bites at the apple, you might attempt it (i.e. try matching first one end, then the other, to determine orientation). I would just use an aligner to solve the problem.
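
If you do try the matching route, a rough Python sketch of the "two bites" idea might look like the following. It is an illustration, not a replacement for an aligner: the primers and reads are invented, the function names are hypothetical, and only substitutions are tolerated (an indel in the primer region will still break the match). Each mate gets a chance to vote on the orientation of the pair.

```python
def mismatches(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def starts_with(seq, primer, max_mismatch):
    """True if seq begins with primer, allowing up to max_mismatch substitutions."""
    if len(seq) < len(primer):
        return False
    return mismatches(seq[:len(primer)], primer) <= max_mismatch


def pair_orientation(read1, read2, fwd, rev, max_mismatch=2):
    """Classify a read pair as 'fwd', 'rev' or 'unknown'.

    'fwd': read 1 carries the forward primer and/or read 2 the reverse primer.
    'rev': the opposite arrangement.
    'unknown': no evidence, or the two mates disagree.
    """
    fwd_votes = (starts_with(read1, fwd, max_mismatch)
                 + starts_with(read2, rev, max_mismatch))
    rev_votes = (starts_with(read1, rev, max_mismatch)
                 + starts_with(read2, fwd, max_mismatch))
    if fwd_votes > rev_votes:
        return "fwd"
    if rev_votes > fwd_votes:
        return "rev"
    return "unknown"


# Hypothetical primers, for illustration only.
FWD = "ACGTACGT"
REV = "TTGCAAGG"

# Read 1 has one substitution in the forward primer; still called 'fwd'.
print(pair_orientation("ACGAACGTGGGG", "TTGCAAGGCCCC", FWD, REV))
# Read 2 is uninformative, but read 1 alone identifies the pair as 'rev'.
print(pair_orientation("TTGCAAGGAAAA", "NNNNNNNNAAAA", FWD, REV))
```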
Old 09-18-2012, 05:07 PM   #7
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

Thanks for the input.
From the same MiSeq run I have RADseq data that produced two fastq files, one of which contained all forward reads, and the second all reads in the same orientation that needed to be reverse complemented.

The data that seems to be problematic used the TruSeq amplicon library prep.
So each fastq file shows a mixture of orientated reads:

Fastq 1
BARCODE--F.PRIMER--NNNN
BARCODE--R.PRIMER--NNNN

Fastq 2
BARCODE--R.PRIMER--NNNN
BARCODE--F.PRIMER--NNNN

I'm guessing that the programs designed to join overlapping PE reads will not be able to take into consideration the mixture of Fwd and Rev sequences in each file?
Is this normal, or should I be seeing all Fwd and Rev primer sequences in separate Fastqs?

Sorry if my question is repetitive/obvious to answer. New to Illumina data.

Cheers,
J
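
One possible way to handle the mixed layout shown above is to re-order each pair so the forward-primer read always comes first, swapping the mates when needed. The Python sketch below is only that, a sketch: the barcode, primers and reads are invented, the primer is searched for by exact match within an assumed window after the barcode, and read names are left untouched. Whether a given read-joining program needs this step at all is not something the sketch settles.

```python
def primer_offset(seq, primer, window=20):
    """Return the start index of primer if it begins within the first
    `window` bases of seq (e.g. just after a barcode), else -1."""
    return seq.find(primer, 0, window + len(primer))


def order_pair(rec1, rec2, fwd, rev, window=20):
    """Return (forward_record, reverse_record), swapping mates if needed.

    Records are (header, sequence, quality) tuples. Returns None when the
    primers are not found in a consistent arrangement.
    """
    seq1, seq2 = rec1[1], rec2[1]
    if primer_offset(seq1, fwd, window) >= 0 and primer_offset(seq2, rev, window) >= 0:
        return rec1, rec2
    if primer_offset(seq1, rev, window) >= 0 and primer_offset(seq2, fwd, window) >= 0:
        return rec2, rec1
    return None


# Hypothetical barcode, primers and reads, for illustration only.
BARCODE = "AACC"
FWD = "ACGTACGT"
REV = "TTGCAAGG"
mate_a = ("@pair1/1", BARCODE + REV + "GGGG", "I" * 16)
mate_b = ("@pair1/2", BARCODE + FWD + "CCCC", "I" * 16)

ordered = order_pair(mate_a, mate_b, FWD, REV)
print([rec[0] for rec in ordered])
```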

All times are GMT -8.