SEQanswers

01-14-2015, 10:56 AM   #1
cburke04
Junior Member
 
Location: USA

Join Date: Jan 2015
Posts: 2
Creating pseudo paired-end sequencing reads from single-end reads

Does anyone know of a program that will generate pseudo paired-end sequencing reads from Illumina 100bp single-end reads? This is for an initial pass at looking for structural variation (large-scale inversions) between a reference genome and some high-coverage SE population genomics data that I have. I would like to use Pindel and some other methods that require paired-end sequencing data.
01-14-2015, 12:41 PM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

I'm not aware of such a program, but depending on exactly what you want, it wouldn't be difficult to write. My assumption is that you just want to split each 100bp read into two non-overlapping 50bp reads, which is simple (just reverse complement the sequence for read #2 and reverse its quality line). That should be doable in basically any language, though Biopython and BioPerl have the advantage of providing reverse complement functions, which means the whole thing could be a small handful of lines.
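For illustration, the split described above might look roughly like this in plain Python. This is only a sketch under stated assumptions: the function names (`reverse_complement`, `split_record`) and the "/1" and "/2" name suffixes are made up for the example, and it uses a standard-library translation table rather than Biopython or BioPerl so that it runs without extra dependencies.

```python
# Complement table for upper- and lower-case bases plus N (assumption:
# no other IUPAC ambiguity codes appear in the reads).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_record(name, seq, qual, mate_len=50):
    """Split one single-end record into a pseudo pair.

    Read 1 is the first mate_len bases as-is. Read 2 is the last
    mate_len bases, reverse complemented, with its quality string
    reversed so each score stays attached to its base.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) < 2 * mate_len:
        raise ValueError("read too short for two non-overlapping mates")
    read1 = (name + "/1", seq[:mate_len], qual[:mate_len])
    read2 = (name + "/2",
             reverse_complement(seq[-mate_len:]),
             qual[-mate_len:][::-1])
    return read1, read2
```

Calling `split_record(name, seq, qual)` on a 100bp read returns two 50bp records as `(name, sequence, quality)` tuples; reading and writing the FASTQ files around it is left out here.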
01-14-2015, 12:55 PM   #3
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

You may also need to put a gap between the paired ends. Say bases 1-45 and then 56-100. While I would tend to do what dpryan suggests (roll your own), that is just because it would be easy to do. Another thought is to use the FastX tools:

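fastx_trimmer to get the first 45 (or 50) bases for read 1.
fastx_reverse_complement and then fastx_trimmer to get the last 45 (or 50) bases for read 2 (after reverse complementing, these are the first 45 or 50 bases of the flipped read).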
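If you do roll your own, the gap falls out naturally from picking a mate length shorter than half the read. A rough sketch of a streaming version follows. Assumptions: plain 4-line FASTQ records, the hypothetical function name `write_pseudo_pairs`, and "/1" and "/2" appended to the whole header line (which may not be what you want if your headers carry a comment field).

```python
# Complement table for upper- and lower-case bases plus N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def write_pseudo_pairs(fastq_in, out1, out2, mate_len=45):
    """Write pseudo pairs from open file-like objects.

    With 100bp reads and mate_len=45, read 1 covers bases 1-45,
    read 2 covers bases 56-100 (reverse complemented), and the
    middle 10 bases form the gap. Returns the number of pairs written.
    """
    pairs = 0
    while True:
        header = fastq_in.readline().rstrip("\n")
        if not header:
            break  # end of file (or a blank line)
        seq = fastq_in.readline().rstrip("\n")
        fastq_in.readline()  # skip the '+' separator line
        qual = fastq_in.readline().rstrip("\n")
        if len(seq) < 2 * mate_len:
            continue  # too short for two non-overlapping mates
        tail_rc = seq[-mate_len:].translate(_COMPLEMENT)[::-1]
        tail_qual = qual[-mate_len:][::-1]
        out1.write(f"{header}/1\n{seq[:mate_len]}\n+\n{qual[:mate_len]}\n")
        out2.write(f"{header}/2\n{tail_rc}\n+\n{tail_qual}\n")
        pairs += 1
    return pairs
```

It takes already-open handles, so something like `with open("reads.fastq") as f, open("r1.fastq", "w") as a, open("r2.fastq", "w") as b: write_pseudo_pairs(f, a, b)` would be the typical call (file names here are placeholders). Reads shorter than twice the mate length are silently skipped, which may or may not suit your data.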
01-14-2015, 01:29 PM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

I have a program which does that...

bbfakereads.sh in=reads.fastq out1=r1.fastq out2=r2.fastq length=100

That will generate fake pairs from the input file, with whatever length you want (up to the input read length). We use it in some cases for generating a fake LMP library for scaffolding from a set of contigs. Read 1 will be from the left end, and read 2 will be reverse-complemented and from the right end; both will retain the correct original qualities. " /1" and " /2" will be appended to the read names.
01-14-2015, 01:34 PM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Man, BBMap really is a Swiss Army knife. Does it come with a toothpick too?
01-14-2015, 01:45 PM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

It does do contaminant removal. As long as whatever is stuck between your teeth is genetically distinct from human, at least.

Last edited by Brian Bushnell; 01-14-2015 at 01:50 PM.
01-14-2015, 06:10 PM   #7
dcameron
Member
 
Location: Australia

Join Date: Mar 2013
Posts: 26

I would have thought that a soft-clip-based SV caller (local alignment required) would be more appropriate for single-end reads. A mean fragment untemplated sequence length of 0 with a variance of 0 does seem rather unusual, and I wouldn't be surprised if some of the callers you try crash or give meaningless results (e.g., on my idealised simulated indel data, breakdancer-max drops to a ~2% TP calling rate, and clever and gasv-pro refuse to run when the average fragment size < 2*read length). I'd be interested to hear how you go.
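To make the arithmetic concrete, here is a small sketch of the library geometry that pseudo pairs would present to a caller. The function name `pseudo_pair_geometry` and its output keys are made up for the example, and the "fragment size < 2*read length" test is taken only from the behaviour mentioned above, not checked against the tools themselves.

```python
def pseudo_pair_geometry(read_len, mate_len):
    """Describe the fake library made by splitting one single-end read.

    Every pseudo fragment is exactly the original read, so the fragment
    length is constant and its standard deviation is zero.
    """
    if not 0 < mate_len <= read_len:
        raise ValueError("mate_len must be between 1 and read_len")
    fragment_len = read_len
    # Unsequenced bases between the mates; negative means they overlap.
    inner_distance = read_len - 2 * mate_len
    return {
        "fragment_len": fragment_len,
        "inner_distance": inner_distance,
        "fragment_sd": 0.0,
        # Condition quoted above; here "read length" is the mate length.
        "below_2x_read_len": fragment_len < 2 * mate_len,
    }
```

For a 100bp read split into 50bp mates this gives an inner distance of 0 and a fragment length equal to exactly twice the mate length; 45bp mates leave a 10bp inner distance. In both cases the fragment standard deviation is 0, which is the part that looks nothing like a real paired-end library.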
Tags
genomics, illumina reads, pindel
