SEQanswers

01-27-2012, 06:23 AM   #1
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31
Velvet paired end after some sequences removed?

Hi,

After trimming and filtering my Illumina sequences (paired end 100 bp) for quality, I am left with files that probably don't have the same number of sequences in them any longer. This will be a problem if I try to interleave the files and use them as input for Velvet, right?

Would it be better to:

1. Keep every sequence in my files, even if very short or 0 bases long, so the files can still be interleaved
2. Remove sequences and use the (smaller) files in Velvet, but not as paired-end reads

Are there other options I'm not aware of?

Thanks
01-27-2012, 10:22 AM   #2
jjohnson
Member
 
Location: Washington DC Metro Area

Join Date: Aug 2009
Posts: 20

I would bin the valid pairs and the singletons (reads whose mates were removed by quality trimming/filtering) into 2 separate fastq files. Velvet accepts multiple input files, and you can then parameterize each file separately (such as specifying the insert size for the paired file).

For example:

velveth Assem 35 -shortPaired -fasta pe_lib1.fasta -short3 se_lib1.fa
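
The binning step itself could look roughly like the following Python sketch. It is only an illustration, not a tested pipeline: the function names are made up for this example, reads are assumed to be plain 4-line FASTQ records, mates are assumed to share a name apart from an optional /1 or /2 suffix, and the whole reverse file is held in memory.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def pair_key(header):
    """Read name without '@', without any comment, and without a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def bin_pairs(forward_path, reverse_path, paired_out, single_out):
    """Write interleaved pairs to paired_out and mate-less reads to single_out."""
    # Index the reverse reads by name (assumption: they fit in memory).
    with open(reverse_path) as handle:
        reverse = {pair_key(h): (h, s, q) for h, s, q in read_fastq(handle)}

    n_pairs = n_singles = 0
    with open(forward_path) as fwd, \
            open(paired_out, "w") as pe, \
            open(single_out, "w") as se:
        for h, s, q in read_fastq(fwd):
            mate = reverse.pop(pair_key(h), None)
            if mate is None:
                # Forward read whose mate was filtered out.
                se.write(f"{h}\n{s}\n+\n{q}\n")
                n_singles += 1
            else:
                # Forward read followed directly by its mate (interleaved).
                pe.write(f"{h}\n{s}\n+\n{q}\n")
                pe.write("{}\n{}\n+\n{}\n".format(*mate))
                n_pairs += 1
        # Reverse reads never claimed by a forward read are singletons too.
        for h, s, q in reverse.values():
            se.write(f"{h}\n{s}\n+\n{q}\n")
            n_singles += 1
    return n_pairs, n_singles
```

The two output files would then play the roles of the paired and singleton inputs in the command above.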
__________________
Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio
03-05-2012, 01:29 PM   #3
nposnien
Member
 
Location: Göttingen, Germany

Join Date: May 2011
Posts: 13

Hi,
we are facing the same problem at the moment. We will have uneven files (one for each pair) after trimming/filtering.

My question is whether there is a script/program out there that would find the mates in two different files (or in one file, if I merged/shuffled the files prior to trimming/filtering) and bin the unpaired reads into an extra file.

Any help is highly appreciated!
03-05-2012, 01:42 PM   #4
rahularjun86
Member
 
Location: Frankfurt(M), Germany

Join Date: Jan 2011
Posts: 58

Dear nposnien,
you can use the Sickle tool (https://github.com/najoshi/sickle). You only need to give it the paired fastq files and the other parameters (quality scoring system used, quality threshold, length cutoff, etc.), and it will generate the paired and singleton files.
If you want to filter out reads containing Ns, first replace the whole sequence of each such read with N and its quality string with #, then set the Sickle length and quality values. This way it will filter out the reads with Ns.
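
The masking step for reads with Ns could be sketched in Python as below. This is a hypothetical helper, not part of Sickle; it assumes records are already parsed into (header, sequence, quality) tuples and that qualities are Phred+33 encoded, where # corresponds to a quality of 2.

```python
def mask_reads_with_n(records):
    """Yield FASTQ records, fully masking any read whose sequence contains N.

    A masked read keeps its header and length, but every base becomes N
    and every quality character becomes '#' (Phred 2 in Phred+33).
    """
    for header, seq, qual in records:
        if "N" in seq.upper():
            seq = "N" * len(seq)
            qual = "#" * len(seq)
        yield header, seq, qual
```

With a quality threshold above 2, a quality trimmer would be expected to trim such a read down to nothing, so it would fall below the length cutoff.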
Best wishes,
Rahul
__________________
Rahul Sharma,
Ph.D
Frankfurt am Main, Germany
03-06-2012, 12:21 AM   #5
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31

This script may be useful for interleaving pairs for Velvet (and generating non-paired singleton files):

https://github.com/lexnederbragt/den...leave_pairs.py
03-06-2012, 05:13 AM   #6
nposnien
Member
 
Location: Göttingen, Germany

Join Date: May 2011
Posts: 13

First of all, thanks for the answers!

@LizBent: Can I use the script for data that has been processed using CASAVA 1.8? In the discussion you linked to, it is proposed to replace

f_suffix = "/1"
r_suffix = "/2"

with

f_suffix = ""
r_suffix = ""

My question is: How are the pairs identified then?
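
For context: in CASAVA 1.8 headers the read number is no longer a /1 or /2 suffix but sits in a separate field after a space, so the part of the name before the space is identical for both mates. One plausible way to identify pairs is therefore to compare that shared part. The sketch below only illustrates this idea; the function name is made up, and it is not taken from the linked script, whose actual behaviour has not been checked here.

```python
def split_read_name(header):
    """Return (fragment_id, mate), where mate is '1', '2', or None if unknown."""
    fields = header.lstrip("@").split()
    name = fields[0]
    # Older Illumina style: @NAME/1 and @NAME/2
    if name.endswith(("/1", "/2")):
        return name[:-2], name[-1]
    # CASAVA 1.8 style: @NAME 1:N:0:ATCACG (read number starts the second field)
    if len(fields) > 1 and fields[1][:2] in ("1:", "2:"):
        return name, fields[1][0]
    return name, None
```

Two reads would then be treated as mates when their fragment_id values are equal and their mate values differ.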
03-06-2012, 05:25 AM   #7
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31

No idea, you might want to ask the original script writer, who is cited in the comments at the top of the script (and there is also a reference to another SeqAnswers thread there that might answer your question). Sorry I can't help.
All times are GMT -8.