SEQanswers

Old 05-23-2013, 03:07 AM   #1
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33
Default Paired end- search for complement of Read2 in Read1

Hi all,
I am having a bit of trouble with my Illumina paired-end reads. Read2 is of much poorer quality than Read1, so after filtering I ended up with 8 million reads for R1 and 3 million for R2. I can't run velveth with this, so what I want to do is extract from the full Read2 file the mates of the reads that I chose to keep from Read1. How can I do that? It's driving me insane...

Thanks
Old 05-23-2013, 03:33 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default Paired end- search for complement of Read2 in Read1

You could probably write a script in something like perl to match up your read pairs. Actually, there must be existing scripts to do this.

Otherwise, you could run velveth with your filtered R1 and R2 files as single end reads.

Or, you could try cleaning the reads with Trimmomatic, which will give you R1 and R2 files of the remaining matched pairs, plus separate R1 and R2 files for reads whose mate has been filtered out.

Hope this helps,
Maria
Old 05-23-2013, 03:40 AM   #3
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33
Default

Thank you, but I have only just started, so I am no good with Perl... do you know of any links to scripts? Or a better search query? I couldn't find anything, so I guess I am searching with the wrong terms.

So do you think I could run velveth first with read 1 and then with read 2, and then what? Merge the contig files? Won't the output be different? Wouldn't using both paired ends be more accurate?

The problem is that I am supposed to find a way to extract the read names from the filtered read 1 file, pull the corresponding reads out of the raw read 2 file, and then use these two files to run velveth...

I could try having a look at Trimmomatic, but I am not sure I can get it installed on the platform any time soon.
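
For what it's worth, the extraction described above (take the read names from the filtered read 1 file and pull the matching records out of the raw read 2 file) can be sketched in a few lines of Python. This is only a minimal sketch, not a tested tool: the helper names (`read_fastq`, `pair_key`, `extract_mates`) and the file names in the closing comment are made up for illustration, and it assumes plain 4-line FASTQ records whose mates share a name apart from a trailing `/1` or `/2` or a comment after a space.

```python
import itertools


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines.

    Assumes the common 4-lines-per-record FASTQ layout (no wrapped sequences).
    """
    it = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(itertools.islice(it, 4))
        if len(record) < 4:
            return
        yield record


def pair_key(header):
    """Reduce a FASTQ header to a name shared by both mates (assumed format)."""
    name = header[1:].split()[0]      # drop '@' and any comment after whitespace
    if name.endswith(("/1", "/2")):   # older Illumina mate suffix
        name = name[:-2]
    return name


def extract_mates(filtered_r1_lines, raw_r2_lines):
    """Yield the raw read 2 records whose mate survived in filtered read 1."""
    wanted = {pair_key(rec[0]) for rec in read_fastq(filtered_r1_lines)}
    for rec in read_fastq(raw_r2_lines):
        if pair_key(rec[0]) in wanted:
            yield rec


# Hypothetical usage (file names are placeholders):
# with open("read1_filtered.fastq") as r1, open("read2_raw.fastq") as r2, \
#         open("read2_matched.fastq", "w") as out:
#     for rec in extract_mates(r1, r2):
#         out.write("\n".join(rec) + "\n")
```

The read 1 names are held in a set, so memory use grows with the number of filtered reads. The output keeps the order of the raw read 2 file, so it should line up with the filtered read 1 file as long as both files are still in their original sequencing order.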
Old 05-23-2013, 03:47 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975
Default

See this thread: http://seqanswers.com/forums/showthread.php?t=14708
Old 05-23-2013, 03:48 AM   #5
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33
Default

Thank you I'll have a look
Old 05-23-2013, 04:00 AM   #6
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

Have a look at this thread, for a python script:

http://seqanswers.com/forums/showthread.php?t=24076


Quote:
Originally Posted by flacchy
So do you think I could run velveth first with read 1 and then with read 2, and then what? Merge the contig files? Won't the output be different? Wouldn't using both paired ends be more accurate?

Using the two files as paired reads would probably give a better assembly.

But you can run velveth with both files as single reads:

$ velveth dir k -fastq -short read1.fastq read2.fastq
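
On the paired route mentioned above: as far as I know, velveth's `-shortPaired` category by default expects both mates in one interleaved file (read 1, then its mate, and so on), and Velvet comes with shuffleSequences scripts for that. The same idea can be sketched in Python, assuming the two FASTQ files are already synchronized; `interleave` and the helpers are illustrative names, not part of Velvet.

```python
import itertools


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples; assumes 4-line records."""
    it = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(itertools.islice(it, 4))
        if len(record) < 4:
            return
        yield record


def pair_key(header):
    """Reduce a FASTQ header to a name shared by both mates (assumed format)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def interleave(r1_lines, r2_lines):
    """Yield records alternately from read 1 and read 2, checking each pair."""
    pairs = itertools.zip_longest(read_fastq(r1_lines), read_fastq(r2_lines))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("files contain different numbers of reads")
        if pair_key(rec1[0]) != pair_key(rec2[0]):
            raise ValueError(f"mate mismatch: {rec1[0]} vs {rec2[0]}")
        yield rec1
        yield rec2
```

The sketch refuses files with unequal read counts or mismatched names rather than silently pairing the wrong mates, so the two files need to be matched up before interleaving.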
Old 05-23-2013, 04:08 AM   #7
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33
Default

I've tried that but it gave me this error:
>>velveth: Right sequence file 'Read2QualityFiltered.fastq' has too few sequences
That's why I am trying to find a way to extract the read IDs from filtered read 1 (I have done that with grep), and now I need a script to pull the corresponding reads out of read2. This is the command proposed in another forum thread:

Quote:
Originally Posted by maasha
You can do this with Biopieces (www.biopieces.org) like this:

First you need a file with the FASTQ sequence names you are interested in - or IDs if you like - one per line. And then:

Code:
read_fastq -i in.fastq | grab -E ids.txt | write_fastq -xo out.fastq

For that I need to install Biopieces, so I am trying to get it installed... but if I could find other ways as well that would be better, since I don't know whether that one will work on my data...

Is it too confusing? This is my first month of PhD and I do have tons of things to learn...
Old 05-23-2013, 06:23 AM   #8
Champi
Member
 
Location: Japan

Join Date: Dec 2011
Posts: 17
Default

You're going the complicated way... simply filter your data using a tool that works already with paired-end reads, as mastal said.

I find PRINSEQ pretty easy going, with good documentation:
http://prinseq.sourceforge.net/

It will give you a file with your reads_1, a file with your reads_2 (both of them paired), and those single good reads as a file called singletons. You can even recover the discarded reads if you wanted.
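
To make the pairs-plus-singletons output concrete, here is a rough Python sketch of the same bookkeeping applied to two files that were filtered separately. It is not how PRINSEQ or Trimmomatic do it internally; `split_pairs` and the helpers are illustrative names, everything is held in memory, and it assumes 4-line FASTQ records whose mates share a name apart from a trailing `/1` or `/2`.

```python
import itertools


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples; assumes 4-line records."""
    it = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(itertools.islice(it, 4))
        if len(record) < 4:
            return
        yield record


def pair_key(header):
    """Reduce a FASTQ header to a name shared by both mates (assumed format)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_pairs(r1_lines, r2_lines):
    """Return (paired_r1, paired_r2, singletons) as lists of records."""
    r1 = list(read_fastq(r1_lines))
    r2 = list(read_fastq(r2_lines))
    # Names present in both filtered files are the surviving pairs.
    shared = {pair_key(h) for h, *_ in r1} & {pair_key(h) for h, *_ in r2}
    paired_r1 = [rec for rec in r1 if pair_key(rec[0]) in shared]
    paired_r2 = [rec for rec in r2 if pair_key(rec[0]) in shared]
    # Reads whose mate was filtered out end up as singletons.
    singletons = [rec for rec in r1 + r2 if pair_key(rec[0]) not in shared]
    return paired_r1, paired_r2, singletons
```

The two paired lists come out in matching order only if both input files are still in their original sequencing order, which is normally the case after a plain quality filter.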

But anyway, if you're getting 3 M reads for one of the pairs, it means your provider probably did something wrong (as long as you're not too strict in the filtering...)

Champi
Old 05-23-2013, 06:34 AM   #9
flacchy
Member
 
Location: UK

Join Date: Apr 2013
Posts: 33
Default

Thank you...
I will try Trimmomatic first, since it should be specific for Illumina while PRINSEQ is specific for 454...

Before this I used the FASTX-Toolkit, and this is what I chose for filtering. We were trying not to be too strict, but it is still viral metagenomics...

fastq_quality_filter -Q33 -q 18 -p 60 -v
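
As I understand these options, `-q 18 -p 60` keeps a read only if at least 60% of its bases have a quality score of 18 or higher, and `-Q33` says the scores use the Phred+33 ASCII offset. A sketch of that rule in Python follows; `passes_filter` is an illustrative name, and the exact rounding inside the FASTX-Toolkit may differ.

```python
def passes_filter(quality, min_q=18, min_percent=60, offset=33):
    """Return True if enough bases in a FASTQ quality string reach min_q.

    quality     -- the quality line of one FASTQ record
    min_q       -- minimum Phred score for a base to count as good (like -q)
    min_percent -- minimum percentage of good bases required (like -p)
    offset      -- ASCII offset of the encoding (33 for -Q33)
    """
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_q)
    return 100.0 * good / len(quality) >= min_percent
```

Because the rule is applied to each file on its own, a read can pass in R1 while its mate fails in R2, which is how the two files end up with different read counts.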

Last edited by flacchy; 05-23-2013 at 06:43 AM.
Old 05-23-2013, 06:55 AM   #10
Champi
Member
 
Location: Japan

Join Date: Dec 2011
Posts: 17
Default

PRINSEQ is not specific for 454. It was originally designed for that, but I have used it on my Illumina data and it works well. As I said, you should read the documentation; everything is there.

I haven't used trimmomatic, but I'm sure it'll do pretty much the same, so it's up to you what to use.

Good luck!
All times are GMT -8.