#1 |
Member
Location: UK Join Date: Apr 2013
Posts: 33
Hi all,
I am having a bit of trouble with my Illumina paired-end reads. Read 2 is much poorer quality than read 1, so after filtering I ended up with 8 million reads for R1 and 3 million for R2. I can't run velveth with these, so what I want to do is extract from the full read 2 file the mates of the reads I chose to keep from read 1. How can I do that? It's driving me insane... Thanks
#2 |
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
You could probably write a script in something like Perl to match up your read pairs. Actually, there must be existing scripts to do this.
Otherwise, you could run velveth with your filtered R1 and R2 files as single-end reads. Or, you could try cleaning the reads with Trimmomatic, which will give you R1 and R2 files of the remaining matched pairs, and separate R1 and R2 files for reads whose mate has been filtered out.
Hope this helps,
Maria
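For anyone who wants to see what "matching up the read pairs" involves, here is a minimal Python sketch. It is not one of the existing scripts mentioned above. It assumes 4-line FASTQ records, that mates share a name once a trailing `/1` or `/2` (or anything after the first space) is removed, and that both filtered files keep the reads in their original order. The function names and output file names are made up for illustration.

```python
def pair_key(header):
    """Reduce a FASTQ header line to a key shared by both mates.

    Assumption: headers look like '@name/1' and '@name/2', or
    '@name 1:N:0:...' and '@name 2:N:0:...'. Adjust for other schemes.
    """
    name = header.strip().split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def read_records(path):
    """Yield (key, four_lines) for each 4-line FASTQ record in a file."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():  # end of file (or trailing blank line)
                break
            yield pair_key(record[0]), record


def repair_pairs(r1_path, r2_path, out_prefix):
    """Split two independently filtered files into paired and single reads.

    Only the read names are held in memory, not the sequences.
    Returns {"R1": (n_paired, n_single), "R2": (n_paired, n_single)}.
    """
    ids_r1 = {key for key, _ in read_records(r1_path)}
    ids_r2 = {key for key, _ in read_records(r2_path)}
    common = ids_r1 & ids_r2  # names that survived filtering in both files

    counts = {}
    for tag, path in (("R1", r1_path), ("R2", r2_path)):
        n_paired = n_single = 0
        paired_name = f"{out_prefix}_{tag}_paired.fastq"
        single_name = f"{out_prefix}_{tag}_single.fastq"
        with open(paired_name, "w") as paired, open(single_name, "w") as single:
            for key, record in read_records(path):
                if key in common:
                    paired.writelines(record)
                    n_paired += 1
                else:
                    single.writelines(record)
                    n_single += 1
        counts[tag] = (n_paired, n_single)
    return counts
```

Called as `repair_pairs("R1_filtered.fastq", "R2_filtered.fastq", "clean")` (hypothetical file names), this writes four files, which mirrors the paired/unpaired layout described above for Trimmomatic. The two `_paired` files stay in step only if the input order assumption holds.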
#3 |
Member
Location: UK Join Date: Apr 2013
Posts: 33
Thank you, but I have just started, so I am no good with Perl... Do you know of any links to scripts, or a better search query? I couldn't find anything; I guess I am searching with the wrong terms.
So do you think I could run velveth first with read 1 and then with read 2, and then what? Merge the contig files? Won't the output be different? Wouldn't using both paired ends be more accurate? The problem is that I am supposed to find a way to extract the read names from filtered read 1 and then pull those reads out of raw read 2... and then use these two files to run velveth... I could try to have a look at Trimmomatic, but I am not sure I could have it installed on the platform anytime soon.
#4 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
See this thread: http://seqanswers.com/forums/showthread.php?t=14708
#5 |
Member
Location: UK Join Date: Apr 2013
Posts: 33
Thank you I'll have a look
#6 |
Senior Member
Location: uk Join Date: Mar 2009
Posts: 667
Have a look at this thread, for a Python script:
http://seqanswers.com/forums/showthread.php?t=24076
But you can run velveth with both files as single reads:
$ velveth dir k -fastq -short read1.fastq read2.fastq
#7 |
Member
Location: UK Join Date: Apr 2013
Posts: 33
I've tried that but it gave me this error:
>>velveth: Right sequence file 'Read2QualityFiltered.fastq' has too few sequences
That's why I am trying to find a way to extract the IDs of the reads from filtered read 1 (I have done that with grep), and now I need a script to pull out the corresponding reads from read 2... This is the command proposed in another forum thread:
Quote: Originally Posted by maasha
You can do this with Biopieces (www.biopieces.org) like this: First you need a file with the FASTQ sequence names you are interested in - or IDs if you like - one per line. And then:
Code:
read_fastq -i in.fastq | grab -E ids.txt | write_fastq -xo out.fastq
Is it too confusing? This is my first month of PhD and I do have tons of things to learn...
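The step described here (take the read names from filtered read 1, then pull the corresponding reads out of raw read 2) can also be sketched in plain Python, without Biopieces. This is only an illustration under stated assumptions: 4-line FASTQ records, mate names that match once a trailing `/1` or `/2` (or anything after the first space) is dropped, and a raw read 2 file in the same order as read 1. `mate_key`, `fastq_records` and `extract_mates` are hypothetical names, not part of any tool mentioned in this thread.

```python
from itertools import islice


def mate_key(header):
    """Strip the mate suffix so that '@name/1' and '@name/2' compare equal."""
    name = header.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def fastq_records(handle):
    """Yield lists of four lines (one FASTQ record) from an open file."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:  # end of file or truncated final record
            return
        yield record


def extract_mates(filtered_r1, raw_r2, out_r2):
    """Write to out_r2 every raw read 2 record whose mate is in filtered_r1.

    Returns the number of records written.
    """
    # Collect the names of the reads that were kept for read 1.
    with open(filtered_r1) as handle:
        wanted = {mate_key(rec[0]) for rec in fastq_records(handle)}

    kept = 0
    # Stream through raw read 2 and copy only the wanted mates.
    with open(raw_r2) as src, open(out_r2, "w") as dst:
        for rec in fastq_records(src):
            if mate_key(rec[0]) in wanted:
                dst.writelines(rec)
                kept += 1
    return kept
```

Note that this brings the low-quality read 2 sequences back in, because their own quality is never checked; it only makes the two files match read for read.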
#8 |
Member
Location: Japan Join Date: Dec 2011
Posts: 17
You're going the complicated way... simply filter your data using a tool that already works with paired-end reads, as mastal said.
I find PRINSEQ pretty easy going, with good documentation: http://prinseq.sourceforge.net/
It will give you a file with your reads_1, a file with your reads_2 (both of them paired), and the good single reads in a file called singletons. You can even recover the discarded reads if you want. But anyway, if you're getting 3 M reads for one of the pairs, it means your provider probably did something wrong (as long as you're not too strict in the filtering...)
Champi
#9 |
Member
Location: UK Join Date: Apr 2013
Posts: 33
Thank you...
I will try Trimmomatic first, since it should be specific for Illumina while PRINSEQ is specific for 454... Before, I used the FASTX-Toolkit, and this is what I chose for filtering; we were trying not to be too strict, but it is still viral metagenomics...
fastq_quality_filter -Q33 -q 18 -p 60 -v
Last edited by flacchy; 05-23-2013 at 07:43 AM.
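To make the filter settings above concrete: as far as I understand the FASTX-Toolkit options, `-Q33` sets the Phred+33 quality encoding, `-q 18` is the minimum quality score, and `-p 60` is the minimum percentage of bases in a read that must reach that score for the read to be kept. The Python sketch below applies the same rule to a single quality string. It is an illustration of the rule as described, not the toolkit's own code, and `passes_filter` is a made-up name.

```python
def passes_filter(quality_line, min_quality=18, min_percent=60, offset=33):
    """Return True if enough bases in a read reach the minimum quality.

    quality_line: the fourth line of a FASTQ record (ASCII-encoded scores).
    offset: 33 for Phred+33 encoding (the assumption behind -Q33).
    """
    # Decode each character into its Phred score.
    scores = [ord(ch) - offset for ch in quality_line.strip()]
    if not scores:
        return False  # an empty read cannot pass
    good = sum(score >= min_quality for score in scores)
    return 100.0 * good / len(scores) >= min_percent


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(passes_filter("IIIIII####"))  # 6 of 10 bases at Q40, i.e. 60%
print(passes_filter("IIIII#####"))  # 5 of 10 bases at Q40, i.e. 50%
```

Because each file is judged read by read and on its own, a read 2 that fails while its read 1 passes leaves an orphan, which is how the two filtered files end up with different numbers of reads.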
#10 |
Member
Location: Japan Join Date: Dec 2011
Posts: 17
PRINSEQ is not specific for 454. It was designed with 454 in mind, but I have used it for my Illumina data and it worked well. As I said, you should read the documentation; everything is there.
I haven't used Trimmomatic, but I'm sure it'll do pretty much the same, so it's up to you which one to use. Good luck!