
**mastal** · 05-23-2013, 03:33 AM

Paired end- search for complement of Read2 in Read1

You could probably write a script in something like Perl to match up your read pairs. Actually, there must be existing scripts to do this.
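As a rough illustration of the kind of script Maria means, here is a minimal Python sketch. It is not one of the existing scripts she refers to. It assumes 4-line FASTQ records with no wrapped sequence lines. It also assumes that two mates share a read name once the leading '@', anything after the first space, and a trailing /1 or /2 are removed, which covers both older Illumina headers (`.../1`, `.../2`) and Casava 1.8 style headers (`name 1:N:0:...`). The script name `pair_fastq.py` and the output file names are made up for the example. Because it keeps all of Read2 in memory, it suits a few million reads rather than a very large run.

```python
import sys
from typing import Dict, Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, '+' line, quality), newlines stripped.
Record = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines, e.g. at the end of a file
        rest = [next(it, None) for _ in range(3)]
        if None in rest or not header.startswith("@"):
            raise ValueError(f"malformed or truncated record at {header!r}")
        seq, plus, qual = (line.rstrip("\n") for line in rest)
        yield header, seq, plus, qual


def read_key(header: str) -> str:
    """Name shared by both mates: first word after '@', minus any /1 or /2."""
    words = header[1:].split()
    name = words[0] if words else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def match_pairs(r1: Iterable[Record], r2: Iterable[Record]
                ) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split reads into (paired R1, paired R2, singletons), keeping R1 order."""
    r2_by_key: Dict[str, Record] = {read_key(rec[0]): rec for rec in r2}
    paired1: List[Record] = []
    paired2: List[Record] = []
    singles: List[Record] = []
    for rec in r1:
        mate = r2_by_key.pop(read_key(rec[0]), None)
        if mate is None:
            singles.append(rec)          # R1 read whose mate was filtered out
        else:
            paired1.append(rec)
            paired2.append(mate)
    singles.extend(r2_by_key.values())  # R2 reads whose R1 mate was filtered out
    return paired1, paired2, singles


def write_fastq(path: str, records: Iterable[Record]) -> None:
    with open(path, "w") as out:
        for rec in records:
            out.write("\n".join(rec) + "\n")


def main(argv: List[str]) -> None:
    if len(argv) != 4:
        print("usage: pair_fastq.py R1.fastq R2.fastq out_prefix", file=sys.stderr)
        return
    with open(argv[1]) as f1, open(argv[2]) as f2:
        paired1, paired2, singles = match_pairs(parse_fastq(f1), parse_fastq(f2))
    write_fastq(argv[3] + "_R1_paired.fastq", paired1)
    write_fastq(argv[3] + "_R2_paired.fastq", paired2)
    write_fastq(argv[3] + "_singletons.fastq", singles)
    print(f"{len(paired1)} pairs, {len(singles)} singletons", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv)
```

Both paired files come out in Read1 order, so record n in one file is the mate of record n in the other. Reads that lost their mate go to the singletons file, similar in spirit to what Trimmomatic or PRINSEQ write out.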

Otherwise, you could run velveth with your filtered R1 and R2 files as single-end reads.

Or you could try cleaning the reads with Trimmomatic, which will give you R1 and R2 files of the remaining matched pairs, plus separate R1 and R2 files for reads whose mate has been filtered out.

Hope this helps,
Maria

**flacchy** · 05-23-2013, 03:40 AM

Thank you, but I just started, so I am no good with Perl... do you know any links to scripts? Or better search terms? I couldn't find anything, so I guess I am searching with the wrong parameters.

So do you think I could run velveth first with Read1 and then with Read2, and then what? Merge the contig files? Won't the output be different? Wouldn't using both paired ends be more accurate?

The problem is that I am supposed to find a way to extract the read names from filtered Read1, then pull those reads out of raw Read2... and then use these two files to run velveth...
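The plan described here (take the names from filtered Read1, pull the same reads out of raw Read2) can be sketched in Python roughly as follows. Reading whole 4-line records, instead of grepping for lines that start with '@', avoids picking up quality lines that happen to begin with '@', which is a valid Phred+33 quality character. The sketch assumes unwrapped 4-line records and mate names that match once anything after the first space and a trailing /1 or /2 are dropped. The script name `extract_mates.py` is hypothetical. Only the set of Read1 names is held in memory, and raw Read2 is streamed.

```python
import sys
from typing import Iterable, Iterator, List, Set


def read_key(header: str) -> str:
    """Name shared by both mates: first word after '@', minus any /1 or /2."""
    words = header[1:].split()
    name = words[0] if words else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into 4-line records; assumes no wrapped or blank lines."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield record
            record = []
    if any(record):
        raise ValueError("file ends with an incomplete FASTQ record")


def r1_keys(lines: Iterable[str]) -> Set[str]:
    """Collect the read names present in the filtered Read1 file."""
    return {read_key(rec[0]) for rec in fastq_records(lines)}


def extract_mates(r2_lines: Iterable[str], keys: Set[str]) -> Iterator[List[str]]:
    """Stream raw Read2 and keep only reads whose name is in `keys`."""
    for rec in fastq_records(r2_lines):
        if read_key(rec[0]) in keys:
            yield rec


def main(argv: List[str]) -> None:
    if len(argv) != 4:
        print("usage: extract_mates.py filtered_R1.fastq raw_R2.fastq out_R2.fastq",
              file=sys.stderr)
        return
    with open(argv[1]) as f1:
        keys = r1_keys(f1)
    kept = 0
    with open(argv[2]) as f2, open(argv[3], "w") as out:
        for rec in extract_mates(f2, keys):
            out.write("\n".join(rec) + "\n")
            kept += 1
    print(f"{len(keys)} names in Read1, {kept} mates written", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv)
```

The output keeps raw Read2 order, so it lines up record by record with filtered Read1 only if the quality filter did not reorder reads. That is an assumption worth checking. Note also that the Read2 mates pulled out this way were never quality-filtered themselves.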

I could have a look at Trimmomatic, but I am not sure I could get it installed on the platform anytime soon.

**GenoMax** · 05-23-2013, 03:47 AM

See this thread: http://seqanswers.com/forums/showthread.php?t=14708

**flacchy** · 05-23-2013, 03:48 AM

Thank you, I'll have a look.

**mastal** · 05-23-2013, 04:00 AM

Have a look at this thread, "matching up paired-end reads after fastx-toolkit filtering", for a Python script:

http://seqanswers.com/forums/showthread.php?t=24076


Originally posted by flacchy:

> So do you think I could run velveth first with Read1 and then with Read2, and then what? Merge the contig files? Won't the output be different? Wouldn't using both paired ends be more accurate?

Using the two files as paired reads would probably give a better assembly.

But you can run velveth with both files as single reads (dir is the output directory, k the hash length):

$ velveth dir k -fastq -short read1.fastq read2.fastq

**flacchy** · 05-23-2013, 04:08 AM

I've tried that, but it gave me this error:

> velveth: Right sequence file 'Read2QualityFiltered.fastq' has too few sequences

That's why I am trying to find a way to take the read IDs from filtered Read1 (I have done that with grep), and now I need a script to extract the corresponding reads from Read2... this is the command proposed in another forum:
Originally posted by maasha:

> You can do this with Biopieces (www.biopieces.org) like this:
>
> First you need a file with the FASTQ sequence names you are interested in - or IDs if you like - one per line. And then:
>
> read_fastq -i in.fastq | grab -E ids.txt | write_fastq -xo out.fastq

I need to install Biopieces, so I am trying to get that installed... but if I could also find other ways that would be better, since I don't know whether that one will work on my data...

Is it too confusing? This is my first month of PhD and I have tons of things to learn...
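The velveth error says the right-hand file (Read2) ran out of sequences, i.e. the two files no longer hold the same reads in the same order. Here is a small Python check, offered as a sketch: it walks both files in step and reports the first record where the read names disagree. It is also a quick way to verify any re-paired files before running velveth. It assumes unwrapped 4-line records with no blank lines. It treats names as equal once the '@', anything after the first space, and a trailing /1 or /2 are removed. The script name `check_pairs.py` is hypothetical.

```python
import sys
from itertools import zip_longest
from typing import Iterable, Iterator, List, Optional, Tuple


def read_key(header: str) -> str:
    """Name shared by both mates: first word after '@', minus any /1 or /2."""
    words = header[1:].split()
    name = words[0] if words else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def record_keys(lines: Iterable[str]) -> Iterator[str]:
    """Yield the pair key of each record; headers are lines 1, 5, 9, ..."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield read_key(line.rstrip("\n"))


def first_mismatch(r1_lines: Iterable[str], r2_lines: Iterable[str]
                   ) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record number, R1 key, R2 key) at the first disagreement, or None.

    A key of None means that file ran out of records first.
    """
    pairs = zip_longest(record_keys(r1_lines), record_keys(r2_lines))
    for n, (k1, k2) in enumerate(pairs, start=1):
        if k1 != k2:
            return n, k1, k2
    return None


def main(argv: List[str]) -> None:
    if len(argv) != 3:
        print("usage: check_pairs.py R1.fastq R2.fastq", file=sys.stderr)
        return
    with open(argv[1]) as f1, open(argv[2]) as f2:
        result = first_mismatch(f1, f2)
    if result is None:
        print("files are in sync")
    else:
        n, k1, k2 = result
        print(f"record {n}: Read1 has {k1!r}, Read2 has {k2!r}")


if __name__ == "__main__":
    main(sys.argv)
```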

**Champi** · 05-23-2013, 06:23 AM

You're going the complicated way... simply filter your data using a tool that already works with paired-end reads, as mastal said.

I find PRINSEQ pretty easy to use, with good documentation:

http://prinseq.sourceforge.net/

It will give you a file with your reads_1 and a file with your reads_2 (both of them paired), plus the good reads that lost their mate in a file called singletons. You can even recover the discarded reads if you want.

But anyway, if you're getting 3 M reads for one of the pairs, it means your provider probably did something wrong (as long as you're not too strict in the filtering...)

Champi

**flacchy** · 05-23-2013, 06:34 AM

Thank you...
I will try Trimmomatic first, since it should be specific to Illumina, while PRINSEQ is specific to 454...

Before, I used the FASTX-Toolkit, and this is what I chose for filtering. We were trying not to be too strict, but it is still viral metagenomics...

fastq_quality_filter -Q33 -q 18 -p 60 -v
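To make these settings concrete: -Q33 means the quality characters are Phred+33, -q 18 is the minimum quality for a base to count as good, and -p 60 is the minimum percentage of good bases a read needs in order to be kept. The Python sketch below restates that rule as described in the FASTX-Toolkit usage text. It is not the toolkit's own code; both thresholds are assumed to be inclusive, and -v (verbose summary) is not modelled. The script name `qfilter_sketch.py` is hypothetical. Because each file is filtered on its own, a read can pass in Read1 while its mate fails in Read2, which is how the two files end up with different read counts.

```python
import sys
from typing import Iterable, Iterator, List

PHRED_OFFSET = 33   # -Q33: quality characters are Phred+33
MIN_QUALITY = 18    # -q 18: a base counts as good at quality >= 18
MIN_PERCENT = 60    # -p 60: at least 60% of bases must be good


def passes_filter(quality: str, offset: int = PHRED_OFFSET,
                  min_quality: int = MIN_QUALITY,
                  min_percent: int = MIN_PERCENT) -> bool:
    """Apply the -q/-p rule to one quality string (thresholds assumed inclusive)."""
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_quality)
    # Integer form of good / len >= min_percent / 100, avoiding floating point.
    return 100 * good >= min_percent * len(quality)


def filter_fastq(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield 4-line records whose quality line passes; one file at a time."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            if passes_filter(record[3]):
                yield record
            record = []


def main(argv: List[str]) -> None:
    if len(argv) != 3:
        print("usage: qfilter_sketch.py in.fastq out.fastq", file=sys.stderr)
        return
    with open(argv[1]) as src, open(argv[2], "w") as out:
        for rec in filter_fastq(src):
            out.write("\n".join(rec) + "\n")


if __name__ == "__main__":
    main(sys.argv)
```

For example, with these settings a 5-base read whose qualities are "III##" keeps 3 of 5 bases at Q40 (60%) and passes, while "II###" (40%) is dropped.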

**Champi** · 05-23-2013, 06:55 AM

PRINSEQ is not specific to 454. It was originally designed for 454, but I have used it on my Illumina data with good results. As I said, you should read the documentation; everything is there.

I haven't used Trimmomatic, but I'm sure it'll do pretty much the same, so it's up to you which one to use.

Good luck!
