
**safina** · 04-10-2015, 11:18 PM

Originally posted by GenoMax

Tried the following.

Code:

$ fastq-dump -F --split-files ./SRR1561197.sra

@safina: Not sure why you are changing the fastq headers

Code:

$ fastq_quality_filter -i SRR1561197_1.fastq -q 28 -p 100 -Q33 -o SRR1561197_1_filt.fastq

@safina: Note the -Q33 option. This data is almost certainly in Sanger (Phred+33) FASTQ format, so you need to add this option (it remains undocumented in fastx_toolkit). I used the latest fastx_toolkit.
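A quick way to check whether -Q33 is needed is to look at the quality lines themselves. The sketch below is a heuristic only: the function name `guess_phred_offset` is made up for illustration, and it assumes plain 4-line FASTQ records. Characters from '!' to ':' (ASCII 33-58) can only occur in a quality string under Phred+33, so finding one suggests Sanger encoding; not finding one proves nothing.

```shell
# Heuristic sketch (hypothetical helper, not part of fastx_toolkit or BBMap).
# Assumes 4-line FASTQ records; inspects the first 10000 reads only.
guess_phred_offset() {
    # Keep only the quality line of each record (every 4th line), then look
    # for any character in the ASCII range 33-58, which implies Phred+33.
    if head -n 40000 "$1" | awk 'NR % 4 == 0' | LC_ALL=C grep -q '[!-:]'; then
        echo 33
    else
        echo unclear
    fi
}
```

Usage would be something like `guess_phred_offset SRR1561197_1.fastq`; a result of `33` means the -Q33 option should be passed to the fastx tools.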

I chose to use repair.sh from BBMap and did

Code:

$ repair.sh in1=SRR1561197_1_filt.fastq in2=SRR1561197_2_filt.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq

Here is where things fell apart.

@Brian: I get the following error about a Gb into the filtered files.

Multiple possibilities:

1. Original sra file from SRA is corrupt
2. fastx_toolkit is messing up the files in the filter process
3. Not sure why repair.sh is asking to run itself
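To narrow down the first two possibilities, one option is to run a basic well-formedness check on the FASTQ files before and after filtering. This is a minimal sketch, not a full validator: `check_fastq` is a hypothetical helper name, and it assumes 4-line records with unwrapped sequences.

```shell
# Sketch of a structural FASTQ check (hypothetical helper).
# Reports the first malformed record and returns non-zero on failure.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @\n", NR; bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator does not start with +\n", NR; bad = 1; exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length differs from sequence length\n", NR; bad = 1; exit
        }
        END {
            # A line count that is not a multiple of 4 means a truncated record.
            if (!bad && NR % 4 != 0) { printf "truncated record: %d lines\n", NR; bad = 1 }
            if (!bad) printf "OK: %d reads\n", NR / 4
            exit bad + 0
        }
    ' "$1"
}
```

If the raw fastq-dump output passes and the filtered file does not, that points at the filtering step rather than the download.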

Should try BBDuk to see if that works instead of fastq_quality_filter.

Originally posted by SES

I tried the whole process using the commands above and did not find any issues. Here is the script: seqanswers163784.sh (link to a gist, not a direct link). You can fetch that script and run it on your own machine. Here is the output:

Code:

========= pairfq version : 0.14.1 (completion time: Wed Apr  8 12:14:41 EDT 2015)
Total forward reads (SRR1561197_1_filt_info.fastq)                   :    8492638
Total reverse reads (SRR1561197_2_filt_info.fastq)                   :   13525478
Total forward paired reads (SRR1561197_1_filt_info_p.fastq)          :    7105003
Total reverse paired reads (SRR1561197_2_filt_info_p.fastq)          :    7105003
Total forward unpaired reads (SRR1561197_1_filt_info_s.fastq)        :    1387635
Total reverse unpaired reads (SRR1561197_2_filt_info_s.fastq)        :    6420475

Total paired reads                                                   :   14210006
Total unpaired reads                                                 :    7808110

real	21m14.372s
user	9m54.612s
sys	0m19.421s

This used 5.5g of RAM on my machine, so you should be fine to use it without the --index option. For reference, the only issue was the missing pair information, which was one of my earlier suggestions in this thread, but it appears that modifying the headers and perhaps some other operations messed up the files for @safina. For the commands in the script, you can replace "pairfq" with

Code:

curl -sL git.io/pairfq_lite | perl -

and you'll never need to download any package or update it.

EDIT: Just my 2c, but I think fastx still has a place. It is stable, no need to update frequently, and is probably on most workstations. Also, it works very well in a Unix environment because of the single binaries that use one CPU, which allows you to use it on a cluster.

The headers were not what was messing up the files. I still have a problem: if I trim my FASTQ files, I get empty files.

After filtering with these commands:

## quality filter
fastq_quality_filter -i SRR1561197_1.fastq -q 28 -p 100 -Q33 -o SRR1561197_1_filt.fastq
fastq_quality_filter -i SRR1561197_2.fastq -q 28 -p 100 -Q33 -o SRR1561197_2_filt.fastq

Then I did the trimming:

fastx_trimmer -i SRR1561197_1_filt.fastq -l 100 -f 14 -o SRR1561197_1_filt_trim.fastq
fastx_trimmer -i SRR1561197_2_filt.fastq -l 100 -f 14 -o SRR1561197_2_filt_trim.fastq

## add pair info to reads and remove comment to reduce size
pairfq addinfo -i SRR1561197_1_filt_trim.fastq -o SRR1561197_1_filt_trim_info.fastq -p 1
pairfq addinfo -i SRR1561197_1_filt_trim.fastq -o SRR1561197_2_filt_trim_info.fastq -p 2

## pair the reads
time pairfq makepairs -f SRR1561197_1_filt_trim_info.fastq \
-r SRR1561197_2_filt_trim_info.fastq \
-fp SRR1561197_1__p.fastq \
-rp SRR1561197_2__p.fastq \
-fs SRR1561197_1_s.fastq \
-rs SRR1561197_2_s.fastq \
--stats

I still get all reads in these two files:
-fs SRR1561197_1_s.fastq
-rs SRR1561197_2_s.fastq
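When every read ends up in the unpaired files, it can help to check whether the two inputs share any read IDs at all, since mates whose IDs differ cannot be matched. The following is a rough diagnostic sketch; `read_ids` and `count_shared_ids` are hypothetical helper names, and it assumes 4-line records whose ID is the first word of the header, optionally ending in /1 or /2.

```shell
# Sketch: count read IDs present in both FASTQ files (hypothetical helpers).
read_ids() {
    # First word of each header line, without the leading '@' and without
    # a trailing /1 or /2 pair suffix; sorted for use with comm.
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' "$1" |
        LC_ALL=C sort
}

count_shared_ids() {
    # $1 = forward FASTQ, $2 = reverse FASTQ
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    read_ids "$1" > "$tmp1"
    read_ids "$2" > "$tmp2"
    # comm -12 prints only the lines common to both sorted inputs.
    LC_ALL=C comm -12 "$tmp1" "$tmp2" | wc -l | tr -d ' '
    rm -f "$tmp1" "$tmp2"
}
```

For example, `count_shared_ids SRR1561197_1_filt_trim_info.fastq SRR1561197_2_filt_trim_info.fastq` should print a number close to the expected pair count; a result of 0 would suggest the headers, or the files passed to the pairing step, are not what was intended.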

**GenoMax** · 04-11-2015, 06:43 AM

Originally posted by safina

Thanks for this, but I have a question:
why haven't you used the trim command? I need to trim the SRR1561197 reads from the start as well as from the end. After trimming I get an error in pairfq, and it gives me empty files.

I used the fastx toolkit for trimming as well:

Code:

fastx_trimmer -f 14 -l 100 -o SRR1561197_1_filt_trim.fastq

And when I run pairfq after this, I get empty files and all reads in the unpaired file.

Fastx_trimmer (as above) followed by repair.sh works fine.

@SES will need to comment on pairfq question.

**SES** · 04-11-2015, 04:24 PM

Originally posted by GenoMax

Fastx_trimmer (as above) followed by repair.sh works fine.

@SES will need to comment on pairfq question.

That was my bad; one of the files was named incorrectly in the script. I fixed that typo and changed the command to use the curl method so we can rule out the installation being an issue. I ran the script and got the same result I posted above. This command will get the script:

Code:

curl -L https://gist.githubusercontent.com/sestaton/09781a5ac8849753d6ed/raw/af767ad46961c19438b9fe95e14ba87270337f6f/seqanswers163784.sh > seqanswers163784.sh

Then edit the paths to the fastx trimmer and fastq-dump, if necessary, and run the script:

Code:

nohup bash seqanswers163784.sh > seqanswers163784.out 2>&1 &

Or send it to the queuing system; it doesn't matter. That should work on any machine that has those programs installed (pairfq need not be installed). The first few steps take quite a while, but the pairing step should take 10-12 min., depending on the machine.
