**odoyle81** · 03-08-2012, 08:42 PM

I'm actually confused about this too...
Is it better to run the clipper first, and then quality filter?
When determining where to clip, do you look at the FastQC results? For example, if my graph of quality scores across all bases shows that quality falls below 20 after position 85, should I just clip all reads at 85?
Thanks!

**pbluescript** · 03-09-2012, 05:17 AM

I would not hard clip based on FastQC alone. Remember, it's just showing you the distribution of quality scores, and you will have plenty of reads that have good quality all the way through. As for whether to even trim the reads at all, that depends. Could you provide more details about your library and what you plan to do with the reads?
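To illustrate the difference between a hard clip and per-read trimming, here is a minimal Python sketch. It assumes Phred+33 quality strings, and the function names (`hard_clip`, `trim_tail`) are made up for this example rather than taken from any tool in the thread.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual]


def hard_clip(seq, qual, length):
    """Cut every read at the same position, whatever its own quality is."""
    return seq[:length], qual[:length]


def trim_tail(seq, qual, min_q=20):
    """Drop bases from the 3' end only while they score below min_q."""
    scores = phred33(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

A read that is good all the way through is left intact by `trim_tail` but still loses bases to `hard_clip`, which is the concern raised above.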

**odoyle81** · 03-09-2012, 09:37 AM

We have a couple different projects:

1. A population of mutants segregating for a phenotype... we want to locate the deletion, so I want to use one of these programs to do that (pindel, svseq, cortex (just learned about that one today)).
2. We also want to do a reference alignment with another sample.. I was going to use BWA..

We have Illumina 100bp PE reads.

If I trim then quality filter, I keep 74% of reads
If I just quality filter then I keep 70% of reads
I was trimming to 89bp and quality filter q=20 p=80

I thought it was really important to QC the reads before further processing?
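For readers unsure what a q=20 p=80 filter does, the rule can be sketched in Python as follows: a read is kept only if at least 80% of its bases have a quality score of 20 or more. This is an illustration under the assumption of Phred+33 encoding; the function name `passes_quality_filter` is hypothetical.

```python
def passes_quality_filter(qual, min_q=20, min_percent=80, offset=33):
    """Return True if at least min_percent of the bases score min_q or higher.

    qual is the FASTQ quality string; offset=33 assumes Phred+33 encoding.
    """
    scores = [ord(c) - offset for c in qual]
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_q)
    return 100.0 * good / len(scores) >= min_percent
```

Because the rule is a percentage over the whole read, trimming a low-quality tail first can let more reads pass, which is consistent with the 74% versus 70% figures above.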

**pbluescript** · 03-09-2012, 01:57 PM

Originally posted by odoyle81

I thought it was really important to QC the reads before further processing?

It is important to QC the reads, but it might not be necessary to trim the reads based on quality. Most aligners are aware of the quality of the bases and will take that into account when mapping. BWA is a good example since it can soft clip reads. If you do trim off low quality tails with PE data and map with BWA, you might even get worse results than if you just map the reads without trimming them. BWA can have a hard time determining the size distribution of the insert if you do quality trimming.

**odoyle81** · 03-09-2012, 07:38 PM

Thanks for that perspective!
So after quality filtering, I will probably lose one read from some of the pairs. I've been reading about how to remove the orphaned reads. Does everyone do this with custom scripts, or is there a tool for this?
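A custom script for this can be quite short. The following Python sketch is one possible approach, not an established tool: it assumes four-line FASTQ records, read names that differ only by a trailing `/1` or `/2` (or by a comment after whitespace), and a mate file small enough to hold in memory. All function names are hypothetical.

```python
def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def pair_key(header):
    """Read name without the leading '@', any comment, or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def split_pairs(records1, records2):
    """Return (paired1, paired2, orphans), keeping the order of records1."""
    by_key2 = {pair_key(r[0]): r for r in records2}  # mates held in memory
    paired1, paired2, orphans = [], [], []
    seen = set()
    for rec in records1:
        key = pair_key(rec[0])
        if key in by_key2:
            paired1.append(rec)
            paired2.append(by_key2[key])
            seen.add(key)
        else:
            orphans.append(rec)
    # Mates whose partner never appeared in records1 are orphans too.
    orphans.extend(r for k, r in by_key2.items() if k not in seen)
    return paired1, paired2, orphans
```

Writing `paired1` and `paired2` to separate files in the same order keeps the pairs synchronized for a paired-end aligner.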

**ETHANol** · 03-09-2012, 10:58 PM

Two things:
1) The best way to find out how to trim is to trim a couple of different ways and see which aligns best. While it is probably too time-consuming to do this for every data set, it's informative to see what each approach is actually doing.
2) You might want to take a look at Trimmomatic. It is way faster than the FASTX Toolkit.

**Zam** · 03-13-2012, 01:56 AM

Just a comment for odoyle81 about using Cortex - you should not need to pre-quality filter the reads for Cortex (unless you have massive massive coverage, in which case it will do no harm I guess). Just use the inbuilt error-cleaning mechanisms, and it should work just fine.

**vivi7** · 05-14-2014, 01:49 AM

fastx_barcodes_splitter issue with the run

Hi,

I saw the post and I hope maybe some of you can help me

When I run fastx_barcode_splitter.pl with this command

/usr/local/bin/fastx_barcode_splitter.pl --bcfile ./Barcodes9nt.txt --prefix ./Rescued9nt --suffix .fq --bol

In the command line it looks like it is running (no error message, no > sign); see attachment for screenshot.
However, it is not running at all. I can see with top that it is not using any memory or CPU, and it has been 'running' for days on a very small file without producing any results.
The input file is in the STDIN folder, as it is supposed to be.

I would be very grateful if you could suggest what might be wrong.
Thanks in advance
Vivi

**odoyle81** · 05-14-2014, 11:26 AM

Unfortunately I can't advise on why that isn't working for you, but I would recommend you just write your own script, or try to find one on the internet. Most of the FASTX tools are out of date, no longer updated, and don't work that well. For example, this looks like one that might work:

https://pypi.python.org/pypi/paired_sequence_utils/0.1

If you google, you should be able to find a bunch, as it is a pretty simple operation that needs to be done.
I can't offer much support, and maybe this isn't the most efficient way to do it (it is kinda slow), but the one I wrote is here:

https://bitbucket.org/odoyle81/pythonstuff/src/beb81db541833194e4f02a28ebd18d9bcf95102a/extractreadsbybarcodev3.py?at=master

In any case, learning to write your own will allow you to adapt to your specific needs.

hope that helps.
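As a rough idea of what such a script involves, here is a minimal Python sketch of barcode assignment at the start of a read (the behaviour requested with `--bol`). It assumes a barcode file with one `name<TAB>barcode` pair per line and exact matching only, with no mismatches allowed; `load_barcodes` and `assign_barcode` are hypothetical names, not part of any existing package.

```python
def load_barcodes(lines):
    """Parse 'name<whitespace>barcode' lines into a {barcode: name} dict."""
    barcodes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, barcode = line.split()[:2]
        barcodes[barcode.upper()] = name
    return barcodes


def assign_barcode(seq, barcodes):
    """Return the sample whose barcode starts the read, else 'unmatched'."""
    upper = seq.upper()
    for barcode, name in barcodes.items():
        if upper.startswith(barcode):  # exact match at beginning of read
            return name
    return "unmatched"
```

Note also that the FASTX documentation shows the barcode splitter being fed through a pipe on standard input (for example `cat reads.fq | fastx_barcode_splitter.pl ...`, where `reads.fq` is a placeholder name). A run with nothing piped in may simply sit idle waiting for input, which would look like the behaviour described above.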

**vivi7** · 05-15-2014, 12:01 AM

Thank you very much!!!

**luofastx** · 10-10-2014, 08:22 PM

fastx_trimmer: input file (/BJPROJ/Data_production/HiseqX/140807_ST-E00142_0036_BH04CYALXX/DHE00358/DHE00358_L5_2.fq.gz) has unknown file format (not FASTA or FASTQ), first character = ^_ (31) ????

**luofastx** · 10-10-2014, 08:29 PM

Can fastx-toolkit read gzip files?

fastx_trimmer: input file (/BJPROJ/Data_production/HiseqX/140807_ST-E00142_0036_BH04CYALXX/DHE00358/DHE00358_L5_2.fq.gz) has unknown file format (not FASTA or FASTQ), first character = ^_ (31)
What is the reason for this error?

**GenoMax** · 10-11-2014, 04:52 AM

See: https://www.biostars.org/p/83237/
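The "first character = ^_ (31)" in the error is byte 0x1f, which is the first byte of the gzip magic number (0x1f 0x8b), so the tool appears to be reading the compressed bytes directly. A minimal Python sketch for checking a file and opening it either way is shown below; the helper names `is_gzipped` and `open_fastq` are made up for this example.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    if is_gzipped(path):
        return gzip.open(path, "rt")
    return open(path, "r")
```

Decompressing the data before it reaches the tool, for instance by piping the decompressed stream into it, avoids the error.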
