
**maubp** · 03-29-2010, 01:29 PM

Does "Paired-End Read Splitting and Joining" work on trimmed reads which may be of differing lengths?

**quinlana** · 03-30-2010, 05:32 PM

Is there (could there be) a tool for scrubbing low-complexity or otherwise poor/low-information content sequence?

Specifically, something that would ditch single end reads or both ends of a paired-end read if either end meets the following hypothetical criteria?

1. >= X% of the read is a single nucleotide (80% of the read is As)
2. More than X% of the read is Ns.
3. Low complexity (ATATATATA...)

Such reads slow down alignments and, in many cases, are irrelevant to downstream analyses. Many aligners filter them inherently, but others don't.

It seems that a Galaxy and command-line analog would benefit folks.

My devalued 2 cents.
Aaron

**blankenberg** · 04-01-2010, 10:11 AM

Originally posted by maubp

Does "Paired-End Read Splitting and Joining" work on trimmed reads which may be of differing lengths?

No, currently, Splitting and Joining should only be performed on paired-end reads having equal lengths.

Would it be useful to add an option to the Trimming tool to allow it to work directly on Joined paired-end reads? This could cause the joined reads to be split in half, where each half is trimmed according to the user specification and then the two trimmed halves are rejoined (similar to an option already available in the filter tool).
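To make the idea concrete, here is a minimal Python sketch of what such an option might do. The function names and the `left`/`right` parameters are hypothetical and are not taken from the Galaxy Trimming tool. It assumes the joined read is simply two equal-length mates concatenated, and it applies the same fixed offsets to each half as stored, so the rejoined halves stay equal in length.

```python
def split_joined(seq, qual):
    """Split a joined paired-end read into two equal halves."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) % 2 != 0:
        raise ValueError("joined read must have even length")
    half = len(seq) // 2
    return (seq[:half], qual[:half]), (seq[half:], qual[half:])


def trim(seq, qual, left=0, right=0):
    """Drop `left` characters from the start and `right` from the end."""
    end = len(seq) - right
    return seq[left:end], qual[left:end]


def trim_joined(seq, qual, left=0, right=0):
    """Split a joined read, trim each half identically, then rejoin."""
    first, second = split_joined(seq, qual)
    s1, q1 = trim(*first, left=left, right=right)
    s2, q2 = trim(*second, left=left, right=right)
    return s1 + s2, q1 + q2


if __name__ == "__main__":
    print(trim_joined("AACCGGTT", "IIIIHHHH", left=1, right=1))
```

Quality-based trimming would generally leave the two halves with different lengths, in which case the rejoined read could no longer be split down the middle.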

**blankenberg** · 04-01-2010, 10:44 AM

Originally posted by quinlana

Is there (could there be) a tool for scrubbing low-complexity or otherwise poor/low-information content sequence?

Specifically, something that would ditch single end reads or both ends of a paired-end read if either end meets the following hypothetical criteria?

1. >= X% of the read is a single nucleotide (80% of the read is As)
2. More than X% of the read is Ns.
3. Low complexity (ATATATATA...)

Such reads slow down alignments and, in many cases, are irrelevant to downstream analyses. Many aligners filter them inherently, but others don't.

It seems that a Galaxy and command-line analog would benefit folks.

My devalued 2 cents.
Aaron

I like the sound of this and I think it would be a good fit for the "Manipulate FASTQ reads on various attributes" tool: match by attribute (e.g. criteria 1-3) with the action of Remove. Let me think about a good way to do this (it can likely already be done by constructing a sufficiently complex regular expression).

For now, if you are interested, much of this can be done using the various Text Manipulation and filter tools:

1. Convert the FASTQ data to tabular (this tool is not yet available on the main server, but is on our test server, and will be on the main server after it is next updated).
2. Use the "Compute an expression on every row" tool to compute the desired value (e.g. float( c2.count('A') ) / float( len( c2 ) ) ).
3. Use the "Filter data on any column using simple expressions" tool, found under the "Filter and Sort" tool menu, to filter on the new column (c4).
4. Use the Tabular to FASTQ converter to convert back to FASTQ.
5. Groom your filtered data.

These steps could then be built into a workflow, so you wouldn't have to do each step manually each time. This approach is less than ideal, and I'll look into implementing the ability to do this directly on FASTQ files.
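For the command-line side, the three proposed criteria could be sketched in plain Python roughly as follows. This is only an illustration and not an existing Galaxy tool: all function names and default thresholds are made up, and the k-mer diversity measure is just one simple stand-in for "low complexity". It is written for Python 3, so the explicit float() calls in the expression above are not needed.

```python
def max_base_fraction(seq):
    """Fraction of the read taken up by its most common base (A, C, G or T)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return max(seq.count(base) for base in "ACGT") / len(seq)


def n_fraction(seq):
    """Fraction of the read that is N."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return seq.count("N") / len(seq)


def kmer_diversity(seq, k=2):
    """Distinct k-mers divided by total k-mers; low values suggest repeats."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        return 1.0
    return len({seq[i:i + k] for i in range(total)}) / total


def is_low_information(seq, max_single=0.8, max_n=0.1, min_diversity=0.25):
    """True if the read meets any of the three (hypothetical) criteria."""
    return (max_base_fraction(seq) >= max_single
            or n_fraction(seq) > max_n
            or kmer_diversity(seq) < min_diversity)


def keep_pair(seq1, seq2):
    """Drop both ends of a pair if either end is low-information."""
    return not (is_low_information(seq1) or is_low_information(seq2))


if __name__ == "__main__":
    for read in ("AAAAAAAAAC", "ACGTNNNNAC", "ATATATATATAT", "ACGTTGCAAGCT"):
        print(read, is_low_information(read))
```

The thresholds would need tuning per data set; a read of tandem repeats with a longer unit than the chosen k would slip past this particular diversity check.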

**maubp** · 04-02-2010, 02:59 AM

Originally posted by blankenberg

No, currently, Splitting and Joining should only be performed on paired-end reads having equal lengths.

Would it be useful to add an option to the Trimming tool to allow it to work directly on Joined paired-end reads? This could cause the joined reads to be split in half, where each half is trimmed according to the user specification and then the two trimmed halves are rejoined (similar to an option already available in the filter tool).

As long as the documentation on splitting and joining is clear it should be fine. Personally I don't like the join/split approach since it requires all the reads to have the same lengths.

I'm interested from the point of view of doing trimming and filtering on paired-end data. Trimming means the reads will be of different lengths. Filtering may mean that half a pair is lost, making the remaining read effectively a single-end read.

This is further complicated by the fact you can have the forward/reverse pairs interleaved in a single FASTQ file, or in two separate files.
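As a rough illustration of the bookkeeping involved, here is a small Python sketch for the interleaved case. It is not how any Galaxy tool works; the function names are hypothetical, and it assumes unwrapped four-line FASTQ records with mates stored consecutively and no blank lines. Reads that pass the filter while their mate fails are kept separately as single reads.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, seq, qual) tuples from four-line FASTQ records."""
    it = iter(lines)
    while True:
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if not rec:
            return
        if len(rec) != 4 or not rec[0].startswith("@") or not rec[2].startswith("+"):
            raise ValueError("malformed FASTQ record")
        yield rec[0], rec[1], rec[3]


def partition_pairs(records, keep):
    """Split interleaved records into surviving pairs and orphaned singles."""
    pairs, singles = [], []
    it = iter(records)
    for first in it:
        second = next(it, None)
        if second is None:
            raise ValueError("odd number of records in interleaved input")
        ok1, ok2 = keep(first), keep(second)
        if ok1 and ok2:
            pairs.append((first, second))
        elif ok1:
            singles.append(first)
        elif ok2:
            singles.append(second)
    return pairs, singles


if __name__ == "__main__":
    data = [
        "@r1/1", "ACGTACGT", "+", "IIIIIIII",
        "@r1/2", "ACG", "+", "III",
        "@r2/1", "ACGTA", "+", "IIIII",
        "@r2/2", "ACGTAC", "+", "IIIIII",
    ]
    # Example filter: keep reads of at least 5 bases after trimming.
    pairs, singles = partition_pairs(read_fastq(data), lambda r: len(r[1]) >= 5)
    print(len(pairs), "pairs,", len(singles), "single reads")
```

With two separate files, the same partitioning could be driven by reading one record from each file in step instead of two consecutive records from one.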

**vinhha** · 11-21-2012, 12:59 AM

Trouble with FASTQ summary statistics

I am learning to use Galaxy to analyze RNA-seq data. I ran into trouble when I used FASTQ Summary Statistics to check the data. I received a message like this:
6: FASTQ Summary Statistics on data 3
0 bytes
An error occurred running this job: Traceback (most recent call last):
File "/galaxy/home/g2main/galaxy_main/tools/fastq/fastq_stats.py", line 48, in <module>
if __name__ == "__main__": main()
File "/galaxy/home/g2main/galaxy_main/tools/fastq/fastq_stats.py", line 17, in main
fo

Please help me to understand and solve it!
Thanks in advance,
VH

FASTQ manipulation in Galaxy
