**GenoMax** · 03-13-2015, 07:21 AM

There is no need to "groom" if your reads are already in sanger format.

**hyates** · 03-13-2015, 07:28 AM

> Originally posted by GenoMax
>
> There is no need to "groom" if your reads are already in sanger format.

I am reading these notes and trying to duplicate the results. The notes state that they used Galaxy and that the first step in the cleaning process was grooming. Specifically,

Groomed 28403332 sanger reads into sanger reads

They then trimmed and filtered the results. So what I should really be focusing on is the trimming and filtering, yes?

If so, how can I obtain the tools Galaxy uses for trimming and filtering so that I can run them locally on the command line? Thank you for your patience and prompt answer.

**GenoMax** · 03-13-2015, 07:36 AM

You can get the code galaxy uses here (individual tools likely have other dependencies and it may not be simple to run them on the command line): https://toolshed.g2.bx.psu.edu/repos/devteam

As long as you know the reads are in sanger format (phred+33) you can go on to trimming/filtering.
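To make the "phred+33" part concrete, here is a rough Python sketch of how a sanger quality string maps to scores, plus a very simple 3' trim and a length/mean-quality filter. This is only an illustration and not the algorithm the Galaxy trim and filter tools use; the function names and the thresholds (`min_q=20`, `min_len=50`, `min_mean_q=20`) are placeholders I made up for the example.

```python
def phred33_scores(quality_line):
    """Convert a phred+33 (sanger) quality string to a list of integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def trim_3prime(seq, qual, min_q=20):
    """Drop bases from the 3' end until a base with score >= min_q is reached.

    Generic illustration only; not the Galaxy trimmer's algorithm.
    """
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - 33 < min_q:
        end -= 1
    return seq[:end], qual[:end]


def keep_read(qual, min_len=50, min_mean_q=20):
    """Keep a read only if it is long enough and its mean score is high enough."""
    if len(qual) < min_len:
        return False
    scores = phred33_scores(qual)
    return sum(scores) / len(scores) >= min_mean_q
```

For real data you would apply `trim_3prime` to each record first and then pass the trimmed quality string to `keep_read`, using whatever cutoffs the original notes report.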

**hyates** · 03-13-2015, 07:54 AM

> Originally posted by GenoMax
>
> You can get the code galaxy uses here (individual tools likely have other dependencies and it may not be simple to run them on the command line): https://toolshed.g2.bx.psu.edu/repos/devteam
>
> As long as you know the reads are in sanger format (phred+33) you can go on to trimming/filtering.

That's a great place to look. It would be nice if the tools had a readme.txt listing their dependencies, but I'll read whatever docs I can find first. If that doesn't work, I can always reach out again.

That being said, thank you so much Geno. You've been a lot of help and I want you to know this. Have a great day.

**hyates** · 03-13-2015, 07:58 AM

> Originally posted by GenoMax
>
> As long as you know the reads are in sanger format (phred+33) you can go on to trimming/filtering.

Okay, my background is in computer science, not biology. So maybe you can answer this for me, because the person who did this isn't here anymore.

It seems the data is 100 cycle SE from high output: 1 lane (Illumina HiSeq 2500)
They did trimming from sanger to sanger data?
How can I verify myself that this is phred+33 format?

Did they make a mistake in grooming? It seems to me that Illumina HiSeq fastq data is not sanger? Or am I a total n00b?

**GenoMax** · 03-13-2015, 08:13 AM

Options for verifying phred+33 format: https://www.biostars.org/p/63225/

If your dataset is already in sanger format, then in galaxy you can assign the ".fastqsanger" type to it and skip the grooming step.
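If you would rather check the encoding yourself, one common heuristic is to look at the range of ASCII codes on the quality lines. Below is a rough Python sketch of that idea. The function name `guess_phred_offset` and the `max_reads` sample size are my own choices, it assumes plain four-line FASTQ records (no wrapped sequence lines), and the cutoffs are rules of thumb rather than anything official: characters below `;` (ASCII 59) cannot occur in the +64 encodings, and characters above `J` (ASCII 74) are unusual in phred+33 Illumina output.

```python
import gzip


def guess_phred_offset(path, max_reads=10000):
    """Guess the quality offset of a FASTQ file from its quality characters.

    Returns 33, 64, or None when the sampled characters fit both encodings.
    Heuristic only; assumes four-line records.
    """
    opener = gzip.open if path.endswith(".gz") else open
    lo, hi = 255, 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 3:  # fourth line of each record holds the qualities
                codes = [ord(ch) for ch in line.strip()]
                if codes:
                    lo = min(lo, min(codes))
                    hi = max(hi, max(codes))
    if lo < 59:  # below ';', impossible for phred+64 or solexa+64
        return 33
    if hi > 74:  # above 'J' and nothing low, so most likely a +64 encoding
        return 64
    return None  # ambiguous: sample more reads or check how the data was produced
```

Calling `guess_phred_offset("reads.fastq")` (the file name is just a placeholder) and getting 33 back means the file is consistent with sanger encoding, so no grooming is needed.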

How to perform grooming that galaxy does but on the commandline?
