**Biocomputronics** · 11-22-2017, 02:00 PM

I would suggest running FastQC on the data. It is a program that measures a wide variety of quality metrics, so you can see the data quality for yourself, including any adapter sequence contamination.

FastQC, a quality control tool for high-throughput sequence data (Babraham Bioinformatics):
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
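For anyone who wants to script this step, here is a minimal sketch of calling FastQC from Python. It assumes the `fastqc` executable is on your PATH; the file names and the `fastqc_out` directory are placeholders, and the helper names `fastqc_command` and `run_fastqc` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def fastqc_command(fastq_files, outdir="fastqc_out", threads=2):
    """Build a FastQC command line as a list of arguments (nothing is run)."""
    return [
        "fastqc",
        "--outdir", str(outdir),     # where the HTML/zip reports are written
        "--threads", str(threads),   # number of files processed in parallel
        *[str(f) for f in fastq_files],
    ]


def run_fastqc(fastq_files, outdir="fastqc_out", threads=2):
    """Run FastQC; assumes the 'fastqc' executable is installed and on PATH."""
    if shutil.which("fastqc") is None:
        raise RuntimeError("fastqc not found on PATH")
    # FastQC expects the output directory to exist already.
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(fastqc_command(fastq_files, outdir, threads), check=True)


# Example (placeholder file names):
# run_fastqc(["sample_R1.fastq.gz", "sample_R2.fastq.gz"])
```

The "Adapter Content" and "Overrepresented sequences" modules in the resulting report are the ones to check for leftover adapters.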

**GenoMax** · 11-24-2017, 06:01 AM

@cement_head: If there are no remaining adapters, then all you have lost is some time. For MiSeq datasets you would need less than 30 min to scan/trim the data with bbduk.sh from BBMap. You can then be sure that no extraneous sequences remain in your data, which is especially important if you are doing any de novo work.
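As a rough illustration of such a scan/trim pass, the sketch below builds a typical paired-end adapter-trimming call to bbduk.sh. The parameter values (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`) follow the commonly quoted adapter-trimming example for BBDuk and are not something stated in this thread; the `adapters.fa` path and the file names are assumptions you would need to adjust, and `bbduk_command` is a hypothetical helper name.

```python
import subprocess


def bbduk_command(r1, r2, out1, out2, adapters="adapters.fa"):
    """Build a paired-end adapter-trimming command for bbduk.sh (nothing is run).

    'adapters' is an assumed path to an adapter FASTA file; BBMap ships one
    in its resources directory, but the location depends on your install.
    """
    return [
        "bbduk.sh",
        f"in1={r1}", f"in2={r2}",
        f"out1={out1}", f"out2={out2}",
        f"ref={adapters}",
        "ktrim=r",   # trim adapters from the right (3') end
        "k=23",      # k-mer length used for matching
        "mink=11",   # allow shorter k-mers at read ends
        "hdist=1",   # allow one mismatch per k-mer
        "tpe",       # trim both reads of a pair to the same length
        "tbo",       # also trim adapters detected by pair overlap
    ]


# Example (placeholder file names; requires bbduk.sh on PATH):
# subprocess.run(
#     bbduk_command("s_R1.fq.gz", "s_R2.fq.gz", "clean_R1.fq.gz", "clean_R2.fq.gz"),
#     check=True,
# )
```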

**[email protected]** · 12-01-2017, 11:37 PM

I suggest fastp for automatic adapter trimming, read filtering and quality control. fastp is written in C++ with multi-threading support, so it is very fast.

fastp has the following features:
1, filter out bad reads (too low quality, too short, or too many N...)
2, cut low-quality bases at the 5' and 3' ends of each read by evaluating the mean quality in a sliding window (like Trimmomatic but faster).
3, trim all reads at the front and tail
4, cut adapters. Adapter sequences can be detected automatically, which means you don't have to supply the adapter sequences to trim them.
5, correct mismatched base pairs in the overlapped regions of paired-end reads, if one base has high quality while the other has ultra-low quality
6, preprocess unique molecular identifier (UMI) enabled data, shifting the UMI to the sequence name.
7, report results in JSON format for further interpretation.
8, visualize quality control and filtering results on a single HTML page (like FastQC but faster and more informative).
9, split the output into multiple files (0001.R1.gz, 0002.R1.gz...) to support parallel processing. Two modes can be used: limiting the total number of split files, or limiting the number of lines in each split file.
10, support long reads (data from PacBio / Nanopore devices).
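A minimal sketch of a paired-end fastp run covering several of the features above (adapter cutting, overlap-based correction, JSON and HTML reports) might look like this. The file names and thread count are placeholders, `fastp_command` is a hypothetical helper, and the options shown are taken from the fastp documentation as I understand it, so check `fastp --help` for your installed version.

```python
import subprocess


def fastp_command(r1, r2, out1, out2,
                  json_report="fastp.json", html_report="fastp.html",
                  threads=4, correct_overlap=True):
    """Build a paired-end fastp command line (nothing is run)."""
    cmd = [
        "fastp",
        "-i", r1, "-I", r2,          # input read 1 / read 2
        "-o", out1, "-O", out2,      # output read 1 / read 2
        "--detect_adapter_for_pe",   # sequence-based adapter auto-detection for PE data
        "--json", json_report,       # machine-readable report
        "--html", html_report,       # single-page visual report
        "--thread", str(threads),
    ]
    if correct_overlap:
        # Correct mismatched bases in the overlapping region of read pairs.
        cmd.append("--correction")
    return cmd


# Example (placeholder file names; requires fastp on PATH):
# subprocess.run(
#     fastp_command("s_R1.fq.gz", "s_R2.fq.gz", "clean_R1.fq.gz", "clean_R2.fq.gz"),
#     check=True,
# )
```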

fastp creates reports in both HTML and JSON format.

HTML report: http://opengene.org/fastp/fastp.html
JSON report: http://opengene.org/fastp/fastp.json
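Since the JSON report is meant for further processing, here is a small sketch of pulling a few numbers out of it. The key names (`summary`, `before_filtering`, `after_filtering`, `total_reads`, `adapter_cutting`, `adapter_trimmed_reads`) are my reading of the example report linked above and should be treated as assumptions; verify them against the JSON your fastp version writes. `summarize_fastp_report` is a hypothetical helper name.

```python
import json


def summarize_fastp_report(report):
    """Extract a few headline numbers from a parsed fastp JSON report.

    Key names are assumed from the example report; adjust if yours differ.
    """
    before = report["summary"]["before_filtering"]["total_reads"]
    after = report["summary"]["after_filtering"]["total_reads"]
    # The adapter section may be absent if adapter trimming was disabled.
    trimmed = report.get("adapter_cutting", {}).get("adapter_trimmed_reads", 0)
    return {
        "reads_before": before,
        "reads_after": after,
        "fraction_kept": after / before if before else 0.0,
        "adapter_trimmed_reads": trimmed,
    }


# Example (assumes a report written by a previous fastp run):
# with open("fastp.json") as handle:
#     print(summarize_fastp_report(json.load(handle)))
```

A non-zero count of adapter-trimmed reads after demultiplexing would suggest that adapter removal is still worth doing.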

fastp is an open source project at github: https://github.com/OpenGene/fastp

Post-demultiplex adaptor removal?