SEQanswers

Old 02-12-2019, 05:44 PM   #1
jgroh
Junior Member
 
Location: Australia

Join Date: Jan 2019
Posts: 3
Default trimming BGI adapters

Hello,

I have paired-end 100 bp reads generated on a BGISEQ-500. The sequencing center did some adapter trimming before delivering the data, but a fraction of reads still appear to contain putative adapter sequence. I am basing this on the 'overrepresented k-mers' at the starts and ends of both forward and reverse reads in the FastQC output. I've attached an image of this module's output for one particular sample.

The overrepresented k-mers at the 3' end match the beginning of the 3' adapter sequence, which makes sense: I assume this happens when the insert is shorter than the read length, so the read runs into the adapter on the other side of the genomic fragment.
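A minimal Python sketch of the read-through reasoning above. The function names and the toy insert are made up for illustration; the adapter is the forward-read (`-a`) sequence quoted later in this thread. It assumes the adapter follows the insert directly, with no bases in between.

```python
# Forward-read 3' adapter as quoted later in this thread (the -a sequence).
ADAPTER_3P = "AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA"


def adapter_bases_in_read(insert_size: int, read_length: int) -> int:
    """Number of adapter bases expected at the 3' end of a read."""
    return max(0, read_length - insert_size)


def simulate_read(insert: str, adapter: str, read_length: int) -> str:
    """Sequence the first `read_length` bases of insert followed by adapter."""
    return (insert + adapter)[:read_length]


# Toy example: a 92 bp insert sequenced with 100 bp reads.
toy_insert = "ACGT" * 23  # 92 bases, invented for illustration
read = simulate_read(toy_insert, ADAPTER_3P, 100)
n_adapter = adapter_bases_in_read(len(toy_insert), 100)
print(n_adapter, read[-n_adapter:])
```

With an insert of 100 bp or longer the function returns 0, so only short fragments contribute adapter k-mers at the 3' end.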

What confuses me is that the overrepresented k-mers at the 5' end of reads contain what looks like a partial 5' adapter sequence, but truncated at its 3' end, which I wouldn't expect, and with one base position variable. I wouldn't necessarily expect sequencing error either, as the quality scores are generally very high at the start of the reads.

Here is the 5' adapter sequence provided by the sequencing center:
5' adapter AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG. The underlined part is what is appearing in fragments at the start of reads, and the position in bold is variable among these. Libraries were prepared by the sequencing center, and the sequencing technology is still a bit unclear to me, so I'm not sure whether this is a true artefact. Has anyone seen this pattern in data from BGI before? I may just leave the data as is and proceed with mapping, as these reads are a small fraction overall, but I'm trying to understand what might be going on...
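One way to look at this kind of pattern outside FastQC is to tally the first k bases of each read and then check which positions vary among the adapter-like prefixes. The sketch below is only an illustration: the four reads are invented toy data, not the poster's reads, and the helper names are hypothetical.

```python
from collections import Counter


def prefix_kmer_counts(reads, k):
    """Count the first k bases of every read that is at least k long."""
    return Counter(r[:k] for r in reads if len(r) >= k)


def variable_positions(kmers):
    """Return 0-based positions where equal-length k-mers disagree."""
    return [i for i, column in enumerate(zip(*kmers)) if len(set(column)) > 1]


# Toy reads (invented): three start with an adapter-like prefix, one does not.
toy_reads = [
    "AAGTCGGATCGTAAAC",
    "AAGTCGGTTCGTCCCA",
    "AAGTCGGATCGTGGGT",
    "CCATGGTACCTAGGAT",
]

counts = prefix_kmer_counts(toy_reads, 12)
adapter_like = [kmer for kmer in counts if kmer.startswith("AAGTCGG")]
print(counts.most_common(1))
print(variable_positions(adapter_like))
```

On real data the reads would come from the FASTQ file, and the choice of k and of the prefix used to select adapter-like k-mers would need adjusting.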

Thanks
Attached Images
File Type: png Screen Shot 2019-02-13 at 12.29.44 PM.png (81.0 KB, 12 views)
Old 03-03-2020, 07:11 AM   #2
Rnasoup
Junior Member
 
Location: Spain

Join Date: Feb 2020
Posts: 2
Default

This is probably a bit late for you, but I hope it may help other people. I struggled to find the sequences needed to trim adapters from BGI/MGI sequencing data. In the end, I found a PDF with the oligos used for library prep.

Normally I use grep to inspect the adapters, but with BGI/MGI it was confusing because I found that there are often mutations in the adapters (I rarely see this when looking at Illumina data).
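Plain grep only finds exact matches, so mutated adapter copies are missed. A minimal Python sketch of a mismatch-tolerant search is shown here; the function names, the 12-base pattern (the first 12 bases of the adapters quoted in this thread's first post), and the toy reads are illustrative assumptions. It handles substitutions only, not indels.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_with_mismatches(read: str, pattern: str, max_mismatches: int = 1) -> int:
    """Return the 0-based start of the first window of `read` that matches
    `pattern` with at most `max_mismatches` substitutions, or -1 if none."""
    k = len(pattern)
    for start in range(len(read) - k + 1):
        if hamming(read[start:start + k], pattern) <= max_mismatches:
            return start
    return -1


PATTERN = "AAGTCGGATCGT"  # first 12 bases of the adapter quoted above
exact_read = "ACGTACGT" + "AAGTCGGATCGT"    # toy read, exact adapter start
mutated_read = "ACGTACGT" + "AAGTCGGTTCGT"  # toy read, one substitution

print(find_with_mismatches(exact_read, PATTERN, max_mismatches=0))
print(find_with_mismatches(mutated_read, PATTERN, max_mismatches=0))
print(find_with_mismatches(mutated_read, PATTERN, max_mismatches=1))
```

The mutated read is invisible to an exact search but is found once a single mismatch is allowed, which is the behaviour an error-tolerant trimmer relies on.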

In short, for paired-end, this cutadapt command found adapters in around 10% of the read pairs in this dataset.

Code:
cutadapt -a AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA -A AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG
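The command above lists only the adapter options. As a hedged sketch, a complete paired-end call could be assembled from Python as below; the file names are placeholders, and `build_cutadapt_command` is a hypothetical helper. `-o` and `-p` are cutadapt's options for the trimmed R1 and R2 output files.

```python
import shlex

# Adapter sequences exactly as given in the command above.
ADAPTER_R1 = "AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA"
ADAPTER_R2 = "AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG"


def build_cutadapt_command(in_r1, in_r2, out_r1, out_r2):
    """Assemble a paired-end cutadapt argument list (nothing is executed)."""
    return [
        "cutadapt",
        "-a", ADAPTER_R1,  # 3' adapter searched in forward reads
        "-A", ADAPTER_R2,  # 3' adapter searched in reverse reads
        "-o", out_r1,      # trimmed forward reads
        "-p", out_r2,      # trimmed reverse reads
        in_r1, in_r2,
    ]


# Placeholder file names, not from the thread.
cmd = build_cutadapt_command(
    "sample_1.fq.gz", "sample_2.fq.gz",
    "trimmed_1.fq.gz", "trimmed_2.fq.gz",
)
print(shlex.join(cmd))
# To actually run it (requires cutadapt on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if file names contain spaces.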
Old 03-27-2020, 02:18 AM   #3
Melissa
Senior Member
 
Location: Switzerland

Join Date: Aug 2008
Posts: 124
Default

Hi,

Thanks, Rnasoup, for sharing the link to the sequencing adapters. I tried to find it, but nothing showed up on Google.

I hope jgroh resolved the adapter issue. It does look quite strange.

By the way, how do you find the data quality of the MGI sequencer in terms of error rate, etc.?

Thanks
Melissa

Last edited by Melissa; 03-27-2020 at 02:29 AM.
Old 04-01-2020, 06:42 AM   #4
Rnasoup
Junior Member
 
Location: Spain

Join Date: Feb 2020
Posts: 2
Default

Thanks Melissa,

I am not sure which data quality you mean. Anyway, I don't have much experience with MGI sequencing, I have just had to dig into it to analyze the GEO dataset that I mentioned in my post, so all I had was the fastq raw data. I guess you can run any software to extract quality information from the fastq files, like Picard tools.

Good luck
Old 04-02-2020, 05:43 AM   #5
Melissa
Senior Member
 
Location: Switzerland

Join Date: Aug 2008
Posts: 124
Default

Hi Rnasoup,

Thanks for your reply. What I meant is the data quality in terms of reported Phred score vs. observed Phred score, substitution/indel error rate, problematic regions for variant calling, etc. Some metrics similar to this thread on a new sequencer.
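For reference, the Phred scale relates a quality score Q to an error probability p by Q = -10 * log10(p). A minimal sketch of comparing a reported score with an observed one follows; the function names are hypothetical and the mismatch counts are made-up numbers. In practice the observed counts would come from reads aligned to a trusted reference.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a reported Phred score."""
    return 10 ** (-q / 10)


def empirical_phred(mismatches: int, total_bases: int) -> float:
    """Observed Phred score from mismatch counts against a known reference."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    if mismatches == 0:
        return float("inf")  # no errors observed in this bin
    return -10 * math.log10(mismatches / total_bases)


# Made-up numbers: bases reported as Q30, with 1 mismatch per 1000 observed.
reported_q = 30
observed_q = empirical_phred(1, 1000)
print(phred_to_error_prob(reported_q), observed_q)
```

If the observed value in a quality bin is well below the reported one, the instrument's scores are overconfident for that bin.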

Cheers
Melissa
Tags
adapter trimming, adapters, bgi
