  • jgroh
    Junior Member
    • Jan 2019
    • 3

    trimming BGI adapters

    Hello,

    I have paired-end 100 bp reads generated on a BGISEQ-500. The sequencing center did some adapter trimming before delivering the data, but a fraction of reads still appear to contain putative adapter sequences. I am basing this on the 'overrepresented kmers' that FastQC reports at the starts and ends of both forward and reverse reads. I've included an image of this module's output for one sample as an attachment.

    The overrepresented kmers at the 3' end match the beginning of the 3' adapter sequence. That makes sense: I assume these are cases where the insert is shorter than the read length, so the read runs past the genomic fragment into the adapter on the other side.
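A minimal worked version of this read-through argument, assuming no prior trimming and ignoring indels: with read length L and insert length I, the number of adapter bases at the 3' end of a read is

```latex
n_{\text{adapter}} = \max(0,\; L - I), \qquad \text{e.g. } L = 100,\ I = 70 \;\Rightarrow\; n_{\text{adapter}} = 30
```

So only inserts shorter than 100 bp contribute 3' adapter k-mers, and the shorter the insert, the longer the adapter stretch at the end of the read.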

    What confuses me is that the overrepresented kmers at the 5' end of reads contain what looks like a partial 5' adapter sequence, degraded at its 3' end (which I wouldn't expect), and with one variable base position. I wouldn't expect sequencing error to explain it either, as quality scores are generally very high at the start of the reads.
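One way to check whether the variable base is consistent is to tally the first bases of every read that matches the start of the adapter with at most one mismatch. This is a sketch, not a validated tool: it assumes an uncompressed 4-line FASTQ and assumes the fragment seen at read starts is a prefix of the 5' adapter (the exact underlined fragment is not recoverable here). The function name `tally_adapter_starts` and the demo reads are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: tally the first k bases of reads whose start matches an adapter
# prefix with at most max_mm mismatches, to see which position varies.
# Assumptions: uncompressed 4-line FASTQ; the fragment at read starts is a
# prefix of the 5' adapter. Function name and demo reads are hypothetical.
set -euo pipefail

tally_adapter_starts() {
  local fastq=$1 prefix=$2 max_mm=${3:-1}
  awk -v p="$prefix" -v m="$max_mm" '
    NR % 4 == 2 {                      # sequence lines only
      k = length(p)
      s = substr($0, 1, k)
      if (length(s) < k) next          # read shorter than the prefix
      mm = 0
      for (i = 1; i <= k; i++)
        if (substr(s, i, 1) != substr(p, i, 1)) mm++
      if (mm <= m) print s             # exact or near match
    }' "$fastq" | sort | uniq -c | sort -k1,1nr
}

# Demo: four hypothetical reads; position 8 of the prefix varies (A vs C).
demo=$(mktemp)
printf '@r1\nAAGTCGGATCGTTTTT\n+\nIIIIIIIIIIIIIIII\n' >  "$demo"
printf '@r2\nAAGTCGGCTCGTTTTT\n+\nIIIIIIIIIIIIIIII\n' >> "$demo"
printf '@r3\nAAGTCGGCTCGTGGGG\n+\nIIIIIIIIIIIIIIII\n' >> "$demo"
printf '@r4\nCCCCCCCCCCCCCCCC\n+\nIIIIIIIIIIIIIIII\n' >> "$demo"
tally_adapter_starts "$demo" AAGTCGGATCGT   # first 12 bases of the 5' adapter
rm -f "$demo"
```

On the demo reads this reports AAGTCGGCTCGT twice and AAGTCGGATCGT once. If one position dominates the variation across many real reads, that points to a systematic variant in the oligo or library rather than random sequencing error.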

    Here is the 5' adapter sequence provided by the sequencing center:
    5' adapter AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG. The underlined part is what appears in fragments at the start of reads, and the position in bold varies among them. The libraries were prepared by the sequencing center, and I am still a bit unclear on the sequencing technology, so I'm not sure whether this is a true artefact. Has anyone seen this pattern in BGI data before? I may just leave the data as is and proceed with mapping, since these reads are a small fraction overall, but I'd like to understand what might be going on...

    Thanks
  • Rnasoup
    Junior Member
    • Feb 2020
    • 2

    #2
    This is probably a bit late for you, but I hope it may help other people. I struggled to find the sequences needed to trim adapters from BGI/MGI sequencing data. In the end, I found a PDF with the oligos used for library prep.

    Normally I use grep to inspect the adapters, but with BGI/MGI this was confusing because I often found mutations in the adapters (I rarely see that in Illumina data).
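A minimal sketch of that grep-style check, assuming an uncompressed 4-line FASTQ (decompress gzipped files first, e.g. with `gzip -dc`): it counts sequence lines containing the first 12 bases of an adapter. The 12-base prefix length, the function name `count_prefix_hits` and the demo reads are illustrative assumptions. Because it is an exact match, it misses adapter copies carrying the kind of mutations described above, which is one reason to leave the actual trimming to an error-tolerant tool such as cutadapt.

```shell
#!/usr/bin/env bash
# Sketch of a grep-style adapter check: count sequence lines that contain the
# first k bases of an adapter (exact match only, so mutated copies are missed).
# Assumptions: uncompressed 4-line FASTQ; k = 12 is an arbitrary choice.
set -euo pipefail

count_prefix_hits() {
  local fastq=$1 adapter=$2 k=${3:-12}
  local prefix=${adapter:0:k}
  # Keep only sequence lines (line 2 of each record), then count fixed-string
  # matches; grep -c prints 0 but exits 1 when nothing matches, hence || true.
  awk 'NR % 4 == 2' "$fastq" | grep -c -F -- "$prefix" || true
}

ADAPTER_R1=AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA   # the -a sequence from this post
demo=$(mktemp)
printf '@r1\nACGTACGTAAGTCGGAGGCC\n+\nIIIIIIIIIIIIIIIIIIII\n' >  "$demo"
printf '@r2\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n' >> "$demo"
count_prefix_hits "$demo" "$ADAPTER_R1"   # prints 1
rm -f "$demo"
```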

    In short, for paired-end data, this cutadapt command found adapters in around 10% of the read pairs in the dataset.

    Code:
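    # -a / -A: 3' adapters for read 1 / read 2; input and output file names are placeholders
    cutadapt -a AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA -A AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG -o trimmed_R1.fastq.gz -p trimmed_R2.fastq.gz reads_R1.fastq.gz reads_R2.fastq.gz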


    • Melissa
      Senior Member
      • Aug 2008
      • 124

      #3
      Hi,

      Thanks, Rnasoup, for sharing the link to the sequencing adapters. I tried to find it, but nothing showed up in Google.

      I hope jgroh resolved the adapter issue. It does look quite strange.

      By the way, how do you find the data quality of the MGI sequencer, in terms of error rate etc.?

      Thanks
      Melissa
      Last edited by Melissa; 03-27-2020, 02:29 AM.


      • Rnasoup
        Junior Member
        • Feb 2020
        • 2

        #4
        Thanks Melissa,

        I am not sure which data quality you mean. Anyway, I don't have much experience with MGI sequencing; I just had to dig into it to analyze the GEO dataset I mentioned in my post, so all I had was the raw FASTQ data. I guess you can use any tool that extracts quality information from FASTQ files, such as Picard tools.
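If a dedicated tool is not at hand, a rough per-position mean quality can be computed directly from the FASTQ. This is a sketch assuming Phred+33 encoding and an uncompressed 4-line FASTQ; it is not what Picard or FastQC compute internally, and the function name `mean_quality_by_position` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: mean Phred quality at each read position across a FASTQ file.
# Assumptions: Phred+33 encoding and an uncompressed 4-line FASTQ;
# the function name is hypothetical.
set -euo pipefail

mean_quality_by_position() {
  awk '
    BEGIN {
      maxpos = 0
      # Map each printable ASCII character to its code point.
      for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c
    }
    NR % 4 == 0 {                             # quality lines only
      n = length($0)
      for (i = 1; i <= n; i++) {
        sum[i] += ord[substr($0, i, 1)] - 33  # Phred+33 offset
        cnt[i]++
      }
      if (n > maxpos) maxpos = n
    }
    END {
      for (i = 1; i <= maxpos; i++)
        printf "%d\t%.1f\n", i, sum[i] / cnt[i]
    }' "$1"
}

demo=$(mktemp)
printf '@r1\nACG\n+\nII5\n@r2\nACG\n+\nI+5\n' > "$demo"
mean_quality_by_position "$demo"   # position, mean quality: 40.0, 25.0, 20.0
rm -f "$demo"
```

FastQC's per-base sequence quality module reports similar information as a plot, which is usually more convenient for a first look.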

        Good luck


        • Melissa
          Senior Member
          • Aug 2008
          • 124

          #5
          Hi Rnasoup,

          Thanks for your reply. What I meant is data quality in terms of reported vs. observed Phred score, substitution/indel error rates, problematic regions for variant calling, etc.: metrics similar to those in this thread on a new sequencer.
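For reference, a reported Phred score encodes an error probability, and an observed (empirical) score can be estimated from alignments by counting mismatches among bases reported at a given quality. A minimal worked example, assuming every mismatch to the reference is a sequencing error (true variants inflate the count, so in practice known variant sites are usually excluded):

```latex
\begin{aligned}
Q &= -10\log_{10} P_{\text{err}}, \qquad P_{\text{err}} = 10^{-Q/10} \\
Q_{\text{obs}} &= -10\log_{10}\frac{\text{mismatched bases}}{\text{aligned bases}} \\
\text{e.g. } Q = 30 &\Rightarrow P_{\text{err}} = 10^{-3}; \qquad \tfrac{1}{500} = 0.002 \Rightarrow Q_{\text{obs}} \approx 27.0
\end{aligned}
```

In this example, bases reported as Q30 would behave more like Q27, i.e. the reported scores would be slightly optimistic.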

          Cheers
          Melissa


          • uzi_maruzi
            Junior Member
            • Nov 2022
            • 4

            #6
            Originally posted by Rnasoup

            Code:
            cutadapt -a AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA -A AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG

            Thank you for sharing the adapter sequences. May I ask whether these adapters work for PE or only for SE? I am a bit confused about how to start my data analysis. The sequencing company provided me with the following adapter indexes:

            DNA      Index        i7_Index_ID  i7 index sequence  i5_Index_ID  i5 index sequence
            SG-003   M5041/M7091  M7091        CAATCGTT           M5041        CTGCGGAT
            SG-015   M5041/M7092  M7092        AGCCATAC           M5041        CTGCGGAT
            ....etc....

            So I used those sequences for adapter trimming. Am I wrong?


            • TGRANGE
              Junior Member
              • Jul 2009
              • 1

              #7
              Yes, you are wrong: these are the index sequences, which sit in the middle of the adapter. For trimming, you should use the sequences posted by Rnasoup, which directly abut the insert.

              Comment

              • uzi_maruzi
                Junior Member
                • Nov 2022
                • 4

                #8
                Thank you TGRANGE!
