
  • FastQC: 2 peak per sequence GC content

    Dear all,

    I have some genomic paired-end data from a nematode. I ran FastQC to get an overview of the data.

    I was surprised to see a "Per sequence GC content" graph with 2 peaks (see image attached).
    I ran Trimmomatic, but the per sequence GC content graph remained the same.
    Do you know why I get this profile?

    Best,
    Sophie

  • #2
    It *may* be indicative of contamination from an unrelated species/source. Have you tried to analyze the data? Is this a simple WGS experiment?


    • #3
      What should the normal GC content be? 41%? Is there anything within the genome that could account for the other GC content?

      I also once had 2 peaks in some samples.
      It was a low-GC bacterium (30%). The second peak (50%) turned out to come entirely from the rRNA operons within this bacterium. Our guess was that the GC bias of the adapter ligation somehow kicked in and ruined the dataset. The supplier doesn't know what happened.
      I'm not sure if that could be the case here, because I don't know if you have biological differences within the DNA in your sample, but it is probably worth checking.


      • #4
        Hello GenoMax and bastianwur,

        Thanks a lot for your answers.

        We don't know what the GC content is for this species. We think it is around 35-40%, as in other worms.

        After talking to the people in my lab, the second peak around 70% could very well be due to a bacterium present in the gut of the worm.

        Otherwise, the strain used is inbred, but I believe it still presents biological differences. I wouldn't say that would explain the 2nd peak though.

        Do you think it is still possible to do a genome assembly on this data?

        Anyhow, thanks for your answers,
        Sophie


        • #5
          We normally assemble metagenomes and metatranscriptomes here, and haven't encountered many problems with the different species.
          One of my colleagues has a paper in submission where they investigated this and found very few false assemblies.
          -> Assembling 2 totally different organisms from this dataset shouldn't be a problem.
          You might have to do some QA though, to ensure that everything gets correctly assigned/separated.
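
One rough way to start that QA is to bin the assembled contigs by GC content. The sketch below is only an illustration under stated assumptions: the 55% cutoff is a hypothetical value chosen to fall between the two peaks discussed in this thread, and the function names are made up for this example. GC content alone does not establish which organism a contig comes from, so any binning like this would still need confirming (for example by BLAST).

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_percent(seq: str) -> float:
    """GC percentage over unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def bin_contigs_by_gc(
    contigs: Iterable[Tuple[str, str]],
    cutoff: float = 55.0,  # assumption: a value between the ~35-40% and ~70% peaks
) -> Dict[str, List[Tuple[str, float]]]:
    """Split contigs into 'low_gc' and 'high_gc' bins around the cutoff."""
    bins: Dict[str, List[Tuple[str, float]]] = {"low_gc": [], "high_gc": []}
    for name, seq in contigs:
        gc = gc_percent(seq)
        key = "high_gc" if gc >= cutoff else "low_gc"
        bins[key].append((name, round(gc, 1)))
    return bins
```

With a real assembly, the lines would come from an open file handle, e.g. `bin_contigs_by_gc(read_fasta(open("contigs.fasta")))`, where `contigs.fasta` is a placeholder name.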


          • #6
            Hi Sophie,

            We observed a similar bimodal distribution from C. elegans samples contaminated with Streptomyces (and the relative height of the high-GC peak varied with the degree of contamination). You could BLAST a sampling of the GC-rich reads and see if they match any known species.
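
The suggestion above could be sketched roughly as follows: compute GC content per read, keep reads above a cutoff, and draw a random subset as FASTA for a BLAST search. This is a minimal sketch, not a definitive method. The 0.55 cutoff, the sample size of 100, and all function names are assumptions made for this example; the cutoff should be adjusted to sit between the two peaks in the actual FastQC plot.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '@', sequence) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records or at the end
        seq = next(it, "").strip()
        next(it, None)  # '+' separator line
        next(it, None)  # quality line
        yield header.strip()[1:], seq


def sample_gc_rich(
    records: Iterable[Tuple[str, str]],
    min_gc: float = 0.55,  # assumption: cutoff between the two GC peaks
    n: int = 100,          # assumption: number of reads to send to BLAST
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Reservoir-sample up to n reads whose GC fraction is at least min_gc."""
    rng = random.Random(seed)
    sample: List[Tuple[str, str]] = []
    seen = 0
    for rec in records:
        if gc_fraction(rec[1]) < min_gc:
            continue
        seen += 1
        if len(sample) < n:
            sample.append(rec)
        else:
            j = rng.randrange(seen)
            if j < n:
                sample[j] = rec
    return sample


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Reservoir sampling is used so that the whole file never has to be held in memory. The resulting FASTA text can be written to a file and submitted to BLAST to see which species the GC-rich reads match.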


            • #7
              If you know what that bacterium (present in the gut) is (and if a genome is available for that species or a close relative) you could try to separate your reads into two pools before trying assembly.

              You can do that easily with BBSplit.
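
As a rough illustration, a BBSplit call for paired-end reads could be assembled like this. The file names and the `bbsplit_command` helper are hypothetical, and the sketch assumes reference FASTA files are available for both the worm (or a close relative) and the bacterium. The parameters shown (`in1`, `in2`, `ref`, `basename`, `outu1`, `outu2`) are written from memory of the BBSplit usage text, so please check them against the help output of `bbsplit.sh` before running anything.

```python
import shlex
from typing import List


def bbsplit_command(reads1: str, reads2: str, refs: List[str], prefix: str = "split") -> List[str]:
    """Build (but do not run) a bbsplit.sh argument list for paired-end reads."""
    return [
        "bbsplit.sh",
        f"in1={reads1}",
        f"in2={reads2}",
        "ref=" + ",".join(refs),             # comma-separated reference FASTA files
        f"basename={prefix}_%_#.fq",         # % = reference name, # = read 1 or 2
        f"outu1={prefix}_unmatched_1.fq",    # reads matching neither reference
        f"outu2={prefix}_unmatched_2.fq",
    ]


# Hypothetical input names, for illustration only.
cmd = bbsplit_command("reads_1.fq", "reads_2.fq", ["worm.fa", "bacterium.fa"])
print(shlex.join(cmd))  # shell-quoted command line to inspect before running
```

Printing the command first, rather than executing it directly, makes it easy to inspect and adjust; it could then be run with `subprocess.run(cmd, check=True)` once BBTools is installed and the paths are real.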


              • #8
                Dear all,

                Sorry for the late reply.
                Thanks a lot for your answers! They were much appreciated.

                Unfortunately, I don't know the gut bacterium of this nematode. But I'll try what HESmith suggested, and if it turns out to be sequenced, I'll do what GenoMax suggested.

                To GenoMax: thanks for telling me about BBSplit! I didn't know about that tool.

                To bastianwur: Your message made me very happy! It is very good to know that there shouldn't be problems assembling this peculiar data. Good luck with the publication!

                Cheers,
                Sophie
