  • tahamasoodi
    Success
    • May 2012
    • 130

    Sequence Duplication

    I have some whole-genome FASTQ files, one for read 1 and one for read 2. Before analysing them, I checked their quality using FastQC, and surprisingly some of the samples show a very high level of duplication (around 90%). What might be the reason for this? Can these samples still be processed for analysis, or should I just discard them?

    Any help will be appreciated!
    Thanks,
  • tahamasoodi
    Success
    • May 2012
    • 130

    #2
    There seem to be no answers to my query!
    Thanks,


    • dpryan
      Devon Ryan
      • Jul 2011
      • 3478

      #3
      Contamination, a really small genome sequenced at high depth... There are a number of possibilities; you could probably get more suggestions if you provided more details. Whether your data are usable will likely depend on the nature of the samples and what you intend to do with them.
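To put a rough number on the "small genome, high depth" explanation: even with no PCR artefacts at all, reads that start at random positions will collide more and more often as the number of reads approaches the number of possible start sites. The sketch below uses a simple occupancy (Poisson) model, assuming single-end reads, uniform coverage, and two possible start slots per base (one per strand). The genome sizes, depths, and read length are hypothetical inputs chosen only for illustration, and paired-end data would show far fewer chance duplicates because both ends must coincide.

```python
import math


def expected_duplicate_fraction(n_reads: float, n_positions: float) -> float:
    """Expected fraction of reads whose start slot was already taken by an
    earlier read, assuming starts are uniformly random over n_positions slots
    (simplified occupancy model; ignores PCR and coverage bias)."""
    if n_reads <= 0:
        return 0.0
    # Expected number of distinct occupied slots: G * (1 - exp(-N / G)).
    # expm1 keeps this accurate when N is tiny compared with G.
    distinct = -n_positions * math.expm1(-n_reads / n_positions)
    return 1.0 - distinct / n_reads


def reads_for_depth(genome_size: float, depth: float, read_length: int) -> float:
    """Number of single-end reads needed to reach a given mean depth."""
    return genome_size * depth / read_length


if __name__ == "__main__":
    # Hypothetical scenarios for illustration only.
    scenarios = {
        "human-sized genome, 30x, 100 bp reads": (3.1e9, 30, 100),
        "5 Mb genome, 1000x, 100 bp reads": (5e6, 1000, 100),
    }
    for label, (size, depth, length) in scenarios.items():
        n = reads_for_depth(size, depth, length)
        # Two strands -> two possible start slots per base.
        frac = expected_duplicate_fraction(n, 2 * size)
        print(f"{label}: ~{frac:.0%} duplicates by chance")
```

With these made-up inputs the script prints about 7% for the human-sized case and about 80% for the 5 Mb case, which is why a very small genome (or a small contaminating genome) sequenced deeply can look heavily duplicated even when the library itself is fine.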


      • fkrueger
        Senior Member
        • Sep 2009
        • 627

        #4
        Here is a nice blog post about interpreting the duplication plot of FastQC.


        • tahamasoodi
          Success
          • May 2012
          • 130

          #5
          Thanks dpryan,
          I have around 100 cancer samples with an equal number of controls, for which we are doing WGS. So far we have completed around 16 samples, but when I started analysing them I found a high level of duplication, and in some samples the base quality is also poor.
          Thanks,


          • tahamasoodi
            Success
            • May 2012
            • 130

            #6
            Can anyone throw more light on this?
            Thanks,


            • tahamasoodi
              Success
              • May 2012
              • 130

              #7
              Can I use Picard's MarkDuplicates command to remove the duplicate sequences?
              Thanks,


              • kopi-o
                Senior Member
                • Feb 2008
                • 319

                #8
                Yes, you can.


                • tahamasoodi
                  Success
                  • May 2012
                  • 130

                  #9
                  Will it remove all the duplicate sequences from the fastq file?
                  Thanks,


                  • kopi-o
                    Senior Member
                    • Feb 2008
                    • 319

                    #10
                    It will if you supply REMOVE_DUPLICATES=true (http://picard.sourceforge.net/comman...MarkDuplicates). Otherwise it will just flag them as duplicates in the output file.
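Note that MarkDuplicates works on aligned SAM/BAM input rather than on raw FASTQ files, because duplicates are identified from alignment positions. The toy sketch below illustrates the difference between flagging and removing, for single-end reads only. It is not Picard's implementation: the `Read` type and `mark_duplicates` function are hypothetical, paired-end logic and library/optical-duplicate handling are omitted, and keeping the read with the highest summed base quality is only loosely modelled on Picard's default scoring.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Read:
    """Minimal stand-in for an aligned single-end read (hypothetical type)."""
    name: str
    chrom: str
    five_prime: int          # 5' alignment coordinate
    reverse: bool            # True if aligned to the reverse strand
    quals: Tuple[int, ...]   # Phred base qualities
    duplicate: bool = False  # analogous to the SAM 0x400 duplicate flag


def mark_duplicates(reads: List[Read], remove: bool = False) -> List[Read]:
    """Reads sharing (chrom, 5' position, strand) are treated as duplicates;
    the one with the highest sum of base qualities stays unflagged.
    remove=False: return every read, with duplicates flagged.
    remove=True: drop the flagged reads from the output."""
    groups: Dict[Tuple[str, int, bool], List[Read]] = {}
    for r in reads:
        groups.setdefault((r.chrom, r.five_prime, r.reverse), []).append(r)

    # Representative per group: highest summed base quality (first wins ties).
    best = {key: max(group, key=lambda r: sum(r.quals))
            for key, group in groups.items()}

    out: List[Read] = []
    for r in reads:
        is_dup = best[(r.chrom, r.five_prime, r.reverse)] is not r
        if is_dup and remove:
            continue
        out.append(replace(r, duplicate=is_dup))
    return out


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 100, False, (30, 30, 30)),
        Read("r2", "chr1", 100, False, (40, 40, 40)),
        Read("r3", "chr1", 250, True, (35, 35, 35)),
    ]
    print([(r.name, r.duplicate) for r in mark_duplicates(reads)])
    print([r.name for r in mark_duplicates(reads, remove=True)])
```

Here `r1` and `r2` start at the same position on the same strand, so the lower-quality `r1` is flagged in the first call and dropped entirely in the second, while `r3` is untouched in both.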


                    • tahamasoodi
                      Success
                      • May 2012
                      • 130

                      #11
                      I have a few samples with over 80% duplication (detected by FastQC). Will Picard work for these samples?
                      Thanks,


                      • kopi-o
                        Senior Member
                        • Feb 2008
                        • 319

                        #12
                        I don't see why it wouldn't. By the way, it seems the numbers you get from FastQC usually overstate the duplication you detect with Picard.
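One plausible contributor to that gap: FastQC examines each FASTQ file on its own and judges duplication by sequence identity, whereas Picard works from alignments and, for paired data, considers both mates. The toy comparison below (hypothetical helper names and made-up sequences, not either tool's actual algorithm) shows how counting duplicates from read 1 alone can give a higher figure than requiring the whole pair to match.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def duplicate_fraction(keys: Iterable[Hashable]) -> float:
    """Fraction of items that repeat an earlier key, i.e. the share that
    would be discarded if only one copy of each key were kept."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total


def compare(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (read-1-only duplication, pair-aware duplication) for a list
    of (read1, read2) sequences."""
    read1_only = duplicate_fraction(r1 for r1, _ in pairs)
    pair_aware = duplicate_fraction(pairs)
    return read1_only, pair_aware


if __name__ == "__main__":
    # Made-up fragments: three share read 1, but only two share both ends.
    pairs = [
        ("ACGT", "TTGA"),
        ("ACGT", "CCGA"),
        ("ACGT", "TTGA"),
        ("GGCC", "AATT"),
    ]
    print(compare(pairs))
```

With this toy input the read-1-only figure is 0.5 and the pair-aware figure is 0.25. In general the pair-aware value can never exceed the read-1-only value, since every distinct read 1 belongs to at least one distinct pair.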
