  • Sequence Duplication

    I have some whole-genome FASTQ files, one for read 1 and one for read 2. Before analysing them, I checked their quality with FastQC, and surprisingly some of the samples show a very high level of duplication (around 90%). What might be the reason for this? Can these samples still be processed, or should I just discard them?

    Any help will be appreciated!
    Thanks,

  • #2
    There seem to be no answers to my query!
    Thanks,


    • #3
      Contamination, a really small genome sequenced at high depth... There are a number of possibilities, and you would probably get more suggestions if you provided more details. Whether your data is useless or not will likely depend on the nature of the samples and what you intend to do with the data.
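
To make the "small genome at high depth" case concrete, here is a rough back-of-the-envelope sketch in Python. It assumes single-end reads whose start positions are drawn uniformly at random, which real libraries are not, and the function name and example numbers are made up for illustration. Under that assumption, the more reads you draw relative to the number of possible start positions, the more of them collide purely by chance:

```python
import math


def expected_duplicate_fraction(num_reads, num_start_positions):
    """Expected fraction of reads that repeat an already-seen start position.

    Simplified model (an assumption, not how any QC tool works):
    each read picks one of `num_start_positions` uniformly at random.
    """
    if num_start_positions <= 0:
        raise ValueError("num_start_positions must be positive")
    if num_reads <= 0:
        return 0.0
    # Expected number of distinct positions hit: P * (1 - exp(-N / P))
    expected_distinct = num_start_positions * -math.expm1(
        -num_reads / num_start_positions
    )
    return 1.0 - expected_distinct / num_reads


if __name__ == "__main__":
    # Hypothetical numbers: a 5 kb genome (both strands) versus a 3 Gb genome,
    # each sampled with one million reads.
    for label, positions in [("5 kb genome", 2 * 5_000),
                             ("3 Gb genome", 2 * 3_000_000_000)]:
        frac = expected_duplicate_fraction(1_000_000, positions)
        print(f"{label}: {100 * frac:.2f}% expected duplicates")
```

With paired-end data a duplicate has to match at both ends of the fragment, so chance collisions should be rarer than this single-end model suggests.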


      • #4
        Here is a nice blog post about interpreting the duplication plot of FastQC.


        • #5
          Thanks dpryan,
          I have around 100 cancer samples with an equal number of controls, for which we are doing WGS. So far we have completed around 16 samples, but when I started analysing them I found a high level of duplication, and in some samples the base quality is also poor.
          Thanks,


          • #6
            Can anyone throw more light on this?
            Thanks,


            • #7
              Can I use Picard's MarkDuplicates command to remove the duplicate sequences?
              Thanks,


              • #8
                Yes, you can.


                • #9
                  Will it remove all the duplicate sequences from the fastq file?
                  Thanks,


                  • #10
                    It will if you supply REMOVE_DUPLICATES=true (http://picard.sourceforge.net/comman...MarkDuplicates). Otherwise it will just flag them as duplicates in the output file.
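
As a rough illustration of the REMOVE_DUPLICATES option, the sketch below builds a MarkDuplicates command line in Python. Note that MarkDuplicates works on aligned SAM/BAM files rather than on FASTQ, so the reads need to be mapped first. The jar path and file names are hypothetical placeholders, and the invocation style assumes a Picard release that ships as a single `picard.jar` and accepts the classic `KEY=value` arguments; check the documentation for the version you have installed.

```python
import subprocess


def build_markduplicates_cmd(input_bam, output_bam, metrics_file,
                             remove_duplicates=False,
                             picard_jar="picard.jar"):
    """Return the argument list for a Picard MarkDuplicates run.

    `picard_jar` and all file names are placeholders (assumptions).
    """
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"INPUT={input_bam}",
        f"OUTPUT={output_bam}",
        f"METRICS_FILE={metrics_file}",
        # false: duplicates are only flagged; true: they are left out of OUTPUT
        f"REMOVE_DUPLICATES={'true' if remove_duplicates else 'false'}",
    ]


if __name__ == "__main__":
    cmd = build_markduplicates_cmd(
        "sample.sorted.bam",        # hypothetical aligned, sorted input
        "sample.dedup.bam",
        "sample.dup_metrics.txt",
        remove_duplicates=True,
    )
    print(" ".join(cmd))
    # Uncomment to actually run it (requires Java and Picard):
    # subprocess.run(cmd, check=True)
```

Leaving `remove_duplicates` at its default keeps every read in the output and only sets the duplicate flag, which most downstream tools can be told to respect.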


                    • #11
                      I have a few samples with over 80% duplication (as reported by FastQC). Will Picard work for these samples?
                      Thanks,


                      • #12
                        I don't see why it wouldn't. By the way, it seems the numbers you get from FastQC usually overstate the duplication you detect with Picard.
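
One likely reason for the difference is that the two tools measure different things: FastQC compares the read sequences in a single FASTQ file, while Picard looks at where aligned reads (and their mates) map. The Python sketch below is not FastQC's actual algorithm, just a minimal sequence-identity count to show what a purely sequence-based duplication figure means; the function names are made up for illustration.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def sequence_duplication_percent(sequences):
    """Percentage of reads whose exact sequence was already seen."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (1 - len(counts) / total)


if __name__ == "__main__":
    # Tiny made-up FASTQ content: two identical reads and one different read.
    fastq_lines = [
        "@r1", "ACGTACGT", "+", "IIIIIIII",
        "@r2", "ACGTACGT", "+", "IIIIIIII",
        "@r3", "TTTTCCCC", "+", "IIIIIIII",
    ]
    pct = sequence_duplication_percent(read_fastq_sequences(fastq_lines))
    print(f"{pct:.1f}% of reads are sequence-level duplicates")
```

A count like this treats read 1 and read 2 as unrelated, so two fragments that share one end but differ at the other are still counted as duplicates, whereas a position-based check on the aligned pair would not flag them.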
