  • How to fix fastq files from a bacteria sample?

    I have a pair of paired-end (PE) read files generated from a TruSeq library, containing 33M pairs of 150 bp reads. The bacterium is S. aureus, which has a genome size of about 2.8 Mbp. My goal is to call protein-coding variants.

    I am getting Per Base Sequence Quality, Sequence Duplication Levels, and Kmer Content failures from FastQC.

    Does this look like it is only because of over-sequencing, so that I don't need to do anything and can let Picard remove duplicates for me after mapping? If not, can you tell me what else I need to do to fix the FASTQ files?

    Thanks a lot in advance!

  • #2
    I would say go ahead with the analysis and then see if you have a real problem once you have alignment results.

    • #3
      It looks like you have over 3000x coverage, so it's not surprising that some reads appear to be duplicates. You should not deduplicate them unless the library was amplified. But even then, at such a high coverage level, I don't think deduplication is wise because there will be so many true duplicates that occur just by random chance.
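
As a rough check on the coverage figure above, here is a minimal sketch of the arithmetic. It assumes a S. aureus genome of about 2.8 Mbp (the size consistent with the 3000x estimate) and a deliberately simplified model in which read start positions are uniform over the genome on both strands, and two reads count as duplicates when they share a start position and strand. The function names are illustrative only, and the model ignores coverage bias and the details of how any particular tool counts duplicates.

```python
import math

def coverage(read_pairs, read_len, genome_size):
    """Mean depth: total sequenced bases divided by genome size."""
    return read_pairs * 2 * read_len / genome_size

def chance_duplicate_fraction(reads, genome_size, strands=2):
    """Expected fraction of reads that duplicate an earlier read purely by
    chance, assuming uniformly random start positions (an assumption)."""
    sites = genome_size * strands              # possible (position, strand) starts
    lam = reads / sites                        # mean reads per start site
    expected_distinct = sites * (1 - math.exp(-lam))
    return 1 - expected_distinct / reads

if __name__ == "__main__":
    pairs, read_len, genome = 33_000_000, 150, 2_800_000
    print(f"coverage: {coverage(pairs, read_len, genome):.0f}x")
    # Single-file view (forward reads only), as a per-file QC report would see it.
    print(f"chance duplicates: {chance_duplicate_fraction(pairs, genome):.0%}")
```

With the numbers from this thread, the depth comes out at roughly 3,500x, and under this simplified model well over half of the reads in a single file would be expected to share a start site with another read, even with no amplification artifacts at all.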
