  • problem with low coverage genome sequencing

    Hi,
    I am doing whole-genome sequencing of a fungus with a genome of ~20 Mb. I sequenced a 500 bp paired-end library (2x75 bp) on an Illumina MiSeq to 50x coverage, but several genes (~8% of core genes) remain uncovered. It is not an assembly issue: I searched the raw FASTQ file for several short fragments of these genes and they are absent. What should I do to cover as many genes as possible? Would a mate-pair library with larger inserts help, or would sequencing on Ion Torrent, with its larger product size, be a better option?

  • #2
    If it's a platform-specific bias then a different platform might help. Illumina has problems with super-high GC areas, for example. Also, overamplification will increase coverage variability.

    But if the uncovered genes are completely uncovered, maybe they don't actually exist in your sample? And anyway, I don't understand what you mean by "I have checked several short fragments of the genes in the raw fastq file and they are absent". What exactly did you do?

    • #3
      Actually, I have some gene sequences from PCR products, so the genes are there. I searched the raw reads for short portions of them (say, 35 bases) and did not find them. Several core genes are also absent, so I think it is a coverage-related problem. Can you please suggest which approach would be better: Ion Torrent or a large mate-pair library?
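The fragment search described above can be sketched in Python as follows. This is a minimal sketch, not the poster's actual method: the function names are hypothetical, it assumes a standard 4-line-per-record FASTQ file, and it only finds exact matches. It also checks the reverse complement of the probe, since a read can come from either strand. A read with a sequencing error inside the 35 bases would be missed, so aligning the reads to the PCR-derived sequences is the more tolerant check.

```python
import gzip

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_probe_hits(fastq_lines, probe):
    """Count reads that contain the probe or its reverse complement.

    fastq_lines is any iterable of lines in 4-line FASTQ format
    (an assumption; multi-line records are not handled).
    """
    probe = probe.upper()
    rc = reverse_complement(probe)
    hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        read = line.strip().upper()
        if probe in read or rc in read:
            hits += 1
    return hits


def count_probe_hits_in_file(path, probe):
    """Same as count_probe_hits, reading a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_probe_hits(handle, probe)
```

For paired-end data, both the R1 and R2 files would need to be searched.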

      • #4
        I don't know much about the Ion Torrent platform, so all I can say is "maybe". I would not expect a long-mate-pair library to fill in low-coverage areas of a fragment library. PacBio data has a very unbiased coverage distribution, though, so if you have access to it, I would try that.

        • #5
          Looks like you’re just hitting that random chance of uncovered regions from the shotgun strategy. Mate pair libraries are for scaffolding, so they don’t directly help in recovering genes, but they can still be of benefit: they will allow you to close the gaps between the scaffolds generated from your original PE reads. While some regions might not have assembled into contigs from the PE reads, more will come up with gap filling after scaffolding. Be aware, too, that few cores will do the Illumina mate pair library prep, so you’ll have to call around to find a good one. I’ve heard Hudson Alpha’s core is good and does this, but it’s been a while since I was part of pricing this out.
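How much sequence random shotgun sampling alone is expected to leave uncovered can be gauged with the idealized Lander-Waterman (Poisson) model, in which the expected uncovered fraction at mean coverage c is e^(-c). The sketch below is an assumption-laden estimate, not a description of real MiSeq data: it assumes read starts are uniformly random, and it ignores GC bias and amplification effects, which increase the uncovered fraction in practice. The function names are hypothetical.

```python
import math


def expected_uncovered_fraction(coverage):
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-coverage)


def expected_uncovered_bases(genome_size_bp, coverage):
    """Expected number of uncovered bases for a genome of the given size."""
    return genome_size_bp * expected_uncovered_fraction(coverage)


if __name__ == "__main__":
    genome_size = 20_000_000  # ~20 Mb, as stated in the original post
    for cov in (5, 10, 50):
        bases = expected_uncovered_bases(genome_size, cov)
        print(f"{cov}x: about {bases:.3g} bases expected uncovered")
```

Comparing this baseline with the observed gaps gives a rough idea of how much of the missing sequence is attributable to chance and how much to library or platform bias.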

          Given that your genome is ~20 Mb, you should have a lot of options if you have the money to do additional sequencing. One easy option could be to track down a MiSeq run with 2x250 reads and use a library size of roughly 400 bp. Another good option would be PacBio, as suggested above, but you’ll have to figure out the best approach for combining the PacBio and Illumina data for your use case. Mixing data types hasn’t been worked out as thoroughly as an all-Illumina assembly, but there are good reasons to go with different types of technologies, and PacBio in particular.

          PacBio can be used to create an assembly all by itself if you have the money to get the required coverage. If each SMRT cell produces about 300 Mbp, you’d probably want 3 or 4 of them if you’re going to take the PacBio-only assembly approach (thinking you should get 50x coverage or so). Then you could use your current PE Illumina reads just to correct errors via alignment to your draft genome, using something like iCORN, which comes with PAGIT. PacBio reads can also be corrected with your Illumina data before assembly, which requires less PacBio read depth, but if you’re concerned that your Illumina data is missing what you want, you might be left with low-quality regions (covered by PacBio only) or still have gaps in your assembly.
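The SMRT cell estimate above is simple arithmetic: cells needed = genome size x target coverage / yield per cell, rounded up. A minimal sketch, using the ~300 Mbp per cell and ~50x figures from this post and the ~20 Mb genome size from the original question; the function names are hypothetical and actual yields vary by run:

```python
import math


def smrt_cells_needed(genome_size_bp, target_coverage,
                      yield_per_cell_bp=300_000_000):
    """Number of SMRT cells needed to reach the target mean coverage."""
    total_bases = genome_size_bp * target_coverage
    return math.ceil(total_bases / yield_per_cell_bp)


def coverage_from_cells(genome_size_bp, n_cells,
                        yield_per_cell_bp=300_000_000):
    """Mean coverage obtained from a given number of SMRT cells."""
    return n_cells * yield_per_cell_bp / genome_size_bp


if __name__ == "__main__":
    genome = 20_000_000  # ~20 Mb fungal genome
    print(smrt_cells_needed(genome, 50))   # cells for ~50x
    print(coverage_from_cells(genome, 3))  # coverage from 3 cells
    print(coverage_from_cells(genome, 4))  # coverage from 4 cells
```

Under these assumptions, 3 cells give about 45x and 4 cells about 60x, which matches the "3 or 4" suggested above.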

          So, I’d think you have several options which could be mixed and matched depending on budget and how “finished” you ultimately want this genome to be.

          1) High coverage PacBio (1 library prep and 3-4 SMRT cells ~$1250-$2000 depending on pricing available to you)
          2) Low coverage PacBio (1 library prep and 1 SMRT cell ~$500-$1000)
          3) Additional PE Illumina sequencing (1 MiSeq lane, 1 library prep $1500)
          4) Illumina mate pair library (1 MiSeq lane 1 library prep $1500)

          Now I’ve only dealt with vertebrate genomes, so I might be off, but personally I’d lean towards a high coverage PacBio, potentially with Illumina used for corrections after the fact. This would be especially true if you have a university core that will do preps for ~$400 and sequencing for $250 per SMRT cell.
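The PacBio options can be priced with the core-facility figures mentioned above (about $400 per library prep and $250 per SMRT cell). This is only a sketch with a hypothetical function name; the prices are the ones quoted in this post and will differ between facilities.

```python
def pacbio_run_cost(n_cells, prep_cost=400, cost_per_cell=250, n_preps=1):
    """Total cost in dollars for library prep plus SMRT cells."""
    return n_preps * prep_cost + n_cells * cost_per_cell


if __name__ == "__main__":
    for cells in (1, 3, 4):
        print(f"{cells} SMRT cell(s): ${pacbio_run_cost(cells)}")
```

At these rates a single prep with 3 or 4 cells comes to $1150 or $1400, and a single cell to $650.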
