
  • #16
    well, there are probably at least two genomes in there, plus lots of contaminating host DNA. One genome is pretty easy because it has >95% sequence identity to an annotated genome. The other is more of a challenge b/c it is likely quite diverged from anything else. Both are under 500kb in length.

    jt

    also i tried aggressive filtering: reads >50bp, quality >30 for 100% of the read.
    That cut it down to 5 million PE reads. It didn't help. It did lower the coverage of the more abundant genome accordingly, but it also reduced the contig lengths and didn't affect the number of Ns in the contigs.
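As a rough illustration of the filter described above (not the actual command that was run), here is a minimal Python sketch. It assumes Phred+33 quality encoding, which may not match the original data, and the function names `passes_filter` and `filter_pairs` are made up for this example. A pair is kept only if both mates pass, so the paired-end structure survives.

```python
def passes_filter(seq, qual, min_len=50, min_q=30, offset=33):
    """True if the read is longer than min_len and every base has
    Phred quality above min_q. offset=33 is an assumption (Phred+33)."""
    if len(seq) <= min_len:
        return False
    return all(ord(ch) - offset > min_q for ch in qual)


def filter_pairs(pairs, **kwargs):
    """Keep a read pair only if both mates pass the filter.

    Each pair is ((seq1, qual1), (seq2, qual2)).
    """
    return [
        (r1, r2)
        for r1, r2 in pairs
        if passes_filter(*r1, **kwargs) and passes_filter(*r2, **kwargs)
    ]
```

Dropping whole pairs rather than single mates is a design choice: it keeps the input suitable for paired-end assembly, at the cost of discarding some good reads.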
    Last edited by jvanleuven; 11-04-2011, 12:45 PM.



    • #17
      cliffbeall

      I would try being very aggressive in quality filtering/trimming. Most accounts that I have seen don't see improvement beyond 50X coverage, so you should be able to be selective with your input data.
      [Filtered using the FASTX-Toolkit: removed reads < 20bp in length or with <30 quality over 90% of the read.]
      Is that good enough quality filtering/trimming, or not?
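To put the 50X figure in perspective, the usual back-of-the-envelope estimate is coverage = (number of reads × read length) / genome size. The sketch below applies it to a 500 kb genome, the upper bound mentioned earlier in the thread. The 100 bp read length is an assumption for illustration only, and the estimate ignores reads lost to host contamination.

```python
import math


def expected_coverage(n_reads, read_len, genome_size):
    """Average depth if all reads map uniformly to one genome."""
    return n_reads * read_len / genome_size


def reads_for_coverage(target_cov, read_len, genome_size):
    """Number of reads needed to reach target_cov on average."""
    return math.ceil(target_cov * genome_size / read_len)


# Hypothetical numbers: 500 kb genome, 100 bp reads, 50X target.
print(reads_for_coverage(50, 100, 500_000))  # 250000
```

Under these assumptions, a few hundred thousand reads per genome would already reach 50X, so there is plenty of room to be selective.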



      • #18
        Not sure if that is good enough filtering, this is the first time i've tried it.

        I think it is pretty good. The first few bases and the last few bases have average qualities around 34, the rest are higher.



        • #19
          Thanks for all the help. I'm still not sure how to fix the N gaps created by scaffolding. I am thinking about using CAP3 to try to cover the gaps with contigs created from a single-read assembly of all the quality-filtered reads.

          What software may be used to map lots of shorter contigs to longer contigs to give a consensus sequence?



          • #20
            It's not really a case of fixing them as Velvet is simply telling you that it thinks two contigs are joined together with a gap of a specific length by reference to paired-end information. But it cannot fill in that gap with sequence because there is no apparent unambiguous path through the assembly graph. This is usually a result of repetitive sequence longer than the read length, but sometimes could be due to low coverage.

            Basically you have scaffolds formed of contigs (actually I wish Velvet would call the output scaffolds.fa not contigs.fa to make this more clear, or produce an AGP file, but that's another matter).
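To make the scaffold/contig distinction concrete, here is a minimal Python sketch that breaks one scaffold sequence into its contig and gap pieces wherever a run of Ns occurs. The function name `split_scaffold` and the `min_gap` parameter are invented for this example, and the output is only a simplified coordinate list, not a valid AGP file.

```python
import re


def split_scaffold(seq, min_gap=1):
    """Split a scaffold into ('contig' | 'gap', start, end) pieces.

    Coordinates are 0-based, half-open. Any run of at least min_gap
    N characters is treated as a gap.
    """
    parts = []
    pos = 0
    for m in re.finditer(r"[Nn]{%d,}" % min_gap, seq):
        if m.start() > pos:
            parts.append(("contig", pos, m.start()))
        parts.append(("gap", m.start(), m.end()))
        pos = m.end()
    if pos < len(seq):
        parts.append(("contig", pos, len(seq)))
    return parts


for kind, start, end in split_scaffold("ACGTNNNNNNNNNNTTTA"):
    print(kind, start, end, end - start)
```

Running this over every record in contigs.fa gives a quick tally of how many gaps there are and how long they are estimated to be, which helps decide whether gap filling is worth attempting.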

            You might find that GapCloser (part of SOAPdenovo) can be coaxed into filling some of these gaps, by performing local assemblies using pair-halves which fall within the gap region as determined by some mapping approach (http://soap.genomics.org.cn/about.html). IMAGE is another utility that does this (http://genomebiology.com/2010/11/4/R41). In my experience results are mixed and it will depend on the repeats in your genome.



            • #21
              And, PS

              Read filtering is probably not the significant factor here.



              • #22
                yeah, i think my reads are sufficiently filtered. I think my problem may be a result of such widely varying coverages, or maybe variance in my insert size, or possibly lots of repeats? both genomes are AT rich (25%).

                if I can resolve some of the N gaps with single reads, my contigs are long enough.

                jt



                • #23
                  99 times out of 100 it will be due to repeats. It's normal!



                  • #24
                    Yeah, but a lot of my contigs look like this:


                    >NODE_8445_length_3509_cov_38.863495
                    ATGGGATTGGAATACAAGAAGTTACTTCAAATATATAACATGTCGAACAACAAACATGGA
                    GATAATCGATATCGCTAAATCAAAGCTATCAGAAAGACATCCTGAATGTTGATTTTAAAA
                    CAATTGATAACGAAGTAATATTGGAATTGATATTCGATTTATCCAATGGTATTAAAATAA
                    ATGTTCAAGACATAGATATAAAGAAGGATATTAAGTTCTTAGAATCACGTATTGATCTAA
                    AAGATGATAAATATCAATATCAATTATACGATGGGATAACAGGTTTAAAACATGATAGAA
                    AGACAACGGTTGGAATTTATGTATGTTTATAAATTAAATCACATGGTAGATGATAAAATG
                    TATGCCAGATCTATAGGACCTTATTCAACTATTACACAACAACCACTTAAAGGTAAGTTC
                    CATAAAGGAGGACAGATGTTGGGTGAGATGGAGGTATGGTTGTTGTAAAGCTATGGTGTT
                    GCATTCGTTACAAACGAAGCATTGACCGCTAAATGTGATGATATTCAAGCATGAAACAAA
                    TTGCATGATAACATGTTATTCGGTACACCATTGNNNNNNNNNNTTTAGATTAGTAATGTT
                    GTATGGCTTTAGATTAGTAATGTTGTATGGTGATTCGATTTTGTTATTAAAATATGATGT
                    TAATTTATGATTATTGATTATAAAGAATAGCGACAAGTTGTAATATTATTGGTATGANNN
                    NNNNNNNGAGTTAACGATAGCCTCTCCAGAGGCTACAATGACCTATTCTTGGGGAGAGGT
                    ATACAACGGTATTGGGTTCGATTTTGTTGTTAAAATATGATGTTAATTTGTGATTATTGA
                    TTATAAAGAATAGTGACAAGTTGTAATATTATTGGTATGAAAGTTATCAAGTAGTATATT
                    GTCTGTTACTAATATAATCAATGAGATATATAAACACTGTGATTTGTTAAAGGTANNNNN
                    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                    NNNNNNNNNNNNNNNNAATTATCAACATAAGTTAATAGGTTGATTGATAAAACAACAAAT
                    TGATAGATTATTGGGTATAATTTGGTCAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAATTT
                    GAATTATTTACTCTAAATTGTATAAATCACTTGGTAATGATGGATTCAATAAATCTCTAA
                    GATAACAAACGGTTTTTATTTCTTAAATAACTATCTGATAACAAAACAGTTGCTTTAGTG
                    ATATCGTTATCAATTCTTTTCAGAAAACCTAATCGGTTGTCTTCACAAATTCCTTTAACT
                    AATTTAATCTTATTTATAAATTCCTTTCTAACTGTTCTGATACTTGTTTTATATCTTTCA
                    AACATTTTAGATAATTCATTCAAAGCAATCTTTTGTCTCTGTGTTATTATTGATGGTAAG
                    GCTTAACTTGATATGATTCCCGTTAAGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAATTATATCGTTTGTTCTAAGTTTACA
                    TATTGGTCGTAGTTCAATAATTGTATCAGAAAGATAAGTACATTNNNNNNNNNNGGACTT
                    CAATGATTGATTTTTTGTTGTTCTTATTTCATTAACAAGAGACATATCAAGTCTTGATGG
                    TAAAATTTTGTTTGATATAAGATCAAAATTTTTAATTTAAGTTTAATAATAATTGGTTCA
                    ATTGTATGTATATATCTTCCAATGTTAACTTTGAATAATTTGGCTTTAATTTAAACAATT
                    ATAAGAATTAGTTCAGTTCATCTAATATTCAGATGGAAATTAAACGAGAAATAACATATT
                    GAATTATAGAACAAACAATAAGAGAACCAACAATAAGATAATTTGGAATTGTAATCCAAT
                    TTAGATGAGTTTAGTTTATAAATTTTGATAATCTCTGGAATATATTAATTTAAATGCAGT
                    TGTTGGTAGAATCTNNNNNNNNNNGAAAATAATATATTGAATTATAGAATCAACAATAAG
                    ATAATTTGGGATTATAATCCAATTTAGATGAGTTTTAGTTATAAATTTTGATAATCTTTG
                    GAAGATATTAATTTAAATGCAGTTAGTTGGTAGAATTTGGAAATAAACAGATGATCTTTA
                    AATCACTATTTTTCTTAAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAATAAAATATTAACAGTTGATCAATAA
                    TTCGATATTATTAAACGAATCCAATGAAAATAAACAAGTTAATAACNNNNNNNNNNGATA
                    AAATATTAACAGTTGATCAATAATTCAATATTATTGAATGAATCCAATGAAAATAAACAA
                    GTTAATAACCAATTTTATTGGCATTGATAGATAGTTGGTTTCTATTCCTAACAAATACTT
                    GATTACTTAATAAGATTCAAAGGAGAGATAAATAACTTATGGTCCGTTTTAAACTAATTA
                    ATGATTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                    NNNNNNNNNNNGACAATTAATAACAATTTATATCTGAATTATATTCGATTATTTTAGAAA
                    GAGATCGTTTAAAAAAAAANNNNNNNNNNTATTGATAGATAGTTGGTTCTTATTTCTAAC
                    AAATATTTGATTAGTTAATAAGTTTCAAAGGAGAGTATAATAACTTANNNNNNNNNNNNN
                    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGATTATTTTCTCTATGT
                    TTATATTTATAATTTTCGATGAAATCATATTCTTGATAACTCTCTAATAAGAGTAATTTT
                    TAAATCTAATAATATTAAAACTCAGTTTGAGAAATTAGTTTGTTACGATCGAATATAAAT
                    TAAAATTAAGCACAACAAACCTAATAATTTTGATCAAAATTTATTAAGTGTTTAATTTAT
                    GTGTGTTAAATATTATAAATAATACTTAATTAACTTAATTTAAAAATTAATTTGTTGGTA
                    ATAATTTTTATTTTTTAATGTGTTAAGATTTATCGTATTAATTATATAATAATTTATTAT
                    TCTTAATAGAAGTAACACTAATAACTTATCGGATATTGATTGAATCTATTGAAATTGATG
                    TTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGCGGTTACATATATTATTTTTGAA
                    TTGTGAGAGATATGAATTTCATTTTCTTTATTCAAAATAAACAATTATTTATTTCAATT
