
  • #16
    You may wish to keep the (possibly few) long reads which have high quality, instead of clipping them at a fixed length. It is a bit difficult to read the quality boxplot, could you upload it in full resolution?
    Did you check for adapter contamination? If you ran FastQC on it, you should get an idea if you have some overrepresented sequences or k-mers in your reads.



    • #17
      This boxplot (a tiny bit bigger) is the biggest file they will let me upload on SeqAnswers. Alternatively, another version...

      I ran FastQC and there are no overrepresented sequences or anything.
      I do get overrepresented k-mers, but these are in the first 5 bp (which I trimmed) and at the very edge after 100 bp (which I wasn't using either); picture attached.

      I will try to trim by quality and not a fixed length and see if that helps any.


      • #18
        Originally posted by Noa
        I also tried MIRA yesterday for the first time and, with 2 million trimmed reads (55 bp), got 22,000 contigs covering a consensus of 4.8 Mb; the largest contig is 3,200 bp and N50 is 287, which is also not so great for a bacterial genome. Happy for advice there too!
        Since you've tried a de Bruijn assembler and an OLC one, have you tried combining the contigs using Minimus? Sometimes it works wonders, especially if you're getting a lot of overlaps, and sometimes it doesn't help at all, but it's still worth a shot.
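
For comparing runs like the ones quoted above, it can help to compute N50 the same way every time. Here is a minimal sketch, assuming N50 in its usual sense (the contig length L such that contigs of length L or longer hold at least half of the assembled bases). The function name `n50` and the plain list of contig lengths are illustrative choices, not output from MIRA or Velvet.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Walk the contigs from longest to shortest and return the length of
    the contig at which the running total first reaches half of the
    total assembled bases.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Note that a single very long contig can dominate N50, so it is worth looking at the contig count and the largest contig alongside it.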



        • #19
          Trim the Reads?

          I've noticed that trimming the reads has had a significant impact on assembly results. I've tried this with both ABySS and Velvet and it seems to work fairly well in terms of generating larger contigs. I haven't done this for Illumina reads (SOLiD only), but it could potentially be worthwhile. For my SOLiD data I've trimmed the 5' end of the 3' mate, and it has improved the quality under the same assembler settings. If the quality does drop off on the 3' mate of your reads, try trimming to, say, 50 bp? I'd be interested to see the results if you do so.

          Hope this helps!



          • #20
            Thanks a million, the trimming made a HUGE difference.
            I had actually trimmed too much before, I guess: the reads were worse near the end, but I think I trimmed off too much.
            I did a sliding window for quality score this time and then trashed any sequences that were too short, and now I got everything in 300 contigs, and the largest is 2.5 Mb, which is about half the genome! Yahoo!



            • #21
              That's fantastic, Noa!

              For my personal interest, what lengths did you trim them to? Did you just trim the 3' mate? Details details details!



              • #22
                They were originally 144 bp (each mate). I trimmed the first 5bp and the last 44 since they looked bad. Then I did a sliding window requiring a quality score of 20. Then I trashed any sequence that was left with less than 40bp. I looked at each step with FASTQC by eye to eyeball the next step with respect to trimming.
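
The steps above can be sketched roughly as follows. This is only an illustration, not the tool that was actually used: the window size of 4 is an assumption (no window size is given in the thread), the quality decoding assumes Phred+33 FASTQ (older Illumina output may be Phred+64), and the function names are made up for this example.

```python
def decode_phred33(qual_string):
    """Convert a FASTQ quality string to integer scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in qual_string]


def sliding_window_cut(quals, window=4, min_mean=20):
    """Return how many leading bases to keep.

    Scan left to right and cut at the start of the first window whose
    mean quality is below min_mean. window=4 is an assumed value.
    """
    if not quals:
        return 0
    last_start = max(len(quals) - window, 0)
    for start in range(last_start + 1):
        chunk = quals[start:start + window]
        if sum(chunk) / len(chunk) < min_mean:
            return start
    return len(quals)


def trim_read(seq, quals, head=5, tail=44, window=4, min_mean=20, min_len=40):
    """Clip fixed ends, quality-trim, and drop reads that end up too short.

    Returns (seq, quals) for a surviving read, or None if it is discarded.
    """
    # Step 1: remove the first `head` and last `tail` bases.
    end = max(len(seq) - tail, head)
    seq, quals = seq[head:end], quals[head:end]
    # Step 2: sliding-window trim on mean quality.
    keep = sliding_window_cut(quals, window, min_mean)
    seq, quals = seq[:keep], quals[:keep]
    # Step 3: discard reads shorter than min_len.
    if len(seq) < min_len:
        return None
    return seq, quals


if __name__ == "__main__":
    # Synthetic 144 bp read: good quality for 60 bases, then poor.
    result = trim_read("A" * 144, [30] * 60 + [5] * 84)
    print(len(result[0]) if result else "discarded")
```

One thing to watch with paired reads: if one mate is discarded and the other kept, the pairing is broken, so the surviving mate usually has to be handled as a single read or dropped as well.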

                I used VelvetOptimiser and so far it seems that a k-mer of 25 was the best, but I want to check some k-mers >31 manually tomorrow or next week. (I can only run a few k-mers at a time due to memory constraints on my machine.)
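
A k-mer sweep run a few values at a time could be organised along these lines. This is a hedged sketch that only builds the `velveth`/`velvetg` command lines rather than running them. The file name `reads.fastq`, the output directory prefix, and the batch size are hypothetical, and `-shortPaired` assumes the two mates are interleaved in one file. Velvet needs odd k values, and a default build caps k at 31, so larger values require a binary compiled with a higher `MAXKMERLENGTH`.

```python
def odd_kmers(start, stop):
    """Return the odd k values in [start, stop]; Velvet needs odd k."""
    return [k for k in range(start, stop + 1) if k % 2 == 1]


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def velvet_commands(k, reads, prefix="velvet_k", read_type="-shortPaired"):
    """Build (not run) the velveth and velvetg argument lists for one k.

    `reads`, `prefix` and `read_type` are illustrative defaults.
    """
    out_dir = f"{prefix}{k}"
    return [
        ["velveth", out_dir, str(k), "-fastq", read_type, reads],
        ["velvetg", out_dir, "-exp_cov", "auto", "-cov_cutoff", "auto"],
    ]


if __name__ == "__main__":
    # Run two k values per batch to keep memory use manageable.
    for batch in batches(odd_kmers(21, 31), 2):
        for k in batch:
            for cmd in velvet_commands(k, "reads.fastq"):
                print(" ".join(cmd))
```

Each printed command could then be passed to `subprocess.run` one batch at a time, with the output directories from the previous batch removed or archived first if disk space or memory is tight.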

                Let me know if I can give you more details, and I will also get back to you once I fully complete the runs. I also want to fool around some more with MIRA, or maybe taking MIRA data to velvet? Does that even make sense?

                n



                • #23
                  Just to clarify: I'm assuming that the sliding window trimmed what didn't have a quality score of 20?

                  With regard to the MIRA assembler, I don't know if it would make much of a difference, since you're dealing with a bacterial genome and those are generally not as complex as eukaryotic genomes, so it may be a waste of your time. However, I have no experience in doing that and would be interested to see what you come up with.

                  The more knowledge about assembly the better off we'll be!



                  • #24
                    Yup, trimmed anything without a mean score of 20.

                    Not sure I will get anything better with MIRA but hey, worth a shot. Basically I wanted to try one de Bruijn and one OLC algorithm, and MIRA and Velvet got the best reviews for bacterial genomes.

                    I will update (and feel free to remind me if I haven't updated in a week or so!)



                    • #25
                      Hi Noa, some colleagues and I cobbled together a pipeline to automate the process of quality trimming, error correction, contig assembly, scaffolding, and some QC for bacterial and archaeal genomes. We've seen great results on our data, but could really use feedback from others about it. Available here. Documentation here. At least one other person on the SeqAnswers forum has tried it and ran it successfully, but had apparently sequenced a mixed culture, so the result was hard to interpret.

                      If you're interested and willing to give it a go I'll try to field any questions that may arise.



                      • #26
                        Thanks- I will download hopefully next week and give it a try!
                        I will let you know how it goes.



                        • #27
                          What software do you use for trimming and sliding window?



                          • #28
                            I've been working with Galaxy (http://main.g2.bx.psu.edu/) for a lot of my analyses.
