  • Assembling transcriptomes with trimmed or untrimmed data

    I recently assembled the transcriptomes of 5 different algal species using Oases.

    I found that I get higher N50 values and maximum contig lengths when I use the untrimmed data. Furthermore, although a higher percentage of reads is used when I assemble trimmed data, trimming removes so much data that, in absolute numbers, more of my reads end up in the assembly when I use the untrimmed data.
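N50, one of the two numbers compared here, is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The minimal Python sketch below (the function name `n50` is only for illustration) computes it and shows why a higher N50 on its own says nothing about correctness: wrongly joining two contigs into one chimeric contig raises N50 even though no new sequence was assembled.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Walk the contigs from longest to shortest and return the length at
    which the running total first reaches half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the full total")


contigs = [100, 200, 300, 400, 500]
print(n50(contigs))               # -> 400
# Same 1500 bases, but the two longest contigs wrongly merged into one:
print(n50([100, 200, 300, 900]))  # -> 900
```

Because misassemblies and chimeras tend to lengthen contigs, this is consistent with the untrimmed assembly scoring better on N50 and maximum contig length without being more reliable.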

    I know one side effect of sequencing errors is a higher RAM requirement, but besides that, is there any other negative (or positive) effect if I use untrimmed data for my assembly?

    RAM wasn't really an issue for me, since I had access to a high performance computer with several nodes with 64 GB RAM each.

  • #2
    Is there really nobody with an answer to this question?

    • #3
      Hi,

      I'm not sure what type of reads you are using, but if they are Illumina reads, you should always trim off the first 12 to 15 bases, as these show substantial biases.
      Did you run a FastQC quality check?
      If you see severe biases at the 5' end, you should trim them off. I also trim some bases off the 3' end if the read quality falls off dramatically. In addition, I filter out reads containing even one base with a Q score below a certain threshold.
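The three steps described in this reply (cropping the 5' end, trimming low-quality bases off the 3' end, then discarding any read that still contains a base below a Q threshold) can be sketched per read in Python. This is a minimal illustration, not how any particular trimming tool works: the Phred+33 quality encoding and all thresholds (12-base head crop, Q20 for both trailing trim and filter, 36-base minimum length) are assumptions chosen for the example, and `trim_and_filter` is a hypothetical name.

```python
from typing import List, Optional, Tuple

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def phred_scores(qual: str) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def trim_and_filter(seq: str, qual: str, head_crop: int = 12,
                    trailing_q: int = 20, min_q: int = 20,
                    min_len: int = 36) -> Optional[Tuple[str, str]]:
    """Return the trimmed (seq, qual) pair, or None if the read is discarded.

    All default thresholds are illustrative assumptions.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # 1. Crop the first head_crop bases (random-hexamer composition bias).
    seq, qual = seq[head_crop:], qual[head_crop:]
    scores = phred_scores(qual)
    # 2. Remove 3' bases one by one while their quality is below trailing_q.
    end = len(scores)
    while end > 0 and scores[end - 1] < trailing_q:
        end -= 1
    seq, qual, scores = seq[:end], qual[:end], scores[:end]
    # 3. Discard reads that are now too short or still contain any base below min_q.
    if len(seq) < min_len or any(q < min_q for q in scores):
        return None
    return seq, qual


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(trim_and_filter("A" * 12 + "C" * 40 + "G" * 5, "I" * 52 + "#" * 5))
```

The sketch handles one read at a time; with paired-end data the two mates have to be filtered together, so that discarding one read does not leave its partner out of sync.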

      If you use untrimmed reads, you may get more contigs, but they will be quite unreliable due to misassemblies and possibly chimeras.

      Cheers

      • #4
        Hi,

        Thanks for the reply. Yes, I'm using Illumina 100bp paired-end data.

        My supervisor told me to just try both trimmed and untrimmed data, and then suggested that I use the untrimmed assembly for annotation. But I did fear that there might be a problem with that.

        I ran FastQC on my data. There is a bit of a problem with the base composition (GCAT content) in the first 10 bp, due, I believe, to the not-so-random random primers used in Illumina library preparation. Also, the Q score of the last 15-20 bases drops off considerably.
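The 5' bias described here is what FastQC's per-base sequence content plot shows. A rough Python equivalent (function names hypothetical) tabulates the fraction of each base at every read position and flags positions where A and T, or G and C, differ by more than 10%. Comparing complementary bases keeps the check independent of the organism's overall GC content; the 10% cut-off is an assumption modelled on FastQC's warning level, and real data would be read from FASTQ files rather than a hard-coded list.

```python
from collections import Counter
from typing import Dict, List, Sequence


def per_base_content(reads: Sequence[str]) -> List[Dict[str, float]]:
    """Fraction of A, C, G and T at each read position (0-based).

    Reads may differ in length; each position only counts reads that reach it.
    """
    max_len = max((len(r) for r in reads), default=0)
    profile = []
    for pos in range(max_len):
        counts = Counter(r[pos] for r in reads if pos < len(r))
        total = sum(counts.values())
        profile.append({base: counts[base] / total for base in "ACGT"})
    return profile


def biased_positions(profile: List[Dict[str, float]],
                     max_diff: float = 0.10) -> List[int]:
    """Positions where |A - T| or |G - C| exceeds max_diff (assumed 10% cut-off)."""
    return [
        pos for pos, f in enumerate(profile)
        if abs(f["A"] - f["T"]) > max_diff or abs(f["G"] - f["C"]) > max_diff
    ]


# Toy reads: position 0 is always 'A' (biased), position 1 is balanced.
reads = ["AC", "AG", "AT", "AA"]
print(biased_positions(per_base_content(reads)))  # -> [0]
```

On real libraries, the flagged positions at the start of the reads would indicate how many bases a head crop needs to remove.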

        The problem is that I'm pressed for time, so before Christmas I decided to stop working on the assembly and go ahead with the annotation step (which takes a considerable amount of time with Blast2GO).

        • #5
          Sounds like you are doing exactly the same thing as me.
          I am also using Blast2GO now. I have around 240k transcripts, and this will probably take two weeks or more, as I am doing it through the web interface.
          I also have 100-nt paired-end Illumina reads, and the first 10 bases or so look like yours. That is indeed due to the not-so-random nature of the random hexamers used in the library prep. I trimmed off the first 12 bases for good measure, though trimming off 15 is not unusual.

          Unfortunately, I don't think an annotation based on the untrimmed data would be reliable, particularly since you say the Q score at the 3' end also drops off a lot. I would recommend using trimmed data.

          • #6
            Thanks, this is really helpful.

            I'm mainly interested in finding a handful of housekeeping genes and another handful of genes of interest, in order to design qPCR primers for unsequenced species; I don't need a complete transcriptome at this stage.

            I did manage to find some sequences that matched published sequences of my key enzyme, but there were also some weird results.

            As I said, I'm a bit behind with my PhD (who isn't?) and hard pressed for time this year. So I think I will just go ahead and try to find my enzymes with the assembly I have, while assembling a better transcriptome on the side.

            240k transcripts in two weeks is fairly ambitious, based on my experience. I annotated 5 different species in parallel, and the one with 80k transcripts took over a month. But maybe I was doing something wrong. I might look into running BLAST locally to speed up the annotation of the untrimmed data.

            • #7
              Actually, thank you for the info. It's my first time using Blast2GO, so I'll keep the expected run time in mind.
              Cheers.
