  • #16
    Depending on the type of library that was made, high duplication levels can be normal. In total RNA preps, rRNAs are so abundant that they will show lots of duplicates, and in other types of preps many highly abundant mRNA transcripts will do the same. Often, if you take the overrepresented sequences from FastQC and BLAST them, you'll find rRNAs or sometimes other transcripts. Remember that FastQC only checks the first 50 bp of 200K reads for its duplication check, and it only takes one side of the pair into account. Picard tools' MarkDuplicates will give you the kind of duplication check you really need.
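A toy sketch of why a FastQC-style duplication check and a pair-aware check can disagree, assuming reads are already loaded as Python lists of sequence strings. `fastqc_like_keys` mimics the description above (first 200K reads, first 50 bp, one mate only), while `pair_keys` uses both full mate sequences as a crude stand-in for a pair-aware check. This is not how Picard MarkDuplicates works (Picard compares aligned positions, not raw sequence), and all function names here are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def duplication_fraction(keys: Sequence[Hashable]) -> float:
    """Fraction of reads that repeat a key already seen (0.0 = no duplicates)."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - len(counts)) / total


def fastqc_like_keys(r1: Sequence[str], n_reads: int = 200_000, prefix: int = 50) -> list:
    # One mate only, first n_reads, each truncated to its first `prefix` bases.
    return [seq[:prefix] for seq in r1[:n_reads]]


def pair_keys(r1: Sequence[str], r2: Sequence[str]) -> list:
    # Both mates at full length: two fragments only collide if both ends match.
    return list(zip(r1, r2))


if __name__ == "__main__":
    r1 = ["A" * 60, "A" * 50 + "C" * 10, "G" * 60]
    r2 = ["T" * 60, "C" * 60, "T" * 60]
    print(duplication_fraction(fastqc_like_keys(r1)))  # first two reads share 50 bp
    print(duplication_fraction(pair_keys(r1, r2)))     # no pair is repeated
```

In the toy data, two reads look like duplicates when only the first 50 bp of R1 are compared, but not once the rest of the read and the mate are taken into account.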

    • #17
      Originally posted by GenoMax
      Just to confirm: these samples have already been demultiplexed, correct? The only time I remember seeing that sort of pattern at the end of the read is for the "tag".
      What I got from the core facility were separate FASTQ files, an R1 and an R2 file per sample, so I assume the samples have been demultiplexed. In case the tags were left in the sequences, is there any way to check for them?

      I'll read Simon's article about duplication levels. Thank you!

      • #18
        Originally posted by zzhao2
        What I got from the core facility were separate FASTQ files, an R1 and an R2 file per sample, so I assume the samples have been demultiplexed. In case the tags were left in the sequences, is there any way to check for them?
        Tags won't be there if you received separate R1/R2 files for each sample. You may have some adapter left (if the inserts were short), but you have already done trimming, so most of it should be gone. Consider @mastal's advice a couple of posts up about trimmomatic, in case some adapters are left.
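One rough way to check for a leftover tag or adapter is to tally the last few bases of every read: if something is still attached, one or a few end k-mers tend to account for a large share of reads. This is a minimal sketch, assuming standard 4-line FASTQ records; the function names and the choice of k = 8 are hypothetical, and a dominant end k-mer only suggests leftover sequence, it does not prove it.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # line 2 of each record holds the bases
                yield line.strip()


def top_read_ends(seqs: Iterable[str], k: int = 8, top: int = 5) -> list:
    """Most common k-mers at the 3' end of the reads, with their counts.

    Reads shorter than k are skipped.
    """
    counts = Counter(seq[-k:] for seq in seqs if len(seq) >= k)
    return counts.most_common(top)
```

For example, `top_read_ends(fastq_sequences("sample_R1.fastq.gz"))` (hypothetical file name) lists the five most frequent read endings. After trimming, reads have different lengths, so run it on the raw files for the clearest signal.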

        • #19
          Originally posted by mastal
          Try running FastQC with the --nogroup parameter; it will let you see how many bases at the end of the read are affected. It could be only the last base, which often has much lower quality than the rest of the read.
          I've run FastQC with --nogroup, and it is very helpful. I see 5 or 6 bases at the end of the read with abnormal GC content. Please see the attached plots.
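With --nogroup, FastQC reports every base position separately instead of binning positions together. The sketch below computes a similar per-position GC percentage and flags positions that stray from the median; it is not FastQC's own implementation, and the function names and the 10-percentage-point tolerance are hypothetical choices.

```python
import statistics
from typing import Iterable, List


def per_position_gc(seqs: Iterable[str]) -> List[float]:
    """Percent G+C at each read position, one value per base (no binning).

    Bases other than A/C/G/T (e.g. N) are ignored at that position.
    """
    gc: List[int] = []
    called: List[int] = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(gc):  # first read long enough to reach this position
                gc.append(0)
                called.append(0)
            if base in "ACGT":
                called[i] += 1
                if base in "GC":
                    gc[i] += 1
    return [100.0 * g / c if c else 0.0 for g, c in zip(gc, called)]


def outlier_positions(gc: List[float], tolerance: float = 10.0) -> List[int]:
    """1-based positions whose GC% differs from the median by more than `tolerance`."""
    if not gc:
        return []
    mid = statistics.median(gc)
    return [i + 1 for i, value in enumerate(gc) if abs(value - mid) > tolerance]
```

If only the last 5 or 6 positions come back as outliers, that matches the pattern described above and suggests cropping those bases rather than discarding whole reads.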
          As for the trimmomatic trimming, how long are the adapter sequences you are using for palindrome trimming?
          With a threshold score of 30 for palindrome trimming, if each matching base adds 0.6 to the score (see the trimmomatic web page), unless the sequences in your adapter.fasta file are quite long, trimmomatic will not recognise and trim the adapter sequences.
          I used Illumina's TruSeq3-PE-2.fa file provided by Trimmomatic. The sequences there are 34 bases long.
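The arithmetic behind the concern above can be checked directly: divide the threshold by the per-base score to get the number of perfectly matching bases needed. This sketch assumes a per-base score of log10(4) ≈ 0.602, consistent with the "just over 0.6" figure in the Trimmomatic manual; it ignores mismatch penalties, so the results are lower bounds.

```python
import math

# Assumption: each perfectly matching base adds log10(4) ~= 0.602 to the score.
# Mismatches (which lower the score) are ignored, so these are lower bounds.
PER_BASE_SCORE = math.log10(4)


def min_matching_bases(threshold: float) -> int:
    """Smallest number of perfectly matching bases whose score reaches `threshold`."""
    return math.ceil(threshold / PER_BASE_SCORE)


if __name__ == "__main__":
    for threshold in (10, 15, 20, 30):
        print(threshold, min_matching_bases(threshold))
```

A threshold of 30 works out to about 50 matching bases, in line with the manual's "almost 50 bases", so a single 34-base adapter could not reach it alone. The manual excerpt quoted later in this thread notes, however, that palindrome mode allows a longer alignment than the adapter sequence itself.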

          You can use grep to see how many adapter sequences are in your reads before and after trimmomatic trimming.
          Searching for the 34-base sequences from Trimmomatic's adapter file, I found fewer than 10 exact matches in both my raw and my trimmed reads. But Trimmomatic reported that thousands of reads were trimmed during a test run that trimmed adapters only.
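One likely reason for the mismatch: an exact search for the full adapter only finds reads that contain the whole adapter, while short inserts usually leave just the start of the adapter at the 3' end of the read. This sketch counts both cases separately. The function name and `min_overlap` default are hypothetical, and the 13-base example adapter is the common start of the Illumina adapter, used only for illustration; substitute the sequences from TruSeq3-PE-2.fa. Trimmomatic also tolerates mismatches, which this exact-match sketch does not, so it still undercounts.

```python
from typing import Iterable, Tuple


def count_adapter_hits(seqs: Iterable[str], adapter: str, min_overlap: int = 8) -> Tuple[int, int]:
    """Return (full, partial) counts.

    full:    reads containing the whole adapter (what an exact grep finds)
    partial: reads whose 3' end is an adapter prefix of at least
             `min_overlap` bases (adapter read-through a grep misses)
    """
    full = partial = 0
    for seq in seqs:
        if adapter in seq:
            full += 1
            continue
        # Try the longest possible partial adapter first, down to min_overlap.
        longest = min(len(adapter) - 1, len(seq))
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                partial += 1
                break
    return full, partial


if __name__ == "__main__":
    example_adapter = "AGATCGGAAGAGC"  # example only
    reads = ["TTTTAGATCGGAAGAGCTT", "CCCCCCAGATCGGA", "CCCCCCCCAGAT"]
    print(count_adapter_hits(reads, example_adapter))
```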

          • #20
            Originally posted by Wallysb01
            Picardtools markduplicates will give you the kind of duplication checks you really need.
            Do you mean that it's safe to remove duplicates marked by Picard? I think it's easier to check PCR duplicates for PE reads. What about SE reads? Is Picard equally reliable or not?
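Picard MarkDuplicates works from alignment coordinates (the 5' end and orientation of a read, plus its mate's for pairs) and by default keeps the copy with the highest sum of base qualities. The toy sketch below is a simplified illustration, not Picard's code. It uses hypothetical dict-based read records whose 5' coordinates are already computed, and shows why single-end data are more prone to false positives: two independent fragments that happen to start at the same position and strand share a key unless the mate's position must also match. This is especially relevant for RNA-seq, where highly expressed transcripts produce many reads starting at the same place.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Set

Read = Dict[str, Any]  # hypothetical record: chrom, start, strand, mate_*, qual_sum


def mark_duplicates(reads: List[Read], key: Callable[[Read], Hashable]) -> Set[int]:
    """Indices of reads flagged as duplicates.

    Reads sharing a key form a group; the read with the highest base-quality
    sum is kept and the rest of the group is flagged.
    """
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for i, read in enumerate(reads):
        groups[key(read)].append(i)
    flagged: Set[int] = set()
    for members in groups.values():
        keep = max(members, key=lambda i: reads[i]["qual_sum"])
        flagged.update(i for i in members if i != keep)
    return flagged


def single_end_key(read: Read) -> Hashable:
    # Only one fragment end is known: chromosome, 5' position, strand.
    return (read["chrom"], read["start"], read["strand"])


def paired_end_key(read: Read) -> Hashable:
    # Both fragment ends must coincide before a pair is called a duplicate.
    return (read["chrom"], read["start"], read["strand"],
            read["mate_start"], read["mate_strand"])
```

With two fragments that start at the same position but end at different places, `single_end_key` flags one of them as a duplicate, while `paired_end_key` keeps both.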

            • #22
              Just FYI, I tried trimmomatic with different palindrome clip thresholds, including 10, 15, 20, and 30, and all gave me very similar numbers of dropped sequences. I think this is consistent with the following passage in trimmomatic's manual:
              "For palindromic matches, a longer alignment is possible, as described above. Therefore this threshold can be higher, in the range of 30. Even though this threshold is very high (requiring a match of almost 50 bases) Trimmomatic is still able to identify very, very short adapter fragments."
              So it sounds like 30 should be OK. Based on my tests, different thresholds didn't change the number of dropped sequences much, so I would assume they also produce similar trimming results.

              • #23
                Hi,
                Just another line of thought on whether to trim or not: http://journal.frontiersin.org/Journ...014.00017/full