  • BBBBBB read quality

    Hello everyone,

    I have come across articles stating that reads whose quality string consists only of 'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB' are LOW quality, and that such strings act as read-quality indicators. They say the B run marks the 3' end of the read, which must be trimmed or filtered out to prevent mis-assembly.

    Is this true, or is it okay to ignore?

    Also, if I want to quality-filter my Illumina sequence file, is there an existing tool, or a specific feature I can use to recognise bad-quality reads? If nothing is available, I could write a custom program to do it.

    I know this might be a trivial question, but I am new to Illumina technology. Kindly help.

    Thanks in advance!

  • #2
    I assume that you are looking in the qseq.txt files. The last column in that file is whether the read passed the chastity filter or not. So any line with a 0 in that column should be dropped.

    Run: awk '$11 == 1' FILE > NEWFILE

    If you want to map the reads with bwa/bowtie/b*, you need to recalculate the QV scores; this is covered in multiple older threads on SEQanswers.
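A minimal Python sketch of the same filter, combined with one possible reading of "recalculate the QV score". It assumes the qseq file has 11 tab-separated columns with the sequence in column 9, the quality string in column 10 and the filter flag in column 11, that '.' stands for an uncalled base, and that the qualities are Phred+64 and need to become Phred+33 for the aligner. The function name and the read-naming scheme are made up for illustration.

```python
def qseq_to_fastq(lines):
    """Yield FASTQ records for qseq lines that passed the chastity filter.

    Assumptions (not stated in the thread): 11 tab-separated columns,
    Phred+64 qualities, '.' used for uncalled bases.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and reads that failed the filter (last column != 1).
        if len(fields) != 11 or fields[10] != "1":
            continue
        # Hypothetical naming scheme: first seven columns joined, then /read-number.
        name = "_".join(fields[:7]) + "/" + fields[7]
        seq = fields[8].replace(".", "N")
        # Shift each quality character from offset 64 to offset 33.
        qual = "".join(chr(ord(c) - 64 + 33) for c in fields[9])
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

Usage would be along the lines of opening the qseq file, passing the file object to qseq_to_fastq() and writing each yielded record to a FASTQ file. Check which quality encoding your pipeline version actually produces before applying the shift.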



    • #3
      hi

      I'd like to know the same thing too.
      Last edited by pratibhamani; 07-04-2010, 10:19 PM.



      • #4
        If you're seeing this you're probably using one of the more recent Illumina pipelines. Illumina uses a quality value of 2 (which is what B decodes to) as a way to mark the point at which it believes a read to become unreliable. This only applies to strings of B which run to the end of the read. In these cases this isn't intended as a true estimate of the error rate, but rather more as a flag to suggest you don't use these bases.

        Frankly it's a bit of a pain when collecting aggregate statistics about sequence qualities as it tends to skew the distribution of Phred scores.
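The trailing-B convention described in this post can be handled with a few lines of Python. This is only a sketch, assuming Phred+64 quality strings in which 'B' decodes to 2; the function names are invented for illustration. It trims the run of B characters at the 3' end and shows how that run drags down the mean quality of a read.

```python
def trim_trailing_b(seq, qual, flag="B"):
    """Remove the run of `flag` characters at the 3' end and the matching bases.

    Only a run that reaches the end of the read is removed; isolated
    `flag` characters inside the read are left alone.
    """
    keep = len(qual.rstrip(flag))
    return seq[:keep], qual[:keep]


def mean_phred(qual, offset=64):
    """Mean Phred score of a quality string (0.0 for an empty string)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


seq, qual = trim_trailing_b("ACGTAC", "hhBhBB")
print(seq, qual)            # ACGT hhBh
print(mean_phred("hhBB"))   # 21.0 with the flagged bases included
print(mean_phred("hh"))     # 40.0 once they are trimmed
```

A read whose quality string is entirely B comes back empty, so it can be dropped after trimming.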



        • #5
          Thank you, simonandrews.

          This is what I wanted to know. So if I use these bases in an assembly, say de novo, will it lead to mis-assembly? Should I remove these reads before I proceed, or is it okay to use them?



          • #6
            Hi ritzriya,

            I can give you a simple quality-control script that filters out low-quality reads or reads rich in N's, depending on the options you set:
            -n   maximum N content of a read, default 0.5, i.e. a read is filtered out if more than 50% of its bases are N
            -q   minimum average quality, default 20, i.e. a read is filtered out if its average quality is lower than 20
            -sd  standard deviation of the quality values, default 10, used to filter out reads with fluctuating, unstable quality


            Hope this helps.
            Attached Files
            Last edited by BENM; 07-08-2010, 08:48 PM.
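The attached script is not preserved here, so the following is not BENM's code. It is a rough Python sketch of a filter with the same three criteria, under these assumptions: qualities are Phred+64, a read is rejected when its quality standard deviation is above the -sd threshold, and the function and parameter names are invented.

```python
from statistics import mean, pstdev


def passes_filter(seq, qual, max_n=0.5, min_mean_q=20, max_sd=10, offset=64):
    """Return True if a read survives the three checks described above.

    max_n       -- reject if the fraction of N bases exceeds this (like -n)
    min_mean_q  -- reject if the mean Phred score is below this (like -q)
    max_sd      -- reject if the Phred standard deviation exceeds this
                   (assumed meaning of -sd)
    """
    if not seq or len(seq) != len(qual):
        return False
    scores = [ord(c) - offset for c in qual]
    if seq.upper().count("N") / len(seq) > max_n:
        return False
    if mean(scores) < min_mean_q:
        return False
    if pstdev(scores) > max_sd:
        return False
    return True


print(passes_filter("ACGT", "hhhh"))  # True: uniform high quality
print(passes_filter("NNNT", "hhhh"))  # False: 75% N
print(passes_filter("ACGT", "BBBB"))  # False: mean quality 2
print(passes_filter("ACGT", "hhBB"))  # False: mean 21 but SD 19
```

The last case shows why the standard-deviation check is useful: a read with a good 5' half and a flagged 3' tail can pass a mean-quality cutoff and still be unreliable.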



            • #7
              Problem still unsolved!

              Thank you, BENM, for the filtering script. It will surely be useful for quality filtering when necessary.

              But my question remains the same: if I use reads with 'BBB' quality in the assembly, will it lead to mis-assembly, or will it not make much of a difference? That's all I want to know.



              • #8

                It depends on the assembly tool you use.
                Solexa reads generally have an error rate below 1%, so sequences containing errors will show up at lower coverage. Common tools like VELVET can therefore still pick them out, even though VELVET does not take read quality into account.
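The coverage argument can be illustrated with a toy k-mer count in Python. This is not how VELVET is implemented; it is only a sketch of the idea that k-mers created by a sequencing error occur far less often than correct ones when coverage is good. The function names and the min_count threshold are illustrative assumptions.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_coverage_kmers(counts, min_count):
    """Return the k-mers seen fewer than `min_count` times."""
    return {kmer for kmer, n in counts.items() if n < min_count}


# Five correct copies of a read plus one copy with a single-base error.
reads = ["ACGTAC"] * 5 + ["ACGTTC"]
counts = kmer_counts(reads, k=4)
print(sorted(low_coverage_kmers(counts, min_count=2)))  # ['CGTT', 'GTTC']
```

With low coverage the distinction disappears, because correct k-mers are then rare as well; that is the case in which trimming or filtering beforehand matters most.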



                • #9
                  ??

                  That's exactly why I am asking this question. Because VELVET does not take the quality of the reads into account, I will have to deal with it myself before it starts processing.

                  Obviously, if my input is incorrect or contains errors, my output will not be good enough, no matter how many times I change my k-mer, right?

                  I hope everyone understood what I have explained above.



                  • #10
                    Hello pratibhamani,

                    If you have enough sequencing coverage, there is no need to worry about the sequencing error rate, because the de Bruijn graph algorithm can deal with errors by weighting paths according to their coverage. This is possible thanks to the high throughput and quality control of NGS technologies.

                    If you're still anxious about it, you can compare assemblies with and without prior error correction, using data from a sequencing project with a known genome. Alternatively, you can use another tool such as SOAPdenovo (http://soap.genomics.org.cn/soapdenovo.html), which is based on the same de Bruijn graph approach as VELVET but includes an error-correction function.

                    The k-mer option of VELVET should be chosen according to your sequencing depth, not the error rate; see the link below:

                    You can also use the VELVET contrib package "contrib/VelvetOptimiser-2.1.0/" to find an appropriate k-mer setting.

                    Hope this helps.
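To make the link between sequencing depth and k-mer choice concrete, here is a small Python sketch of the relation given in the Velvet manual between base coverage C, read length L, k-mer length k and the resulting k-mer coverage, Ck = C * (L - k + 1) / L. The link referred to in the post is not preserved, so this may or may not be what it pointed to; the function name and the example numbers are illustrative only.

```python
def kmer_coverage(base_coverage, read_length, k):
    """k-mer coverage Ck = C * (L - k + 1) / L for reads of uniform length."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


# Example: 50x base coverage with 36 bp reads, for a few odd k values.
for k in (21, 25, 31):
    print(k, round(kmer_coverage(50, 36, k), 1))
```

Larger k values give more specific k-mers but lower k-mer coverage, which is why the usable range of k depends on how deeply the genome was sequenced.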



                    • #11
                      Thanks!

                      Yes, BENM, I do have good sequencing coverage in my case, so I need not worry about these reads. Fine.

                      I will surely have a look at the links you have sent. Thanks for the information; it will surely help me!



                      • #12
                        Very helpful thread. And thanks, BENM, for the script!
