  • filtered_subreads.fastq contains multi-pass reads

    Hello, I'm new to PacBio. I started with SMRT Portal to pre-process my raw h5 files, so I used the RS_Subreads.1 protocol to filter reads by quality, length, etc. However, I noticed that the output filtered_subreads.fastq contains full multi-pass reads (CCS) as well as partial ones.
    I know that I have to extract the CCS reads for error correction, but my question is: do multi-pass reads affect the assembly results, since they are duplicated, and is it worth removing them from filtered_subreads.fastq before the assembly step?
    Many thanks.

  • #2
    If you want CCS reads then you should use the "RS_ReadsOfInsert" protocol. It is not a very logical name, but that is what we have now.



    • #3
      Yes, that's what I plan to do. But is it worth removing CCS reads and partial multi-pass reads from filtered_subreads.fastq?



      • #4
        Are you going to assemble outside of SMRTportal or are you planning to use HGAP within SMRTportal?

        "CCS-like" reads (probably the best term to use, per PacBio) would give you the longest and best representation of that particular fragment.



        • #5
          Yes, I plan to use HGAP within SMRTportal and maybe also the Celera Assembler, but of course only after error-correcting the PacBio reads using CCS and Illumina reads.



          • #6
            Yes, I plan to use HGAP within SMRTportal and maybe also the Celera Assembler, but of course only after error-correcting the PacBio reads using CCS and Illumina reads.
            But what about the multi-pass reads (full and partial) in the filtered_subreads.fastq file generated by the RS_Subreads.1 protocol in SMRT Portal? Do you suggest removing these fragments from the file, as they are generally short, duplicated and of poor quality?



            • #7
              If you are going to do the assembly outside SMRTportal it may be best to filter out those reads (or use ReadsOfInsert output instead). A nice summary here: https://github.com/PacificBioscience...Bio-Long-Reads

              In the HGAP_Assembly.2 protocol in SMRTPortal, the following happens at step 1:

              Filtering Parameters (PreAssembler Filter v1)

              Minimum Subread Length: Subreads shorter than this value (in base pairs) are filtered out and excluded from analysis.
              Minimum Polymerase Read Quality: Polymerase reads with lower quality than this value are filtered out and excluded from analysis.
              Minimum Polymerase Read Length: Polymerase reads shorter than this value (in base pairs) are filtered out and excluded from analysis.
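For anyone filtering outside SMRTPortal, here is a minimal sketch of the "Minimum Subread Length" rule only. It assumes a plain 4-line-per-record FASTQ; the function names `parse_fastq` and `filter_min_length` and the toy records are made up for illustration and are not part of SMRT Analysis. The two polymerase-read filters are not reproduced here, because polymerase read quality and length are not stored in the subread FASTQ.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (read name, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        # Drop the leading '@' and any description after the first space.
        yield header[1:].split()[0], seq, qual


def filter_min_length(records: Iterable[FastqRecord],
                      min_len: int) -> Iterator[FastqRecord]:
    """Keep subreads of at least min_len bases; shorter ones are dropped."""
    for name, seq, qual in records:
        if len(seq) >= min_len:
            yield name, seq, qual


# Toy input: one 8 bp subread and one 3 bp subread (hypothetical names).
demo = [
    "@movie/1/0_8\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
    "@movie/2/0_3\n", "ACG\n", "+\n", "III\n",
]
kept = list(filter_min_length(parse_fastq(demo), min_len=5))
print([name for name, _, _ in kept])
```

To run it on a real file, pass an open file handle instead of the `demo` list, e.g. `parse_fastq(open("filtered_subreads.fastq"))`.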



              • #8
                Thank you for this nice summary. I have 50X PacBio reads and 50X Illumina reads, so according to the summary it is best for me to use Celera or ECTools for the assembly.
                I think the HGAP PreAssembler Filter v1 step can also be done by the RS_Subreads.1 protocol, since it uses the same parameters, and it does not completely filter out the "duplicated" (multi-pass) reads. So I plan to write an in-house script to remove these reads, which should share the beginning of their ID with the CCS reads, and then replace them with the CCS reads.
                Thank you again for your response
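The in-house script described above could be sketched roughly as follows. This assumes the usual PacBio RS naming, where subreads are named `movie/zmw/start_end` and CCS reads `movie/zmw/ccs`, so that reads from the same ZMW share the `movie/zmw` prefix. The function names and the toy reads are hypothetical; check the headers in your own files before relying on this.

```python
from typing import Dict


def zmw_key(read_name: str) -> str:
    """Return the 'movie/zmw' prefix shared by all reads from one ZMW."""
    parts = read_name.split()[0].split("/")
    return "/".join(parts[:2])


def replace_with_ccs(subreads: Dict[str, str],
                     ccs: Dict[str, str]) -> Dict[str, str]:
    """Drop every subread whose ZMW has a CCS read, then add the CCS reads."""
    ccs_zmws = {zmw_key(name) for name in ccs}
    merged = {name: seq for name, seq in subreads.items()
              if zmw_key(name) not in ccs_zmws}
    merged.update(ccs)
    return merged


# Toy data (made-up names): ZMW 10 has two passes and a CCS read, ZMW 11 has one pass.
subreads = {
    "movieA/10/0_500": "ACGT",
    "movieA/10/550_1050": "ACGA",
    "movieA/11/0_9000": "TTTT",
}
ccs = {"movieA/10/ccs": "ACGT"}

merged = replace_with_ccs(subreads, ccs)
print(sorted(merged))
```

Here the two subreads from ZMW 10 are replaced by the single CCS read, and the lone subread from ZMW 11 is kept unchanged.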



                • #9
                  With 50X PacBio you can simply run HGAP without worrying about the Illumina data. If you have SMRT Analysis installed, run the HGAP.3 protocol with a reasonable estimate of genome size: http://programs.pacificbiosciences.c...3-07-15/2t6ztt



                  • #10
                    Thank you Rhall, but sorry, I misestimated the PacBio read coverage: I have only about 14x, which is why I want to perform the hybrid assembly. But before that, I have some steps to do:
                    Quality filtering,
                    CCS extraction,
                    Multi pass subreads removal,
                    Error correction,
                    And Chimeras removal.
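As a quick sanity check on a coverage figure like the 14x above, coverage can be estimated as total sequenced bases divided by genome size. The sketch below uses toy numbers only; the function name `estimate_coverage`, the read lengths and the genome size are made up for illustration and do not come from this dataset.

```python
from typing import Iterable


def estimate_coverage(read_lengths: Iterable[int], genome_size: int) -> float:
    """Approximate fold coverage: total bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Toy example: 14,000 bases of reads over a 1,000 bp "genome".
toy_lengths = [7000, 3500, 3500]
coverage = estimate_coverage(toy_lengths, genome_size=1000)
print(f"{coverage:.1f}x")
```

Note that counting every pass of a multi-pass ZMW inflates this number, so it is worth computing it on whichever read set you actually feed to the assembler.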



                    • #11
                      To go into the ECTools hybrid assembly, all that needs to be done is to run the basic filter protocol to generate a filtered_subreads.fasta file (the RS_Subreads.1 protocol in SMRT Analysis).



                      • #12
                        I ran the RS_Subreads.1 protocol, but I noticed that multi-pass reads are still present in filtered_subreads.fasta, so I need a specific script to remove them.



                        • #13
                          Why do you need to remove them? You will only be using the longest reads for error correction, which will likely have few passes, it shouldn't be a problem for hybrid assembly.
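If you still want to de-duplicate, one simple option is to keep only the longest subread from each ZMW. This is a sketch under assumptions, not something ECTools or SMRT Analysis requires: it assumes read names of the form `movie/zmw/start_end`, and the function name and toy reads are hypothetical.

```python
from typing import Dict, Tuple


def longest_subread_per_zmw(subreads: Dict[str, str]) -> Dict[str, str]:
    """Return one subread per ZMW, choosing the longest sequence."""
    best: Dict[str, Tuple[str, str]] = {}
    for name, seq in subreads.items():
        # 'movie/zmw' identifies the ZMW that produced this subread.
        zmw = "/".join(name.split()[0].split("/")[:2])
        if zmw not in best or len(seq) > len(best[zmw][1]):
            best[zmw] = (name, seq)
    return dict(best.values())


# Toy data (made-up names): ZMW 10 was read twice, ZMW 11 once.
reads = {
    "movieA/10/0_4": "ACGT",
    "movieA/10/50_58": "ACGTACGT",
    "movieA/11/0_6": "TTTTTT",
}
deduplicated = longest_subread_per_zmw(reads)
print(sorted(deduplicated))
```

Ties are resolved in favour of the first subread seen, which is an arbitrary choice for this sketch.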



                          • #14
                            In fact, my genome is really complicated to assemble because it is highly repetitive, so I thought that removing as many artifact reads as possible would benefit the assembly statistics. I plan to use all the reads for the assembly (both the longest and the short ones). So, if I understand you correctly, you think that multi-pass reads shouldn't be a problem for the assembly?



                            • #15
                              I wouldn't worry about it until after you have an initial assembly. The only problem I foresee is if the library wasn't good and the reads are all short, but in that case, no amount of filtering is going to improve the assembly.
