
  • #16
    Originally posted by swbarnes2
    The other possibility: double-check that your aln command line was right. If you accidentally put a typo in one of your fastq names, and one fastq doesn't actually get aligned, sampe proceeds anyway and returns crazy large insert sizes. So try running samse on each of your individual fastqs. You want to know that they are working, and you want to know whether the two files are in sync.
    Thanks a lot!

    I am trying it right now.
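For anyone wanting to check the "in sync" part directly, here is a minimal sketch in Python. It is not part of bwa; the function names are made up for illustration, and it assumes plain, uncompressed 4-line FASTQ records whose read names differ only by a trailing /1 or /2 (or by a comment after the first space).

```python
from itertools import zip_longest

def read_id(header):
    """Return the read name from a FASTQ header line, without '@' or a /1, /2 suffix."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def fastq_ids(path):
    """Yield the read name of every record (assumes exactly 4 lines per record)."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.strip():
                yield read_id(line)

def first_out_of_sync(path1, path2):
    """Return the 1-based record number of the first mismatch, or None if in sync."""
    pairs = zip_longest(fastq_ids(path1), fastq_ids(path2))
    for n, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:  # also triggers when one file is shorter than the other
            return n
    return None
```

Calling `first_out_of_sync("read1.fastq", "read2.fastq")` streams both files, so memory use stays small even for large runs.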

    Comment


    • #17
      I had a very similar problem which was very helpfully fixed using Trimmomatic and TrimGalore as detailed in this thread:
      http://seqanswers.com/forums/showthread.php?t=19874
      The author of TrimGalore was particularly accommodating in modifying the script to allow different trimming of R1 and R2.

      Comment


      • #18
        Does discarding the size estimate affect anything with the read data, the quality, or any potential variant calls?

        I am trying to determine if I should use the -A option for all of my data or if there is a way to dynamically determine that sampe will take forever and the -A option should be used.

        Thanks.

        Comment


        • #19
          Originally posted by rskr
          I have seen it when one read of a pair was quality-filtered out but the other was not; the missing read then gets replaced with whatever was next in the file, so the pairs no longer match.

          1.1 1.2
          2.1 2.2
          3.1 3.2
          4.1 5.2 <--4.2 was omitted, they are no longer in parity.
          5.1 6.2

          I have a question about using the -A option in the case above. If the reads are out of sync, as is the case between 4.1 and 5.2, bwa will not perform SW on the unmapped mate. What happens after that? Will 5.1 and 6.2 be thrown away too because they do not match, and so on? I guess what I am asking is: is it dangerous to use -A and force bwa to throw away unmatched pairs? Are we losing important data by doing this? And does the mismatch carry on to all the reads after it?
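To make the toy example above concrete, here is a small sketch assuming that pairing is purely positional (record N of the first file goes with record N of the second), which is how I understand sampe to read its inputs. The lists and the helper name are invented for illustration.

```python
def positional_pairs(read1_ids, read2_ids):
    """Pair record N of the first file with record N of the second file."""
    return list(zip(read1_ids, read2_ids))

read1 = ["1", "2", "3", "4", "5", "6"]
read2 = ["1", "2", "3", "5", "6"]  # mate 4 was filtered out of the second file

pairs = positional_pairs(read1, read2)
mismatched = [(a, b) for a, b in pairs if a != b]
print(mismatched)  # [('4', '5'), ('5', '6')]
```

Under that assumption, a single dropped read shifts every later record by one, so every pair after the gap is a wrong pairing, not just the one at the gap.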

          Comment


          • #20
            -A should really only be used if you know that your files are lined up right, and you know that the insert sizes won't properly match what bwa is expecting.

            Fix your fastqs. You can pull out the singletons, align them separately, then combine the bams.

            Comment


            • #21
              Originally posted by swbarnes2
              -A should really only be used if you know that your files are lined up right, and you know that the insert sizes won't properly match what bwa is expecting.

              Fix your fastqs. You can pull out the singletons, align them separately, then combine the bams.
              Thank you so much for the advice. As tempting as it is to use -A as a quick solution, I am not completely comfortable with the idea because I don't completely understand what is being tossed away: the true "orphaned" read, or a read that does have a mate but simply does not line up correctly with its mate due to the presence of these singleton "orphans".

              I am looking for more details on this but haven't found any yet. If anyone can confirm that only the true singletons are ignored, then I guess -A would be a good solution. In the meantime, I think swbarnes2's advice is the safest.
              Last edited by dGho; 07-11-2013, 05:43 AM.

              Comment


              • #22
                Originally posted by swbarnes2
                -A should really only be used if you know that your files are lined up right, and you know that the insert sizes won't properly match what bwa is expecting.

                Fix your fastqs. You can pull out the singletons, align them separately, then combine the bams.
                Does anyone know how to pull out the singletons from paired-end reads that are split into two fastq files (read1.fastq and read2.fastq)? I haven't found a tool that does this yet. Is this something I should write a script for?

                Comment


                • #23
                  Originally posted by swbarnes2
                  -A should really only be used if you know that your files are lined up right, and you know that the insert sizes won't properly match what bwa is expecting.

                  Fix your fastqs. You can pull out the singletons, align them separately, then combine the bams.
                  Could you describe a method for identifying singletons between one read.fq file and its mate? Thanks.
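One possible method, sketched in Python under some assumptions: plain uncompressed 4-line FASTQ, read names that match once a trailing /1 or /2 is stripped, and both files keeping the same relative read order (reads were only dropped, never shuffled). The function names and output file names are hypothetical, and this is not the script linked elsewhere in the thread. It keeps only read names in memory, not sequences.

```python
def read_id(header):
    """Return the read name from a FASTQ header line, without '@' or a /1, /2 suffix."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def fastq_records(path):
    """Yield (read_name, record_text) for each 4-line FASTQ record."""
    with open(path) as handle:
        while True:
            lines = [handle.readline() for _ in range(4)]
            if not lines[0].strip():  # end of file
                break
            yield read_id(lines[0]), "".join(lines)

def split_pairs_and_singletons(r1_path, r2_path, out_prefix):
    """Write paired and singleton reads to separate files; return counts per mate."""
    # Pass 1: collect the read names present in each file.
    ids1 = {rid for rid, _ in fastq_records(r1_path)}
    ids2 = {rid for rid, _ in fastq_records(r2_path)}
    shared = ids1 & ids2
    counts = {}
    # Pass 2: route each record to the paired or the singleton output.
    for tag, path in (("1", r1_path), ("2", r2_path)):
        paired = singles = 0
        with open(f"{out_prefix}_paired_{tag}.fastq", "w") as p_out, \
             open(f"{out_prefix}_single_{tag}.fastq", "w") as s_out:
            for rid, record in fastq_records(path):
                if rid in shared:
                    p_out.write(record)
                    paired += 1
                else:
                    s_out.write(record)
                    singles += 1
        counts[tag] = (paired, singles)
    return counts
```

The two `_paired_` files can then go to sampe, and the `_single_` files can be aligned with samse and the resulting bams combined, as suggested above. If the two files are not in the same relative order, the paired outputs would also need sorting by read name, which this sketch does not do.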

                  Comment


                  • #24
                    Originally posted by dGho
                    I have a question about using the -A option in the case above. If the reads are out of sync, as is the case between 4.1 and 5.2, bwa will not perform SW on the unmapped mate. What happens after that? Will 5.1 and 6.2 be thrown away too because they do not match, and so on? I guess what I am asking is: is it dangerous to use -A and force bwa to throw away unmatched pairs? Are we losing important data by doing this? And does the mismatch carry on to all the reads after it?
                    Any results from files that are out of parity are invalid (in addition to being a waste of time waiting for the results). If the files are in parity and the mate doesn't map, that is a different question.

                    Comment


                    • #25
                      Answering my own question: if anyone else is looking for a way to remove singletons, check out this thread. I am trying it out now. azneto shared his script for making sure that two fastqs are in sync. It seems to use a whole lot of RAM, though.

                      http://seqanswers.com/forums/showthread.php?t=17974

                      Comment


                      • #26
                        I just wanted to confirm that azneto's script worked well. It removed singletons and ordered the two fastq files so that the reads were synchronized. Running bwa sampe on the resulting fastqs produced no errors, and the runtimes fell within the expected range.

                        Comment
