  • featureCounts 'Found reads that are not properly paired' despite being sorted by STAR

    Hi all,

    I have mapped 101 bp PE reads to GRCh38 and I am using featureCounts for read summarization.

    Why does featureCounts say 'Found reads that are not properly paired. (missing mate or the mate is not the next read)' when STAR was run with the argument '--outSAMtype BAM SortedByCoordinate'?

    Cheers,
    Leon

  • #2
    'Found reads that are not properly paired. (missing mate or the mate is not the next read)'

    '--outSAMtype BAM SortedByCoordinate'

    So mate is not the next read necessarily if BAM is sorted by coordinate. Although the featureCounts help does say that if input is not correctly sorted it will automatically sort, so perhaps you need to update to latest version?


    • #3
      Originally posted by bruce01
      'Found reads that are not properly paired. (missing mate or the mate is not the next read)'

      '--outSAMtype BAM SortedByCoordinate'

      So mate is not the next read necessarily if BAM is sorted by coordinate. Although the featureCounts help does say that if input is not correctly sorted it will automatically sort, so perhaps you need to update to latest version?
      Thanks for input, I think I'm using the latest version(s):

      star --version
      STAR_2.4.0i
      featureCounts -v
      featureCounts v1.4.6

      Cheers,
      Leon


      • #4
        Did you try sorting by name and see if it works? TBH I ran the same thing as you about two weeks ago and didn't get this error; my STAR is f1 so may be worth posting to STAR Google group as I am using same version of featureCounts.


        • #5
          Originally posted by LeonDK
          Hi all,

          I have mapped 101 bp PE reads to GRCh38 and I am using featureCounts for read summarization.

          Why does featureCounts say 'Found reads that are not properly paired. (missing mate or the mate is not the next read)' when STAR was run with the argument '--outSAMtype BAM SortedByCoordinate'?

          Cheers,
          Leon
          Hi Leon,

          featureCounts gives this warning, but it keeps working (I believe it re-sorts the BAM file) and produces the final result without any errors. So I think there is no problem here with either STAR (2.4.0i) or featureCounts.

          Cheers
          Alex


          • #6
            Originally posted by alexdobin
            Hi Leon,

            featureCounts gives this warning, but it keeps working (I believe it re-sorts the BAM file) and produces the final result without any errors. So I think there is no problem here with either STAR (2.4.0i) or featureCounts.

            Cheers
            Alex
            Hi Alex,

            Thanks for input!

            I'm not sure I understand why featureCounts perceives the input BAM file as NOT sorted, when I specifically chose to output a SORTED BAM file from my STAR run?

            Best,
            Leon


            • #7
              Originally posted by LeonDK
              I'm not sure I understand why featureCounts perceives the input BAM file as NOT sorted, when I specifically chose to output a SORTED BAM file from my STAR run?

              Best,
              Leon
              When read pairs are reported multiple times in a BAM file, featureCounts not only requires the two reads from the same pair to have the same name, but also requires the mate position reported in one read to match the mapping position of the other read, to make sure the two reads are properly paired (next to each other in the BAM file). If this is not the case, featureCounts will re-sort the file.

              This might be the discrepancy between STAR sorting and featureCounts sorting. We have already observed this discrepancy between featureCounts sorting and samtools sorting by names.

              Wei
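
The pairing rule described above can be illustrated with a small sketch. This is not featureCounts' actual implementation; the `Rec` tuple, the `mates_adjacent` function and the toy coordinates are all hypothetical, and real BAM records carry more fields. It only shows why a coordinate-sorted file generally fails an "each mate is the next read" check:

```python
from typing import NamedTuple


class Rec(NamedTuple):
    # Hypothetical, simplified stand-in for one alignment record.
    name: str        # read name (QNAME)
    chrom: str       # reference name of this alignment
    pos: int         # mapping position of this alignment
    mate_chrom: str  # reference name reported for the mate
    mate_pos: int    # mapping position reported for the mate


def mates_adjacent(records):
    """Return True if records come in consecutive pairs that share a name
    and whose reported mate coordinates point at each other."""
    if len(records) % 2 != 0:
        return False
    for first, second in zip(records[0::2], records[1::2]):
        same_name = first.name == second.name
        first_points_to_second = (first.mate_chrom, first.mate_pos) == (second.chrom, second.pos)
        second_points_to_first = (second.mate_chrom, second.mate_pos) == (first.chrom, first.pos)
        if not (same_name and first_points_to_second and second_points_to_first):
            return False
    return True


# Toy data: two read pairs with mates stored next to each other.
name_grouped = [
    Rec("r1", "chr1", 100, "chr1", 300),
    Rec("r1", "chr1", 300, "chr1", 100),
    Rec("r2", "chr1", 200, "chr1", 400),
    Rec("r2", "chr1", 400, "chr1", 200),
]
# Sorting by coordinate interleaves the two pairs: r1, r2, r1, r2.
coordinate_sorted = sorted(name_grouped, key=lambda r: (r.chrom, r.pos))

print(mates_adjacent(name_grouped))
print(mates_adjacent(coordinate_sorted))
```

In this toy case the name-grouped list passes and the coordinate-sorted list does not, which matches the situation in which the warning appears.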


              • #8
                Originally posted by shi
                When read pairs are reported multiple times in a BAM file, featureCounts not only requires the two reads from the same pair to have the same name, but also requires the mate position reported in one read to match the mapping position of the other read, to make sure the two reads are properly paired (next to each other in the BAM file). If this is not the case, featureCounts will re-sort the file.

                This might be the discrepancy between STAR sorting and featureCounts sorting. We have already observed this discrepancy between featureCounts sorting and samtools sorting by names.

                Wei
                Okay, thanks for input!

                I guess the bottom line is: am I all good using the STAR option '--outSAMtype BAM SortedByCoordinate' and then feeding the result into featureCounts, despite this 'error message'?

                Cheers,
                Leon


                • #9
                  Your featureCounts results should be correct because featureCounts re-sorts the BAM file for you. See my post below for more comments.

                  Wei
                  Last edited by shi; 01-29-2015, 03:57 PM. Reason: correction


                  • #10
                    Hi Leon, Shi,

                    When STAR is run with the --outSAMtype BAM SortedByCoordinate option, it outputs a BAM file sorted by coordinate, which, I think, is not the preferred format for featureCounts. That's why featureCounts produces this warning message.

                    You can also make STAR output the unsorted file Aligned.out.bam with "--outSAMtype BAM SortedByCoordinate Unsorted". In this file the mates are adjacent to each other, so it conforms to the featureCounts requirements and the warning message is not generated.

                    Shi,
                    when working with the file sorted by coordinate, featureCounts says:
                    || Found reads that are not properly paired. ||
                    || (missing mate or the mate is not the next read) ||
                    || 0 read has missing mates. ||
                    || Input was converted to a format accepted by featureCounts. ||
                    It seems to me that the file is actually re-sorted to an acceptable format, and the calculation proceeds without any trouble. The results for both the "proper" format and "sorted by coordinate" files are exactly the same. Should we conclude then that both the "SortedByCoordinate" and "Unsorted" formats from STAR are compatible with featureCounts?

                    In practical terms, for large enough files, I am not sure how long featureCounts re-sorting will take, so it might be more efficient to output both "SortedByCoordinate" and "Unsorted", run the latter file through featureCounts, and then delete it.

                    Cheers
                    Alex
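
The workflow suggested above (write both BAM files, count from the unsorted one, then delete it) could be scripted roughly as follows. This is only a sketch that builds the command lines without running them. All paths and file names (`genome_index`, `sample_R1.fastq`, `genes.gtf`, the `sample_` prefix) are made-up placeholders, the helper functions are hypothetical, and the flags shown (`--genomeDir`, `--readFilesIn`, `--outFileNamePrefix`, `-p`, `-a`, `-o`) should be checked against the STAR and featureCounts versions in use.

```python
import shlex


def star_command(genome_dir, read1, read2, prefix):
    """Build a STAR call that writes both an unsorted and a
    coordinate-sorted BAM file (placeholder inputs)."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--outSAMtype", "BAM", "Unsorted", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


def featurecounts_command(annotation, out_table, bam):
    """Build a featureCounts call for paired-end data (-p) on one BAM file."""
    return ["featureCounts", "-p", "-a", annotation, "-o", out_table, bam]


prefix = "sample_"  # hypothetical output prefix
unsorted_bam = prefix + "Aligned.out.bam"  # unsorted file name mentioned in the thread

star_cmd = star_command("genome_index", "sample_R1.fastq", "sample_R2.fastq", prefix)
fc_cmd = featurecounts_command("genes.gtf", "sample_counts.txt", unsorted_bam)

# Print the commands for inspection; pass the lists to subprocess.run(...) to execute.
print(shlex.join(star_cmd))
print(shlex.join(fc_cmd))
```

Once counting has finished, the unsorted file can be removed and the coordinate-sorted BAM kept for downstream tools that need it.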


                    • #11
                      Hi Alex,

                      Yes, you are right. featureCounts re-sorts the coordinate-sorted BAM file output from STAR and the result should be correct (I have corrected my last post). You can conclude that both sorted and unsorted STAR output work with featureCounts.

                      The re-sorting process is very time-consuming, so it will be a lot better if unsorted reads are saved in the bam file.

                      Cheers,
                      Wei


                      • #12
                        I have a similar problem for both sorted and unsorted output from STAR. featureCounts re-sorts the BAM file. For the file in question, featureCounts reports that 1 read has missing mates. I thought STAR does not output unpaired reads. Also, RSeQC bam_stat.py reports that all reads are properly paired. Could this have to do with multi-mapping reads?


                        • #13
                          It is possible that this was due to the reporting of multi-mapping reads. If one read from a read pair was reported more times than the other read from the same pair, featureCounts will report that the first read has missing mates and then perform re-sorting. I am not sure whether STAR and RSeQC will treat such a mapping as a properly paired alignment or not.

                          We will let featureCounts report details of the read that is reported to be not properly paired. This should shed more light on what went on.
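
One way to look for the situation described above (one mate reported more often than the other) is to tally alignments per read name and per mate. The sketch below works on hypothetical `(name, is_first_mate)` tuples rather than on a real BAM file; with real data, `is_first_mate` would come from the 0x40 bit of the SAM FLAG field. The `unbalanced_pairs` function is an illustration only and is not how featureCounts detects missing mates.

```python
from collections import Counter


def unbalanced_pairs(records):
    """Return {read_name: (first_mate_count, second_mate_count)} for every
    read name whose two mates were reported a different number of times."""
    first = Counter()
    second = Counter()
    for name, is_first_mate in records:
        if is_first_mate:
            first[name] += 1
        else:
            second[name] += 1
    names = sorted(set(first) | set(second))
    return {n: (first[n], second[n]) for n in names if first[n] != second[n]}


# Toy data: r1 is reported once per mate; r2's first mate is reported twice
# (for example at two multi-mapping locations) but its second mate only once.
toy_records = [
    ("r1", True), ("r1", False),
    ("r2", True), ("r2", False),
    ("r2", True),
]

print(unbalanced_pairs(toy_records))
```

Any read name returned here would be a candidate for the "missing mates" report, since at least one of its alignments has no partner record.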


                          • #14
                            We have released a patched version of Subread package (1.4.6-p3). The featureCounts program in this version will report the details of the first read pair that was found not properly paired.

                            We also added a new argument "--donotsort" to featureCounts to allow users to turn off the read sorting procedure. However, care must be taken when using this argument because the read counting result might be misleading if read pairs were not properly sorted.


                            • #15
                              Dear Shi and Alex,

                              Thank you for taking the time to confer directly with the users! I really appreciate you taking the time to look further into this.

                              Have a nice day!
                              Leon
