  • #46
    Hi Bruce,

    I would suggest you trying to count reads instead of fragments to see if you will still get a large number of reads overlapping with multiple genes. This will help determine if the large number of multi-overlapping fragments you observed were due to summarization.

    If you still get a large number, then that will mean it is either a mapping problem, or a problem with your data generation, or a lot of genes in your annotation overlapping with each other.

    You can simply remove those paired-end parameters, such as -p -P -D and -C, from your command to summarize reads instead of fragments.

    Best wishes,

    Wei

    Comment


    • #47
      Hi Wei,

      I had looked at these before, but am keen to keep the counts to fragments as I believe there is more accuracy in this method. I have run for the initial trimmo sam, and the one with all non-pairs removed:

      Code:
      ::::::::::::::
      featco.pair.read.diags
      ::::::::::::::
      35364121 ACCEPTED_GENE
      4944952 MULTI_MAPPING
      6848219 NOTFOUND_GENE
       338020 OVERLAPPED_GENES
      ::::::::::::::
      featco.read.diags
      ::::::::::::::
      35711285 ACCEPTED_GENE
      4944952 MULTI_MAPPING
      6902210 NOTFOUND_GENE
       340271 OVERLAPPED_GENES

      Comment


      • #48
        Hi Bruce,

        I agree it is better to count fragments for paired-end data. Looking at read counts was just meant to help diagnose the problem, and it turned out to be quite helpful.

        The percentage of multi-overlapping reads is much smaller than that of multi-overlapping fragments, suggesting that something went wrong when read pairs were being summarized. We have never seen this before for the summarization of mapping results from Subread and a few other aligners.

        One possibility is that in your mapping results the order of the two reads from the same pair was altered, i.e., the second read appeared before the first read. If this is the case, the read pair might be wrongly assigned. You may check a few multi-overlapping fragments to see if this is the case.

        Alternatively, you may try other aligners as well. Subread is guaranteed to work with featureCounts.

        Hope this helps.

        Best wishes,

        Wei

        Comment


        • #49
          Dear Wei,

          I have been trying to run featureCounts, both from R and from the command line, but I keep getting a segmentation fault. I checked the different things suggested in this discussion thread and also tried to allocate more memory to the job and to the R session, but it didn't help. I use subread 1.3.6 and here is my command line:

          featureCounts -p -P -d 50 -D 600 -a mm10/annotation/mm10.allmrna.gtf -t exon -g gene_id -b -f -i tophat_out/accepted_hits.sort.bam -o subread_counts.txt

          Here is the error message:

          /var/spool/gridengine/node-hp0211/job_scripts/1095023: line 10: 57569 Segmentation fault (core dumped)


          Have I overlooked anything?

          Thanks in advance,
          Cho
          Last edited by choseqid; 10-02-2013, 06:36 AM.

          Comment


          • #50
            Dear Cho,

            Could you provide the complete output of your featureCounts run? It is hard to figure out what went wrong from the information you have provided so far.

            Also could you provide the first 100 lines of your annotation file and also the first 100 reads in your BAM file?

            Cheers,
            Wei

            Comment


            • #51
              Dear Wei,

              Thanks for the quick reply. Attached are the files you asked for. I am including the R output, as I do not have any from the command line other than the error message I already quoted.

              Cheers,
              Cho
              Attached Files
              Last edited by choseqid; 10-03-2013, 03:01 AM.

              Comment


              • #52
                "featureCounts requires that for paired-end read data both ends must be included in the SAM/BAM file and the two reads from the same pair must be next to each other."

                If this is not stated in the User Guide (I did not see it there) then it should be added as it is essential for correct functioning of the program.
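One way to check this requirement before running featureCounts is to scan the alignments and confirm that each record is immediately followed by its mate. The following is only a minimal sketch: it assumes a SAM text stream with exactly one record per mate (no secondary alignments) and identical read names for both mates, and the function name `check_mate_adjacency` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count adjacent mate pairs and orphan records in a SAM stream.
# Assumption: both mates carry the same read name in column 1 and each
# mate has exactly one alignment record.
check_mate_adjacency() {
  awk -F '\t' '
    /^@/ { next }                                    # skip SAM header lines
    {
      # Current record has the same name as the pending one: a valid pair.
      if (have && $1 == prev) { pairs++; have = 0; next }
      # Pending record was not followed by its mate: count it as an orphan.
      if (have) orphans++
      prev = $1; have = 1                            # current record is now pending
    }
    END {
      if (have) orphans++                            # last record had no mate
      printf "pairs=%d orphans=%d\n", pairs, orphans
    }
  '
}
```

For a BAM file the function could be fed with `samtools view -h tophat_out/accepted_hits.sort.bam | check_mate_adjacency`. A non-zero orphan count would suggest that the file is not name-sorted, or that some reads were reported without their mate.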

                Comment


                • #53
                  Thanks, ddb, for the reminder. It looks like the problem lies in the way I aligned the reads with tophat: I allowed multiple hits (which apparently hampers the sorting by name) and didn't disable the separate alignment reporting for unpairable reads (i.e., didn't use --no-mixed). Would fixing these two parameters help?

                  Comment


                  • #54
                    I'm still not sure whether the handling of paired-end reads caused the problem. You can try changing those parameters to see if it works. But you may also try to count your reads as single-end reads by NOT using the '-p' option. This will tell us whether the problem arose from dealing with the paired-end reads. Your command should look like this:

                    featureCounts -a mm10/annotation/mm10.allmrna.gtf -t exon -g gene_id -b -f -i tophat_out/accepted_hits.sort.bam -o subread_counts.txt

                    Wei

                    Comment


                    • #55
                      Dear Wei,

                      I tried that command line, but it still ends in a segmentation fault. I also tried aligning my reads using Subread (which succeeded), but when I ran featureCounts on the resulting SAM file I also got a segmentation fault. The output is the same as I attached to a previous post.

                      Any more ideas?

                      Comment


                      • #56
                        Hi, quick question. I was wondering if there was a way to get featureCounts to work on a Windows 7 OS. Going through R and Bioconductor would be perfect, but it looks like Rsubread does not have a Windows version? Is there any other way?

                        Comment


                        • #57
                          Originally posted by choseqid
                          Dear Wei,

                          I tried that command line, but it still ends in a segmentation fault. I also tried aligning my reads using Subread (which succeeded), but when I ran featureCounts on the resulting SAM file I also got a segmentation fault. The output is the same as I attached to a previous post.

                          Any more ideas?
                          Hi,

                          Thank you for trying these options. We found featureCounts always works nicely with Subread, so the segmentation fault is likely to be due to some unexpected data in the annotation. We have also received some other bug reports similar to this recently. The 1.3.x version of featureCounts allows up to 60 features overlapping with each other in the annotation. If the number of such features exceeds this limit, we found the program crashes. Although this is rare, it may happen, and we suspect this might be the cause of the seg fault seen with your data.
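To get a rough idea of whether an annotation could run into such a limit, one can estimate the largest number of features that cover the same position. The sketch below uses a sweep over start and end coordinates. It is an approximation under stated assumptions: it counts overlaps per chromosome across both strands, and it is not known whether featureCounts 1.3.x counted overlapping features in exactly this way. The function name `max_feature_overlap` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: maximum number of GTF features of one type covering a single position.
# $1: GTF file; $2: feature type in column 3 (defaults to "exon").
max_feature_overlap() {
  awk -F '\t' -v type="${2:-exon}" '
    !/^#/ && $3 == type {
      printf "%s\t%d\t%d\n", $1, $4, 1         # feature opens at its start
      printf "%s\t%d\t%d\n", $1, $5 + 1, -1    # feature closes just after its end
    }
  ' "$1" |
  # Sort events by chromosome, then position; at equal positions the closing
  # events (-1) come first so that adjacent features do not count as overlapping.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n |
  awk -F '\t' '
    $1 != chr { chr = $1; depth = 0 }           # new chromosome: reset depth
    { depth += $3; if (depth > max) max = depth }
    END { print max + 0 }
  '
}
```

With the annotation from this thread it would be called as `max_feature_overlap mm10/annotation/mm10.allmrna.gtf exon`. A result well above 60 would be consistent with the explanation above, although it would not prove it.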

                          We have removed this limit in the latest version 1.4.0 and hopefully this will solve the problem.

                          Also, if the reads in your BAM file were sorted by chromosomal location, you should include the '-S' option in your command. Not doing so will not crash the program, but will result in incorrect read counts.
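If it is unclear how a file was sorted, the `SO:` tag on the `@HD` header line is one place to look. A small sketch follows, assuming the aligner or sorting tool actually wrote and updated that header (it is optional in the SAM format, so "unknown" does not prove anything). The function name `sam_sort_order` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the sort order declared in a SAM header read from stdin.
# Prints the value of the SO: tag (e.g. coordinate, queryname, unsorted),
# or "unknown" when no @HD line with an SO: tag is present.
sam_sort_order() {
  awk -F '\t' '
    /^@HD/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
      exit                                   # @HD can only be the first line
    }
    !/^@/ { exit }                           # reached alignment records
    END { if (!found) print "unknown" }
  '
}
```

For a BAM file this could be run as `samtools view -H tophat_out/accepted_hits.sort.bam | sam_sort_order`. If it prints `coordinate`, the advice above about '-S' applies.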

                          Let me know if the problem persists.

                          Wei

                          Comment


                          • #58
                            Originally posted by adaigle
                            Hi, quick question. I was wondering if there was a way to get featureCounts to work on a Windows 7 OS. Going through R and Bioconductor would be perfect, but it looks like Rsubread does not have a Windows version? Is there any other way?
                            You are correct that Rsubread does not have a Windows version. It is pretty hard to develop a Windows version of this package because most of the code was written in C. I think we might eventually come up with a Windows version, but it will take a fair bit of time. If you have access to a Unix machine, you can fairly easily use featureCounts via the Bioconductor package Rsubread.

                            Wei

                            Comment


                            • #59
                              Hi,
                              I would like to use featureCounts, but miss the stats provided by htseq-count (copied below) as these let me make sure I got the 'strand' setting right and other things.
                              Any chance you could add similar output to featureCounts (either as a separate 'stats.txt' file or as part of the main table)?

                              no_feature 20123817
                              ambiguous 9026940
                              too_low_aQual 0
                              not_aligned 0
                              alignment_not_unique 3034042

                              Thanks
                              -Ben

                              Comment


                              • #60
                                Ben, I had the same issue, so I put together a command to get this info. It requires you to produce the 'reads' output using the -R flag. Replace the two file names with your own:

                                cut -f 2 featco.counts.reads | sort | uniq -c > featco.counts.diags

                                Output looks like:

                                154266 ACCEPTED_2VOTE_GENE
                                23169444 ACCEPTED_GENE
                                40066 MULTI_MAPPING
                                4470627 NOTFOUND_GENE
                                100013 OVERLAPPED_GENES
                                2850 PAIR_DISTANCE

                                Hope that helps, Bruce.
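To compare runs more easily, the `uniq -c` output can also be turned into percentages of all reads or fragments. This is a minimal sketch that only assumes the "count label" layout shown above; the function name `diag_percentages` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert "count label" lines (as written by uniq -c) into
# tab-separated "label, count, percentage of total" lines.
diag_percentages() {
  awk '
    { count[$2] = $1; order[++n] = $2; total += $1 }   # keep input order
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%d\t%.2f%%\n", order[i], count[order[i]], 100 * count[order[i]] / total
    }
  '
}
```

Usage would be `diag_percentages < featco.counts.diags`, which prints one line per assignment category.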

                                Comment
