
  • #31
    dexseq_count.py keeps yelling at me: "claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)" I sorted the BAM files with "samtools sort" and then converted them to SAM with "samtools view", but I still get these messages!

    Comment


    • #32
      Have you checked your free space? Out of quota, maybe? Or maybe 'sort' puts its temporary files in /tmp (see option -T); at least on our big server, /tmp is on a small partition that is always full. Also, use '-S 100G' to tell 'sort' that you have lots of memory. BTW, why '-s -c'? We don't need a stable sort.

      Comment


      • #33
        'samtools sort' sorts by position by default. Use 'samtools sort -n' to sort by name.
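If you want to check whether a SAM file really has both mates of each pair on consecutive lines (which is what name sorting buys you), a rough Python sketch follows. It is only an illustration working on SAM text lines, not what dexseq_count.py does internally; the function name `mates_adjacent` is made up, and reads that are unmapped, have an unmapped mate, or are secondary/supplementary are simply ignored.

```python
def mates_adjacent(sam_lines):
    """Return True if every mapped pair appears as two consecutive records."""
    # SAM FLAG bits: 0x4 read unmapped, 0x8 mate unmapped,
    # 0x100 secondary alignment, 0x800 supplementary alignment.
    skip = 0x4 | 0x8 | 0x100 | 0x800
    pending = None  # QNAME of a read still waiting for its mate
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x1 or flag & skip:
            continue  # unpaired read, or a record we do not expect a mate for
        if pending is None:
            pending = qname
        elif pending == qname:
            pending = None  # mate found right after its partner
        else:
            return False  # a different read came before the mate showed up
    return pending is None
```

A position-sorted file will usually fail this check as soon as two pairs overlap.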

        Comment


        • #34
          I have the same questions as in the post quoted below.

          Can anyone answer them?

          I would really appreciate it.

          Originally posted by glados
          Dear all.

          I'm trying to find information about how HTSeq counts reads. I understood that one pair (properly paired) is counted as 1 count.
          What about pairs that are not flagged as 'properly paired'?
          What about the reads that lost their mate and became single reads?
          Are they counted as 1 count as well? Or not counted at all?

          Additionally, I'm losing quite a lot of reads that have multiple mappings. Has anyone figured out a way to deal with these in HTSeq, instead of just throwing them all out?

          Comment


          • #35
            A read that lost its mate will be counted once and a warning will be produced if the unmapped mate isn't actually in the file (tophat does this). I don't recall htseq-count caring about the properly paired flag.

            Regarding multimappers, you don't know with certainty where they align, so for the downstream analyses that htseq-count is designed for, the proper solution is to discard them.
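As a rough illustration of that counting logic (this is not htseq-count's actual code): group alignments by read name so that a pair or a lone read contributes one count, and set aside anything the aligner marked as aligning more than once. `NH` is the optional SAM tag giving the number of reported alignments for a read; the `(qname, nh)` tuple layout and the function name are assumptions made for this sketch.

```python
def count_fragments(alignments):
    """Count fragments from (qname, nh) tuples.

    nh is the value of the NH tag for that alignment record.
    Returns (uniquely_aligned_fragments, discarded_multimappers).
    """
    hits = {}
    for qname, nh in alignments:
        # Keep the largest NH seen for this read name.
        hits[qname] = max(nh, hits.get(qname, 1))
    unique = sum(1 for nh in hits.values() if nh == 1)
    multi = len(hits) - unique
    return unique, multi


records = [
    ("pairA", 1), ("pairA", 1),                 # both mates of one pair
    ("single", 1),                              # read that lost its mate
    ("multi", 3), ("multi", 3), ("multi", 3),   # multimapper
]
print(count_fragments(records))
```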
            Last edited by dpryan; 09-09-2013, 08:11 AM.

            Comment


            • #36
              This task is related to bamtofastq conversion, in that collation by name is necessary but a full sort is not.

              Collation is often fast, while sorting is very slow. I don't know if there are dedicated collation tools out there (but I'd be surprised if there aren't).
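To make the difference concrete, here is a minimal Python sketch of collation: a single pass with a hash table that pairs each read with its mate as soon as the second one shows up, with no global ordering. It is an illustration of the idea only, not how any particular tool implements it; the `(qname, payload)` record layout and the function name are assumptions.

```python
def collate(records):
    """Yield (read, mate) tuples from (qname, payload) records in any order.

    Memory use is bounded by the number of reads still waiting for a mate,
    which stays small when mates sit close together in the input.
    """
    waiting = {}
    for qname, payload in records:
        if qname in waiting:
            yield waiting.pop(qname), payload   # mate found, emit the pair
        else:
            waiting[qname] = payload            # first of the pair, park it
    for payload in waiting.values():
        yield payload, None                     # mate never appeared


records = [("r1", "a"), ("r2", "b"), ("r1", "c"), ("r3", "d"), ("r2", "e")]
print(list(collate(records)))
```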

              Comment


              • #37
                Originally posted by jkbonfield
                This task is related to bamtofastq conversion, in that collation by name is necessary but a full sort is not.

                Collation is often fast, while sorting is very slow. I don't know if there are dedicated collation tools out there (but I'd be surprised if there aren't).
                Try bamcollate2 in biobambam (https://github.com/gt1/biobambam).

                Comment


                • #38
                  Dear All,

                  Why not sort the BAM files instead of the SAM files? SAM takes too much space. After running tophat2, I am thinking of something like this:

                  $samtools sort -n accepted_hits.bam output.bam

                  then count using htseq-count.

                  Thoughts?

                  Comment


                  • #39
                    There's no need to name-sort anymore; htseq-count can handle coordinate-sorted BAM files.

                    Comment


                    • #40
                      Originally posted by Gonza
                      Dear All,

                      Why not sort the BAM files instead of the SAM files? SAM takes too much space. After running tophat2, I am thinking of something like this:

                      $samtools sort -n accepted_hits.bam output.bam

                      then count using htseq-count.

                      Thoughts?
                      You will of course get output.bam.bam. Oh how I hate thee Samtools!
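The reason is that the older samtools releases (0.1.x) treat the second argument as an output prefix and append ".bam" themselves. If you script this, a small Python helper can strip the suffix before building the command. This is just a sketch: the function name is made up, and it only builds the argument list without running anything.

```python
def legacy_sort_argv(in_bam, out_bam, by_name=True):
    """Build argv for the old 'samtools sort in.bam out_prefix' syntax."""
    # Old samtools appends '.bam' to the prefix, so drop it from the name.
    prefix = out_bam[:-4] if out_bam.endswith(".bam") else out_bam
    argv = ["samtools", "sort"]
    if by_name:
        argv.append("-n")
    argv += [in_bam, prefix]
    return argv


print(" ".join(legacy_sort_argv("accepted_hits.bam", "output.bam")))
```

Later samtools releases take the output file name through an explicit -o option instead, so check the usage message of the version you have installed.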

                      Comment


                      • #41
                        Dear all,

                        Originally posted by dpryan
                        There's no need to name-sort anymore; htseq-count can handle coordinate-sorted BAM files.
                        Looking at HTSeq version 0.6.0, the help text says:

                        Code:
                         -r ORDER, --order=ORDER
                                                'pos' or 'name'. Sorting order of <alignment_file>
                                                (default: name). Paired-end sequencing data must be
                                                sorted either by position or by read name, and the
                                                sorting order must be specified. Ignored for single-
                                                end data.
                        Please forgive this very naive question: is a position-sorted BAM the same as a coordinate-sorted BAM? I'm guessing the answer is yes, but confirmation would be reassuring.

                        Many thanks

                        Comment


                        • #42
                          Yes, position sorted is another name for coordinate sorted.
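The SAM header records this as SO:coordinate on the @HD line (name-sorted files carry SO:queryname), provided the tool that sorted the file actually updated the header. A small Python sketch that maps the header value to what htseq-count's -r option expects; the function name is hypothetical and the input is assumed to be the header lines as plain text.

```python
def htseq_order(sam_header_lines):
    """Return 'pos', 'name', or None based on the @HD SO: field."""
    mapping = {"coordinate": "pos", "queryname": "name"}
    for line in sam_header_lines:
        if line.startswith("@HD"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    # 'unsorted' or 'unknown' map to None
                    return mapping.get(field[3:])
    return None  # no @HD line or no SO: field


header = ["@HD\tVN:1.0\tSO:coordinate", "@SQ\tSN:chr1\tLN:1000"]
print(htseq_order(header))
```

If this returns None, the header does not say how the file is sorted and you have to know the order yourself.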

                          Comment
