  • #31
    dexseq_count.py keeps warning me: "claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)" I sorted the BAM files with "samtools sort" and then converted them to SAM with "samtools view", but I still get these messages!

    Comment


    • #32
      Have you checked your free space? Out of quota, maybe? Or maybe 'sort' puts its temporary files in /tmp (see option -T); at least on our big server, /tmp is on a small partition that is always full. Also, use '-S 100G' to tell 'sort' that you have lots of memory. BTW, why '-s -c'? We don't need a stable sort.

      Comment


      • #33
        'samtools sort' sorts by position. Use 'samtools sort -n' to sort by name.
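To illustrate why the sort order matters for paired-end data, here is a minimal sketch on toy records. The read names and positions are made up, and plain lexicographic ordering stands in for whatever name ordering samtools actually applies; the only point is that mates end up next to each other after a name sort but not after a position sort.

```python
from operator import itemgetter

# Toy alignment records: (read_name, chromosome, position). Hypothetical data.
records = [
    ("readB", "chr1", 100),
    ("readA", "chr1", 150),
    ("readB", "chr1", 300),
    ("readA", "chr1", 420),
]

# Position order (like plain 'samtools sort'): mates are interleaved.
by_position = sorted(records, key=itemgetter(1, 2))

# Name order (like 'samtools sort -n'): mates become adjacent.
by_name = sorted(records, key=itemgetter(0))


def mates_adjacent(recs):
    """Return True if every read name occupies one contiguous run."""
    seen = set()
    previous = None
    for name, _, _ in recs:
        if name != previous:
            if name in seen:
                return False
            seen.add(name)
        previous = name
    return True


print(mates_adjacent(by_position))
print(mates_adjacent(by_name))
```

A tool that expects each mate to follow its partner directly would fail on the first ordering, which is consistent with the "aligned mate which could not be found" warning above.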

        Comment


        • #34
          I have the same questions as in the post quoted below.

          Can anyone answer them?

          Much appreciated.

          Originally posted by glados View Post
          Dear all.

          I'm trying to find information about how HTSeq counts reads. I understood that one pair (properly paired) is counted as 1 count.
          What about pairs that are not flagged as 'properly paired'?
          What about the reads that lost their mate and became single reads?
          Are they counted as 1 count as well? Or not counted at all?

          Additionally, I'm losing quite a lot of reads that have multiple mappings. Has anyone figured out a way to deal with this in HTSeq, instead of just throwing them all out?

          Comment


          • #35
            A read that lost its mate will be counted once and a warning will be produced if the unmapped mate isn't actually in the file (tophat does this). I don't recall htseq-count caring about the properly paired flag.

            Regarding multimappers, you don't know with certainty where they align, so the proper solution for downstream analyses toward which htseq-count is oriented would be to discard them.
            Last edited by dpryan; 09-09-2013, 08:11 AM.
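The counting rules described in this post can be sketched roughly as follows. This is not htseq-count's code, just a toy model under stated assumptions: each record is a hypothetical (read_name, gene, hits) tuple, where hits stands in for the number of reported alignments (the NH tag that tophat writes), and the overlap with a gene is assumed to be already resolved.

```python
from collections import defaultdict

# Toy records: (read_name, gene, number_of_reported_alignments). Hypothetical data.
records = [
    ("pair1", "geneA", 1),
    ("pair1", "geneA", 1),    # mate of the previous record
    ("single1", "geneA", 1),  # read whose mate is missing
    ("multi1", "geneB", 3),   # multimapper
    ("multi1", "geneB", 3),
]


def count_fragments(recs):
    """Count each read name once per gene, skipping multimapped reads."""
    gene_by_name = {}
    for name, gene, hits in recs:
        if hits > 1:
            continue  # location is ambiguous, so the read is discarded
        gene_by_name.setdefault(name, gene)
    counts = defaultdict(int)
    for gene in gene_by_name.values():
        counts[gene] += 1
    return dict(counts)


counts = count_fragments(records)
print(counts)
```

Here the pair and the singleton each contribute one count to geneA, and the multimapper contributes nothing.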

            Comment


            • #36
              This task is related to bamtofastq conversion in that collation by name is necessary, but not necessarily a full sort.

              Collation is often fast, while sorting is very slow. I don't know if there are dedicated collation tools out there (but I'd be surprised if there aren't).
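A minimal sketch of what collation means here, as opposed to a full sort: stream through the records once, park each read in a hash table until its mate shows up, and emit the pair as soon as it is complete. The (read_name, payload) record layout is hypothetical and this is not how any particular tool implements it.

```python
def collate_pairs(records):
    """Yield mate pairs from records that arrive in any order.

    Each record is a (read_name, payload) tuple. Reads whose mate never
    appears are yielded at the end as (record, None).
    """
    waiting = {}
    for record in records:
        name = record[0]
        mate = waiting.pop(name, None)
        if mate is None:
            waiting[name] = record  # first mate seen, park it
        else:
            yield mate, record      # second mate seen, pair is complete
    for leftover in waiting.values():
        yield leftover, None


stream = [("r1", "a"), ("r2", "b"), ("r1", "c"), ("r3", "d"), ("r2", "e")]
pairs = list(collate_pairs(stream))
print(pairs)
```

This is a single pass with one dictionary lookup per record, whereas a full name sort has to order every record relative to every other. The cost is memory: reads whose mates are far apart in the file stay parked until the mate arrives.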

              Comment


              • #37
                Originally posted by jkbonfield View Post
                This task is related to bamtofastq conversion in that collation by name is necessary, but not necessarily a full sort.

                Collation is often fast, while sorting is very slow. I don't know if there are dedicated collation tools out there (but I'd be surprised if there aren't).
                Try bamcollate2 in biobambam (https://github.com/gt1/biobambam).

                Comment


                • #38
                  Dear All,

                  Why not sort the BAM files instead of the SAM files? SAM takes too much space. After running tophat2, I am thinking of something like this:

                  $samtools sort -n accepted_hits.bam output.bam

                  then count using htseq-count.

                  Thoughts????

                  Comment


                  • #39
                    There's no need to name sort anymore; htseq-count can handle coordinate-sorted BAM files.

                    Comment


                    • #40
                      Originally posted by Gonza View Post
                      Dear All,

                      Why not sort the BAM files instead of the SAM files? SAM takes too much space. After running tophat2, I am thinking of something like this:

                      $samtools sort -n accepted_hits.bam output.bam

                      then count using htseq-count.

                      Thoughts????
                      You will of course get output.bam.bam. Oh how I hate thee Samtools!

                      Comment


                      • #41
                        Dear all,

                        Originally posted by dpryan View Post
                        There's no need to name sort anymore, htseq-count can handle coordinate sorted BAM files.
                        Looking at htseq version 0.6.0, the help text says:

                        Code:
                         -r ORDER, --order=ORDER
                                                'pos' or 'name'. Sorting order of <alignment_file>
                                                (default: name). Paired-end sequencing data must be
                                                sorted either by position or by read name, and the
                                                sorting order must be specified. Ignored for single-
                                                end data.
                        Please forgive this very naive question: does a position-sorted BAM = a coordinate-sorted BAM? I'm guessing yes, but confirmation would be reassuring.

                        Many thanks

                        Comment


                        • #42
                          Yes, position sorted is another name for coordinate sorted.
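For reference, a small sketch of how the call might be assembled for a coordinate-sorted BAM, using the -r option from the help text quoted above together with htseq-count's -f option for the input format. The file names accepted_hits.bam and genes.gtf are placeholders, and the helper function is hypothetical; it only builds the command line and does not run anything.

```python
import shlex


def htseq_count_command(bam_path, gtf_path, order="pos"):
    """Build an htseq-count command line for a BAM file.

    order must be 'pos' (position/coordinate sorted) or 'name'.
    """
    if order not in ("pos", "name"):
        raise ValueError("order must be 'pos' or 'name'")
    return ["htseq-count", "-f", "bam", "-r", order, bam_path, gtf_path]


cmd = htseq_count_command("accepted_hits.bam", "genes.gtf")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run, with standard output redirected to a counts file, if htseq-count is installed.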

                          Comment
