
  • STAR Output Sam Files

    Hello

    I am using STAR2.3.0e for alignments.
    Are the output sam files (Aligned.out.sam) already sorted and can be used directly with htseq-count?
    Or do I need to convert them into bam, sort the bam, and use the sorted bam for htseq-count?

    Any help is appreciated.

    Thanks.
    M

  • #2
    No, they aren't. You need to run something like

    Code:
    samtools view -bS yourStarOutput.sam | samtools sort - yourSortedStarOutput
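
The one-liner above uses the legacy samtools 0.1.x syntax, in which the last argument of `samtools sort` is an output prefix. As a rough sketch for readers on samtools 1.3 or later (an assumption about your installation), `samtools sort` reads SAM directly and takes the output file through `-o`, with `-n` switching from coordinate order to read-name order. The helper names `run` and `sort_sam` and the `DRY_RUN` variable are made up for illustration; they are not part of samtools.

```shell
# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# sort_sam IN.sam OUT.bam [name|pos]
# Hypothetical wrapper: sorts by coordinate unless "name" is requested.
sort_sam() {
  in_sam=$1
  out_bam=$2
  order=${3:-pos}
  if [ "$order" = "name" ]; then
    # -n sorts by read name instead of by coordinate
    run samtools sort -n -o "$out_bam" "$in_sam"
  else
    run samtools sort -o "$out_bam" "$in_sam"
  fi
}
```

For example, `DRY_RUN=1 sort_sam yourStarOutput.sam yourSortedStarOutput.bam` only prints the command it would run; drop `DRY_RUN=1` to execute it.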



    • #3
      Hi

      Thanks for the reply.
      However, if I run:

      Code:
      samtools view -bS sample1.sam | samtools sort - sample1_sorted
      I get the error message:
      Code:
      [bam_header_read] EOF marker is absent. The input is probably truncated.
      [samopen] SAM header is present: 1870 sequences.
      The conversion still works, though.

      On the other hand, if I just convert the sam file to bam file using:
      Code:
      samtools view -bS sample1.sam > sample1.bam
      And then sort it using:
      Code:
      samtools sort sample1.bam sample1_sorted.bam
      Then everything appears to work fine, and there is no error message.
      I have checked my sam files and they have headers, and seem to be fine.

      Is this a bug with samtools or am I missing something?

      Thanks.
      M



      • #4
        The end-of-file (EOF) marker just tells samtools that the file ends at this point. It can be missing or unrecognizable for various reasons (e.g. the file was moved between Linux and Windows, or the marker was deleted). If you know that your file isn't truncated, you can ignore this warning (it's not an error!). The conversion is not affected by it.
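
If you want to check a BAM file on disk yourself, here is a minimal sketch. It relies on the BAM format ending with a fixed 28-byte empty BGZF block, which is the marker samtools looks for. The function name `has_bgzf_eof` and the variable `BGZF_EOF_HEX` are made up for illustration, and the check only makes sense for a regular file, not for data arriving through a pipe.

```shell
# The 28-byte empty BGZF block that terminates a BAM file, written as hex.
BGZF_EOF_HEX=$(printf '%s' "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000" | tr -d ' ')

# has_bgzf_eof FILE.bam
# Succeeds (exit status 0) if the last 28 bytes of FILE.bam are the EOF block.
has_bgzf_eof() {
  # Dump the final 28 bytes as hex and strip od's whitespace
  tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \t\n')
  [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}
```

Usage would look like `has_bgzf_eof sample1_sorted.bam && echo "EOF marker present"`. A file that fails this check may really be truncated, so it is worth comparing read counts before ignoring the warning.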



        • #5
          But then why don't I get the error if I run the commands separately?



          • #6
            I think the sort command is what throws the warning. You don't get it when you split the command because view writes to a new file, which then ends with a proper EOF marker.
            You could try to set an EOF manually in your original file (Ctrl+D under Linux, no idea under Windows) and see if that prevents the warning.



            • #7
              See #11 in this post for Peter's solution: http://seqanswers.com/forums/showthread.php?t=19228



              • #8
                Some versions of samtools incorrectly show the EOF message when you pipe a file into them (at least that's the case with the sort command). What's supposed to happen is that it keeps track of how it's reading the file and then doesn't check for the EOF if you use a pipe (since you can't seek to the end), but at some point that code got changed. I think this works correctly in the current (1.0) version.



                • #9
                  Originally posted by mtiwaridros
                  Hello

                  I am using STAR2.3.0e for alignments.
                  Are the output sam files (Aligned.out.sam) already sorted and can be used directly with htseq-count?
                  Or do I need to convert them into bam, sort the bam, and use the sorted bam for htseq-count?

                  Any help is appreciated.

                  Thanks.
                  M
                  STAR .sam files do not need to be sorted for htseq-count. You can simply convert them to .bam and feed the .bam to htseq-count.
                  I would strongly recommend switching to a newer version of STAR from: https://github.com/alexdobin/STAR/releases
                  Then you can get the BAM file, or - if you need - coordinate-sorted BAM directly from STAR:
                  --outSAMtype BAM Unsorted
                  OR
                  --outSAMtype BAM SortedByCoordinate
                  OR BOTH
                  --outSAMtype BAM Unsorted SortedByCoordinate

                  Cheers
                  Alex
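
To make the suggestion above concrete, here is a hedged sketch of running a newer STAR with `--outSAMtype BAM Unsorted` and passing the result straight to htseq-count. It assumes single-end reads in one FASTQ file, an existing STAR genome index, and an htseq-count build that can read BAM. The wrapper names `run` and `align_and_count`, the `DRY_RUN` variable, and all file names are placeholders, not part of either tool.

```shell
# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# align_and_count GENOME_DIR READS.fastq GENES.gtf PREFIX
align_and_count() {
  genome_dir=$1
  reads=$2
  gtf=$3
  prefix=$4
  # With "BAM Unsorted", STAR writes <PREFIX>Aligned.out.bam
  run STAR --genomeDir "$genome_dir" --readFilesIn "$reads" \
    --outSAMtype BAM Unsorted --outFileNamePrefix "$prefix"
  # -f bam: input format; -r name: htseq-count's default ordering mode
  run htseq-count -f bam -r name "${prefix}Aligned.out.bam" "$gtf"
}
```

`DRY_RUN=1 align_and_count star_index reads.fastq genes.gtf sample1_` prints the two commands without running them. If you ask STAR for `SortedByCoordinate` instead, the file is named `<PREFIX>Aligned.sortedByCoord.out.bam`, and for paired-end data htseq-count would need `-r pos`.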



                  • #10
                    Thank you all for the replies. The issue is clear now.

                    Best
                    M
