  • Confused about samtools view

    This might be a silly question, but I'm confused about the output of samtools view. When I execute:

    samtools view 1_BirA_mm10.bam | head

    The output is this:

    HWI-M01495:10:000000000-A5BGC:1:2107:12480:27069 337 1 30503670150M chrY 20418997 0 TCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTCCTTCTTCTTCTTCTTCTTCTTC HAGC/CHGFFHGHFHGCAGA0G/CG0FG/HAHHGCHFHGGHHFC?GFHGGFGEEAHHFCGGGCGCHFHHHEHGFFHFHFHHG1FHHGCHGFFFFHHHG3HHGAFHHEEGGCFADHGHG2HGAHFDGCABGCF4BA4FGFBFFFFCAAA@A AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:150YT:Z:UU NH:i:20 CC:Z:= CP:i:174010380 HI:i:3

    As I understand it, this is not the SAM format. The SAM format should look like it does here:



    Since samtools view can convert BAM to SAM, I'm trying to understand why the output is not in SAM format. Can someone explain the obvious thing I am apparently missing?

  • #2
    This is indeed strange output. It looks like all the fields are there, but in the wrong order and with strange values. How did you produce this BAM file, and which mapper did you use? Which version of samtools do you have?
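
When judging whether values in a record are "strange", it can help to decode the FLAG column first. The following is a minimal sketch, assuming the bit meanings listed in the SAM specification; the helper name `decode_flag` and the label strings are made up for illustration and are not part of samtools. It is applied to the two FLAG values that appear in this thread (337 and 99).

```python
# Bit meanings of the SAM FLAG column, following the SAM specification.
# The label strings are informal descriptions chosen for this sketch.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # 337 and 99 are the FLAG values of the two records quoted in the thread.
    for flag in (337, 99):
        print(flag, decode_flag(flag))
```

Under these assumptions, 337 decodes to a paired, reverse-strand, first-in-pair, secondary alignment, and 99 to a paired, properly paired, first-in-pair read whose mate is on the reverse strand.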


    • #3
      I used TopHat2, invoked like this:

      tophat --num-threads 12 --library-type fr-unstranded -o ./tophat_trimmed_mm10/"${prefix%_*}" /home/kat/Data/Reference_sequences/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome ./BIP_post_trim/Paired/Forward/"${b}" ./BIP_post_trim/Paired/Reverse/"${b2}"

      Samtools version: 0.1.19-96b5f2294a


      • #4
        And were no errors printed at any point?
        Did you build the bowtie2 index yourself? (From the directory structure, I would assume it's an iGenomes download.) Maybe it would help to rebuild it.
        Do all entries actually look like that?
        Does the sample dataset from tophat work for you?


        • #5
          I can't see any errors in the terminal output. The terminal output is identical for all my biological replicates.

          Yes, the reference file is from iGenomes. I didn't think it was necessary to build your own bowtie2 index if your reference is something as common as mouse.

          Yes, all entries in the file look like that.

          I ran the tophat test data quite some time ago. Let me see if I still have the output somewhere. Otherwise, I'll do it again.

          4_Csde1_R1_paired.fq.gz and 4_Csde1_R2_paired.fq.gz match!

          [2014-06-27 02:33:13] Beginning TopHat run (v2.0.9)
          -----------------------------------------------
          [2014-06-27 02:33:13] Checking for Bowtie
          Bowtie version: 2.1.0.0
          [2014-06-27 02:33:13] Checking for Samtools
          Samtools version: 0.1.19.0
          [2014-06-27 02:33:13] Checking for Bowtie index files (genome)..
          [2014-06-27 02:33:13] Checking for reference FASTA file
          [2014-06-27 02:33:13] Generating SAM header for /home/Data/Reference_sequences/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/genome
          format: fastq
          quality scale: phred33 (default)
          [2014-06-27 02:33:14] Preparing reads
          left reads: min. length=50, max. length=150, 2067444 kept reads (43 discarded)
          right reads: min. length=50, max. length=150, 2067265 kept reads (222 discarded)
          [2014-06-27 02:34:48] Mapping left_kept_reads to genome genome with Bowtie2
          [2014-06-27 02:40:24] Mapping left_kept_reads_seg1 to genome genome with Bowtie2 (1/6)
          [2014-06-27 02:41:11] Mapping left_kept_reads_seg2 to genome genome with Bowtie2 (2/6)
          [2014-06-27 02:42:03] Mapping left_kept_reads_seg3 to genome genome with Bowtie2 (3/6)
          [2014-06-27 02:42:48] Mapping left_kept_reads_seg4 to genome genome with Bowtie2 (4/6)
          [2014-06-27 02:43:29] Mapping left_kept_reads_seg5 to genome genome with Bowtie2 (5/6)
          [2014-06-27 02:44:07] Mapping left_kept_reads_seg6 to genome genome with Bowtie2 (6/6)
          [2014-06-27 02:44:33] Mapping right_kept_reads to genome genome with Bowtie2
          [2014-06-27 02:50:00] Mapping right_kept_reads_seg1 to genome genome with Bowtie2 (1/6)
          [2014-06-27 02:50:45] Mapping right_kept_reads_seg2 to genome genome with Bowtie2 (2/6)
          [2014-06-27 02:51:42] Mapping right_kept_reads_seg3 to genome genome with Bowtie2 (3/6)
          [2014-06-27 02:52:31] Mapping right_kept_reads_seg4 to genome genome with Bowtie2 (4/6)
          [2014-06-27 02:53:13] Mapping right_kept_reads_seg5 to genome genome with Bowtie2 (5/6)
          [2014-06-27 02:53:49] Mapping right_kept_reads_seg6 to genome genome with Bowtie2 (6/6)
          [2014-06-27 02:54:12] Searching for junctions via segment mapping
          [2014-06-27 02:57:15] Retrieving sequences for splices
          [2014-06-27 02:58:33] Indexing splices
          [2014-06-27 02:58:48] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/6)
          [2014-06-27 02:59:03] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/6)
          [2014-06-27 02:59:20] Mapping left_kept_reads_seg3 to genome segment_juncs with Bowtie2 (3/6)
          [2014-06-27 02:59:36] Mapping left_kept_reads_seg4 to genome segment_juncs with Bowtie2 (4/6)
          [2014-06-27 02:59:52] Mapping left_kept_reads_seg5 to genome segment_juncs with Bowtie2 (5/6)
          [2014-06-27 03:00:08] Mapping left_kept_reads_seg6 to genome segment_juncs with Bowtie2 (6/6)
          [2014-06-27 03:00:22] Joining segment hits
          [2014-06-27 03:02:20] Mapping right_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/6)
          [2014-06-27 03:02:36] Mapping right_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/6)
          [2014-06-27 03:02:52] Mapping right_kept_reads_seg3 to genome segment_juncs with Bowtie2 (3/6)
          [2014-06-27 03:03:08] Mapping right_kept_reads_seg4 to genome segment_juncs with Bowtie2 (4/6)
          [2014-06-27 03:03:23] Mapping right_kept_reads_seg5 to genome segment_juncs with Bowtie2 (5/6)
          [2014-06-27 03:03:38] Mapping right_kept_reads_seg6 to genome segment_juncs with Bowtie2 (6/6)
          [2014-06-27 03:03:52] Joining segment hits
          [2014-06-27 03:05:52] Reporting output tracks
          -----------------------------------------------
          [2014-06-27 02:33:13] A summary of the alignment counts can be found in ./tophat_trimmed_mm10/4_Csde1/align_summary.txt
          [2014-06-27 02:33:13] Run complete: 00:49:30 elapsed
          Last edited by Rivalyn; 06-27-2014, 08:08 AM.


          • #6
            I found the output from the tophat test data. All of the reads look like this:

            test_mRNA_3_187_51 99 test_chromosome 53 50 75M = 163 185 TACTATTTGACTAGACTGGAGGCGCTTGCGACTGAGCTAGGACGTGCCACTACGGGGATGACGACTCGGACTACG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-12 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:6C59A8 YT:Z:UU NH:i:1
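
For comparison, the test record above can be split into the eleven mandatory SAM columns. This is a rough sketch only: real SAM is tab-delimited, but the forum rendering shows spaces, so the code splits on any whitespace (which would break on optional fields containing spaces). The function name `parse_sam_line` is hypothetical, and the quality string is written as 75 `I` characters on the assumption that it matches the 75M CIGAR.

```python
# Names of the eleven mandatory SAM columns, in order (per the SAM specification).
MANDATORY = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# The TopHat test record quoted in the thread. QUAL is assumed to be 75 'I's.
EXAMPLE = " ".join([
    "test_mRNA_3_187_51", "99", "test_chromosome", "53", "50", "75M",
    "=", "163", "185",
    "TACTATTTGACTAGACTGGAGGCGCTTGCGACTGAGCTAGGACGTGCCACTACGGGGATGACGACTCGGACTACG",
    "I" * 75,
    "AS:i:-12", "XN:i:0", "XM:i:2", "XO:i:0", "XG:i:0",
    "NM:i:2", "MD:Z:6C59A8", "YT:Z:UU", "NH:i:1",
])


def parse_sam_line(line):
    """Split one alignment line into named mandatory fields plus optional tags."""
    # Assumption: whitespace splitting is safe here; real SAM should be split on tabs.
    parts = line.split()
    if len(parts) < len(MANDATORY):
        raise ValueError("expected at least 11 fields, got %d" % len(parts))
    record = dict(zip(MANDATORY, parts))
    # These five columns are integers in the SAM specification.
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["TAGS"] = parts[len(MANDATORY):]
    return record


if __name__ == "__main__":
    rec = parse_sam_line(EXAMPLE)
    for key in MANDATORY:
        print(key, rec[key])
    print("TAGS", rec["TAGS"])
```

A record whose columns have been fused or reordered would either raise the `ValueError` or fail the integer conversions, which gives a quick way to check whether every line in a file is shaped like this one.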


            • #7
              Well, the test data and your run-log look perfectly fine...

              However, your versions of tophat and bowtie are quite outdated: the current releases are 2.0.12 and 2.2.3. It doesn't make much sense to me how this could have resulted in output like the one you showed, but it's worth trying to update them...

              I always build the indices myself, as only then can you be sure that no strange version issues occur.

              Btw: Have you tried a different mapper?
