  • Sam to Bam issue after bowtie2

    Dear all,
    I'm trying to generate a BAM file from the SAM output of Bowtie2.
    However, I get the following error message:

    HTML Code:
    expected '@XY', got [@HWI-ST610:134:C089FACXX:2:2308:274
    5:149689 1:N:0:TGACCA%0ATAAAGATTAGGCAAGTTTGTGCTAATTTATAACTGTTTTTGAATGGAGGTGTAACA
    ATACAAATACAAGTTTTGTGATAGATTATC%0A+%0AHHBFHFHHHIIHBCCA@F<CCGGE?D?BF@FDAEDF>GBGHBB
    BCCHCH5@.@8=D@CCEHHE>CHD?DCDBCAA;A((5;@;A>@%0A]
    Hint: The header tags must be tab-separated.
    [samopen] no @SQ lines in the header.
    [sam_read1] missing header? Abort!
    My script is the following:

    HTML Code:
    /usr/local/bin/samtools view -bS /home/jeremie/pdacidLT/reference/8.1_r1.sam > /home/jeremie/pdacidLT/reference/8.1_r1.bam
    And my SAM file looks fine:


    HTML Code:
    @HD	VN:1.0	SO:unsorted
    @SQ	SN:Locus_1685_Transcript_1/2_Confidence_1.000_Length_7457_transcripts_v2_1|spectrin	LN:7457
    [...the list of all my contigs ~70000]
    @PG	ID:bowtie2	PN:bowtie2	VN:2.0.5
    @HWI-ST610:134:C089FACXX:2:2208:17424:162295 1:N:0:ACAGTG%0AGCGAGGGGACACATCGAAACATAATCCTGGCTTGATCTTCTGCGGGAAGAGGATGGAGACATTTTGATGGCAA
    CGAATTCACGAAT%0A+%0AJJJJJJIJIIJJJJJJJJJJIIGFHJGEIGGJIJGIIGIHHHFHFDBB?=AB<?C?AACBCACDED?:@>CCDBB@8??CA>ABBB%0A
    @HWI-ST610:134:C089FACXX:2:2208:17424:162295 2:N:0:ACAGTG%0AGCGGGCTAGGTTTGTACAGCAGTCCAAATCGTCGTCTCCCAGGCAGCTGAGAACCACAGAAAGTGCGTTGCCA
    AACAAACCAGACA%0A+%0AGGHJGIJJJJ8@FHHJJJGIIHGHHHEFFFFDDCDB?CDDCDDDDDDDDDDCDDB@BBDDDDAACDD?@DDDDDDDD?BB@ABDDD%0A
    HWI-ST610:134:C089FACXX:2:2208:17409:162308	83	transcripts_v2_656|kruppel-like	1108	255	86M	=	1074	-120TTACCAAGTATGGTATTGCCAGTGTCAACTGTGAGCACAGCAACATTACCCACTGGTACACAATCAGTACATTCATGATTATGTAA	CA;@CCECDFFDFHHHHHIHGGIHCIJJJJJJJJIIIJJIJJIIG
    GHHIJIHGGHHEIIGIJJIJIIGHJIGIJJJJJJIJJJJJJ	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:86	YS:i:0	YT:Z:CP
    @HWI-ST610:134:C089FACXX:2:2208:17409:162308 1:N:0:ACAGTG%0ATTACATAATCATGAATGTACTGATTGTGTACCAGTGGGTAATGTTGCTGTGCTCACAGTTGACACTGGCAATA
    CCATACTTGGTAA%0A+%0AJJJJJJIJJJJJJIGIJHGIIJIJJIGIIEHHGGHIJIHHGGIIJJIJJIIIJJJJJJJJICHIGGHIHHHHHFDFFDCECC@;AC%0A
    HWI-ST610:134:C089FACXX:2:2208:17409:162308	163	transcripts_v2_656|kruppel-like	1074	255	86M	=	1108	120	CACACAGCAGCCTATGCAAATTACAAGTAACTCTTTACCAAGTATGGTATTGCCAGTGTCAACTGTGAGCACAGCAACATTACCCA	JJJJJJJIJJJJJJJJJJJJJJIJJIJJJJJJJJJJJJJIJJFHI
    JJFFHIIHIJJGHIJIGJHHEHFHFFFFFDEEECD@CDDDD	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:86	YS:i:0	YT:Z:CP
    @HWI-ST610:134:C089FACXX:2:2208:17409:162308 2:N:0:ACAGTG%0ACACACAGCAGCCTATGCAAATTACAAGTAACTCTTTACCAAGTATGGTATTGCCAGTGTCAACTGTGAGCACA
    GCAACATTACCCA%0A+%0AJJJJJJJIJJJJJJJJJJJJJJIJJIJJJJJJJJJJJJJIJJFHIJJFFHIIHIJJGHIJIGJHHEHFHFFFFFDEEECD@CDDDD%0A
    I wonder if the problem comes from the @ located just before the start of my read IDs.

    Does anyone know how I can remove this @ from my SAM file, or have another idea?

    Jeremie
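
One way to test that suspicion before editing anything is to count the lines that begin with @ but do not look like SAM header lines (an @, a two-letter record type, then a tab). The following is only a sketch: the miniature input file and the variable names `sam` and `suspicious` are made up for illustration, and the real file would be 8.1_r1.sam.

```shell
# Hypothetical miniature SAM file standing in for 8.1_r1.sam.
sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$sam"
printf '@SQ\tSN:contig1\tLN:100\n' >> "$sam"
printf '@PG\tID:bowtie2\tPN:bowtie2\n' >> "$sam"
printf '@HWI-1 1:N:0:ACAGTG\n' >> "$sam"
printf 'HWI-1\t83\tcontig1\t1\t255\t4M\t=\t1\t0\tACGT\tJJJJ\n' >> "$sam"

# Count lines that begin with '@' but whose first tab-separated field
# is not '@' plus a two-letter record type (the shape of a SAM header line).
suspicious=$(awk -F'\t' '/^@/ && $1 !~ /^@[A-Z][A-Z]$/ { n++ } END { print n + 0 }' "$sam")
echo "suspicious @ lines: $suspicious"
```

A non-zero count suggests samtools is reading those lines as malformed header lines. If the flagged lines also have fewer than the 11 tab-separated fields that a SAM alignment record requires, removing the @ alone may not be enough to satisfy samtools.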

  • #2
    Hi Jeremie,

    It seems you are right. One way to fix it is to remove the @ in front of each HWI:...

    1. Extract your header:
    Code:
    head -n 3 file.sam > header.sam
    2. Remove every leading @:
    Code:
    sed 's/^@//' file.sam > file.sam.txt
    3. Your .sam file now looks like file.sam.txt. Delete its header (it is now corrupted):
    Code:
    sed '1,3d' file.sam.txt > yournewfile.sam
    4. Add the original header back:
    Code:
    cat header.sam yournewfile.sam > your_final_file.sam
    Be careful: I assume that your header is 3 lines long. If it is not, you have to adjust the commands in steps 1 and 3 yourself.

    Once everything has worked, you can remove the temporary files (file.sam.txt and yournewfile.sam).
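
The header in the original post lists roughly 70000 contigs, so a fixed count of 3 would not fit that file. As a rough sketch of the same four steps, the header length can be measured instead of hard-coded. This assumes the header is one contiguous block at the top of the file; the miniature input and the variables `workdir`, `src` and `n` are hypothetical.

```shell
# Hypothetical miniature input; replace "$src" with the real file.sam.
workdir=$(mktemp -d)
src="$workdir/file.sam"
printf '@HD\tVN:1.0\n@SQ\tSN:contig1\tLN:100\n@PG\tID:bowtie2\n' > "$src"
printf '@HWI-1 1:N:0:ACAGTG\nHWI-2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tJJJJ\n' >> "$src"

# Header length = number of leading lines whose first tab-separated
# field is '@' plus two capital letters.
n=$(awk -F'\t' '$1 !~ /^@[A-Z][A-Z]$/ { print NR - 1; found = 1; exit }
                END { if (!found) print NR }' "$src")

# Step 1: keep the original header.
head -n "$n" "$src" > "$workdir/header.sam"
# Steps 2 and 3: strip the leading '@' from everything after the header.
tail -n +"$((n + 1))" "$src" | sed 's/^@//' > "$workdir/yournewfile.sam"
# Step 4: put the header back in front.
cat "$workdir/header.sam" "$workdir/yournewfile.sam" > "$workdir/your_final_file.sam"
```

Because the @ is only stripped below the header, the header is never corrupted and no separate deletion step is needed.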

    I hope this helps,

    Rémi



    • #3
      Bonjour Rémi ;-),
      thanks a lot!
      I will try this ASAP.

      Jeremie



      • #4
        Assuming that all your reads start with "HWI", you could simply do:

        Code:
        sed 's/^@HWI/HWI/g' yourSamFile.sam > correctedSamFile.sam
        or (if not all of them start with HWI) you could first grep the header (here assuming it only contains HD, SQ and PG tags) and then append the corrected rest:

        Code:
        grep -E '^@HD|^@SQ|^@PG' yourSamFile.sam > correctedSamFile.sam
        grep -Ev '^@HD|^@SQ|^@PG' yourSamFile.sam | sed 's/^@//g'  >> correctedSamFile.sam
        Last edited by WhatsOEver; 08-05-2014, 06:40 AM.
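
The two grep passes above can probably be folded into a single sed pass by negating the header pattern. This is only a sketch under the same assumption (the header contains only HD, SQ and PG tags); the miniature input and the variables `src` and `out` are hypothetical, and `-E` is assumed to be supported by the local sed (it is in GNU and BSD sed).

```shell
# Hypothetical miniature input standing in for yourSamFile.sam.
src=$(mktemp)
out=$(mktemp)
printf '@HD\tVN:1.0\n@SQ\tSN:contig1\tLN:100\n@PG\tID:bowtie2\n' > "$src"
printf '@HWI-1 1:N:0:ACAGTG\nHWI-2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tJJJJ\n' >> "$src"

# Leave lines starting with @HD, @SQ or @PG untouched;
# strip a leading '@' from every other line.
sed -E '/^@(HD|SQ|PG)/!s/^@//' "$src" > "$out"
```

Unlike the two-pass version, this keeps every line in its original position rather than moving all header lines to the top. As with the grep patterns, a read whose name happens to begin with @HD, @SQ or @PG would be left unchanged.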



        • #5
          Yes, that is the better way... I don't know why I forgot it.

          Why make it simple when you can make it complicated?
