  • Bowtie truncates ID line if it has spaces

    Dear all,
    Has anyone seen this before? I am using bowtie v0.12.7 to align reads from the Short Read Archive (SRA), which have IDs as follows:

    SRR064286.51418 HWI-EAS418:1:5:1357:1070 length=50

    In the resultant SAM file, when bowtie finds a match the ID is, for some reason, truncated at the first space:

    SRR064286.51418

    However when no match is found the ID is reported in full.

    This seems odd, so I would appreciate someone trying to replicate this for me. Below are a couple of reads and a very short sequence to use as a reference. The first read should match but the other should not. Can someone try and align these using bowtie and let me know what you get.

    Many thanks in advance.

    Reads: Save as test.fq
    @SRR064286.10 HWI-EAS418:1:4:1:147 length=50
    TGGCTTCTTCTGTCTTCATAAGTTTTTCCAGGCGGTCTTCCAAGTCCAAA
    +SRR064286.10 HWI-EAS418:1:4:1:147 length=50
    BCBCCCCCCCCA8::>:?:>8!/@:1&7>6@BCBA@CACCA6>!<BB<BA
    @SRR064286.11 HWI-EAS418:1:4:1:119 length=50
    GGTTGTAGGACAGCATTTCAAGAACTAAACAGAGATGGTTTCGGAACATA
    +SRR064286.11 HWI-EAS418:1:4:1:119 length=50
    BBABA@BAABB:3707::9</!.B>:76:8;B9BAAAB>BBC<!<BCBB?

    Ref: Save as ref.fa and run "bowtie-build ref.fa ref" to make a reference
    >testref
    ATTTCGATGCGAGCTTATTCGAGGCGTATCGTAGCGAGTGCTAGGGCTAT
    TGGCTTCTTCTGTCTTCATAAGTTTTTCCAGGCGGTCTTCCAAGTCCAAA
    GCGGATTGCTGATGCGAGCGTAGTCGTAGTGTGCGTATTGCGATTCGATG

    Run bowtie with "bowtie --sam ref test.fq test.sam" and check out the SAM file test.sam.

    Thanks for your help
    Rich

  • #2
    I posted this in another lengthy thread, to which Xi Wang replied advising the use of the --fullref parameter. I think this only applies to the reference sequence, not the read ID, as it had no effect on my test data: the read that matches still has a truncated ID. If someone can confirm this is happening on their system, I would very much appreciate it.
    Regards,
    Rich

    • #3
      Originally posted by rfrancis
      Dear all,
      Has anyone seen this before? I am using bowtie v0.12.7 to align reads from the short read archive which have IDs as follows:

      SRR064286.51418 HWI-EAS418:1:5:1357:1070 length=50
      Given that FASTA > line or FASTQ @ line, most tools would say the identifier was just SRR064286.51418 and the rest is free-form description text.
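      For illustration only, a minimal Python sketch of that convention (not bowtie's own parsing code; the function name split_header is hypothetical): the identifier is everything after the > or @ marker up to the first whitespace, and the remainder is the description.

```python
from typing import Tuple


def split_header(line: str) -> Tuple[str, str]:
    """Split a FASTA/FASTQ header line into (identifier, description)."""
    body = line.rstrip("\r\n")[1:]   # drop the leading '>' or '@'
    parts = body.split(None, 1)      # split once, on the first run of whitespace
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


if __name__ == "__main__":
    ident, desc = split_header("@SRR064286.51418 HWI-EAS418:1:5:1357:1070 length=50")
    print(ident)  # SRR064286.51418
    print(desc)   # HWI-EAS418:1:5:1357:1070 length=50
```

      Under this rule the truncated SAM name is simply the identifier, while the unmapped records carry identifier plus description.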

      In this case with SRA reads you might want to remove the SRR ID leaving the original Illumina ID of HWI-EAS418:1:4:1:147 on its own.
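      A minimal Python sketch of that renaming, assuming plain four-line FASTQ records whose headers follow the "SRR ID, Illumina ID, length=N" pattern shown above, so that the Illumina ID is the second whitespace-separated token. The function illumina_only and the script name are hypothetical; headers without a second token (including a bare "+") are left unchanged.

```python
import sys
from typing import Iterable, Iterator


def illumina_only(records: Iterable[str]) -> Iterator[str]:
    """Reduce '@' and '+' FASTQ header lines to their second token.

    Assumes four-line FASTQ records; sequence and quality lines pass
    through unchanged, as do headers with fewer than two tokens.
    """
    for i, line in enumerate(records):
        line = line.rstrip("\r\n")
        if line and i % 4 in (0, 2):          # '@' header or '+' separator line
            marker, tokens = line[0], line[1:].split()
            if len(tokens) >= 2:
                line = marker + tokens[1]     # keep only the Illumina ID
        yield line


if __name__ == "__main__":
    # Usage: python illumina_ids.py test.fq test.renamed.fq
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
            for out in illumina_only(src):
                dst.write(out + "\n")
```

      Working on line position rather than on the leading character avoids mistaking a quality line that happens to start with @ or + for a header.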

      • #4
        Thanks maubp. I agree that anything after the first space is description; I'm just concerned that bowtie is not consistently returning either the full ID or a truncated one. I'd rather not have to edit all my reads, so I hope someone knows a solution to this. Unless it's a bug, of course!
        Thanks again for your reply.
        Rich

        • #5
          For anyone following this thread, I've just submitted this as a bug on their SourceForge site (ID: 3496148). There's a similar report there too, so I know it's not just me having this problem!
          Hopefully they can fix this easily.
          Regards,
          Rich

          • #6
            For bowtie version 0.12.7 I confirm what you see:

            1) For non-SAM output, the full ID of the mapped sequence is given.

            2) For SAM output, only a partial ID of the mapped sequence is given while the full ID is given for a non-mapped sequence.

            Don't know if it is a bug or not but it does seem like strange and unexpected behavior.
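            To check the same pattern on a larger SAM file, a rough Python sketch (the helper qname_report is hypothetical) can tally alignment records by mapped/unmapped status and by whether the QNAME still contains a space. It assumes tab-separated SAM records and uses FLAG bit 0x4, which the SAM format defines as "read unmapped".

```python
import sys
from collections import Counter
from typing import Iterable


def qname_report(sam_lines: Iterable[str]) -> Counter:
    """Tally SAM alignment records by (is_mapped, qname_has_space)."""
    counts: Counter = Counter()
    for line in sam_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("@"):   # skip header lines and blanks
            continue
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        is_mapped = not (flag & 0x4)           # FLAG bit 0x4 = read unmapped
        counts[(is_mapped, " " in qname)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python qname_report.py test.sam
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fh:
            for (is_mapped, has_space), n in sorted(qname_report(fh).items()):
                status = "mapped" if is_mapped else "unmapped"
                form = "QNAME contains space" if has_space else "QNAME has no space"
                print(f"{status}\t{form}\t{n}")
```

            With v0.12.7 output, the behavior described above would show up as mapped records only in the "no space" bucket and unmapped records in the "contains space" bucket.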

            • #7
              bowtie version 0.11.3 does not have the problem. See output below:

              Code:
              @HD	VN:1.0	SO:unsorted
              @SQ	SN:testref	LN:150
              @PG	ID=Bowtie	VN=0.11.3	CL="bowtie --fullref --sam ref test.fq test.sam"
              SRR064286.10 HWI-EAS418:1:4:1:147 length=50	0	testref	51	255	50M	*	0	0	TGGCTTCTTCTGTCTTCATAAGTTTTTCCAGGCGGTCTTCCAAGTCCAAA	BCBCCCCCCCCA8::>:?:>8!/@:1&7>6@BCBA@CACCA6>!<BB<BA	XA:i:0	MD:Z:50	NM:i:0
              SRR064286.11 HWI-EAS418:1:4:1:119 length=50	4	*	0	0	*	*	0	0	GGTTGTAGGACAGCATTTCAAGAACTAAACAGAGATGGTTTCGGAACATA	BBABA@BAABB:3707::9</!.B>:76:8;B9BAA
              Compare the above to the output from bowtie version 0.12.7 below:

              Code:
              @HD	VN:1.0	SO:unsorted
              @SQ	SN:testref	LN:150
              @PG	ID:Bowtie	VN:0.12.7	CL:"bowtie --fullref --sam ref test.fq test.sam"
              SRR064286.10	0	testref	51	255	50M	*	0	0	TGGCTTCTTCTGTCTTCATAAGTTTTTCCAGGCGGTCTTCCAAGTCCAAA	BCBCCCCCCCCA8::>:?:>8!/@:1&7>6@BCBA@CACCA6>!<BB<BA	XA:i:0	MD:Z:50	NM:i:0
              SRR064286.11 HWI-EAS418:1:4:1:119 length=50	4	*	0	0	*	*	0	0	GGTTGTAGGACAGCATTTCAAGAACTAAACAGAGATGGTTTCGGAACATA	BBABA@BAABB:3707::9</!.B>:76:8;B9BAA
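              Until the behavior changes, one possible workaround, sketched in Python under the assumption that you only need IDs up to the first space, is to post-process the SAM file so every QNAME is truncated the same way. Header lines pass through untouched; the function normalize_qnames and the script name are hypothetical. As far as I know, current versions of the SAM specification do not allow spaces in QNAME at all, so the truncated form is the safer one to standardize on.

```python
import sys
from typing import Iterable, Iterator


def normalize_qnames(sam_lines: Iterable[str]) -> Iterator[str]:
    """Truncate each SAM record's QNAME at its first space.

    Header lines (starting with '@') and blank lines pass through unchanged.
    """
    for line in sam_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("@"):
            yield line
            continue
        qname, sep, rest = line.partition("\t")   # QNAME is the first tab field
        yield qname.split(" ", 1)[0] + sep + rest


if __name__ == "__main__":
    # Usage: python normalize_qnames.py test.sam test.fixed.sam
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
            for out in normalize_qnames(src):
                dst.write(out + "\n")
```

              Only the first tab-delimited field is touched, so the remaining columns and optional tags are copied verbatim.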

              • #8
                Originally posted by westerman
                For bowtie version 0.12.7 I confirm what you see:

                1) For non-SAM output, the full ID of the mapped sequence is given.

                2) For SAM output, only a partial ID of the mapped sequence is given while the full ID is given for a non-mapped sequence.

                Don't know if it is a bug or not but it does seem like strange and unexpected behavior.
                Don't know if it helps, but I found that tab characters within the read IDs also truncate the non-SAM output, while spaces don't elicit this behavior (the question, of course, is why would someone put tabs into a read ID...?)
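                A plausible explanation is that bowtie's default output is itself tab-delimited, so a tab inside a read name splits it across columns; that is an assumption, not something confirmed in this thread. Either way, a minimal Python sketch for replacing tabs in FASTQ header lines before alignment, assuming four-line FASTQ records (the helper replace_header_tabs and the script name are hypothetical):

```python
import sys
from typing import Iterable, Iterator


def replace_header_tabs(fastq_lines: Iterable[str], replacement: str = " ") -> Iterator[str]:
    """Replace tab characters in FASTQ '@' and '+' header lines.

    Assumes four-line FASTQ records, so headers sit on lines 1 and 3 of
    each record; sequence and quality lines pass through unchanged.
    """
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\r\n")
        if i % 4 in (0, 2):
            line = line.replace("\t", replacement)
        yield line


if __name__ == "__main__":
    # Usage: python replace_header_tabs.py reads.fq reads.notabs.fq
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
            for out in replace_header_tabs(src):
                dst.write(out + "\n")
```

                Replacing tabs with spaces makes such reads behave like the space-containing IDs discussed above; pass replacement="_" instead if you want the whole original name kept as a single token.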
