Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Locating individual reads in alignment

    I have used 100 nt paired-end sequences to construct a reduced representation reference genome of the organism I am working with. I aligned the reads back to the reference genome. I hope to find SNPs at some point. I have a list of individual reads (with the paired read) which I would like to inspect in the alignment. Is there a way to find out what position in the reference genome these reads are aligned to? I can visualize the aligned reads in IGV and there I can zoom in to a position to inspect a region. But I cannot search for a particular read - I need to know the map position of the read first. Is there a programme of script that could extract the position (and maybe other infrmation) of an individual read from a sam/bam file?
    Last edited by Tectona; 01-16-2012, 03:33 AM. Reason: typo

  • #2
    Presumably you have the SAM file that was output from the aligner. You can look for the location of a read in it using grep: grep -m 1 -w SomeReadName.123455 Aligned.sam

    That'll be easy enough provided you only have a few reads you want to look at,

    Comment


    • #3
      Tablet lets you search for reads by name, the regular expression support is also very handy: http://bioinf.hutton.ac.uk/tablet/

      Comment


      • #4
        Originally posted by dpryan View Post
        Presumably you have the SAM file that was output from the aligner. You can look for the location of a read in it using grep: grep -m 1 -w SomeReadName.123455 Aligned.sam

        That'll be easy enough provided you only have a few reads you want to look at,

        Thanks for the suggestion. However, when I use the command grep -m 1 Sequence_read_tag alignment_file.sam > output.txt

        I get the following information:
        FCB020AACXX:6:1305:20474:84915#ATGAACCT 163 369552-8 1 60 100M = 61 160 CTTGCAAAGGAAAATCTTGAGATGAACGAGGGCGACATTAGCAAGGAGGCCATCGGAGGCACCGACGGTACCACCGTCGATGGAGAGGATGCGAACCCAT bbbeeeeeggggfiiiiiiiihgifhihffhiiiiihiihiihfghfhiihggggeeecccccccccc]acccccc_acacccccccccccccccccccc XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100

        I recognize the name of my sequence, the sequence itself, the quality information... but where do I find the positional information?

        Comment


        • #5
          Originally posted by maubp View Post
          Tablet lets you search for reads by name, the regular expression support is also very handy: http://bioinf.hutton.ac.uk/tablet/
          Thanks for the suggestion. Tablet looks like a nice programme. And indeed, it lets you search for the name of an individual sequence. However, is there a way to display more than a single set of sequence reads? As far as I have been able to explore the programme, I do not see a way to import more than one set of reads aligned to a reference at a time... I would like to compare two or more sets of reads aligned to the same reference sequence.

          Comment


          • #6
            Originally posted by Tectona View Post
            Thanks for the suggestion. However, when I use the command grep -m 1 Sequence_read_tag alignment_file.sam > output.txt

            I get the following information:
            FCB020AACXX:6:1305:20474:84915#ATGAACCT 163 369552-8 1 60 100M = 61 160 CTTGCAAAGGAAAATCTTGAGATGAACGAGGGCGACATTAGCAAGGAGGCCATCGGAGGCACCGACGGTACCACCGTCGATGGAGAGGATGCGAACCCAT bbbeeeeeggggfiiiiiiiihgifhihffhiiiiihiihiihfghfhiihggggeeecccccccccc]acccccc_acacccccccccccccccccccc XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100

            I recognize the name of my sequence, the sequence itself, the quality information... but where do I find the positional information?
            It's the 4th field, so position 1 in this case. If you just want the contig and position displayed then you can use something like the following:
            Code:
            grep -m 1 Sequence_read_tag alignment_file.sam | awk '{ print $3":"$4 }' > output.txt
            The output will then be "contig: position" (without the space, for some reason the forum turns colon p into an emoticon).

            Comment


            • #7
              OK, thanks, now I got it figured out...
              Last edited by Tectona; 01-17-2012, 01:41 AM. Reason: Too early reaction

              Comment


              • #8
                That's position according to the reference contig it's aligned against. You may want to browse the SAM specification. The read you showed maps to the start of a contig.

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Advancing Precision Medicine for Rare Diseases in Children
                  by seqadmin




                  Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
                  12-16-2024, 07:57 AM
                • seqadmin
                  Recent Advances in Sequencing Technologies
                  by seqadmin



                  Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

                  Long-Read Sequencing
                  Long-read sequencing has seen remarkable advancements,...
                  12-02-2024, 01:49 PM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 12-17-2024, 10:28 AM
                0 responses
                33 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-13-2024, 08:24 AM
                0 responses
                49 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-12-2024, 07:41 AM
                0 responses
                34 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-11-2024, 07:45 AM
                0 responses
                46 views
                0 likes
                Last Post seqadmin  
                Working...
                X