Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting features above a certain coverage

    Hi,

    I have a bam file of all my accepted hits (output from tophat run on data from strand specific libraries/HiSeq) and an gtf file with my genes of interest for which I am trying to find potential antisense transcripts. I would like to create a list - preferably one that can be visualized in a genome browser - that shows all genes for which there are antisense reads in the accepted hits.bam file provided that there are more than a certain number of reads mapping to that gene. I can easily use bedtools intersect to get all genes that have antisense reads mapping to them, but this gives me way too many hits. I have also exported my resulting hits to excel to try to sort the file there (bioinformatics newbie that I am) but I can't get it back to a format that works for visualizing in a genome browser.

    Thank you.

  • #2
    Hi again,

    I have come up with a possible solution, but if anyone had some input to give I would very much appreciate it. If I did bedtools coverage to get the antisense coverage for each of my genes, and then sorted the resulting file on the coverage value? And then extracted everything in the first x lines? Could that work?

    I am unsure how to do the sorting and extracting. "sort" and "grep"? Any suggestions on the exact "phrasing"?

    Thank you!

    Comment


    • #3
      Once you have a coverage file, you can sort based on the result with "sort". the -n flag will sort numbers numerically (eg: 10<20<100 instead of 10<100<11<20) and the -kX flag will allow you to choose which key/column you wish to do the actual sort on. -k 4 would choose the 4th. -r would allow you to reverse the output to get the largest numbers first instead of last.

      I would suggest "head -X <filename>" to extract the first X lines, although i imagine grep could also do it relatively easily.

      Comment


      • #4
        This worked very well! Thank you!

        Comment


        • #5
          Originally posted by annavilborg View Post
          Hi,
          ... provided that there are more than a certain number of reads mapping to that gene...
          Thank you.
          Hi annavilborg,
          you can additionally pipe your sorted output to awk to get only those lines (genes) with read count above your threshold. Figure out which column (field) stores the read counts and then e.g. for field 4 and Threshold 100 apply something like:
          Code:
          bedtools coverage ... | awk '{if ($4>100) print $0}' | sort -nr -k 4
          $4 is the 4th field
          100 the read count threshold
          print $0 prints the content of the whole line
          sort -nr -k 4 will then numerically, reverse sort all these gene lines with readcount above threshold based on the corresponding column (4).

          Comment


          • #6
            Neat! Thank you!

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Advancing Precision Medicine for Rare Diseases in Children
              by seqadmin




              Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
              12-16-2024, 07:57 AM
            • seqadmin
              Recent Advances in Sequencing Technologies
              by seqadmin



              Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

              Long-Read Sequencing
              Long-read sequencing has seen remarkable advancements,...
              12-02-2024, 01:49 PM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 12-17-2024, 10:28 AM
            0 responses
            33 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 12-13-2024, 08:24 AM
            0 responses
            48 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 12-12-2024, 07:41 AM
            0 responses
            34 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 12-11-2024, 07:45 AM
            0 responses
            46 views
            0 likes
            Last Post seqadmin  
            Working...
            X