
  • Get antisense mapped reads

    Hi,

    I'm looking for some workflow to obtain the reads which are mapped to the antisense features (gff) of a reference.
    I started using htseq-count (which is a great tool). It takes my SAM and GFF files, and by running it with the option -s no and then -s yes I can obtain the number of antisense reads (the unstranded count minus the stranded count); but I need the reads themselves (SAM or FASTA). So, is there a way to start from the SAM output of the unstranded HTSeq run, remove the subset of reads found in the stranded SAM output, and get a final SAM file with the antisense pool? Maybe there are better ways to do this.

    Thanks
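The subtraction described in this post can be sketched with awk, assuming both htseq-count runs were written out with "-o" so that each record ends in an XF:Z: tag. The function name `antisense_reads` and the file names are hypothetical. Only `no_feature` and `not_aligned` appear in this thread; the other special categories, and the optional leading double underscore, are assumptions about other HTSeq versions.

```shell
# Hypothetical helper: antisense_reads STRANDED.sam UNSTRANDED.sam
# Prints records that are assigned to a feature in the unstranded run
# but not in the stranded run, i.e. candidate antisense reads.
antisense_reads() {
  awk -F'\t' '
    # Return the XF tag value of the current record, or "" if there is none.
    function xf(   i) {
      for (i = 12; i <= NF; i++)
        if ($i ~ /^XF:Z:/) return substr($i, 6)
      return ""
    }
    # True when the XF value names a feature rather than a special category
    # (category list beyond no_feature/not_aligned is an assumption).
    function assigned(v) {
      return v != "" && v !~ /^(__)?(no_feature|ambiguous|not_aligned|too_low_aQual|alignment_not_unique)/
    }
    /^@/ { next }                                   # skip SAM header lines
    { key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 }      # name, flag, reference, position
    FILENAME == ARGV[1] { if (assigned(xf())) sense[key] = 1; next }
    assigned(xf()) && !(key in sense)               # print unstranded-only records
  ' "$1" "$2"
}

# Usage (file names are placeholders):
#   antisense_reads stranded_out.sam unstranded_out.sam > antisense.sam
```

Records are matched on read name, flag, reference and position, so the two SAM outputs must come from the same input alignments.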

  • #2
    Have you tried just swapping the strands in the GFF file and then running htseq-count on it (with "-s yes")? If you need the actual reads, then just use the "-o" option (possibly followed by grepping the output).
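A minimal sketch of the strand swap, assuming a standard tab-separated GFF with the strand in column 7. The function name `swap_gff_strands` and the file names in the usage comment are placeholders, not anything from the thread.

```shell
# Hypothetical helper: flip "+" and "-" in the strand column (column 7) of a GFF.
swap_gff_strands() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/      { print; next }             # keep comment and directive lines unchanged
    $7 == "+" { $7 = "-"; print; next }
    $7 == "-" { $7 = "+"; print; next }
    { print }                             # "." or "?" strands are left alone
  ' "$@"
}

# Usage (file names are placeholders):
#   swap_gff_strands genes.gff > genes.swapped.gff
#   htseq-count -s yes -o antisense_out.sam reads.sam genes.swapped.gff
```

The htseq-count documentation also lists a "-s reverse" mode, which may make the swap unnecessary if your installed version supports it; check `htseq-count --help` first.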



    • #3
      Unfortunately, when I try this (getting the SAM output with option "-s yes" using two strand-swapped GFF files), the SAM outputs are identical, even though the output counts differ (so the GFF files are correctly swapped). The SAM file records all the input reads (sense and antisense) that are located within the GFF features. Any more ideas?
      Thanks



      • #4
        In other words: is there a way to generate a FASTA file from a SAM/BAM alignment, taking into account whether a read overlaps a GFF feature? The output SAM from HTSeq looks like this:

        Code:
        EAS1745:7:1:13167:1095/1	0	scaffold_665	35984	255	28M	*	0	0	NNNNNNNNNNNNNNNNN	IIIIIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:28	NM:i:0	XF:Z:no_feature
        EAS1745:7:1:9468:1170/1	0	scaffold_313	108426	255	27M	*	0	0	NNNNNNNNNNNNNNNNNN	IIIIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:27	NM:i:0	XF:Z:Feature#69
        So I'm only interested in keeping the reads with a feature flag (and ruling out the no_feature/not_aligned reads). Thanks



        • #5
          You can filter for "Feature" over "no_feature" and "not_aligned" with a simple grep. For generating a fasta file, I suppose that depends on if you want each entry in the multifasta file to be a read or a covered region. If you want an entry per read, you could probably just pipe the output of your grep command to awk or sed (or perl or something else if you prefer). Otherwise, you'd need to do a pileup and process that (it's likely someone has already written a script to do that if you do a bit of searching). You could also N-mask non-covered regions in a similar manner, if that's useful for you.
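The grep filter suggested here could look roughly like the following. It assumes the SAM file came from htseq-count's "-o" option, so every alignment record carries an XF:Z: tag. The function name `keep_assigned` is hypothetical, and the categories other than `no_feature` and `not_aligned` (plus the optional leading underscores) are assumptions about other HTSeq versions.

```shell
# Hypothetical helper: keep only records whose XF tag names a feature.
keep_assigned() {
  # First grep keeps records that carry an XF tag at all (this drops header lines);
  # second grep removes the special, non-feature categories.
  grep -h 'XF:Z:' "$@" |
    grep -v -E 'XF:Z:(__)?(no_feature|ambiguous|not_aligned|too_low_aQual|alignment_not_unique)'
}

# Usage (file names are placeholders):
#   keep_assigned htseq_out.sam > assigned_reads.sam
```

Matching on the tag text rather than on a fixed column keeps the filter working when the aligner emits a different number of optional fields.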



          • #6
            Thanks for the advice.
            I'm fine with the reads in a FASTA file. Though I'm not an expert in this, the following seems to work:
            | awk '{OFS="\t"; if($15 != "XF:Z:no_feature") print ">"$1"\n"$10}' - > file.fasta
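The one-liner above relies on the XF tag being in column 15, which holds only when the aligner writes exactly three optional fields before it, and it still lets `not_aligned` records through. A slightly more defensive sketch is given below, with two assumptions of its own: the function name `sam_to_fasta` is hypothetical, and it restores the original read orientation for reverse-strand alignments (SAM flag 0x10 means SEQ is stored reverse-complemented), which may or may not be what you want for an antisense analysis.

```shell
# Hypothetical helper: write feature-assigned reads from htseq-count "-o" output as FASTA.
sam_to_fasta() {
  awk -F'\t' '
    # Reverse-complement a sequence; characters other than A, C, G, T (e.g. N) pass through.
    function revcomp(s,   i, c, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        out = out ((c in comp) ? comp[c] : c)
      }
      return out
    }
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A" }
    /^@/ { next }                                   # skip SAM header lines
    {
      tag = ""
      for (i = 12; i <= NF; i++)                    # look for XF in any optional column
        if ($i ~ /^XF:Z:/) tag = substr($i, 6)
      # Skip records without a feature (category list is partly an assumption).
      if (tag == "" || tag ~ /^(__)?(no_feature|ambiguous|not_aligned|too_low_aQual|alignment_not_unique)/) next
      # Bit 0x10 set: SEQ is reverse-complemented in the SAM record, so flip it back.
      seq = (int($2 / 16) % 2 == 1) ? revcomp($10) : $10
      print ">" $1 "\n" seq
    }
  ' "$@"
}

# Usage (file names are placeholders):
#   sam_to_fasta htseq_out.sam > file.fasta
```

If you would rather keep sequences as they appear in the SAM file (reference orientation), replace the `seq = ...` line with `seq = $10`.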
