
**dpryan** · 09-04-2012, 09:54 AM

Have you tried just swapping the strands in the GFF file and then running htseq-count on it (with "-s yes")? If you need the actual reads, just use the "-o" option (possibly followed by grepping the output).
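A minimal sketch of the strand swap, assuming a standard tab-separated GFF/GTF where column 7 holds the strand. The function name `swap_gff_strand` and the file names in the comments are hypothetical, not something from this thread.

```shell
# Hypothetical helper: flip "+" and "-" in the strand column (column 7)
# of a tab-separated GFF/GTF read from a file argument or from stdin.
swap_gff_strand() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }                      # comment/header lines pass through
         NF >= 7 && $7 == "+" { $7 = "-"; print; next }
         NF >= 7 && $7 == "-" { $7 = "+"; print; next }
         { print }                                 # "." (unstranded) lines are left alone
    ' "$@"
}

# Example usage (file names are placeholders):
#   swap_gff_strand genes.gff > genes.swapped.gff
#   htseq-count -s yes -o antisense.sam aln.sam genes.swapped.gff
```

With the swapped annotation, reads counted under "-s yes" should be the ones antisense to the original features.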

**cascoamarillo** · 10-15-2012, 01:50 PM

Unfortunately, when I try this (getting the SAM output with option "-s yes" using two strand-swapped GFF files), the SAM outputs are identical, even though the output counts differ, so the GFF files are correctly swapped. The SAM file records all the input SAM reads (sense and antisense) that are located within the GFF features. Any more ideas?
Thanks

**cascoamarillo** · 10-16-2012, 03:27 PM

In other words, is there a way to generate a fasta file from a sam/bam alignment, taking into account whether a read overlaps a GFF feature? I mean, the output sam from HTSeq looks like this:

Code:

EAS1745:7:1:13167:1095/1	0	scaffold_665	35984	255	28M	*	0	0	NNNNNNNNNNNNNNNNN	IIIIIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:28	NM:i:0	XF:Z:no_feature
EAS1745:7:1:9468:1170/1	0	scaffold_313	108426	255	27M	*	0	0	NNNNNNNNNNNNNNNNNN	IIIIIIIIIIIIIIIIIIIIIIIIIII	XA:i:0	MD:Z:27	NM:i:0	XF:Z:Feature#69

So I'm only interested in taking the reads with a feature flag (and ruling out the no_feature/not_aligned reads). Thanks

**dpryan** · 10-17-2012, 12:29 AM

You can filter for "Feature" over "no_feature" and "not_aligned" with a simple grep. For generating a fasta file, I suppose that depends on whether you want each entry in the multifasta file to be a read or a covered region. If you want an entry per read, you could probably just pipe the output of your grep command to awk or sed (or perl or something else if you prefer). Otherwise, you'd need to do a pileup and process that (it's likely someone has already written a script to do that if you do a bit of searching). You could also N-mask non-covered regions in a similar manner, if that's useful for you.
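The grep step could look something like the sketch below. It assumes the htseq-count "-o" output carries an `XF:Z:` tag on each read, and that the non-feature values are `no_feature`, `ambiguous`, `too_low_aQual`, `not_aligned` and `alignment_not_unique`, optionally prefixed with `__` as in newer HTSeq releases (check this against your own output). The function name is hypothetical.

```shell
# Hypothetical helper: keep only SAM records that htseq-count assigned to a feature.
# Reads SAM from a file argument or from stdin and writes matching records to stdout.
keep_feature_reads() {
    grep -v '^@' "$@" |       # drop SAM header lines, if any
    grep 'XF:Z:' |            # keep records that carry an XF tag
    grep -v -E 'XF:Z:(__)?(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)'
}

# Example usage (file names are placeholders):
#   keep_feature_reads htseq_out.sam > feature_reads.sam
```

The result is still SAM, so it can be piped on to awk or sed for per-read fasta output.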

**cascoamarillo** · 10-18-2012, 08:49 AM

Thanks for the advice.
I'm fine with the reads in a fasta file. Though I'm not an expert in this thing, this seems to work:
| awk '{OFS="\t"; if($15 != "XF:Z:no_feature") print ">"$1"\n"$10}' - > file.fasta
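One caveat with the one-liner above: it assumes the XF tag is always in column 15, which depends on how many optional tags the aligner wrote, and it only excludes `no_feature`. A slightly more defensive sketch follows. It looks for the `XF:Z:` tag in any optional column and skips the other non-feature values as well. The function name and the list of values (with an optional `__` prefix) are assumptions to verify against your own htseq-count output.

```shell
# Hypothetical helper: convert feature-assigned SAM records to FASTA (one entry per read).
# Reads SAM from a file argument or from stdin.
sam_features_to_fasta() {
    awk -F '\t' '
        /^@/ { next }                                   # skip SAM header lines
        {
            tag = ""
            for (i = 12; i <= NF; i++)                  # optional tags start at column 12
                if ($i ~ /^XF:Z:/) tag = substr($i, 6)  # value after "XF:Z:"
            if (tag == "") next                         # no XF tag: not htseq-count output
            if (tag ~ /^(__)?(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)/) next
            print ">" $1 "\n" $10                       # read name, then read sequence
        }
    ' "$@"
}

# Example usage (file names are placeholders):
#   sam_features_to_fasta htseq_out.sam > file.fasta
```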

Get antisense mapped reads
