Hello everyone,
I have a SAM file from a STAR alignment, containing reads aligned to a reference of small RNA sequences (I built the reference myself by downloading the FASTA sequences of interest and concatenating them).
Now I would like to keep only the reads that have exactly the same length as the region they map to in the FASTA file (I also have a GFF file with the coordinates of every region).
I tried setting STAR to keep only the reads that map completely, and that works, but I would like a SAM or BAM file containing just the reads that cover the mapped region from start to end and are neither longer nor shorter than the region (I have already removed the adapters).
The final goal is to correctly count the reads overlapping each region of interest.
So far I have used these tools: cutadapt, STAR and HTSeq-count.
Please could someone help me (maybe with some awk function, script, algorithm, STAR options...)?
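To make the request concrete, here is a minimal Python sketch of the filter I have in mind. It is only an illustration under some assumptions: the first GFF column (seqid) matches the SAM RNAME column, GFF coordinates are 1-based and inclusive, and a read "matches" when it starts at the region start, ends at the region end, has no soft or hard clipping, and has the same length as the region. The function names (load_regions, exact_match, filter_sam) are made up for this example.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def load_regions(gff_lines):
    """Collect (start, end) pairs per sequence name from GFF lines."""
    regions = defaultdict(set)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        regions[fields[0]].add((int(fields[3]), int(fields[4])))
    return regions

def exact_match(sam_line, regions):
    """True if the alignment spans a region exactly, with equal read length."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 4 or cigar == "*":
        return False  # unmapped read or no alignment information
    ops = CIGAR_RE.findall(cigar)
    if any(op in "SH" for _, op in ops):
        return False  # clipped bases mean the read is longer than the aligned part
    # Bases consumed on the reference and on the read, respectively.
    ref_len = sum(int(n) for n, op in ops if op in "MDN=X")
    read_len = sum(int(n) for n, op in ops if op in "MI=X")
    end = pos + ref_len - 1  # SAM POS is 1-based, so the end is inclusive
    return ref_len == read_len and (pos, end) in regions.get(rname, ())

def filter_sam(sam_lines, regions):
    """Yield header lines plus the alignments that match a region exactly."""
    for line in sam_lines:
        if line.startswith("@") or exact_match(line, regions):
            yield line
```

The idea would be to load the GFF once with load_regions, pass the SAM lines through filter_sam, write the result to a new SAM file, and then run HTSeq-count on that file.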
Thank you in advance!!
Cristian