SEQanswers

Old 11-21-2013, 10:35 AM   #1
Sergio.pv
Member
 
Location: Berlin

Join Date: Jul 2013
Posts: 20
script suggestion (SAM file)

Hi everyone!
In the attached example, I would like to know how many times the elements
"bta-miR-191" and "bta-miR-10", which are listed in the header of a SAM file, are repeated in the rest of
the document (in this case: 6 and 5 respectively).

Could you give me an idea for a script based on this example?

Thanks
Attached Files
File Type: txt example.txt (1.2 KB, 13 views)
Old 11-21-2013, 11:00 AM   #2
shoegame2001
Member
 
Location: California

Join Date: Dec 2010
Posts: 21

# drop all header lines (they start with '@'), then count matches in column 3
grep -v '^@' example.txt | awk '{if ($3=="bta-miR-191") print}' | wc -l


grep -v '^@' example.txt | awk '{if ($3=="bta-miR-10") print}' | wc -l
Old 11-22-2013, 12:12 AM   #3
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Hello, to expand a bit on shoegame2001's answer: if your input file is big, you might want to read through it only once and count the two (or more) patterns at the same time:

Code:
grep -v '^@' example.txt \
| awk '{if ($3 == "bta-miR-191")
            mir191 += 1;
        else if ($3 == "bta-miR-10")
            mir10 += 1}
        END {printf "bta-miR-191: %d\nbta-miR-10: %d\n", mir191, mir10}'
Dario
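
With more than a couple of names, one variable per name gets unwieldy. Below is a minimal sketch of the same single-pass idea using an awk associative array. It assumes a tab-delimited SAM file, and the function name count_rname is made up for illustration.

```shell
# Hypothetical helper: count alignment lines per reference name (column 3)
# in a single pass over the file given as the first argument.
count_rname() {
    awk -F'\t' '
        /^@/ { next }                          # skip SAM header lines
        { n[$3]++ }                            # tally column 3 (RNAME)
        END { for (r in n) print r "\t" n[r] } # one "name<TAB>count" line each
    ' "$1" | sort
}
```

Called as `count_rname example.txt`, it prints one line per distinct value in column 3. Unmapped reads, if any, show up under `*`.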
Old 11-22-2013, 02:28 AM   #4
Sergio.pv
Member
 
Location: Berlin

Join Date: Jul 2013
Posts: 20

Thanks, the grep command was very useful for removing the headers.
After doing that, I counted the repeated elements in column 3:
awk '{print $3}' myfile-noheaders.out | sort | uniq -c | sort -rnk1 > SAM-counts.out

I hope this is helpful for somebody else!
Old 11-22-2013, 02:41 AM   #5
Sergio.pv
Member
 
Location: Berlin

Join Date: Jul 2013
Posts: 20

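If it is useful, here is the complete command to remove the header lines (those starting with '@') and then count repeated terms in column 3 of the SAM file. Note that the pattern is anchored with '^': a plain '@' would also drop alignment lines whose quality string happens to contain '@'.

grep -v '^@' myfile.out | awk '{print $3}' | sort | uniq -c | sort -rnk1 > SAM-counts.out

Cheers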
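
The original question asked about the names listed in the header, so a variant that reads those names from the @SQ lines and reports a count for each one (including names with zero hits, which `uniq -c` cannot show) may also be handy. This is only a sketch: it assumes the header uses standard tab-separated `@SQ` lines with an `SN:` tag and that the header comes before the alignments. The function name count_header_refs is hypothetical.

```shell
# Hypothetical helper: for every reference named in an @SQ header line,
# print how many alignment lines carry that name in column 3 (zeros included).
count_header_refs() {
    awk -F'\t' '
        /^@SQ/ {
            # Find the SN:<name> field and remember names in header order.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) {
                    name = substr($i, 4)
                    order[++k] = name
                    n[name] = 0
                }
            next
        }
        /^@/ { next }               # other header lines (@HD, @PG, ...)
        ($3 in n) { n[$3]++ }       # alignment line: tally known references
        END { for (i = 1; i <= k; i++) print order[i] "\t" n[order[i]] }
    ' "$1"
}
```

Usage would be `count_header_refs myfile.out > SAM-counts.out`. The output keeps the order of the header rather than sorting by count.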