Seqanswers Leaderboard Ad

**upendra_35** · 04-16-2011, 09:15 AM

Originally posted by Huijuan View Post

Hi,I'm a beginner in RNA-Seq analysis,now I've mapped the short reads with Bowtie,TopHat and other softwares,I'm trying to analyze the mapping efficiency by counting the uniquely mapped reads,spanning junction reads and unmappable reads in different mapping results.But how can I get the information about which one is uniquely mapped,which one is mutiple reads from the SAM format file I get with the softwares?
Thank you for your attention.

First you have to separate the Aligned and unaligned reads in the SAM file. For this you can use Picard:
For Aligned reads :
java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
For Unaligned reads:
java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

You can then count the rows in each file that don't start with @ with:
grep -cv "^@" your_Unaligned_reads.sam
Should give you the number of unaligned reads.

For the uniquely mapped reads: there should have this tag in the sam file: XT:A:U, so to count them:
grep -c "XT:A:U" your_Aligned_reads.sam
The reads that map to more than one position should be have "XT:A:R", so:
grep -c "XT:A:R" your_Aligned_reads.sam

Hope this helps.....

**Huijuan** · 04-16-2011, 05:11 PM

Originally posted by upendra_35 View Post

First you have to separate the Aligned and unaligned reads in the SAM file. For this you can use Picard:
For Aligned reads :
java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
For Unaligned reads:
java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

You can then count the rows in each file that don't start with @ with:
grep -cv "^@" your_Unaligned_reads.sam
Should give you the number of unaligned reads.

For the uniquely mapped reads: there should have this tag in the sam file: XT:A:U, so to count them:
grep -c "XT:A:U" your_Aligned_reads.sam
The reads that map to more than one position should be have "XT:A:R", so:
grep -c "XT:A:R" your_Aligned_reads.sam

Hope this helps.....

Many thanks for helping.I did as what you said and got what I want.Thank you again
Huijuan

**Huijuan** · 04-19-2011, 07:47 PM

Here's the problem
I use the way you told me with the mapping results with BWA. And the total number of unaligned reads, uniquely mapped reads and multireads is not equal to the total number of short reads and far less than it. What's the problem ?
Thank you again
Huijuan

**upendra_35** · 04-26-2011, 04:24 PM

Originally posted by Huijuan View Post

Here's the problem
I use the way you told me with the mapping results with BWA. And the total number of unaligned reads, uniquely mapped reads and multireads is not equal to the total number of short reads and far less than it. What's the problem ?
Thank you again
Huijuan

Do you get any error when you run the way i mentioned before? If yes then try running the modified way which is below. It should fix the problem. Good luck

#To separate Aligned and unaligned reads I am using Picard now:
#Aligned:
java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
#Unaligned:
java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

#You can then count the rows in each file that don't start with @ should give you the number of unaligned reads.:
grep -cv "^@" your_Unaligned_reads.sam

#For the uniquely mapped reads: they should have this tag in the sam file: XT:A:U, so to count them:
grep -c "XT:A:U" your_Aligned_reads.sam
#The reads that map to more than one position should be have "XT:A:R", so:
grep -c "XT:A:R" your_Aligned_reads.sam

**Huijuan** · 04-26-2011, 04:41 PM

Originally posted by upendra_35 View Post

Do you get any error when you run the way i mentioned before? If yes then try running the modified way which is below. It should fix the problem. Good luck

#To separate Aligned and unaligned reads I am using Picard now:
#Aligned:
java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
#Unaligned:
java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

#You can then count the rows in each file that don't start with @ should give you the number of unaligned reads.:
grep -cv "^@" your_Unaligned_reads.sam

#For the uniquely mapped reads: they should have this tag in the sam file: XT:A:U, so to count them:
grep -c "XT:A:U" your_Aligned_reads.sam
#The reads that map to more than one position should be have "XT:A:R", so:
grep -c "XT:A:R" your_Aligned_reads.sam

I‘ve figured out the reason.Thanks so much for helping.

**upendra_35** · 04-26-2011, 04:46 PM

Originally posted by Huijuan View Post

I‘ve figure out the reason.Thanks so much for helping.

Can you share with me how you fixed the problem? Thanks

**Huijuan** · 04-26-2011, 04:57 PM

This time I didn't use viewSam.jar and just used awk and grep to get them apart and then did counting

**brpet** · 04-28-2011, 03:44 AM

#For the uniquely mapped reads: they should have this tag in the sam file: XT:A:U, so to count them:
grep -c "XT:A:U" your_Aligned_reads.sam
#The reads that map to more than one position should be have "XT:A:R", so:
grep -c "XT:A:R" your_Aligned_reads.sam

Both of these grep's return 0 for me.

Are you doing any other steps between tophat and changing the file from bam to sam?

**ttnguyen** · 04-28-2011, 04:20 AM

I used awk and grep for this task - just need to base on the 'read name' and CIGAR in sam file.

**upendra_35** · 04-28-2011, 08:43 AM

Originally posted by brpet View Post

Both of these grep's return 0 for me.

Are you doing any other steps between tophat and changing the file from bam to sam?

Yes i do the following to convert the bam to sam

samtools view accepted_hits.bam > accepted_hits.sam
samtools view -H accepted_hits.bam > header.txt
cat header.txt accepted_hits.sam > accepted_hits2.sam

But you can't use this sam file for counting the number of reads that are mapped to the reference or not as this sam is made from bam and bam file is supposed to contain only mapped reads.

I am still to figure out how to use tophat generated bam file for counting the number of reads that are mapped to the reference or not. I will let you know once i know it.

**Huijuan** · 04-28-2011, 10:31 PM

Originally posted by upendra_35 View Post

Yes i do the following to convert the bam to sam

samtools view accepted_hits.bam > accepted_hits.sam
samtools view -H accepted_hits.bam > header.txt
cat header.txt accepted_hits.sam > accepted_hits2.sam

But you can't use this sam file for counting the number of reads that are mapped to the reference or not as this sam is made from bam and bam file is supposed to contain only mapped reads.

I am still to figure out how to use tophat generated bam file for counting the number of reads that are mapped to the reference or not. I will let you know once i know it.

there‘s a method saying that you can keep the tmp files when you run Tophat by the parameter --keep-tmp then you will get both sam and bam files

**kmcarr** · 04-29-2011, 05:17 AM

Originally posted by upendra_35 View Post

Yes i do the following to convert the bam to sam

samtools view accepted_hits.bam > accepted_hits.sam
samtools view -H accepted_hits.bam > header.txt
cat header.txt accepted_hits.sam > accepted_hits2.sam

You can output the header with the alignments to your SAM file in one step, avoiding long cat job. The '-h' option for samtools view will output the header in addition to the alignments.

Code:

% samtools view -h accepted_hits.bam -o accepted_hits.sam

The -o option is used to give the output file name (or redirect stdout as shown in upendra's example).

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

analysis after mapping

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News