Seqanswers Leaderboard Ad

**Richard Finney** · 05-07-2014, 01:54 PM

What does "samtools flagstat" tell you?
Can you view your mapped reads in SAM format using "samtools view"?

Can you dump a random couple of dozen mapped alignments? Do these look right?

**jmwhitha** · 05-08-2014, 08:25 AM

Thanks for your quick response Richard! I was already home when I saw it though. Here's what I could get.

##"samtool flagstat" ##
samtool flagstat file.bam

11959974 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
8460790 + 0 mapped (70.74%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

# To get only the mapped reads use the parameter 'F', which works like -v of grep.
samtool view -b -F 4 file_sorted.bam > file_sorted_mapped.bam

## Bam to Sam ###
samtool view -h -o out.file_sorted_mapped.sam file_sorted_mapped.bam

## samtools view ##

samtool view -c -S out.file_sorted_mapped.sam

[samopen] SAM header is present: 4184 sequences.
mapped count (8460790) #Value matched to above matched value in flagstat.

## Sam to Bam ##
samtool view -bS out.file_sorted_mapped.sam > out.file_sorted_mapped.bam
[samopen] SAM header is present: 4184 sequences.

# Find SNPs using SAMtools
samtool mpileup -d8000 -uf sequence_genome.fasta out.CLJUaligned_0112-02_sorted_mapped.bam | /root/Tools/samtools-0.1.19/bcftools/bcftools view -vcg - > s.vcf

[bam_header_read] EOF marker is absent. The input is probably truncated.
[mpileup] 1 samples in 1 input files
[bcfview] 100000 sites processed.
[afs] 0:6507.765 1:2715.281 2:90776.954
[bcfview] 200000 sites processed.
[afs] 0:8265.636 1:2648.100 2:89086.264
[bcfview] 300000 sites processed.
[afs] 0:10133.539 1:2403.082 2:87463.378
[afs] 0:289.935 1:75.121 2:2426.943

# Opened vcf file -- same result as before i.e. "N" position in referene file.

**swbarnes2** · 05-08-2014, 09:19 AM

The N's in mpileup tell me that there is likely a mismatch between what you said the reference was, and what you actually aligned to. So start by double-checking that the reference names in your .bam really match the ones in the reference. For instance, maybe there are weird characters, or spaces, or something that are being handled differently between the aligner and samtools.

Once that checks out, I'd remake the reference index with samtools fiadx. Make sure it completes with no errors.

For the one that is all header, you see that you have some filtering steps in your command lines. Why don't you start by taking those away. Maybe they are filtering away all your SNPs.

Also, I think with all your bam to sam and sam to bam, you might have lost your headers. You need -h to make sure those get converted too.

**jmwhitha** · 05-12-2014, 11:14 AM

Thank you for the suggestions swbarnes2. I agree that the N's make it seem like there is something wrong with the reference. My reference has a single header line and each subsequent line has 70 nucleotides. In the .fai file there is one line, which looks similar to the header of the fasta reference file. The last part of this line, which I discovered provides information about the "offset within the FASTA file of this sequence's first base", "the number of bases on each line", and "the number of bytes in each line, including the newline" (http://manpages.ubuntu.com/manpages/...5/faidx.5.html), has the numbers 80, 70, and 71, which seem to be correct since the header is 78 letters/ numbers long (plus two bytes for the header) and every subsequent line is 70 letters (plus one byte for the line).

I used "samtools view out.file_sorted_mapped.bam" to view the bam file. Just looking at the first few lines, I noticed that the lines (aka the header, quality, sequence lines) were wrapping. Not sure if this would cause a problem.

I also tried this command (samtools tview out.file_sorted_mapped.bam sequence_genome.fasta). I found that my second line was N's and my third line was my reference sequence. So the same sort of problem seen here (http://seqanswers.com/forums/showthread.php?t=5360). However, none of their solutions seemed correct.

I remade the index file with samtools faidx as you recommended and it came back with no errors. When I opened it, it looked the same as my orginial .fai file (described above).

I also tried using the "-h" (header option) in the samtools mpileup command as you recommended, and this got rid of the " EOF marker is absent. The input is probably truncated" comment. However, there were still no SNPs in the .vcf file.

When I take away the varfilter, the file becomes giant, and if I use -D10, there are no SNPs found.

So, I'm not sure how tview and the mpileup command are related, but I think that the N's in the tview are the same N's in the mpileup output.

So, how can we get mpileup to use the third line (the reference sequence) of the tview rather than the second line (N's)?

Thanks,
Jason

**jmwhitha** · 05-12-2014, 11:47 AM

I also just saw that the reference in tview stops at around 1350 bases. What does that mean?

**Lv Ray** · 10-14-2014, 06:03 PM

Hi, jmwhitha. Have you solved your problem? I got the same problem with so many N's on ref in my vcf.file. If you have solved it, Hope to hear from you.
Thank you.

**jmwhitha** · 10-15-2014, 04:16 AM

I would still like to know how, but I ended up using Tablet to confirm the mutations that were found when I did genome sequencing.

God bless,
Jason

Topics	Statistics	Last Post
TIGR Systems Offer a Compact Alternative to CRISPR for Gene Editing by seqadmin Started by seqadmin, 03-03-2025, 01:15 PM	0 responses 161 views 0 likes	Last Post by seqadmin 03-03-2025, 01:15 PM
Highlights from AGBT 2025 – Part II by seqadmin Started by seqadmin, 02-28-2025, 12:58 PM	0 responses 248 views 0 likes	Last Post by seqadmin 02-28-2025, 12:58 PM
Highlights from AGBT 2025 – Part I by seqadmin Started by seqadmin, 02-24-2025, 02:48 PM	0 responses 623 views 0 likes	Last Post by seqadmin 02-24-2025, 02:48 PM
Selecting the Right AI Model for Bioinformatics Research by seqadmin Started by seqadmin, 02-21-2025, 02:46 PM	0 responses 265 views 0 likes	Last Post by seqadmin 02-21-2025, 02:46 PM

Seqanswers Leaderboard Ad

Announcement

Calling SNPs from RNAseq bam files

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News