Seqanswers Leaderboard Ad

**mastal** · 07-12-2013, 01:34 AM

samtools converting sam>bam

The command should be

Code:

$ samtools view -b -S file.sam > file.bam

Then you will need to use

Code:

$ samtools sort file.bam file-sorted

followed by

Code:

$ samtools index  file-sorted.bam

in order to get an indexed file.

If you just type

Code:

$ samtools

or samtools followed by the name of one of the samtools commands, you will get a few lines of help giving the correct syntax for that command,

**Kienan** · 07-12-2013, 02:16 AM

Still not working

Thank you so much for your help. I tried this but it comes up with the error:

-bash: samtools: command not found

I think I might have done something wrong when I installed samtools...... (I really am a bit out of my depth here). I downloaded the samtools-0.1.19.tar file, expanded it and managed to compile it using the 'make' command (I had to download xcode for mac before this would work..).

My impression was I would be able to open terminal. Change directory to the directory with my .sam files in it (using cd command), and then use the code:

$ samtools view -b -S file.sam > file.bam

(with the word 'file' replaced with the actual name of my file...)

However, when I do this I just get the error

-bash: samtools: command not found

SO instead what I did was put my .sam file into the samtools-0.1.19 folder then went to terminal opened that folder using the cd xxxx/xxx/xxx/samtools-0.1.19

then just typed $ view -b -S file.sam > file.bam (again replacing the word 'file' with the actual name of my file...)

This is where I get the error:

Vim: Warning: Output is not to a terminal

It seems I can run other commands from within the samtools-0.1.19 folder, for example I can just type "view" and as suggested by you, I get lines of info about that command....

Honestly not sue what I've done wrong......

**mastal** · 07-12-2013, 02:53 AM

When you type $view, you are getting a different program called VIM, not samtools.

If you get
-bash: samtools: command not found

that means that samtools is not in your PATH (the directories where the computer looks for command files).
Add the path to the samtools files to the PATH variable, which should be in a file in your home directory called something like .profile or .bashrc or .bash_profile.

Or you can just specify the complete path to the samtools files when running samtools.

More info on running samtools can be found on the webpages:

samtools(1) manual page

http://samtools.sourceforge.net/samtools.shtml

Samtools

**maubp** · 07-12-2013, 06:46 AM

Originally posted by Kienan View Post

All I want to do for now is be able to view a file in IGV... So far I've used a perl script to convert the .sop to a .sam. I tried opening the .sam in IGV but it said it needs to be indexed and wouldn't open.

One alternative tool to IGV is Tablet, which can actually load and display SAM files directly. This is very handy for small projects like bacteria, but is too slow for large SAM files. For large datasets you should convert SAM to a sorted and indexed BAM file, which is what Tablet most assembly/mapping viewers expect or recommend.

Tablet : Information & Computational Sciences

http://bioinf.hutton.ac.uk/tablet/

**Heisman** · 07-12-2013, 07:19 AM

To add to what others are saying, if you know the path to where you put samtools on the computer, you can do this:

/home/user/pathToSamtools/samtools view -uS <input.sam> | samtools sort - <output>

That will read in the .sam file (-S tells it that it is a .sam file), generate an uncompressed .bam file (that's what -u does), and the pipe command ("|") then sends that output directly into samtools sort. Even though you only type "output", it will actually output "output.bam"

Then do:

/home/user/pathToSamtools/samtools index output.bam

And you'll get the index file as well.

**Kienan** · 07-14-2013, 02:36 AM

Another Problem

Hi Guys,

Thank you all so much for your help and putting up with my complete naivety, with all of this.

I finally got this to work (though have to type the path for samtools every time as I cannot for the life of me figure out where the executable path is on my mac...) and now I am getting a new error:

[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!

From what I've been reading this suggests that there is something wrong with the .sam files I have generated, though not sure what to do about this as I generated them from aligned soap files (provided by BGI from a ChIP-seq experiment) using the per script soap2sam.pl.

Heisman, I also tried your suggestion but I get a different error when using your suggested command (samtools view -uS <input.sam> | samtools sort - <output>)

The error I get is:

-bash: syntax error near unexpected token `|'

**Heisman** · 07-14-2013, 06:14 AM

I haven't worked on Macs; perhaps piping doesn't work?

To convert the .sam file to a .bam file, there probably needs to be proper headers in the .sam file. Do you know the exact reference sequence the data was aligned too?

The header lines can look like this (tab-delimited), if the alignment was done to hg19 and only to chromosomes 1-22 and X/Y:

Code:

@HD     VN:1.4  GO:none SO:coordinate
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067
@SQ     SN:chr7 LN:159138663
@SQ     SN:chr8 LN:146364022
@SQ     SN:chr9 LN:141213431
@SQ     SN:chr10        LN:135534747
@SQ     SN:chr11        LN:135006516
@SQ     SN:chr12        LN:133851895
@SQ     SN:chr13        LN:115169878
@SQ     SN:chr14        LN:107349540
@SQ     SN:chr15        LN:102531392
@SQ     SN:chr16        LN:90354753
@SQ     SN:chr17        LN:81195210
@SQ     SN:chr18        LN:78077248
@SQ     SN:chr19        LN:59128983
@SQ     SN:chr20        LN:63025520
@SQ     SN:chr21        LN:48129895
@SQ     SN:chr22        LN:51304566
@SQ     SN:chrX LN:155270560
@SQ     SN:chrY LN:59373566

You can generate this yourself using PicardTools if you have the reference sequence on your computer; it's explained here: http://gatkforums.broadinstitute.org...e-as-reference

Then you can take the part that looks like what I copy and pasted above and attach it to the head of your sam files (using "cat header.lines sam.file > full_sam.file").

If you don't know the reference sequence then you can take your sam file and do "cut -f3 sam.file | sort | uniq > temp.1" to see what chromosomes were used for aligning (or if you can't pipe do "cut -f3 sam.file > temp.1" followed by "sort temp.1 > temp.2" and then "uniq temp.2 > temp.3")

That still won't tell you if it's hg18 or hg19 that aligning was done to (or other versions if there are a bunch of "weird" chromosome names in the sam file).

So if you can figure out the exact reference sequence used for aligning (or get a copy of it), that would be best; if not we can do some trial and error to figure it out adequately.

**Kienan** · 07-15-2013, 06:32 AM

Adding Header to.sam files

Hi Heisman,

Thanks for your help. It was definitely aligned to hg19. I've managed to open the sam files I've generated using BamSeq (just something I found on the net) and it appears as though they have no header lines as you've suggested but I have no idea how to add them to the sam files I have........

**Heisman** · 07-15-2013, 07:05 AM

Like I said in the previous post, if you have a copy of the reference sequence on your computer and you have PicardTools installed, you can do what was explained in the link in the previous post to the gatkforums. Then you will want to concatenate the header lines to your sam file: "cat header_lines sam.file > reformatted.sam"

If you don't have a copy of the reference sequence, you can download it from the UCSC website (except it's down right now; otherwise I'd provide the link).

If you do "cut -f3 sam.file | sort | uniq > temp.1" like I said previously you can see all of the chromosomes that were used for alignment. You can do that and report back; and figure out if you have a copy of the reference sequence.

**Kienan** · 07-15-2013, 07:23 AM

slowly making progress

OK I've run the "cut -f3 sam.file | sort | uniq > temp.1" command (I'd done it before but kept typing samtools before hand as for some reason I had it in my head that it was a samtools command and couldn't figure out why it wouldn't work).

Can I just check when you write "sam.file" in this command do you mean I should replace this with "filename.sam"?

This is what I did and I've generated the type.1 file but now of course have no idea how to open it/look at it (you really are dealing with a complete newby here......., sorry).

**Heisman** · 07-15-2013, 07:27 AM

Ah, didn't realize you were that new. You should ASAP google for a linux tutorial and go through it. There are quite a few online and any should work.

"less type.1" will let you look through a file. Then hit "q" to quit.

"cat type.1" will output the file to the screen. Not necessary here though; you can use less and then copy and paste the output here if you'd like (just highlight everything and ctr+c ctr+v).

**Kienan** · 07-15-2013, 07:37 AM

Type.1 contents

Sorry, I'll do the tutorial ASAP, Sorry for taking up so much screen space....

chr1
chr10
chr11
chr11_gl000202_random
chr12
chr13
chr14
chr15
chr16
chr17
chr17_ctg5_hap1
chr17_gl000203_random
chr17_gl000204_random
chr17_gl000205_random
chr17_gl000206_random
chr18
chr18_gl000207_random
chr19
chr19_gl000208_random
chr19_gl000209_random
chr1_gl000191_random
chr1_gl000192_random
chr2
chr20
chr21
chr21_gl000210_random
chr22
chr3
chr4
chr4_ctg9_hap1
chr4_gl000193_random
chr4_gl000194_random
chr5
chr6
chr6_apd_hap1
chr6_cox_hap2
chr6_dbb_hap3
chr6_mann_hap4
chr6_mcf_hap5
chr6_qbl_hap6
chr6_ssto_hap7
chr7
chr7_gl000195_random
chr8
chr8_gl000196_random
chr8_gl000197_random
chr9
chr9_gl000198_random
chr9_gl000199_random
chr9_gl000200_random
chr9_gl000201_random
chrM
chrUn_gl000211
chrUn_gl000212
chrUn_gl000213
chrUn_gl000214
chrUn_gl000215
chrUn_gl000216
chrUn_gl000217
chrUn_gl000218
chrUn_gl000219
chrUn_gl000220
chrUn_gl000221
chrUn_gl000222
chrUn_gl000223
chrUn_gl000224
chrUn_gl000225
chrUn_gl000226
chrUn_gl000227
chrUn_gl000228
chrUn_gl000229
chrUn_gl000230
chrUn_gl000231
chrUn_gl000232
chrUn_gl000233
chrUn_gl000234
chrUn_gl000235
chrUn_gl000236
chrUn_gl000237
chrUn_gl000238
chrUn_gl000239
chrUn_gl000240
chrUn_gl000241
chrUn_gl000242
chrUn_gl000243
chrUn_gl000244
chrUn_gl000245
chrUn_gl000246
chrUn_gl000247
chrUn_gl000248
chrUn_gl000249
chrX
chrY

**Heisman** · 07-15-2013, 07:46 AM

The link is still down but you are going to want to download a copy of the reference sequence from here: http://hgdownload.soe.ucsc.edu/downloads.html

Try to get on there every hour or so until it works and see if you can figure it out. If I can get on I'll write out more detailed instructions. In the meantime you can start with a tutorial if you'd like.

**Kienan** · 07-15-2013, 08:02 AM

Which reference file to download?

Thanks a million Heisman, will do the tutorial. BTW the link to the UCSC downloads page is working from my end but not sure which of the files below I need to download (I'm guessing hg19.2bit but not sure)......

chromAgp.tar.gz - Description of how the assembly was generated from
fragments, unpacking to one file per chromosome.

chromFa.tar.gz - The assembly sequence in one file per chromosome.
Repeats from RepeatMasker and Tandem Repeats Finder (with period
of 12 or less) are shown in lower case; non-repeating sequence is
shown in upper case.

chromFaMasked.tar.gz - The assembly sequence in one file per chromosome.
Repeats are masked by capital Ns; non-repeating sequence is shown in
upper case.

chromOut.tar.gz - RepeatMasker .out files (one file per chromosome).
RepeatMasker was run with the -s (sensitive) setting.
Using: Jan 29 2009 (open-3-2-7) version of RepeatMasker and
RELEASE 20090120 of library RepeatMaskerLib.embl

chromTrf.tar.gz - Tandem Repeats Finder locations, filtered to keep repeats
with period less than or equal to 12, and translated into UCSC's BED 5+
format (one file per chromosome).

hg19.2bit - contains the complete hg19 Human Genome
in the 2bit format. A utility program, twoBitToFa (available
from our src tree), can be used to extract .fa file(s) from
this file. See also:

Index of /admin/exe/linux.x86_64

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

est.fa.gz - Human ESTs in GenBank. This sequence data is updated once a
week via automatic GenBank updates.

md5sum.txt - checksums of files in this directory

mrna.fa.gz - Human mRNA from GenBank. This sequence data is updated
once a week via automatic GenBank updates.

refMrna.fa.gz - RefSeq mRNA from the same species as the genome.
This sequence data is updated once a week via automatic GenBank
updates.

upstream1000.fa.gz - Sequences 1000 bases upstream of annotated
transcription starts of RefSeq genes with annotated 5' UTRs.
This file is updated weekly so it might be slightly out of sync with
the RefSeq data which is updated daily for most assemblies.

upstream2000.fa.gz - Same as upstream1000, but 2000 bases.

upstream5000.fa.gz - Same as upstream1000, but 5000 bases.

xenoMrna.fa.gz - GenBank mRNAs from species other than that of
the genome. This sequence data is updated once a week via automatic
GenBank updates.

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Yesterday, 11:49 AM	0 responses 13 views 0 likes	Last Post by seqadmin Yesterday, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, 04-24-2024, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin 04-24-2024, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

samtools converting sam>bam

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News