**nilshomer** · 06-07-2010, 07:15 PM

Originally posted by gen2prot

Hello nilshomer,

I downloaded Picard. I have the .jar files on Mac OS X 10.6, but they won't open. I have them saved on the Desktop. How do I run them?

Thanks
Abhijit

I am assuming you are familiar with the Terminal and a Unix-based environment. If not, you will need to become familiar with them (search this site for recommended books and tutorials). I cannot teach you how to use the Terminal or answer such basic questions.

Use the command for the respective jar:

Code:

java -jar SortSam.jar

**gen2prot** · 06-07-2010, 07:22 PM

Thank you.

**gen2prot** · 06-07-2010, 08:06 PM

Hello,

I ran the following command using the Picard tool SortSam:

java -jar SortSam.jar I=../test/testsam.sam O=../test/sortedtest.sam SO=queryname

My input file looks like this:

@HD VN:1.0 SO:sorted
@PG ID:TopHat VN:1.0.13 CL:/share/apps/bin/tophat -o ./s1 --solexa1.3-quals -p 2 GeneIndex /home/asanyal/data/Flydata/Exp_100423/100423_HWI-EAS313_0001_61G2CAAXX.birchlerj/s_1_sequence.txt
HWI-EAS313_0001:1:80:8942:6680#0 0 FBgn0000003 1 3 42M * 0 0 CGGACTGGAAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGA GGGFGGGGGGGFFEGGGGGGGFGGGGGGDDGFGGGGGGGGFE NM:i:2
HWI-EAS313_0001:1:108:8254:11808#0 0 FBgn0000003 9 3 42M * 0 0 AAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGAGGTCTGAT C::?>ACCCCD?EDEB=EEEEEECEE?:E??@C@CEBED=4? NM:i:0

However, I get the following error message.

Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Empty sequence dictionary.; Line 3
Line: HWI-EAS313_0001:1:80:8942:6680#0 0 FBgn0000003 1 3 42M * 0 0 CGGACTGGAAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGA GGGFGGGGGGGFFEGGGGGGGFGGGGGGDDGFGGGGGGGGFE NM:i:2

Do I have to give the program the reference sequences? Or do I need to create a sequence dictionary using CreateSequenceDictionary?

Thanks
Abhijit

**nilshomer** · 06-07-2010, 09:08 PM

There are no "@SQ" header lines in your SAM file. If they are missing, you could try supplying the reference sequences.
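For reference, the SAM format expects one @SQ header line per reference sequence, carrying its name (SN) and length (LN). With gene-level references, each gene gets its own line, so the names do not need to be merged. Below is a minimal Python sketch of building those lines, assuming the reference is available as a FASTA file. The function name `sq_lines_from_fasta` and the demo records (the sequences and the second gene ID) are hypothetical and only for illustration.

```python
import io


def sq_lines_from_fasta(handle):
    """Yield one '@SQ<TAB>SN:<name><TAB>LN:<length>' line per FASTA record."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield f"@SQ\tSN:{name}\tLN:{length}"
            # Use the header text up to the first whitespace as the reference name.
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)
    if name is not None:
        yield f"@SQ\tSN:{name}\tLN:{length}"


# Hypothetical two-gene reference; sequences and the second ID are made up.
demo_fasta = io.StringIO(
    ">FBgn0000003 example gene\nACGTACGTAC\nGGTT\n>FBgn0000008\nACGT\n"
)
sq_lines = list(sq_lines_from_fasta(demo_fasta))
print("\n".join(sq_lines))
```

The resulting lines would go in the header, before the first alignment record. The SN values must match the reference names used in column 3 of the alignments.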

**gen2prot** · 06-08-2010, 08:26 AM

Hello nilshomer,

I specified the SQ field but I still get an error, probably because the reference sequences are Drosophila gene sequences. The names of the reference sequences are therefore different unless two or more reads match the same gene. I cannot convert the names to a single reference sequence name, since I would lose information. I am stuck. Maybe I need to do a traditional Perl sort (very time consuming for a 6GB file). Is there a better way of doing this?

Thanks
Abhijit

**gen2prot** · 06-24-2010, 11:25 AM

Hello,

I was wondering if there is a score associated with each read in the SAM file that would indicate the strength of the match between the read and the subject sequence. The CIGAR string helps to some extent, but since "M" denotes either a match or a mismatch, I was wondering if there is a way to differentiate between the two. Something like an E-value or a BLAST score.

Abhijit

**JohnK** · 06-24-2010, 12:47 PM

Hi guys,

I have a quick question regarding the SAM CIGAR column. I understand that the M operation designates both matches and mismatches. Is there a way to get the actual number of mismatches from the SAM format without resorting to comparing the tags or the sequence to the reference? Sorry if this question is a re-post. I tried searching, but couldn't find anything.

**nilshomer** · 06-24-2010, 01:17 PM

Originally posted by gen2prot

Hello,

I was wondering if there is a score associated with each read in the SAM file that would indicate the strength of the match between the read and the subject sequence. The CIGAR string helps to some extent, but since "M" denotes either a match or a mismatch, I was wondering if there is a way to differentiate between the two. Something like an E-value or a BLAST score.

Abhijit

Probably a good idea to create a new thread (this one is getting long!).
See the mapping quality field (MAPQ, column 5).

Originally posted by JohnK

Hi guys,

I have a quick question regarding the SAM CIGAR column. I understand that the M operation designates both matches and mismatches. Is there a way to get the actual number of mismatches from the SAM format without resorting to comparing the tags or the sequence to the reference? Sorry if this question is a re-post. I tried searching, but couldn't find anything.

Probably a good idea to create a new thread (this one is getting long!).
Try the optional NM tag (edit distance to the reference), if your aligner writes it.
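To make both suggestions concrete, here is a small Python sketch that pulls MAPQ and NM out of one alignment line. It assumes NM follows the SAM definition of edit distance (mismatches plus inserted and deleted bases), so the mismatch count is NM minus the bases covered by I and D operations in the CIGAR. Whether a given aligner fills NM that way is aligner specific. The function name `alignment_summary` is hypothetical; the record is the first read posted earlier in this thread, re-joined with tabs.

```python
import re


def alignment_summary(sam_line):
    """Return MAPQ, NM, and an estimated mismatch count for one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])   # column 5: mapping quality
    cigar = fields[5]       # column 6: CIGAR string
    nm = None
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
    # Count bases covered by insertions (I) and deletions (D).
    indel_bases = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in "ID"
    )
    # Assumption: NM = mismatches + inserted bases + deleted bases.
    mismatches = None if nm is None else nm - indel_bases
    return {"mapq": mapq, "nm": nm, "mismatches": mismatches}


record = "\t".join([
    "HWI-EAS313_0001:1:80:8942:6680#0", "0", "FBgn0000003", "1", "3", "42M",
    "*", "0", "0",
    "CGGACTGGAAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGA",
    "GGGFGGGGGGGFFEGGGGGGGFGGGGGGDDGFGGGGGGGGFE",
    "NM:i:2",
])
summary = alignment_summary(record)
print(summary)
```

Note that MAPQ describes confidence in the placement of the read, not the similarity of the alignment, so it is not a direct equivalent of an E-value.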

**win804** · 06-25-2010, 05:25 AM

Hi all,
Is a sorted BAM file smaller than an unsorted BAM file?
If so, why?

I sort a lot of BAM files using samtools, with this command:
samtools sort chr1-aligned.bam chr1-aligned.sorted

The file size of chr1-aligned.bam is 353,618,735 bytes,
but the file size of chr1-aligned.sorted.bam is 295,208,534 bytes.

I have checked all my unsorted and sorted BAM files. All of the sorted BAM files are smaller than the unsorted ones.

Thanks.

**lh3** · 06-25-2010, 06:03 AM

Sorted files are compressed better.
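A rough way to see the effect is a toy experiment in Python. BAM blocks are deflate-compressed, and deflate only finds repeats within a short sliding window. After coordinate sorting, overlapping reads sit next to each other, so their shared bases fall inside that window. The sketch below uses plain `zlib` as a stand-in for BAM compression and entirely synthetic data; the reference, read length, read count, and record layout are all made-up assumptions, not a model of a real BAM file.

```python
import random
import zlib

random.seed(0)

# Synthetic reference and reads (all sizes here are arbitrary assumptions).
reference = "".join(random.choice("ACGT") for _ in range(200_000))
read_len = 50
starts = [random.randrange(len(reference) - read_len) for _ in range(20_000)]


def records(order):
    """Serialize reads as 'position<TAB>sequence' lines in the given order."""
    return "".join(
        f"{pos}\t{reference[pos:pos + read_len]}\n" for pos in order
    ).encode()


# Same reads, same bytes per record; only the order differs.
unsorted_size = len(zlib.compress(records(starts)))
sorted_size = len(zlib.compress(records(sorted(starts))))
print("unsorted:", unsorted_size, "bytes")
print("sorted:  ", sorted_size, "bytes")
```

The two inputs contain identical records, so any size difference comes from ordering alone.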

**maubp** · 06-25-2010, 07:33 AM

Originally posted by win804

Hi All,
Is "sorted" BAM file smaller in size compare to unsorted BAM file?
If that's the case, why is that so?...

You already asked this in a separate thread:

http://seqanswers.com/forums/showthread.php?t=5684

**win804** · 06-25-2010, 08:40 PM

Thanks Li Heng. I just want to confirm that nothing is wrong with the sorted bam file.

Thanks a lot.

**win804** · 06-25-2010, 08:43 PM

Originally posted by maubp

You already asked this on a separate thread
http://seqanswers.com/forums/showthread.php?t=5684

Yes, I wanted to delete the previous thread, but I have no idea how to do it. Any ideas?

Thanks.

**glacierbird** · 06-29-2010, 01:13 AM

Originally posted by lh3

@corthay

You can convert with "blast2sam.pl -s" to save the sequence in SAM. Currently, samtools cannot parse SAM without sequence, although the specification allows this.

Hi Li,
I am parsing a .sam file that includes sequences, but samtools view still gives an error message. Could you please take a look at this post:

samtools view error: CIGAR and sequence length are inconsistent (tophat/bowtie) - SEQanswers

http://seqanswers.com/forums/showthread.php?p=21003#post21003

Thanks.

**gsjlucky** · 06-30-2010, 08:29 PM

maq2sam-long

Originally posted by lh3

maq2sam-short is for the .map files generated by maq-0.6.x, while maq2sam-long for files generated by maq-0.7.x. Sorry for the confusion, and one of the aims of SAM is to avoid such confusions in future.

The usage is maq2sam <in.map> [<readGroup>]. I would like to know how to use the optional parameter 'readGroup'. Can it add library information from the .map file to the SAM output?
