
  • Originally posted by gen2prot View Post
    Hello nilshomer,

    I downloaded Picard. I have the .jar files on Mac OS X 10.6, saved on the Desktop, but they won't open. How do I run them?

    Thanks
    Abhijit
    I am assuming you are familiar with the Terminal and a Unix-based environment. If not, you need to become familiar with these environments (search this site for recommended books and tutorials). I cannot teach you how to use the Terminal or answer such basic questions.

    Use the command for the respective jar:
    Code:
    java -jar SortSam.jar

    Comment


    • Thank you.

      Comment


      • Hello,

        I ran the following command using the Picard tool SortSam:

        java -jar SortSam.jar I=../test/testsam.sam O=../test/sortedtest.sam SO=queryname

        My input file looks like this:

        @HD VN:1.0 SO:sorted
        @PG ID:TopHat VN:1.0.13 CL:/share/apps/bin/tophat -o ./s1 --solexa1.3-quals -p 2 GeneIndex /home/asanyal/data/Flydata/Exp_100423/100423_HWI-EAS313_0001_61G2CAAXX.birchlerj/s_1_sequence.txt
        HWI-EAS313_0001:1:80:8942:6680#0 0 FBgn0000003 1 3 42M * 0 0 CGGACTGGAAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGA GGGFGGGGGGGFFEGGGGGGGFGGGGGGDDGFGGGGGGGGFE NM:i:2
        HWI-EAS313_0001:1:108:8254:11808#0 0 FBgn0000003 9 3 42M * 0 0 AAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGAGGTCTGAT C::?>ACCCCD?EDEB=EEEEEECEE?:E??@C@CEBED=4? NM:i:0

        However, I get the following error message.

        Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Empty sequence dictionary.; Line 3
        Line: HWI-EAS313_0001:1:80:8942:6680#0 0 FBgn0000003 1 3 42M * 0 0 CGGACTGGAAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGA GGGFGGGGGGGFFEGGGGGGGFGGGGGGDDGFGGGGGGGGFE NM:i:2

        Do I have to give the program the reference sequences? Or do I need to create a sequence dictionary using CreateSequenceDictionary?

        Thanks
        Abhijit

        Comment


        • Originally posted by gen2prot View Post
          Hello,

          I ran the following command using the Picard tool SortSam:

          java -jar SortSam.jar I=../test/testsam.sam O=../test/sortedtest.sam SO=queryname

          My input file looks like this:

          @HD VN:1.0 SO:sorted
          @PG ID:TopHat VN:1.0.13 CL:/share/apps/bin/tophat -o ./s1 --solexa1.3-quals -p 2 GeneIndex /home/asanyal/data/Flydata/Exp_100423/100423_HWI-EAS313_0001_61G2CAAXX.birchlerj/s_1_sequence.txt
          HWI-EAS313_0001:1:80:8942:6680#0 0 FBgn0000003 1 3 42M * 0 0 CGGACTGGAAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGA GGGFGGGGGGGFFEGGGGGGGFGGGGGGDDGFGGGGGGGGFE NM:i:2
          HWI-EAS313_0001:1:108:8254:11808#0 0 FBgn0000003 9 3 42M * 0 0 AAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGAGGTCTGAT C::?>ACCCCD?EDEB=EEEEEECEE?:E??@C@CEBED=4? NM:i:0

          However, I get the following error message.

          Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Empty sequence dictionary.; Line 3
          Line: HWI-EAS313_0001:1:80:8942:6680#0 0 FBgn0000003 1 3 42M * 0 0 CGGACTGGAAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGA GGGFGGGGGGGFFEGGGGGGGFGGGGGGDDGFGGGGGGGGFE NM:i:2

          Do I have to give the program the reference sequences? Or do I need to create a sequence dictionary using CreateSequenceDictionary?

          Thanks
          Abhijit

          There are no "@SQ" header lines in your SAM file, which is why the sequence dictionary is empty. You could try giving it the reference sequences if they are not present.
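
A minimal sketch of how the missing "@SQ" header lines could be generated, assuming the reference genes are available in a FASTA file. The SAM format expects one "@SQ" line per reference sequence, with its name (SN) and length (LN), so having many different gene names is not a problem in itself. The function name sq_lines_from_fasta and the demo sequences below are made up for illustration; this is not part of Picard or samtools.

```python
import io


def sq_lines_from_fasta(handle):
    """Yield one @SQ header line per FASTA record.

    The sequence name is taken as the header text up to the first whitespace.
    """
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if name is not None:
                yield f"@SQ\tSN:{name}\tLN:{length}"
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        yield f"@SQ\tSN:{name}\tLN:{length}"


# Hypothetical two-gene reference; the sequences are invented for the demo.
demo_fasta = io.StringIO(">FBgn0000003 example\nACGTACGT\nACG\n>FBgn0000008\nTTTT\n")
sq_lines = list(sq_lines_from_fasta(demo_fasta))
print("\n".join(sq_lines))
```

The resulting lines would go between the "@HD" line and the first alignment record. For a real file, pass an open file handle instead of the StringIO object.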

          Comment


          • Hello nilshomer,

            I specified the SQ field but I still get an error, probably because the reference sequences are Drosophila gene sequences. The names of the reference sequences therefore differ unless two or more reads match the same gene. I cannot convert the names to a single reference sequence name, since I would lose information. I am stuck. Maybe I need to do a traditional Perl sort (very time-consuming for a 6 GB file). Is there a better way of doing this?

            Thanks
            Abhijit

            Comment


            • Hello,

              I was wondering if there is a score associated with each read in the SAM file that would indicate the strength of the match between the read and the subject sequence. The CIGAR string helps to some extent, but since "M" denotes either a match or a mismatch, I was wondering if there is a way to differentiate between the two. Something like an E-value or a BLAST score.

              Abhijit

              Comment


              • Hi guys,

                I have a quick question regarding the SAM CIGAR column. I understand the M operation designates both matches and mismatches. Is there a way to get the actual number of mismatches from the SAM format without comparing the tags or the sequence to the reference? Sorry if this question is a re-post. I tried searching, but couldn't find anything.

                Comment


                • Originally posted by gen2prot View Post
                  Hello,

                  I was wondering if there is a score associated with each read in the SAM file that would indicate the strength of the match between the read and the subject sequence. The CIGAR string helps to some extent, but since "M" denotes either a match or a mismatch, I was wondering if there is a way to differentiate between the two. Something like an E-value or a BLAST score.

                  Abhijit
                  Probably a good idea to create a new thread (this one is getting long!).
                  See the mapping quality field.

                  Originally posted by JohnK View Post
                  Hi guys,

                  I have a quick question regarding the SAM CIGAR column. I understand the M operation designates both matches and mismatches. Is there a way to get the actual number of mismatches from the SAM format without comparing the tags or the sequence to the reference? Sorry if this question is a re-post. I tried searching, but couldn't find anything.
                  Probably a good idea to create a new thread (this one is getting long!).
                  Try the NM optional tag if it is available (aligner specific).
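
As a rough illustration of the two suggestions above, here is a small sketch that pulls the mapping quality (column 5) and the optional NM tag out of a single SAM record. It assumes the record is tab-delimited, as SAM requires, and that the aligner wrote NM as an integer tag. The helper name mapq_and_nm is made up for this example, and the record is the first read posted earlier in this thread.

```python
def mapq_and_nm(sam_line):
    """Return (MAPQ, NM) for one SAM alignment line; NM is None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])  # column 5: mapping quality
    nm = None
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
    return mapq, nm


record = "\t".join([
    "HWI-EAS313_0001:1:80:8942:6680#0", "0", "FBgn0000003", "1", "3", "42M",
    "*", "0", "0",
    "CGGACTGGAAGGTTGGCAGCTTCTGTAATCACGCTTCTGTGA",
    "GGGFGGGGGGGFFEGGGGGGGFGGGGGGDDGFGGGGGGGGFE",
    "NM:i:2",
])
print(mapq_and_nm(record))
```

For this record the mapping quality is 3 and NM is 2. Whether NM is present at all depends on the aligner, so the None case should be handled.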

                  Comment


                  • Hi All,
                    Is a "sorted" BAM file smaller in size than an unsorted BAM file?
                    If that's the case, why is that so?

                    I sort a lot of BAM files using samtools, with this command:
                    samtools sort chr1-aligned.bam chr1-aligned.sorted

                    file size of chr1-aligned.bam ==> 353,618,735 bytes
                    but the file size of chr1-aligned.sorted.bam ==> 295,208,534 bytes

                    I have checked all my unsorted and sorted BAM files. All of the sorted BAM files are smaller in size than the unsorted ones.

                    Thanks.

                    Comment


                    • Sorted files are compressed better.
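
A toy sketch of why this can happen, offered as an illustration and not as a description of how samtools works internally. After coordinate sorting, neighbouring reads overlap, so a deflate-style compressor sees the same bases repeated close together. The example uses Python's zlib on simulated reads from a random reference; the reference size, read length, and read count are arbitrary assumptions, and real BAM files compress in separate blocks, so actual ratios will differ.

```python
import random
import zlib

random.seed(0)

# Simulated reference and read start positions (all values are arbitrary).
reference = "".join(random.choice("ACGT") for _ in range(100_000))
read_len = 50
starts = [random.randrange(len(reference) - read_len) for _ in range(20_000)]


def compressed_size(start_positions):
    """Compress 'position<TAB>sequence' records in the given order; return the byte count."""
    text = "\n".join(
        f"{pos}\t{reference[pos:pos + read_len]}" for pos in start_positions
    )
    return len(zlib.compress(text.encode("ascii")))


shuffled_size = compressed_size(starts)        # reads in random order
sorted_size = compressed_size(sorted(starts))  # reads ordered by position
print(f"unsorted: {shuffled_size} bytes, sorted: {sorted_size} bytes")
```

The same records are compressed in both cases; only their order changes, and the sorted order comes out smaller.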

                      Comment


                      • Originally posted by win804 View Post
                        Hi All,
                        Is a "sorted" BAM file smaller in size than an unsorted BAM file?
                        If that's the case, why is that so?...
                        You already asked this on a separate thread

                        Comment


                        • Thanks Li Heng. I just want to confirm that nothing is wrong with the sorted bam file.

                          Thanks a lot.

                          Comment


                          • Originally posted by maubp View Post
                            You already asked this on a separate thread
                            http://seqanswers.com/forums/showthread.php?t=5684
                            Yes, I wanted to delete the previous thread, but I have no idea how to do it. Any ideas?

                            Thanks.

                            Comment


                            • Originally posted by lh3 View Post
                              @corthay

                              You can convert with "blast2sam.pl -s" to save the sequence in SAM. Currently, samtools cannot parse SAM without sequence, although the specification allows this.

                              Hi Li,
                              My .sam file includes the sequence, but samtools view still gives such an error message. Could you please take a look at this post:


                              Thanks.

                              Comment


                              • maq2sam-long

                                Originally posted by lh3 View Post
                                maq2sam-short is for the .map files generated by maq-0.6.x, while maq2sam-long for files generated by maq-0.7.x. Sorry for the confusion, and one of the aims of SAM is to avoid such confusions in future.
                                The usage is maq2sam <in.map> [<readGroup>]. I would like to know how to use the optional parameter 'readGroup'. Can it add library information from the .map file to the SAM output?

                                Comment
