
  • After using FASTQC and Trim_Galore on my data, I used BWA with my first paired end...

    sequence.

    This is what the SAM file looked like. Is it supposed to look like this? I tried to Google the SAM format, but I wasn't sure what went wrong or whether I did it right.

    Code:
    2010	147	Serratia	4602318	54	74M	=	4601933	-459	GGCATCGGCGACCGCACCCTCGACGTTGTGCGCCAGGCGGCGCGCGATCGCCAACTGACGTTGTGGCGGGCGAC	@@9)):D@=0+<)D:3))<)3<5C+BCCCDCDE5+*>555+55+CE+C7EC>A86++A<@9<5+5++++<55<<	NM:i:5	AS:i:49	XS:i:0
    M00532:8:000000000-A17VF:1:1101:15815:2019	99	Serratia	5016228	60	227M	=	5016443	403	GTGCTGGCCGCCGCCGGCGCGCGCGTGATCCTCAACGGCTTCGGCGATGTGGAAGCGGCGAAGACGCAGGTTGCCCGGCTGGGCGCCGCGCCGGGGTATCACGGCGCCGATCTCGGCGATGCGGCCCAGATAGCGGACATGATGCAGTATGCCGAACGTGAGTTCGGCGGCGTGGACATTCTGGTGAACAACGCCGGCATTCAGCACGTGGCGCCGCTGGATCAGTT	??<??<9?BB@BBB<BCCCCCCHHAC7EF;FDGHHHEHEHHFC:CC:DDD@;;@@DEE@@7??EE8:6;;CEE?EEE?;;?AAE62;;;2;;?;??4?EEEE?8;????;??EEAEE;?;882;???EEEEEEE;?;8'??CEAEEEA:A:?C88;?;8ACE?:C)'8.2;2''48A*//?C::/:??AECE;;'';828CEE****.*.)'5?2;;;4;C?EEA0A	NM:i:16	AS:i:147	XS:i:0
    M00532:8:000000000-A17VF:1:1101:15815:2019	147	Serratia	5016443	60	188M	=	5016228	-403	GCTGGATCAGTTCCCGGTGGAGAAATGGAACGCCATCCTCGCCATCAACCTGTCGGCGGTGTTCCACACTTGCCGATTGGCGCTGCCGGGCATGCGCGAGCGCCACTGGGGGCGCATCATCAACGTAGCGTCGGTGCACGGGCTGGTGGCGTCGAAAGACAAGTCGGCCTATGTGGCGGCCAAGCACG	CEE:?*:**.';8'8AC?0E?*::*??EDE>:)CA2;4'5:::*?:CC?2DD?D?DDEEEE?:):C?:82?;EECE8E8D>?'DD?EC:?))''8EEE<EBECEEEBEEEEEEHHHHFHHHHHHHDDHEECHHDFE@HHHFHFFHHEHEADFDHHGGHF?HCCCEFC>AFFFDD@@7DBDBB??????	NM:i:6	AS:i:158	XS:i:19
    M00532:8:000000000-A17VF:1:1101:13390:2020	99	Serratia	4718589	60	252M	=	4719007	534	AAATTGCTGCAGGGGCGTTGGATGCAGGGCGAGGTGCAAACCTGCGACGGCCAAAGCATGAAACCGGGACTGGATGCCGCCTCCATCGTCTGGATCGAGAAGCGTGCCCGCAGCAGCAGCCGGCCGGTGAGCGTCGCCTGGCTGGAAGCGCCGGAAGGCAGCGAACTGCTGCTGGTGGCGAACGACGATTTCTGCAGCTGGTGACCGACAGAAGACCCACTATAAACAAGACCCCGCGCTGCGGAGCCTCTT	?????BBBD<B9B@DBFCCFFFFHFHF->EECAC+CCFHHHHH,?>CCHHBCHHHFHHFDFFHDFCEB>@EF@DDB??)@BEEFF;CAAA?AAEEFEEFFECEDEEEEFEDD8DFCCCE*?'8;DDD4'?*:;;EDDE88?AECEEEFF>DDDDD8AEEFE?DD>8C:C:**:*::*??>?D?;?;>;8AEFFFF?A***0**110;;'.'1??*?A*..00:**?*1?0*00..5'4;''.''....:*1:	NM:i:24	AS:i:132	XS:i:0
    M00532:8:000000000-A17VF:1:1101:13390:2020	147	Serratia	4719007	60	116M	=	4718589	-534	GCCGCTGGTGACGACGTCATCGCCCTGCAGGCGCGTACCGGCGTGGAAGACCTTACCGTCGGCGCTTTCCTGCTGCGGTAGCCCCTGGATAACCTCACCGTTGCGATAGTCGCCCG	<;8<(=;<96''666;6(2<<?29;E9<<;83@@@:8:=DEDEED@9ED;+D@CD5CCCC=CDEEEEEEDEC>CCCE@ECCCCEFGEEEEC8,CC+>@CC@@@@@@7@=<7==,9=	NM:i:7	AS:i:83	XS:i:0
    M00532:8:000000000-A17VF:1:1101:17000:2031	77	*	0	0	*	*	0	0	AGGGCAACCACAACCCTCTGATGCAAATGCTCTAATGATCGTCCCTCATCCGATTTAAGAGCATTGATTAAGAGAGGTAGCTCAAGAGACTTGTTAAGAGGACCACCTTCGGGATCTTC	????????DDDDEEDDGGGGGGIIHIIHIIIIHHIIGHHIIHIHHIIIIHIHHHHHHHIHIHIIIFGHHIIIIIHHHHHDHHIHDGFFEEGHIIIIIIHHDFFHHHHGHHHHHHHGGGG	AS:i:0	XS:i:0
    M00532:8:000000000-A17VF:1:1101:17000:2031	141	*	0	0	*	*	0	0	AACTCCGAAATCAGAGAATCGTTCAAAGGTCATGTTGCCGACCGACTTAAACGAATATCGACCCATGCGGGCTATGCAGTGTATCAGGATGTCTGGAGAGAGGTGCTGAGACATTGGGGTAACCCAGCACCTCAAGTTGATACAGAGTCACACCTAATAGATCTGTTTGAAATCGCTATCAATCGTGCTCGTTCACAAAAAAGGTTAT	?A??AB?ADDDBDDBBFGGGGGHHHIIHIEFFHHHIIIHHHHHHHHHHIHEFHHHIIFHIHHHHHIHHEHEHBFHHHHHD,4CFGEFGGGDFGFGEDDD@E@D4A>C->ACGGGGGGGG@CEEGCC8CEGGGCE??C*0:?CEGECCCEGCEGCEGG?CGECEEG:CEGGCC?*8C9CCCE:**.0CCC:C289??CGGCC288::::	AS:i:0	XS:i:0
    M00532:8:000000000-A17VF:1:1101:14261:2037	83	Serratia	3418147	60	148M106S	=	3418067	-228	TGATTTCTCACCAATCAATACCTCTGGGATCACCTACTCTAGAGAGATGGCGTGCAGGAAACTACCGCCGAATCGCAAGAGTTCTGCTCCAAATGCAACCAGCTGTAAATTTCCCGCGATCTGCTGTAATAACTCAATGAAACTTAAACCTTCCCGCAGCGACGAAAATAAATATAACAACGACGAGCCAGTGACCCAGACGTGAAATCTTCACTCATCGCGCGCTGACCTCGACCAGGCAGCTCATGGCGTTC	:1*:1*0*)***:*A:CA::C8??:A1::***00110*::EAA?AEEE?2)A0**EFEC:182..'DE?FFFD>EAFEEECFEFEA??*CEFFEFEFFFFEC:?AEFFEA8?D>DEFFFFEEECEEEFFEEEEA?CEC:CCAEECCC=BE@@EBE@:EEFEFADDFHHHHHHHHFFFBEDED=HHHHHHHHHFFHHFHHHHHGFHHHFHHHFGHHHHHEHEHHHFHHHHHHHFFFFFFDBBBDBBD?@??????	NM:i:3	AS:i:133	XS:i:0
    M00532:8:000000000-A17VF:1:1101:14261:2037	163	Serratia	3418067	60	149M	=	3418147	228	ACTATATGGCAGGCAAAAAAAAACCTACGCATCCGCGTAGGTTGGTGCAATTGAAAATGGCTTCAACATACAGAGTATGCTGATTTCTCACCAATCAATACCTCTGGGATCACCTACTCTAGAGAGATGGCGTGCAGGAAACTACCGCC	?????BBBDDDDDDDEGGGGGGHHHIHFEHHEHIEHE>FHDCGFCH;GHFHHGFIIIIIIBFFFHFDDC,@FDFDCFHHHHHHHHHFHFGGGGGFFFGEGGG.4D>D=EGGGGGGGGCEGGCCCE8;A->C8C2CE?CGCCGGGG:28A	NM:i:1	AS:i:144	XS:i:0
    M00532:8:000000000-A17VF:1:1101:14313:2053	99	Serratia	293636	60	15S208M	=	294091	552	TTTGCGTGCAGCTGATGAGGTTGCATTTTATTACAACTGTGTCTGCCGCTTTCGGAATCATGTTAATGATTACTTAAGAAATTCGGCTCACATTGAGGGCTTAACCCAAGGAGGCCTCAATGTTAAATGCGACCCGGCTGCAACTGATGAATCACTTCGCTTACCTGCAGCAATTTATGGCTTCACCGCGCACCGTCGGTACGCTGGCACCTTCTTCCCCGTG	?????@=?D-5<BDBBFCFFC;CFFF;>EAEFFFGHHHFBD?EGFGGC+>EFH@7EDFHHFGHFHHHHHDEHFHHHGGHHHHHGFDEHHGCEEHGHHHEEHFHHHHHHHFDD;D:FF,B?4DDFEEEEBDE@@<@BEBEBEEEE:EEC:A**::AAAEEAEEEEEE??CEECE:*::?AAC:8*0?*).8;8;''8?2>8?))58A2):C)?0::EE0:C8?A	NM:i:4	AS:i:188	XS:i:0
    M00532:8:000000000-A17VF:1:1101:14313:2053	147	Serratia	294091	60	97M	=	293636	-552	CTGCCTCTGCTGTCGATCCCGGTCAGGATCAGCGTGCGCATTCTTCAGCAGGCCCGGCAGCGGCTGCTGGCGCGCAACGGCACGCTGGTGCTGTTCC	9?::*7EE@:::*:8*@@@8+<+;+@:DC9;CCDDEEEEEEDEECA=EEDDC>C5ECC7CC@E;DA+@CCCC79EC+CA+E@<-@@@@@>>9===<=	NM:i:3	AS:i:85	XS:i:0
    M00532:8:000000000-A17VF:1:1101:11897:2065	99	Serratia	4185170	60	243M11S	=	4185331	352	TTTCCCTGAAAAGATAACGTATTGAGGATTCACCATGAGCATCAAAAATATTTTACCCGGCAAGATCGGTTTGGGCGGCGCGCCGCTCGGCAATATGTACCGCGCCATTCCAGAAGAAGAAGCGCGGGCTACCGTAACCAGCGCCTGGGACTTGGGCATCCGCTACTTCGACACCGCGCCGCTTTACGGTTCCGGCCTGTCGGAAATTCGCATGGGCGAAGCGCTTTCTCAGTACCCACGCGATGAGTTCGTAC	AAAAABBBDDDDDDDDGFEFGFFHHHFFHIFCFGHHFFHIIIHIIHHIHHHIIIIHHIHHHHHHFHIFFCFHEEHHHHHHEGGGEGGGGGGGGGGGGEGCCC'8>>EGGGGGCCGGGGGGGGEGGGGGGGCEC>EGGEGEEGGGDEEGGGGEEG?EGGGGGGGEDDGG?EGECAG8>A>DDGGGGGGEECECEC<>AG8:*8.48<*?EEEGECCEEA<24<G<)8??EGCC:::?C?C8C8<8CEG1:*08:C	NM:i:28	AS:i:103	XS:i:0
    M00532:8:000000000-A17VF:1:1101:11897:2065	147	Serratia	4185331	60	191M	=	4185170	-352	GCTACTTCGACACCGCGCCGCTTTACGGGTCCGGCCTGTCGGAAATTCGCATGGGCGAAGCGCTTTCTCAGTACCCACGCGATGAGTTCGTACTGAGCACTAAAGTGGGCCGCATCATGCTGGACGAAATGGAAGATCCCGCCGCCCGCGATCTGGGTGAGAAAGGCGGCCTGTTCGAACACGGTTTGAAA	::C?ECCC:88.428>D<C?C8)>8<85'>D<C80'88??CCEEADGGEEEEGGE>GGGGEECGGEEECECE>DGG>DEGGGGGGGGEEEC:GGGC?CEGGGGGGGGGG>DCGEGGGGGEGEE?ECC;GEGGGGGGGGGHDHHHHDHHHHHHHIIIIHIHGIHHFEEIIGGGGGGDDDDDDDDBBB?????	NM:i:22	AS:i:85	XS:i:0
    M00532:8:000000000-A17VF:1:1101:18758:2082	121	Serratia	2219275	60	247M	=	2219275	0	GTAGCTCCATGCCAACCCCAAGGCCAGAAAGCCGTTGTACAGCCCCTGATTGGCGGCCAACNCCCGCGTGGCTGCGGCGAACTCCGCCGTTGTGCCGAATGCGCGCCTGCCGAGTGGCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNAGCGCTGCGATGAGCAGCAGCAANATATCNGCAAAGAACTTCATGGTGGTNCCAGTCGTTTNTTGAAGATGTCATCAATTATAAACT	:CCCEC?9?C:'2..88::CCC:0*:E:.4'..*C:CCECAGDC:**GE><4'<?:?.000#5DDE<EGE<A>D>CDC:?8GGGG>GGGGGGDGGGGGGGGGGGGGGGGGEGEC??444#####################################F?6#HHHHHHHIIIIIIIHIIIHFCA5#EFEA5#IHHIIHHHIHHIHIHFFFA7#HHEHHHFFA7#HFHGGGGGGDDDDDDDDBBBA????	NM:i:56	AS:i:100	XS:i:0
    Last edited by prs321; 06-14-2013, 11:12 AM.
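One way to check whether lines like these are well formed is to split each one on tabs and look at the 11 mandatory SAM columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), followed by optional TAG:TYPE:VALUE fields. The following is only a minimal sketch: `SamRecord` and `parse_sam_line` are hypothetical helper names, and the example line is the unmapped read from the output above with SEQ and QUAL shortened for readability.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns plus a dict of optional tags."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-separated alignment line into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(fields)}")
    tags = {}
    for tag in fields[11:]:
        # Optional fields look like NM:i:5 (name, type, value).
        name, typ, value = tag.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )


# Shortened copy of an unmapped read from the output above (SEQ/QUAL truncated).
example = (
    "M00532:8:000000000-A17VF:1:1101:17000:2031\t77\t*\t0\t0\t*\t*\t0\t0\t"
    "AGGG\t????\tAS:i:0\tXS:i:0"
)
record = parse_sam_line(example)
print(record.flag, record.rname, record.tags)
```

A line that raises `ValueError` here, or whose SEQ and QUAL lengths differ, would be worth a closer look. Header lines starting with `@` are not alignment records and should be skipped before calling the parser.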

  • #2
    It is hard to read since you are not using the Quote/Code tags (you will find those under the "Go advanced" tab when you are editing a post; highlight the text that you want to quote/code and then use the "#" button in the second row of icons).

    It does look right (have you clipped off some lines from the top?). Here is a simple explanation of SAM: http://genome.sph.umich.edu/wiki/SAM
    Actual format specification: http://samtools.sourceforge.net/SAM1.pdf
    SAM flag meaning: http://picard.sourceforge.net/explain-flags.html
    Last edited by GenoMax; 06-14-2013, 11:14 AM.
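The FLAG column (second field) is a bit field, so values such as 99, 147, 77 and 141 in the output above can be decoded by testing each bit. A minimal sketch follows; the bit values come from the SAM specification, while the short labels and the `decode_flag` name are this example's own choices.

```python
from typing import List

# Bit values as defined in the SAM specification; labels are informal.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


# Flags that appear in the posted output.
for value in (99, 147, 77, 141, 83, 163, 121):
    print(value, decode_flag(value))
```

Read this way, 99 and 147 describe the two mates of a properly paired alignment, and 77 and 141 describe a pair in which neither mate mapped, which matches the `*` reference name on those lines.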


    • #3
      It has been fixed. What is the next step? I have 20 other sequences. Do I map them to the same SAM file, or will I end up having 21 separate SAM files that I end up mapping together?


      • #4
        Originally posted by prs321
        It has been fixed. What is the next step? I have 20 other sequences. Do I map them to the same SAM file, or will I end up having 21 separate SAM files that I end up mapping together?
        What are you trying to do exactly?


        • #5
          Trying to map the 21 paired end sequences of Serratia over the reference. Then I think I'm supposed to do some sort of analysis.


          • #6
            Reference Genome *


            • #7
              Originally posted by prs321
              Trying to map the 21 paired end sequences of Serratia over the reference.
              If these are separate samples, then you need to do the alignments separately. By creating these SAM files, you are mapping the reads to the reference sequence.

              Originally posted by prs321
              Then I think I'm supposed to do some sort of analysis.
              Are you looking to identify SNPs/SVs, or are you going to do some sort of phylogenetic analysis?
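Aligning each sample separately means one alignment run, and one SAM file, per pair of FASTQ files. As a rough sketch, the commands could be generated like this. It assumes `bwa mem` is the algorithm in use and that the reference has already been indexed once with `bwa index`; the sample names, FASTQ file names and reference file name are all hypothetical placeholders.

```python
from typing import List, Tuple


def bwa_mem_command(reference: str, read1: str, read2: str) -> List[str]:
    """Build a paired-end `bwa mem` command; SAM output goes to stdout."""
    return ["bwa", "mem", reference, read1, read2]


def plan_alignments(reference: str, samples: List[str]) -> List[Tuple[List[str], str]]:
    """Return one (command, output SAM path) pair per sample."""
    plan = []
    for name in samples:
        # Hypothetical naming scheme: <sample>_R1.fastq / <sample>_R2.fastq
        cmd = bwa_mem_command(reference, f"{name}_R1.fastq", f"{name}_R2.fastq")
        plan.append((cmd, f"{name}.sam"))
    return plan


# 20 specimens plus the ancestral sample, as described in the thread.
samples = [f"sample{i:02d}" for i in range(1, 21)] + ["ancestor"]
plan = plan_alignments("serratia_ref.fasta", samples)

for cmd, out in plan[:2]:
    print(" ".join(cmd), ">", out)

# To actually run the plan (requires bwa on PATH and an indexed reference):
# import subprocess
# for cmd, out in plan:
#     with open(out, "w") as handle:
#         subprocess.run(cmd, stdout=handle, check=True)
```

Keeping one output file per sample preserves sample identity, which matters for any downstream per-sample comparison against the ancestor.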


              • #8
                After using FASTQC and Trim_Galore on my data, I used BWA with my first paired end..

                What do you mean by 21 sequences?

                Do you have 21 fastq files from one sample, or do you have files that are from 21 different samples?

                How many reads are in each file?


                • #9
                  Sorry, let me clarify.

                  I have 21 pairs of fastq files. The first 20 pairs (40 fastq files) are the paired end files for the specimens examined. The last pair is the ancestral one.

                  I think the read lengths vary from about 20 to 255 bp.
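The number of reads and the range of read lengths can be checked directly rather than estimated. A minimal sketch, assuming plain (uncompressed) FASTQ with exactly four lines per record; `fastq_length_stats` is a hypothetical helper name and the two-read input is made up for illustration.

```python
import io
from typing import Iterable, Tuple


def fastq_length_stats(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (read_count, min_length, max_length) for 4-line FASTQ records."""
    count = 0
    shortest = None
    longest = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record holds the sequence
            n = len(line.strip())
            count += 1
            shortest = n if shortest is None else min(shortest, n)
            longest = max(longest, n)
    return count, (shortest or 0), longest


# Toy input with two reads; a real file would be passed as open(path).
toy = io.StringIO("@r1\nACGT\n+\n????\n@r2\nACGTAC\n+\n??????\n")
stats = fastq_length_stats(toy)
print(stats)
```

For a real file, pass `open("reads.fastq")` instead of the `StringIO` object, or `gzip.open("reads.fastq.gz", "rt")` for a gzipped file.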


                  • #10
                    And I'm not really sure what I'm supposed to do after aligning and mapping; that's up to the person who is higher up from me.

                    I was supposed to first clean the files, which I did using Trim_Galore.

                    Now I'm supposed to learn how to align and map.

                    The final step is some sort of analysis, but I'm not too sure what. Probably looking at SNPs and other mutations and then seeing whether the results are consistent with the paper.


                    • #11
                      bump 10char


                      • #12
                        It's impossible to help you when you can't even answer basic questions about what you are trying to do. You should talk to your boss/professor and get a clear idea of what the goal is. Then make use of Google, Google Scholar, and PubMed and read papers on the subject. Go to the websites of the actual tools you use and read the manuals. After you have done your homework, ask specific questions that you still have. As a researcher, one of the most important skills (perhaps the most important) is being able to find information on your own.
