  • Error details: Read FCC03A6ABXX:3:2107:11142:198335#TAGCTTAT is missing read group

    I am new to BAM files and the GATK tools. I want to call variants from a BAM file and write them to a VCF by running

    Code:
    java -jar /media/zwang10/Elements/UK10K/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /media/zwang10/Elements/UK10K/human_g1k_v37.fasta -T HaplotypeCaller -I _EGAR00001038931_36843.pe.raw.sorted.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants.vcf

    But I got

    Code:
    INFO  19:11:19,792 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  19:11:19,798 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56 
    INFO  19:11:19,798 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  19:11:19,799 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  19:11:19,807 HelpFormatter - Program Args: -R /media/zwang10/Elements/UK10K/human_g1k_v37.fasta -T HaplotypeCaller -I _EGAR00001038931_36843.pe.raw.sorted.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants.vcf 
    INFO  19:11:19,820 HelpFormatter - Executing as zwang10@zwang10-K55N on Linux 3.13.0-74-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_91-b02. 
    INFO  19:11:19,821 HelpFormatter - Date/Time: 2016/01/03 19:11:19 
    INFO  19:11:19,822 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  19:11:19,823 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  19:11:20,220 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  19:11:20,537 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500 
    INFO  19:11:20,554 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  19:11:20,783 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.23 
    INFO  19:11:20,887 HCMappingQualityFilter - Filtering out reads with MAPQ < 20 
    INFO  19:11:21,120 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
    INFO  19:11:22,436 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  19:11:22,437 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  19:11:22,439 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining 
    INFO  19:11:22,440 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime 
    INFO  19:11:22,441 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output 
    INFO  19:11:22,562 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values. 
    WARN  19:11:22,563 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples. 
    INFO  19:11:22,565 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values. 
    INFO  19:11:22,930 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units 
    INFO  19:11:27,999 GATKRunReport - Uploaded run statistics report to AWS S3 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 3.5-0-g36282e4): 
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: SAM/BAM/CRAM file htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter@51762faf is malformed. Please see http://gatkforums.broadinstitute.org/discussion/1317/collected-faqs-about-input-files-for-sequence-read-data-bam-cramfor more information. Error details: Read FCC03A6ABXX:3:2107:11142:198335#TAGCTTAT is missing the read group (RG) tag, which is required by the GATK. Please see http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem
    ##### ERROR ------------------------------------------------------------------------------------------
    zwang10@zwang10-K55N:/media/zwang10/Elements/UK10K$ java -jar /media/zwang10/Elements/UK10K/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /media/zwang10/Elements/UK10K/human_g1k_v37.fasta -T HaplotypeCaller -I _EGAR00001038931_36843.pe.raw.sorted.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants.vcf^C
    zwang10@zwang10-K55N:/media/zwang10/Elements/UK10K$ java -jar /media/zwang10/Elements/UK10K/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /media/zwang10/Elements/UK10K/human_g1k_v37.fasta -T HaplotypeCaller -I _EGAR00001038931_36843.pe.raw.sorted.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants.vcf > error
    INFO  19:14:00,776 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  19:14:00,783 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56 
    INFO  19:14:00,784 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  19:14:00,785 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  19:14:00,793 HelpFormatter - Program Args: -R /media/zwang10/Elements/UK10K/human_g1k_v37.fasta -T HaplotypeCaller -I _EGAR00001038931_36843.pe.raw.sorted.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants.vcf 
    INFO  19:14:00,806 HelpFormatter - Executing as zwang10@zwang10-K55N on Linux 3.13.0-74-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_91-b02. 
    INFO  19:14:00,807 HelpFormatter - Date/Time: 2016/01/03 19:14:00 
    INFO  19:14:00,808 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  19:14:00,808 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  19:14:01,199 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  19:14:01,500 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500 
    INFO  19:14:01,517 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  19:14:01,668 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.15 
    INFO  19:14:01,739 HCMappingQualityFilter - Filtering out reads with MAPQ < 20 
    INFO  19:14:01,982 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
    INFO  19:14:03,265 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  19:14:03,266 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  19:14:03,268 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining 
    INFO  19:14:03,269 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime 
    INFO  19:14:03,270 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output 
    INFO  19:14:03,390 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values. 
    WARN  19:14:03,391 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples. 
    INFO  19:14:03,393 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values. 
    INFO  19:14:03,675 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units 
    INFO  19:14:08,680 GATKRunReport - Uploaded run statistics report to AWS S3 
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 3.5-0-g36282e4): 
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: SAM/BAM/CRAM file htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter@5c0bb1d5 is malformed. Please see http://gatkforums.broadinstitute.org/discussion/1317/collected-faqs-about-input-files-for-sequence-read-data-bam-cramfor more information. Error details: Read FCC03A6ABXX:3:2107:11142:198335#TAGCTTAT is missing the read group (RG) tag, which is required by the GATK. Please see http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem
    ##### ERROR ------------------------------------------------------------------------------------------
    Is there a way to add the missing RG tag?

  • #2
    If the reads in your BAM file do not have read groups, they need to be added with Picard's AddOrReplaceReadGroups tool. See this recent thread: http://seqanswers.com/forums/showthread.php?t=64005

    Read groups are required by GATK.
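
A minimal sketch of what that Picard step could look like for the BAM in this thread. The header posted further down lists a single @RG line (ID:1, SM:UK10K_15792), so the values below are copied from it. The jar location (`PICARD_JAR`) and the output name `with_rg.bam` are assumptions for illustration, not something given in the thread. The script only prints the command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch: assemble a Picard AddOrReplaceReadGroups command for a BAM whose
# reads lack the RG tag. Nothing is executed here; the command is only printed.

# Assumption: location of the Picard jar (hypothetical, adjust to your install).
PICARD_JAR="${PICARD_JAR:-picard.jar}"

build_rg_cmd() {
    # $1 = input BAM, $2 = output BAM (output name is a hypothetical choice)
    # RG values are taken from the @RG header line posted in this thread.
    printf '%s ' java -jar "$PICARD_JAR" AddOrReplaceReadGroups \
        "I=$1" "O=$2" \
        RGID=1 \
        RGPL=ILLUMINA \
        RGLB=HUMggcRLCDIAAPEI-10 \
        RGPU=110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10 \
        RGSM=UK10K_15792 \
        CREATE_INDEX=true
}

cmd=$(build_rg_cmd _EGAR00001038931_36843.pe.raw.sorted.bam with_rg.bam)
echo "$cmd"
```

If the printed command looks right, it can be pasted into the terminal. Afterwards, the first record from `samtools view with_rg.bam` should carry an `RG:Z:1` field, and the new file would be the one passed to GATK with `-I`.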


    • #3
      Originally posted by GenoMax
      If your BAM file does not have read groups then they would need to be added using Picard's AddOrReplaceReadGroups tool. See this recent thread: http://seqanswers.com/forums/showthread.php?t=64005

      Read groups are required by GATK.
      I ran
      Code:
      samtools view -H _EGAR00001038931_36843.pe.raw.sorted.bam
      And I got
      Code:
      @HD	VN:1.0	SO:coordinate
      @SQ	SN:1	LN:249250621	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1b22b98cdeb4a9304cb5d48026a85128	SP:Human
      @SQ	SN:2	LN:243199373	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:a0d9851da00400dec1098a9255ac712e	SP:Human
      @SQ	SN:3	LN:198022430	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:fdfd811849cc2fadebc929bb925902e5	SP:Human
      @SQ	SN:4	LN:191154276	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:23dccd106897542ad87d2765d28a19a1	SP:Human
      @SQ	SN:5	LN:180915260	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:0740173db9ffd264d728f32784845cd7	SP:Human
      @SQ	SN:6	LN:171115067	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1d3a93a248d92a729ee764823acbbc6b	SP:Human
      @SQ	SN:7	LN:159138663	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:618366e953d6aaad97dbe4777c29375e	SP:Human
      @SQ	SN:8	LN:146364022	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:96f514a9929e410c6651697bded59aec	SP:Human
      @SQ	SN:9	LN:141213431	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:3e273117f15e0a400f01055d9f393768	SP:Human
      @SQ	SN:10	LN:135534747	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:988c28e000e84c26d552359af1ea2e1d	SP:Human
      @SQ	SN:11	LN:135006516	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:98c59049a2df285c76ffb1c6db8f8b96	SP:Human
      @SQ	SN:12	LN:133851895	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:51851ac0e1a115847ad36449b0015864	SP:Human
      @SQ	SN:13	LN:115169878	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:283f8d7892baa81b510a015719ca7b0b	SP:Human
      @SQ	SN:14	LN:107349540	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:98f3cae32b2a2e9524bc19813927542e	SP:Human
      @SQ	SN:15	LN:102531392	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:e5645a794a8238215b2cd77acb95a078	SP:Human
      @SQ	SN:16	LN:90354753	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:fc9b1a7b42b97a864f56b348b06095e6	SP:Human
      @SQ	SN:17	LN:81195210	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:351f64d4f4f9ddd45b35336ad97aa6de	SP:Human
      @SQ	SN:18	LN:78077248	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c	SP:Human
      @SQ	SN:19	LN:59128983	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1aacd71f30db8e561810913e0b72636d	SP:Human
      @SQ	SN:20	LN:63025520	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:0dec9660ec1efaaf33281c0d5ea2560f	SP:Human
      @SQ	SN:21	LN:48129895	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:2979a6085bfe28e3ad6f552f361ed74d	SP:Human
      @SQ	SN:22	LN:51304566	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:a718acaa6135fdca8357d5bfe94211dd	SP:Human
      @SQ	SN:X	LN:155270560	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:7e0e2e580297b7764e31dbc80c2540dd	SP:Human
      @SQ	SN:Y	LN:59373566	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1fa3474750af0948bdf97d5a0ee52e51	SP:Human
      @SQ	SN:MT	LN:16569	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:c68f52674c9fb33aef52dcf399755519	SP:Human
      @SQ	SN:GL000207.1	LN:4262	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:f3814841f1939d3ca19072d9e89f3fd7	SP:Human
      @SQ	SN:GL000226.1	LN:15008	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1c1b2cd1fccbc0a99b6a447fa24d1504	SP:Human
      @SQ	SN:GL000229.1	LN:19913	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:d0f40ec87de311d8e715b52e4c7062e1	SP:Human
      @SQ	SN:GL000231.1	LN:27386	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:ba8882ce3a1efa2080e5d29b956568a4	SP:Human
      @SQ	SN:GL000210.1	LN:27682	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:851106a74238044126131ce2a8e5847c	SP:Human
      @SQ	SN:GL000239.1	LN:33824	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:99795f15702caec4fa1c4e15f8a29c07	SP:Human
      @SQ	SN:GL000235.1	LN:34474	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:118a25ca210cfbcdfb6c2ebb249f9680	SP:Human
      @SQ	SN:GL000201.1	LN:36148	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:dfb7e7ec60ffdcb85cb359ea28454ee9	SP:Human
      @SQ	SN:GL000247.1	LN:36422	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:7de00226bb7df1c57276ca6baabafd15	SP:Human
      @SQ	SN:GL000245.1	LN:36651	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:89bc61960f37d94abf0df2d481ada0ec	SP:Human
      @SQ	SN:GL000197.1	LN:37175	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:6f5efdd36643a9b8c8ccad6f2f1edc7b	SP:Human
      @SQ	SN:GL000203.1	LN:37498	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:96358c325fe0e70bee73436e8bb14dbd	SP:Human
      @SQ	SN:GL000246.1	LN:38154	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:e4afcd31912af9d9c2546acf1cb23af2	SP:Human
      @SQ	SN:GL000249.1	LN:38502	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1d78abec37c15fe29a275eb08d5af236	SP:Human
      @SQ	SN:GL000196.1	LN:38914	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:d92206d1bb4c3b4019c43c0875c06dc0	SP:Human
      @SQ	SN:GL000248.1	LN:39786	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:5a8e43bec9be36c7b49c84d585107776	SP:Human
      @SQ	SN:GL000244.1	LN:39929	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:0996b4475f353ca98bacb756ac479140	SP:Human
      @SQ	SN:GL000238.1	LN:39939	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:131b1efc3270cc838686b54e7c34b17b	SP:Human
      @SQ	SN:GL000202.1	LN:40103	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:06cbf126247d89664a4faebad130fe9c	SP:Human
      @SQ	SN:GL000234.1	LN:40531	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:93f998536b61a56fd0ff47322a911d4b	SP:Human
      @SQ	SN:GL000232.1	LN:40652	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:3e06b6741061ad93a8587531307057d8	SP:Human
      @SQ	SN:GL000206.1	LN:41001	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:43f69e423533e948bfae5ce1d45bd3f1	SP:Human
      @SQ	SN:GL000240.1	LN:41933	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:445a86173da9f237d7bcf41c6cb8cc62	SP:Human
      @SQ	SN:GL000236.1	LN:41934	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:fdcd739913efa1fdc64b6c0cd7016779	SP:Human
      @SQ	SN:GL000241.1	LN:42152	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:ef4258cdc5a45c206cea8fc3e1d858cf	SP:Human
      @SQ	SN:GL000243.1	LN:43341	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:cc34279a7e353136741c9fce79bc4396	SP:Human
      @SQ	SN:GL000242.1	LN:43523	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:2f8694fc47576bc81b5fe9e7de0ba49e	SP:Human
      @SQ	SN:GL000230.1	LN:43691	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:b4eb71ee878d3706246b7c1dbef69299	SP:Human
      @SQ	SN:GL000237.1	LN:45867	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:e0c82e7751df73f4f6d0ed30cdc853c0	SP:Human
      @SQ	SN:GL000233.1	LN:45941	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:7fed60298a8d62ff808b74b6ce820001	SP:Human
      @SQ	SN:GL000204.1	LN:81310	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:efc49c871536fa8d79cb0a06fa739722	SP:Human
      @SQ	SN:GL000198.1	LN:90085	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:868e7784040da90d900d2d1b667a1383	SP:Human
      @SQ	SN:GL000208.1	LN:92689	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:aa81be49bf3fe63a79bdc6a6f279abf6	SP:Human
      @SQ	SN:GL000191.1	LN:106433	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:d75b436f50a8214ee9c2a51d30b2c2cc	SP:Human
      @SQ	SN:GL000227.1	LN:128374	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:a4aead23f8053f2655e468bcc6ecdceb	SP:Human
      @SQ	SN:GL000228.1	LN:129120	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:c5a17c97e2c1a0b6a9cc5a6b064b714f	SP:Human
      @SQ	SN:GL000214.1	LN:137718	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:46c2032c37f2ed899eb41c0473319a69	SP:Human
      @SQ	SN:GL000221.1	LN:155397	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:3238fb74ea87ae857f9c7508d315babb	SP:Human
      @SQ	SN:GL000209.1	LN:159169	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:f40598e2a5a6b26e84a3775e0d1e2c81	SP:Human
      @SQ	SN:GL000218.1	LN:161147	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1d708b54644c26c7e01c2dad5426d38c	SP:Human
      @SQ	SN:GL000220.1	LN:161802	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:fc35de963c57bf7648429e6454f1c9db	SP:Human
      @SQ	SN:GL000213.1	LN:164239	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:9d424fdcc98866650b58f004080a992a	SP:Human
      @SQ	SN:GL000211.1	LN:166566	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:7daaa45c66b288847b9b32b964e623d3	SP:Human
      @SQ	SN:GL000199.1	LN:169874	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:569af3b73522fab4b40995ae4944e78e	SP:Human
      @SQ	SN:GL000217.1	LN:172149	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:6d243e18dea1945fb7f2517615b8f52e	SP:Human
      @SQ	SN:GL000216.1	LN:172294	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:642a232d91c486ac339263820aef7fe0	SP:Human
      @SQ	SN:GL000215.1	LN:172545	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:5eb3b418480ae67a997957c909375a73	SP:Human
      @SQ	SN:GL000205.1	LN:174588	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:d22441398d99caf673e9afb9a1908ec5	SP:Human
      @SQ	SN:GL000219.1	LN:179198	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:f977edd13bac459cb2ed4a5457dba1b3	SP:Human
      @SQ	SN:GL000224.1	LN:179693	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:d5b2fc04f6b41b212a4198a07f450e20	SP:Human
      @SQ	SN:GL000223.1	LN:180455	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:399dfa03bf32022ab52a846f7ca35b30	SP:Human
      @SQ	SN:GL000195.1	LN:182896	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:5d9ec007868d517e73543b005ba48535	SP:Human
      @SQ	SN:GL000212.1	LN:186858	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:563531689f3dbd691331fd6c5730a88b	SP:Human
      @SQ	SN:GL000222.1	LN:186861	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:6fe9abac455169f50470f5a6b01d0f59	SP:Human
      @SQ	SN:GL000200.1	LN:187035	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:75e4c8d17cd4addf3917d1703cacaf25	SP:Human
      @SQ	SN:GL000193.1	LN:189789	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:dbb6e8ece0b5de29da56601613007c2a	SP:Human
      @SQ	SN:GL000194.1	LN:191469	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:6ac8f815bf8e845bb3031b73f812c012	SP:Human
      @SQ	SN:GL000225.1	LN:211173	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:63945c3e6962f28ffd469719a747e73c	SP:Human
      @SQ	SN:GL000192.1	LN:547496	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:325ba9e808f669dfeee210fdd7b470ac	SP:Human
      @RG	ID:1	PL:ILLUMINA	PU:110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10	LB:HUMggcRLCDIAAPEI-10	PI:478	DS:Study UK10K_COHORT_TWINSUK: The UK10K project proposes a series of complementary genetic approaches to find new low-frequency/rare variants contributing to disease phenotypes.  These will be based on obtaining the genome-wide sequence of 4000 samples from the TwinsUK and ALSPAC cohorts (at 6x sequence coverage), and the exome sequence (protein-coding regions and related conserved sequence) of 6000 samples selected for extreme phenotypes.  Our studies will focus primarily on cardiovascular-related quantitative traits, obesity and related metabolic traits, neurodevelopmental disorders and a limited number of extreme clinical phenotypes that will provide proof-of-concept for future familial trait sequencing.  We will directly analyse quantitative traits in the cohorts and the selected traits in the extreme samples, and also use imputation down to 0.1% allele frequency to extend the analyses to further sample sets with genome wide genotype data. In each case we will investigate indels and larger structural variants as well as SNPs, and use statistical methods that combine rare variants in a locus or pathway as well as single-variant approaches.	DT:2010-12-15T00:00:00+0000	SM:UK10K_15792	CN:SC
      @PG	ID:filter_adapter	PN:filter_adapter.pl	VN:1.0	DS:filter adapters	CL:perl /software/filter_adapter.pl  /HKC10040_HUMggcR/HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10/read_1.adapter.list /HKC10040_HUMggcR/HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10/read_2.adapter.list /HKC10040_HUMggcR/HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10_1.fq.gz /HKC10040_HUMggcR/HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10_2.fq.gz read_1.fq.gz read_2.fq.gz
      @PG	ID:bwa_aln	PN:bwa	PP:filter_adapter	VN:0.5.9rc1 (r1561)	DS:bwa alignment from fastq files	CL:/software/bwa aln -q 15 -t 4 /data/human_g1k_v37.fasta read_1.fq.gz > 1.sai;/software/bwa aln -q 15 -t 4 /data/human_g1k_v37.fasta read_2.fq.gz > 2.sai
      @PG	ID:bwa_sam	PN:bwa	PP:bwa_aln	VN:0.5.9rc1 (r1561)	DS:bwa converting alignemtns to sam	CL:/software/bwa sampe /data/human_g1k_v37.fasta 1.sai 2.sai read_1.fq.gz read_2.fq.gz > 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sam
      @PG	ID:sam_to_bam	PN:samtools	PP:bwa_sam	VN:0.1.8 (r613) DS:convert sam to bam	CL:/software/samtools view -b -u -S 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sam > 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.bam
      @PG	ID:bam_sort	PN:samtools	PP:sam_to_bam	VN:0.1.8 (r613)	DS:sort bam file	CL:/software/samtools sort 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.bam 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sort
      @PG	ID:remove_duplication	PN:samtools	PP:bam_sort	VN:0.1.8 (r613)	DS:remove duplication	CL:/software/samtools rmdup 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sort.bam 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sort.rmdup.bam
      @PG	ID:add_header	PN:samtools	PP:remove_duplication	VN:0.1.8 (r613)	DS:add PG, RG and SQ tag in bam header	CL:/software/samtools reheader bam.header 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sort.rmdup.bam > 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sort.rmdup.reheader.bam
      zwang10@office:/media/zwang10/Elements/UK10K$ samtools view -H _EGAR00001038931_36843.pe.raw.sorted.bam > head
      zwang10@office:/media/zwang10/Elements/UK10K$ samtools view -H _EGAR00001038931_36843.pe.raw.sorted.bam
      @HD	VN:1.0	SO:coordinate
      @SQ	SN:1	LN:249250621	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1b22b98cdeb4a9304cb5d48026a85128	SP:Human
      @SQ	SN:2	LN:243199373	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:a0d9851da00400dec1098a9255ac712e	SP:Human
      @SQ	SN:3	LN:198022430	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:fdfd811849cc2fadebc929bb925902e5	SP:Human
      @SQ	SN:4	LN:191154276	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:23dccd106897542ad87d2765d28a19a1	SP:Human
      @SQ	SN:5	LN:180915260	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:0740173db9ffd264d728f32784845cd7	SP:Human
      @SQ	SN:6	LN:171115067	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1d3a93a248d92a729ee764823acbbc6b	SP:Human
      @SQ	SN:7	LN:159138663	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:618366e953d6aaad97dbe4777c29375e	SP:Human
      @SQ	SN:8	LN:146364022	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:96f514a9929e410c6651697bded59aec	SP:Human
      @SQ	SN:9	LN:141213431	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:3e273117f15e0a400f01055d9f393768	SP:Human
      @SQ	SN:10	LN:135534747	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:988c28e000e84c26d552359af1ea2e1d	SP:Human
      @SQ	SN:11	LN:135006516	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:98c59049a2df285c76ffb1c6db8f8b96	SP:Human
      @SQ	SN:12	LN:133851895	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:51851ac0e1a115847ad36449b0015864	SP:Human
      @SQ	SN:13	LN:115169878	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:283f8d7892baa81b510a015719ca7b0b	SP:Human
      @SQ	SN:14	LN:107349540	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:98f3cae32b2a2e9524bc19813927542e	SP:Human
      @SQ	SN:15	LN:102531392	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:e5645a794a8238215b2cd77acb95a078	SP:Human
      @SQ	SN:16	LN:90354753	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:fc9b1a7b42b97a864f56b348b06095e6	SP:Human
      @SQ	SN:17	LN:81195210	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:351f64d4f4f9ddd45b35336ad97aa6de	SP:Human
      @SQ	SN:18	LN:78077248	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c	SP:Human
      @SQ	SN:19	LN:59128983	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1aacd71f30db8e561810913e0b72636d	SP:Human
      @SQ	SN:20	LN:63025520	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:0dec9660ec1efaaf33281c0d5ea2560f	SP:Human
      @SQ	SN:21	LN:48129895	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:2979a6085bfe28e3ad6f552f361ed74d	SP:Human
      @SQ	SN:22	LN:51304566	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:a718acaa6135fdca8357d5bfe94211dd	SP:Human
      @SQ	SN:X	LN:155270560	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:7e0e2e580297b7764e31dbc80c2540dd	SP:Human
      @SQ	SN:Y	LN:59373566	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1fa3474750af0948bdf97d5a0ee52e51	SP:Human
      @SQ	SN:MT	LN:16569	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:c68f52674c9fb33aef52dcf399755519	SP:Human
      @SQ	SN:GL000207.1	LN:4262	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:f3814841f1939d3ca19072d9e89f3fd7	SP:Human
      @SQ	SN:GL000226.1	LN:15008	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1c1b2cd1fccbc0a99b6a447fa24d1504	SP:Human
      @SQ	SN:GL000229.1	LN:19913	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:d0f40ec87de311d8e715b52e4c7062e1	SP:Human
      @SQ	SN:GL000231.1	LN:27386	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:ba8882ce3a1efa2080e5d29b956568a4	SP:Human
      @SQ	SN:GL000210.1	LN:27682	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:851106a74238044126131ce2a8e5847c	SP:Human
      @SQ	SN:GL000239.1	LN:33824	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:99795f15702caec4fa1c4e15f8a29c07	SP:Human
      @SQ	SN:GL000235.1	LN:34474	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:118a25ca210cfbcdfb6c2ebb249f9680	SP:Human
      @SQ	SN:GL000201.1	LN:36148	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:dfb7e7ec60ffdcb85cb359ea28454ee9	SP:Human
      @SQ	SN:GL000247.1	LN:36422	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:7de00226bb7df1c57276ca6baabafd15	SP:Human
      @SQ	SN:GL000245.1	LN:36651	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:89bc61960f37d94abf0df2d481ada0ec	SP:Human
      @SQ	SN:GL000197.1	LN:37175	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:6f5efdd36643a9b8c8ccad6f2f1edc7b	SP:Human
      @SQ	SN:GL000203.1	LN:37498	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:96358c325fe0e70bee73436e8bb14dbd	SP:Human
      @SQ	SN:GL000246.1	LN:38154	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:e4afcd31912af9d9c2546acf1cb23af2	SP:Human
      @SQ	SN:GL000249.1	LN:38502	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1d78abec37c15fe29a275eb08d5af236	SP:Human
      @SQ	SN:GL000196.1	LN:38914	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:d92206d1bb4c3b4019c43c0875c06dc0	SP:Human
      @SQ	SN:GL000248.1	LN:39786	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:5a8e43bec9be36c7b49c84d585107776	SP:Human
      @SQ	SN:GL000244.1	LN:39929	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:0996b4475f353ca98bacb756ac479140	SP:Human
      @SQ	SN:GL000238.1	LN:39939	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:131b1efc3270cc838686b54e7c34b17b	SP:Human
      @SQ	SN:GL000202.1	LN:40103	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:06cbf126247d89664a4faebad130fe9c	SP:Human
      @SQ	SN:GL000234.1	LN:40531	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:93f998536b61a56fd0ff47322a911d4b	SP:Human
      @SQ	SN:GL000232.1	LN:40652	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:3e06b6741061ad93a8587531307057d8	SP:Human
      @SQ	SN:GL000206.1	LN:41001	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:43f69e423533e948bfae5ce1d45bd3f1	SP:Human
      @SQ	SN:GL000240.1	LN:41933	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:445a86173da9f237d7bcf41c6cb8cc62	SP:Human
      @SQ	SN:GL000236.1	LN:41934	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:fdcd739913efa1fdc64b6c0cd7016779	SP:Human
      @SQ	SN:GL000241.1	LN:42152	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:ef4258cdc5a45c206cea8fc3e1d858cf	SP:Human
      @SQ	SN:GL000243.1	LN:43341	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:cc34279a7e353136741c9fce79bc4396	SP:Human
      @SQ	SN:GL000242.1	LN:43523	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:2f8694fc47576bc81b5fe9e7de0ba49e	SP:Human
      @SQ	SN:GL000230.1	LN:43691	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:b4eb71ee878d3706246b7c1dbef69299	SP:Human
      @SQ	SN:GL000237.1	LN:45867	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:e0c82e7751df73f4f6d0ed30cdc853c0	SP:Human
      @SQ	SN:GL000233.1	LN:45941	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:7fed60298a8d62ff808b74b6ce820001	SP:Human
      @SQ	SN:GL000204.1	LN:81310	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:efc49c871536fa8d79cb0a06fa739722	SP:Human
      @SQ	SN:GL000198.1	LN:90085	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:868e7784040da90d900d2d1b667a1383	SP:Human
      @SQ	SN:GL000208.1	LN:92689	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:aa81be49bf3fe63a79bdc6a6f279abf6	SP:Human
      @SQ	SN:GL000191.1	LN:106433	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:d75b436f50a8214ee9c2a51d30b2c2cc	SP:Human
      @SQ	SN:GL000227.1	LN:128374	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:a4aead23f8053f2655e468bcc6ecdceb	SP:Human
      @SQ	SN:GL000228.1	LN:129120	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:c5a17c97e2c1a0b6a9cc5a6b064b714f	SP:Human
      @SQ	SN:GL000214.1	LN:137718	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:46c2032c37f2ed899eb41c0473319a69	SP:Human
      @SQ	SN:GL000221.1	LN:155397	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:3238fb74ea87ae857f9c7508d315babb	SP:Human
      @SQ	SN:GL000209.1	LN:159169	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:f40598e2a5a6b26e84a3775e0d1e2c81	SP:Human
      @SQ	SN:GL000218.1	LN:161147	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:1d708b54644c26c7e01c2dad5426d38c	SP:Human
      @SQ	SN:GL000220.1	LN:161802	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:fc35de963c57bf7648429e6454f1c9db	SP:Human
      @SQ	SN:GL000213.1	LN:164239	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:9d424fdcc98866650b58f004080a992a	SP:Human
      @SQ	SN:GL000211.1	LN:166566	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:7daaa45c66b288847b9b32b964e623d3	SP:Human
      @SQ	SN:GL000199.1	LN:169874	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:569af3b73522fab4b40995ae4944e78e	SP:Human
      @SQ	SN:GL000217.1	LN:172149	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:6d243e18dea1945fb7f2517615b8f52e	SP:Human
      @SQ	SN:GL000216.1	LN:172294	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:642a232d91c486ac339263820aef7fe0	SP:Human
      @SQ	SN:GL000215.1	LN:172545	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:5eb3b418480ae67a997957c909375a73	SP:Human
      @SQ	SN:GL000205.1	LN:174588	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:d22441398d99caf673e9afb9a1908ec5	SP:Human
      @SQ	SN:GL000219.1	LN:179198	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:f977edd13bac459cb2ed4a5457dba1b3	SP:Human
      @SQ	SN:GL000224.1	LN:179693	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:d5b2fc04f6b41b212a4198a07f450e20	SP:Human
      @SQ	SN:GL000223.1	LN:180455	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:399dfa03bf32022ab52a846f7ca35b30	SP:Human
      @SQ	SN:GL000195.1	LN:182896	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:5d9ec007868d517e73543b005ba48535	SP:Human
      @SQ	SN:GL000212.1	LN:186858	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:563531689f3dbd691331fd6c5730a88b	SP:Human
      @SQ	SN:GL000222.1	LN:186861	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:6fe9abac455169f50470f5a6b01d0f59	SP:Human
      @SQ	SN:GL000200.1	LN:187035	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:75e4c8d17cd4addf3917d1703cacaf25	SP:Human
      @SQ	SN:GL000193.1	LN:189789	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:dbb6e8ece0b5de29da56601613007c2a	SP:Human
      @SQ	SN:GL000194.1	LN:191469	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:6ac8f815bf8e845bb3031b73f812c012	SP:Human
      @SQ	SN:GL000225.1	LN:211173	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:63945c3e6962f28ffd469719a747e73c	SP:Human
      @SQ	SN:GL000192.1	LN:547496	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz	AS:NCBI37	M5:325ba9e808f669dfeee210fdd7b470ac	SP:Human
      @RG	ID:1	PL:ILLUMINA	PU:110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10	LB:HUMggcRLCDIAAPEI-10	PI:478	DS:Study UK10K_COHORT_TWINSUK: The UK10K project proposes a series of complementary genetic approaches to find new low-frequency/rare variants contributing to disease phenotypes.  These will be based on obtaining the genome-wide sequence of 4000 samples from the TwinsUK and ALSPAC cohorts (at 6x sequence coverage), and the exome sequence (protein-coding regions and related conserved sequence) of 6000 samples selected for extreme phenotypes.  Our studies will focus primarily on cardiovascular-related quantitative traits, obesity and related metabolic traits, neurodevelopmental disorders and a limited number of extreme clinical phenotypes that will provide proof-of-concept for future familial trait sequencing.  We will directly analyse quantitative traits in the cohorts and the selected traits in the extreme samples, and also use imputation down to 0.1% allele frequency to extend the analyses to further sample sets with genome wide genotype data. In each case we will investigate indels and larger structural variants as well as SNPs, and use statistical methods that combine rare variants in a locus or pathway as well as single-variant approaches.	DT:2010-12-15T00:00:00+0000	SM:UK10K_15792	CN:SC
      @PG	ID:filter_adapter	PN:filter_adapter.pl	VN:1.0	DS:filter adapters	CL:perl /software/filter_adapter.pl  /HKC10040_HUMggcR/HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10/read_1.adapter.list /HKC10040_HUMggcR/HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10/read_2.adapter.list /HKC10040_HUMggcR/HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10_1.fq.gz /HKC10040_HUMggcR/HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10/110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10_2.fq.gz read_1.fq.gz read_2.fq.gz
      @PG	ID:bwa_aln	PN:bwa	PP:filter_adapter	VN:0.5.9rc1 (r1561)	DS:bwa alignment from fastq files	CL:/software/bwa aln -q 15 -t 4 /data/human_g1k_v37.fasta read_1.fq.gz > 1.sai;/software/bwa aln -q 15 -t 4 /data/human_g1k_v37.fasta read_2.fq.gz > 2.sai
      @PG	ID:bwa_sam	PN:bwa	PP:bwa_aln	VN:0.5.9rc1 (r1561)	DS:bwa converting alignemtns to sam	CL:/software/bwa sampe /data/human_g1k_v37.fasta 1.sai 2.sai read_1.fq.gz read_2.fq.gz > 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sam
      @PG	ID:sam_to_bam	PN:samtools	PP:bwa_sam	VN:0.1.8 (r613) DS:convert sam to bam	CL:/software/samtools view -b -u -S 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sam > 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.bam
      @PG	ID:bam_sort	PN:samtools	PP:sam_to_bam	VN:0.1.8 (r613)	DS:sort bam file	CL:/software/samtools sort 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.bam 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sort
      @PG	ID:remove_duplication	PN:samtools	PP:bam_sort	VN:0.1.8 (r613)	DS:remove duplication	CL:/software/samtools rmdup 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sort.bam 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sort.rmdup.bam
      @PG	ID:add_header	PN:samtools	PP:remove_duplication	VN:0.1.8 (r613)	DS:add PG, RG and SQ tag in bam header	CL:/software/samtools reheader bam.header 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sort.rmdup.bam > 110628_I283_FCC03A6ABXX_L3_HUMggcRLCDIAAPEI-10.sort.rmdup.reheader.bam

      There is a line which contains @RG



      • #4
        "There is a line which contains @RG"
        Which means very little.

        GATK is complaining that a specific read is missing its read group, not that the file's header is missing an @RG line.

        Look at the read in question and post it here. I suspect you will see that it (and probably the rest of your reads) has no RG tag. As far as I know, an @RG line in the header does not remove the need for each read to carry its own RG tag.
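
        One way to check this is sketched below. It is only a rough sketch: `count_rg` is a hypothetical helper name (not part of samtools or GATK), and it assumes the records arrive as SAM text on stdin, for example from `samtools view`. It counts how many reads carry an `RG:Z:` optional tag and how many do not.

```shell
# Hypothetical helper (not part of samtools or GATK): reads SAM records on
# stdin and counts how many carry an RG:Z: optional tag.
count_rg() {
  awk -F'\t' '
    /^@/ { next }                       # skip header lines if present
    {
      tagged_here = 0
      for (i = 12; i <= NF; i++)        # optional tags start at column 12
        if ($i ~ /^RG:Z:/) tagged_here = 1
      if (tagged_here) tagged++; else untagged++
    }
    END { printf "with_RG=%d without_RG=%d\n", tagged, untagged }
  '
}

# Usage, assuming samtools is on PATH:
#   samtools view _EGAR00001038931_36843.pe.raw.sorted.bam | head -n 100000 | count_rg
# To print just the read GATK named:
#   samtools view _EGAR00001038931_36843.pe.raw.sorted.bam | grep -m 1 -F 'FCC03A6ABXX:3:2107:11142:198335#TAGCTTAT'
```

        If the count of reads without RG is non-zero, the reads need to be tagged before GATK will accept the file. Picard's AddOrReplaceReadGroups is one commonly used tool for that; check its documentation for the exact arguments rather than relying on this post.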
