Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • sex/gender tags for BAM files?

    My analysis of next-generation sequencing data is gender / sex specific (should I consider the Y chromosome, or two X chromosomes? )

    I am unaware of any flag in .bam files (or samtools) for the gender of an individual (Male of Female). Is there a tag for gender that I don't know about?

    I am also unaware of gender consideraton in aligners such as BWA, etc. Do these aligners in fact consider gender when aligning?

    Finally, I couldn't find any gender / sex information on databases such as 1000 Genomes. Is it hidden away somewhere?

  • #2
    I don't think aligner algorithms are gender specific and bam files do not have gender flags. The gender/relationship data is passed into NGS analysis using pedigree files.

    Comment


    • #3
      Hi, I come back to this question for another reason:
      It's true that if you have reads from a female, you will have twice aligned reads on ChrX in comparison to a male... so for RNA-seq or ChIP-seq, this would introduce a biais in finding peaks or expressed transcripts, won't it?

      Second, to merge .bam files from male and female replicats with samtools, this generate an error of the type : different target sequence name: 'chrY' != 'chrM'
      because ChrY is absent from female .bam header....

      Any idea of how I could merge these files ? should only add chrY line in the header of the female.bam? is it sufficient? and How can I do? because .bam files are compressed binary files...

      Thanks for your help!
      Last edited by spacup; 12-02-2013, 03:00 AM.

      Comment


      • #4
        For downstream analysis, you would need to account for gender in your model fit (so "Counts ~gender + SomeFactor ..."), which is done by associating samples with factors. If you align to gender-specific genomes and need to subsequently merge files, then simply reheader the female samples and use a genome sorted such that chrY is last (that way you can simply swap in a new header without needing to modify any of the reads).
        Last edited by dpryan; 12-04-2013, 02:21 AM.

        Comment


        • #5
          Thanks dpryan for your fast answer, the question is how to reheader the female sample since the .bam file are compressed binary files.... ?

          EDIT: ok, i found this :
          samtools reheader <in.header.sam> <in.bam>

          I didn't know this command, I will try it!
          Last edited by spacup; 12-02-2013, 03:23 AM.

          Comment


          • #6
            i see the need to account for gender in downstream analysis, but for what kind of data would it make a difference for the alignment (to include/not include chrY)?

            Comment


            • #7
              One could argue that you might get slightly more accurate alignments when dealing with female samples if you exclude chrY from the genome. Honestly, though, I suspect the benefit is very minor and likely outweighed by the increased headaches caused.

              Comment


              • #8
                Hi, I see that my previous message has not been published...
                The idea is not to align data as when you have a .bam file, data are already mapped. I need to combine .bam file to perform peak calling.

                I dowloaded ENCODE data for FAIRE-analysis to try a peak caller and they combine their replicates for their analysis. As I wanted to make similar analysis as ENCODE with other peak caller to compare, I wanted to combine their data too, but one is male and other is female.

                By the way, the command samtools reheader <in.header.sam> <in.bam> worked, thanks!
                But my results are quite different from ENCODE ones...

                Comment


                • #9
                  Just be very careful when you reheader a BAM file. If you change the order of the chromosomes then everything will be screwed up. If the order is such that chrY is last and everything else is the same, then things should work OK.

                  Comment


                  • #10
                    Originally posted by spacup View Post
                    I dowloaded ENCODE data for FAIRE-analysis to try a peak caller and they combine their replicates for their analysis. As I wanted to make similar analysis as ENCODE with other peak caller to compare, I wanted to combine their data too, but one is male and other is female.
                    i still dont understand. why not just merge the files as they are? why the reheader?

                    if the biological replicates are of different gender, maybe it would be best to exclude all reads mapping to chrX and Y. the inactive copy of chrX will be quite different from the active ones.

                    Comment


                    • #11
                      Originally posted by dpryan View Post
                      One could argue that you might get slightly more accurate alignments when dealing with female samples if you exclude chrY from the genome. Honestly, though, I suspect the benefit is very minor and likely outweighed by the increased headaches caused.
                      you would probably also create bias on other chromosomes, e.g. reads then mapping uniquely to chrX ..

                      Comment


                      • #12
                        If it's a female sample then that's not bias, it's increased accuracy. Regarding why merging doesn't work, if the female samples were aligned to a genome lacking chrY, then samtools will refuse to merge since the headers are different.

                        Comment


                        • #13
                          Originally posted by dpryan View Post
                          If it's a female sample then that's not bias, it's increased accuracy. Regarding why merging doesn't work, if the female samples were aligned to a genome lacking chrY, then samtools will refuse to merge since the headers are different.
                          if there are only female samples in the whole study then i agree. otherwise the increased accuracy, which you couldnt achieve in male samples, might lead to false positive results?

                          so ENCODE performs/provides alignments of female samples without chrY?

                          Comment


                          • #14
                            In this case, the difference in mapping would be accounted for in the downstream statistics. If you wanted to directly compare the genders, though, then this would be an issue. I actually haven't a clue whether the original samples were aligned to a genome lacking chrY or not. From what spacup wrote, that's just my presumption.

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Strategies for Sequencing Challenging Samples
                              by seqadmin


                              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                              03-22-2024, 06:39 AM
                            • seqadmin
                              Techniques and Challenges in Conservation Genomics
                              by seqadmin



                              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                              Avian Conservation
                              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                              03-08-2024, 10:41 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, Yesterday, 06:37 PM
                            0 responses
                            12 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, Yesterday, 06:07 PM
                            0 responses
                            10 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-22-2024, 10:03 AM
                            0 responses
                            52 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-21-2024, 07:32 AM
                            0 responses
                            68 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X