  • picard add read groups

    Hi all,

    I've been trying to get around this problem for the last couple of days. I haven't been able to solve it myself and haven't seen a solution in any forum. Here's my problem:
    My data are from a single-lane Illumina TruSeq run with 24 indexed samples. Every step was run from a bash script, so all files have been processed in exactly the same way with exactly the same parameters.
    I converted SAM to BAM, then sorted, indexed, and removed duplicates.
    Next I indexed the files again and performed the realignment around indels.
    Then I fixed the paired-end mate information using Picard.
    Then, for some reason, it asked me to add header (read group) info, which I did using Picard.
    I've done this for all files, but at the next step, quality score recalibration, 5 files fail.
    When I count the reads (using bamtools) before and after adding the header info, the bad files go from 4,032,483 to 3,578,753 reads, whereas the other 19 files, which worked, go from 4,625,944 to 4,625,834 reads.
    GATK keeps giving me an end-of-file (EOF) error, and it looks like these 5 files are truncated. But why just 5 of the 24 files, when all were processed in exactly the same way?
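    For anyone who wants to test the truncation theory, here is a minimal sketch. It assumes only that a complete BAM file ends with the 28-byte BGZF end-of-file marker given in the SAM/BAM specification. The function name `has_bgzf_eof` and the file name in the usage comment are made up for illustration and are not part of the pipeline described above.

```python
import os

# 28-byte BGZF end-of-file marker from the SAM/BAM specification.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43"
    " 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False  # too short to contain the marker at all
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


# Hypothetical usage:
#   has_bgzf_eof("sample01.rg.bam")  # False would suggest a truncated file
```

    A missing marker would only show that the file was cut short while being written (for example by a full disk or a killed job). It would not say which step caused it.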
    I know this is a bit of a long-winded question, but has anyone else had a similar problem?
