Greetings,
I have some targeted DNA-seq data with UMIs, where the UMI barcode is part of the read header, like this:
NB500916:322:HVV53AFXX:1:11101:8946:1524 1:N:0:TGAAGAGA+AATGCTCCGT
CAGGGTGGAAAAGGGGTCCTGGGCTTCAGCTGAAGGGCAAACTGCCCAGTGTAGGAGTCCGTCCAGGACAGGCAG
Where TGAAGAGA is the index and AATGCTCCGT is the UMI id.
To run the data through various UMI pipelines, I need the UMI either as part of the read ID (to use with UMI-tools) or as the actual sequence (to use with fgbio).
So the two outputs I need are (highlighting the changes):
1)
NB500916:322:HVV53AFXX:1:11101:8946:1524:AATGCTCCGT 1:N:0:TGAAGAGA+AATGCTCCGT
CAGGGTGGAAAAGGGGTCCTGGGCTTCAGCTGAAGGGCAAACTGCCCAGTGTAGGAGTCCGTCCAGGACAGGCAG
2)
NB500916:322:HVV53AFXX:1:11101:8946:1524 1:N:0:TGAAGAGA+AATGCTCCGT
AATGCTCCGT
Any help would be greatly appreciated.
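For illustration, the two rewrites described above could be sketched in Python roughly as follows. This is only a sketch under several assumptions: the input is standard 4-line FASTQ, the UMI is everything after the last "+" in the header comment, and the function names (`split_header`, `umi_to_read_id`, `umi_as_sequence`) are hypothetical helpers, not part of UMI-tools or fgbio. Because the header carries no base qualities for the UMI, output 2 fills the quality line with a placeholder character, which is also an assumption.

```python
def split_header(header):
    """Split '@ID 1:N:0:INDEX+UMI' into (read_id, comment, umi)."""
    read_id, _, comment = header.rstrip("\n").partition(" ")
    if "+" not in comment:
        raise ValueError(f"no '+UMI' found in header: {header!r}")
    # Assumption: the UMI is the text after the last '+' in the comment.
    umi = comment.rsplit("+", 1)[1]
    return read_id, comment, umi


def umi_to_read_id(record, sep=":"):
    """Output 1: append the UMI to the read ID, keep the read unchanged."""
    header, seq, plus, qual = record
    read_id, comment, umi = split_header(header)
    return (f"{read_id}{sep}{umi} {comment}", seq, plus, qual)


def umi_as_sequence(record, placeholder_qual="I"):
    """Output 2: keep the header, replace the sequence with the UMI."""
    header, seq, plus, qual = record
    _, _, umi = split_header(header)
    # Assumption: no real qualities exist for the UMI, so use a placeholder.
    return (header.rstrip("\n"), umi, plus, placeholder_qual * len(umi))


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line FASTQ handle."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return
        yield tuple(lines)


if __name__ == "__main__":
    example = (
        "@NB500916:322:HVV53AFXX:1:11101:8946:1524 1:N:0:TGAAGAGA+AATGCTCCGT",
        "CAGGGTGGAAAAGGGGTCCTGGGCTTCAGCTGAAGGGCAAACTGCCCAGTGTAGGAGTCCGTCCAGGACAGGCAG",
        "+",
        "E" * 75,
    )
    print("\n".join(umi_to_read_id(example)))
    print("\n".join(umi_as_sequence(example)))
```

The separator is a parameter because the downstream tool has to be told (or has to expect) whichever character joins the read ID and the UMI; ":" simply matches the example in the post.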