VDJML is a file format for storing the results of VDJ analysis of immune receptor (IR) sequence reads.
The VDJ analysis involves aligning each input sequence to V, D, and J germline gene segments, and, based on the alignments, identifying various regions of interest and other properties of the input sequence. Currently, every VDJ analysis software application produces output in a different format.
The purpose of VDJML is to provide a common format for different VDJ analysis applications and to facilitate downstream processing of the results in an application-agnostic manner.
The VDJ analysis is performed on "raw" input sequence reads (e.g., as obtained from a sequencing instrument), "consensus" read sequences, or sequences obtained from other sources. In this document, the input sequences used for the analysis are referred to as read sequences or reads.
VDJML stores information about the software packages and about the germline sequence databases that were used for producing the analysis results. For each read sequence, VDJML stores the information about the aligned gene segments. For combinations of the aligned segments, VDJML stores regions of interest, e.g., FR1, CDR3.
VDJML was designed to have a relatively narrow scope. In general, data for which commonly accepted file formats already exist should be stored in separate files and not in VDJML. Examples of information that should not be stored in VDJML:
All elements and attributes defined by this version of the VDJML schema have the namespace http://vdjserver.org/vdjml/xsd/1/ (prefix vdj:).
The top element of the schema is vdj:vdjml.
The schema uses primitive datatypes defined by the W3C XML Schema. These datatypes are defined in the namespace http://www.w3.org/2001/XMLSchema (prefix xs:), e.g., xs:decimal.
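As a minimal sketch of what the namespace rules above mean for a consumer, the following Python snippet uses the standard-library xml.etree.ElementTree module to check that a document's root is vdj:vdjml. The one-element document and the helper name qname are made up for illustration; only the namespace URI and the root element name come from the text.

```python
import xml.etree.ElementTree as ET

# Namespace URI stated in the text for this version of the schema.
VDJ_NS = "http://vdjserver.org/vdjml/xsd/1/"

# A deliberately tiny document: only the root element named in the text.
doc = f'<vdj:vdjml xmlns:vdj="{VDJ_NS}"/>'
root = ET.fromstring(doc)


def qname(local: str) -> str:
    """Return the '{uri}local' form that ElementTree uses for tags (hypothetical helper)."""
    return f"{{{VDJ_NS}}}{local}"


# The prefix is only a shorthand; what matters is the namespace URI.
is_vdjml_root = root.tag == qname("vdjml")
print(is_vdjml_root)
```

Comparing against the namespace URI rather than the literal prefix keeps the check valid even if a file binds the same namespace to a different prefix.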
A VDJML file consists of two parts. General information about the analysis appears under the vdj:meta element, and the result for each sequence read appears in a sequence of vdj:read elements under the vdj:read_results element.
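The two-part layout described above could be traversed roughly as follows. This is a sketch under stated assumptions: the element names vdj:vdjml, vdj:meta, vdj:read_results, and vdj:read come from the text, while the read_id attribute and its values are hypothetical placeholders, and the skeleton document is not claimed to be schema-valid.

```python
import xml.etree.ElementTree as ET

NS = {"vdj": "http://vdjserver.org/vdjml/xsd/1/"}

# Skeleton only: element names come from the text; the read_id attribute
# and its values are hypothetical placeholders for illustration.
SAMPLE = """\
<vdj:vdjml xmlns:vdj="http://vdjserver.org/vdjml/xsd/1/">
  <vdj:meta/>
  <vdj:read_results>
    <vdj:read read_id="r1"/>
    <vdj:read read_id="r2"/>
  </vdj:read_results>
</vdj:vdjml>
"""

root = ET.fromstring(SAMPLE)

# Part 1: general information about the analysis.
meta = root.find("vdj:meta", NS)

# Part 2: one vdj:read element per analyzed read sequence.
reads = root.findall("vdj:read_results/vdj:read", NS)
read_ids = [r.get("read_id") for r in reads]
print(meta is not None, read_ids)
```

For large result files, a streaming parser such as ET.iterparse may be a better fit than loading the whole tree, since the per-read results can be processed one at a time.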
The schema also allows some user-defined elements and attributes, which may appear under vdj:meta and vdj:read elements. User-defined elements and attributes should have namespaces other than vdj.
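One way a reader might separate user-defined content from schema-defined content is to compare namespaces, as sketched below. The lab prefix, its URI http://example.org/lab, the sample element, and the helper namespace_of are all invented for this illustration and are not part of VDJML.

```python
import xml.etree.ElementTree as ET

VDJ_NS = "http://vdjserver.org/vdjml/xsd/1/"

# The "lab" prefix, its namespace URI, and the "sample" element are
# hypothetical; they stand in for any user-defined extension.
SAMPLE = f"""\
<vdj:vdjml xmlns:vdj="{VDJ_NS}" xmlns:lab="http://example.org/lab">
  <vdj:meta>
    <lab:sample>donor-7</lab:sample>
  </vdj:meta>
</vdj:vdjml>
"""


def namespace_of(tag: str) -> str:
    """Extract the namespace URI from an ElementTree '{uri}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


root = ET.fromstring(SAMPLE)
meta = root.find(f"{{{VDJ_NS}}}meta")

# Children of vdj:meta whose namespace is not the VDJML one are user-defined.
user_defined = [child.tag for child in meta if namespace_of(child.tag) != VDJ_NS]
print(user_defined)
```

A tool that does not understand a given extension can simply skip every element collected in user_defined and still process the schema-defined content.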
1.0
vdj.
libVDJML
1.42.0
xs:dateTime format, e.g., 2014-07-24T14:47:24
vdj.
90%.
true if a stop codon is present
true if a codon for a conserved amino acid is mutated
true if the read sequence is the reverse complement of the germline gene segments
true if an indel mutation resulted in a frame shift
true if V(D)J recombination occurred out of frame
5AC-G35.
V, D, or J)
segment_match_id-s) that serve as a basis for the annotations listed in this element.
FR1
aligner_id
90%.
true if a stop codon is present
true if a codon for a conserved amino acid is mutated
true if the read sequence is the reverse complement of the germline gene segments
true if an indel mutation resulted in a frame shift
true if V(D)J recombination occurred out of frame