FLEA (the Full-Length Envelope Analyzer) is a tool for analyzing and exploring HIV envelope sequence data. This page explains some of the features of the FLEA front-end visualizations. Please follow along on this example page.

The Sequences tab.

This tab lets you explore your sequence data, visualizing summarized versions of the multiple sequence alignment, as well as allowing you to explore residue and motif frequencies.

The options bar, and the genome annotation bar.

Sequences 1

  1. The alignment range (or ranges) to be displayed. A single range is two coordinates separated by a "-", and you can chain multiple ranges using ";".
  2. A regular expression used to highlight elements of the alignment. Here we highlight potential N-linked glycosylation sites (PNGSs) by default.
  3. The threshold below which variants are suppressed, to make the alignment simpler to view. If you want to see only the majority variants, increase this, and if you want to see everything, set it to 0.
  4. A checkbox that switches between variant frequency curves over time, and box plots.
  5. The number of motifs to be displayed in the variant frequency plots. The N most frequent variants will be displayed, and the rest will be collapsed into the "other" category.
  6. The region on the genome annotation that is being displayed is highlighted in blue. One can click on the named regions to select those, and click and drag on the bar below to select custom regions by mouse. Pressing "shift" while selecting a new region will add that region to the selection, instead of replacing it.
  7. The reference coordinates (here in HXB2 numbering).
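
The range syntax and the default highlight described above can be sketched in a few lines of Python. This is only an illustration, not FLEA's own code: the function names are hypothetical, and the pattern N[^P][ST] is the standard PNGS sequon (N, then any residue except P, then S or T), assumed here rather than taken from FLEA's default setting.

```python
import re

def parse_ranges(text):
    """Parse a string such as "160-200;300-320" into (start, end) tuples."""
    ranges = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ";"
        start, end = (int(x) for x in part.split("-"))
        if start > end:
            raise ValueError(f"range start exceeds end: {part!r}")
        ranges.append((start, end))
    return ranges

# Standard PNGS sequon: N, then any residue except P, then S or T.
# Assumed here; FLEA's default pattern may differ (e.g. in gap handling).
PNGS_PATTERN = re.compile(r"N[^P][ST]")

def find_pngs(sequence):
    """Return 0-based start indices of non-overlapping PNGS matches."""
    return [m.start() for m in PNGS_PATTERN.finditer(sequence)]
```

For example, parse_ranges("160-200;300-320") yields two ranges, and find_pngs reports where each match begins in an ungapped sequence. Note that in a gapped alignment the middle position of this pattern would also match a gap character.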

The coordinates, the reference sequence, and the Most Recent Common Ancestor (MRCA).

Sequences 2

  1. Reference coordinates (here HXB2) for each column. Clicking on coordinates for a column adds that column to the trajectory plot.
  2. Add or remove positively selected sites to the trajectory plot.
  3. Reference sequence
  4. Most recent common ancestor (MRCA), computed as the frequency-weighted consensus of the earliest time point.
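
As a rough illustration of a frequency-weighted consensus, the sketch below picks, for each alignment column, the residue with the largest total copy number among the sequences of the earliest time point. The function name, the input layout, and the alphabetical tie-breaking are assumptions made for this example; FLEA's own implementation may differ.

```python
from collections import Counter

def weighted_consensus(sequences, copy_numbers):
    """Return the per-column residue with the largest total copy number.

    sequences: aligned sequences of equal length (one time point).
    copy_numbers: one weight per sequence.
    """
    if len(sequences) != len(copy_numbers):
        raise ValueError("need one copy number per sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for col in range(length):
        totals = Counter()
        for seq, copies in zip(sequences, copy_numbers):
            totals[seq[col]] += copies
        # Ties are broken alphabetically so the result is deterministic.
        consensus.append(max(sorted(totals), key=totals.__getitem__))
    return "".join(consensus)
```

With the made-up input ["NDT", "DDT", "DDS"] and copy numbers [5, 3, 1], the first column resolves to N: two sequences carry D, but the single N sequence has more copies (5 versus 4).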

Amino acid trajectory.

Sequences 3

  1. Frequency curve of motif from the selected columns. Here, "DDT" is the most frequent in the earliest time points, but "NDT" takes over between time points V12 and V22.
  2. Mousing over a time point displays a popup menu with the exact frequencies.
  3. Time points on the x-axis, labelled with visit codes.
  4. Legend for colors for each motif.
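
The curves in this plot can be thought of as per-time-point motif frequencies, with rare motifs collapsed into "other". The sketch below shows one way to compute them. It assumes 0-based column indices, weights each sequence by its copy number, and ranks motifs by total copy number across all time points; the function name and these choices are illustrative and not taken from FLEA.

```python
from collections import Counter, defaultdict

def motif_trajectories(records, columns, top_n):
    """Return {time_point: {motif: frequency}} for the selected columns.

    records: iterable of (time_point, aligned_sequence, copy_number).
    columns: 0-based alignment columns that make up the motif.
    top_n: number of motifs kept; the rest are pooled into "other".
    """
    counts = defaultdict(Counter)
    for time_point, seq, copies in records:
        motif = "".join(seq[c] for c in columns)
        counts[time_point][motif] += copies

    # Rank motifs by total copy number over all time points (an assumption).
    overall = Counter()
    for per_time in counts.values():
        overall.update(per_time)
    keep = {motif for motif, _ in overall.most_common(top_n)}

    result = {}
    for time_point, per_time in counts.items():
        total = sum(per_time.values())
        freqs = defaultdict(float)
        for motif, n in per_time.items():
            freqs[motif if motif in keep else "other"] += n / total
        result[time_point] = dict(freqs)
    return result

# Made-up data: "DDT" dominates at V12, "NDT" at V22.
records = [
    ("V12", "ADDTK", 8), ("V12", "ANDTK", 2),
    ("V22", "ANDTK", 6), ("V22", "ADDTK", 3), ("V22", "AKDTK", 1),
]
trajectories = motif_trajectories(records, columns=[1, 2, 3], top_n=2)
```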

Amino acid trajectory, bar plots.

Sequences 4

The same plot, shown as a bar plot.

Multiple sequence alignment.

Sequences 5

  1. Copy number for the shown subsequence, aggregated from all the consensus sequences that match the shown subsequence. Clicking the copy number displays the corresponding consensus sequence ids.
  2. Fraction of the population.
  3. Sequences are grouped by time point.
  4. Highlighted motifs matching the regular expression. By default, shows PNGSs.
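
The aggregation behind each row can be sketched as follows for a single time point: consensus sequences that are identical within the displayed region are merged, their copy numbers are summed, and rows below the threshold are dropped. The function name and data layout are hypothetical, the coordinates are simplified to 0-based, end-exclusive indices, and the threshold is assumed to be a fraction between 0 and 1.

```python
from collections import defaultdict

def summarize_region(consensus, start, end, threshold=0.0):
    """Collapse consensus sequences that agree within [start, end).

    consensus: {sequence_id: (aligned_sequence, copy_number)}.
    Returns rows of (subsequence, copy_number, fraction, sequence_ids),
    sorted by descending copy number.
    """
    copies = defaultdict(int)
    ids = defaultdict(list)
    for seq_id, (seq, n) in consensus.items():
        sub = seq[start:end]
        copies[sub] += n
        ids[sub].append(seq_id)

    total = sum(copies.values())
    rows = []
    for sub, n in copies.items():
        fraction = n / total
        if fraction >= threshold:  # variants below the threshold are hidden
            rows.append((sub, n, fraction, ids[sub]))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

# Made-up consensus sequences for one time point.
consensus = {"s1": ("ANDTK", 6), "s2": ("CNDTK", 3), "s3": ("ADDTK", 1)}
all_rows = summarize_region(consensus, 1, 4)
major_rows = summarize_region(consensus, 1, 4, threshold=0.2)
```

Here s1 and s2 differ outside the region but share "NDT" inside it, so they collapse into one row with copy number 9; raising the threshold to 0.2 hides the "DDT" row.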

Tree controls.

Trees 1

  1. Sorting controls. The tree may be sorted by depth.
  2. Scale controls for resizing the tree.
  3. Show/hide leaf node sizes.
  4. Overlap nodes, if they are enabled.
  5. Enable/disable radial layout.
  6. Enable interpolated colors, which assign similar colors to nearby time points.
  7. Node label controls. "motif" shows the motif selected in the "sequences" tab.
  8. Scale indicator for evolutionary distance along the tree.

Radial tree view.

Trees 2

  1. Node area corresponds to copy number. Nodes colored by time point.
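
Because it is the area, not the radius, that tracks copy number, a node with four times the copies is drawn with twice the radius. A minimal sketch of that relationship, with a hypothetical function name and scale factor:

```python
import math

def node_radius(copy_number, area_per_copy=1.0):
    """Radius of a circle whose area is proportional to copy_number.

    area_per_copy is a hypothetical scale factor (area units per copy).
    """
    if copy_number < 0:
        raise ValueError("copy number must be non-negative")
    area = area_per_copy * copy_number
    return math.sqrt(area / math.pi)
```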

Motifs mapped to the phylogenetic tree.

Trees 3

  1. Nodes show motifs selected in the "Sequences" tab, colored by motif identity. Ancestor nodes are colored according to the inferred motif.

Vertical tree layout.

Trees 4

Trees may be laid out vertically, rather than radially.

Interactive protein model.

Prot 1

  1. Interactive protein model, which can be rotated, panned, and zoomed. Residues are colored according to protein metrics.
  2. Time point for which to show metrics. "Combined" displays, for each position, the maximum value across all time points.
  3. Controls for the protein model. Different metrics may be displayed.

Gene metrics plot.

Prot 2

  1. The same metrics that are mapped to the protein model, here plotted in one dimension.

Protein model with selected positions.

Prot 3

  1. Selected columns in the "sequences" tab are mapped to residues, which are shown here as spheres. Env is a trimer, so each column corresponds to three residues.

Multidimensional scaling, colored by time point.

MDS 1

  1. Color controls. Colors may be interpolated by time point, mapping similar colors to nearby time points. Nodes may be colored by motif selected in the "sequences" tab.
  2. Legend showing color corresponding to time points.
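
Interpolating colors by time point can be illustrated with a simple linear blend between two RGB colors, so that time points close together receive similar colors. The color endpoints, the function name, and the use of numeric times (for example, months since the first visit) are assumptions for this sketch; FLEA's actual color scale may differ.

```python
def interpolate_colors(times, start=(0, 0, 255), end=(255, 0, 0)):
    """Map each numeric time to an RGB color between start and end."""
    if not times:
        return []
    lo, hi = min(times), max(times)
    span = hi - lo
    colors = []
    for t in times:
        # Position of this time point on a 0..1 scale (0 if all times equal).
        frac = (t - lo) / span if span else 0.0
        colors.append(tuple(round(s + frac * (e - s)) for s, e in zip(start, end)))
    return colors
```

With times [0, 1, 12], the first two time points receive nearly identical colors, while the third sits at the far end of the scale.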

Multidimensional scaling, colored by motif.

MDS 2

  1. Colored by motif selected in "sequences" tab.

Evolutionary and phenotypic metrics plots.

Evo 1

  1. Time series plot for various evolutionary metrics. Multiple regions or multiple metrics may be selected.
  2. Phenotype metrics for the selected region. Multiple metrics may be selected.