Using UMAP to Process High Colour Flow Cytometry (part 1)

Introduction to UMAP

I find the UMAP algorithm very useful for separating the major phenotypic populations of leukocytes, but I have recently discovered, through trial and error, that the UMAP data can be skewed by small outlying populations.

To use UMAP to its best potential, I have found that merging datasets and running the merged data through the UMAP algorithm has been key. This is especially true when I am then able to separate the files again and compare them on the same plot. However, this in itself can be a challenge.
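For readers who script their own analysis, the bookkeeping behind "merge, embed, then separate again" can be sketched roughly as below. This is only an illustration under my own assumptions, not how CytoSwarm or VenturiOne do it internally: the function names `merge_with_labels` and `split_by_file` are hypothetical, events are assumed to be already loaded as one row of channel values per cell, and the per-file downsampling is an optional extra I have added so that one large file does not dominate the merge.

```python
import random
from typing import Dict, List, Sequence, Tuple

Event = Sequence[float]  # one cell: one value per measured channel


def merge_with_labels(
    files: Dict[str, List[Event]],
    max_events_per_file: int,
    seed: int = 0,
) -> Tuple[List[Event], List[str]]:
    """Concatenate events from several files and record each event's source file.

    Hypothetical helper: each file is downsampled to at most
    `max_events_per_file` events (an assumption, not part of the original workflow).
    """
    rng = random.Random(seed)
    merged: List[Event] = []
    labels: List[str] = []
    for name in sorted(files):  # sorted so the merge order is reproducible
        events = files[name]
        if len(events) > max_events_per_file:
            events = rng.sample(events, max_events_per_file)
        merged.extend(events)
        labels.extend([name] * len(events))
    return merged, labels


def split_by_file(rows: Sequence, labels: Sequence[str]) -> Dict[str, list]:
    """Group rows (for example 2-D UMAP coordinates) back by source file."""
    groups: Dict[str, list] = {}
    for row, name in zip(rows, labels):
        groups.setdefault(name, []).append(row)
    return groups
```

The merged rows would be passed to whichever UMAP implementation you use (with the open-source umap-learn package, for instance, something like `umap.UMAP().fit_transform(merged)`). Because the embedding comes back in the same row order, `split_by_file(embedding, labels)` gives per-file coordinates that all share one set of axes, which is what makes the file-to-file comparison possible.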

In my most recent tests I have sent varying numbers of files (out of the 49 in this dataset from the Flow Repository), each up to 50 MB in size, and received some very good results with UMAP.

Examples of UMAP previews in CytoSwarm (black and white) and in VenturiOne analysis software (coloured) are shown below:

Single Files






Merged Files

6, 10 files merged.

7, 10 different files merged.

8, 10 randomly picked files merged.

9, 10 different randomly picked files merged.

10, first 10 files merged.

11, 10 randomly picked files merged.

Gated and Merged

12, All files gated on CD4 and merged.

13, All files gated on CD8 and merged.

14, All files gated on Tregs and merged.

As you can see, certain plots have been skewed by some outlying data. So what do I do with the affected plots? There are a few things to consider.

What is this outlying data? Is it credible data? Does it need to be run again, or can it simply be gated out?

I will discuss all my considerations in my next blog.

In the meantime, please feel free to discuss your own considerations with me. I would love to hear your own experiences with UMAP!