Introduction to UMAP
I find the UMAP algorithm very useful for separating the major phenotypic populations of leukocytes, but I have recently discovered, through trial and error, that the UMAP plot can be skewed by small outlying populations.
To get the most out of UMAP, I have found that merging datasets and running the merged data through the algorithm is key. This is especially useful when I can then separate the files again and compare them on the same plot. However, this in itself can be a challenge.
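To make the merge-then-separate idea concrete, here is a minimal sketch in Python. It is not how CytoSwarm or VenturiOne work internally; it assumes each file has already been read into an events-by-markers NumPy array with the same markers in the same order, and that the open-source umap-learn package is installed for the embedding step. The function names, the per-file event cap, and the UMAP settings are all illustrative choices of mine.

```python
import numpy as np


def merge_files(per_file_events, max_events_per_file=None, seed=0):
    """Stack per-file event matrices and remember which file each row came from.

    per_file_events: dict mapping a file name to an (events x markers) array.
    Returns (merged, file_index, names); file_index[i] is the position in
    names of the file that row i of merged came from.
    """
    rng = np.random.default_rng(seed)
    names = sorted(per_file_events)
    blocks, labels = [], []
    for i, name in enumerate(names):
        events = np.asarray(per_file_events[name], dtype=float)
        # Optionally subsample so one large file does not dominate the merge.
        if max_events_per_file is not None and len(events) > max_events_per_file:
            keep = rng.choice(len(events), size=max_events_per_file, replace=False)
            events = events[np.sort(keep)]
        blocks.append(events)
        labels.append(np.full(len(events), i))
    return np.vstack(blocks), np.concatenate(labels), names


def embed(merged, n_neighbors=15, min_dist=0.1, seed=0):
    """Run UMAP on the merged matrix (assumes umap-learn is installed)."""
    import umap  # imported lazily so the merge step works without it

    reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist, random_state=seed)
    return reducer.fit_transform(merged)
```

Because every file is embedded in one run, all points share the same two axes. Selecting `embedding[file_index == k]` then pulls out the points belonging to file `names[k]`, so individual files can be overlaid or compared on the same plot.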
In my most recent tests, I have submitted varying numbers of files (out of the 49 in this dataset from the Flow Repository), each up to 50 MB in size, and received some very good results with UMAP.
Examples of UMAP previews in CytoSwarm (black and white) and in VenturiOne analysis software (coloured) are shown below:
6, 10 files merged
7, 10 different files merged
8, 10 randomly picked files merged
9, 10 different randomly picked files merged
10, first 10 files merged
11, 10 randomly picked files merged
Gated and Merged
12, All files gated on CD4 and merged
13, All files gated on CD8 and merged
14, All files gated on Tregs and merged
As you can see, certain plots have been skewed by some outlying data. So what do I do with the affected plots? There are a few things to consider.
What is this outlying data? Is it credible data? Does it need to be run again, or can it simply be gated out?
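One way to start answering these questions is to flag the outlying events numerically and look at them before deciding anything. The sketch below is a statistical stand-in for a manual gate, not a method from CytoSwarm or VenturiOne. It assumes the events are already in an events-by-markers NumPy array on a suitable display scale, and the function name and the threshold `k` are hypothetical values chosen for illustration.

```python
import numpy as np


def flag_outliers(events, k=5.0):
    """Return a boolean mask that is True where an event lies far from the
    bulk of the data on at least one marker.

    Distance is measured per marker as a robust z-score: deviation from the
    median divided by the scaled median absolute deviation (MAD).
    """
    events = np.asarray(events, dtype=float)
    median = np.median(events, axis=0)
    mad = np.median(np.abs(events - median), axis=0)
    # A marker with zero spread cannot define an outlier, so it is ignored.
    mad = np.where(mad == 0, np.inf, mad)
    robust_z = np.abs(events - median) / (1.4826 * mad)
    return (robust_z > k).any(axis=1)
```

Inspecting `events[mask]` first shows whether the flagged events look like debris or like a small but genuine population; only then does it make sense to keep `events[~mask]` for the next UMAP run. A rare cell type can easily be flagged by a rule like this, so the mask is a prompt for a closer look rather than a decision.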
I will discuss all my considerations in my next blog.
In the meantime
Please feel free to discuss your own considerations with me at email@example.com. I would love to hear about your own experiences with UMAP!