Using UMAP to process high colour flow cytometry data (part 3)

Over the last few UMAP blogs, I have been looking at a case of outlying UMAP clusters to see whether these events are a genuine population of their own, simple debris, or something else.

I back-gated the outlying clusters onto the singlet and FSC/SSC plots and found that these events are spread fairly randomly across the data, which leads me to think they may be down to non-specific staining. There are a few more ideas I want to consider first, though.

Are these outliers sample specific?

Do these events occur in all samples or in specific ones within the 10 that were merged?

For this, I look at the file scatter histogram.

[Figure: UMAP outliers]

The red histogram on the left shows the event counts across the ten data files that were merged; this plot is gated on singlet gate A (below). The histogram on the right shows the events from the outlying cluster gate B (also shown below). As we can see, the majority of these events come from just three of the ten merged data files.

[Figure: UMAP histogram plot]
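For readers who want to run the same check outside the software, here is a minimal sketch of the per-file count in Python. It is not how VenturiOne or CytoSwarm compute the histogram. The function name `outlier_share_per_file`, the `(file_index, in_gate_b)` record layout and the toy numbers are all assumptions made up for illustration.

```python
from collections import Counter


def outlier_share_per_file(events):
    """Count outlier events per source file.

    events: iterable of (file_index, in_gate_b) pairs, one per singlet event
            (hypothetical layout, not an export format of any real tool).
    Returns {file_index: (outlier_count, total_count, outlier_fraction)}.
    """
    totals = Counter()
    outliers = Counter()
    for file_index, in_gate_b in events:
        totals[file_index] += 1
        if in_gate_b:
            outliers[file_index] += 1
    return {
        f: (outliers[f], totals[f], outliers[f] / totals[f])
        for f in sorted(totals)
    }


# Made-up toy data: three files, with the outliers concentrated in file 2.
toy_events = (
    [(1, False)] * 98 + [(1, True)] * 2
    + [(2, False)] * 80 + [(2, True)] * 20
    + [(3, False)] * 100
)

summary = outlier_share_per_file(toy_events)
for file_index, (n_out, n_total, frac) in summary.items():
    print(f"file {file_index}: {n_out}/{n_total} outliers ({frac:.1%})")
```

Looking at the fraction as well as the raw count helps when the merged files contribute different numbers of events, since a large file can show many outliers simply because it is large.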

Next, using cluster gate B and the file scatter parameter, I look at a common antibody (CD3) to determine whether the three different outlying clusters appear together or separately (below).

[Figure: CD3 staining by FCS file for the outlying UMAP clusters]

From the above plot, we can see that the different outlying clusters appear separately, at different intensities of CD3 staining. The plot below shows the same data overlaid on the rest of the events (grey). Here the red and orange clusters show higher stain intensity than the rest of the data, which for me backs up my non-specific antibody binding theory.

[Figure: CD3 scattered UMAP]
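The visual comparison above can also be put into numbers by comparing the median CD3 intensity of each outlying cluster with the rest of the data. The sketch below is only an illustration: the group labels, the `(group_label, cd3_intensity)` layout and every intensity value are invented, and real data would need the same compensation and scaling as the plots.

```python
from statistics import median


def median_by_group(events):
    """Median CD3 intensity per group.

    events: iterable of (group_label, cd3_intensity) pairs
            (hypothetical layout, chosen for illustration only).
    """
    groups = {}
    for label, value in events:
        groups.setdefault(label, []).append(value)
    return {label: median(values) for label, values in groups.items()}


# Invented intensities: "rest" is the grey bulk, the others are outlying clusters.
toy = [
    ("rest", 120.0), ("rest", 150.0), ("rest", 180.0),
    ("red", 900.0), ("red", 1100.0), ("red", 1000.0),
    ("orange", 500.0), ("orange", 700.0), ("orange", 600.0),
]

medians = median_by_group(toy)

# Clusters whose median CD3 signal sits above the bulk of the data.
brighter = [g for g, m in medians.items() if g != "rest" and m > medians["rest"]]
print(medians)
print("brighter than rest:", sorted(brighter))
```

The median is used rather than the mean because fluorescence intensities are usually skewed, so a handful of very bright events would otherwise dominate the comparison.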

What do you think about UMAP?

There is another possibility I want to check: the UMAP algorithm in use. The CytoSwarm software uses a UMAP implementation hosted on GitHub, and looking through GitHub it seems that a few other people have also seen outlying data when using a UMAP algorithm (some example links below):

https://github.com/rapidsai/cuml/issues/1121

https://github.com/lmcinnes/umap/issues/3

These reports are not about the specific implementation that CytoSwarm uses (jlmelville/uwot), but similar cases have been reported across several UMAP implementations, so the algorithm could in theory be a reason for the outlying events above.

Next steps

So, with these theories in mind, there are a few ways I could resolve this:

Within VenturiOne, I can use the zoom function to zoom in to the relevant data in the UMAP plot and put a gate around it, effectively gating out the outlying cluster data (see below).

[Figure: UMAP1 merged FCS file, gated data]

I could save the above merged FCS file excluding the outlying events, then re-run the multidimensional analysis on the gated data. 

I could use the information from the file scatter plot to identify the files which contain the outlying cluster data, and re-run the multidimensional analysis excluding these files. 
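To make the difference between the last two options concrete, here is a small Python sketch of both filters applied to the same toy events. It does not reproduce VenturiOne's gating or file handling. The dictionary keys (`file`, `umap1`, `umap2`), the rectangular gate and the coordinates are assumptions for illustration, and a real gate drawn in the software need not be rectangular.

```python
def keep_inside_gate(events, x_range, y_range):
    """Keep events whose (umap1, umap2) lies inside a rectangular gate."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [
        e for e in events
        if x_min <= e["umap1"] <= x_max and y_min <= e["umap2"] <= y_max
    ]


def drop_files(events, excluded_files):
    """Remove every event that came from one of the excluded files."""
    excluded = set(excluded_files)
    return [e for e in events if e["file"] not in excluded]


# Invented events: file 2 contributes one far-away outlier and one normal event.
toy_events = [
    {"file": 1, "umap1": 0.5, "umap2": -1.0},
    {"file": 2, "umap1": 1.2, "umap2": 0.3},
    {"file": 2, "umap1": 25.0, "umap2": 30.0},  # outlying event
    {"file": 3, "umap1": -0.8, "umap2": 0.9},
]

gated = keep_inside_gate(toy_events, x_range=(-5, 5), y_range=(-5, 5))
without_file_2 = drop_files(toy_events, excluded_files=[2])

print(len(gated), len(without_file_2))  # 3 2
```

In this toy case, gating removes only the outlying event, while excluding file 2 also discards that file's normal event. That is the trade-off to weigh before re-running the multidimensional analysis on either subset.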

What would you choose?
