Using UMAP to Process High Colour Flow Cytometry (part 2)

In the first blog post, I found that several of my data runs gave skewed results because of some outlying data.

Merged file example number 10

I need to consider why I have some outlying data on the UMAP plots, and what the next steps should be. I am going to think through the procedure from the start.

Sample preparation and instrument issues during setup and acquisition are out of my hands, since this data comes from the FlowRepository. So I will move on to possible issues with my analysis setup.

There are many things to think about when setting up multidimensional data analysis.

The size of the flow cytometry data files

The larger the files, the longer they take to process. For many scientists, processing time is paid for out of grant money and cannot be wasted.

I used Applied Cytometry's CytoSwarm algorithm processing engine, which runs the processing in the cloud on AWS. There were no processing problems here: CytoSwarm would have reported them to me in the software had there been any.

How many data files you want to merge

If you want to create a merged file that you can later split apart and compare, you must consider both the size and the number of files. Merging more samples generally gives a more representative and reliable result, but it also takes longer to process and, as mentioned above, processing time costs grant money.

In the case above I merged 10 data files.
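The merge-then-split idea can be sketched in a few lines. This is a generic illustration using NumPy, not how CytoSwarm or VenturiOne implement it: each file is assumed to be already loaded as an events × parameters array, and the random "files" below are placeholders. The key point is to record which file every event came from, so any per-event result (UMAP coordinates, FlowSOM cluster IDs) can be split back out afterwards.

```python
import numpy as np

def merge_samples(samples):
    """Stack per-file event matrices and record which file each event came from."""
    events = np.vstack(samples)
    # One integer label per event: 0 for the first file, 1 for the second, ...
    sample_id = np.concatenate(
        [np.full(len(s), i, dtype=int) for i, s in enumerate(samples)]
    )
    return events, sample_id

def split_samples(values, sample_id, n_samples):
    """Split any per-event array (e.g. UMAP coordinates) back into per-file arrays."""
    return [values[sample_id == i] for i in range(n_samples)]

# Hypothetical placeholder data: 10 "files", each with 1,000-2,000 events and 6 parameters.
rng = np.random.default_rng(0)
files = [rng.normal(size=(rng.integers(1_000, 2_000), 6)) for _ in range(10)]

merged, ids = merge_samples(files)
# ... run dimensionality reduction / clustering on `merged` here ...
per_file = split_samples(merged, ids, len(files))
print(merged.shape, [len(f) for f in per_file])
```

Because the sample labels travel alongside the events, the cost of merging more files is mostly the extra processing time on the larger combined matrix.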

Which parameters to use for your analysis

Which parameters are going to be able to pull apart the populations I’m interested in? What am I looking for within this data?

I have found that parameter selection always involves some trial and error, and I have been surprised by how useful certain parameters can be in separating cluster populations.

To keep things simple, I consistently used the same 6 parameters that would be used for phenotypic gating.
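Using the same parameters for every run is easiest to guarantee if the selection is done by channel name from a single fixed list. The sketch below assumes events are held in a NumPy array with a matching list of channel names; the channel and marker names are hypothetical, since the post does not list the 6 parameters used.

```python
import numpy as np

# Hypothetical channel list for one file; real names depend on the panel and cytometer.
channels = ["FSC-A", "SSC-A", "CD3", "CD4", "CD8", "CD19", "CD56", "CD45", "Viability", "Time"]
# Hypothetical set of 6 phenotyping markers, reused unchanged for every analysis run.
analysis_channels = ["CD3", "CD4", "CD8", "CD19", "CD56", "CD45"]

def select_channels(events, channels, wanted):
    """Return only the columns named in `wanted`, in that order."""
    missing = [name for name in wanted if name not in channels]
    if missing:
        raise ValueError(f"channels not found: {missing}")
    idx = [channels.index(name) for name in wanted]
    return events[:, idx]

# Placeholder event matrix: 3 events x 10 channels.
events = np.arange(3 * len(channels), dtype=float).reshape(3, len(channels))
X = select_channels(events, channels, analysis_channels)
print(X.shape)  # (3, 6)
```

Raising an error on a missing name, rather than silently skipping it, stops a file with differently labelled channels from quietly producing a different parameter set.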

Compensation is important

Before sending your data for multidimensional analysis, make sure the compensation has been applied to, or saved in, the files.

When analysing flow cytometry data, if the compensation is wrong, populations can look completely different and the results can change entirely. That is a serious problem in fields where accuracy is essential, such as diagnostics.

When you may only get one shot at this, you need your data to be the best it can be.
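To show why uncompensated data distorts populations, here is a minimal sketch of the standard linear compensation step, assuming a spillover matrix whose row i gives the fraction of fluorochrome i's signal that lands in each detector (diagonal = 1). The observed signal is then the true signal multiplied by the spillover matrix, and compensation undoes that by solving for the true signal. The 3-colour matrix and event values are made-up illustrations, not values from this dataset.

```python
import numpy as np

def compensate(observed, spillover):
    """Remove spectral overlap by solving true @ spillover = observed for `true`."""
    # Same result as observed @ np.linalg.inv(spillover), without forming the inverse.
    return np.linalg.solve(spillover.T, observed.T).T

# Hypothetical 3-colour spillover matrix: row = fluorochrome, column = detector.
S = np.array([
    [1.00, 0.15, 0.02],
    [0.05, 1.00, 0.10],
    [0.00, 0.08, 1.00],
])

# Hypothetical true signals for two events.
true_signal = np.array([
    [1000.0, 0.0, 0.0],    # positive for fluorochrome 1 only
    [0.0, 500.0, 200.0],   # positive for fluorochromes 2 and 3
])

observed = true_signal @ S           # what the detectors record, with spillover
recovered = compensate(observed, S)  # compensated values
print(np.round(observed, 1))
print(np.round(recovered, 1))
```

In the uncompensated data, the first event shows up in detectors 2 and 3 even though it only carries fluorochrome 1, which is exactly the kind of false positivity that can create or shift clusters downstream.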

Putting the hyperparameters into VenturiOne flow cytometry analysis software

Having considered the steps taken, I now need to look more closely at the output data.

I load the output file containing the UMAP and FlowSOM hyperparameters into VenturiOne flow cytometry analysis software. Here I can view traditional gating alongside the multidimensional data to try to work out what these outlying clusters might be.

Hierarchical gating of 10 merged data files

In the image above, gate B is drawn around all the outlying clusters in the UMAP plot to identify them. It contains 633 of the 1.5 million events in the merged file (about 0.04%).
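Conceptually, a gate is just a boolean mask over events, and the gate statistics are a count and a percentage of that mask. The sketch below assumes a simple rectangular gate on 2D UMAP coordinates as a stand-in for whatever gate shape was drawn in VenturiOne; the simulated coordinates and gate limits are hypothetical.

```python
import numpy as np

def rect_gate(x, y, x_range, y_range):
    """Boolean mask of events whose (x, y) lies inside a rectangular gate."""
    return (
        (x >= x_range[0]) & (x <= x_range[1])
        & (y >= y_range[0]) & (y <= y_range[1])
    )

def gate_summary(mask):
    """Number of gated events and their percentage of all events."""
    n_in = int(mask.sum())
    return n_in, 100.0 * n_in / mask.size

# Hypothetical UMAP coordinates: a large main population plus a small outlying cluster.
rng = np.random.default_rng(1)
main = rng.normal(loc=0.0, scale=1.0, size=(10_000, 2))
outliers = rng.normal(loc=10.0, scale=0.3, size=(50, 2))
umap_xy = np.vstack([main, outliers])

# Hypothetical gate limits drawn around the outlying cluster.
gate_b = rect_gate(umap_xy[:, 0], umap_xy[:, 1], x_range=(8, 12), y_range=(8, 12))
n_in, pct = gate_summary(gate_b)
print(n_in, f"{pct:.2f}%")
```

Keeping the gate as a mask, rather than copying out the gated events, is what makes backgating straightforward: the same mask can be applied to any other parameter of the same events.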

Gates C and D are used to isolate individual clusters and show any differences in their features.

I then backgated gate B onto traditional dot plots (below) and onto the FlowSOM plot (above). On the FlowSOM plot, all the outlying clusters appear on one branch at the top of the plot, except for a single event in the rightmost cluster, which is marked by a thicker black box.
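Backgating amounts to applying the gate's event mask to other per-event parameters and counting where the gated events land. The sketch below assumes per-event arrays for FSC-A, SSC-A and a FlowSOM cluster ID; all values, including the cluster IDs 3 and 17, are made-up placeholders arranged to mimic "almost everything in one place plus one stray event", not data from this file.

```python
import numpy as np

def backgate(mask, **per_event):
    """Return each per-event array restricted to the events inside the gate."""
    return {name: values[mask] for name, values in per_event.items()}

def cluster_counts(labels):
    """Count how many gated events fall into each cluster ID."""
    ids, counts = np.unique(labels, return_counts=True)
    return dict(zip(ids.tolist(), counts.tolist()))

# Hypothetical per-event data for 1,000 events.
rng = np.random.default_rng(2)
n = 1_000
fsc = rng.uniform(0, 250_000, n)
ssc = rng.uniform(0, 250_000, n)
flowsom = rng.integers(0, 20, n)

# Hypothetical gate: the first 30 events, 29 in one cluster and 1 stray event.
gate_b = np.zeros(n, dtype=bool)
gate_b[:30] = True
flowsom[:29] = 3
flowsom[29] = 17

gated = backgate(gate_b, fsc=fsc, ssc=ssc, flowsom=flowsom)
print(cluster_counts(gated["flowsom"]))  # {3: 29, 17: 1}
# gated["fsc"] and gated["ssc"] can then be overlaid on the FSC/SSC dot plot.
```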

Here I wanted to see where these cluster events appeared within the singlets gate and on the FSC/SSC plot. I used colour precedence to make the colours stand out, and then precedence density (below) to see whether they correlated with any of the main cell populations.

Overall, these events appear to be spread fairly evenly across the plots rather than concentrated in any main population, which leads me to believe they may be caused by non-specific staining.

What do you think?

There are some other parameters and ideas I want to check, but I will discuss these in Part 3.