Using UMAP to process high colour flow cytometry data

Introduction to UMAP

I find the UMAP algorithm very useful for separating the major phenotypic populations of leukocytes, but I have recently discovered, through trial and error, that UMAP plots can be skewed by small outlying populations.

To get the best out of UMAP, I have found that merging datasets before running them through the algorithm is key. This is especially true when I can then separate the files again and compare them on the same plot. However, this in itself can be a challenge.
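The merge-then-separate idea can be sketched roughly as follows. This is only an illustration, not the CytoSwarm or VenturiOne workflow: it assumes each file has already been read into a list of event rows (one list of parameter values per event), and the function name `merge_with_origin`, the file names, and the values are all hypothetical. Each file is downsampled to at most the same number of events so that no single file dominates, and a file label is kept for every merged row so the events can be split out again per file after the embedding is computed.

```python
import random


def merge_with_origin(files, events_per_file, seed=0):
    """Merge event rows from several files, remembering each row's source file.

    files: dict mapping a file name to a list of events (each a list of floats).
    events_per_file: maximum number of events sampled from each file.
    Returns (merged_rows, origins), two lists of equal length.
    """
    rng = random.Random(seed)  # fixed seed so the same subsample is drawn each run
    merged_rows, origins = [], []
    for name in sorted(files):
        events = files[name]
        n = min(events_per_file, len(events))
        for row in rng.sample(events, n):
            merged_rows.append(row)
            origins.append(name)
    return merged_rows, origins


# Hypothetical toy data: two "files" with two parameters per event.
files = {
    "sample_A.fcs": [[0.1, 1.0], [0.2, 1.1], [0.3, 0.9], [0.4, 1.2]],
    "sample_B.fcs": [[5.0, 0.1], [5.1, 0.2]],
}
merged, origins = merge_with_origin(files, events_per_file=3)

# `merged` would be handed to a UMAP implementation as one matrix. The embedding
# has one point per row, in the same order, so `origins` picks out any one file.
rows_from_b = [row for row, name in zip(merged, origins) if name == "sample_B.fcs"]
```

Because the embedding is computed once on the merged rows, every file ends up on the same axes, which is what makes comparing files on one plot possible.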

In my most recent tests I sent varying numbers of files (out of the 49 in this dataset from the Flow Repository), each up to 50 MB in size, for UMAP processing and received some very good results.

Examples of UMAP previews in CytoSwarm (black and white) and in VenturiOne analysis software (coloured) are shown below:

Single files

Figures 2 to 5: single files

Merged files

Figure 6: 10 files merged

Figure 7: 10 different files merged

Figure 8: 10 randomly picked files merged

Figure 9: 10 different randomly picked files merged

Figure 10: first 10 files merged

Figure 11: 10 randomly picked files merged

Gated and Merged

Figure 12: all files gated on CD4 and merged

Figure 13: all files gated on CD8 and merged

Figure 14: all files gated on Tregs and merged

As you can see, certain plots have been skewed by some outlying data. So what do I do with the affected plots? There are a few things to consider.

What is this outlying data? Is it credible data? Does it need to be run again, or can it simply be gated out?
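One way to start answering these questions is to count how many events sit far from the bulk of a plot and which files they come from. The sketch below is my own illustration under stated assumptions, not a method from CytoSwarm or VenturiOne: it assumes the UMAP coordinates are available as a list of 2-D points with a matching list of file labels, and the function name `flag_outliers`, the threshold `k`, and the toy values are hypothetical. It uses the median and the median absolute deviation (MAD) on each axis, so a small but genuine population would be flagged as well; the result is a prompt for inspection, not a verdict.

```python
from collections import Counter
from statistics import median


def flag_outliers(points, k=5.0):
    """Return a list of booleans: True where a 2-D point lies far from the bulk.

    A point is flagged when, on either axis, its distance from the median
    exceeds k times the median absolute deviation (MAD) of that axis.
    """
    flags = [False] * len(points)
    for axis in (0, 1):
        values = [p[axis] for p in points]
        centre = median(values)
        mad = median(abs(v - centre) for v in values)
        if mad == 0:
            continue  # no spread on this axis, nothing to flag
        for i, v in enumerate(values):
            if abs(v - centre) > k * mad:
                flags[i] = True
    return flags


# Hypothetical embedding coordinates and the file each event came from.
points = [[0, 0], [1, 1], [0.5, 0.2], [1.2, 0.8], [0.8, 0.4], [40, 35]]
origins = ["a", "a", "b", "b", "c", "c"]

flags = flag_outliers(points)
# Tally flagged events per file to see whether one file is responsible.
outliers_per_file = Counter(name for name, hit in zip(origins, flags) if hit)
```

If the flagged events all come from one file, that file is a natural candidate for a closer look before deciding whether to re-run it or gate the events out.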

I will discuss all my considerations in my next blog.

In the meantime

Please feel free to discuss your own considerations with me at jo.crofts@appliedcytometry.com. I would love to hear about your own experiences with UMAP!

Flow Cytometry Data is Growing…

Rare event analysis has been an important requirement of flow cytometry users for some time. The latest flow cytometry instruments allow a higher number of parameters to be collected on large numbers of cells. Because of this, the instruments' native software struggles to analyse the larger volume of flow cytometry data.


Dr. Vera Donnenberg is Director of Basic Research for the Heart, Lung and Oesophageal Surgery Institute at the University of Pittsburgh. She says: “With our current system, just opening a file takes so long that I can get up, go get a cup of coffee – which is not close by! – come back, and the computer screen is still being refreshed.”

At their laboratories, complex projects might require researchers to look at tens of markers on hundreds of patients. Each sample can contain tens of thousands or even millions of cells. As a result, the files that are created are very large.

So, What is the Problem With More Data?

These files are growing to hundreds of megabytes in size. “Although the software as it exists now can technically do everything that we want it to do, it is so difficult to use because of the speed issue,” adds Dr. Albert Donnenberg, Director of the Flow Cytometry Facility at the University of Pittsburgh Cancer Institute. He says, “If we didn’t have the bottleneck of speed, we could perhaps be analysing these files in minutes. Instead, it literally takes us hours.”


Dr. Albert Donnenberg says, “We’ve been working heavily in the area of cancer stem cells, dealing with about 6 million to 10 million cells with about 11 different parameters. We worked on the data every morning for 10 days using our fastest workstation, and only analysed 30 files.”

A typical example of the problems facing the Donnenbergs is one of their cancer stem cell investigations. It required the analysis of 6 samples, each containing 11 parameters, with 250,000 to 5 million events collected per sample. This took around 50 hours of analysis time. It created an analysis backlog, left them less time in the lab, and forced them to look at the data in a limited way. They also experienced “countless crashes and much marital strife”.
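To put those event counts in terms of file size, here is a rough back-of-the-envelope calculation. It assumes each value is stored as a 4-byte floating-point number, which is common for flow cytometry data files but is an assumption here, and it ignores header and text sections. The function name `estimated_data_megabytes` is hypothetical.

```python
def estimated_data_megabytes(events, parameters, bytes_per_value=4):
    """Approximate size of the raw event data in megabytes (10**6 bytes).

    bytes_per_value=4 assumes 32-bit floating-point storage (an assumption).
    """
    return events * parameters * bytes_per_value / 1_000_000


# The smallest and largest samples described above, 11 parameters each.
small = estimated_data_megabytes(250_000, 11)    # 11.0 MB
large = estimated_data_megabytes(5_000_000, 11)  # 220.0 MB
```

Under that assumption the largest samples come to roughly 220 MB of event data each, which is in line with the “hundreds of megabytes” mentioned earlier.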

The Solution: VenturiOne Analysis Software

Drs. Albert and Vera Donnenberg have advised Applied Cytometry in the development of VenturiOne® software. They have been impressed ever since they first saw an early version of the software.

“Applied Cytometry ran a demo of its VenturiOne® software for us, using a file that we provided, and it ran more than 10 times faster. It was amazing”.

Drs. Albert and Vera Donnenberg

VenturiOne® software includes new techniques (U.S. patent pending) to increase the speed and efficiency of the software. This allows use of the latest multi-core processors. Available for 32-bit or 64-bit versions of Windows XP or Vista, VenturiOne® software is scalable and so will make use of all the available processors within your PC.

The design of workflow within VenturiOne® means that you do not need a high specification PC. Users will benefit from VenturiOne® software on any Windows XP or Windows Vista capable PC.

VenturiOne® has an innovative preview display of all parameters that allows extensive exploration of your generated data files. Point-and-click gating and powerful colour eventing allow users to explore possibilities on even the largest data files.

The Organisation

Basic Research for the Heart, Lung and Oesophageal Surgery Institute at the University of Pittsburgh

Flow Cytometry Facility at the University of Pittsburgh Cancer Institute