IGB researchers leverage team science to develop InSTAnT Toolkit
In a new study published in Nature Communications, a team of researchers at the Carl R. Woese Institute for Genomic Biology describes a new, robust computational toolkit for extracting biological relationships from large transcriptomics datasets. The toolkit should help scientists better investigate cellular processes.
Living organisms are governed by their genome: an instruction manual written in the language of DNA that dictates how an organism grows, survives, and reproduces. By regulating the abundance of different RNA transcripts, cells control their protein expression levels, thereby shaping their functions and responses to the environment. Transcriptomics is the study of gene expression through cataloging the presence and abundance levels of active RNA transcripts generated from the genome under different conditions. Through the lens of RNA, transcriptomics technologies allow scientists to study the complex interactions that enable life and cause disease, as well as assess the biological effects of therapeutics.
Imaging-based spatial transcriptomics technologies are powerful methods that map the locations of RNA transcripts from hundreds to thousands of genes within tissues, revealing how molecules and cells are spatially organized.
“It's a completely new way of probing molecular interactions,” said Hee-Sun Han (IGOH/GNDP), a professor of chemistry at the University of Illinois Urbana-Champaign. “Although we are getting a spatial map of molecules, like a really detailed Google Map, the community is not really clear on how to utilize the dataset. In this study, we utilize the co-localization pattern of molecules to identify potentially interacting molecular pairs and infer their functions. This is the beginning of our long journey to think about what information is embedded in this space.”
After connecting through the IGB’s Gene Networks in Neural and Developmental Plasticity Research Theme, Han teamed up with Dave Zhao (GNDP), a professor of statistics at Illinois, and Saurabh Sinha, a former Illinois professor and current professor of biomedical engineering at the Georgia Institute of Technology. Combining Han’s knowledge of the technology with Sinha’s and Zhao’s expertise in genomics analysis and statistics, they developed InSTAnT, an intracellular spatial transcriptomics toolkit.
“I would say that the fundamental model of a cell can be thought of in terms of how its internal components are situated relative to each other, and not just how much of each there is. The toolkit allows you to break down these complicated biological systems into their smallest, most irreducible parts. Our new idea was to not so much look at where these smallest components are located, but instead how they are located with respect to each other,” Zhao said.
InSTAnT uses robust statistical tests and algorithms to identify proximal pairs: pairs of genes whose RNA transcripts are found close to each other inside cells more often than expected by chance. Molecules that tend to sit together may also tend to work together, so these proximal pairs give scientists a starting point for understanding the molecules’ functions.
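The article does not spell out InSTAnT’s actual statistics, but the core question (are two genes’ transcripts close together more often than chance would predict?) can be illustrated with a much simpler test. Below is a minimal Python sketch that swaps in a label-permutation test within a single cell: transcript positions stay fixed, gene labels are shuffled, and the observed number of nearby gene-A/gene-B transcript pairs is compared against the shuffled counts. The distance threshold `d`, the function names, and the toy cell are all hypothetical illustrations, not InSTAnT’s method or API.

```python
import math
import random
from itertools import combinations


def close_pairs(points, d):
    """Return index pairs (i, j) of transcripts lying within distance d of each other.

    Brute force O(n^2); fine for a toy cell, far too slow for real datasets.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= d
    ]


def count_gene_pair(pairs, labels, key):
    """Count close transcript pairs whose gene labels match the sorted tuple `key`."""
    return sum(1 for i, j in pairs if tuple(sorted((labels[i], labels[j]))) == key)


def proximal_pair_pvalue(points, labels, gene_a, gene_b, d=1.0, n_perm=1000, seed=0):
    """Hypothetical permutation test: are gene_a and gene_b transcripts close
    more often than chance within one cell?

    Positions are held fixed and gene labels are shuffled, so the null hypothesis
    is that labels are assigned to transcript positions at random.
    Returns the observed count and a one-sided empirical p-value.
    """
    rng = random.Random(seed)
    pairs = close_pairs(points, d)  # geometry computed once; only labels move
    key = tuple(sorted((gene_a, gene_b)))
    observed = count_gene_pair(pairs, labels, key)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_gene_pair(pairs, shuffled, key) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy cell (hypothetical data): every "A" transcript sits 0.5 units from a "B"
# transcript, while "C" transcripts are scattered far from both.
points = (
    [(10.0 * k, 0.0) for k in range(10)]            # gene A
    + [(10.0 * k + 0.5, 0.0) for k in range(10)]    # gene B
    + [(10.0 * k + 5.0, 5.0) for k in range(10)]    # gene C
)
labels = ["A"] * 10 + ["B"] * 10 + ["C"] * 10

for a, b in [("A", "B"), ("A", "C")]:
    obs, p = proximal_pair_pvalue(points, labels, a, b, d=1.0, n_perm=1000)
    print(f"{a}-{b}: {obs} close pairs, permutation p = {p:.4f}")
```

In this toy cell the A-B pair should come out with a very small p-value and the A-C pair with a p-value of 1. Shuffling labels over fixed positions is only one simple null model: it ignores uneven transcript density, cell shape, and subcellular compartments, and a real analysis would also need to combine evidence across many cells and correct for testing many gene pairs. This sketch does none of that, which is part of why purpose-built statistics are needed for real intracellular data.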
“As an analogy, let's say we don't really know anything about how society functions. Who people interact with forms the basis of the function of the society. So, if we want to learn who is interacting with each other, what the functional units are, and what these units do, we can look at their relative location,” Han said.
Following this analogy, photos showing which people are near each other at different times of day could reveal their relationships and roles in society. If people are often together during the day in Urbana, they might be coworkers at Illinois. But if they tend to colocalize at night, they might be roommates or family members.
This is the idea behind InSTAnT. By looking at the simplest components of a complex system, one can begin to understand how complicated phenomena emerge. The toolkit also tests whether a proximal pair’s subcellular colocalization shows tissue-level spatial patterns, such as varying across regions of a tissue. But while the underlying principles of identifying proximal pairs and searching for their colocalization patterns may sound simple, in practice InSTAnT is a highly complex statistical tool that was challenging to develop.
“We drew inspiration from spatial statistics that have been developed for other contexts, such as ecology. While describing molecules inside a cell can be likened to describing trees in a region, there are unique differences such as the peculiarity and heterogeneity of cells that did not allow us to just use those statistics for the modeling. So, we had to draw on completely different models, even non-spatial statistics, and combined these with spatial ideas to do the final modeling,” said Sinha, the lead researcher on the project. Sinha credits Anurendra Kumar, a graduate student in his research group and first author of the paper, for driving this challenging project and bringing InSTAnT to life.
Another key feature of InSTAnT is its accuracy and statistical rigor compared with existing tools. InSTAnT produces reproducible findings while keeping the rate of false positives low when identifying proximal pairs. The researchers achieved this robustness by leveraging team science.
“One of the nice things about this team is that we were able to do both experimental and analytical work by collaborating. The Han group generated extra data in the lab so we could validate and test this tool. The ability to look at these robustness metrics, not just analytically or with simulated data, but also with real in vitro experimental data, is a big strength,” Zhao said.
The trio of professors acknowledged that the collaborative efforts of their students were important to successfully pulling off this multi-lab project. But although this chapter closes with the publication of the InSTAnT toolkit, the team’s work is not done.
Han said, “I got to know Saurabh and Dave through this collaboration, and we're developing new projects to tackle more grand challenge problems. This requires a multi-pronged approach that uses all of our different expertise. So to me, this is just the beginning of our exciting work together.”
The publication, “Intracellular spatial transcriptomic analysis toolkit (InSTAnT),” can be found at https://doi.org/10.1038/s41467-024-49457-w and was funded by the National Institutes of Health, Johnson & Johnson, and the Cancer Center at Illinois.