We are creating a robust deep learning infrastructure that will allow us to analyze nearly any type of data. More importantly, this infrastructure is expected to have the capability to automatically train on multiple data types at once and obtain unified predictions from that data. Specifically, we expect our deep learning capabilities to be used for the following data types:
Structured data: This includes any data that can be represented in tabular form. Examples of such data include laboratory tests, claims data, and payment details to healthcare providers. Our deep learning models for structured data, which rely on fully connected neural networks, allow us to automatically train on any such structured database and incorporate those insights into predictive models.
Images: While traditional image processing requires cumbersome development for each specific use case, deep learning-based models can operate directly on image data without a separate feature-extraction preprocessing step. We are developing a series of deep learning models based on convolutional neural networks that could be extended to process medical images, including radiology imaging such as MRIs, CT scans, X-rays, ultrasounds, and PET scans, as well as other types of medical images, such as those used in dermatology, pathology, and ophthalmology.
Text: Traditionally, analyzing text data has required lengthy natural language processing, or “NLP,” pipelines. Recent breakthroughs in deep learning, especially the advent of transformer networks (e.g., BERT), allow for the end-to-end training of language models, which results in significantly higher accuracy than previous NLP methods. Our infrastructure includes support for these deep learning-based language models, allowing us to automatically process textual data as well. A prime example of textual medical data is the plain text of doctors’ notes, which includes information about symptoms, diagnoses, and treatment. Even though some of this data, such as disease codes, appears in structured databases as well, the textual data contains a large amount of information that does not appear elsewhere.
Multi-type data: Deep learning can also be applied to multiple data sources and data types. During the first stage, the relevant deep learning model is applied to each data type (e.g., a convolutional neural network for images, a transformer for text, and so on), and then, during the second stage, the processed information from those various data sources is fed into a secondary deep learning model that provides unified predictions. This is one of the major advantages of deep learning over traditional AI, as it allows the incorporation of multiple big data sources into a single unified prediction model. We believe this approach can achieve higher accuracy than traditional methods.
Data fusion: With deep learning, we have the ability to study different data types together, such as data in the form of images (e.g., CT scans) with data in tabular form (e.g., figures from healthcare claims). Putting these different data types together is referred to as data fusion.
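The two-stage approach described above can be illustrated with a minimal sketch. It is not our production infrastructure: it is a toy forward pass in plain Python, with untrained random weights, in which a fully connected layer encodes a tabular record, a very small convolutional encoder stands in for an image model, and a secondary model combines both into one prediction. All function names, layer sizes, and input values are hypothetical and chosen only for illustration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def encode_tabular(row, params):
    """Stage 1, structured data: one fully connected layer with ReLU."""
    weights, bias = params
    return [max(0.0, sum(w * v for w, v in zip(w_row, row)) + b)
            for w_row, b in zip(weights, bias)]


def encode_image(image, filters):
    """Stage 1, images: 2x2 convolutions, ReLU, then global average pooling.

    Produces one embedding value per filter.
    """
    h, w = len(image), len(image[0])
    embedding = []
    for k in filters:
        total = 0.0
        for i in range(h - 1):
            for j in range(w - 1):
                act = (k[0][0] * image[i][j] + k[0][1] * image[i][j + 1]
                       + k[1][0] * image[i + 1][j] + k[1][1] * image[i + 1][j + 1])
                total += max(0.0, act)
        embedding.append(total / ((h - 1) * (w - 1)))
    return embedding


def fuse_and_predict(tab_emb, img_emb, head):
    """Stage 2: concatenate the embeddings and apply a logistic output unit."""
    fused = tab_emb + img_emb  # list concatenation = feature concatenation
    weights, bias = head
    logit = sum(w * v for w, v in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tab_params = init_layer(n_in=3, n_out=4, rng=rng)
img_filters = [[[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
               for _ in range(2)]
head = ([rng.uniform(-0.5, 0.5) for _ in range(4 + 2)], 0.0)

claims_row = [0.2, 1.0, 0.5]            # made-up tabular features
scan = [[0.0, 0.1, 0.2],                # made-up 3x3 grayscale "image"
        [0.3, 0.4, 0.5],
        [0.6, 0.7, 0.8]]

tab_emb = encode_tabular(claims_row, tab_params)
img_emb = encode_image(scan, img_filters)
prediction = fuse_and_predict(tab_emb, img_emb, head)
print(f"unified prediction (untrained): {prediction:.3f}")
```

In a trained system, the encoders and the secondary model would be learned from data, and a text encoder (for example, a transformer) could contribute a third embedding to the concatenation in the same way.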