‘Phe2vec’: An Automated AI Framework Based on Neural Networks For EHR-Based Disease Phenotyping

Clinical research is a vital part of modern medicine, and AI is helping to speed it up. Researchers from the Icahn School of Medicine at Mount Sinai recently published a paper describing ‘Phe2vec’, a deep learning framework that learns from patients’ electronic health records (EHRs) which drugs, diagnostics, and other medical concepts are associated with particular diseases, with the aim of accelerating clinical studies.

As per the research team, Phe2vec aspires to be a part of the future generation of healthcare systems that use machine learning to successfully support doctors in their work. These systems, which may scale to many diseases, patients, and health data, can provide a more holistic approach to examining disease complexity and improving clinical practice and medical research.

In genetics, a phenotype is the set of observable traits of an organism. In clinical research, patient electronic health records are an important source of phenotypic data. Combining phenotypic data with genetic information makes it possible to diagnose inherited diseases and disorders more accurately.

Researchers currently rely on manual phenotyping algorithms that require extensive knowledge of the target phenotype or condition. Their results must also be validated, which is a time-consuming process.

Instead of hard-coding rules, the researchers used machine learning to learn phenotype definitions directly from the data. Phe2vec is a scalable neural-network framework for EHR-based phenotyping that uses an unsupervised learning methodology.


According to the researchers, Phe2vec produces vector-based representations (embeddings) of medical concepts and defines a disease phenotype from the concepts that lie closest in the embedding space to a seed concept (e.g., an ICD code). At the patient level, the distance between a patient’s record and the phenotype in the embedding space is then used to identify the population associated with a specific disease.
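To make the seed-concept idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the three-dimensional vectors, the concept names, and the helper functions (`build_phenotype`, `score_patient`) are all invented for illustration, and representing a patient as the plain average of their concept vectors is a simplifying assumption. It only shows the general pattern of ranking concepts by similarity to a seed and then scoring patients against the resulting phenotype.

```python
import math

# Toy 3-dimensional embeddings for a handful of medical concepts.
# Every vector and concept name here is made up for illustration;
# real embeddings would be learned from EHR data.
EMBEDDINGS = {
    "ICD:E11 type 2 diabetes": [0.90, 0.10, 0.00],
    "RX:metformin": [0.80, 0.20, 0.10],
    "LAB:hemoglobin A1c": [0.85, 0.15, 0.05],
    "ICD:G35 multiple sclerosis": [0.00, 0.90, 0.20],
    "RX:interferon beta": [0.10, 0.80, 0.30],
    "PROC:chest x-ray": [0.30, 0.30, 0.90],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_phenotype(seed, embeddings, top_k=3):
    """Return the top_k concepts most similar to the seed (seed included)."""
    seed_vec = embeddings[seed]
    ranked = sorted(
        embeddings,
        key=lambda concept: cosine_similarity(seed_vec, embeddings[concept]),
        reverse=True,
    )
    return ranked[:top_k]


def mean_vector(concepts, embeddings):
    """Average the embeddings of a list of concepts, dimension by dimension."""
    vectors = [embeddings[c] for c in concepts]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def score_patient(patient_concepts, phenotype, embeddings):
    """Similarity between a patient's averaged record and the phenotype."""
    patient_vec = mean_vector(patient_concepts, embeddings)
    phenotype_vec = mean_vector(phenotype, embeddings)
    return cosine_similarity(patient_vec, phenotype_vec)


# Hypothetical usage: define a phenotype from a seed code, then score patients.
phenotype = build_phenotype("ICD:E11 type 2 diabetes", EMBEDDINGS, top_k=3)
patient_a = ["RX:metformin", "LAB:hemoglobin A1c"]
patient_b = ["RX:interferon beta", "PROC:chest x-ray"]
print(phenotype)
print(score_patient(patient_a, phenotype, EMBEDDINGS))
print(score_patient(patient_b, phenotype, EMBEDDINGS))
```

In this toy setup, the patient whose record contains diabetes-related concepts scores higher against the phenotype than the patient whose record does not. A cohort could then be selected by thresholding the score, though the threshold itself would have to be chosen and validated on real data.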

The researchers used de-identified EHRs for approximately 1.9 million patients from the Mount Sinai Health System database and compiled various data points for each patient, including vital signs, lab tests, prescriptions, procedure codes, diagnoses, and clinical notes.

Phe2vec was compared with the previous gold standard, PheKB, on ten diseases: abdominal aortic aneurysm, atrial fibrillation, attention deficit hyperactivity disorder (ADHD), autism spectrum disorder (ASD), Crohn’s disease, dementia, herpes zoster, multiple sclerosis, sickle cell disease, and type 2 diabetes mellitus (T2D).

According to the researchers, Phe2vec outperformed the PheKB methods, achieving a higher overall positive predictive value (PPV).
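For readers unfamiliar with the metric, PPV is the fraction of patients flagged by an algorithm who truly have the disease: true positives divided by the sum of true and false positives. The short Python sketch below shows the calculation. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the share of flagged patients who truly have the disease."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("No patients were flagged, so PPV is undefined.")
    return true_positives / flagged


# Hypothetical chart review: 100 flagged patients, 90 confirmed by clinicians.
ppv = positive_predictive_value(true_positives=90, false_positives=10)
print(f"PPV = {ppv:.2f}")
```

PPV alone says nothing about the patients an algorithm misses, so it is usually read alongside other measures such as sensitivity.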

By using deep learning to learn disease characteristics from patient electronic health records, the researchers developed an unsupervised system that identifies cohorts for target diseases with performance on par with or better than existing methods.

The researchers concluded that Phe2vec aims to contribute to the next generation of clinical systems that use machine learning to effectively support clinicians in their activities. They added that these systems, capable of scaling to many patients, diseases, and health data, promise to offer a more integrated way to examine the complexity of diseases and improve clinical practice and medical research.

Paper: https://www.cell.com/patterns/fulltext/S2666-3899(21)00185-9

Source: https://www.psychologytoday.com/ca/blog/the-future-brain/202109/using-artificial-intelligence-identify-diseases-health-records
