“There are approximately 7,000 recognized rare diseases.
Estimates indicate that rare diseases affect over 350 million
people worldwide, or about 1 in 10 people in the United States.”

– FDA Rare Diseases: 2019 Guidance for Industry


Advancements in personalized medicine, driven by innovative data and analytic solutions, continue to accelerate opportunities to support rare disease identification, treatment pathways, and novel drug development. For rare and orphan diseases, many research needs related to therapeutic treatment remain unmet. Patients often endure lengthy diagnostic odysseys or lack access to therapies under development because of the inherent complexity of these diseases and the frequent similarity of symptoms between rare diseases and common disorders.

The 1983 Orphan Drug Act accelerated drug development for rare diseases, and the Affordable Care Act opened options for funding and reimbursing rare disease treatments by removing caps on spending. Growing knowledge of genetics and new capabilities to harness technologies such as CRISPR and gene therapy have led to a surge in biotechnology investment in novel treatments. Patients with rare diseases stand to benefit from the personalized nature of these advanced therapies.

In July 2020, the need to accelerate translational research for unmet medical needs motivated the largest rare disease advocacy group, The Haystack Project, to sponsor the 2020 HEART (Helping Experts Accelerate Rare Treatments) Act, legislation that encourages regulators to expand treatment access and reimbursement for rare diseases. Public awareness of and support for solutions to rare diseases continue to rise, along with strong demand to deliver tools and treatments to the patients who will benefit from them.

Real-world data and targeted analytics can significantly improve patient identification, medical decision making, and clinical research for rare diseases. Because each rare disease affects so few patients, these improvements require data with volume, breadth, and depth. However, deep data is difficult to obtain from traditional sources such as medical claims. Diverse partnerships among technology vendors, health systems, advocacy groups, and life sciences organizations, combined with robust analytics, are needed to access these data sources and derive value from them to improve clinical care, enhance patient experience, and design successful research studies.

At Graticule, we are building partnerships and forming a global research network to expand real-world data expertise and ensure sufficient data availability, enrichment frameworks, and insights to accelerate the development of solutions for rare disease research.

Advanced real-world data solutions and capabilities enable healthcare stakeholders to solve rare disease challenges


Recognition of Disease

Many rare diseases have historically had no identifiable or reimbursable treatments, and as a result the structured data in the patient record rarely contains the rare disease diagnosis. Where specific classification codes are unavailable or not yet well understood at the point of care, clinicians often use general codes that support therapeutic reimbursement. For example, Secondary Progressive Multiple Sclerosis (SPMS) has had an available code since 2016, yet it has historically gone undocumented in medical records because no therapies were available for it. Instead, patients who might have been diagnosed with SPMS received diagnoses of relapsing-remitting or primary progressive MS, because those diagnoses could be matched with available treatments.

Even when structured records do capture the rare disease, specific subtypes have been difficult to represent in older classification systems such as ICD-9. The updated ICD-10 and ICD-11 lexicons often still lack the specificity to differentiate the many subtypes of rare disorders that the scientific literature increasingly defines, for example by variations in genetic cause (autosomal, wild type, recessive).

At Graticule, we believe a balance must be struck between coding specificity that informs clinical protocols and algorithm development, and the practical realities of real-world care delivery. For example, a genetic test may identify a single base pair mutation that results in an ultra-rare disorder. For broad analytic use, the value of super-precise coding may not outweigh the administrative burden placed on the clinical team to enter those distinctions as diagnoses. Those classifications are instead stored in systems such as lab tests, genetic data structures, and radiology study reports.


Clinical teams may lack the training or awareness of rare diseases needed to order definitive diagnostic tests, such as a genetic panel or antibody test, that would identify patients with rare diseases. Because clinical protocols emphasize ruling out common disorders, patients often endure a diagnostic odyssey. For example, a patient may present with cardiac symptoms typically associated with congestive heart failure (CHF), and the care team may then follow the standard of care for CHF. However, the symptoms of a rare disease such as hereditary transthyretin amyloidosis (hereditary ATTR) often overlap with those of CHF. A genetic panel and full case review can shorten the diagnostic odyssey and inform providers about applicable treatment regimens and effective therapeutics.

Diagnostic odysseys are challenging for both patients and clinical teams and often result in incorrect tests and ineffective treatments. Following the standard-of-care regimen for the presumed common disorder may inadvertently create new problems, including unforeseen side effects, harm from delaying effective treatment, and loss of trust between patients and clinical teams.

Patient Identification

Identifying patients with rare diseases is one of the most challenging issues organizations encounter in developing and delivering therapeutics. Rare diseases are infrequently diagnosed, their symptoms are often misunderstood, and there is a high probability that patients present with specific complaints that may result in misdiagnosis.

Patient Identification Benefits: Drug Development and Market Access

Finding patients with a rare disease, whether they are undiagnosed or diagnosed with one of a range of other conditions, presents significant challenges for therapeutic development and commercial programs. Understanding the precise and nuanced nature of a rare disease, for example distinguishing primary hyperoxaluria type 1 (PH1) from kidney stones or Dravet syndrome from epilepsy, can enable earlier detection and clinical benefit for patients.

Identification and application of existing treatments

Treatments to block biological pathways through tools such as RNAi or to replace missing biological components via gene therapy can prevent disease progression into advanced stages such as end stage renal disease (ESRD) or congestive heart failure (CHF).

Earlier diagnosis of rare diseases and identification of treatment candidacy can prevent long-term tissue damage. Identifying patients with rare diseases can improve outcomes, either through existing effective therapies or by improving our understanding of the disease's natural history, since evaluating disease progression shows when it is best to intervene with appropriate therapies.

Detection and Rare Disease

While detection is often difficult, we can better understand rare diseases by leveraging advanced real-world data (e.g., family history, unique symptoms, free-text notes, radiology, and genetic test results stored as PDFs). These additional data elements can help research teams build predictive models to identify, stratify, and treat patients with rare diseases at an early stage, before significant disease progression.

Diagram 1: Detection Algorithms

Diagram: The model for establishing a rare disease patient identification system follows a common pattern of steps.

Historical documentation of diagnosed patients can be used to design a data science system that detects the needles of rare disease in the haystack of common illness. The first step is to build a data set of deep medical records for a rare disease cohort, along with a matched data set of control patients for whom the rare disease has been ruled out. The matched controls can be as hard to find as the diagnosed patients, because they must be similar cases in which the rare disease diagnosis was explicitly ruled out.
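The cohort-building step can be sketched in Python. This is a minimal illustration, assuming each patient record carries sex, birth year, and a flag recording that the rare disease was explicitly ruled out (for example, by a negative definitive test). The `Patient` class, the `match_controls` function, and the 5-year birth-year bands are hypothetical choices, and real studies typically match on many more clinical variables.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    sex: str
    birth_year: int
    ruled_out: bool = False  # hypothetical flag: rare disease explicitly ruled out


def age_band(birth_year: int, width: int = 5) -> int:
    """Bucket birth years into bands of `width` years for matching."""
    return birth_year // width


def match_controls(cases, candidates, k=2, seed=0):
    """Exact-match each case to up to k ruled-out controls with the same sex and
    birth-year band, sampling without replacement so no control is reused."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for p in candidates:
        if p.ruled_out:  # only patients with the rare disease ruled out are eligible
            pool[(p.sex, age_band(p.birth_year))].append(p)
    for stratum in pool.values():
        rng.shuffle(stratum)  # randomize which eligible control is drawn first
    matches = {}
    for case in cases:
        stratum = pool[(case.sex, age_band(case.birth_year))]
        n = min(k, len(stratum))
        # pop() removes each chosen control from the pool, preventing reuse
        matches[case.patient_id] = [stratum.pop().patient_id for _ in range(n)]
    return matches


cases = [Patient("c1", "F", 1980), Patient("c2", "M", 1991)]
candidates = [
    Patient("x1", "F", 1981, ruled_out=True),
    Patient("x2", "F", 1983, ruled_out=True),
    Patient("x3", "F", 1982),  # not ruled out, so never used as a control
    Patient("x4", "M", 1990, ruled_out=True),
]
print(match_controls(cases, candidates))
```

Exact matching within strata is the simplest design to audit. When strata run out of controls, as with `c2` above, a case simply receives fewer matches rather than a poorer one.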

Starting with a set of known rare disease cases lets teams review the available data and understand the patterns that can then be sought in unknown cases. Using machine learning, data scientists working together with clinical experts review records and statistical findings to identify logical common elements that differentiate cases from controls. Those elements can be extracted from clinical notes into structured fields using NLP or data curation.
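As a rough sketch of turning note text into structured fields, the Python below uses a small phrase dictionary and a crude negation rule loosely modeled on NegEx. The concept names, phrases, and five-word negation window are hypothetical. Production pipelines would rely on a validated clinical NLP toolkit and clinician-reviewed vocabularies.

```python
import re

# Hypothetical concept dictionary: structured field -> phrases that may appear in notes
CONCEPTS = {
    "family_history": [r"family history of", r"mother had", r"father had"],
    "inflammatory_back_pain": [r"inflammatory back pain", r"morning stiffness"],
    "uveitis": [r"uveitis"],
}
NEGATION_CUES = re.compile(r"\b(no|denies|negative for|without)\b", re.IGNORECASE)


def extract_fields(note: str, window: int = 5) -> dict:
    """Return {field: True/False} for each concept. A mention is treated as negated
    if a negation cue appears within `window` words before it in the same sentence."""
    fields = {name: False for name in CONCEPTS}
    for sentence in re.split(r"[.;\n]", note):
        for name, patterns in CONCEPTS.items():
            for pattern in patterns:
                match = re.search(pattern, sentence, re.IGNORECASE)
                if not match:
                    continue
                preceding = " ".join(sentence[: match.start()].split()[-window:])
                if not NEGATION_CUES.search(preceding):
                    fields[name] = True  # affirmed mention
    return fields


note = ("Patient reports morning stiffness lasting over an hour. "
        "Denies uveitis. Family history of psoriasis.")
print(extract_fields(note))
```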

Using machine learning, we can review a variety of details in the medical record, looking for the key differences that indicate a high probability of an undiagnosed rare case. For example, ankylosing spondylitis (AS), an autoimmune inflammatory disease, causes joint pain, and orthopedic specialists are often needed to identify it in patients.

Many of these patients receive orthopedic imaging studies of areas such as the knees or lower back. Ankylosing spondylitis is treatable with IL17 antagonists recently made available by multiple life sciences companies.

In a hypothetical case, if we could view the imaging history of AS patients, we would expect to find differences in their radiology images and reports compared with those of control patients. Once a model can differentiate potential cases, we can apply it to historical radiology records to validate the predictive accuracy of the new algorithm.
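The train-then-validate loop can be illustrated with a plain-Python logistic regression and an AUC check on held-out historical records. This is a sketch under loud assumptions. The two binary report features (`sacroiliitis_mentioned`, `knee_only_study`) and the tiny data set are invented for illustration, and logistic regression stands in for whatever model a team would actually choose. A real validation would use far more records and typically a library such as scikit-learn.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit unregularized logistic regression with batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log-loss w.r.t. the logit
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control (ties = 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical features per radiology report: [sacroiliitis_mentioned, knee_only_study]
X_train = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
y_train = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X_train, y_train)

# Validate on held-out historical records that were not used for training
X_hist = [[1, 1], [0, 0], [1, 0], [0, 1]]
y_hist = [1, 0, 1, 0]
print(roc_auc(y_hist, [predict_proba(w, b, x) for x in X_hist]))
```

Keeping the validation records separate from the training records is the essential design choice here: accuracy measured on the training set would overstate how well the algorithm finds new cases.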

We can then stratify patients who have no known diagnosis, are being tracked for orthopedic pain, and may benefit from a definitive test for AS. Algorithm validation can show whether ordering a diagnostic test for patients the model flags as high probability identifies cases more effectively than either a control (random selection) or the standard of care.
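Once a model produces scores, stratification and the comparison against random testing can be expressed as yield and lift at a testing budget of k patients. This is a minimal sketch, assuming model scores and confirmed outcomes are available for a retrospective validation set. The tier thresholds are hypothetical. Comparing against the standard of care would replace the base rate with the yield of current testing practice.

```python
def top_k_yield(scores, labels, k):
    """Share of true cases among the k highest-scoring patients (positive predictive value at k)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def lift_at_k(scores, labels, k):
    """Cases found per test among the top k, relative to testing patients at random.
    Random selection yields the base rate (prevalence) of cases in expectation."""
    base_rate = sum(labels) / len(labels)
    return top_k_yield(scores, labels, k) / base_rate


def risk_tiers(scores, high=0.8, medium=0.5):
    """Assign each patient a hypothetical outreach tier from model score thresholds."""
    return ["high" if s >= high else "medium" if s >= medium else "low" for s in scores]


# Hypothetical retrospective validation set: model scores and confirmed diagnoses
scores = [0.95, 0.9, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(top_k_yield(scores, labels, 2), lift_at_k(scores, labels, 2))
print(risk_tiers(scores))
```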

Once validated and verified, these algorithms can be disseminated through communication channels (e.g., publications, research eminence) and provided to multiple health information technology systems as a recommendation engine for the diagnostic interpretation of images. Organizations can also use the algorithms to scan historical images, identify potential cases, and initiate care provider outreach to patients who may benefit from definitive testing.


Leverage emerging clinical trial protocols to improve patient identification and recruitment

Over the past 35 years, regulatory agencies have approved fewer than 1,000 medications to treat rare diseases. Life sciences organizations and artificial intelligence companies have difficulty recruiting patients with rare diseases for clinical research or biomarker validation because of low patient counts, high attrition rates, and high recruitment costs per patient. Evidence indicates that approximately 50-70% of sites fail to recruit even one patient for studies focused on rare diseases. Recruitment challenges extend timelines, raise costs, and lower the value of clinical trial data.

The patient identification solutions described above for clinical care should begin in research and development, as a mechanism to improve study recruitment and enrollment efficiency. Algorithms that double or triple the number of true cases identified per month can help relieve critical recruitment and enrollment bottlenecks. Building patient identification informatics infrastructure during the clinical development phase also increases the market value of the therapy once approved. Tools that identify a larger population of candidate cases increase the addressable market and the value of the product, a significant difference for a program treating a population of 10,000 rather than 5,000 patients.
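The effect of faster identification on enrollment timelines is simple arithmetic. The sketch below assumes a hypothetical study needing 60 patients, a baseline of 10 identified cases per month, 50% consent, and 60% passing screening. All of these numbers are illustrative, not drawn from any real program.

```python
import math


def months_to_enroll(target, identified_per_month, consent_rate, eligible_rate):
    """Expected months to reach an enrollment target, assuming each identified patient
    consents and passes screening independently at the given rates."""
    enrolled_per_month = identified_per_month * consent_rate * eligible_rate
    return math.ceil(target / enrolled_per_month)


# Hypothetical study: 60 patients needed, half consent, 60% pass screening
for multiplier in (1, 2, 3):
    identified = 10 * multiplier  # baseline of 10 identified cases per month
    print(f"{multiplier}x identification: {months_to_enroll(60, identified, 0.5, 0.6)} months")
```

Under these assumptions, doubling identification shortens enrollment from 20 months to 10, and tripling it brings enrollment down to 7 months.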

Starting with recruitment tools also takes advantage of the iterative nature of building patient identification algorithms. As a product moves closer to approval, the tools for identifying rare disease patients can mature in lockstep, drawing on the growing experience available during the critical launch window. Investing in patient identification tools may emerge as a key differentiator among companies competing with products that share similar biological mechanisms (e.g., two IL17 antagonists) but use different strategies for finding and stratifying the patients who qualify for them.

For many rare diseases, the patient volumes needed to run randomized controlled trials (RCTs) do not exist. For diseases with high mortality, an RCT of a complex therapy such as gene therapy poses an ethical dilemma, because the outcome in the placebo arm is already well known: death within a few years of disease identification.

Synthetic control arms built from real-world data are another emerging approach. Research teams identify patients who meet control arm criteria, reducing the number of patients who must be enrolled to power the study. The increasing availability of digital clinical data and the FDA’s recent commitment to supporting real-world evidence in regulatory decisions make synthetic control arm trials an even more appealing way to reduce delays, lower costs, and improve overall trial efficiency.
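How an external control arm reduces enrollment can be seen from the standard normal-approximation sample size for comparing two means. This sketch assumes a continuous endpoint with known standard deviation `sigma`, a target difference `delta`, and, most importantly, that external controls are exchangeable with trial patients. In practice, confounding and bias in real-world controls must be addressed, and regulators scrutinize this closely. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def _k(delta, sigma, alpha, power):
    """k = (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2 for a two-sided z-test."""
    z = NormalDist().inv_cdf
    return ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2


def rct_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Patients per arm for a 1:1 randomized trial: n = 2k."""
    return math.ceil(2 * _k(delta, sigma, alpha, power))


def treated_needed_with_external_control(n_control, delta, sigma, alpha=0.05, power=0.8):
    """Treated patients needed when n_control exchangeable external controls are available.
    Solves sigma^2 * (1/n_t + 1/n_c) = delta^2 / (z_a + z_b)^2, i.e. 1/n_t = 1/k - 1/n_c."""
    k = _k(delta, sigma, alpha, power)
    if n_control <= k:
        raise ValueError("too few external controls to reach the requested power")
    return math.ceil(1 / (1 / k - 1 / n_control))


print(rct_per_arm(1.0, 1.0))
print(treated_needed_with_external_control(100, 1.0, 1.0))
```

With `delta` equal to `sigma`, a two-sided alpha of 0.05, and 80% power, a 1:1 randomized trial needs 16 patients per arm (32 in total), whereas 100 comparable external controls reduce the requirement to 9 treated patients.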

Improving clinical trial design

Clinical trial design for rare diseases is challenging and often ineffective because of limited understanding of the disease combined with low patient accrual. To understand disease progression in these small cohorts, trials need more sensitive outcome measures as primary or secondary endpoints (e.g., caregiver- or patient-reported outcomes on walking ability for neurological disorders). Advanced real-world datasets (e.g., imaging data, lifestyle data, IoT data) can complement traditional rare disease data sources such as medical claims to improve trial design.

These datasets help stakeholders focus on specific, measurable, and meaningful endpoints that can show whether a therapy had a significant impact on the disease. Many rare diseases form a spectrum of disorders with a common origin, such as the failure of a specific gene or pathway.

By collecting data on the broadest definition of a rare disease, trials can focus on the subpopulations most likely to respond to therapy, such as patients with specific genetic variants, while still running sufficient testing to support prescribing across the broadest definition of the disease. The first targeted treatments for cystic fibrosis, for example, proved valuable at restoring CFTR protein function in a small fraction of patients. One team that has done significant work in this area is Rhythm Pharmaceuticals, which worked closely with Genomenon to curate a database of mutations associated with rare forms of obesity. By mapping out the pathway and the case volumes of these related disorders, many of them treatable with Rhythm’s Phase III product setmelanotide, the company was able to plan its clinical development strategy. As more data sets become available, such as UK Biobank, All of Us, registries, and health system networks performing focused cohort sequencing, these planning activities can be investigated in even greater depth than was possible with the literature alone.
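Sizing subpopulations from a curated variant database comes down to aggregating case estimates by gene and by whether the therapy's mechanism addresses them. The sketch below uses placeholder genes and counts, not real data. The `Variant` fields are hypothetical and do not reflect the schema of any actual database.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    variant_id: str
    estimated_cases: int
    in_target_pathway: bool  # hypothetical: whether the therapy's mechanism addresses it


def cases_by_gene(variants):
    """Sum estimated case volumes per gene to size each subpopulation."""
    totals = Counter()
    for v in variants:
        totals[v.gene] += v.estimated_cases
    return totals


def addressable_share(variants):
    """Fraction of all estimated cases that fall within the therapy's target pathway."""
    total = sum(v.estimated_cases for v in variants)
    return sum(v.estimated_cases for v in variants if v.in_target_pathway) / total


# Hypothetical curated table; gene names and counts are placeholders
variants = [
    Variant("GENE_A", "v1", 120, True),
    Variant("GENE_A", "v2", 30, True),
    Variant("GENE_B", "v3", 50, True),
    Variant("GENE_C", "v4", 200, False),
]
print(cases_by_gene(variants).most_common())
print(addressable_share(variants))
```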


Accelerate the pace of drug development and personalized therapies

Drug development for rare diseases remains challenging because of several factors, including low patient volume, limited understanding of heterogeneity (genotype/phenotype), variability in outcomes, and the progression of each disease. The natural history of a disease is the course it takes in the absence of clinical or therapeutic intervention. Natural history studies can help research teams identify the factors that contribute to disease progression and outcomes over time, as well as the efficacy of standard-of-care treatments.
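Natural history is often summarized as the time from diagnosis to a progression milestone. One standard tool for this is the Kaplan-Meier estimator, sketched here in plain Python. The cohort and follow-up times are hypothetical, and the sketch assumes that censoring (loss to follow-up) is unrelated to progression risk.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of progression-free probability.
    times: follow-up time per patient; events: 1 if progression observed, 0 if censored.
    Returns [(t, S(t))] at each distinct time with at least one observed event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = removed_at_t = 0
        # Group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            events_at_t += data[i][1]
            removed_at_t += 1
            i += 1
        if events_at_t:
            survival *= 1 - events_at_t / at_risk
            curve.append((t, survival))
        # Patients censored at t were still at risk at t, so they are removed afterwards
        at_risk -= removed_at_t
    return curve


# Hypothetical untreated cohort: years from diagnosis to a progression milestone or censoring
years = [2, 3, 3, 5, 8]
progressed = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(years, progressed):
    print(t, round(s, 3))
```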

Longitudinal advanced real-world datasets (e.g., treatment modalities, environmental or occupational factors) enable stakeholders and machine learning models to identify unique markers that inform disease onset and progression over time. Extracting key clinical features and disease patterns from these datasets leads to a better understanding of the disease and supports the development of personalized therapies.
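One simple longitudinal feature is each patient's rate of change in a disease marker, estimated as an ordinary least-squares slope. The sketch assumes repeated measurements of a marker such as eGFR in a kidney disease like PH1. The patient data and the cutoff of -5 units per year are hypothetical.

```python
from collections import defaultdict


def progression_slopes(measurements):
    """measurements: iterable of (patient_id, years_since_baseline, value).
    Returns {patient_id: least-squares slope in value units per year}.
    Patients without at least two distinct measurement times are skipped."""
    by_patient = defaultdict(list)
    for patient_id, t, value in measurements:
        by_patient[patient_id].append((t, value))
    slopes = {}
    for patient_id, points in by_patient.items():
        n = len(points)
        mean_t = sum(t for t, _ in points) / n
        mean_v = sum(v for _, v in points) / n
        sxx = sum((t - mean_t) ** 2 for t, _ in points)
        if sxx == 0:
            continue  # slope undefined with a single time point
        sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
        slopes[patient_id] = sxy / sxx
    return slopes


def rapid_progressors(slopes, threshold):
    """Patients whose marker declines at least as fast as `threshold` per year."""
    return sorted(pid for pid, s in slopes.items() if s <= threshold)


# Hypothetical eGFR readings (mL/min/1.73 m^2) by years since baseline
readings = [
    ("p1", 0, 90), ("p1", 1, 80), ("p1", 2, 70),
    ("p2", 0, 60), ("p2", 2, 58),
    ("p3", 0, 75),  # single reading, skipped
]
slopes = progression_slopes(readings)
print(slopes, rapid_progressors(slopes, threshold=-5))
```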

Informed decision making is challenging for patients with rare diseases because of either a lack of diagnostic information or fragmented disease management. Off-label therapy use to manage symptoms is frequent among patients with rare diseases, and advanced data sets such as patient-reported outcomes enable stakeholders to pursue indication expansion and broader market access. Rare diseases in particular are conditions for which many therapies have been tried, because knowledge of the diagnosis was limited and many efforts were made to support patients. Real-world data for rare diseases therefore provides a rich record of the different approaches taken for misdiagnosed patients. It can also identify therapies that help rare disease patients and whose use should be expanded if the evidence supports it. Researching the natural history of treatment and response in these patients can yield further insights, including biological clues about treatment and approaches for designing efficient clinical studies based on patients' experience.
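A common first step in studying treatment history is counting each patient's sequence of therapies. This sketch assumes prescription records with a patient ID, start date, and therapy name. The therapy names are placeholders, and collapsing repeated fills into a single line of therapy is a simplifying assumption.

```python
from collections import Counter, defaultdict
from datetime import date


def treatment_sequences(records):
    """records: iterable of (patient_id, start_date, therapy).
    Returns a Counter of each patient's ordered therapy sequence, with consecutive
    repeats of the same therapy collapsed into a single line of therapy."""
    by_patient = defaultdict(list)
    for patient_id, start, therapy in records:
        by_patient[patient_id].append((start, therapy))
    sequences = Counter()
    for events in by_patient.values():
        seq = []
        for _, therapy in sorted(events):  # chronological order
            if not seq or seq[-1] != therapy:
                seq.append(therapy)
        sequences[tuple(seq)] += 1
    return sequences


# Hypothetical prescription history; therapy names are placeholders
records = [
    ("p1", date(2019, 1, 5), "therapy_A"),
    ("p1", date(2019, 6, 1), "therapy_A"),  # refill of the same therapy
    ("p1", date(2020, 1, 9), "therapy_B"),
    ("p2", date(2018, 3, 2), "therapy_A"),
    ("p2", date(2019, 2, 2), "therapy_B"),
    ("p3", date(2020, 7, 7), "therapy_C"),
]
for path, n in treatment_sequences(records).most_common():
    print(" -> ".join(path), n)
```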



Rare diseases have a significant impact on patient quality of life. For example, people with narcolepsy experience excessive, uncontrollable daytime sleepiness. Yet it can be difficult to quantify the burden of these rare conditions in a way that provides a basis for investment and reimbursement. A drug company with a novel narcolepsy therapy needs to understand how the disease affects patients' lives. Does it reduce productivity or lower educational attainment? Does it prevent active living, such as travel? Does it lead to short-term or long-term disability? Are patients in significant pain? Do they risk secondary injury from falls? For each rare disease, these questions about the value of a therapy must be answered to support reimbursement, and many of them are difficult to answer with claims data.

Combining real-world datasets, including clinical, claims, and consumer data, provides insight into parameters such as healthcare utilization, work productivity, and the economic burden of the disease. For example, combining claims data with disability data from a benefits database can compare patients who have a narcolepsy diagnosis against controls to determine the average rate of employment-related medical disability claims. Working with a patient advocacy group to gather patient-reported lifestyle challenges can help identify, from patients' own experience, where quality of life is reduced. Consumer purchasing data can show how the disorder affects the way people go about daily life. These data sets are often difficult to use because they exist in silos, but with data linking tools now available from companies such as HealthVerity and Datavant, a claims data set can be linked to an advanced data set such as benefits, consumer, digital therapeutics, or patient-reported data.
As products become available, the impact of therapy in these areas beyond direct medical cost avoidance can be measured with the same data sets. This can lead to new understanding and quantifiable justification in reimbursement discussions.
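Linking a claims set to a benefits set and comparing cases with controls can be sketched as follows. This is a generic keyed-hash illustration using Python's `hmac` module, not the proprietary method of any commercial linking vendor. The key, names, and records are hypothetical, matching is exact only, and the crude rate ratio is not adjusted for confounders such as age or occupation.

```python
import hashlib
import hmac

# Hypothetical shared secret; real tokenization services manage their own keys
LINK_KEY = b"example-shared-key"


def link_token(first_name, last_name, dob):
    """Keyed hash of normalized identifiers so two data sets can be joined
    without exchanging names or birth dates in the clear."""
    normalized = f"{first_name.strip().lower()}|{last_name.strip().lower()}|{dob}"
    return hmac.new(LINK_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def compare_disability(claims, benefits):
    """claims: list of (token, is_case); benefits: {token: had_disability_claim}.
    Only patients found in both sources are analyzed. Returns (case_rate, control_rate, ratio)."""
    counts = {True: [0, 0], False: [0, 0]}  # is_case -> [with_disability, linked_total]
    for token, is_case in claims:
        if token in benefits:
            counts[is_case][0] += int(benefits[token])
            counts[is_case][1] += 1
    case_rate = counts[True][0] / counts[True][1]
    control_rate = counts[False][0] / counts[False][1]
    return case_rate, control_rate, case_rate / control_rate


# Hypothetical claims side: (first, last, date of birth, has narcolepsy diagnosis)
people = [("Ana", "Diaz", "1990-01-01", True), ("Bo", "Li", "1985-05-05", True),
          ("Cy", "Ng", "1970-02-02", False), ("Di", "Oz", "1975-03-03", False),
          ("Ed", "Po", "1980-04-04", False), ("Fay", "Qu", "1995-06-06", False),
          ("Gus", "Ray", "1960-07-07", False)]  # Gus has no benefits record
claims = [(link_token(f, l, d), case) for f, l, d, case in people]

# Hypothetical benefits side, keyed by tokens built from differently formatted names
benefits = {link_token(" ANA ", "DIAZ", "1990-01-01"): True,
            link_token("bo", "li", "1985-05-05"): False,
            link_token("Cy", "Ng", "1970-02-02"): True,
            link_token("Di", "Oz", "1975-03-03"): False,
            link_token("Ed", "Po", "1980-04-04"): False,
            link_token("Fay", "Qu", "1995-06-06"): False}
print(compare_disability(claims, benefits))
```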


At Graticule, we are seeing significantly increased interest in leveraging real-world data and new technologies to support rare disease research. One of the biggest immediate challenges, and opportunities, is patient identification. Life sciences companies can use an iterative process, starting in the clinical development phase, to establish patient identification algorithms. These can grow from early recruitment tools into diagnostic utilities that help achieve a competitive advantage. Data can also support study planning and alternative clinical trial designs. Because many rare diseases are not well understood, they also require research to generate evidence of the value of treatment. This information is often unavailable in claims data and can benefit from review of clinical notes, patient-reported information, and linkage to data sources outside the healthcare experience.


We have likely only scratched the surface of the unmet information needs for rare diseases, and we are collaborating with partners across multiple domains to make this process easier and to find scalable approaches. We look forward to learning together with groups interested in pioneering rare disease real-world data solutions.