Why Revenue Cycle Data Is a Key Differentiator

What you need to know: Aggregating and normalizing data is time-consuming and costly for health systems. It requires preprocessing, mapping and translating millions of data points, which is resource intensive. The effort is worth it, however. In revenue cycle management (RCM), as in other areas, the efficacy of analytic models depends on data diversity: more variables and broader data sources make models more complete. By mapping data to outcomes, RCM models can deliver fast, precise insights at the moments when decisions matter most.

Data is the lifeblood of artificial intelligence (AI), and high-quality, unbiased data is essential for building fair and ethical AI systems. But the average person is likely to generate more than one million gigabytes of health-related data in their lifetime, equivalent to 300 million books, and a volume of information this large can be difficult for health systems to analyze on their own.

In healthcare, 80% of data is unstructured, meaning that it has traditionally needed to be manipulated or processed before machines could read it. Today, AI and other technologies can ingest this data and make sense of it, but these advancements come at a cost. Hospitals, which in recent years have operated on narrow margins and focused their energy on standing up EHRs to meet federal requirements, are left with a wealth of rich, useful data that many cannot fully tap into without support.

What is the value of data for analysis?

In machine learning, models ingest large volumes of data, often spanning thousands of data points, and detect patterns in it so that computers can make decisions without a person intervening.

Data is critical for many other functions as well, including deep learning, the subset of machine learning that gave rise to generative AI tools such as ChatGPT. Deep learning is essentially machine learning at warp speed: it relies on more data input, millions of data points to assess, and neural networks, whose layers of processing support more complex representations of data.

Diversity is critical here. Models that draw on diverse expertise, geographies and perspectives capture a broader range of contexts and patterns within their datasets. This helps support robust LLMs that are less prone to overfitting to a limited set of instances.

The more variables that inform models, the more valuable their insights become. Historical data can and should be incorporated into AI models, drawing on perspectives from multiple systems or facilities. This helps avoid N=1 bias during the training or evaluation of a model, where a model built or tested on a single organization's data fails to generalize beyond it.
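One common way to check for N=1 bias is to hold out an entire facility during evaluation instead of splitting rows at random, so the model is always scored on a site it never saw in training. The minimal Python sketch below assumes each historical record carries a hypothetical `facility` field and a 0/1 `denied` outcome; the "model" is a deliberately trivial placeholder that predicts the training denial rate.

```python
from collections import defaultdict

def leave_one_facility_out(records):
    """Yield (held_out_facility, train, test) splits, one per facility.

    Each record is a dict with a hypothetical "facility" key. Every split
    trains on all other facilities and tests on the one held out, so the
    model is never scored on a site it was trained on.
    """
    by_facility = defaultdict(list)
    for rec in records:
        by_facility[rec["facility"]].append(rec)
    for held_out in sorted(by_facility):
        train = [r for f, rows in by_facility.items() if f != held_out for r in rows]
        yield held_out, train, by_facility[held_out]

def baseline_error(train, test):
    """Placeholder model: predict the training-set denial rate for every test
    claim and return the mean absolute error on the held-out facility."""
    rate = sum(r["denied"] for r in train) / len(train)
    return sum(abs(r["denied"] - rate) for r in test) / len(test)

# Toy records: 1 = denied, 0 = paid.
records = [
    {"facility": "A", "denied": 0}, {"facility": "A", "denied": 0},
    {"facility": "B", "denied": 1}, {"facility": "B", "denied": 0},
    {"facility": "C", "denied": 1}, {"facility": "C", "denied": 1},
]
for facility, train, test in leave_one_facility_out(records):
    print(facility, round(baseline_error(train, test), 3))
```

The split needs at least two facilities. Large differences in held-out error between facilities suggest the model is leaning on one site's patterns.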

Hosting and analyzing this data can demand so much processing power and translation that many hospitals cannot do it alone. But the effort to access this data is worthwhile, as valuable insights can be gleaned across the revenue cycle.

What are the sources of revenue cycle data, and how are they used in RCM?

To train and run these models, there must first be data for them to ingest. Structured data is explicit and can be readily processed by machines. It is easier to ingest but limited in nature. These data sources include:

  • Demographic information (e.g., age, gender)
  • Vital signs (e.g., height, weight, blood pressure, blood glucose)
  • Diagnosis, procedure and billing codes, medications and laboratory test results, generally available in the host system
  • Financial data elements such as billing, payments, denials and adjustments

Unstructured data sources require specialized tools to extract relevant information. Examples of unstructured revenue cycle data include:

  • Clinical data, including medical images, scanned labs and ECGs
  • Treatment data, including clinical notes and discharge summaries
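To show why unstructured sources need more than simple tooling, here is a naive Python sketch that pulls ICD-10-CM-shaped tokens out of a free-text note with a regular expression. The pattern is a simplified assumption about the code format (letter, digit, alphanumeric, optional dot plus up to four characters), not a validated code-set lookup.

```python
import re

# Simplified ICD-10-CM shape: letter, digit, alphanumeric, optional ".xxxx".
# This is an illustrative assumption, not a check against the official code set.
ICD10_PATTERN = re.compile(r"\b[A-TV-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")

def extract_candidate_codes(note: str) -> list[str]:
    """Return ICD-10-like tokens found in a free-text note, in order, deduplicated."""
    seen = []
    for match in ICD10_PATTERN.findall(note):
        if match not in seen:
            seen.append(match)
    return seen

note = "Pt with type 2 diabetes (E11.9) and essential hypertension, I10. Recheck E11.9 in 3 mo."
print(extract_candidate_codes(note))
```

The same pattern would also match "B12" in "vitamin B12 level", which is exactly the missing-context problem: production extraction relies on NLP that understands where a token appears, not just its shape.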

The benefits of this data in RCM are massive. These datasets can be used to guide decision-making at each stage of the revenue cycle and improve reimbursement accuracy.

Front office:

  • Streamlined appointment scheduling, taking into consideration which providers are credentialed for which payers
  • Automated and optimized scheduling to boost utilization
  • Automated insurance verification and prior authorization retrieval prior to service
  • Virtual assistants managing patient communications regarding financial inquiries, payment plans and insurance-related questions to enhance the patient experience
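The credentialing-aware scheduling idea in the first bullet can be sketched as a simple filter-then-rank step. In this Python sketch the provider names, payers and slot counts are all hypothetical, and "most open slots" stands in for whatever utilization rule a real scheduler would apply.

```python
# Hypothetical credentialing table: provider -> payers the provider is credentialed with.
CREDENTIALS = {
    "dr_lee": {"PayerA", "PayerB"},
    "dr_patel": {"PayerB"},
    "dr_gomez": {"PayerA", "PayerC"},
}

def pick_provider(payer, open_slots, credentials=CREDENTIALS):
    """Return the credentialed provider with the most open slots, or None.

    open_slots maps provider -> number of open appointment slots. Candidates are
    sorted by name first, so ties go to the alphabetically first provider.
    """
    candidates = sorted(
        p for p, payers in credentials.items()
        if payer in payers and open_slots.get(p, 0) > 0
    )
    if not candidates:
        return None
    return max(candidates, key=lambda p: open_slots[p])

print(pick_provider("PayerA", {"dr_lee": 2, "dr_gomez": 5}))
print(pick_provider("PayerC", {"dr_gomez": 0}))
```

Filtering on credentialing before optimizing for utilization keeps the scheduler from filling a slot with a provider whose claim the payer would reject.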

Mid-cycle:

  • Thorough clinical documentation supported by natural language processing during patient encounters
  • Automated coding and charge entry based on clinical data
  • Automated coding and charge audit for all accounts to ensure accurate reimbursement for services provided
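At its simplest, the charge audit in the last bullet compares what was documented with what was billed for every account. The Python sketch below assumes each account's documented services have already been reduced to codes (the hard part in practice); the account IDs and CPT codes are illustrative only.

```python
def audit_charges(documented, billed):
    """Compare documented services with billed codes for one account."""
    documented, billed = set(documented), set(billed)
    return {
        # Documented but not billed: potential missed revenue.
        "missed_charges": sorted(documented - billed),
        # Billed without supporting documentation: compliance and denial risk.
        "unsupported_charges": sorted(billed - documented),
    }

def audit_accounts(accounts):
    """accounts maps account_id -> (documented_codes, billed_codes).
    Returns findings only for accounts with at least one discrepancy."""
    findings = {}
    for account_id, (documented, billed) in accounts.items():
        result = audit_charges(documented, billed)
        if result["missed_charges"] or result["unsupported_charges"]:
            findings[account_id] = result
    return findings

accounts = {
    "acct-1": (["99213", "93000"], ["99213", "36415"]),
    "acct-2": (["99214"], ["99214"]),
}
print(audit_accounts(accounts))
```

Because the check is cheap per account, it can run on every account rather than a sample, which is the point of auditing "all accounts".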

Back office:

  • Denial prediction and prevention based on historical data for proactive interventions to prevent revenue loss
  • Automated billing and payment posting to reduce errors and accelerate reimbursements
  • Contract management and analytics to support underpayment recovery, improve reimbursement and support payer contract negotiations
  • Payer portfolio management to analyze and predict payer-specific trends, behaviors and patterns associated with reimbursement
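Denial prediction from historical data can start as simply as estimating a denial rate per payer and procedure code. The Python sketch below swaps in additive smoothing toward the overall denial rate, so rarely seen combinations are not over-trusted; this is an illustrative baseline, not the model the article has in mind, and the history, codes and review threshold are hypothetical.

```python
from collections import Counter

def fit_denial_rates(history, prior_weight=10.0):
    """Estimate denial probability per (payer, code) from historical claims.

    history is an iterable of (payer, code, denied) tuples. Each combination's
    observed rate is shrunk toward the overall denial rate, with prior_weight
    acting as that many pseudo-claims, so rare combinations are not over-trusted.
    """
    totals, denials = Counter(), Counter()
    for payer, code, denied in history:
        totals[(payer, code)] += 1
        denials[(payer, code)] += int(denied)
    n = sum(totals.values())
    global_rate = sum(denials.values()) / n if n else 0.0

    def predict(payer, code):
        key = (payer, code)
        # Counter returns 0 for unseen keys, so new combinations get the global rate.
        return (denials[key] + prior_weight * global_rate) / (totals[key] + prior_weight)

    return predict

# Toy history: (payer, procedure code, denied?)
history = ([("PayerA", "97110", True)] * 8 + [("PayerA", "97110", False)] * 2
           + [("PayerB", "97110", True)] * 1 + [("PayerB", "97110", False)] * 9)
predict = fit_denial_rates(history)

REVIEW_THRESHOLD = 0.5  # hypothetical cutoff for pre-submission review
queue = [("PayerA", "97110"), ("PayerB", "97110"), ("PayerC", "97110")]
to_review = [claim for claim in queue if predict(*claim) >= REVIEW_THRESHOLD]
print(to_review)
```

Claims above the threshold would be routed for review before submission, which is where the "proactive intervention" happens.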

What are key challenges in compiling revenue cycle data?

Gathering and harmonizing data from many different systems is a difficult ask across industries. Healthcare revenue cycle data, in particular, has unique barriers that make the creation of a centralized, holistic dataset a challenge.

Data heterogeneity and inconsistency

  • Challenge: Healthcare data comes from various systems, each with its own format, terminology and structure. Aggregating these disparate data sources can be like assembling a puzzle with mismatched pieces.
  • Solution: Implement robust data normalization techniques. Standardize terminology, map data elements to common ontologies and ensure consistent representation across systems. For true interoperability, the industry must align on standard data definitions. This means not just consistent terminology, data formats and exchange standards, but a single shared set of them that can inform an industry-wide patient health dataset.
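Mapping source values to a common vocabulary is often done with crosswalk tables. The Python sketch below uses small, hypothetical crosswalks for gender and payer names; a real implementation would map to standard code systems and maintain far larger tables. Unmapped values are reported rather than guessed, so someone can extend the mapping.

```python
# Hypothetical crosswalks from source-system values to one shared vocabulary.
GENDER_MAP = {"M": "male", "MALE": "male", "F": "female", "FEMALE": "female"}
PAYER_MAP = {
    "BCBS": "Blue Cross Blue Shield",
    "BLUE CROSS": "Blue Cross Blue Shield",
    "MCR": "Medicare",
    "MEDICARE": "Medicare",
}
CROSSWALKS = {"gender": GENDER_MAP, "payer": PAYER_MAP}

def normalize(record):
    """Map each crosswalked field to its canonical value.

    Returns (normalized_record, unmapped_fields). Values with no crosswalk entry
    are left unchanged and reported so a person can extend the mapping.
    """
    out = dict(record)
    unmapped = []
    for field, mapping in CROSSWALKS.items():
        key = str(record.get(field, "")).strip().upper()
        if key in mapping:
            out[field] = mapping[key]
        else:
            unmapped.append(field)
    return out, unmapped

print(normalize({"gender": "f", "payer": "bcbs "}))
print(normalize({"gender": "U", "payer": "Aetna"}))
```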

Lack of meta-information and quality for unstructured data

  • Challenge: Unstructured data (such as clinical notes, radiology reports and free-text entries) lacks essential meta-information (context, source details) and often suffers from inaccuracies, missing values and biases. Without proper context, extracting meaningful insights from it is difficult. Because the vast majority of clinical data is unstructured and must be processed before any value can be extracted, fundamentally valuable revenue cycle data is often inaccessible to the health systems that could benefit from it.
  • Solution: Generative AI can make sense of unstructured data and produce content that can then be analyzed for patterns and other valuable insights. Develop tools for capturing relevant metadata during data entry. Leverage natural language processing (NLP) to extract context and enrich unstructured data. At the same time, implement data validation checks, address missing data and assess bias systematically. Regularly audit data quality and correct discrepancies. Each of these steps helps ensure that the datasets being used are not only accessible but accurate.
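The validation and audit steps above can be sketched as rule-based checks rolled up into a simple quality report. The required fields and rules in this Python sketch are hypothetical examples; a real rule set would come from the organization's own data definitions.

```python
from collections import Counter
from datetime import date

# Hypothetical required fields for a claim-level record.
REQUIRED_FIELDS = ("account_id", "payer", "service_date", "billed_amount")

def validate(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    amount = record.get("billed_amount")
    if isinstance(amount, (int, float)) and amount < 0:
        problems.append("negative billed_amount")
    service_date = record.get("service_date")
    # Expects a datetime.date; future service dates usually signal entry errors.
    if isinstance(service_date, date) and service_date > date.today():
        problems.append("service_date in the future")
    return problems

def quality_report(records):
    """Return (share of records with any problem, count of each problem type)."""
    counts = Counter()
    flagged = 0
    for rec in records:
        problems = validate(rec)
        flagged += bool(problems)
        counts.update(problems)
    return (flagged / len(records) if records else 0.0), dict(counts)

records = [
    {"account_id": "1", "payer": "PayerA", "service_date": date(2023, 5, 1), "billed_amount": 120.0},
    {"account_id": "2", "payer": "", "service_date": date(2023, 5, 2), "billed_amount": -5.0},
    {"account_id": "3", "payer": "PayerB", "service_date": None, "billed_amount": 80.0},
]
share, counts = quality_report(records)
print(f"{share:.0%} of records flagged", counts)
```

Running a report like this on a schedule turns "regularly audit data quality" into a trend that can be tracked.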

Lack of systems interoperability

  • Challenge: Standards such as HL7 and FHIR exist, but adherence to them is inconsistent. Without consistently enforced standards that let technological systems, such as electronic health records or pharmacy and lab databases, share healthcare data electronically, the effort required for a health system to become interoperable can itself become a barrier to information exchange.
  • Solution: Providers can overcome challenges of interoperability by integrating large language models (LLMs) with host/legacy systems and other systems of record, not only at a facility level, but across all locations and across all instances of EHRs. This will allow for system-wide datasets that follow industry standards and vendor-specific requirements for data management.
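The payoff of following a standard is that every source can be flattened the same way. The Python sketch below does not attempt the LLM integration described above; it only shows the kind of standards-based record such an integration would target, by flattening a trimmed-down claim shaped like the FHIR R4 `Claim` resource (a real resource carries many more required elements) into one row per line item. The claim contents are sample data.

```python
import json

def flatten_claim(resource: dict) -> list[dict]:
    """Turn a FHIR R4-style Claim into one row per line item."""
    if resource.get("resourceType") != "Claim":
        raise ValueError("expected a Claim resource")
    rows = []
    for item in resource.get("item", []):
        codings = item.get("productOrService", {}).get("coding", [])
        rows.append({
            "claim_id": resource.get("id"),
            "patient": resource.get("patient", {}).get("reference"),
            "insurer": resource.get("insurer", {}).get("reference"),
            "sequence": item.get("sequence"),
            # Take the first coding; real data may carry several code systems.
            "code": codings[0].get("code") if codings else None,
            "code_system": codings[0].get("system") if codings else None,
            "net": item.get("net", {}).get("value"),
        })
    return rows

claim_json = """{
  "resourceType": "Claim", "id": "example-1",
  "patient": {"reference": "Patient/123"},
  "insurer": {"reference": "Organization/payer-a"},
  "item": [
    {"sequence": 1,
     "productOrService": {"coding": [{"system": "http://www.ama-assn.org/go/cpt", "code": "99213"}]},
     "net": {"value": 125.0, "currency": "USD"}}
  ]
}"""
for row in flatten_claim(json.loads(claim_json)):
    print(row)
```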

Regulatory compliance and privacy

  • Challenge: The Health Insurance Portability and Accountability Act (HIPAA) regulates the use of sensitive patient health information and prohibits its broad disclosure to those who do not need to access it. Any dataset containing protected health information (PHI) therefore must be built to align to these federal restrictions. Aggregating data while adhering to HIPAA regulations and ensuring patient privacy is critical.
  • Solution: Understand HIPAA’s permitted uses and disclosures. Obtain patient consent when necessary. Regularly review privacy policies and practices. Regarding data usage, ensure proper de-identification of patient data to protect privacy, and adhere to the minimum necessary standard, sharing only relevant information. Implement robust access controls, encryption and audit trails to secure patient data.
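De-identification and the minimum necessary standard can both be expressed as field-level transformations. This Python sketch is modeled loosely on HIPAA's Safe Harbor method but handles only a few of its 18 identifier types; the field names are hypothetical, and the low-population ZIP list is an empty placeholder. It is an illustration, not a compliance determination.

```python
# Hypothetical field names. HIPAA Safe Harbor covers 18 identifier types;
# only a handful are handled here.
DIRECT_IDENTIFIERS = {"name", "mrn", "ssn", "phone", "email", "street_address"}
# Placeholder: 3-digit ZIP prefixes covering 20,000 or fewer people must become "000".
# Populate from current Census data; left empty in this sketch.
LOW_POPULATION_ZIP3 = set()

def deidentify(record):
    """Drop direct identifiers and generalize dates, ages over 89 and ZIP codes."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "service_date" in out:
        # Keep only the year of an ISO "YYYY-MM-DD" date.
        out["service_year"] = out.pop("service_date")[:4]
    if isinstance(out.get("age"), int) and out["age"] > 89:
        out["age"] = "90+"
    if "zip" in out:
        zip3 = out.pop("zip")[:3]
        out["zip3"] = "000" if zip3 in LOW_POPULATION_ZIP3 else zip3
    return out

def minimum_necessary(record, allowed_fields):
    """Share only the fields a given use case actually needs."""
    return {k: v for k, v in record.items() if k in allowed_fields}

record = {"name": "Jane Doe", "mrn": "A123", "age": 92, "zip": "02139",
          "service_date": "2023-05-17", "payer": "Medicare",
          "billed_amount": 250.0, "denied": True}
shared = minimum_necessary(deidentify(record), {"payer", "billed_amount", "denied", "service_year"})
print(shared)
```

Applying `minimum_necessary` after de-identification means each downstream use case sees only the fields it was approved for.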

The bottom line

Robust AI models are informed by thousands of variables and consume billions of transactions. The more variables provided to inform models, and the greater the volume and variety of data available for consumption, the more complete and valuable models become — providing better and faster insights at the point of decision.

For many health systems, however, this type of processing power can be difficult to support in house; in these situations, a revenue cycle data partner can be an invaluable asset, not only to assist in processing the data but also to provide insight into normalized, holistic analyses from other organizations.

Revenue cycle data is only as valuable as the access an organization has to it. Data analysis requires persistent efforts from a diverse team of problem solvers as well as a robust, diverse dataset — ensure your organization has both available to it.