opensource.google.com


Posts from July 2024

Google CQL: From Clinical Measurements to Action

Wednesday, July 31, 2024


Today, many institutions are building custom solutions for understanding their medical data, as well as tools for acting on that data. A major pain point with the current approach is that these tools can be error prone and lack built-in medical context and representations of medical data structures. Enter Clinical Quality Language (CQL), a portable, computable, and open HL7 specification for expressing clinical logic over healthcare data. We believe that CQL has the power to radically improve the future of data-driven workflows in healthcare. Over the past year at Google Health, our team has been hard at work building foundational tools for healthcare data analytics, and today we're announcing the release of an experimental open source toolkit for Clinical Quality Language execution.

The Google CQL engine is an experimental open source toolkit that includes a CQL execution engine built from scratch in Go. We built this engine with a focus on horizontal scalability, ease of use, and high test coverage. We wanted to make it easy to experiment with our engine, so we've included an easy-to-use CLI, a REPL, and a two-click-setup web playground! The toolkit is still a work in progress, and we very much welcome input, contributions, and ideas from the community.


Why CQL

CQL represents a major shift away from the precedent of distributing clinical logic as free-text guidelines that each institution implements in custom and often error-prone ways. CQL allows clinical logic to be written once, distributed, and run anywhere in a single framework. Major institutions like CMS (Medicare), NCQA, and the World Health Organization (WHO) have already started to adopt CQL and distribute clinical measures written in it! (Check out these antenatal care measures from the WHO as an example.) We believe that CQL lowers the burden of writing, sharing, and computing complex clinical content.

CQL supports multiple common healthcare data models (such as FHIR and QDM) and is designed with common clinical concepts, tasks, and nested data structures in mind. For example, consider this comparison:

A side-by-side comparison of FHIR SQL (BigQuery) to CQL.
This logic extracts CHD encounters with statins prescribed during the visit.

The FHIR SQL requires more boilerplate, unnesting, and custom value set handling. For this example, the CQL is clearly more readable, concise, and easier to understand than the SQL implementation.
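As a rough illustration of the CQL side of that comparison, the logic might look something like the following sketch. The library header and value set identifiers here are invented for illustration, and the retrieve expressions assume FHIR R4 model bindings:

```cql
library CHDStatinsExample version '0.1.0'
using FHIR version '4.0.1'

// Hypothetical value set URLs, for illustration only
valueset "CHD": 'chd-valueset-url'
valueset "Statins": 'statins-valueset-url'

context Patient

// CHD encounters where a statin was prescribed during the visit
define "CHD Encounters With Statins":
  [Encounter: "CHD"] e
    where exists (
      [MedicationRequest: "Statins"] m
        where m.authoredOn during e.period)
```

Note how the value set membership checks and the temporal relationship between the prescription and the encounter are expressed directly, with no manual joins or unnesting.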

If you’d like to see a more in-depth CQL example with an explanation, see Appendix A.

As the healthcare industry has matured, so have the representations of Clinical Quality Measures. Previously, clinical quality mandates were provided as free-text guidelines, leaving each medical institution to implement them itself. This was, of course, error prone and repetitive across the industry. Today, institutions like the WHO, CMS, and NCQA are increasingly writing clinical measures in CQL.

Transition to standards-based Clinical Quality Measures diagram

Examples like the WHO Antenatal Care Guidelines project exemplify the shift to openly distributed and executable measures. We believe that computable and shareable measures like these WHO SMART Guidelines are the future for expressing and sharing medical knowledge.


Our CQL Toolkit

We would love for anyone excited about this work to check out our experimental CQL tools at https://github.com/google/cql. We continue to be very interested in welcoming external contributors, so we strongly encourage you to check out the repository, give it a try, and consider helping with any open issues. If you're not sure where to ask, reach out to us! We'd also like to hear from others about what they're working on and how the Google CQL engine may fit into their toolchain; feel free to reach out at evango@google.com or open an issue on the repository.

If you want to learn more about CQL see https://github.com/cqframework/clinical_quality_language and https://cql.hl7.org/index.html.


Appendix A: Simplified Diabetes CQL Example

library ExampleCQLLibrary version '1.2.3'
using FHIR version '4.0.1'

valueset Diabetes: 'diabetes-valueset-url' version '1.0'
valueset GlucoseLevels: 'glucose-levels-valueset-url' version '1.0'

context Patient

define PatientMeetsAgeRequirement: AgeInYearsAt(Now()) < 20

define HasDiabetes:
       exists ([Condition: Diabetes] c where c.onset before Now())

define LatestGlucoseReading:
       Last([Observation: GlucoseLevels] o sort by effective desc)

define LatestGlucoseAbove200: LatestGlucoseReading.value > 200

define Denominator: PatientMeetsAgeRequirement and HasDiabetes

define Numerator: Denominator and LatestGlucoseAbove200

In this example, for a given patient record, the code selects individuals under 20 whose most recent glucose reading was above 200. Although this is a simple example, it is made simple because CQL provides a solid foundation on which to define and act on medical information and concepts.

By Evan Gordon and Suyash Kumar – Software Engineers 
Health AI Team: Ryan Brush, Kai Bailey, Ed Nanale, Chris Grenz

DAGify: Accelerate Your Journey from Control-M to Apache Airflow

Friday, July 26, 2024


In the dynamic world of data engineering and workflow orchestration, organizations are increasingly migrating from legacy enterprise schedulers like Control-M to the open-source powerhouse, Apache Airflow. However, this transition often involves a complex and time-consuming process of converting existing job definitions. DAGify emerges as a beacon of efficiency in this scenario, offering an open-source solution to automate the conversion of Control-M XML files into Airflow's native DAG format.

DAGify isn't just a simple conversion tool; it's a migration accelerator, designed to significantly reduce the manual effort and potential errors associated with transitioning to Airflow. While it might not provide a perfect 1:1 migration in every case, its primary goal is to expedite the process, allowing developers to focus on optimizing their workflows in the new environment.


Introduction

Control-M has served as a reliable workhorse for many organizations, but its proprietary nature and limitations can become roadblocks in today's cloud-centric and agile data landscape. Apache Airflow, with its flexibility, scalability, and thriving community, presents a compelling alternative. However, the migration journey can be daunting, especially when dealing with intricate Control-M job definitions.

DAGify steps in to bridge this gap, offering an intuitive and extensible solution. By automating the conversion process, it empowers organizations to embrace Airflow's capabilities without the burden of manual translation. This translates to faster migrations, reduced errors, and a smoother transition overall.


Technical Details

Under the hood, DAGify employs a template-driven approach, making it adaptable to various Control-M configurations and Airflow requirements. It parses Control-M XML files, extracting crucial information about jobs, dependencies, and schedules. This data is then intelligently mapped to Airflow's operators, tasks, and dependencies, preserving the essence of the original workflow. While still under active development, DAGify already supports key Control-M features like job and dependency mapping. The project roadmap includes further enhancements, such as handling custom calendars and expanding support for other enterprise schedulers.
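The parse-and-map flow described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not DAGify's actual implementation; the job_to_operator helper and the minimal XML input are invented for this example:

```python
# Sketch of the kind of transformation DAGify performs: parse a
# Control-M job definition and emit Airflow operator code.
# Illustrative only; not DAGify's actual code.
import xml.etree.ElementTree as ET

CONTROL_M_XML = '<JOB JOBNAME="job_1" TASKTYPE="Command" CMDLINE="./hello_world.sh" />'

def job_to_operator(xml_text: str) -> str:
    """Map one Control-M Command job to Airflow SSHOperator source code."""
    job = ET.fromstring(xml_text)
    name = job.get("JOBNAME")
    cmd = job.get("CMDLINE")
    return (
        f'{name} = SSHOperator(\n'
        f'    task_id="{name}",\n'
        f'    command="{cmd}",\n'
        f'    dag=dag,\n'
        f')'
    )

print(job_to_operator(CONTROL_M_XML))
```

A real converter must also resolve dependencies (e.g. INCOND/OUTCOND conditions) into task ordering, which is where most of the value of an automated tool lies.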


Template-driven conversion

DAGify employs a flexible template system that empowers you to define the mapping between Control-M jobs and Airflow operators. These user-defined YAML templates specify how Control-M attributes translate into Airflow operator parameters. For instance, the control-m-command-to-airflow-ssh template maps Control-M's "Command" task type to Airflow's SSHOperator, outlining how attributes like JOBNAME and CMDLINE are incorporated into the generated DAG.

The template's structure field utilizes Jinja2 templating to dynamically construct the Airflow operator code, seamlessly integrating Control-M job attributes.

Example:

A Control-M task like:

<JOB 
  APPLICATION="my_application" 
  SUB_APPLICATION="my_sub_application" 
  JOBNAME="job_1" 
  DESCRIPTION="job_1_reports"  
  TASKTYPE="Command" 
  CMDLINE="./hello_world.sh" 
  PARENT_FOLDER="my_folder">
  <OUTCOND NAME="job_1_completed" ODATE="ODAT" SIGN="+" />
</JOB>

is converted to an Airflow operator using the control-m-command-to-airflow-ssh-gce template:

job_1 = SSHOperator(
    task_id="x_job_1",
    command="./hello_world.sh",
    dag=dag,
)

The repository includes several pre-defined templates for common Control-M task types. The config.yaml file at the project's root allows you to customize which templates are applied during the conversion process.


Leveraging Google Cloud Composer

For organizations seeking a fully managed Airflow experience, Google Cloud Composer provides a compelling solution. It eliminates the complexities of managing Airflow infrastructure, allowing you to focus on building and orchestrating your data pipelines. DAGify seamlessly integrates with Google Cloud Composer, making it even easier to migrate your Control-M workflows to a cloud-native environment.


Try it yourself

Eager to experience the power of DAGify? It's readily available as an open-source project on GitHub: https://github.com/GoogleCloudPlatform/dagify. The repository provides detailed instructions on setting up and running DAGify locally or within a Docker container.

Key steps to get started:
  1. Clone the repository: git clone https://github.com/GoogleCloudPlatform/dagify.git
  2. Install dependencies: make clean (This sets up a virtual environment and installs required packages)
  3. Run DAGify: python3 DAGify.py --source-path=[YOUR-SOURCE-XML-FILE]

Remember, DAGify is an ongoing project, and community contributions are welcome! If you encounter any issues or have feature requests, feel free to open an issue on GitHub.


Conclusion

DAGify represents a significant leap forward in simplifying enterprise scheduler migrations to Apache Airflow. By automating the conversion process and seamlessly integrating with Google Cloud Composer, it empowers organizations to embrace the benefits of Airflow more rapidly and efficiently. Whether you're a seasoned Airflow developer or just starting your migration journey, DAGify is a valuable tool to explore.

Remember:

  • Thorough testing is crucial: Always test your converted DAGs in a staging environment before deploying them to production.
  • Leverage Airflow's ecosystem: Explore the vast array of Airflow plugins and integrations to further enhance your workflows.
  • Stay engaged with the community: Keep an eye on DAGify's development and contribute to its growth if you can!

Happy migrating!

By Konrad Schieban and Tim Hiatt – Google Cloud


Acknowledgments

Thank you to the following team members who made this solution possible: Shreya Prabhu, Harish S, Slava Guzanov and Joanna Rajaseharan from Google Cloud.

Google Blocks is now Open Source

Tuesday, July 16, 2024

In 2017, we shared Google Blocks with the world as a simple, easy and fun way to create 3D objects and scenes, using the new wave of VR headsets of the day.

We were thrilled to see the surprising, inventive and beautiful assets you all put together with Google Blocks, and continue to be impressed by the enthusiasm of the community.



We now wish to share the code behind Google Blocks, allowing for novel and rich experiences to emerge from the creativity and passion of open source contributors such as the Icosa Foundation, who have already been doing wonderful work with Tilt Brush, which we open-sourced in 2021.


"We're thrilled to see Blocks join Tilt Brush in being released to the community, allowing another fantastic tool to grow and evolve. We can't wait to take the app to the next level as we have done with Open Brush." 
– Mike Nisbet, Icosa Foundation

What’s Included

The open source archive of the Blocks code can be found at: https://github.com/googlevr/blocks

Please note that Google Blocks is not an actively developed product, and no pull requests will be accepted. You can use, distribute, and modify the Blocks code in accordance with the Apache 2.0 License under which it is released.

The currently published version of Google Blocks will remain available in digital stores for users with supported VR headsets. If you're interested in creating your own Blocks experience, please review the build guide and visit our GitHub repo to access the source code.

Thank you all for coming on this journey with us so far, we can’t wait to see where you take Blocks from here.

By Ian MacGillivray – Software Engineer, on behalf of the Google Blocks team.

Bounds Checking Flexible Array Members

Tuesday, July 9, 2024

Buffer overflows are the cause of many security issues, and are a persistent thorn in programmers' sides. C is particularly susceptible to them. The advent of sanitizers mitigates some security issues by automatically inserting bounds checking, but they're not able to do so in all situations—in particular for flexible array members, because their size is known only at runtime.

The size of a flexible array member is typically opaque to the compiler. The alloc_size attribute on malloc() may be used for bounds checking flexible array members within the same function as the allocation. But the attribute's information isn't carried with the allocated object, making it impossible to perform bounds checking elsewhere.

To mitigate this drawback, Clang and GCC are introducing1 the counted_by attribute for flexible array members.


Specifying a flexible array member's element count

The number of elements allocated for a flexible array member is frequently stored in another field within the same structure. The counted_by attribute, applied to the flexible array member, explicitly references the field that stores the number of elements. The attribute creates an implicit relationship between the flexible array member and the count field, enabling the array bounds sanitizer (enabled by -fsanitize=array-bounds) to verify flexible array operations.

There are some rules to follow when using this feature. For this structure:

struct foo {
	/* ... */
	size_t count; /* Number of elements in array */
	int array[] __attribute__((counted_by(count)));
};
  • The count field must be within the same non-anonymous, enclosing struct as the flexible array member.
  • The count field must be set before any array access.
  • The array field must have at least count elements available at all times.
  • The count field may change, but must never be larger than the number of elements originally allocated.

An example allocation of the above structure:

struct foo *foo_alloc(size_t count) {
  struct foo *ptr = NULL;
  size_t size = MAX(sizeof(struct foo),
                    offsetof(struct foo, array[0]) +
                        count * sizeof(ptr->array[0]));

  ptr = calloc(1, size);
  ptr->count = count;
  return ptr;
}

Uses for fortification

Fortification (enabled by the _FORTIFY_SOURCE macro) is an ongoing project to make the Linux kernel more secure. Its main focus is preventing buffer overflows on memory and string operations.

Fortification uses the __builtin_object_size() and __builtin_dynamic_object_size() builtins to try to determine if input passed into a function is valid (i.e. "safe"). A call to __builtin_dynamic_object_size() generally isn't able to take the size of a flexible array member into account. But with the counted_by attribute, we're able to calculate the size and improve safety.


Uses in the Linux kernel

The counted_by attribute is already in use in the Linux kernel, and will be instrumental in catching issues like integer overflows, which led to a heap buffer overflow. We want to expand its use to more flexible array members, and enforce its use in the future.


Conclusion

The counted_by attribute helps address a long-standing fortification roadblock where the memory bounds of a flexible array member couldn't be determined by the compiler, making Linux, and other hardened applications, less exploitable.

1In Clang v18.0.0 and GCC v15.0.0.

By Bill Wendling, Staff Software Engineer
