Part 2 — Engineering the RxNorm Pipeline

Alex Frketic
1 day ago
3 min read

One of the most rewarding aspects of this project was learning how to work directly with APIs at an enterprise level. Prior to beginning this integration effort, API development was a skillset I understood primarily at a conceptual level. Building the RxNorm pipeline transformed that understanding into practical engineering experience. More importantly, it demonstrated how artificial intelligence can help accelerate learning rather than replace human expertise. Throughout this project, AI acted as a collaborative assistant that helped explain concepts, troubleshoot engineering issues, and accelerate experimentation while I continued making the architectural and strategic decisions myself.

The methodology behind the pipeline began with a simple question: how can multiple healthcare terminology systems be unified into one scalable analytical framework? To answer it, I first needed to identify which APIs and datasets were required. The primary systems were RxNorm, RxClass, and RxTerms, and each API provided a different layer of clinical intelligence. RxNorm focused on normalized medication concepts, RxClass provided therapeutic classifications and relationships, and RxTerms supplied additional medication terminology information useful for downstream analysis.

The first major step in the methodology involved understanding API communication itself. APIs operate as bridges between systems, allowing applications to request and exchange information programmatically. Using Python, I learned how to send HTTP requests to endpoints, capture responses, and convert those responses into structured datasets. This process relied heavily on the requests and pandas libraries, which became essential tools throughout the project.
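To make the request-and-convert loop concrete, here is a minimal sketch rather than the project's actual code. It assumes the public RxNav endpoint rxcui.json and a response shaped like {"idGroup": {"rxnormId": [...]}}; both should be checked against the current NLM documentation. The function names fetch_rxcuis and build_lookup_table are hypothetical.

```python
import pandas as pd
import requests

# Assumed public RxNav base URL; confirm against the NLM documentation.
BASE_URL = "https://rxnav.nlm.nih.gov/REST"


def fetch_rxcuis(name, session=None, timeout=10):
    """Look up RxCUIs for a drug name and return them as a list of strings."""
    if session is None:
        session = requests.Session()
    response = session.get(
        f"{BASE_URL}/rxcui.json", params={"name": name}, timeout=timeout
    )
    response.raise_for_status()  # surface HTTP errors before parsing the body
    payload = response.json()
    # Assumed shape: {"idGroup": {"rxnormId": [...]}}; the key may be absent
    # when the name does not match anything.
    return payload.get("idGroup", {}).get("rxnormId", [])


def build_lookup_table(names, session=None):
    """Return one row per (name, rxcui) pair as a pandas DataFrame."""
    rows = [
        {"name": name, "rxcui": rxcui}
        for name in names
        for rxcui in fetch_rxcuis(name, session=session)
    ]
    return pd.DataFrame(rows, columns=["name", "rxcui"])
```

Calling build_lookup_table(["aspirin", "atorvastatin"]) would issue one request per name and return a two-column table. Passing the session in as a parameter is a design choice that lets the same code run against a stub during testing.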

One of the earliest lessons involved recognizing that APIs rarely return “clean” data. Responses often arrive in nested JSON structures containing multiple layers of dictionaries, lists, and metadata. Because of this, much of the engineering work focused on parsing and flattening JSON structures into relational formats that could be used for analytics and SQL database integration. While this sounds straightforward in theory, the reality became significantly more complex once I was working with thousands of medication concepts and highly variable response structures.
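As an illustration of the flattening step, the sketch below collapses nested dictionaries into flat rows. The sample payload is hand-written and only loosely modeled on an RxClass class lookup, so its key names are assumptions, and flatten and class_rows are hypothetical helpers rather than the pipeline's own functions.

```python
def flatten(record, parent_key="", sep="_"):
    """Collapse nested dictionaries into one flat dict with joined keys."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value  # lists and scalars are kept as-is
    return flat


def class_rows(payload):
    """Turn one class-lookup response into a list of flat rows."""
    # Defensive lookups: either level may be missing or null on no match.
    items = (payload.get("rxclassDrugInfoList") or {}).get("rxclassDrugInfo") or []
    return [flatten(item) for item in items]


# Hand-written sample with assumed key names; not real API output.
sample = {
    "rxclassDrugInfoList": {
        "rxclassDrugInfo": [
            {
                "minConcept": {"rxcui": "1001", "name": "example drug", "tty": "IN"},
                "rxclassMinConceptItem": {
                    "classId": "C001",
                    "className": "Example Class",
                    "classType": "EPC",
                },
                "rela": "has_epc",
            }
        ]
    }
}

rows = class_rows(sample)
```

Each row is a plain dictionary with column names such as minConcept_rxcui, so the list can be handed to pandas.DataFrame or written to a SQL table. The empty-list fallback is what keeps variable response structures from breaking the run.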

The methodology gradually evolved into a multi-step orchestration framework. Rather than building one large monolithic script, I designed the process as modular stages. One stage collected raw API data, another normalized schema structures, another generated master RxCUI relationships, and another handled therapeutic classification mapping. This modular design improved scalability, debugging, maintainability, and future extensibility.
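One way to picture the modular design is a list of small stage functions that pass a shared state dictionary along. This is a sketch under stated assumptions: the stage names, the state keys, and the stubbed data are all hypothetical stand-ins for the real collection, normalization, RxCUI, and classification stages.

```python
def collect_raw(state):
    """Stage 1: gather raw API payloads (stubbed with fixed rows here)."""
    state["raw"] = [
        {"rxcui": "1001", "Name": " Example Drug "},
        {"rxcui": "1001", "Name": "example drug"},
    ]
    return state


def normalize_schema(state):
    """Stage 2: enforce consistent column names and trimmed values."""
    state["normalized"] = [
        {"rxcui": row["rxcui"], "name": row["Name"].strip().lower()}
        for row in state["raw"]
    ]
    return state


def build_master_rxcuis(state):
    """Stage 3: derive the sorted, de-duplicated RxCUI list."""
    state["master_rxcuis"] = sorted({row["rxcui"] for row in state["normalized"]})
    return state


def map_classes(state):
    """Stage 4: attach a class to each RxCUI (stubbed lookup table)."""
    stub_classes = {"1001": "Example Class"}  # placeholder, not real data
    state["classes"] = {
        rxcui: stub_classes.get(rxcui) for rxcui in state["master_rxcuis"]
    }
    return state


STAGES = [collect_raw, normalize_schema, build_master_rxcuis, map_classes]


def run_pipeline(stages, state=None):
    """Run each stage in order, passing one shared state dict between them."""
    state = {} if state is None else state
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline(STAGES)
```

Because every stage has the same signature, a single stage can be rerun or replaced without touching the others, for example run_pipeline(STAGES[:2]) while debugging normalization.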

A major breakthrough in the project occurred when I shifted the architecture toward building a “master RxCUI universe.” Instead of treating every API independently, the framework first identified all unique RxCUIs discovered across multiple API endpoints. Those RxCUIs then became the central anchor for downstream processing and data merges. This decision dramatically improved consistency because all subsequent API calls could operate from one standardized medication concept framework.
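The idea can be sketched in a few lines: take the union of every RxCUI seen by any source, then join downstream attributes onto that single list. The identifiers below are placeholders rather than real RxCUIs, and build_master_universe and anchor_to_universe are hypothetical names used only for illustration.

```python
def build_master_universe(*sources):
    """Union the RxCUIs found by each source into one sorted list of strings."""
    universe = set()
    for rxcuis in sources:
        # Cast to str so "1003" and 1003 are treated as the same concept.
        universe.update(str(rxcui) for rxcui in rxcuis)
    return sorted(universe)


def anchor_to_universe(universe, attributes_by_rxcui):
    """Left-join attributes onto the universe so every RxCUI keeps a row."""
    return [
        {"rxcui": rxcui, **attributes_by_rxcui.get(rxcui, {})}
        for rxcui in universe
    ]


# Placeholder identifiers standing in for results from three endpoints.
from_rxnorm = ["1001", "1002"]
from_rxclass = ["1002", "1003"]
from_rxterms = [1003, "1001"]

classes = {"1002": {"class_name": "Example Class"}}

master = build_master_universe(from_rxnorm, from_rxclass, from_rxterms)
table = anchor_to_universe(master, classes)
```

The left join is the point of the design: an RxCUI with no class still appears in the table, so gaps in one API show up as missing attributes instead of silently dropped rows.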

Throughout this process, artificial intelligence played a meaningful role in accelerating learning. AI helped explain unfamiliar programming concepts, assisted with troubleshooting edge-case errors, and provided guidance on engineering patterns I had not previously implemented. However, one of the most important lessons was realizing that AI functions best as an assistant rather than an autonomous replacement. The successful parts of this project still depended on human reasoning, testing, architecture decisions, debugging, and business understanding.

This distinction matters significantly in healthcare technology. AI can help developers learn faster, identify patterns, and improve efficiency, but healthcare systems still require human oversight, contextual understanding, governance, and validation. Throughout the RxNorm integration process, AI helped accelerate my ability to solve technical problems, yet every major implementation decision still required human judgment to ensure scalability, accuracy, and analytical relevance.

Another important methodological lesson involved iterative learning. The pipeline did not emerge perfectly on the first attempt. There were multiple redesigns involving retry logic, schema normalization strategies, API throttling controls, timeout handling, caching approaches, and SQL export structures. Each engineering obstacle ultimately improved the final framework because it forced a deeper understanding of how APIs behave under enterprise-scale workloads.
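Retry logic is the easiest of these to show in isolation. The sketch below retries a failing call with exponential backoff; it is a generic pattern, not the pipeline's actual implementation, and the name call_with_retry and its default values are assumptions chosen for illustration.

```python
import time


def call_with_retry(func, *args, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call func(*args), retrying on failure with exponential backoff.

    Waits base_delay, then 2x, then 4x, and so on between attempts, and
    re-raises the last error once the retry budget is used up.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except Exception:
            # Broad catch keeps the sketch short; real code would narrow this
            # to network errors such as timeouts and connection failures.
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

In practice func would be a function that makes one API request with an explicit timeout, so that a hung connection fails fast and becomes a retry. The sleep parameter exists so the delay can be replaced in tests without waiting.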

From a broader perspective, this project reinforced the importance of continuous technical learning within healthcare analytics. Modern healthcare systems increasingly rely on APIs, interoperability frameworks, cloud architecture, and scalable data pipelines. Learning how to engineer those systems opens the door to more advanced analytics, automation, and artificial intelligence initiatives in the future.

Most importantly, the project demonstrated how learning new technical skills becomes significantly more powerful when connected to a meaningful mission. The objective was never simply “learning APIs.” The goal was to build infrastructure capable of helping healthcare organizations better understand medication data at scale. That larger purpose transformed the project from a programming exercise into a real-world engineering journey focused on improving healthcare intelligence through better data architecture.
