About

Research Analyst with 7+ years of experience in statistical modeling, epidemiological study design, and next-generation sequencing (NGS) data analysis. Proficient in SAS, Python, R, SQL, Tableau, and machine learning techniques, with expertise in data integration, preprocessing, and cleaning. Skilled in building GIS-based disease maps and automated data pipelines, performing multivariate analysis, optimizing database performance, and implementing data visualization solutions that support decision-making. Certified in SAS programming, with a track record of delivering high-impact analytical insights and collaborating with cross-functional teams to solve complex healthcare challenges. Strong problem-solving, time management, and communication skills, with a passion for using data to improve healthcare outcomes and policy decisions.

  • City: Tallahassee, FL
  • Email: shivaraj.gk2708@gmail.com

Interests

Bio-Health Informatics

Data Analysis and Statistics

Database Management

Visualization

Machine Learning

Validation

Web Development

Development Processes and Tools

Education

MS in Bioinformatics

August 2021 - May 2023
Relevant Coursework
  • Biomedical Analytics
  • Database Management
  • Visualization
  • Machine Learning
  • Deep Learning Neural Networks

B.E. in Biomedical Engineering

September 2013 - June 2017
Relevant Coursework
  • Introduction to Biomedical Informatics
  • Healthcare Data Management
  • Electronic Health Records
  • Biomedical Image Processing
  • Human Biology and Physiology

Certifications

Machine Learning

SAS Certified Specialist

Experience

State of Florida - Department of Health

July 2023 - Present

Research Consultant

Conduct epidemiologic and machine learning-driven analyses to improve public health insights, focusing on maternal and child health (MCH), Medicaid data analysis, and uterine fibroid research. Lead data science initiatives, develop statistical models, and create interactive dashboards to facilitate decision-making. Manage grant-related activities, mentor interns, and collaborate with cross-functional teams to integrate data-driven solutions into public health strategies.

  • Lead statewide epidemiology & biostatistics across MCH, chronic disease (diabetes, hypertension), Medicaid, and reproductive health; use PRAMS/BRFSS (complex weights) and Medicaid claims to set surveillance priorities and guide policy.
  • Built a BERT/transformers (PyTorch) NLP pipeline to extract uterine fibroid symptoms from clinical text, closing ICD-10 gaps with multi-label models and targeted data augmentation.
  • Applied survival, mixed-effects, and generalized regression in R/SAS to estimate associations between symptom burden and LOS, readmissions, and procedures; documented diagnostics, calibration, and sensitivity analyses.
  • Designed population-based surveillance studies (sampling, power, SAPs) to detect disparities across demographic subgroups.
  • Engineered automated ETL integrating REDCap, Epic, ENS, and Medicaid claims with deterministic & probabilistic linkage, end-to-end QA, and HIPAA controls; leveraged AWS (S3/EC2/Athena) for scalable processing where appropriate.
  • Built REDCap instruments (branching logic, e-consent, API) and Qualtrics surveys; harmonized clinical variables for semantic consistency across sources.
  • Conducted ArcGIS Pro geospatial analyses to identify hotspots/vulnerable populations; produced risk maps to target interventions.
  • Developed R Shiny dashboards (with GIS layers, suppression rules, and reproducible protocols) to communicate incidence, mortality, and disparities to leadership.
  • Implemented Bayesian hierarchical models and time-series forecasts for infectious disease early-warning and outbreak preparedness.
  • Led a quality-improvement review of pregnancy–syphilis cases (Miami-Dade), producing epi curves and screening recommendations.
  • Contributed methods, sample-size justifications, and evaluation metrics to CDC and Title V Block Grant deliverables.
  • Presented at national conferences (NLP, small-area estimation); delivered internal seminars on machine learning & reproducible research.
  • Produced statistical briefs, fact sheets, and visualizations translating complex analyses into actionable insights.
  • Mentor junior analysts in study design, statistical programming, visualization, code review, and SOP-driven workflows.

YanLab, IUPUI

September 2021 - May 2023

Research Assistant

Conducted advanced data analysis on large-scale longitudinal demographic and clinical datasets, applying machine learning, statistical modeling, and data engineering techniques to extract insights for disease epidemiology and healthcare research. Developed automated data pipelines, optimized code efficiency, and created custom data visualizations to support decision-making.

  • Ran longitudinal analyses on UK Biobank and ADNI cohorts (Python/R): preprocessing, feature engineering, and modeling of disease progression/treatment effects.
  • Built risk-stratification models (random forest, logistic regression) with cross-validation; compared discrimination/calibration and documented limitations.
  • Performed bioinformatics/NGS workflows in Bioconductor (clustering, differential expression), integrating genomic signals with clinical phenotypes for translational insights.
  • Automated ETL (Bash/Python) for demographics, ICD-10, and imaging metadata; standardized schemas across sites and cut processing time by ~40%.
  • Mined EHR data in SAS 9.4 to characterize utilization and outcomes in chronic neurologic conditions.
  • Built R Shiny/ggplot2 visuals for biomarker and outcome trajectories to accelerate hypothesis testing and collaborator feedback.
  • Refactored SAS/Python/R code into reusable, parameterized functions; reduced complexity ~20% and improved reproducibility with GitHub READMEs and versioning.

Accenture Pvt. Ltd.

October 2017 - August 2021

Data Analyst

Used statistical modeling, data engineering, and automation techniques to optimize health insurance analytics, data quality monitoring, and business intelligence solutions. Developed and maintained automated data pipelines, enhanced database performance, and created interactive dashboards to support decision-making in the healthcare domain.

  • Analyzed large health-insurance claims with SAS/SQL to quantify utilization, cost, and program KPIs; defined cohorts and standardized metric logic.
  • Built automated SAS macros and SQL workflows (type handling, date logic, deduplication), improving accuracy and run efficiency by ~25–40%.
  • Developed Tableau dashboards (drilldowns) for spend patterns, risk monitoring, and care-management prioritization.
  • Tuned MySQL (indexes, query refactors) for high-volume datasets to reduce latency and maintain SLA compliance.
  • Authored 100+ Java/Python utilities for ETL automation and pipeline health checks, adding alerting for upstream data-quality issues.
  • Designed test strategies mapped to business rules; integrated Jenkins automation with results pushed to Jira, reducing manual QA time ~30%.
  • Partnered with actuarial/analytics teams on descriptive and inferential studies for risk stratification and preventive-care planning; documented assumptions and data limits.

Chakra IT Solutions

September 2018 - February 2019

SAS Programmer Analyst

Provided SAS programming and statistical analysis expertise for Phase I–IV clinical trials in Oncology and CNS, ensuring data integrity, regulatory compliance, and efficient reporting. Developed standardized datasets, programming macros, and clinical trial outputs aligned with CDISC standards and study requirements.

  • Programmed Phase I–IV Oncology/CNS trials; produced CDISC-compliant SDTM/ADaM datasets (DM, LB, DS, VS, EX, ADSL, ADAE).
  • Generated submission-ready TLFs per SAPs and mock shells with traceable derivations.
  • Ensured regulatory readiness using aCRF, mapping specs, IG, Controlled Terminology; executed Pinnacle21 validation with tracked issue resolution.
  • Built/validated SAS macros (%LET, CALL SYMPUT, DATA _NULL_); leveraged PROC SQL/REPORT/MEANS/FREQ/TRANSPOSE and ODS for delivery.
  • Maintained programming specs & codebooks to support auditability and reproducible reruns across milestones.

Projects

  • Healthcare Claims Analysis in USA
  • Cholera Outbreak Analysis - d3.js
  • Medicaid in COVID-19 Claims
  • Sales Analysis
  • ACA Impact in USA Healthcare
  • DL Image Classification Model
  • Statistical Analysis

Skills

Languages and Databases

Frameworks

Visualization

Tools and Technologies

Contact

Email

shivaraj.gk2708@gmail.com

Say Hi Anytime :)