Senior Data Engineer · AI Researcher · Lima, Peru

Building intelligent data systems & published research

I design end-to-end data pipelines, generative AI systems, and fraud detection solutions processing millions of daily transactions across GCP and Azure. Published researcher with formal evaluation frameworks for RAG and graph-based anomaly detection.

5+ Years Experience
4 Publications
85% Pipeline Latency Reduced
2.5M+ Merchants Served
Publications & Research
Peer-reviewed work
Research focused on production AI systems, information retrieval evaluation, and fraud detection with graph neural networks.
DataGov-Factory: Empirical Cost-Quality Analysis of Managed vs. DIY LLM Agent Architectures for Financial Data Governance
Karl J. Mollan Neyra · 2026 · Zenodo

Compares three LLM agent deployment patterns — Managed Agents, DIY Factory (Python + GCP), and Direct API — across 45 controlled measurements on financial data governance tasks. Reveals that prompt caching inverts the expected cost hierarchy: DIY accumulates 21.68x context tokens while managed caching reduces costs by 90%. Includes security analysis for regulated fintech (PCI-DSS, SBS Peru, Ley 29733) with a practical decision framework for data sovereignty.

Managed Agents · Data Governance · Prompt Caching · Build vs Buy · Fintech
GRAFID: When Do Graph Neural Networks Outperform Gradient Boosting for Fraud Detection? A Feature Richness Analysis
Karl J. Mollan Neyra · 2026 · Zenodo

Introduces the GRAFID framework with three novel metrics (Feature Richness Index, Graph Signal Gain, Cost-Effectiveness Ratio) to determine when GNNs outperform XGBoost for fraud detection. Evaluated on IEEE-CIS (590K transactions) and Credit Card EU (284K) datasets with 20+ model configurations, statistical validation (Wilcoxon, McNemar, Cohen's d), and multi-seed reproducibility.

GNN · Fraud Detection · XGBoost · GRAFID · Feature Analysis
Data Quality Over Algorithmic Complexity: Empirical Evidence from a Production Hybrid Search System
Karl J. Mollan Neyra · 2026 · Zenodo

Demonstrates that data quality improvements (ground truth correction, embedding deduplication) yielded a 27% improvement in Precision@5 with zero code changes and zero cost, outperforming neural reranking approaches. Based on 28 formal evaluations with 52 reference questions.

Information Retrieval · Data Quality · Hybrid Search · HyDE · Evaluation
Conceptual Mechatronics Design and Prototyping of Autonomous Inverted Pendulum-System Applied on Two-Wheeled Mobile Robot
Karl J. Mollan Neyra · 2023 · IEEE ICAECT

Design and prototype of an autonomous two-wheeled mobile robot using inverted pendulum control systems, with stress simulations and material validation in Autodesk Inventor.

Mechatronics · Control Systems · Robotics · IEEE
Professional Experience
Where I've made impact
From payment processing at scale to healthcare AI, building systems that move real business metrics.
Data Engineering Specialist · Izipay (Intercorp)
Peru's largest payment acquirer — 2.5M+ merchants
Nov 2025 — Present · Data & Analytics · Lima, Peru
  • Design and implement stored procedures in Synapse and SQL Server for churn analytics with rolling-period logic, weighted averages, and volume segmentation across 2.5M+ merchant accounts.
  • Build end-to-end Azure Data Factory pipelines with data existence validation, incremental copy, and quality controls ensuring daily SLA compliance for settlement processing.
  • Execute cross-platform validations between BigQuery and Azure Synapse, identifying and resolving discrepancies in critical data migrations across cloud environments.
  • Develop and deploy generative AI APIs and conversational chatbots on GCP Cloud Run (LangChain, Azure OpenAI, Firestore, vector stores) — resolving 24/24 reported production cases.
  • Architect a Hybrid RAG system (vector search + FTS + HyDE + GraphRAG) with formal evaluation framework (52 ground truth questions, P@5: 0.704, MRR: 0.853).
GCP · Azure Synapse · BigQuery · ADF · LangChain · Cloud Run · PostgreSQL · Kafka CDC
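The two retrieval metrics quoted for the Hybrid RAG evaluation, Precision@5 and MRR, can be sketched as follows. This is a minimal illustration of the standard definitions, not the evaluation framework itself: the function names, the document IDs, and the `(ranked_ids, relevant_ids)` input shape are all assumptions made for the example.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_metrics(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 5) -> Tuple[float, float]:
    """Average P@k and MRR over (ranked_ids, relevant_ids) pairs, one per question."""
    run_list: List[Tuple[Sequence[str], Set[str]]] = list(runs)
    if not run_list:
        raise ValueError("at least one evaluated question is required")
    p_at_k = sum(precision_at_k(r, rel, k) for r, rel in run_list) / len(run_list)
    mrr = sum(reciprocal_rank(r, rel) for r, rel in run_list) / len(run_list)
    return p_at_k, mrr


# Hypothetical data: two questions, each with a ranked result list and its ground truth.
example_runs = [
    (["a", "b", "c", "d", "e"], {"a", "c"}),
    (["x", "y", "z", "w", "v"], {"y"}),
]
mean_p5, mean_mrr = mean_metrics(example_runs)
```

Averaging per question keeps every reference question equally weighted, which is the usual convention for a fixed ground-truth set.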
Analyst I — Planning & Commercial Intelligence · Izipay (Intercorp)
Apr 2025 — Nov 2025 · Lima, Peru
  • Built retention KPI pipelines processing multi-million-row datasets in 20 min (vs. 2h prior) — 85% latency reduction eliminating legacy models.
  • Refactored ecosystem of 8 legacy SQL/Python scripts, restoring 100% data reliability and cutting validation time to 5 minutes.
  • Automated segmentation workflows from hours to 15 min with clustered indexes and dynamic pivoting, achieving >98% segmentation precision.
  • Mapped >38 sensitive data flows per Peru's Data Protection Law (Ley 29733), minimizing PII exposure across settlement pipelines.
BigQuery · SQL Server · Python · Synapse · Power BI
Innovation & AI Coordinator · LOLIMSA
Healthcare software — 14 countries, 5,000+ clients
Jan 2025 — Mar 2025 · Lima, Peru
  • Designed 2 internal WebSocket APIs for legacy-AI interoperability, connecting distributed SQL Server systems without public IPs; centralized monitoring for 5+ hospital clients.
  • Built LOLIMSA Detect — end-to-end ML pipeline for anemia detection from nail images (85% precision, image → ML → PDF/email).
  • Implemented advanced prompt engineering for pharmaceutical data extraction achieving 70% success rate, overcoming rate limiting across drug comparison systems.
Python · WebSockets · SQL Server · Computer Vision · Prompt Engineering
R&D Coordinator · LOLIMSA
Jan 2023 — Feb 2025 · 2 years · Lima, Peru
  • Led cross-functional team (analysts, designers, developers) delivering innovation projects with data science and generative AI.
  • Designed Power BI dashboards improving executive decision speed by 30%; implemented predictive analytics models reducing response times by 20%.
  • Led desktop-to-web migration increasing operational efficiency by 25% through automation of key clinical workflows.
Power BI · Python · SQL Server · Agile
IT Coordinator · LOLIMSA
Jul 2022 — Dec 2022 · Lima, Peru
  • Managed 200+ monthly support tickets, reducing downtime by 30% and recurrence to 7% with proactive KPI monitoring.
  • Built PMBOK/BPM hybrid framework: 73% faster incident resolution, $15K annual savings, 40% automation of routine tickets.
Junior Engineer / Intern · LOLIMSA
Mar 2021 — Jul 2022 · Lima, Peru
  • Developed S-KODA smart tennis ball prototype in Autodesk Inventor, reducing fabrication costs by 23% with real-time metrics (force, speed, spin via Bluetooth).
  • Coordinated migration of 5 hospital servers (Huawei → OVH) with MS Project, achieving 40% less downtime via dynamic backups and sequential validation.
Featured Project
Distributed Semantic Inference Engine for BI Democratization
Mar 2025 — Oct 2025 · Private Project

Distributed architecture for natural language query (NLQ) processing translating business questions into SQL. Asynchronous WebSocket orchestration with adaptive timeouts (5s→40s), semantic engine using open-source LLMs for NLQ-SQL with ontological schema mappings (50+ business terms), hierarchical fallbacks, and conversational interface on WhatsApp. Democratized BI access for non-technical enterprise users.

NLQ-SQL · WebSockets · Distributed Systems · Generative AI · WhatsApp
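One way to combine adaptive timeouts with hierarchical fallbacks is sketched below. It is an illustration under stated assumptions, not the project's actual orchestration: the doubling ladder between 5 s and 40 s, the helper names `timeout_schedule` and `call_with_fallbacks`, and the idea of passing handlers as async callables are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def timeout_schedule(start: float = 5.0, cap: float = 40.0, factor: float = 2.0) -> List[float]:
    """Build an increasing timeout ladder that starts at `start` and ends at `cap`."""
    if start <= 0 or cap < start or factor <= 1:
        raise ValueError("need 0 < start <= cap and factor > 1")
    timeouts: List[float] = []
    current = start
    while current < cap:
        timeouts.append(current)
        current *= factor
    timeouts.append(cap)
    return timeouts


async def call_with_fallbacks(
    handlers: Sequence[Callable[[str], Awaitable[T]]],
    question: str,
    timeouts: Sequence[float],
) -> T:
    """Try handlers in priority order; retry each one with progressively longer timeouts."""
    last_error: Optional[BaseException] = None
    for handler in handlers:
        for timeout in timeouts:
            try:
                return await asyncio.wait_for(handler(question), timeout=timeout)
            except asyncio.TimeoutError as exc:
                last_error = exc  # too slow: retry the same handler with more time
            except Exception as exc:
                last_error = exc  # hard failure: skip to the next fallback handler
                break
    raise RuntimeError("all handlers failed or timed out") from last_error


# Hypothetical handlers standing in for a primary engine and a simpler fallback.
async def _primary(question: str) -> str:
    raise ValueError("primary engine unavailable")


async def _fallback(question: str) -> str:
    return "SELECT 1"


result = asyncio.run(call_with_fallbacks([_primary, _fallback], "total sales?", [0.1, 0.2]))
```

Retrying on timeout but falling through on other errors keeps slow-but-healthy backends in play while avoiding repeated calls to one that is failing outright.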
AI Lab — Live Demo
Try it yourself
A live semantic similarity engine powered by open-source models. Type two sentences and compare their meaning vectors.

Semantic Similarity Explorer

Powered by HuggingFace Inference API (all-MiniLM-L6-v2, 384 dimensions).
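Comparing two meaning vectors typically comes down to cosine similarity. The sketch below assumes the two embeddings have already been obtained (for all-MiniLM-L6-v2 they would each have 384 values); it does not call any inference API, and the function name and the tiny example vectors are illustrative only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Values near 1 indicate sentences with similar meaning, values near 0 indicate unrelated ones.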

Ask About Karl

Ask anything about my professional background, technical skills, research, or projects. Powered by Gemini.

Professional and technical questions only. 5 queries per day.

Education
Academic background
MSc Data Science
Universidad Nacional de Ingeniería (UNI)
2024 — 2026
MSc Project Management
Escuela de Postgrado UTP
2021 — 2023
BSc Mechatronics Engineering
Universidad San Ignacio de Loyola (USIL)
2020 — 2023
BSc Environmental Engineering
Universidad San Ignacio de Loyola (USIL)
2015 — 2019
Diploma in Agile Project Management
CENTRUM PUCP Graduate Business School
2022
Diploma in Biomedical Equipment Mgmt
TECH SENATI
2021 — 2022
Technical Skills

Languages & Data

  • Python
  • SQL (Advanced)
  • R
  • Power BI
  • ETL/ELT Pipelines
  • Data Modeling

AI & ML

  • RAG Systems
  • LLMs / GenAI
  • Prompt Engineering
  • Machine Learning
  • Computer Vision
  • NL2SQL
  • LangChain

Cloud & Infra

  • GCP (Cloud Run, BQ)
  • Azure (Synapse, ADF)
  • PostgreSQL
  • Docker
  • Kafka CDC
  • REST / WebSockets

Domain & Mgmt

  • Payment Systems
  • Fraud Detection
  • Settlement (T+1/T+2)
  • PMBOK / Agile
  • MS Project
  • Team Leadership
Certifications
Autodesk Inventor Advanced (UNI) · AutoCAD 2D & 3D (UNI) · Professional Graphic Engineering (UNI) · English Competency (USIL)
Leadership & Volunteering
Founder & President — M.E.S. · Mechatronics Engineers Society, USIL · 2020–2021 · 4 divisions
Teaching Assistant — Digital Circuits · USIL · 2021
Environmental Coach — Ecolegio · USIL · 2021
Volunteer — Crea+ Peru · Social Services · 2020
Scientific Research Director · Environmental Engineering Club, USIL · 2016–2017
Physics Tutor · General Physics, USIL · 2016
Get in Touch

Let's build something together

Open to collaborations, research partnerships, and new opportunities.