MEDICAL DATA ANONYMIZER
DOI:
https://doi.org/10.25215/8194288770.38
Abstract
The Medical Data Anonymizer is an AI-powered system designed to automatically identify and conceal sensitive personal information in medical documents such as blood test reports and diagnostic records. The tool uses pytesseract-based Optical Character Recognition (OCR) to extract text from scanned or image-based files, so that handwritten or low-quality documents can also be processed. After extraction, the system applies spaCy's Named Entity Recognition (NER) models to detect confidential details, including patient names, dates, medical record IDs, and other identifying attributes. Once the sensitive entities are recognized, the system anonymizes them through masking or cloaking techniques while preserving the document's clinical meaning. It also generates structured audit logs to maintain traceability, which aids compliance with data privacy regulations and enables authorized review when needed. By automating an error-prone manual process, the Medical Data Anonymizer reduces workload, reduces human error, and strengthens adherence to privacy laws. Future improvements will focus on enhancing accuracy, supporting multilingual data processing, and integrating the tool with electronic health record systems to expand its use in real-world clinical workflows.
Published
2026-03-11
Issue
Section
Articles
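
The pipeline described in the abstract (OCR, entity detection, masking, audit logging) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `Span`, `ocr_image`, `detect_entities`, and `mask_entities`, the model name `en_core_web_sm`, the `[LABEL]` placeholder format, and the audit-log fields are all assumptions made for illustration. General-purpose spaCy models label names and dates but have no label for medical record IDs, so those would likely need a custom pattern or a trained component. The OCR and NER helpers import their libraries lazily, so the masking step runs without them.

```python
import json
from dataclasses import dataclass


@dataclass
class Span:
    start: int   # character offset where the entity begins
    end: int     # character offset one past the entity's last character
    label: str   # entity type, e.g. "PERSON" or "DATE"


def ocr_image(path):
    """Extract text from an image file (requires pytesseract, Pillow, Tesseract)."""
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(path))


def detect_entities(text, model="en_core_web_sm", labels=("PERSON", "DATE")):
    """Return sensitive spans found by a spaCy NER model (model name assumed)."""
    import spacy
    nlp = spacy.load(model)
    return [Span(ent.start_char, ent.end_char, ent.label_)
            for ent in nlp(text).ents if ent.label_ in labels]


def mask_entities(text, spans):
    """Replace each span with a [LABEL] placeholder; return text and audit log."""
    pieces, log, cursor = [], [], 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            continue  # skip a span that overlaps one already masked
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label}]")
        # Log the position and type only, never the sensitive value itself.
        log.append({"label": span.label, "start": span.start, "end": span.end})
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces), log


# Demo with hand-written spans, so no OCR or NER model is needed to run it.
text = "Patient: John Smith, sampled 2024-01-05. Hemoglobin 13.5 g/dL."
name, date = "John Smith", "2024-01-05"
spans = [
    Span(text.index(name), text.index(name) + len(name), "PERSON"),
    Span(text.index(date), text.index(date) + len(date), "DATE"),
]
masked, audit_log = mask_entities(text, spans)
print(masked)
print(json.dumps(audit_log, indent=2))
```

In a full run, `mask_entities(text, detect_entities(text))` would be applied to the output of `ocr_image`. Keeping the raw values out of the audit log is a design choice of this sketch; a system that supports authorized review of the original values would need a separately protected mapping.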
