cv
General Information
Full Name | Bowen Gu |
Date of Birth | 12th July 1998 |
Languages | Chinese, English |
Education
-
Aug. 2022 - present
Master of Science in Health Data Science
Harvard University, Boston, MA, United States
- GPA 4.00 / 4.00
- Studying Computer Science at the Massachusetts Institute of Technology while enrolled at Harvard University
- MIT Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) affiliate
-
Aug. 2018 - May 2022
Bachelor of Science in Computer Science & Physics – Astrophysics Option
University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- GPA 3.99 / 4.00
Experience
-
Jun. 2023 - present
Data Science Intern
Mayo Clinic, Department of Otolaryngology (ENT) - Head and Neck Surgery
-
Automating Data Abstraction for the Generation of Clinical Registries
- Engineered a natural language processing (NLP) pipeline using large language models (LLMs) to extract information from unstructured clinical notes and automate the generation of clinical registries.
- Formulated the LLM prompt, built an autograder to evaluate pipeline performance, and presented the phased results at the Harvard DSI South Africa Training Program monthly seminar, in collaboration with two data scientists at Mayo Clinic.
-
Automatic Decision System for Rhinology and Otology Appointments
- Built an automatic decision system to automate the approval of the Mayo ENT Rhinology and Otology appointments based on historical patient symptom descriptions on Qualtrics and historical physician approval data on Triage.
- Designed a system projected to automate 63% of Rhinology and 25% of Otology appointments, respectively, collaborated with two data scientists at Mayo Clinic, and presented the outcomes to the chief physicians of the Rhinology and Otology division.
-
Jul. 2023 - present
Data Science Research Assistant
Dana-Farber Cancer Institute, Department of Data Science
-
Annotation of ECOG PS from Unstructured Oncology Notes and Survivability Analysis
- Identified performance status (PS) labels in unstructured clinical notes using a text-based search, trained a CNN model and a transformer-based model to predict the ECOG PS, and evaluated the correlation between ECOG PS and survival outcomes.
- Developed a model with 95.5% accuracy, found a strong correlation between ECOG PS and survival outcomes, worked with two medical oncologists and one data scientist at Dana-Farber, and drafted a research paper.
-
Aug. 2023 - present
Data Science Research Trainee
Brigham and Women's Hospital, Department of Medicine
-
Extract Patient Entities from Free-Text EHR Data Using NLP Models
- Constructed a natural language processing (NLP) pipeline by implementing transformer-based and large language models to extract social determinants of health (SDOH) variables from unstructured text.
- Worked under the guidance of one epidemiologist and one data scientist at Brigham, and presented during the weekly group meetings.
-
Aug. 2020 - May 2022
Research Assistant (Computer Science)
University of North Carolina at Chapel Hill
-
Towards a Comprehensive AI Teaching Assistant Based on Course Forums
- Used neural networks, word embeddings, the IBM Watson Natural Language Understanding API, and the Keras deep learning API to build the kernel of the AITA, which helps classify forum posts and detect forum questions that are duplicate or incomplete.
- Achieved an average model accuracy of 95%, collaborated on a team of 3, led 95% of the work, wrote an honors thesis, held a thesis defense, and published the work in the Carolina Digital Repository.
-
Automating Testing of Visual Observed Concurrency
- Developed a new testing-based framework in Java that provides both grading management and automation for evaluating the concurrency requirements of assignments implemented in Java.
- Collaborated on a team of 6 and published a paper at the 3rd Workshop on Education for High Performance Computing.
-
Broad Awareness of Unseen Work on a Concurrency-based Assignment
- Used different technologies to record events related to work on a Java assignment that exercised threads, synchronization, and coordination and provided preliminary answers to questions about the unseen work behind the concurrency aspects of the assignment.
- Collaborated on a team of 6 and published a paper at the Workshop on Education for High Performance Computing.
-
Aug. 2020 - May 2022
Research Assistant (Physics)
University of North Carolina at Chapel Hill
-
The Role of Activity & Youth on the M_KS-M_* Relation
- Measured rotation periods and S indices of 143 binaries, fitted the positions of the binaries using Markov chain Monte Carlo (MCMC), and color-coded the HR diagram by rotation period, H-alpha line, and S index in Python.
- Collaborated on a team of 7 and helped write a paper on this work, to be published in an American Astronomical Society journal and presented at the UNC research symposium by the end of the semester.
-
Structure in The APOGEE/Cannon Stars
- Used unsupervised learning techniques in Python to find groupings of the Cannon dataset in chemical abundance space and determine whether the same groupings also occur in physical space.
- Collaborated on a team of 3, wrote a thesis, and gave a presentation at the end of the semester.
-
Computer Simulation of Nemesis's Effect on Earth
- Used various numerical techniques in Python to determine whether Nemesis, a hypothetical brown dwarf companion to the Sun, was responsible for the 26-million-year periodicity in the extinction of Earth's living species by perturbing comets.
- Collaborated on a team of 3, wrote a thesis, and gave a presentation at the end of the semester.
Honors and Awards
-
2022
- University of North Carolina at Chapel Hill Graduation with Highest Distinction
- University of North Carolina at Chapel Hill Graduation with Highest Honors in Computer Science
-
2020
- Member of Phi Beta Kappa Honor Society
- University of North Carolina at Chapel Hill Daniel C. Johnson Outstanding Junior Award in Physics and Astronomy
Academic Interests
-
Data Science
- Statistical Inference
- Bayesian Analysis
-
Computer Science
- Artificial Intelligence
- Deep Learning
- Natural Language Processing
Skills
-
Programming Languages
- C
- C++
- Java
- Lisp
- MySQL
- Prolog
- Python
- R
- SML
- TypeScript
-
Technologies
- BERT
- Docker
- Git
- GPT-3
- Jupyter Notebook
- Keras
- LaTeX
- pandas DataFrame
- RStudio
- scikit-learn
- Spark
- TensorFlow
-
Other
- Algorithms
- Building CRAN packages (including ones using Rcpp)
- Computer organization
- Computer security
- Databases
- Data structures
- Distributed systems
- Hypothesis testing
- Object-oriented programming
- Probability
- Regression
- Statistical inference