cv

General Information

Full Name Bowen Gu
Date of Birth 12th July 1998
Languages Chinese, English

Education

  • Aug. 2022 - present
    Master of Science in Health Data Science
    Harvard University, Boston, MA, United States
    • GPA 4.00 / 4.00
    • Studying Computer Science at the Massachusetts Institute of Technology while enrolled at Harvard University
    • MIT Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) affiliate
  • Aug. 2018 - May 2022
    Bachelor of Science in Computer Science & Physics – Astrophysics Option
    University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
    • GPA 3.99 / 4.00

Experience

  • Jun. 2023 - present
    Data Science Intern
    Mayo Clinic, Department of Otolaryngology (ENT) - Head and Neck Surgery
    • Automating Data Abstraction for the Generation of Clinical Registries
      • Engineered a natural language processing (NLP) pipeline using large language models (LLMs) to extract information from unstructured clinical notes, automating the generation of clinical registries.
      • Formulated the LLM prompt, built an autograder to evaluate pipeline performance, and, in collaboration with two data scientists at Mayo Clinic, presented the interim results at the Harvard DSI South Africa Training Program monthly seminar.
    • Automatic Decision System for Rhinology and Otology Appointments
      • Built an automatic decision system to approve Mayo ENT Rhinology and Otology appointments based on historical patient symptom descriptions in Qualtrics and historical physician approval data in Triage.
      • Designed a system projected to automate 63% of Rhinology and 25% of Otology appointments, collaborated with two data scientists at Mayo Clinic, and presented the outcomes to the chief physicians of the Rhinology and Otology divisions.
  • Jul. 2023 - present
    Data Science Research Assistant
    Dana-Farber Cancer Institute, Department of Data Science
    • Annotation of ECOG PS from Unstructured Oncology Notes and Survivability Analysis
      • Identified performance status (PS) labels in unstructured clinical notes using a text-based search, trained a CNN model and a transformer-based model to predict the ECOG PS, and evaluated the correlation between ECOG PS and survival outcomes.
      • Developed a model with 95.5% accuracy, found a strong correlation between ECOG PS and survival outcomes, worked with two medical oncologists and one data scientist at Dana-Farber, and drafted a research paper.
  • Aug. 2023 - present
    Data Science Research Trainee
    Brigham and Women's Hospital, Department of Medicine
    • Extracting Patient Entities from Free-Text EHR Data Using NLP Models
      • Constructed a natural language processing (NLP) pipeline by implementing transformer-based and large language models to extract social determinants of health (SDOH) variables from unstructured text.
      • Worked under the guidance of one epidemiologist and one data scientist at Brigham and presented at the weekly group meetings.
  • Aug. 2020 - May 2022
    Research Assistant (Computer Science)
    University of North Carolina at Chapel Hill
    • Towards a Comprehensive AI Teaching Assistant Based on Course Forums
      • Used neural networks, word embeddings, the IBM Watson Natural Language Understanding API, and the Keras deep learning API to build the kernel of an AI teaching assistant (AITA) that helps classify forum posts and detect forum questions that are duplicate or incomplete.
      • Achieved an average model accuracy of 95%, collaborated on a team of 3, led 95% of the work, composed an honors thesis, held a thesis defense, and published the work in the Carolina Digital Repository.
    • Automating Testing of Visual Observed Concurrency
      • Developed a new testing-based framework using Java to provide both a grading management and automation system for evaluating the concurrency requirements of assignments implemented in Java.
      • Collaborated on a team of 6 and published a paper at the 3rd Workshop on Education for High Performance Computing.
    • Broad Awareness of Unseen Work on a Concurrency-based Assignment
      • Used different technologies to record events related to work on a Java assignment that exercised threads, synchronization, and coordination and provided preliminary answers to questions about the unseen work behind the concurrency aspects of the assignment.
      • Collaborated on a team of 6 and published a paper at the Workshop on Education for High Performance Computing.
  • Aug. 2020 - May 2022
    Research Assistant (Physics)
    University of North Carolina at Chapel Hill
    • The Role of Activity & Youth on the M_KS-M_* Relation
      • Measured rotation periods and S indices of 143 binaries, fitted the positions of the binaries using Markov chain Monte Carlo (MCMC), and colored the HR diagram by rotation period, H-alpha line, and S index in Python.
      • Collaborated on a team of 7 and participated in writing the paper on this work, which will be published in an American Astronomical Society journal and presented at the UNC research symposium by the end of the semester.
    • Structure in the APOGEE/Cannon Stars
      • Used unsupervised learning techniques in Python to find groupings in the Cannon dataset in chemical abundance space and determine whether the same groupings also occur in physical space.
      • Collaborated on a team of 3, composed a thesis, and gave a presentation at the end of the semester.
    • Computer Simulation of Nemesis's Effect on Earth
      • Used various numerical techniques in Python to determine whether Nemesis, a hypothetical brown dwarf companion to the Sun, was responsible for the 26-million-year periodicity in the extinction of Earth's living species by perturbing comets.
      • Collaborated on a team of 3, composed a thesis, and gave a presentation at the end of the semester.

Honors and Awards

  • 2022
    • University of North Carolina at Chapel Hill Graduation with Highest Distinction
    • University of North Carolina at Chapel Hill Graduation with Highest Honors in Computer Science
  • 2020
    • Member of Phi Beta Kappa Honor Society
    • University of North Carolina at Chapel Hill Daniel C. Johnson Outstanding Junior Award in Physics and Astronomy

Academic Interests

  • Data Science
    • Statistical Inference
    • Bayesian Analysis
  • Computer Science
    • Artificial Intelligence
    • Deep Learning
    • Natural Language Processing

Skills

  • Programming Languages
    • C
    • C++
    • Java
    • Lisp
    • MySQL
    • Prolog
    • Python
    • R
    • SML
    • TypeScript
  • Technologies
    • BERT
    • Docker
    • Git
    • GPT-3
    • Jupyter Notebook
    • Keras
    • LaTeX
    • pandas DataFrame
    • RStudio
    • scikit-learn
    • Spark
    • TensorFlow
  • Other
    • Algorithms
    • Building CRAN packages (including ones using Rcpp)
    • Computer organization
    • Computer security
    • Databases
    • Data structures
    • Distributed systems
    • Hypothesis testing
    • Object-oriented programming
    • Probability
    • Regression
    • Statistical inference