Dotun Opasina

Using Machine Learning to know Patients that are No Shows

March 01, 2020 by Oladotun Opasina

Here is a brief introduction to the project.

Please check out the blog post: https://www.dotunopasina.com/datascience/noshowappointments

Introduction

In this project, we use machine learning algorithms to perform feature selection on patient appointment data. The goal is to understand which characteristics of a patient make them more likely to miss an appointment.

Dataset

The dataset for this project comes from Kaggle and consists of 14 columns and 110,527 rows.

The data consists of the following columns:

  1. Patient Id

    • Identification of a patient

  2. Appointment ID

    • Identification of each appointment

  3. Gender

    • Male or Female. Females make up the greater proportion of the data; women tend to take more care of their health than men.

  4. AppointmentDate

    • The day of the actual appointment, when they have to visit the doctor.

  5. Scheduled Date

    • The day the patient called or registered the appointment; this is, of course, before the appointment itself.

  6. Age

    • The patient's age.

  7. Neighborhood

    • Where the appointment takes place.

  8. Scholarship

    • True or False. This is a broad topic; for background, see https://en.wikipedia.org/wiki/Bolsa_Fam%C3%ADlia

  9. Hypertension

    • True or False

  10. Diabetes

    • True or False

  11. Alcoholism

    • True or False

  12. Handicap

    • True or False

  13. SMS_received

    • 1 or more messages sent to the patient.

  14. No-show

    • True or False.

Machine Learning Process

The steps taken to accomplish our results include the following:

  1. Data preprocessing.

  2. Create an awaiting-time field (days between the scheduled date and the appointment date).

  3. Exploratory data analysis.

  4. Pass the data through the machine learning algorithm.

  5. Select the 10 features most associated with missing an appointment and the 10 features most associated with keeping it.

The code for the project can be found on my GitHub.
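
As a rough illustration of step 2, the awaiting-time field can be computed with pandas. This is a minimal sketch, not the project's actual code: the column names `ScheduledDay` and `AppointmentDay` are assumed to match the Kaggle CSV headers, `AwaitingDays` is a made-up name, and the three rows are invented sample values.

```python
import pandas as pd

# Tiny invented sample; the column names are an assumption about the CSV headers.
df = pd.DataFrame({
    "ScheduledDay": [
        "2016-04-29T18:38:08Z",
        "2016-04-27T08:36:51Z",
        "2016-05-10T07:51:03Z",
    ],
    "AppointmentDay": [
        "2016-04-29T00:00:00Z",
        "2016-05-03T00:00:00Z",
        "2016-05-10T00:00:00Z",
    ],
})

# Parse both columns as timestamps and drop the time of day, so that a
# same-day appointment yields 0 instead of a negative fraction of a day.
scheduled = pd.to_datetime(df["ScheduledDay"]).dt.normalize()
appointment = pd.to_datetime(df["AppointmentDay"]).dt.normalize()

# Awaiting time in whole days between scheduling and the appointment.
df["AwaitingDays"] = (appointment - scheduled).dt.days
print(df["AwaitingDays"].tolist())
```

Normalizing to midnight before subtracting is a design choice: it keeps the field in whole calendar days regardless of what time of day the appointment was booked.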

Exploratory Data Analysis

The pie chart below shows 85,299 appointments marked Yes (the patient showed up) and 21,677 marked No (the patient missed the appointment). That is roughly 80% versus 20%, so the dataset is imbalanced, and we need to keep that in mind as we move along.

Number of Yes and No to appointments
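
To make the imbalance concrete, here is a small sketch that works out the class shares from the two counts quoted above. The variable names are illustrative only.

```python
# Counts quoted in the text above.
showed_up = 85_299
missed = 21_677

total = showed_up + missed
share_missed = missed / total
print(f"total appointments: {total}")
print(f"share missed: {share_missed:.1%}")

# A model that always predicts "shows up" would already be right this often,
# so plain accuracy is a weak yardstick on this data.
majority_baseline = showed_up / total
print(f"majority-class accuracy: {majority_baseline:.1%}")
```

Because the majority class alone gives close to 80% accuracy, metrics such as recall or precision on the missed class are usually more informative here.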

Machine Learning Model

The machine learning model used here was a logistic regression with lasso (L1) regularization. Regularization adds a penalty to the model's cost function to keep the model from overfitting. With the lasso penalty, the coefficients of unimportant features are shrunk to exactly zero, so the features left with non-zero coefficients are the ones the model selects as important.
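
The effect of the lasso penalty can be seen in a short scikit-learn sketch. It runs on synthetic data rather than the appointment dataset, and the settings (`C=0.05`, the `liblinear` solver, 20 features with 3 informative) are assumptions chosen for illustration, not the project's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the appointment data: 20 features, only 3 of them
# informative, with an 80/20 class split to mimic the imbalance.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, weights=[0.8, 0.2], random_state=0,
)

# Scale the features so the L1 penalty treats every coefficient comparably.
X_scaled = StandardScaler().fit_transform(X)

# penalty="l1" is the lasso penalty; a smaller C means stronger regularization.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
model.fit(X_scaled, y)

coef = model.coef_.ravel()
kept = np.flatnonzero(coef)  # indices of features with non-zero coefficients
print(f"{kept.size} of {coef.size} coefficients are non-zero:", kept)
```

Lowering `C` zeroes out more coefficients and raising it keeps more, so in practice `C` is usually tuned with cross-validation.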

Results and Insights

The model selected the features that most affect whether patients miss their appointments, as shown in the figure below.

Feature selections of Appointment No Shows

From the figure above, we can split the features into two groups: those associated with being more likely to miss an appointment and those associated with being less likely to miss one.

More Likely to Miss Appointment

  • Patients with a large gap between their scheduled date and their appointment date missed their appointments the most

  • Interestingly, patients who received an SMS message were still more likely to miss their appointment

  • Patients in the Itarare and Santos Dumont neighborhoods were more likely to miss their appointments

  • Patients aged 13 and 14 were more likely to miss their appointments

Less Likely to Miss Appointment

  • Patients aged 64 and 69 were less likely to miss their appointments

  • Patients who lived in Santa Martha, Jardim da Penha and Jardim Camburi were less likely to miss their appointments

  • Patients who had Hypertension were less likely to miss their appointments
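
A grouping like the one above can be derived by ranking the fitted coefficients by sign and size. The sketch below assumes the target is coded so that 1 means a missed appointment; the feature names and coefficient values are placeholders invented for illustration and are not the project's results.

```python
import pandas as pd

# Hypothetical coefficients for illustration only. These are NOT the
# project's fitted values, and the feature names are placeholders.
coefs = pd.Series({
    "feature_a": 0.42, "feature_b": 0.00, "feature_c": -0.31,
    "feature_d": 0.07, "feature_e": 0.00, "feature_f": -0.12,
})

# Lasso sets unimportant coefficients to exactly zero, so drop those first.
selected = coefs[coefs != 0]

top_n = 2  # the project used 10; 2 keeps this toy example readable

# Positive coefficients push toward "missed", negative ones toward "showed up".
more_likely = selected[selected > 0].nlargest(top_n)
less_likely = selected[selected < 0].nsmallest(top_n)

print("More likely to miss:", more_likely.index.tolist())
print("Less likely to miss:", less_likely.index.tolist())
```

With a fitted scikit-learn model, the same Series could be built from `model.coef_` and the column names of the training data.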
