July 29, 2021 11:28 pm GMT

Disease Prediction Based On Medical Diagnosis

In this article, we discuss one of DOCTOR-Y's machine learning models. This model predicts a patient's current medical conditions based on the previous diagnoses in the patient's medical history.

We used a dataset containing diseases and their diagnoses and classified it using three different machine learning classifiers.

If you don't know what DOCTOR-Y is, check this post (link will be available soon).

Idea

Physicians spend a lot of time reviewing a patient's previous e-prescriptions on DOCTOR-Y to learn about their past medical conditions and previous diseases.

That's why DOCTOR-Y provides a summarized chart showing, as percentages, how likely the patient is to suffer from each of a group of diseases based on previous diagnoses. The model is trained on a dataset to classify these diseases. It takes the diagnoses from previous prescriptions as input and outputs the predicted disease.

The snippet below shows how the model works.

    python NLP.py "The patient has high blood pressure"
    -> ['Hypertension']

Dataset

In this model, most of the data were collected from the Disease Symptom Prediction dataset on Kaggle.
Our dataset is used for the disease diagnosis model based on previous diagnoses. It has two columns: the disease name and the diagnoses for that disease. It contains 773 rows covering 41 unique diseases, which gives approximately 19 entries per disease.

The dataset is balanced. However, because we built it from scratch, the data may lead to misclassification of diseases that have different diagnoses, which would affect the model's accuracy.
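
One way to check the balance claim is to count the rows per disease. The sketch below assumes the dataset is available as a list of (disease, diagnosis) pairs; the function name and the sample rows are made up for illustration and are not taken from the actual dataset.

```python
from collections import Counter


def class_balance(rows):
    """Return (rows per disease, smallest class size, largest class size)."""
    counts = Counter(disease for disease, _ in rows)
    return counts, min(counts.values()), max(counts.values())


# Hypothetical sample rows in the two-column layout (disease, diagnosis).
sample_rows = [
    ("Hypertension", "The patient has high blood pressure"),
    ("Hypertension", "Elevated blood pressure readings on repeated visits"),
    ("Diabetes", "High fasting blood glucose level"),
    ("Diabetes", "Elevated HbA1c on blood test"),
]

counts, smallest, largest = class_balance(sample_rows)
print(counts, smallest, largest)

# For the full dataset: 773 rows over 41 diseases is about 18.9 rows each.
average_rows_per_disease = round(773 / 41, 1)
print(average_rows_per_disease)
```

If the smallest and largest class sizes are close to the average, the dataset can reasonably be called balanced.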

The majority of the data was collected by hand from multiple healthcare sites; we looked carefully for definitions and diagnoses of the required diseases and ensured that no entries were duplicated.

| Prognosis | Prognosis | Prognosis | Prognosis |
|---|---|---|---|
| Fungal infection | Migraine | hepatitis A | Heart attack |
| Allergy | Cervical spondylosis | Hepatitis B | Varicose veins |
| GERD | Paralysis(brain hemorrhage) | Hepatitis C | Hypothyroidism |
| Chronic cholestasis | Jaundice | Hepatitis D | Hyperthyroidism |
| Drug Reaction | Malaria | Hepatitis E | Hypoglycemia |
| Peptic ulcer diseae | Chicken pox | Alcoholic hepatitis | Osteoarthristis |
| AIDS | Dengue | Tuberculosis | Arthritis |
| Diabetes | Typhoid | Common Cold | (vertigo) Paroymsal Positional Vertigo |
| Gastroenteritis | Psoriasis | Pneumonia | Acne |
| Bronchial Asthma | Impetigo | Dimorphic hemmorhoids(piles) | Urinary tract infection |
| Hypertension | | | |

Implementation

Data Preparation

We cleaned the data to obtain better results, using the following preprocessing steps:

  • Stop Words Removal removes stop words (the, them, etc.).
  • Lowercasing converts all words in the diagnosis text to lowercase.
  • Punctuation Removal replaces punctuation characters such as '[/(){}[]|@,;]' with spaces.
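
The three steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `preprocess` function and the tiny stop-word set are assumptions (a real pipeline would use a fuller list), and lowercasing is applied first so that stop-word matching is case-insensitive.

```python
import re

# Characters from the article's punctuation list, replaced by spaces.
REPLACE_BY_SPACE_RE = re.compile(r"[/(){}\[\]|@,;]")

# Assumption: a small illustrative stop-word set, not the list the authors used.
STOP_WORDS = {"the", "them", "a", "an", "has", "is", "of", "and"}


def preprocess(text):
    """Lowercase, replace listed punctuation with spaces, drop stop words."""
    text = text.lower()
    text = REPLACE_BY_SPACE_RE.sub(" ", text)
    tokens = [token for token in text.split() if token not in STOP_WORDS]
    return " ".join(tokens)


print(preprocess("The patient has high blood pressure"))
# patient high blood pressure
```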

Model Definition

  • The Model is trained on the discussed dataset.
  • The Model input: the diagnosis.
  • The Model output: the possible diseases the patient may suffer from.

Model Training

We used three classification algorithms to process this data:

  • SVM (Support Vector Machine)
  • NLP LSTM (Long Short-Term Memory for Natural Language Processing): the model has one input layer, one embedding layer, one LSTM layer with 100 neurons, and one output layer with 41 neurons, one for each of the 41 labels. It was trained with a batch size of 64 for 80 epochs. [Figure: NLP model structure]
  • Multinomial Naive Bayes
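
To make one of the listed classifiers concrete, here is a from-scratch sketch of Multinomial Naive Bayes over raw word counts with Laplace smoothing. The article does not say which library or text features were used, so this shows the general technique rather than the project's implementation; the class name and the toy training sentences are made up.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in text.split():
                if token not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
texts = ["high blood pressure", "elevated blood pressure",
         "high blood sugar", "elevated blood glucose"]
labels = ["Hypertension", "Hypertension", "Diabetes", "Diabetes"]

model = MultinomialNB().fit(texts, labels)
print(model.predict("patient high blood pressure"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.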

Evaluation & Results

  • The dataset was split 90/10 (training/testing) for the NLP (LSTM) model and 80/20 for the SVM and Naive Bayes models.
  • The accuracy of each classification technique used for predicting diseases based on diagnoses:

| Algorithm | Accuracy |
|---|---|
| SVM | 81% |
| NLP (LSTM) | 69.20% |
| Naive Bayes | 74% |
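
The split ratios and the accuracy metric can be illustrated with a small sketch. The article does not say whether the split was random or stratified, so a seeded random shuffle is assumed here; both helper functions are hypothetical, and integers stand in for the 773 dataset rows.

```python
import random


def train_test_split(rows, test_fraction, seed=0):
    """Shuffle a copy of rows and return (train_rows, test_rows)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


rows = list(range(773))  # stand-ins for the 773 dataset rows

train_80, test_20 = train_test_split(rows, 0.20)  # SVM and Naive Bayes
train_90, test_10 = train_test_split(rows, 0.10)  # LSTM

print(len(train_80), len(test_20))  # 618 155
print(len(train_90), len(test_10))  # 696 77
```

With a 90/10 split, only 77 rows are held out, fewer than two per disease on average, so a handful of errors moves the reported accuracy noticeably.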

The NLP model reached nearly 90% accuracy in training and 69.2% accuracy in the validation phase.

[Figure: Accuracy of NLP model]
[Figure: Loss curve of NLP model]

Discussion

The worst-performing model was the LSTM model, while the best-performing model was the SVM, followed by the Naive Bayes model.

NLP Model

  • Unfortunately, the papers we found did not provide guidelines on configuring the network of this model, so we had to choose the hyperparameters by trial and error.

  • The LSTM model achieved 69% accuracy, which is worse than both the SVM and Naive Bayes models. This was unexpected: the LSTM model reads the data sequentially and has a memory that retains words for use in the prediction process, so it should in principle be more reliable than both. The gap between nearly 90% training accuracy and 69.2% validation accuracy suggests the network may be overfitting the small dataset.

Integration With DOCTOR-Y

We combined the results of the Diseases Diagnoses Prediction Model with the results of the Diseases Symptoms Prediction Model to calculate the percentage likelihood of suffering from each of a group of diseases, based on previous diagnoses and the associated symptoms.
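
The article does not describe how the two models' results are combined. As one possibility, shown purely as an assumption, the per-disease scores from both models could be averaged and rescaled to percentages. The function name, the equal weighting, and the example scores below are all hypothetical.

```python
def combine_predictions(diagnosis_scores, symptom_scores, diagnosis_weight=0.5):
    """Weighted average of two per-disease score dicts, as percentages.

    Assumption: each dict maps a disease name to a score between 0 and 1.
    A disease missing from one model counts as 0 for that model.
    """
    diseases = set(diagnosis_scores) | set(symptom_scores)
    combined = {
        disease: diagnosis_weight * diagnosis_scores.get(disease, 0.0)
        + (1 - diagnosis_weight) * symptom_scores.get(disease, 0.0)
        for disease in diseases
    }
    total = sum(combined.values())
    if total == 0:
        return {disease: 0.0 for disease in diseases}
    return {d: round(100 * score / total, 1) for d, score in combined.items()}


# Invented example scores, for illustration only.
diagnosis_scores = {"Hypertension": 0.8, "Diabetes": 0.2}
symptom_scores = {"Hypertension": 0.6, "Migraine": 0.4}

percentages = combine_predictions(diagnosis_scores, symptom_scores)
print(percentages)
```

A dictionary of disease names to percentages like this is a convenient shape to pass on for charting.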

The final diseases and their percentages are sent to the system server, which sends them to the client-side to be represented on a chart as shown in the figure below.

[Figure: Chart]


Original Link: https://dev.to/ahmedsamy/disease-prediction-based-on-medical-diagnosis-547o
