Lung cancer is a leading cause of death among smokers. Its mortality rate is high because the disease is difficult to detect until the later stages.


With artificial intelligence, we can now estimate a person's risk of developing lung cancer and prompt them to consult a doctor about it. This is just one example of the many ways in which machine learning can improve a person's life.


Steps of Modeling

Step 1 – Data Engineering

Many online resources are available to data modelers. One example is Kaggle, an online dataset library that provides industry-standard datasets and supports modelers in the medical industry and other sectors. We downloaded a dataset to practice modeling.


The columns are as follows:

  1. Row # A serial number
  2. Age [1 - 99] Age of the patient
  3. Smokes [0 - 35] Cigarettes per week
  4. AreaQ [0 - 10]
  5. Alcohol [0 - 10] Number of drinks per week
  6. Result [0 or 1] 1 if the individual may be suffering from lung disease, 0 otherwise
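The ranges in the column list can double as a quick sanity check before any modeling. The sketch below is an illustration only: the function name `out_of_range` and the two sample rows are made up, and the dictionary keys assume the file's headers match the column names listed above.

```python
# Allowed ranges copied from the column list above.
RANGES = {
    "Age": (1, 99),
    "Smokes": (0, 35),
    "AreaQ": (0, 10),
    "Alcohol": (0, 10),
    "Result": (0, 1),
}


def out_of_range(row):
    """Return the names of columns whose values fall outside RANGES."""
    problems = []
    for column, (low, high) in RANGES.items():
        value = float(row[column])  # values read from a CSV arrive as strings
        if not low <= value <= high:
            problems.append(column)
    return problems


# Made-up rows for illustration; they are not taken from the real dataset.
good_row = {"Age": "45", "Smokes": "20", "AreaQ": "3", "Alcohol": "4", "Result": "1"}
bad_row = {"Age": "120", "Smokes": "20", "AreaQ": "3", "Alcohol": "4", "Result": "1"}

print(out_of_range(good_row))
print(out_of_range(bad_row))
```

Rows that fail the check would be candidates for correction or removal during data wrangling.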


After uploading this dataset to mlOS, we processed it so that it could be fed into a model-building algorithm. This included steps such as data wrangling and feature pre-processing.
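The post does not say which pre-processing steps mlOS applied, so the following is only a sketch of one common choice: dropping the serial number and min-max scaling each feature to the range 0 to 1 using the documented column ranges. The function name `scale_features` and the sample row are hypothetical.

```python
# Feature ranges from the column list; the serial number and the
# Result label are deliberately left out because they are not inputs.
FEATURE_RANGES = {
    "Age": (1, 99),
    "Smokes": (0, 35),
    "AreaQ": (0, 10),
    "Alcohol": (0, 10),
}


def scale_features(row):
    """Min-max scale each feature to [0, 1] using its documented range."""
    scaled = []
    for column, (low, high) in FEATURE_RANGES.items():
        scaled.append((float(row[column]) - low) / (high - low))
    return scaled


# Made-up row for illustration; it is not taken from the real dataset.
example = {"Row": 1, "Age": 50, "Smokes": 35, "AreaQ": 5, "Alcohol": 0, "Result": 1}
features = scale_features(example)
print(features)
```

Tree-based models such as random forests do not strictly need scaling, but other algorithms in an automated comparison often benefit from it.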


Fig 2: Reviewing the data

Once done, we moved on to machine learning.


Step 2 – Machine Learning

After many adjustments to the training and testing data percentages, we found a 30:70 split to work best for the model-building algorithm. After this, we just pressed the “Auto Pilot” button.
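For readers who want to see what such a split does, here is a minimal sketch in plain Python. It reads 30:70 as training:testing, in the order the text names them, which is an assumption; set `train_fraction=0.7` if the reverse was meant. The function name, the seed, and the placeholder rows are all hypothetical.

```python
import random


def train_test_split(rows, train_fraction=0.3, seed=42):
    """Shuffle the rows reproducibly, then cut them into train and test parts."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows: the integers stand in for patient records.
rows = list(range(100))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed keeps the split identical between runs, which makes comparisons between different percentages fair.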


Fig 3: Reviewing the models


Of the dozen algorithms tried, the Random Forest Classifier performed best on this problem: it gave accurate predictions and an almost perfect ROC curve.
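An "almost perfect" ROC curve corresponds to an area under the curve (AUC) close to 1. The sketch below computes AUC by its pairwise definition, the fraction of positive/negative pairs in which the positive case gets the higher score. The labels and scores are invented for illustration and are not results from the model described here.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Invented labels and scores, for illustration only.
labels = [0, 0, 1, 1]
perfect = [0.1, 0.2, 0.8, 0.9]    # every positive outranks every negative
imperfect = [0.1, 0.8, 0.2, 0.9]  # one positive is ranked below a negative

print(roc_auc(labels, perfect))
print(roc_auc(labels, imperfect))
```

A score of 0.5 would mean the model ranks cases no better than chance, so values near 1 on held-out test data are what make a classifier stand out in a comparison like this one.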


Fig 4: Reviewing the ROC curve


With this, we could move on to publishing the model for peer review.


Step 3 – Model Governance

At this step, a peer can review the model. This reduces the chance of a mistake slipping through before the model is deployed into production. Once the peer has reviewed the model, it can be approved and then deployed as an app.


Fig 5: Reviewing and approving the model


Step 4 – App Creation and Dashboard

The approved model was deployed in a few clicks. It is now a live application that takes a few inputs and returns a person's predicted risk of lung cancer. You can even download the app and edit how the display looks, but we left the display unchanged because it was simple enough to use.


Fig 6: A screenshot of the app


We were able to make a model that can predict the risk of lung cancer. Machine learning can do a lot for humanity by predicting the risk of illnesses and other events, potentially saving lives.

About the Authors



Adithya Raj is an aspiring data scientist who interned at Braintoy to learn more about machine learning and its practical use. He wants to apply artificial intelligence in the medical industry, especially now with the global pandemic.

“As a data scientist, working on such innovative problems is very thrilling as it gives me the opportunity to help others as well as develop my own skills. Working on this problem helped me better understand the importance of machine learning and how it is a powerful tool that can be used to save countless lives in many ways. It also helped me understand the importance of data collection and how good data can help improve models so that they can do even better.”




Billy Cao is an aspiring writer who interned with Braintoy and partnered up to write this blog. He believes that media should be honest and aspires to one day find a way to separate true news from fake news, because with so many people having access to the internet, fake news spreads easily.