Operating ratios in the trucking industry are under pressure from increasing competition. Trucking and transportation companies need to cut avoidable operating costs such as unnecessary movement of goods, deliveries to the wrong location, empty miles, poorly scheduled maintenance, and equipment breakdowns.


The most challenging of these is empty miles.

Empty miles, also known as non-revenue miles or deadhead miles, are the miles a truck travels without earning any revenue. This happens, for example, when there is a load from origin to destination but no load for the return leg and no nearby load to pick up, so the driver must drive back empty. Everyone from the shipper to the end user pays for these empty miles, and so does the environment. The biggest causes of empty miles are poor planning and inefficient scheduling and load combinations. If empty miles can be anticipated, they can be reduced or even eliminated. Supervised machine learning algorithms can predict them, so that trips can be planned better.

Business Understanding

Most trucking companies have accumulated data on their shipments over the years. This raw dataset typically contains information about the shipper, the consignee, the driver, and the dispatcher. It also includes monetary information such as outsourcing and insourcing costs, pickup and drop-off costs, fuel cost, driver pay, and total charges. Using this dataset, we can predict cost, revenue, OR (Operating Ratio), delays, deadhead miles, the type of trailer being requested (demand forecasting), and more. Industry-specific knowledge is important when selecting input variables for a model. Here, Shipper City, Consignee City, Customer Group, and Month of the Year are the input variables for predicting the target variable, empty miles.


In the simplest case, predicting whether a particular trip will have empty miles, a label can be created from the empty-miles records: 1 if the trip has no empty miles, otherwise 0.
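The labeling rule can be sketched in a few lines of Python. This is only an illustration: the record layout and the field names `trip_id` and `empty_miles` are assumptions, not the actual schema of the dataset.

```python
def loaded_return_label(empty_miles: float) -> int:
    """Return 1 when the trip has no empty miles, otherwise 0."""
    return 1 if empty_miles == 0 else 0


# Hypothetical trip records; field names and values are made up for illustration.
trips = [
    {"trip_id": "T1", "empty_miles": 0.0},
    {"trip_id": "T2", "empty_miles": 142.5},
    {"trip_id": "T3", "empty_miles": 0.0},
]

labels = [loaded_return_label(t["empty_miles"]) for t in trips]
print(labels)  # [1, 0, 1]
```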

Machine Learning

An end-to-end application was created in mlOS to predict the likelihood of a loaded return trip using supervised machine learning. The problem was treated as a classification problem by assigning the target variable the value 0 for a return trip with empty miles and 1 for a return trip with no empty miles. The steps to create a running application are as follows:

Step 1 - Data Engineering

The first step is to log in to mlOS and upload the raw data. The following window will appear.

The top menu shows the name of the project that is loaded. Double-click the data engine to upload the raw dataset. Once the data is loaded, the next step is to prepare it so that the machine learning model can ingest it. The following data engineering methods are used for this.

Data Wrangling: Data wrangling, also known as data munging, is the process of selecting raw data and transforming it into another format that is better suited to analysis. As in the figure below, data wrangling is performed to split ‘StartDate’ into Day, Month, and Year.

  1. Select the ‘StartDate’ feature.
  2. Note that the date format is mm/dd/yy.
  3. Click ‘Split Column’ in the data wrangling operations.
  4. For ‘Parameters’, enter ‘/’. This splits the column into three new columns at each ‘/’.
  5. Click ‘Add to Steps’.
  6. The step will appear under ‘New wrangling step’.

To see the effect, click ‘Apply’. After the ‘Split Column’ operation is applied, three separate columns are created for Month, Day, and Year (StartDate0, StartDate1, StartDate2). StartDate0 contains the numbers 1 to 12, where 1 stands for January and 12 for December. Rename the column ‘StartDate0’ to ‘StartMonth’ with the ‘Rename Column’ operation to make it easier to understand. The ‘Compute New Column’ operation is used to create a new column, start_month, with an if-else statement (January -> 1, February -> 2, and so on).
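For readers who want to see what the split and rename steps do to the data, here is a minimal sketch in plain Python. It is not mlOS code: the helper names `split_column` and `rename_column` are hypothetical, the sample dates are made up, and keeping the original ‘StartDate’ column after the split is an assumption.

```python
def split_column(rows, column, separator):
    """Split rows[column] on separator into column0, column1, ... keys."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for i, part in enumerate(row[column].split(separator)):
            new_row[f"{column}{i}"] = part
        result.append(new_row)
    return result


def rename_column(rows, old, new):
    """Rename one key in every row, leaving the other keys unchanged."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


# Made-up dates in mm/dd/yy format.
rows = [{"StartDate": "01/15/21"}, {"StartDate": "12/03/20"}]
rows = split_column(rows, "StartDate", "/")
rows = rename_column(rows, "StartDate0", "StartMonth")
print(rows[0])
```

Each row now carries ‘StartMonth’, ‘StartDate1’ (day), and ‘StartDate2’ (year) alongside the original value.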

Data Analysis: Data analysis examines the raw data to draw out meaningful information about past decisions and trends, so that patterns can be seen and corrective action taken. In this case, it shows how the use of trucks and trailers can best be optimized. The analysis shows that empty miles are lowest in the winter months and highest in July, as shown in the figure below. There could be several reasons for this cyclic pattern. For example, if the clients of this trucking company are in the consumer goods industry, increased demand around Black Friday, Christmas, Boxing Day, and New Year would lead to lower empty miles in winter, while a drop in demand in July and August, when people may be vacationing, would lead to higher empty miles. Similarly, trailer type, shipper and consignee locations, and client type will also have an impact on empty miles.
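A monthly comparison of this kind boils down to grouping trips by start month and averaging their empty miles. The sketch below shows the idea in plain Python; the field names and all numbers are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def mean_empty_miles_by_month(trips):
    """Average empty miles per start month, returned in month order."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for trip in trips:
        totals[trip["StartMonth"]] += trip["empty_miles"]
        counts[trip["StartMonth"]] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


# Invented sample values, chosen only to mirror the pattern described above.
trips = [
    {"StartMonth": 7, "empty_miles": 120.0},
    {"StartMonth": 7, "empty_miles": 80.0},
    {"StartMonth": 12, "empty_miles": 10.0},
    {"StartMonth": 12, "empty_miles": 30.0},
    {"StartMonth": 1, "empty_miles": 0.0},
]

monthly = mean_empty_miles_by_month(trips)
worst_month = max(monthly, key=monthly.get)
print(monthly, worst_month)
```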

Data Preprocessing: Raw data may be inconsistent and contain errors. Data preprocessing, a data mining technique, converts raw data into a format suitable for further processing. For example, the raw data may contain categorical (text) features, which must be converted into numeric features before training because most machine learning algorithms require numeric input.

  1. To perform this action in mlOS, go to ‘Data Engine’.
  2. Under ‘My Data’, there is ‘Raw Data’.
  3. Go to Raw Data -> Manage Data.
  4. Under ‘Manage Data’, all the datasets in the current project are listed, as shown in Figure 4.
  5. From there, click ‘Define Data’.

This opens the window shown in the figure below. Click ‘Training and Target Features’. Under ‘My data’, select the desired dataset, in this case ‘LiklihoodofTripBackBringLoadedVsNot’, then select the input features using the checkboxes and the target variable using the radio button.

Click ‘View Raw Data’ to view the data. Next, click ‘Feature Preprocessing’ in the top menu. This opens the window shown in the figure below, where any categorical features in the dataset are automatically converted into numeric features.
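How mlOS encodes categorical features internally is not described here. As one common approach, the sketch below uses simple label encoding, which maps each distinct category to an integer. The helper names and the city values are hypothetical.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values, mapping):
    """Replace every category with its integer code."""
    return [mapping[value] for value in values]


# Hypothetical values for a categorical feature such as Shipper City.
shipper_city = ["Toronto", "Montreal", "Toronto", "Calgary"]

mapping = fit_label_encoder(shipper_city)
encoded = encode(shipper_city, mapping)
print(mapping)  # {'Calgary': 0, 'Montreal': 1, 'Toronto': 2}
print(encoded)  # [2, 1, 2, 0]
```

Label encoding implies an ordering among categories. Tree-based models usually tolerate that, while other algorithms may work better with one-hot encoding.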

After feature preprocessing, move to ‘Review and Save’ and give the dataset a name. Check the email checkbox if you wish to be notified by email once the data is computed, then click ‘Define Data’ as shown in the figure below. The dataset is saved as ‘Processed_Data’, which is then used for cross-validation.

Cross-Validation: Click ‘Next’ to generate a cross-validation dataset. The window shown in the figure below will appear.

In this step, the dataset is split into a training set and a test set. The training set is used to train the ML model and the test set to validate it. Click ‘LikelyhoodOfTripBackBeingLoadedVsNot’ under ‘My Data’, select ‘Processed_Data’, and click ‘Generate Dataset’. By default, 80% of the data is used for training and the remaining 20% for validation. You can change the split by moving the scroll bar under ‘Train and test set split’. The next step is to create a model; click ‘Next’ to continue.
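The 80/20 split can be illustrated with a short Python sketch. It assumes a shuffled random split with a fixed seed; whether mlOS shuffles or stratifies the data is not stated, and the function name and parameters are hypothetical.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 processed trip records
train, test = train_test_split(rows)
print(len(train), len(test))  # 80 20
```

Passing a different `train_fraction`, for example 0.7, corresponds to moving the split control in the interface.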

Step 2 - Model Building

Model building uses the cross-validation dataset created in Step 1. Double-clicking the ML Engine opens the following window. Because the target variable is binary (0 or 1), this is a classification problem.

Click Classification, then click ‘Add Model Container’, select the desired dataset, select the ML algorithm, and hit ‘Auto Pilot’. The ‘AutoPilot’ feature of mlOS creates ML models with different algorithms and arranges them in descending order of accuracy. Accuracy and ROC are the two performance evaluation criteria on which model version ‘v.16-v.a58’ was selected. Random Forest gives the best performance, with 90.59% accuracy.
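To make the two evaluation criteria concrete, the sketch below computes accuracy and the area under the ROC curve by hand and ranks candidate models by accuracy. It does not reproduce AutoPilot: the model names, labels, and scores are invented, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve: the chance that a random positive is
    scored above a random negative, with ties counted as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((1.0 if p > n else 0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels (1 = no empty miles) and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
model_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.3, 0.6],
    "model_b": [0.6, 0.7, 0.4, 0.9, 0.2, 0.8],
}

results = []
for name, scores in model_scores.items():
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
    results.append((name, accuracy(y_true, preds), roc_auc(y_true, scores)))

# Rank in descending order of accuracy.
results.sort(key=lambda r: r[1], reverse=True)
for name, acc, auc in results:
    print(f"{name}: accuracy={acc:.3f}, roc_auc={auc:.3f}")
```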

Step 3 - Model Governance

Once the model is published, it appears under ‘My Models’ in Model Governance. From here the model can be accepted or rejected. Accepted models are listed under ‘Accepted Models’. Click 'Next' to move on to model deployment.

Step 4 - Model Deployment

Once the model is approved, it can be deployed. A deployed model is ready to make predictions.

Step 5 - Dashboard

Double-click the 'Dashboard' icon to interact with your model. If the app is running, it shows the dashboard as below. All the input variables that were used during model creation are listed there.

Hit predict to see the result.