A few months ago, we announced the release of Demisto’s ML-powered phishing email classifier on Amazon SageMaker. This classifier leverages a model trained on thousands of emails to help organizations detect malicious phishing emails with a high degree of accuracy.
In this blog, we’ll highlight how the phishing email classifier can be used within Demisto itself. Before we start, let’s set some context.
As we shaped the security orchestration space with Demisto over the past three years, we saw SOC teams dealing with a wide variety of security incidents. But however much attack techniques and entry vectors have evolved, one incident type has remained as frequent and as dangerous for security teams as ever: the phishing incident.
Based on the deployments we’ve observed, here’s some background on phishing incidents. The incident usually begins when an employee in the organization receives a suspicious email and, unsure whether it is genuine, forwards it to the SOC team’s phishing mailbox (something like email@example.com).
The SOC team then reviews the email, enriches indicators within it (IPs, attachments, URLs, and so on) to see if any proof of malice exists, studies the email’s content, and eventually decides whether it’s a false positive or a genuine phishing attempt.
How Machine Learning Helps
As strong believers in machine learning as an enhancer within security platforms, we started looking at ML methods to improve the phishing response process that SOC teams follow. We framed phishing detection as an ML classification problem and leveraged the large volume of tagged data that SOC teams had gathered over time (emails on which security teams had already passed a verdict) to build a state-of-the-art ML model that predicts with high certainty whether new emails are genuine phishing attempts.
Our phishing email classifier is a good illustration of supervised learning. In supervised learning, an ML algorithm is trained on a set of labeled data (inputs paired with known output categories), learns from that training set, and then assigns new data points to their respective categories.
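To make the idea concrete, here is a toy supervised classifier: a multinomial naive Bayes model in plain Python. It only illustrates learning from labeled examples and then classifying unseen text; it is not the model Demisto trains, and the four training emails are hypothetical examples written for this sketch.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace; a real pipeline would clean text far more carefully.
    return text.lower().split()


def train(examples: list[tuple[str, str]]):
    """Learn per-label document counts and word counts from (text, label) pairs."""
    label_counts: Counter = Counter()
    word_counts: defaultdict = defaultdict(Counter)
    vocab: set = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text: str) -> str:
    """Return the label with the highest naive Bayes log-probability for `text`."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior of the label
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            # Add-one (Laplace) smoothing keeps unseen words from zeroing out the score.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-labeled training examples.
model = train([
    ("verify your account password now", "Malicious"),
    ("urgent click link to reset password", "Malicious"),
    ("team lunch on friday", "Other"),
    ("quarterly report attached for review", "Other"),
])
print(classify(model, "click to verify password"))
print(classify(model, "team lunch friday"))
```

The training step only sees labeled pairs; the classification step then generalizes to text it has never seen, which is the essence of supervised learning.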
Implementing a Phishing Email Classifier in Demisto
We currently support three ways to implement a phishing email classifier:
- Customers can send us their raw phishing data, and Demisto’s ML team will build a dedicated model trained on that data that can be deployed in the customer’s environment.
- Customers can train a model in their own environment using a Demisto playbook named ‘DBot Create Phishing Classifier Job’ (available in version 4.1 and above).
- Customers can use an out-of-the-box generic ML model (which the development team can supply upon request).
Let’s walk through the second option: training an ML model in your Demisto environment using playbooks and existing phishing data.
Here are the steps we will follow:
1. Define Training and Evaluation Datasets
We need to create two queries that will be used as input parameters for the playbook: one returns the dataset you want the model to train on, and the other returns the dataset you want to use to evaluate the model.
It’s recommended that the training dataset be ‘older’ (farther back in time) than the evaluation dataset, but this might depend on your specific requirements. The queries should ideally contain details about the incident type (phishing, in our case), email tags (fields that capture whether or not the email is malicious), and time period.
For example, this query…
`type:Phishing AND phishingclassification:("Malware" "Spam" "Malicious" "Other") AND created:<"7 days ago"`
…will return phishing incidents in your database older than the last week that have been classified as ‘Malware’, ‘Spam’, ‘Malicious’, or ‘Other’. This should be a good candidate for a training dataset!
- We specified values for the ‘phishingclassification’ field in order to exclude uncategorized phishing emails. It’s recommended to feed only categorized data to the model for training.
- ‘phishingclassification’ is a custom field and might be called something else in your Demisto deployment. It can be any field that records whether an email is malicious or non-malicious.
Continuing our example, the following query should work for the evaluation dataset…
`type:Phishing AND phishingclassification:("Malware" "Spam" "Malicious" "Other") AND created:>="7 days ago"`
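Together, the two queries filter for labeled phishing incidents and split them at a time cutoff. The sketch below mimics that logic on plain Python dictionaries; the incident shape (a `type`, a `phishingclassification` label, and a `created` timestamp) is a hypothetical stand-in, not Demisto’s internal representation, and the exact-match label check is an assumption about how the query compares values.

```python
from datetime import datetime, timedelta

# Categorized labels from the example queries; incidents without one of these are skipped.
LABELED = {"Malware", "Spam", "Malicious", "Other"}


def split_incidents(incidents, now: datetime, days: int = 7):
    """Mimic the training and evaluation queries on a list of incident dicts.

    Assumed incident shape (hypothetical):
    {"type": "Phishing", "phishingclassification": "Spam", "created": datetime(...)}
    """
    cutoff = now - timedelta(days=days)
    labeled = [
        inc for inc in incidents
        if inc.get("type") == "Phishing" and inc.get("phishingclassification") in LABELED
    ]
    # Training: created:<"7 days ago" (older incidents).
    training = [inc for inc in labeled if inc["created"] < cutoff]
    # Evaluation: created:>="7 days ago" (the most recent week).
    evaluation = [inc for inc in labeled if inc["created"] >= cutoff]
    return training, evaluation
```

Because both datasets come from the same labeled pool and use complementary conditions on `created`, every labeled incident lands in exactly one of them, and the evaluation data is always newer than the training data.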
This is a good time to bring up the logical mapping of phishing labels. Ideally, the dataset should have just two categories (‘Malicious’ and ‘Other’) so that the model can be trained more effectively. To achieve this, we can map the remaining labels (in this case, ‘Malware’ and ‘Spam’) onto the two labels we want. The mapping looks like this:
`Spam:Other, Malware:Malicious, Other:Other, Malicious:Malicious`
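Applying such a mapping amounts to a dictionary lookup. Here is a minimal sketch, assuming the comma-separated `source:target` format shown above; the helper names are hypothetical, and the playbook’s own parser may behave differently (for example, in how it treats labels that have no mapping).

```python
def parse_label_mapping(mapping: str) -> dict[str, str]:
    """Parse 'Spam:Other, Malware:Malicious, ...' into {'Spam': 'Other', ...}."""
    result = {}
    for pair in mapping.split(","):
        if not pair.strip():
            continue  # tolerate a trailing comma
        source, target = pair.split(":", 1)
        # Strip whitespace so 'Malware: Malicious' and 'Malware:Malicious' parse the same way.
        result[source.strip()] = target.strip()
    return result


def map_labels(labels: list[str], mapping: dict[str, str]) -> list[str]:
    """Map raw labels onto the target categories, dropping labels with no mapping."""
    return [mapping[label] for label in labels if label in mapping]


mapping = parse_label_mapping("Spam:Other, Malware:Malicious, Other:Other, Malicious:Malicious")
print(map_labels(["Spam", "Malware", "Malicious", "Other"], mapping))
```

Mapping identical labels to themselves (`Other:Other`, `Malicious:Malicious`) keeps them in the dataset; leaving a label out of the mapping would, in this sketch, drop those incidents.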
Once this mapping is completed, we recommend that you check the sizes of the two datasets. The training dataset should have at least 300 incidents in each category (ideally more than 1000) and the evaluation dataset should have at least 50 incidents in each category (ideally more than 100).
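A quick way to check those minimums, assuming you have the mapped labels of each dataset as Python lists. The function is illustrative; its defaults are the recommended minimums above (300 per category for training, 50 for evaluation).

```python
from collections import Counter


def check_dataset_sizes(train_labels, eval_labels, min_train=300, min_eval=50,
                        categories=("Malicious", "Other")):
    """Return a list of problems; an empty list means both datasets meet the minimums."""
    problems = []
    for name, labels, minimum in (("training", train_labels, min_train),
                                  ("evaluation", eval_labels, min_eval)):
        counts = Counter(labels)
        for category in categories:
            if counts[category] < minimum:
                problems.append(f"{name}: {category} has {counts[category]} incidents (< {minimum})")
    return problems


print(check_dataset_sizes(["Malicious"] * 350 + ["Other"] * 1200,
                          ["Malicious"] * 40 + ["Other"] * 150))
```

Checking each category separately matters: a large dataset can still be dominated by one class and leave too few examples of the other.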
2. Set Up the Playbook with Input Parameters
To create a playbook that trains ML models on your phishing data, first clone the ‘DBot Create Phishing Classifier Job’ playbook. Then make sure that the playbook’s input parameters are mapped to the relevant data fields. Here’s a brief overview:
- The training dataset field should contain your training query, which we defined in the previous section.
- The evaluation dataset field should contain your evaluation query, also defined in the previous section.
- The email subject field names the incident field that contains the subject of the email. In this example, that field is ‘name’, but yours can be anything.
- The email body field names the incident field that contains the text body of the email. In this example, that field is ‘details’, but yours can be anything.
- The email tag field (‘emailTagKey’) names the incident field that tags the email as malicious or otherwise. As you might remember from the previous section, in our example this is ‘phishingclassification’.
- The label mapping field contains the mapping of the category labels in the ‘emailTagKey’ field. In our example, it contains `Spam:Other, Malware:Malicious, Other:Other, Malicious:Malicious`, the logic of which we discussed in the previous section.
- The maximum incidents field sets the maximum number of incidents the playbook will fetch. We recommend limiting this for performance and accuracy reasons, but the number should be greater than the total number of incidents returned by the training and evaluation queries. In this example, let’s enter 3000.
- The model name field sets the name under which the playbook stores the model. In this example, we will keep the default value, ‘phishing_model’.
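If you keep a record of these inputs outside the UI, a small validation helper can catch empty values and an undersized fetch limit before a run. The key names below are hypothetical labels for the fields described above, not the playbook’s real input identifiers, and the check assumes the fetch limit applies to the training and evaluation results combined.

```python
# Example values from this walkthrough; the key names are hypothetical,
# not the playbook's actual input identifiers.
EXAMPLE_INPUTS = {
    "subject_field": "name",
    "body_field": "details",
    "tag_field": "phishingclassification",
    "label_mapping": "Spam:Other, Malware:Malicious, Other:Other, Malicious:Malicious",
    "max_incidents": 3000,
    "model_name": "phishing_model",
}


def validate_inputs(inputs: dict, n_training: int, n_evaluation: int) -> list[str]:
    """Return a list of problems with the inputs; an empty list means they look usable."""
    problems = []
    for key in ("subject_field", "body_field", "tag_field", "label_mapping", "model_name"):
        if not inputs.get(key):
            problems.append(f"missing value for {key}")
    # The limit should exceed the combined query results so no incident is left unfetched.
    if inputs.get("max_incidents", 0) <= n_training + n_evaluation:
        problems.append("max_incidents is not greater than training + evaluation size")
    return problems


print(validate_inputs(EXAMPLE_INPUTS, n_training=2000, n_evaluation=400))
```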
3. Run Playbook and Verify Model Accuracy
After all that hard work, it’s time to run the playbook! You can run the playbook by creating a dummy incident or by running it in the playground.
Once the playbook has finished its execution, click on the ‘Prepare phishing data’ task in the ‘DBot Create Phishing Classifier’ sub-playbook and download the output file.
Each line in the output file will be an email (subject and body) and its associated tags. Scan through this data to verify that the results are reasonable. You can also view the summary by scrolling down the Task Details bar.
Next, click the ‘Train model’ task and verify that the ‘Done training’ message appears in the Task Details. These details also include the number of emails used for training and the associated precision and recall values. Note that the precision threshold here is 0.7: the model will be rejected if its precision is below 0.7.
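For reference, here is how precision and recall for the ‘Malicious’ class follow from true and predicted labels, together with the 0.7 precision gate described above. This is a sketch of the standard definitions; whether the playbook scores a single class or averages across classes is an assumption we can’t confirm here.

```python
def precision_recall(y_true, y_pred, positive="Malicious"):
    """Precision and recall of the `positive` class from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed phishing
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accept_model(precision: float, threshold: float = 0.7) -> bool:
    """Reject the model when precision falls below the threshold."""
    return precision >= threshold


y_true = ["Malicious", "Malicious", "Other", "Other", "Malicious"]
y_pred = ["Malicious", "Other", "Malicious", "Other", "Malicious"]
p, r = precision_recall(y_true, y_pred)
print(p, r, accept_model(p))
```

Precision answers “of the emails flagged as malicious, how many really were?”, while recall answers “of the malicious emails, how many were flagged?”.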
Finally, click the ‘Model evaluation’ task and check the results at different probability thresholds. Based on this data, you can decide whether to apply a probability threshold, keeping only the predictions the model is confident about, in order to maintain accuracy.
4. Set the Playbook as a ‘Job’ to Learn over Time
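The trade-off behind choosing a threshold can be sketched like this: predictions below the threshold are withheld (left for an analyst), so a higher threshold lowers coverage but usually raises accuracy on the predictions that remain. The input format, `(true_label, predicted_label, probability)` triples, is an assumption for illustration; the ‘Model evaluation’ task presents its own report.

```python
def sweep_thresholds(scored, thresholds):
    """For each threshold, report (threshold, coverage, accuracy of kept predictions).

    `scored` is a list of (true_label, predicted_label, probability) triples.
    Predictions with probability below the threshold are treated as "no verdict".
    """
    rows = []
    for t in thresholds:
        kept = [(y, p) for y, p, prob in scored if prob >= t]
        coverage = len(kept) / len(scored) if scored else 0.0
        accuracy = sum(y == p for y, p in kept) / len(kept) if kept else 0.0
        rows.append((t, coverage, accuracy))
    return rows


scored = [
    ("Malicious", "Malicious", 0.95),
    ("Other", "Malicious", 0.55),
    ("Other", "Other", 0.90),
    ("Malicious", "Other", 0.60),
]
for threshold, coverage, accuracy in sweep_thresholds(scored, [0.5, 0.8]):
    print(f"threshold={threshold:.2f} coverage={coverage:.0%} accuracy={accuracy:.0%}")
```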
Now that you’re happy with your model, you can perform two actions:
- Use the model within your phishing response playbooks to predict whether or not new incoming emails are malicious.
- Schedule the playbook as a ‘Job’ so that the model is retrained regularly and stays up to date.
Let’s use the ‘DBotPredictPhishingLabel’ command in the War Room to test our model! If we enter the command below…
`!DBotPredictPhishingLabel modelListName="phishing_model" hashData="no" emailSubject="This is some subject" emailBody="This is example body"`
…we get the following output.
As expected, the model does not classify this email as malicious. Note that the model name you pass in the command (‘phishing_model’) should be the same one you entered in the training playbook.
You can schedule the phishing classifier playbook as a ‘Job’ just like any other playbook. In the screenshot below, for example, the playbook will run every two weeks on Sunday at 3am.
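Demisto configures the schedule through the Job settings, so no code is required. Purely to illustrate which run times an ‘every two weeks on Sunday at 3am’ schedule produces, here is a sketch that assumes the first run falls on the next Sunday at 03:00 (server time); the function name is hypothetical.

```python
from datetime import datetime, timedelta


def next_runs(start: datetime, count: int) -> list[datetime]:
    """Return the next `count` run times of an every-two-weeks, Sunday-03:00 schedule."""
    # Move to the first Sunday 03:00 at or after `start` (weekday(): Monday=0 ... Sunday=6).
    first = start.replace(hour=3, minute=0, second=0, microsecond=0)
    first += timedelta(days=(6 - first.weekday()) % 7)
    if first < start:
        first += timedelta(days=7)  # today is Sunday but 03:00 has already passed
    return [first + timedelta(weeks=2 * i) for i in range(count)]


for run in next_runs(datetime(2024, 1, 10, 12, 0), 3):
    print(run.strftime("%a %Y-%m-%d %H:%M"))
```

A two-week interval is why this is computed from a start date: a plain weekly rule (such as a single cron line) would run every Sunday.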
We hope this post illustrates the benefits of running this phishing classifier playbook in your Demisto environment. We’ve received encouraging feedback from customers and end users so far. Here are a few examples of the impact our phishing classifier has had:
- A large MSSP customer has saved roughly 53 analyst hours per month with this playbook, time that is now spent on more proactive activities and decision-making.
- A Fortune 500 pharmaceutical company has detected 90% of new phishing campaign emails using our phishing classifier.
If you’re a Demisto user, you can consult the Demisto support documentation to learn more about how to build and train your own phishing classifier.