This blog post was written by Meir Wahnon, Director of Engineering at Demisto.
We’re delighted to announce that Demisto’s ML-powered phishing email classifier is live on the Amazon SageMaker marketplace. The classifier uses a model trained on thousands of emails to help organizations detect malicious phishing emails with high accuracy. In this post, we’ll walk through the journey this model took and explain why we’re releasing it on SageMaker.
How We Did It
As we shaped the security orchestration space with Demisto over the past three years, we saw SOC teams dealing with a wide variety of security incidents. But even as attack techniques and entry vectors have evolved, one incident type has remained as frequent and as dangerous to security teams as ever: the phishing incident.
Based on the user deployments we’ve observed, here’s some background on phishing incidents. The incident usually begins when an employee receives a suspicious email and, unsure whether it is legitimate, forwards it to the SOC team’s phishing mailbox (something like email@example.com).
The SOC team then reviews the email, enriches indicators within it (IPs, attachments, URLs, and so on) to see if any proof of malice exists, studies the email’s content, and eventually decides whether it’s a false positive or a genuine phishing attempt.
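As a rough illustration of the enrichment step, the sketch below pulls candidate IP addresses and URLs out of a reported email’s body so they can then be looked up in threat-intelligence sources. This is a minimal sketch under stated assumptions, not Demisto’s implementation: the helper name `extract_indicators` is hypothetical, and the regular expressions are deliberately simplified (IPv4 and http/https URLs only; no IPv6, defanged indicators, or attachments).

```python
import re
from typing import Dict, List

# Simplified patterns (assumption): IPv4 dotted quads and http(s) URLs only.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_indicators(body: str) -> Dict[str, List[str]]:
    """Hypothetical helper: return unique IPs and URLs found in an email body."""
    # Strip sentence punctuation that the URL pattern may have swallowed.
    urls = sorted({url.rstrip(".,;:!?)") for url in URL_RE.findall(body)})
    # Keep only dotted quads whose octets are all within 0-255.
    ips = sorted({
        ip for ip in IPV4_RE.findall(body)
        if all(0 <= int(octet) <= 255 for octet in ip.split("."))
    })
    return {"ips": ips, "urls": urls}


if __name__ == "__main__":
    sample = "Please verify your account at http://login.example.com/verify. Server: 203.0.113.7"
    print(extract_indicators(sample))
    # prints {'ips': ['203.0.113.7'], 'urls': ['http://login.example.com/verify']}
```

Each extracted indicator would then be passed to whatever reputation or sandbox services the SOC already uses; that lookup step is outside the scope of this sketch.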
Introducing Machine Learning
While Demisto provides a robust phishing response playbook that coordinates and automates actions across security products to lower the amount of manual work for security teams, we felt that further improvements could be made within SOAR platforms to combat phishing incidents.
Because we’ve always believed that machine learning can strengthen security platforms, we started looking at ML methods to improve the overall phishing response process that SOC teams follow. We framed phishing detection as an ML classification problem and drew on the large volume of tagged data that SOC teams had gathered over time (emails on which security teams had already passed a verdict) to build a state-of-the-art ML model that predicts with high confidence whether new emails are genuine phishing attempts.
We drew on the latest work in text analysis and experimented with various features until we were satisfied with the results. Ultimately, we reached 85% to 90% accuracy on our customers’ data.
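The post does not describe the model’s features or algorithm. To make the classification framing concrete, here is a minimal sketch of a multinomial naive Bayes text classifier trained on emails that analysts have already labeled. Naive Bayes is a deliberately simple technique swapped in for illustration and is not the model Demisto built; the class name, the tokenizer, and the labels `"phishing"` and `"legit"` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (simplified)."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesEmailClassifier:
    """Hypothetical multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, emails: List[Tuple[str, str]]) -> "NaiveBayesEmailClassifier":
        # emails: (body, label) pairs, where the label is an analyst's verdict.
        for body, label in emails:
            self.class_counts[label] += 1
            tokens = tokenize(body)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, body: str) -> Optional[str]:
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(body):
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    training = [
        ("verify your account password immediately click here", "phishing"),
        ("urgent your mailbox is suspended click the link to restore", "phishing"),
        ("team lunch moved to thursday at noon", "legit"),
        ("attached are the meeting notes from today", "legit"),
    ]
    clf = NaiveBayesEmailClassifier().fit(training)
    print(clf.predict("click here to verify your password"))  # prints "phishing"
```

Working in log space avoids numeric underflow when an email has many tokens. A production model would typically use richer features (headers, URLs, sender metadata) and a far larger labeled corpus.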
These models were incorporated back into our customers’ phishing playbooks in Demisto, helping their analysts make better decisions and prioritize which incidents to focus on, which improved overall SOC productivity.
Building Models Across Customers
Next, we wanted to see whether we could build a cross-customer model: one trained on datasets from multiple customers and used to classify emails from a different customer entirely. Such a model would provide value to customers even though it was not trained on their own data. Although we expected lower accuracy in this setting, we experimented with the idea nonetheless.
We eventually reached 80% accuracy with this model (although accuracy varied from customer to customer) and decided that the model had value, especially for organizations that lack the budget and staff to review phishing emails manually.
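One common way to estimate how a cross-customer model performs on an organization it has never seen is leave-one-group-out evaluation: train on every customer except one, test on the held-out customer, and repeat for each customer. The post does not say how Demisto evaluated its model, so treat this as an illustrative sketch. The function names, the `datasets` layout, and the toy keyword model and data below are all hypothetical; the toy example only shows how per-customer accuracy can differ when each customer is held out in turn.

```python
from typing import Callable, Dict, List, Tuple

Email = Tuple[str, str]  # (body, analyst verdict)
Predictor = Callable[[str], str]


def leave_one_customer_out(
    datasets: Dict[str, List[Email]],
    train_fn: Callable[[List[Email]], Predictor],
) -> Dict[str, float]:
    """Train on all customers but one, score on the held-out customer, repeat."""
    results: Dict[str, float] = {}
    for held_out, test_set in datasets.items():
        # Pool the labeled emails of every other customer for training.
        train_set = [
            email
            for customer, emails in datasets.items()
            if customer != held_out
            for email in emails
        ]
        predict = train_fn(train_set)
        correct = sum(predict(body) == label for body, label in test_set)
        results[held_out] = correct / len(test_set) if test_set else float("nan")
    return results


def train_keyword_model(train_set: List[Email]) -> Predictor:
    """Hypothetical toy model: flag emails containing words seen only in phishing."""
    phishing_words: set = set()
    legit_words: set = set()
    for body, label in train_set:
        target = phishing_words if label == "phishing" else legit_words
        target.update(body.lower().split())
    suspicious = phishing_words - legit_words

    def predict(body: str) -> str:
        return "phishing" if suspicious & set(body.lower().split()) else "legit"

    return predict


# Hypothetical per-customer labeled data.
EXAMPLE_DATASETS: Dict[str, List[Email]] = {
    "customer_a": [("verify your password now", "phishing"), ("quarterly report attached", "legit")],
    "customer_b": [("reset your password here", "phishing"), ("lunch on friday", "legit")],
    "customer_c": [("confirm password to avoid suspension", "phishing"), ("your report for friday", "legit")],
}

if __name__ == "__main__":
    print(leave_one_customer_out(EXAMPLE_DATASETS, train_keyword_model))
    # prints {'customer_a': 1.0, 'customer_b': 1.0, 'customer_c': 0.5}
```

Reporting one score per held-out customer, rather than a single pooled number, makes it visible when a model generalizes well to some organizations and poorly to others.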
We’ve shared this model with Demisto customers, but that was just the start. By hosting the phishing email classifier on Amazon SageMaker, we want to give more power to SOC teams that don’t use Demisto, as well as to phishing solution vendors looking to improve their products. With this model, they can draw on knowledge accumulated from over 100,000 phishing emails tagged by top-tier SOC teams around the world and use it to classify phishing emails more accurately.
Why We Did It
Demisto, like any other SOAR platform, is only as strong as its product integration network. But this isn’t just about Demisto. In security today, environments are so disparate that it’s tough for security teams to get visibility. Moreover, there’s a product for everything, and although each provides unique security intelligence, it’s a challenge for SOCs to make sense of all this data and correlate information across products to reach the right decision.
With these challenges in mind, we’re strong believers in inter-product connectivity and creating a “best of breed” product ecosystem that gives security teams the right information where they’re most comfortable receiving it, not where we think they should be receiving it.
We hope security teams and security products can benefit from our phishing email classifier. We’d love it if you tried it out and gave us feedback to improve the model even further.
You can access the phishing email classifier on the Amazon SageMaker marketplace.
Note: We charge a nominal fee for using the ML model on SageMaker ($1 per machine-hour).
To explore Demisto's platform in greater detail, you can download the free Community Edition.