When performing automated malware analysis, we often seek to answer one basic question: what kind of malware is it?

Automated analysis has become very popular with malware analysts since the advent of sandboxes, which let an analyst run a sample in a controlled environment while the sandbox collects information about the malicious program and its behavior.

There are many tools that allow an analyst to identify a particular sample.

One of those tools is VirusTotal, a website that many of you reading this are likely already familiar with. VirusTotal, owned by Google, holds a vast number of samples scanned by multiple security vendors, and the detection names those vendors assign to a sample help identify the malware in question.

Yara, which we’ve blogged about before, is another great tool for rapid malware identification.

With Yara (and perhaps a rule generator), an analyst can apply his or her knowledge of malware to write rules that target specific malware families or other characteristics of interest. Yara also supports modules, one of which works with the Cuckoo sandbox to further speed up malware identification.

However, there may be cases where you know two or more malware samples are related but don’t know exactly what they are.

For example, when examining artifacts such as strings, mutants (mutexes), and imphashes, you may see similarities between multiple samples without having enough information to positively identify the malware.
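One simple way to put a number on “similar artifacts” is the Jaccard index of two samples’ artifact sets: the count of shared artifacts divided by the count of distinct artifacts across both. The sketch below assumes you have already extracted each sample’s artifacts into a set of tagged strings; the sample names and artifact values are hypothetical placeholders, not real indicators.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Fraction of artifacts shared by two samples: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical artifacts (mutexes, imphashes, strings) from three samples
artifacts = {
    "sample_1": {"mutex:mtx_a", "imphash:h1", "string:s1"},
    "sample_2": {"mutex:mtx_a", "imphash:h1", "string:s2"},
    "sample_3": {"mutex:mtx_b", "imphash:h2"},
}

# Score every pair of samples; higher scores mean more overlap
for x, y in combinations(sorted(artifacts), 2):
    print(x, y, round(jaccard(artifacts[x], artifacts[y]), 2))
```

A score like this tells you that sample_1 and sample_2 overlap, but not what they are, which is exactly the gap described above.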

On top of that, if you’re using a sandbox to process samples in bulk, manually reading sandbox reports and comparing them against one another can be a daunting task.

For this reason and others, having an automated tool that categorizes malware based on its behavior can be helpful. Thankfully, there is such a tool, and it’s called “malheur”.

Malheur is “a tool for the automatic analysis of malware behavior”. Using machine learning, malheur processes the behavioral data in sandbox reports and groups malware with similar behavior into “clusters”.
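To make the idea of behavior clustering concrete, here is a minimal Python sketch. It is not malheur’s implementation, only a simplified illustration in the same spirit: it assumes each report has already been reduced to a list of API call names, embeds each list as a normalized vector of API-call bigrams, and joins any two reports whose Euclidean distance is at most a threshold (single-linkage clustering). The function names, the call sequences, and the `max_dist` value are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def embed(calls, q=2):
    """Map an API-call sequence to an L2-normalized q-gram frequency vector."""
    grams = Counter(tuple(calls[i:i + q]) for i in range(len(calls) - q + 1))
    norm = math.sqrt(sum(v * v for v in grams.values())) or 1.0
    return {g: v / norm for g, v in grams.items()}


def distance(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def cluster(reports, q=2, max_dist=0.75):
    """Single-linkage clustering: reports within max_dist end up in one group."""
    vectors = {name: embed(calls, q) for name, calls in reports.items()}
    parent = {name: name for name in reports}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of any pair of reports that are close enough
    for a, b in combinations(reports, 2):
        if distance(vectors[a], vectors[b]) <= max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for name in reports:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical API-call sequences extracted from four sandbox reports
example = {
    "sample_a": ["CreateFile", "CryptEncrypt", "WriteFile", "DeleteFile"],
    "sample_b": ["CreateFile", "CryptEncrypt", "WriteFile", "DeleteFile"],
    "sample_c": ["CreateFile", "CryptEncrypt", "WriteFile", "DeleteFile", "CreateFile"],
    "sample_d": ["RegOpenKey", "RegSetValue", "InternetOpen", "Send"],
}
print(cluster(example))
```

Lowering `max_dist` produces tighter, more numerous clusters; raising it merges more reports together.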

[Figure: image from the malheur website depicting how the software uses data to make decisions.]

Unfortunately, while malheur is a valuable tool for a malware analyst, it wasn’t directly compatible with the Cuckoo sandbox until recent work by Optiv (formerly Accuvant/Fishnet). Under the plain name “cuckoo-modified,” researchers at Optiv have built a modified version of Cuckoo that supports malheur and adds a lot of other functionality to the sandbox (some of which you can read about here).

To use malheur with Cuckoo, you first have to download and install it. Once that is done, enabling it is as easy as opening the reporting.conf file inside Cuckoo and turning it on.
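If you would rather script the change, the sketch below uses Python’s standard `configparser` module. It assumes the relevant stanza in cuckoo-modified’s reporting.conf is named `[malheur]` and uses an `enabled` key, like Cuckoo’s other reporting modules; check your own copy of the file before relying on that. The `enable_malheur` helper and the sample file contents are hypothetical.

```python
import configparser
import io


def enable_malheur(conf_text: str) -> str:
    """Return reporting.conf text with the (assumed) [malheur] section enabled."""
    # interpolation=None leaves any '%' characters in other options untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("malheur"):
        parser.add_section("malheur")
    parser.set("malheur", "enabled", "yes")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical excerpt of reporting.conf before the change
before = """
[jsondump]
enabled = yes

[malheur]
enabled = no
"""
print(enable_malheur(before))
```

Note that `configparser` discards comments when it writes a file back out, so editing reporting.conf by hand keeps the comments that ship with it intact.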

After Cuckoo finishes an analysis, a new folder called “malheur” appears under the storage directory. Inside it, the file malheur.txt contains information about the clusters.
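With many reports, it helps to group them by cluster programmatically. The sketch below assumes that each non-comment line of malheur.txt begins with a report name followed by its cluster label, with any further columns ignored; verify that layout against your own file. The `parse_clusters` helper and the sample contents are hypothetical.

```python
from collections import defaultdict


def parse_clusters(text: str) -> dict:
    """Group report names by cluster label from malheur-style output."""
    clusters = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment/header lines
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a '<report> <cluster> ...' line
        report, label = fields[0], fields[1]
        clusters[label].append(report)
    return dict(clusters)


# Hypothetical malheur.txt contents: report, cluster, then extra columns
sample = """\
# hypothetical header
101 1 101 0.00
102 1 101 0.12
103 1 101 0.15
104 2 104 0.00
"""
groups = parse_clusters(sample)
# Clusters with more than one member point at related samples
related = {label: reports for label, reports in groups.items() if len(reports) > 1}
print(related)
```

Against a real run you would pass in the contents of malheur/malheur.txt from your Cuckoo storage directory instead of the sample string.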

[Figure: malheur cluster output]

As the output above shows, task reports 1091-1093 all fall into similar clusters. On manual review of the reports, all three turned out to be the same type of ransomware.

Malheur isn’t a new tool, and it offers much more than clustering. To explore its capabilities, experiment with the tool and read the documentation.

Questions? Comments? Post them below.

@joshcannell