By: Karl Weinmeister and Jarred Capellman
Previously, we explained the basics of performance tuning for feature extraction and the benefits it provides. Now we’ll explain how to put these techniques into practice yourself.
Evaluating a model’s accuracy requires a sufficiently large dataset to determine whether a change has made a statistically significant impact. Evaluating feature extraction time, on the other hand, generally does not require as large a sample size to detect the impact of a change.
It is still critical to ensure that the test set is representative of real-world data. For the DeepArmor Windows executable model, for example, that means the test set should include an appropriate mix of 32-bit vs. 64-bit binaries, EXEs vs. DLLs, different file sizes, and more.
Once this sample set is created, run feature extraction on a batch of data and time it to determine the mean time per sample. Any changes can then be compared to this baseline to see how they affect performance.
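Here’s a minimal sketch of what such a baseline measurement could look like; extract_features and sample_paths are placeholders for whatever entry point and test set your own pipeline uses.

```python
import time

def time_extraction(sample_paths, extract_features):
    """Run feature extraction over a batch and return the mean time per sample."""
    start = time.perf_counter()
    for path in sample_paths:
        extract_features(path)  # hypothetical extraction entry point
    return (time.perf_counter() - start) / len(sample_paths)

# baseline = time_extraction(sample_paths, extract_features)
# print(f"Mean extraction time: {baseline * 1000:.2f} ms/sample")
```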
Even better, temporary instrumentation can be added to profile the performance of individual features: start a timer before extracting a feature and record the elapsed time afterward. For larger projects with more features, it’s best to group related features and time them together.
For instance, the table below illustrates timing of individual features. One insight shown is that Feature1 takes significantly more time to derive than Feature0.
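As a rough sketch of how such per-feature timings could be collected (the feature computations below are placeholders, not DeepArmor’s actual features):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(timings, name):
    """Accumulate the elapsed time for a named feature (or feature group)."""
    start = time.perf_counter()
    yield
    timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)

def extract_with_timings(sample: bytes):
    features, timings = {}, {}
    with timed(timings, "Feature0"):
        features["Feature0"] = len(sample)            # placeholder computation
    with timed(timings, "Feature1"):
        features["Feature1"] = sample.count(b"\x00")  # placeholder computation
    return features, timings
```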
When calculating the mean time, it’s worth remembering that any samples with a NaN value should be excluded. More sophisticated profiling can also be done to determine whether there are performance issues only in certain situations, that is, whether the feature extraction time is correlated with another feature (e.g., file size in DeepArmor’s case). It can be helpful to calculate other summary statistics such as min, max, and standard deviation to quickly determine whether a deeper analysis is needed.
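One lightweight way to do this is with pandas, assuming the timings have been gathered into a DataFrame with one row per sample; the column names here are hypothetical.

```python
import pandas as pd

def timing_report(df: pd.DataFrame, timing_cols: list):
    """Summarize per-feature timings and check for correlation with file size."""
    summary = df[timing_cols].agg(["mean", "min", "max", "std"])  # NaNs are skipped by default
    correlations = df[timing_cols].corrwith(df["file_size"])      # e.g., extraction time vs. file size
    return summary, correlations
```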
What can I do with this performance information?
Now that you have a more detailed understanding of your feature extraction performance, you can look at any trouble spots and determine what to do about them.
In practice, we have found that a close inspection of the code behind slow features often yields benefits. Here’s one example: DeepArmor looks for the presence of certain strings in files to derive features. Many of these strings don’t require regular expression capability, so adding a separate path that uses a streamlined substring search dramatically sped up extraction of those features.
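Here’s a hedged sketch of that fast-path idea; the string lists below are illustrative only, not DeepArmor’s actual patterns.

```python
import re

LITERAL_STRINGS = [b"UPX0", b"VirtualAlloc"]        # plain substring checks (illustrative)
REGEX_PATTERNS = [re.compile(rb"https?://[^\s]+")]  # patterns that genuinely need regex

def string_features(data: bytes):
    """Use a fast substring search where possible, falling back to regex otherwise."""
    features = {s.decode(): s in data for s in LITERAL_STRINGS}
    features.update({p.pattern.decode(): bool(p.search(data)) for p in REGEX_PATTERNS})
    return features
```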
In some cases, it may not be possible to improve performance significantly. Then, the data scientist can look at removing the slow features to see if they only minimally affect accuracy. Or, another compromise might be to limit the scope of the features; the newly derived features may not be ideal, but they could well be close enough. For example, if the feature extraction tool is looking for the presence of strings from a top-100 list, that list could be reduced to the top 50, roughly doubling the speed of that step while most likely having a much smaller impact on accuracy.
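A sketch of that compromise, assuming the string list is already ranked by importance to the model:

```python
def reduce_string_list(ranked_strings, keep=50):
    """Keep only the top-ranked strings; assumes the list is ordered by importance."""
    return list(ranked_strings)[:keep]

# reduced = reduce_string_list(top_100_strings)  # hypothetical ranked list
# Re-run the baseline timing and the accuracy evaluation to confirm the trade-off.
```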
After you have tuned your models for production, it’s also worth thinking about maintaining high performance as the solution evolves over time. One answer is to use a continuous integration system to run tests on new code. An integration test can check that the feature extraction process finishes within a certain amount of time. This test should include some buffer for minor variance between runs, and the build environment should be kept consistent.
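Here’s a minimal sketch of such a test in a pytest-style harness; the 50 ms budget and the helper names are assumptions to adapt to your own project.

```python
import time

MAX_MS_PER_SAMPLE = 50  # assumed budget; leave headroom for run-to-run variance

def test_feature_extraction_time():
    corpus = load_reference_corpus()   # hypothetical: a fixed, representative sample set
    start = time.perf_counter()
    for sample in corpus:
        extract_features(sample)       # hypothetical extraction entry point
    ms_per_sample = (time.perf_counter() - start) * 1000 / len(corpus)
    assert ms_per_sample < MAX_MS_PER_SAMPLE
```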
Conclusion
Feature extraction can be the long pole in both research and production workflows, so it’s a key area to focus your efforts on. Tuning it can enable your data science team to increase their throughput and accuracy while reducing your costs. You’ve also seen how to get started timing and analyzing the process. Best of luck applying this knowledge to your own projects!