Traffic classification is an important topic in networking research. Knowing which kind of traffic is flowing through a network is useful for many purposes, ranging from simple statistics collection to QoS provisioning and dynamic resource allocation.
Traffic classification still has several open issues; we currently focus on scalability, in particular for techniques based on Deep Packet Inspection (DPI), which look for specific signatures in application-layer data that unequivocally identify the application protocol being transported.
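As a rough illustration of signature-based DPI, the following Python sketch matches the first bytes of an application-layer payload against a small table of regular expressions. The rule set (HTTP request line, SSH version banner, BitTorrent handshake) and the function name `classify_payload` are hypothetical and far simpler than production signature sets; the sketch also ignores signatures that span several packets.

```python
import re
from typing import Optional

# Hypothetical, illustrative rule set: protocol name -> regex applied
# to the beginning of the application-layer payload.
SIGNATURES: dict[str, re.Pattern[bytes]] = {
    "http": re.compile(rb"(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r\n"),
    "ssh": re.compile(rb"SSH-\d\.\d+-"),
    "bittorrent": re.compile(rb"\x13BitTorrent protocol"),
}


def classify_payload(payload: bytes) -> Optional[str]:
    """Return the first protocol whose signature matches the payload, or None."""
    for proto, pattern in SIGNATURES.items():
        if pattern.match(payload):  # match() anchors at the start of the payload
            return proto
    return None


print(classify_payload(b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"))  # http
print(classify_payload(b"SSH-2.0-OpenSSH_9.6\r\n"))                                # ssh
print(classify_payload(b"\x16\x03\x01\x02\x00"))  # None: no rule matches this TLS record
```

An encrypted payload such as the TLS record in the last call matches no rule, which is the limitation discussed in the Service Based traffic classification section.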
Results and ongoing projects
Line rate software DPI traffic classification
One of the open issues in traffic classification is that DPI classification is hard to perform at line rate, especially in software.
Some hardware solutions enable line-rate traffic classification, but they are usually not very flexible. In other cases, classification is performed only on a subset of the traffic (e.g., through sampling).
NetGroup is looking for techniques that enable software DPI traffic classifiers to run at line rate. We are currently evaluating some optimizations that achieve up to a 50-fold speedup with respect to a classic software DPI classifier.
Obtaining the Ground Truth
A major issue when working with traffic classifiers is collecting a data set with reliable labels (the ground truth) against which the accuracy of the classifier under test can be evaluated.
Research groups usually use as ground truth a set of traffic traces pre-classified by other classifiers (typically DPI). This does not provide a solid basis for evaluating a new classifier; even worse, some research groups use this information as ground truth for the training phase of their statistical classifiers, exposing them to the risk of incorrect training, since the data used as the training set come with no guarantee of correctness.
In collaboration with the Telecommunication Network Group (NTW) of the University of Brescia, we have developed a framework that enables the collection of data sets with solid ground truth information. The framework consists of a daemon that reports which application is associated with each network socket of a host, and a set of post-processing tools for creating the ground truth data set.
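The post-processing step can be pictured as a join between the daemon's socket log and the flows extracted from a packet trace. The Python sketch below assumes hypothetical record formats (`SocketRecord`, `Flow`) keyed on the transport protocol and the two endpoints; it is not the GT framework's actual code. It ignores timestamps, so a 5-tuple reused by different applications during a capture would be mislabeled.

```python
from dataclasses import dataclass

Endpoint = tuple[str, int]  # (IP address, port)


@dataclass(frozen=True)
class SocketRecord:
    """Hypothetical daemon log entry: one socket and the application owning it."""
    proto: str        # "tcp" or "udp"
    local: Endpoint   # endpoint on the monitored host
    remote: Endpoint  # peer endpoint
    app: str          # application name, e.g. "firefox"


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record extracted from the captured trace."""
    proto: str
    src: Endpoint
    dst: Endpoint


def flow_key(proto: str, a: Endpoint, b: Endpoint) -> tuple:
    """Direction-independent key: the two endpoints are put in sorted order."""
    lo, hi = sorted((a, b))
    return (proto, lo, hi)


def label_flows(flows: list[Flow], sockets: list[SocketRecord]) -> dict[Flow, str]:
    """Attach the daemon-reported application to each flow, or 'unknown'.

    Simplification: timestamps are ignored, so if two socket records share a
    key, the last one in the list wins.
    """
    index = {flow_key(s.proto, s.local, s.remote): s.app for s in sockets}
    return {f: index.get(flow_key(f.proto, f.src, f.dst), "unknown") for f in flows}


sockets = [SocketRecord("tcp", ("10.0.0.5", 51000), ("93.184.216.34", 443), "firefox")]
flows = [
    Flow("tcp", ("93.184.216.34", 443), ("10.0.0.5", 51000)),  # reverse direction still matches
    Flow("udp", ("10.0.0.5", 53000), ("8.8.8.8", 53)),         # no socket record: "unknown"
]
for flow, app in label_flows(flows, sockets).items():
    print(flow.proto, flow.src, flow.dst, app)
```

Making the key direction-independent lets a flow be matched regardless of which endpoint the trace lists as the source.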
Benchmarking traffic classifiers
The study of new classification techniques must be supported by an evaluation of how they perform with respect to existing solutions. For this reason, in collaboration with the Telecommunication Network Group (NTW) of the University of Brescia, we developed a methodology for comparing the computational cost of different traffic classifiers. As a first result, we found that current software DPI classifiers have computational requirements very close to those of statistical classifiers based on the SVM algorithm. This result dispels the claim that DPI classification is much more expensive than the newer statistical approaches.
We are currently extending this work to compare other statistical traffic classifiers with classic and optimized DPI classifiers.
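A minimal sketch of a cost comparison, assuming each classifier can be called as a Python function on a payload: it runs every classifier over the same payload set and reports the best-of-N average time per packet. The names `ns_per_packet`, `compare_costs`, and the two toy classifiers are hypothetical. Wall-clock time of Python code is only a crude proxy, and this is not the methodology of the study described above, which the text does not detail.

```python
import time
from typing import Callable, Optional

# A classifier is assumed to be any function mapping a payload to a label (or None).
Classifier = Callable[[bytes], Optional[str]]


def ns_per_packet(classifier: Classifier, payloads: list[bytes], repeats: int = 5) -> float:
    """Best-of-`repeats` average wall-clock time per payload, in nanoseconds."""
    if not payloads:
        raise ValueError("need at least one payload")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for p in payloads:
            classifier(p)
        elapsed = time.perf_counter_ns() - start
        best = min(best, elapsed / len(payloads))  # keep the least-disturbed run
    return best


def compare_costs(classifiers: dict[str, Classifier], payloads: list[bytes]) -> dict[str, float]:
    """Measure every classifier on the same payloads so the figures are comparable."""
    return {name: ns_per_packet(c, payloads) for name, c in classifiers.items()}


# Hypothetical toy classifiers, used only to exercise the harness.
def prefix_match(p: bytes) -> Optional[str]:
    return "http" if p.startswith(b"GET ") else None


def full_scan(p: bytes) -> Optional[str]:
    return "http" if b"HTTP/1." in p else None


payloads = [b"GET / HTTP/1.1\r\n\r\n", b"\x16\x03\x01" + bytes(1200)] * 500
for name, cost in compare_costs({"prefix": prefix_match, "scan": full_scan}, payloads).items():
    print(f"{name}: {cost:.0f} ns/packet")
```

Feeding all classifiers the identical payload list matters more than the absolute numbers, since per-packet cost depends heavily on the traffic mix.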
Computational cost is not the only parameter that describes the quality of a traffic classifier. In collaboration with the Telecommunication Network Group (NTW) of the University of Brescia and the Networking group of the Telecommunication Networks Group (TNG) of DELEN, Politecnico di Torino, we have developed a set of tools that compare the outcomes of different classifiers, reporting the classification precision relative to another classifier or, when available, to ground truth information.
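One way to express such a comparison is sketched below in Python, assuming both classifiers' outcomes are available as a mapping from a flow identifier to a label. The function `compare_outcomes` and the flow identifiers are hypothetical. Flows left unlabeled by either side (such as `f4` in the example) are excluded; per-class precision is the fraction of flows given a label by the tested classifier on which the reference agrees.

```python
from collections import Counter


def compare_outcomes(test: dict[str, str], reference: dict[str, str]) -> tuple[float, dict[str, float]]:
    """Compare a classifier's labels with a reference (another classifier or ground truth).

    Only flows labeled by both are considered. Returns the overall agreement and,
    for each label assigned by `test`, the fraction of those flows on which the
    reference agrees (per-class precision relative to the reference).
    """
    common = test.keys() & reference.keys()
    if not common:
        return 0.0, {}
    predicted = Counter(test[f] for f in common)
    correct = Counter(test[f] for f in common if test[f] == reference[f])
    agreement = sum(correct.values()) / len(common)
    precision = {label: correct[label] / n for label, n in sorted(predicted.items())}
    return agreement, precision


test = {"f1": "http", "f2": "http", "f3": "ssh", "f4": "dns"}
reference = {"f1": "http", "f2": "bittorrent", "f3": "ssh"}
agreement, precision = compare_outcomes(test, reference)
print(round(agreement, 3), precision)  # 0.667 {'http': 0.5, 'ssh': 1.0}
```

When the reference is ground truth, these figures are actual precision values; when it is another classifier, they only measure agreement with it.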
Service Based traffic classification
DPI techniques fail when traffic is encrypted and have several problems with tunnels. Moreover, optimizations of software DPI classifiers do not completely solve the problems of computational cost and memory requirements, especially in very large networks.
NetGroup has developed a new approach to traffic classification, based on the discovery of network services, called "Service Based Classification" (SBC). The idea is simple: once a classifier has identified that a host is running a certain network application, SBC reuses this information to classify subsequent sessions related to the same service.
SBC can be coupled with any existing classification method, since it only requires some "entity" that provides the classification outcome for a newly discovered service (even hand-crafted and/or pre-compiled lists).
This method enables up to a 20-fold reduction in the memory needed to store flow state information, and we are currently running tests to evaluate the reduction in classification cost achievable with this technique.
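The core SBC idea can be sketched in Python as a service table consulted before falling back to another classifier. This is a minimal sketch, not NetGroup's implementation: it assumes a service is identified by the hypothetical key (server IP, server port, transport protocol), that the caller already knows which endpoint is the server, and it omits table expiry, NAT, and port reuse. The names `ServiceBasedClassifier` and `toy_dpi` are illustrative.

```python
from typing import Callable, Optional

Service = tuple[str, int, str]  # hypothetical key: (server IP, server port, transport)


class ServiceBasedClassifier:
    """Sketch of SBC: remember services discovered by another classifier."""

    def __init__(self, fallback: Callable[[bytes], Optional[str]],
                 known: Optional[dict[Service, str]] = None):
        self.fallback = fallback  # any existing classifier, e.g. a DPI engine
        self.services: dict[Service, str] = dict(known or {})  # optionally pre-loaded list
        self.fallback_calls = 0

    def classify(self, server: Service, first_payload: bytes) -> Optional[str]:
        """Classify a new session directed to `server`."""
        label = self.services.get(server)
        if label is not None:
            return label  # known service: no payload inspection needed
        self.fallback_calls += 1
        label = self.fallback(first_payload)
        if label is not None:
            self.services[server] = label  # reuse for later sessions to this service
        return label


def toy_dpi(payload: bytes) -> Optional[str]:
    """Hypothetical stand-in for a real DPI classifier."""
    return "ssh" if payload.startswith(b"SSH-") else None


sbc = ServiceBasedClassifier(toy_dpi, known={("192.0.2.10", 53, "udp"): "dns"})
srv = ("198.51.100.7", 22, "tcp")
print(sbc.classify(srv, b"SSH-2.0-OpenSSH_9.6\r\n"))         # ssh (via the fallback)
print(sbc.classify(srv, b"\x00\x00\x01\x0c"))                # ssh (from the service table)
print(sbc.classify(("192.0.2.10", 53, "udp"), b""))          # dns (from the pre-loaded list)
print(sbc.fallback_calls)                                    # 1
```

The second call shows why the approach helps with encrypted traffic: the payload carries no recognizable signature, but the session is still labeled because its service was already discovered.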
The tools and software mentioned in the previous sections are available:
- the l7-netpdlclassifier, a DPI classifier developed by NetGroup, is available here
- the ground truth (GT) framework is available here
- the set of tools used for benchmarking traffic classifiers is available here
Papers and drafts
All our published work on traffic classification can be found in the publications section.