Tools for L2-L7 Traffic Classification @ NetGroup

by Fulvio Risso last modified Oct 08, 2009 10:10 AM

This page contains a payload-based traffic classifier (and a set of companion tools) that is able to parse and classify traffic from layer 2 (data-link layer) to layer 7 (application layer) based on protocol descriptions contained in the NetPDLDatabase. Some useful tools have also been developed to process the data produced by the classifier or to prepare traffic traces before starting the analysis.

A zip file containing all tools and libraries compiled for Win32 architecture can be downloaded here.

A tarball file containing all tools and libraries compiled for Linux 32 architecture can be downloaded here.

Each tool includes detailed command-line help, which is shown when the tool is invoked without parameters.

Available tools

capturecleaner: This program deletes packets that belong to TCP sessions established before the beginning of the capture. In addition, a session (TCP or UDP) is considered "terminated" after 5 minutes of inactivity, so packets that belong to sessions that did not produce traffic for 5 minutes are also discarded by capturecleaner.
l7-netpdlclassifier: This program is the actual classifier based on NetPDL. It can read traffic dumps in the standard tcpdump format. It produces a packet-based output (i.e. the classification result for each packet) and prints a set of statistics for each protocol; more is available through the appropriate configuration options. For a more detailed explanation of the l7-netpdlclassifier output you can refer here. If per-session statistics are required, the output of l7-netpdlclassifier can be used as input for session-rebuilder.
session-rebuilder: This program takes as input the CSV decode dump file produced by l7-netpdlclassifier with the -verbose option (which contains the classification results per packet) and produces an output_file containing the classification results grouped by session. For more details about the statistics produced and the output format, you can refer here. The output of this tool can be fed to the diffinder tool, which compares the classification results produced by tool A with those produced by tool B.
diffinder: This tool compares classification results from an arbitrary number of classifiers. It produces a file containing information about the sessions that were classified differently by the classifiers, and it prints statistics grouped by protocol on standard output. Warning: the comparison is performed ONLY on sessions that are present in all the files being compared. If, for some reason, a classifier did not report a session, that session will not be counted in the statistics. For more details about the input and output formats, you can refer here.
pcubesessions: This tool reads a file containing Cisco SCA BB RDRs (Raw Data Records) and extracts information about classified sessions, printing it to the specified output file. This tool can also compare these results with the ones obtained by another classifier, because pcubesessions is not yet able to produce an output compatible with diffinder. You can choose whether to use the flow_start RDRs (option -flowstart) or the transaction usage RDRs (option -tranusage). Warning: the number of RDRs produced is usually very low compared to the number of sessions present in the input trace. Therefore, results are not very reliable. For more details about the statistics produced and the output format, you can refer here.
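
The two cleaning rules described for capturecleaner can be illustrated with a minimal Python sketch. This is not the tool's actual code: the Packet type, the session_key and clean functions, and the handling of later packets are all assumptions made for illustration. In particular, the sketch assumes that once a session has been idle for more than 5 minutes, every later packet with the same endpoints is dropped unless a new TCP SYN opens a fresh session.

```python
from dataclasses import dataclass

IDLE_TIMEOUT = 300.0  # seconds: the 5-minute inactivity limit


@dataclass
class Packet:  # hypothetical packet record, not a capturecleaner type
    ts: float          # capture timestamp in seconds
    proto: str         # "tcp" or "udp"
    src: tuple         # (ip, port)
    dst: tuple         # (ip, port)
    syn: bool = False  # True if this is a TCP packet with the SYN flag set


def session_key(pkt):
    # Direction-independent key: both directions map to the same session.
    a, b = sorted([pkt.src, pkt.dst])
    return (pkt.proto, a, b)


def clean(packets):
    """Return the packets that survive the two cleaning rules."""
    last_seen = {}      # live session key -> timestamp of its latest packet
    terminated = set()  # sessions that stayed idle longer than IDLE_TIMEOUT
    kept = []
    for pkt in packets:
        key = session_key(pkt)
        if key in last_seen and pkt.ts - last_seen[key] > IDLE_TIMEOUT:
            # The session produced no traffic for 5 minutes: terminate it.
            del last_seen[key]
            terminated.add(key)
        if key not in last_seen:
            if pkt.proto == "tcp":
                if not pkt.syn:
                    continue  # handshake never seen in this capture: drop
                terminated.discard(key)  # a SYN opens a fresh session
            elif key in terminated:
                continue  # UDP packet of a terminated session: drop
        last_seen[key] = pkt.ts
        kept.append(pkt)
    return kept
```

A real implementation would read packets from a capture file; here they are plain in-memory records so that the rules stay visible.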

More details about some of these tools


A possible output for l7-netpdlclassifier

>l7-netpdlclassifier.exe -netpdlfile ntpdl.xml -capturefile test_clean.acp -decodedumpfilecsv test_dump.csv -enable7sessionstat

Loading NetPDL protocol database...
NetPDL Protocol database loaded.

-Starting the analysis of the capture dump 'test_clean.acp'.

Measured 0.0 seconds.

Decoding ended

Packets in the capture file: 469
Bytes in the capture file: 92227
Packets processed by the filter: 469
Bytes processed by the filter: 92227
Packets generated by an unknown application: 92 (19.62%)
HTTP sessions: 0
FTP sessions: 0
SMTP sessions: 0
SSH sessions: 0

Application-level statistics

Protocol      #Sessions  Share   #Bytes  Share   Avg. Proc. Time (us)
spop3         1          01.06%  156     00.17%  36.00
hsrp          1          01.06%  62      00.07%  28.00
icmp          4          04.26%  735     00.80%  354.00
edonkudp      3          03.19%  364     00.39%  142.67
http          7          07.45%  5577    06.05%  3475.14
syslog        1          01.06%  321     00.35%  29.00
defaultproto  35         37.23%  8849    09.59%  650.09
skype         42         44.68%  5583    06.05%  259.19

Example of an output file (in plain text)

This format is intended to be as compact as possible. Each row contains these fields:
Timestamp PacketNO HighestLevelProto (TransportProto)
1198144372.645134 1 tcp (tcp)
1198144372.645257 2 tcp (tcp)
1198144372.645426 3 tcp (tcp)
1198144372.645554 4 edonkudp (udp)
1198144372.645879 5 skype (udp)
1198144372.645957 6 edonkudp (udp)
1198144372.646209 7 defaultproto (udp)
1198144372.647446 8 defaultproto (udp)
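
As an illustration of how this compact format can be consumed, the following Python sketch parses rows of the form shown above and counts packets per highest-level protocol. The names Row, parse_row and count_by_protocol are hypothetical helpers, not part of the tools, and the regular expression assumes fields are separated by whitespace exactly as in the sample rows.

```python
import re
from collections import Counter
from typing import NamedTuple

# Timestamp PacketNO HighestLevelProto (TransportProto)
ROW_RE = re.compile(r"^(\d+\.\d+)\s+(\d+)\s+(\S+)\s+\((\w+)\)$")


class Row(NamedTuple):
    timestamp: float
    packet_no: int
    highest_proto: str
    transport_proto: str


def parse_row(line):
    """Parse one compact row; return None for headers or malformed lines."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    ts, number, highest, transport = m.groups()
    return Row(float(ts), int(number), highest, transport)


def count_by_protocol(lines):
    """Count packets per highest-level protocol, skipping unparsable lines."""
    rows = (parse_row(line) for line in lines)
    return Counter(row.highest_proto for row in rows if row is not None)
```

Applied to the eight sample rows above, count_by_protocol reports 3 packets for tcp, 2 for edonkudp, 2 for defaultproto and 1 for skype.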

Example of an output file (in plain text) with the 'verbose' option

This format is intended to show as much information as possible. It must be used if you want to process the classification output with the captdiff tool. Each row contains these fields:
Timestamp FileNO-PacketNO HighestLevelProto SrcIP:SrcPort<=>DstIP:DstPort PacketLengthByte (TransportProto) PayloadLengthByte TCPFlags
1198144372.645134 1-1 tcp<=> 62 (tcp) 0 S
1198144372.645257 1-2 tcp<=> 60 (tcp) 0 SA
1198144372.645426 1-3 tcp<=> 62 (tcp) 0 S
1198144372.645554 1-4 edonkudp<=> 161 (udp) 119
1198144372.645879 1-5 skype<=> 64 (udp) 22
1198144372.645957 1-6 edonkudp<=> 161 (udp) 119

Example of an output file (in CSV format) with the 'verbose' option

Output files report the highest-layer protocol detected for a session (e.g. TCP sessions that received a RST during the 3-way handshake will have tcp as L7). defaultproto means "unknown protocol".



Standard output

Protocol	Sessions	Packets	Packets_per_session_mean	Packets_per_session_variance	Payloaded_Packets	Frame_bytes	Payload_bytes	Pattern_match	Pattern_match_mean	Pattern_match_variance	Not_well_known
bittorrent (udp)	8	13	1.625	0.984375	13	2240	1694	8	1	0	8
defaultproto (udp)	12	62	5.16667	24.1389	62	23971	21349	62	5.16667	24.1389	12
dns (udp)	84	171	2.03571	1.53444	171	23847	16665	84	1	0	0
edonkudp (udp)	5	8	1.6	0.24	8	922	574	6	1.2	0.16	5
http (tcp)	29	240	8.27586	9.85494	80	77995	63639	29	1	0	0
netbios (udp)	1	1	1	0	1	92	50	1	1	0	0
pop3 (tcp)	1	8	8	0	4	613	153	1	1	0	0
rtp (udp)	11	158	14.3636	256.413	158	58417	51781	21	1.90909	1.53719	11
skype (udp)	46	93	2.02174	4.06474	93	17588	13675	59	1.28261	0.941872	46
smtp (tcp)	2	15	7.5	12.25	6	1339	355	3	1.5	0.25	0
ssl (tcp)	6	111	18.5	198.583	64	68042	61664	6	1	0	0
syslog (udp)	2	2	1	0	2	608	524	2	1	0	0
tcp (tcp)	64	95	1.48438	0.468506	0	5966	0	0	0	0	64

Information printed by session-rebuilder

  • Protocol: the name of the protocol
  • Sessions: the number of sessions that transported the specified protocol
  • Packets: the number of packets that transported the specified protocol
  • Packets_per_session_mean: the mean number of packets per session
  • Packets_per_session_variance: the variance of the number of packets per session
  • Payloaded_Packets: the number of packets that transported application data for that protocol (this number excludes, e.g., TCP 3-way handshake or plain TCP ACK packets, which are instead counted by the Packets field)
  • Frame_bytes: the number of bytes produced by the specified protocol. This number includes all bytes "put on the wire", so it also includes, for example, the Ethernet, IP and TCP header bytes
  • Payload_bytes: the number of data bytes produced by the specified protocol. This number excludes all headers below the application layer
  • Pattern_match: the cumulative number of payloaded packets that were analyzed with the pattern-matching algorithm before the classifier was able to state which protocol is transported by a session of the specified protocol
  • Pattern_match_mean: simply Pattern_match/Sessions
  • Pattern_match_variance: the variance of the per-session Pattern_match count
  • Not_well_known: the number of sessions that did not use the well-known port for the specified protocol (e.g. http on port 5000). If a protocol does not have a well-known port, all sessions produced by this protocol will be counted by this field
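
The per-session mean and variance fields can be illustrated with a short Python sketch. The sample rows in the standard output are consistent with a population variance (division by the number of sessions, not by sessions minus one); that reading is inferred from the numbers and is not stated by the tool. The function name session_stats and the per-session packet counts in the usage note are hypothetical.

```python
def session_stats(packets_per_session):
    """Compute the Sessions/Packets/mean/variance fields for one protocol.

    packets_per_session: list with the packet count of each session.
    The variance is the population variance (divisor = number of sessions),
    which is an assumption inferred from the sample output.
    """
    sessions = len(packets_per_session)
    packets = sum(packets_per_session)
    mean = packets / sessions
    variance = sum((n - mean) ** 2 for n in packets_per_session) / sessions
    return {
        "Sessions": sessions,
        "Packets": packets,
        "Packets_per_session_mean": mean,
        "Packets_per_session_variance": variance,
    }
```

For instance, eight sessions with the hypothetical packet counts 1, 1, 1, 1, 1, 2, 2, 4 yield 13 packets, a mean of 1.625 and a variance of 0.984375, which matches the bittorrent (udp) row. The actual per-session counts behind that row are not known.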


File containing the protocol mapping between different classifiers

The file must contain as many columns as the number of classifiers to be compared. Columns must be separated by TAB characters. The first line must give the name of each classifier.

If a classifier has more specific sub-types for a protocol (e.g. in the example below the tstat classifier is able to distinguish between http1.0 and http1.1), you must specify a line for each sub-type and map it to a protocol already specified for the other classifiers.

If a classifier does not support a given protocol, you must provide a tag like "UNSUPPORTED" or "-".

Example of a protomap file

Netbee	Tstat	UniBs
edonkey	edonk	emule
skype	skype	UNSUPPORTED
http	http1.0	http
http	http1.1	http
defaultproto	unknown	unkown
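
A protomap file of this shape could be loaded as in the following Python sketch. It is only an illustration of the file layout, not diffinder's parser: the function name load_protomap is hypothetical, and using the first supported label of each row as the common protocol name is an assumption made here.

```python
import csv
import io

UNSUPPORTED_TAGS = {"UNSUPPORTED", "-"}


def load_protomap(text):
    """Parse TAB-separated protomap text.

    Returns (classifiers, mapping) where mapping[classifier][label] is the
    common protocol name of the row that contains that label.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    classifiers = rows[0]  # the first line names the classifiers
    mapping = {name: {} for name in classifiers}
    for row in rows[1:]:
        if len(row) != len(classifiers):
            raise ValueError(f"expected {len(classifiers)} columns: {row}")
        supported = [label for label in row if label not in UNSUPPORTED_TAGS]
        if not supported:
            continue  # no classifier supports this protocol
        common_name = supported[0]  # assumption: first supported label wins
        for name, label in zip(classifiers, row):
            if label not in UNSUPPORTED_TAGS:
                mapping[name][label] = common_name
    return classifiers, mapping
```

With the example above, both http1.0 and http1.1 from Tstat map to http, and skype has no entry for UniBs.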

File containing the classification result of a single classifier

The dump_file must be in comma-separated values (CSV) format and must contain at least the following information for each session:

  • Timestamp of the first packet of the session
  • The session id 5-tuple (IP A, Port A, IP B, Port B, Transport protocol)
  • The L7 protocol associated with the session (the string that indicates the L7 protocol must match one of the strings specified in the protomap_file; otherwise the session will be discarded from the comparison)
  • The size in bytes transported by the session; it can cover L2 to L7 or just L7, but all the classifiers should provide the same information. If different sizes are declared for the same session, the biggest size will be taken. If your classifier cannot provide this information, please add a '0' value

The first line of this file must list the fields contained in the session dump. For diffinder to work, you must specify at least the tokens shown in the example. The order does not matter, but it must be consistent with the data that follows. Tokens are case insensitive. If your classifier produces more fields, you must also specify a token for these fields; they will be ignored by diffinder.

Example of a dump file


Example of standard output

Session not reported by all classification dumps:       0
Common comparable sessions:     999     (bytes) 999000
Sessions with equal classification:     996     99.6997%        (bytes) 996000  99.6997%
Sessions with different classification: 3       0.3003% (bytes) 3000    0.3003%

Equally classified
Protocols       L4      Common_sessions Percentage      Common_bytes    Percentage
bittorrent|bittorrent   (tcp)   2       0.2002% 2000    0.2002%
bittorrent|bittorrent   (udp)   27      2.7027% 27000   2.7027%
cldap|cldap     (udp)   1       0.1001% 1000    0.1001%
dce_rpc_tcp|dce_rpc_tcp (tcp)   5       0.500501%       5000    0.500501%
defaultproto|defaultproto       (tcp)   16      1.6016% 16000   1.6016%
defaultproto|defaultproto       (udp)   23      2.3023% 23000   2.3023%
dns|dns (udp)   234     23.4234%        234000  23.4234%
edonkudp|edonkudp       (udp)   14      1.4014% 14000   1.4014%
edonk|edonk     (tcp)   1       0.1001% 1000    0.1001%
gnutella|gnutella       (udp)   4       0.4004% 4000    0.4004%
hsrp|hsrp       (udp)   1       0.1001% 1000    0.1001%
http|http       (tcp)   229     22.9229%        229000  22.9229%
kerberos|kerberos       (udp)   12      1.2012% 12000   1.2012%
ldap|ldap       (tcp)   1       0.1001% 1000    0.1001%
netbiosdgm|netbiosdgm   (udp)   1       0.1001% 1000    0.1001%
netbios|netbios (udp)   4       0.4004% 4000    0.4004%
ntp|ntp (udp)   3       0.3003% 3000    0.3003%
pop3|pop3       (tcp)   3       0.3003% 3000    0.3003%
rtp|rtp (udp)   20      2.002%  20000   2.002%
samba|samba     (tcp)   2       0.2002% 2000    0.2002%
skype|skype     (udp)   175     17.5175%        175000  17.5175%
smtp|smtp       (tcp)   67      6.70671%        67000   6.70671%
snmp|snmp       (udp)   3       0.3003% 3000    0.3003%
ssl|ssl (tcp)   23      2.3023% 23000   2.3023%
syslog|syslog   (udp)   2       0.2002% 2000    0.2002%
tcp|tcp (tcp)   123     12.3123%        123000  12.3123%

Differently classified
Protocols       L4      Common_sessions Percentage
dns|skype       (udp)   1       0.1001% 1000    0.1001%
skype|edonk     (udp)   1       0.1001% 1000    0.1001%
skype|http      (udp)   1       0.1001% 1000    0.1001%
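
The summary counts at the top of this output can be reproduced with a sketch like the following. It is written in Python and is not diffinder's implementation: the function name compare and the in-memory input layout are assumptions. It applies the rules stated on this page: only sessions reported by every classifier are compared, sessions are identified by their 5-tuple, and the largest declared size is used when sizes differ.

```python
def compare(dumps):
    """Compare per-session classification results of several classifiers.

    dumps: {classifier_name: {session_5tuple: (protocol, size_bytes)}},
    with protocol names already translated to a common vocabulary.
    """
    key_sets = [set(sessions) for sessions in dumps.values()]
    common = set.intersection(*key_sets)
    result = {
        "not_reported_by_all": len(set.union(*key_sets)) - len(common),
        "common": len(common),
        "equal": 0, "equal_bytes": 0,
        "different": 0, "different_bytes": 0,
    }
    for key in common:
        labels = {sessions[key][0] for sessions in dumps.values()}
        # When classifiers disagree on the size, keep the biggest one.
        size = max(sessions[key][1] for sessions in dumps.values())
        outcome = "equal" if len(labels) == 1 else "different"
        result[outcome] += 1
        result[outcome + "_bytes"] += size
    return result
```

The percentages in the output are then simple ratios over the common sessions; for example, 996 equally classified sessions out of 999 give 99.6997%.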


Example of a SCA-BB RDR file

Cisco SCA BB RDR file format (csv):

Example of an output file produced by pcubesessions

0,,1415,,80,(tcp),[http] ,<edonk>
2,,1316,,3467,(tcp),[P2P] ,<defaultproto>
137,,6881,,17823,(udp),[bittorrent] ,<defaultproto>

139,,1422,,6885,(tcp),[bittorrent] ,<defaultproto>

Example of a file containing statistics generated by pcubesessions

Total number of sessions found in both input files,2780 (576043225 bytes)
Number of sessions found only by PCUBE,0
Number of sessions found only by NetBee,1212 (154148164 bytes)

Number of sessions found by both tools,1568,, (421895061 bytes),56.4029%

Number of sessions with equal results,1267,, (224676924 bytes),80.8036%
Number of sessions classified by both (same classif. result),, (84561531 bytes),1121,71.4923%
Number of sessions unclassified by both (both defaultproto),, (140115393 bytes),146,9.31122%

Total Number of differences,301,, (49262403 bytes),19.1964%
Number of real differences,,36, (11721609 bytes),2.35969%
Number of sessions not classified by PCUBE and classified by NetBEE,,68 (37540794 bytes),4.40051%
Number of sessions classified by PCUBE and not classified by NetBEE,,197,12.6276%