Tools for L2-L7 Traffic Classification @ NetGroup
This page contains a payload-based traffic classifier (and a set of companion tools) that parses and classifies traffic from layer 2 (data-link layer) to layer 7 (application layer) based on protocol descriptions contained in the NetPDLDatabase. Some companion tools have also been developed to process the data produced by the classifier or to prepare traffic traces before starting the analysis.
A zip file containing all tools and libraries compiled for Win32 architecture can be downloaded here.
A tarball file containing all tools and libraries compiled for Linux 32 architecture can be downloaded here.
Available tools include a detailed command-line help, which is shown by invoking the tool without parameters.
Available tools
- capturecleaner
- This program deletes packets that belong to TCP sessions established before the beginning of the capture.
In addition, a session (TCP or UDP) is considered "terminated" after 5 minutes of inactivity, so packets that belong to sessions that did not produce traffic for 5 minutes are also discarded by capturecleaner.
- l7-netpdlclassifier
- This program is the actual classifier based on NetPDL. It can read traffic dumps in the standard tcpdump format. It produces a packet-based output (i.e. the classification result for each packet), prints a set of statistics for each protocol, and offers more through an appropriate set of configuration options. For a more detailed explanation of the l7-netpdlclassifier output you can refer here. If per-session statistics are required, the output of l7-netpdlclassifier can be used as input for session-rebuilder.
- session-rebuilder
- This program takes as input the CSV decode dump file produced by l7-netpdlclassifier with the -verbose option (which contains classification results per packet) and produces an output_file containing the classification results grouped by session. For more details about the statistics produced and the output format, you can refer here. The output of this tool can in turn be fed to the diffinder tool, which compares the classification results produced by tool A with the ones produced by tool B.
- diffinder
- This tool compares classification results from an arbitrary number of classifiers. It writes a file containing information about the sessions that the classifiers classified differently, and prints statistics grouped by protocol on standard output. Warning: the comparison is performed ONLY on sessions that are present in all the files being compared. If, for some reason, a classifier did not report a session, that session will not be counted in the statistics. For more details about the input and output formats, you can refer here.
- pcubesessions
- This tool reads a file containing Cisco SCA BB RDRs (Raw Data Records) and extracts information about classified sessions, printing it to the specified output file. This tool can also compare these results with the ones obtained by another classifier, because pcubesessions is not yet able to produce an output compatible with diffinder. Either the flow_start RDRs (option -flowstart) or the transaction usage RDRs (option -tranusage) can be used. Warning: the number of RDRs produced is usually very low compared to the number of sessions present in the input trace. Therefore, results are not very reliable. For more details about the statistics produced and the output format, you can refer here.
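The two rules described for capturecleaner above (drop TCP sessions whose start was not observed, and treat a session as terminated after 5 minutes of inactivity) can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function name `clean`, the `(timestamp, session_key, tcp_flags)` tuple layout and the choice of a bare SYN as the "session start" marker are all assumptions made for this example, and only the TCP case is covered.

```python
TIMEOUT = 300.0  # 5 minutes of inactivity, as stated in the tool description


def clean(packets):
    """Yield the TCP packets that survive the two rules.

    `packets` is an iterable of (timestamp, session_key, tcp_flags) tuples.
    This layout is hypothetical; session_key is assumed to be the same for
    both directions of a session.
    """
    last_seen = {}  # session_key -> timestamp of the latest kept packet
    for ts, key, flags in packets:
        is_syn = "S" in flags and "A" not in flags  # assumed session start
        active = key in last_seen and ts - last_seen[key] <= TIMEOUT
        if is_syn or active:
            last_seen[key] = ts
            yield ts, key, flags
        else:
            # Session started before the capture, or expired: drop the packet.
            last_seen.pop(key, None)


if __name__ == "__main__":
    trace = [(0.0, "a", "A"), (1.0, "b", "S"), (2.0, "b", "SA"), (400.0, "b", "A")]
    for packet in clean(trace):
        print(packet)
```

In this sketch a packet arriving after the timeout is discarded together with everything that follows on the same session, until a new SYN is seen.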
More details about some of these tools
l7-netpdlclassifier
A possible output for l7-netpdlclassifier
>l7-netpdlclassifier.exe -netpdlfile ntpdl.xml -capturefile test_clean.acp -decodedumpfilecsv test_dump.csv -enable7sessionstat
Loading NetPDL protocol database...
NetPDL Protocol database loaded.
-Starting the analysis of the capture dump 'test_clean.acp'.
Measured 0.0 seconds.
==================================================================
Decoding ended
Packets in the capture file: 469
Bytes in the capture file: 92227
Packets processed by the filter: 469
Bytes processed by the filter: 92227
Packets generated by an unknown application: 92 (19.62%)
HTTP sessions: 0
FTP sessions: 0
SMTP sessions: 0
SSH sessions: 0
==================================================================
Application-level statistics
Protocol #Sessions Share #Bytes Share Avg. Proc. Time (us)
============================================================================
spop3 1 01.06% 156 00.17% 36.00
hsrp 1 01.06% 62 00.07% 28.00
icmp 4 04.26% 735 00.80% 354.00
edonkudp 3 03.19% 364 00.39% 142.67
http 7 07.45% 5577 06.05% 3475.14
syslog 1 01.06% 321 00.35% 29.00
defaultproto 35 37.23% 8849 09.59% 650.09
skype 42 44.68% 5583 06.05% 259.19
Example of an output file (in plain text)
This format is intended to be as compact as possible. Each row contains these fields:
Timestamp PacketNO HighestLevelProto (TransportProto)
Example:
1198144372.645134 1 tcp (tcp)
1198144372.645257 2 tcp (tcp)
1198144372.645426 3 tcp (tcp)
1198144372.645554 4 edonkudp (udp)
1198144372.645879 5 skype (udp)
1198144372.645957 6 edonkudp (udp)
1198144372.646209 7 defaultproto (udp)
1198144372.647446 8 defaultproto (udp)
Example of an output file (in plain text) with the 'verbose' option
This format is intended to show as much information as possible. It must be used if you want to process the classification output with the captdiff tool. Each row contains these fields:
Timestamp FileNO-PacketNO HighestLevelProto SrcIP:SrcPort<=>DstIP:DstPort PacketLengthByte (TransportProto) PayloadLengthByte TCPFlags
Example:
1198144372.645134 1-1 tcp 85.18.21.14:1472<=>13.19.3.45:80 62 (tcp) 0 S
1198144372.645257 1-2 tcp 13.19.3.45:80<=>85.18.21.14:1472 60 (tcp) 0 SA
1198144372.645426 1-3 tcp 13.19.25.8:3809<=>208.72.31.147:80 62 (tcp) 0 S
1198144372.645554 1-4 edonkudp 13.19.236.74:1756<=>90.22.87.156:9003 161 (udp) 119
1198144372.645879 1-5 skype 78.61.90.249:55257<=>13.19.58.78:21712 64 (udp) 22
1198144372.645957 1-6 edonkudp 13.19.236.74:1756<=>87.19.141.123:6414 161 (udp) 119
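As an illustration of the verbose row layout, the following Python sketch splits one row into its fields. The names `VerboseRecord` and `parse_verbose` are hypothetical, and the sketch assumes that fields are separated by whitespace and that the TCPFlags field is simply absent on rows without flags, which is what the example rows suggest.

```python
from typing import NamedTuple, Tuple


class VerboseRecord(NamedTuple):
    timestamp: float
    file_no: int
    packet_no: int
    l7: str
    src: Tuple[str, int]
    dst: Tuple[str, int]
    packet_len: int
    l4: str
    payload_len: int
    tcp_flags: str  # empty string when the row carries no flags


def _endpoint(text):
    """Split 'ip:port' into (ip, port)."""
    ip, port = text.rsplit(":", 1)
    return ip, int(port)


def parse_verbose(line):
    """Parse one row of the plain-text verbose output (assumed layout)."""
    parts = line.split()
    if len(parts) not in (7, 8):
        raise ValueError("unexpected number of fields: %r" % line)
    ts, ids, l7, endpoints, packet_len, l4, payload_len = parts[:7]
    flags = parts[7] if len(parts) == 8 else ""
    file_no, packet_no = ids.split("-")
    src, dst = endpoints.split("<=>")
    return VerboseRecord(float(ts), int(file_no), int(packet_no), l7,
                         _endpoint(src), _endpoint(dst), int(packet_len),
                         l4.strip("()"), int(payload_len), flags)


if __name__ == "__main__":
    print(parse_verbose(
        "1198144372.645134 1-1 tcp 85.18.21.14:1472<=>13.19.3.45:80 62 (tcp) 0 S"))
```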
Example of an output file (in CSV format) with the 'verbose' option
Output files report the highest-layer protocol detected for a session (e.g. TCP sessions that received a RST during the 3-way handshake will have tcp as L7). defaultproto means "unknown protocol".
Timestamp,IpA,PortA,IpB,PortB,L4,L7,Frame_bytes,Payload_bytes
1198144372,79.165.172.0,49873,130.192.3.84,25,tcp,tcp,248,0
1198144372,79.165.172.0,49874,130.192.3.84,25,tcp,tcp,64,0
1198144372,192.168.1.1,80,130.192.43.125,1082,tcp,tcp,30022,0
1198144372,130.192.1.1,80,160.97.4.201,47817,tcp,http,990,900
1198144372,130.192.19.2,59175,200.168.142.16,31390,udp,skype,2345,2000
1198144372,195.210.87.2,80,130.192.85.15,2285,tcp,tcp,9998,9900
1198144372,193.206.116.2,25,130.192.86.149,43451,tcp,smtp,8877,8700
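A CSV file with this header can be summarised per protocol with a few lines of Python. The sketch below is only an illustration and is unrelated to how session-rebuilder works internally; the function name `totals_per_protocol` is hypothetical, and the column names are taken from the header row of the example.

```python
import csv
import io
from collections import defaultdict

# Three rows copied from the example above.
SAMPLE = """Timestamp,IpA,PortA,IpB,PortB,L4,L7,Frame_bytes,Payload_bytes
1198144372,79.165.172.0,49873,130.192.3.84,25,tcp,tcp,248,0
1198144372,130.192.1.1,80,160.97.4.201,47817,tcp,http,990,900
1198144372,195.210.87.2,80,130.192.85.15,2285,tcp,tcp,9998,9900
"""


def totals_per_protocol(csv_file):
    """Return {(L7, L4): (rows, frame_bytes, payload_bytes)} for a CSV dump."""
    totals = defaultdict(lambda: [0, 0, 0])
    for row in csv.DictReader(csv_file):
        entry = totals[(row["L7"], row["L4"])]
        entry[0] += 1
        entry[1] += int(row["Frame_bytes"])
        entry[2] += int(row["Payload_bytes"])
    return {key: tuple(value) for key, value in totals.items()}


if __name__ == "__main__":
    for key, value in sorted(totals_per_protocol(io.StringIO(SAMPLE)).items()):
        print(key, value)
```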
Session-rebuilder
Standard output
Protocol Sessions Packets Packets_per_session_mean Packets_per_session_variance Payloaded_Packets Frame_bytes Payload_bytes Pattern_match Pattern_match_mean Pattern_match_variance Not_well_known
bittorrent (udp) 8 13 1.625 0.984375 13 2240 1694 8 1 0 8
defaultproto (udp) 12 62 5.16667 24.1389 62 23971 21349 62 5.16667 24.1389 12
dns (udp) 84 171 2.03571 1.53444 171 23847 16665 84 1 0 0
edonkudp (udp) 5 8 1.6 0.24 8 922 574 6 1.2 0.16 5
http (tcp) 29 240 8.27586 9.85494 80 77995 63639 29 1 0 0
netbios (udp) 1 1 1 0 1 92 50 1 1 0 0
pop3 (tcp) 1 8 8 0 4 613 153 1 1 0 0
rtp (udp) 11 158 14.3636 256.413 158 58417 51781 21 1.90909 1.53719 11
skype (udp) 46 93 2.02174 4.06474 93 17588 13675 59 1.28261 0.941872 46
smtp (tcp) 2 15 7.5 12.25 6 1339 355 3 1.5 0.25 0
ssl (tcp) 6 111 18.5 198.583 64 68042 61664 6 1 0 0
syslog (udp) 2 2 1 0 2 608 524 2 1 0 0
tcp (tcp) 64 95 1.48438 0.468506 0 5966 0 0 0 0 64
Information printed by session-rebuilder
- Protocol: the name of the protocol
- Sessions: the number of sessions that transported the specified protocol
- Packets: the number of packets that transported the specified protocol
- Packets_per_session_mean: the mean number of packets per session
- Packets_per_session_variance: the variance of the number of packets per session
- Payloaded_Packets: the number of packets that transported application data for that protocol (this number excludes, e.g., TCP 3-way handshake or plain TCP ACK packets, which are instead counted by the Packets field)
- Frame_bytes: the number of bytes produced by the specified protocol. This number includes all bytes "put on the wire", so it also includes, say, Ethernet, IP and TCP header bytes
- Payload_bytes: the number of data bytes produced by the specified protocol. This number excludes all headers of the layers below the application layer
- Pattern_match: the cumulative number of payloaded packets that were analyzed with the pattern-matching algorithm before the classifier was able to state which protocol is transported by a session of the specified protocol
- Pattern_match_mean: simply Pattern_match/Sessions
- Pattern_match_variance: the variance of the per-session Pattern_match count
- Not_well_known: the number of sessions that did not use the well-known port for the specified protocol (e.g. http on port 5000). If a protocol does not have a well-known port, all sessions produced by this protocol will be counted by this field
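The mean and variance columns can be reproduced from per-session counts as in the sketch below. The page does not say which variance estimator is used; the sample rows appear consistent with the population form (dividing by the number of sessions), so that is what this sketch assumes. The function name `session_stats` and the per-session counts in the usage example are hypothetical; the counts were chosen only because they add up to the edonkudp row (5 sessions, 8 packets).

```python
def session_stats(counts_per_session):
    """Return (mean, population variance) of a list of per-session counts."""
    n = len(counts_per_session)
    if n == 0:
        raise ValueError("at least one session is required")
    mean = sum(counts_per_session) / n
    # Population variance: divide by n, not n - 1 (an assumption, see above).
    variance = sum((x - mean) ** 2 for x in counts_per_session) / n
    return mean, variance


if __name__ == "__main__":
    # Hypothetical split of 8 packets over 5 sessions.
    mean, variance = session_stats([1, 1, 2, 2, 2])
    print(round(mean, 5), round(variance, 5))
```

With these counts the result rounds to 1.6 and 0.24, the values shown in the edonkudp row; a sample variance (dividing by n - 1) would give 0.3 instead.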
Diffinder
File containing the protocol mapping between different classifiers
The file must contain as many columns as the number of classifiers that we want to compare. Columns must be separated by TAB characters. The first line must indicate the name of each classifier.
If a classifier has more specific sub-types for a protocol (e.g. in the example below the tstat classifier is able to distinguish between http1.0 and http1.1), you must specify a line for each sub-type and you must map it to a protocol already specified for the other classifiers.
If a classifier does not support a given protocol, you must provide a tag like "UNSUPPORTED" or "-".
Example of a protomap file
Netbee	Tstat	UniBs
edonkey	edonk	emule
skype	skype	UNSUPPORTED
http	http1.0	http
http	http1.1	http
defaultproto	unknown	unknown
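To make the mapping rules concrete, here is a small Python sketch that loads a protomap file into one lookup table per classifier. It is an illustration, not diffinder's parser: the function name `load_protomap` is hypothetical, and using the label in the first column as the shared name for the whole row is an assumption made for this example.

```python
import csv
import io

# A few rows of the example above, with TAB-separated columns.
SAMPLE = ("Netbee\tTstat\tUniBs\n"
          "edonkey\tedonk\temule\n"
          "skype\tskype\tUNSUPPORTED\n"
          "http\thttp1.0\thttp\n"
          "http\thttp1.1\thttp\n")

UNSUPPORTED_TAGS = {"UNSUPPORTED", "-"}


def load_protomap(text):
    """Return {classifier: {label: shared_name}} from TAB-separated text."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    names, body = rows[0], rows[1:]
    mapping = {name: {} for name in names}
    for row in body:
        shared = row[0]  # assumption: first column names the shared protocol
        for name, label in zip(names, row):
            if label not in UNSUPPORTED_TAGS:
                mapping[name][label] = shared
    return mapping


if __name__ == "__main__":
    for classifier, labels in load_protomap(SAMPLE).items():
        print(classifier, labels)
```

Both Tstat sub-types end up on the same shared name, so a session labelled http1.1 by Tstat and http by another classifier counts as equally classified.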
File containing the classification result of a single classifier
The dump_file must be in comma-separated values (CSV) format and must contain at least the following information for each session:
- Timestamp of the first packet of the session
- The session id 5-tuple (Ip A, Port A, Ip B, Port B, Transport protocol)
- The L7 protocol associated with the session (the string that indicates the L7 protocol must match one of the strings specified in the protomap_file; otherwise the session will be discarded from the comparison)
- The size in bytes transported by the session; it can be measured from L2 to L7 or just at L7, but all the classifiers should provide the same information. In case of different sizes declared for the same session, the biggest size will be taken. If your classifier cannot provide this information, please add a '0' value
The first line of this file must list the fields contained in the session dump. For diffinder to work, you must specify at least the tokens shown in the example. The order does not matter, but it must be consistent with the data rows that follow. Tokens are case insensitive. If your classifier produces more fields, you must specify a token for these fields as well; they will be ignored by diffinder.
Example of a dump file
timestamp,IPa,PORTa,IPb,PORTb,L4,L7,bytes
1198144372,195.54.2.1,53,130.192.3.21,2103,udp,dns,50
1198144372,130.192.58.78,21712,78.61.90.249,55257,udp,skype,1500
1198144372,130.192.3.21,2103,158.152.1.193,53,udp,dns,60
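The header rules above (case-insensitive tokens, free column order, extra fields ignored) can be sketched in Python as follows. The function name `read_dump` is hypothetical, and building a direction-independent session key by sorting the two endpoints is an assumption of this sketch, not documented diffinder behaviour.

```python
import csv
import io

REQUIRED = ("timestamp", "ipa", "porta", "ipb", "portb", "l4", "l7", "bytes")


def read_dump(csv_file):
    """Return {session_key: (l7_label, size_in_bytes)} from a session dump."""
    reader = csv.reader(csv_file)
    header = [token.strip().lower() for token in next(reader)]
    missing = [token for token in REQUIRED if token not in header]
    if missing:
        raise ValueError("missing tokens: %s" % ", ".join(missing))
    sessions = {}
    for row in reader:
        if not row:
            continue
        record = dict(zip(header, row))  # extra columns are simply not used
        a = (record["ipa"], int(record["porta"]))
        b = (record["ipb"], int(record["portb"]))
        # Assumption: the same session may be reported with A and B swapped.
        key = (min(a, b), max(a, b), record["l4"].lower())
        sessions[key] = (record["l7"], int(record["bytes"]))
    return sessions


if __name__ == "__main__":
    sample = ("L7,TimeStamp,IPa,PORTa,IPb,PORTb,L4,bytes,comment\n"
              "dns,1198144372,195.54.2.1,53,130.192.3.21,2103,udp,50,ignored\n")
    print(read_dump(io.StringIO(sample)))
```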
Example of standard output
Session not reported by all classification dumps: 0
Common comparable sessions: 999 (bytes) 999000
Sessions with equal classification: 996 99.6997% (bytes) 996000 99.6997%
Sessions with different classification: 3 0.3003% (bytes) 3000 0.3003%
Equally classified
Protocols L4 Common_sessions Percentage Common_bytes Percentage
netbee|netbee
bittorrent|bittorrent (tcp) 2 0.2002% 2000 0.2002%
bittorrent|bittorrent (udp) 27 2.7027% 27000 2.7027%
cldap|cldap (udp) 1 0.1001% 1000 0.1001%
dce_rpc_tcp|dce_rpc_tcp (tcp) 5 0.500501% 5000 0.500501%
defaultproto|defaultproto (tcp) 16 1.6016% 16000 1.6016%
defaultproto|defaultproto (udp) 23 2.3023% 23000 2.3023%
dns|dns (udp) 234 23.4234% 234000 23.4234%
edonkudp|edonkudp (udp) 14 1.4014% 14000 1.4014%
edonk|edonk (tcp) 1 0.1001% 1000 0.1001%
gnutella|gnutella (udp) 4 0.4004% 4000 0.4004%
hsrp|hsrp (udp) 1 0.1001% 1000 0.1001%
http|http (tcp) 229 22.9229% 229000 22.9229%
kerberos|kerberos (udp) 12 1.2012% 12000 1.2012%
ldap|ldap (tcp) 1 0.1001% 1000 0.1001%
netbiosdgm|netbiosdgm (udp) 1 0.1001% 1000 0.1001%
netbios|netbios (udp) 4 0.4004% 4000 0.4004%
ntp|ntp (udp) 3 0.3003% 3000 0.3003%
pop3|pop3 (tcp) 3 0.3003% 3000 0.3003%
rtp|rtp (udp) 20 2.002% 20000 2.002%
samba|samba (tcp) 2 0.2002% 2000 0.2002%
skype|skype (udp) 175 17.5175% 175000 17.5175%
smtp|smtp (tcp) 67 6.70671% 67000 6.70671%
snmp|snmp (udp) 3 0.3003% 3000 0.3003%
ssl|ssl (tcp) 23 2.3023% 23000 2.3023%
syslog|syslog (udp) 2 0.2002% 2000 0.2002%
tcp|tcp (tcp) 123 12.3123% 123000 12.3123%
Differently classified
Protocols L4 Common_sessions Percentage
netbee|netbee
dns|skype (udp) 1 0.1001% 1000 0.1001%
skype|edonk (udp) 1 0.1001% 1000 0.1001%
skype|http (udp) 1 0.1001% 1000 0.1001%
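The summary lines at the top of this output follow from the comparison rules stated earlier: only sessions reported by every classifier are compared, and the biggest declared size is used for each session. A minimal Python sketch of that logic is shown below; the function name `compare` and the input layout (one dictionary per classifier, mapping a session key to a label and a size) are hypothetical, and labels are assumed to have already been translated through the protomap file.

```python
def compare(dumps):
    """Compare classification dumps on the sessions they all contain.

    `dumps` is a list of {session_key: (label, size_in_bytes)} dictionaries,
    one per classifier (a layout assumed for this sketch).
    """
    common = set(dumps[0]).intersection(*dumps[1:])
    everything = set().union(*dumps)
    stats = {"not_in_all": len(everything - common), "common": len(common),
             "equal": 0, "equal_bytes": 0, "different": 0, "different_bytes": 0}
    for key in common:
        labels = {dump[key][0] for dump in dumps}
        size = max(dump[key][1] for dump in dumps)  # biggest declared size wins
        kind = "equal" if len(labels) == 1 else "different"
        stats[kind] += 1
        stats[kind + "_bytes"] += size
    return stats


if __name__ == "__main__":
    a = {"s1": ("http", 100), "s2": ("dns", 50), "s3": ("skype", 10)}
    b = {"s1": ("http", 120), "s2": ("skype", 50)}
    print(compare([a, b]))
```

The percentages in the sample output are ratios over the common comparable sessions, for instance 996 / 999 for the sessions with equal classification.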
Pcubesessions
Example of a SCA-BB RDR file
Cisco SCA BB RDR file format (csv):
#4042321942,N/A,4999,16,6,1353851901,80,1353867207,1415,1,1194260179,1194260179,0,1449,0,
#4042321942,N/A,4999,16,6,3515168235,80,1046087763,1199,1,1194260179,1194260179,0,1451,0,
#4042321942,N/A,4999,9,6,1053707246,3467,2887385347,1316,1,1194260179,1194260179,0,1454,0,
Example of an output file produced by pcubesessions
0,80.178.95.199,1415,80.178.35.253,80,(tcp),[http] ,<edonk>
2,172.26.1.3,1316,62.206.75.238,3467,(tcp),[P2P] ,<defaultproto>
137,80.178.95.210,6881,82.44.218.96,17823,(udp),[bittorrent] ,<defaultproto>
139,10.1.11.4,1422,10.1.11.16,6885,(tcp),[bittorrent] ,<defaultproto>
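Comparing the two samples above suggests that the RDR file stores IPv4 addresses as unsigned 32-bit decimal integers: 1353867207 corresponds to 80.178.95.199 and 1353851901 to 80.178.35.253, the two endpoints of the first output row. The Python sketch below decodes them with the standard `ipaddress` module. The function name `decode_rdr_endpoints` is hypothetical, and the field positions (address and port pairs at indexes 5 to 8) are inferred from the sample rows only, not taken from the Cisco RDR specification.

```python
import ipaddress


def decode_rdr_endpoints(line):
    """Return ((ip, port), (ip, port)) for one RDR row of the sample format.

    Field positions are an assumption derived from the example rows.
    """
    fields = line.strip().lstrip("#").rstrip(",").split(",")
    first = (str(ipaddress.IPv4Address(int(fields[5]))), int(fields[6]))
    second = (str(ipaddress.IPv4Address(int(fields[7]))), int(fields[8]))
    return first, second


if __name__ == "__main__":
    row = ("#4042321942,N/A,4999,16,6,1353851901,80,1353867207,1415,"
           "1,1194260179,1194260179,0,1449,0,")
    print(decode_rdr_endpoints(row))
```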
Example of a file containing statistics generated by pcubesessions
Total number of sessions found in both input files,2780 (576043225 bytes)
Number of sessions found only by PCUBE,0
Number of sessions found only by NetBee,1212 (154148164 bytes)
Number of sessions found by both tools,1568,, (421895061 bytes),56.4029%
Number of sessions with equal results,1267,, (224676924 bytes),80.8036%
Number of sessions classified by both (same classif. result),, (84561531 bytes),1121,71.4923%
Number of sessions unclassified by both (both defaultproto),, (140115393 bytes),146,9.31122%
Total Number of differences,301,, (49262403 bytes),19.1964%
Number of real differences,,36, (11721609 bytes),2.35969%
Number of sessions not classified by PCUBE and classified by NetBEE,,68 (37540794 bytes),4.40051%
Number of sessions classified by PCUBE and not classified by NetBEE,,197,12.6276%
