The Content Name Collection

Dataset: cisco-url-names-2014-12

Intro

The cisco-url-names-2014-12 dataset consists of 14 files. They were created by applying Cisco's ndn-trace-script (extract_urls.sh) to the HTTP URL traces obtained from IRCache. The full dataset comprises 13'549'129 URL content names.

If you are interested in the ICN content names corresponding to this dataset, visit cisco-icn-names-2014-12.

Examples

Features of cisco-url-names-2014-12:

Cisco ndn-trace-script

You can download all the Cisco scripts directly here: ndn-trace-script-master.zip
SHA-512 checksum: ndn-trace-script-master_sha512-sum.txt
These files were originally obtained from https://github.com/wonsocisco/ndn-trace-script on 2014-11-19.
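
A downloaded archive can be checked against its published SHA-512 checksum with a few lines of Python. The following is a minimal sketch, not part of the Cisco scripts. It assumes the checksum file uses the common sha512sum layout (hex digest, whitespace, file name on each line); the function names are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line):
    """Split one 'digest  filename' line (assumed sha512sum layout)."""
    expected, name = line.split(maxsplit=1)
    # sha512sum marks binary mode with a leading '*' before the file name.
    return expected.lower(), name.strip().lstrip("*")


def verify(checksum_file, base_dir="."):
    """Map each listed file name to True if its digest matches, else False."""
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = parse_checksum_line(line)
        results[name] = sha512_of(Path(base_dir) / name) == expected
    return results
```

Calling verify("ndn-trace-script-master_sha512-sum.txt") in the download directory would return a dictionary with one entry per listed file, provided the checksum file really follows the assumed layout.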

Please take note of the readme:

NDN Trace Script Copyright (c) 2012-2013 by Cisco Systems, Inc. All rights reserved. Written by Ashok Narayanan and Won So

This software suite provides Perl scripts that can be used to translate HTTP URL traces into NDN names.

extract_urls.sh: This script reads gzipped IRCache trace files in the current directory and converts them into plain-text HTTP URL files, adding ".urls" at the end of each trace file name.

url2ccnf.pl: This script converts plain-text files with HTTP URLs into CCNF (Common Componentized Name Format - see another document) format files, simultaneously generating the histogram of named components in the input files.

build_fib.pl: Given a set of names from CCNF files, this script builds a FIB name trace that satisfies a specific component name distribution.

ccnfdump.pl: This utility script decodes the names in a CCNF file and displays them in plain text.

For more details, refer to the comments in the script source files and to the paper based on the data generated by these scripts: Won So, Ashok Narayanan, and David Oran, "Named data networking on a router: fast and DoS-resistant forwarding with hash tables," in Proceedings of the 2013 ACM/IEEE Ninth Symposium on Architectures for Networking and Communications Systems, Oct. 2013.

HTTP URL traces can be obtained from independent sources. E.g. IRCache trace: ftp://ircache.net/Traces/DITL-2007-01-09
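
To illustrate the kind of conversion the readme describes for extract_urls.sh, here is a rough Python sketch. It is not the Cisco script and may differ from it in detail. It assumes that the trace files use the Squid native access log format, in which the URL is the seventh whitespace-separated field, and that the output name is the full trace file name with ".urls" appended; both points are assumptions, and the function names are hypothetical.

```python
import gzip

# Assumption: Squid native access.log layout, where the fields are
# timestamp, elapsed, client, action/code, size, method, URL, ...
URL_FIELD = 6


def extract_url(log_line):
    """Return the URL field of one log line, or None if the line is too short."""
    fields = log_line.split()
    return fields[URL_FIELD] if len(fields) > URL_FIELD else None


def extract_urls(trace_path):
    """Write the URLs found in a gzipped trace to '<trace_path>.urls'."""
    out_path = trace_path + ".urls"
    with gzip.open(trace_path, "rt", encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            url = extract_url(line)
            if url is not None:
                dst.write(url + "\n")
    return out_path
```

Lines with fewer than seven fields are skipped rather than reported, which keeps the sketch short; a real conversion would probably want to log them.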

Files

SHA-512 checksums for all files in this dataset: cisco-url-names-2014-12_sha512-sums.txt

cisco-url-names-2014-12_1.txt.xz
cisco-url-names-2014-12_2.txt.xz
cisco-url-names-2014-12_3.txt.xz
cisco-url-names-2014-12_4.txt.xz
cisco-url-names-2014-12_5.txt.xz
cisco-url-names-2014-12_6.txt.xz
cisco-url-names-2014-12_7.txt.xz
cisco-url-names-2014-12_8.txt.xz
cisco-url-names-2014-12_9.txt.xz
cisco-url-names-2014-12_10.txt.xz
cisco-url-names-2014-12_11.txt.xz
cisco-url-names-2014-12_12.txt.xz
cisco-url-names-2014-12_13.txt.xz
cisco-url-names-2014-12_14.txt.xz
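
The .txt.xz files do not have to be unpacked to disk before use. The sketch below streams them with Python's standard lzma module and counts the names they contain. It assumes that each file holds one URL content name per line in a UTF-8 compatible encoding, which is not stated above; the function names are hypothetical.

```python
import lzma

FILE_COUNT = 14  # number of files in the dataset, as listed above


def dataset_files(prefix="cisco-url-names-2014-12"):
    """Return the 14 file names in the order listed above."""
    return [f"{prefix}_{i}.txt.xz" for i in range(1, FILE_COUNT + 1)]


def iter_names(path):
    """Yield the non-empty lines of one xz-compressed text file."""
    with lzma.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            name = line.rstrip("\n")
            if name:
                yield name


def count_names(paths):
    """Count the names across several files without loading them into memory."""
    return sum(1 for path in paths for _ in iter_names(path))
```

If the one-name-per-line assumption holds, count_names(dataset_files()) run in the download directory should be comparable to the total number of URL content names stated for the dataset.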

Contact

General coordinator:
Urs Schnurrenberger (urs.schnurrenberger@unibas.ch)

Also involved:
Christian Tschudin, Manolis Sifalakis

University of Basel
Department of Mathematics and Computer Science
Spiegelgasse 1
CH - 4051 Basel