Abstract. We address the problem of creating complete maps of software
code clones (copy features in data) in a corpus of binary artifacts of
unknown provenance. We report on a practical methodology that employs
enhanced suffix data structures and partial orderings of clones to
compute a compact representation of the most interesting clone features
in data. Enumerating clone features is useful for malware triage and
prioritization when human exploration, testing and verification are the
most costly factors. We further show that the enhanced suffix arrays
may be used to discover provenance relations in data, and we introduce
two distinct Jaccard similarity coefficients to measure code similarity
in binary artifacts. We illustrate the use of these tools on real malware
data, including a retrodiction experiment that measures and enumerates
evidence supporting a common provenance for Stuxnet and Duqu. The
results indicate that completely mapping the clone features in data is
practical and effective.
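The abstract does not spell out how suffix structures expose clones, so the following is a minimal sketch under stated assumptions, not the authors' method. It assumes a clone is a byte substring shared by two artifacts, builds a plain suffix array naively (not an enhanced suffix array) plus an LCP array with Kasai's algorithm, reports shared substrings of at least `min_len` bytes found between adjacent suffixes from different artifacts, and then keeps only the maximal ones under the substring partial order. The function names and the `min_len` threshold are hypothetical.

```python
from typing import List, Set


def build_suffix_array(seq: List[int]) -> List[int]:
    # Naive O(n^2 log n) construction; adequate for a sketch, not for a corpus.
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def build_lcp(seq: List[int], sa: List[int]) -> List[int]:
    # Kasai's algorithm: lcp[r] is the longest common prefix of the suffixes
    # at ranks r-1 and r; lcp[0] is 0.
    n = len(seq)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def shared_clones(a: bytes, b: bytes, min_len: int = 4) -> Set[bytes]:
    # Hypothetical helper: concatenate both artifacts with two distinct
    # out-of-alphabet separators (256, 257) so no common prefix can span them.
    seq = list(a) + [256] + list(b) + [257]
    boundary = len(a)  # positions before this index belong to artifact a
    sa = build_suffix_array(seq)
    lcp = build_lcp(seq, sa)
    clones: Set[bytes] = set()
    for r in range(1, len(sa)):
        i, j = sa[r - 1], sa[r]
        # Adjacent suffixes from different artifacts with a long common
        # prefix mark a byte substring present in both artifacts.
        if lcp[r] >= min_len and (i < boundary) != (j < boundary):
            clones.add(bytes(seq[i:i + lcp[r]]))
    return clones


def maximal(clones: Set[bytes]) -> Set[bytes]:
    # Keep only clones not contained in a longer clone, i.e. the maximal
    # elements under the substring partial order.
    return {c for c in clones if not any(c != d and c in d for d in clones)}


a = b"xxHELLOWORLDyy"
b = b"zzHELLOWORLDqq"
print(maximal(shared_clones(a, b)))  # {b'HELLOWORLD'}
```

Without the `maximal` filter the result also contains every suffix of the clone down to `min_len` bytes (`b"WORLD"`, `b"ORLD"`, and so on), which is why a compact representation based on an ordering of clones is worthwhile.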
More here: http://arxiv.org/pdf/1407.2877v1.pdf
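The abstract mentions two Jaccard similarity coefficients for binary artifacts without defining them here. As an illustration only, and not necessarily the two coefficients the paper defines, the sketch below assumes artifacts are compared through their overlapping byte n-grams and computes the set Jaccard coefficient |A ∩ B| / |A ∪ B| alongside the weighted (multiset) Jaccard coefficient Σ min / Σ max over n-gram counts. The function names and the default `n = 4` are hypothetical.

```python
from collections import Counter


def ngrams(data: bytes, n: int) -> Counter:
    # Multiset of overlapping byte n-grams (empty if data is shorter than n).
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def jaccard_set(a: bytes, b: bytes, n: int = 4) -> float:
    # Set Jaccard: shared distinct n-grams over all distinct n-grams.
    A, B = set(ngrams(a, n)), set(ngrams(b, n))
    union = A | B
    return len(A & B) / len(union) if union else 1.0


def jaccard_weighted(a: bytes, b: bytes, n: int = 4) -> float:
    # Weighted Jaccard: sum of per-n-gram minimum counts over sum of maxima.
    A, B = ngrams(a, n), ngrams(b, n)
    keys = A.keys() | B.keys()
    num = sum(min(A[k], B[k]) for k in keys)
    den = sum(max(A[k], B[k]) for k in keys)
    return num / den if den else 1.0


print(jaccard_set(b"aaaa", b"aa", 2))       # 1.0
print(jaccard_weighted(b"aaaa", b"aa", 2))  # 0.3333333333333333
```

The two coefficients diverge when one artifact repeats material: the set version only asks which n-grams occur, while the weighted version also penalizes differences in how often they occur.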