Comparing Data Sets: Are Two Data Files the Same?

Jon and I were looking at nondominated sorting, and the question came up: how do you validate a sorting routine? You’ve got to compare the resulting data, but doing so is not straightforward. You can’t just diff the text files that come out, because there’s no guarantee everything comes out in the same order. Sorting the files and then diffing them doesn’t help either, because output formatting may differ between sorting routines (one may write 0.5 where another writes 5.0e-01). So I wrote a Python script that evaluates whether two data tables are the same, to within a specified tolerance:

https://github.com/matthewjwoodruff/datacomparison
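To make the idea concrete, here is a minimal sketch of an order-insensitive, tolerance-based comparison. It is not the code from the repository above. The function names (`read_table`, `rows_match`, `tables_equal`), the absolute tolerance, and the greedy row matching are all assumptions made for illustration.

```python
import math


def read_table(text, delimiter=None):
    """Parse delimited text into a list of rows of floats.

    Hypothetical helper: blank lines and lines starting with '#' are skipped.
    delimiter=None splits on any run of whitespace.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append([float(field) for field in line.split(delimiter)])
    return rows


def rows_match(row_a, row_b, tol):
    """True if both rows have the same length and agree within tol."""
    return len(row_a) == len(row_b) and all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
        for a, b in zip(row_a, row_b)
    )


def tables_equal(table_a, table_b, tol=1e-6):
    """True if every row of table_a pairs with a distinct row of table_b.

    Assumption: greedy matching, which is quadratic in the number of rows
    and could mispair rows that lie within tol of each other.
    """
    if len(table_a) != len(table_b):
        return False
    unmatched = list(table_b)
    for row_a in table_a:
        for i, row_b in enumerate(unmatched):
            if rows_match(row_a, row_b, tol):
                del unmatched[i]  # each row of table_b is used at most once
                break
        else:
            return False  # no remaining row of table_b matches row_a
    return True


if __name__ == "__main__":
    first = read_table("1.0 2.0\n3.0 4.0\n")
    second = read_table("3.00000 4.0000001\n1.0e0 2.0\n")
    print(tables_equal(first, second))
```

Parsing every field to a float removes the formatting differences, and matching rows without regard to position removes the ordering differences. For large tables a quadratic search like this would be slow, so treat it as a starting point rather than a finished tool.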
