1 year ago

#299639


Salvatore Ambulando

Sorting based on fuzzy criteria OR Create an acceptable order with only n comparisons

I'm looking for an algorithm to sort a large number of items using the fewest comparisons. My specific case makes it unclear which of the obvious approaches is appropriate: the comparison function is a human brain, so it is slow, non-deterministic, and can make errors.

In other words, I want to sort arbitrary items on my computer into a list from "best" to "worst" by comparing them two at a time. They could be images, strings, songs, anything. My program would display two things for me to compare. The program doesn't know anything about what is being compared; its job is just to decide which pairs to compare. That gives the following criteria:

  1. It's a comparison sort - The only time the user sees items is when comparing two of them.
  2. It's an out-of-place sort - I don't want to move the actual files, so items can be represented by placeholder values or metadata files.
  3. Comparisons are slow - at least compared to a computer. Data locality won't have an effect, but comparing items with an obvious disparity will be quick, while comparing similar items will be slow.
  4. Comparison is subjective - comparison results could vary slightly at different times.
  5. Items don't have a total order - the desired outcome is an order that is "good enough" at runtime, which will vary depending on context.
  6. Items will rarely be almost sorted - in fact, the goal is to get random data to an almost-sorted state.
  7. Sets usually will contain runs - If every song on an album is a banger, it might be faster, because of (3), to compare them to the songs on the next album rather than to each other. Imagine a set {10.0, 10.2, 10.9, 5.0, 4.2, 6.9} where comparing the integer parts is fast but comparing the fractional parts is very slow.

There are many different ways to approach this problem. In addition to sorting algorithms, it's similar to creating tournament brackets and to voting systems. As that table illustrates, there are countless ways to define and solve the problem based on various criteria. For this question I'm only interested in treating it as a sorting problem where the user compares two items at a time and chooses a preference. So what approach makes sense for either of the following two versions of the question?

  1. How to choose pairs to get the best result in O(n) or fewer comparisons? (For example, compare random pairs of items with n/2 comparisons, then use another n/2 comparisons to spot-check or fine-tune.)
  2. How to create the best order with additional operations but no additional comparisons? (For example, similar items are sorted into buckets or losers are removed; anything that doesn't increase the number of comparisons.)
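To make the example in (1) concrete, here is a minimal sketch of one way it could look, not a proposed answer. It assumes a hypothetical `prefer(a, b)` callback that returns `True` when the user picks `a` over `b`, and it assumes items are distinct and hashable. The function name `two_round_order` and the choice to pair winners with winners in the second round are my own illustration.

```python
import random


def two_round_order(items, prefer, seed=None):
    """Order items into rough tiers using at most len(items) comparisons.

    prefer(a, b) is a hypothetical callback: True means the user picked a.
    Items are assumed to be distinct and hashable.
    """
    rng = random.Random(seed)
    pool = list(items)
    rng.shuffle(pool)
    wins = {item: 0 for item in pool}
    comparisons = 0

    def play(group):
        """Compare adjacent pairs in group and split it into winners/losers."""
        nonlocal comparisons
        winners, losers = [], []
        for a, b in zip(group[0::2], group[1::2]):
            comparisons += 1
            winner, loser = (a, b) if prefer(a, b) else (b, a)
            wins[winner] += 1
            winners.append(winner)
            losers.append(loser)
        if len(group) % 2:
            # Arbitrary choice: an unpaired item is grouped with the losers.
            losers.append(group[-1])
        return winners, losers

    # Round 1: about n/2 comparisons between random pairs.
    winners, losers = play(pool)
    # Round 2: about n/2 more, winners against winners, losers against losers.
    play(winners)
    play(losers)

    # Tiers by number of wins (2, 1, 0); order inside a tier is arbitrary.
    order = sorted(pool, key=lambda item: wins[item], reverse=True)
    return order, comparisons
```

With a noisy human `prefer`, the tiers would only be approximate, which is the part I don't know how to improve on.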

The representation of comparison results can be anything that makes the solution convenient - it can be dictionary keys corresponding to the final order, a "score" based on number of comparisons, a database, etc.
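As an illustration of the "score" representation, the sketch below keeps a log of wins and appearances per item and ranks by a smoothed win rate. The class name `ComparisonLog`, its methods, and the smoothing formula `(wins + 1) / (seen + 2)` are assumptions made for this example; the smoothing is only there so that items that were never compared land in the middle instead of at an extreme.

```python
from collections import defaultdict


class ComparisonLog:
    """Hypothetical store for pairwise results, kept separate from the items."""

    def __init__(self):
        self.wins = defaultdict(int)   # item -> comparisons won
        self.seen = defaultdict(int)   # item -> comparisons taken part in

    def record(self, winner, loser):
        """Store one result; contradictory repeats simply accumulate."""
        self.wins[winner] += 1
        self.seen[winner] += 1
        self.seen[loser] += 1

    def score(self, item):
        """Smoothed win rate in (0, 1); an item never compared scores 0.5."""
        return (self.wins.get(item, 0) + 1) / (self.seen.get(item, 0) + 2)

    def ranking(self, items):
        """Items ordered from best to worst score (ties keep input order)."""
        return sorted(items, key=self.score, reverse=True)


log = ComparisonLog()
log.record("a", "b")
log.record("a", "c")
log.record("b", "c")
ranking = log.ranking(["c", "b", "a", "d"])
```

Because the log is independent of the order in which pairs were asked, results from different sessions could be merged into the same scores.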

Edit: The comments have helped clarify the question: the goal is similar to something like bucket sort, samplesort, or the partitioning phase of quicksort. So the question could be rephrased as how to choose good partitions based on comparisons. I'm also interested in any other ways of using the comparison results that wouldn't be applicable in a standard in-place comparison sort, like keeping a score for each item.
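The partitioning idea can be sketched as quicksort-style splitting that stops when a comparison budget runs out. This is a rough sketch under stated assumptions: `prefer(a, b)` is again a hypothetical callback returning `True` when the user picks `a`, the pivot is chosen at random, and always splitting the largest bucket is just one possible policy. Choosing better pivots is exactly the open part of the question.

```python
import random


def budgeted_buckets(items, prefer, budget, seed=None):
    """Split items into ordered buckets (best first) within a comparison budget.

    prefer(a, b) is a hypothetical callback: True means the user picked a.
    Returns (buckets, comparisons_used).
    """
    rng = random.Random(seed)
    buckets = [list(items)]
    used = 0
    while True:
        # Split the largest bucket, since it is the least resolved one.
        idx = max(range(len(buckets)), key=lambda i: len(buckets[i]))
        bucket = buckets[idx]
        cost = len(bucket) - 1  # one comparison per non-pivot item
        if cost < 1 or used + cost > budget:
            break
        pivot = bucket.pop(rng.randrange(len(bucket)))
        better, worse = [], []
        for item in bucket:
            (better if prefer(item, pivot) else worse).append(item)
        used += cost
        # Replace the old bucket with its non-empty parts, keeping the order.
        buckets[idx:idx + 1] = [part for part in (better, [pivot], worse) if part]
    return buckets, used
```

The loop stops as soon as the largest bucket is too expensive to split, so some budget may be left over; spending it on smaller buckets would be a straightforward variation.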

algorithm

sorting

approximation

external-sorting

partial-sort

0 Answers
