Python: Most efficient way to compare two lists of integers

Refresh

December 2018

Views

9.5k time

3

I'm trying to compare two lists of integers, each the same size, in Python 2.6. The comparison I need is to compare the first item in List 1 with the first item in List 2, the second item in List 1 with the second item in List 2, and so on, and returns a result if ALL of the list items follow the same comparison criteria. It should behave as follows:

list1 = [1,1,1,1]
list2 = [2,1,2,3]
compare(list1,list2) 
# returns a "list 1 is <= list 2" response.

list1 = [4,1,4,3]
list2 = [2,1,2,3]
compare(list1,list2) 
# returns a "list 1 is >= list 2" response.

list1 = [3,2,3,2]
list2 = [1,4,1,4]
compare(list1,list2) 
# returns None— some items in list1 > list2, and some items in list2 > list1.

I figured I could write the code like the following block, but I don't know if it's the most efficient. My program is going to be calling this method a LOT so I want to streamline this as much as possible.

def compare(list1,list2):
    gt_found = 0
    lt_found = 0
    for x in range(len(list1)):
        if list1[x] > list2[x]:
            gt_found += 1
        elif list1[x] < list2[x]:        
            lt_found += 1
        if gt_found > 0 and lt_found > 0:
            return None   #(some items >, some items <)
    if gt_found > 0:
        return 1          #(list1 >= list2)
    if lt_found > 0:
        return -1         #(list1 <= list2)
    return 0              #(list1 == list2)

Is it already as good as it's going to get (big-O of n), or is there a faster way to go about it (or a way that uses system functions instead)?

CLARIFICATION: I expect the case that returns 'None' to happen the most often, so it is important.

3 answers

5

Вы знакомы с замечательной zipфункцией?

import itertools

def compare(xs, ys):
  all_less = True
  all_greater = True

  for x, y in itertools.izip(xs, ys):
    if not all_less and not all_greater:
      return None
    if x > y:
      all_less = False
    elif x < y:
      all_greater = False

  if all_less:
    return "list 1 is <= list 2"
  elif all_greater:
    return "list 1 is >= list 2"
  return None  # all_greater might be set False on final iteration

Zip принимает два списка ( xsи ysв этом случае, но назвать их все , что вы хотите) и создает итератор для последовательности кортежей.

izip([1,2,3,4], [4,3,2,1]) == [(1,4), (2,3), (3,2), (4,1)]

Таким образом, вы можете перебирать в обоих списках одновременно и сравнивать каждое значение в тандеме. Трудоемкость должно быть О (п), где п размер ваших списков.

Он вернется рано в тех случаях, когда ни> = или <= условие удовлетворяется.

Обновить

Как Джеймс Матта указывает, itertools.izipработает лучше , чем стандарт zipв Python 2. Это не верно в Python 3, где стандарт zipработает так , как izipделает в более старых версиях.

0

For anyone interested in the performance of the two methods, I named the iterative method 'tortoise' and the numpy method 'hare', and tested it with the code below.

At first, the 'tortoise' won [.009s [T] vs .033s [H]], but I checked it and found that asarray() was being called more often than it need to be. With that fix, the 'hare' won again, [.009s [T] vs .006s [H]].

The data is here: http://tny.cz/64d6e5dc
It consists of 28 lines of about 950 elements in length. Four of the lines collectively >= all the others.

It might be interesting to see how the performance works on larger data sets.

import itertools, operator, math
import cProfile
import numpy as np

data = #SEE PASTEBIN

def tortoise(xs, ys):
    all_less = True
    all_greater = True

    for x, y in zip(xs, ys):
        if not all_less and not all_greater:
          return None
        if x > y:
          all_less = False
        elif x < y:
          all_greater = False

    if all_greater and all_less:
        return 0
    if all_greater:
        return 1
    if all_less:
        return -1
    return None  # all_greater might be set False on final iteration


hare = lambda b,a : np.all(b >= a)

def find_uniques_tortoise():
    include_list = range(len(data))
    current_list_index = 0
    while current_list_index < len(data):
        if current_list_index not in include_list:
            current_list_index += 1
            continue
        for x in range(current_list_index+1,len(data)):
            if x not in include_list:
                continue
            result = tortoise(data[current_list_index], data[x])
            if result is None: #no comparison
                continue 
        elif result == 1 or result == 0: # this one beats the other one
            include_list.remove(x)
            continue
        elif result == -1: #the other one beats this one
            include_list.remove(current_list_index)
            break
    current_list_index +=1
return include_list

def find_uniques_hare():
    include_list = range(len(data))
    current_list_index = 0
    #do all asarray()s beforehand for max efficiency
    for x in range(len(data)): 
        data[x] = np.asarray(data[x])
    while current_list_index < len(data):
        if current_list_index not in include_list:
            current_list_index += 1
            continue
        for x in range(current_list_index+1,len(data)):
            if x not in include_list:
                continue
            if hare(data[current_list_index], data[x]): # this one beats the other one, or it's a tie
                include_list.remove(x)
            #   print x
                continue
            elif hare(data[x], data[current_list_index]): #the other one beats this one
                include_list.remove(current_list_index)
            #   print current_list_index
                break
            else: #no comparison
                continue
        current_list_index +=1
    return include_list             



cProfile.run('find_uniques_tortoise()')
cProfile.run('find_uniques_hare()')

print find_uniques_tortoise()
print   
print find_uniques_hare()       
5

Вы можете рассмотреть Numpy на основе векторизованного сравнения.

import numpy as np

a = [1,1,1,2]
b = [2,2,4,3]

all_larger = np.all(np.asarray(b) > np.asarray(a))  # true if b > a holds elementwise

print all_larger

        True

Очевидно, что вы можете спроектировать вещь, чтобы иметь свой ответ.

all_larger = lambda b,a : np.all(np.asarray(b) > np.asarray(a))

if all_larger(b,a):
       print "b > a"
elif all_larger(a,b):
       print "a > b"

else
       print "nothing!"

Каждый тип сравнения , такие , как <, >, <=, >=,может быть сделано.