# Python: Most efficient way to compare two lists of integers

 Refresh December 2018 Views 9.5k time
3

I'm trying to compare two lists of integers, each the same size, in Python 2.6. The comparison I need is to compare the first item in List 1 with the first item in List 2, the second item in List 1 with the second item in List 2, and so on, and returns a result if ALL of the list items follow the same comparison criteria. It should behave as follows:

``````list1 = [1,1,1,1]
list2 = [2,1,2,3]
compare(list1,list2)
# returns a "list 1 is <= list 2" response.

list1 = [4,1,4,3]
list2 = [2,1,2,3]
compare(list1,list2)
# returns a "list 1 is >= list 2" response.

list1 = [3,2,3,2]
list2 = [1,4,1,4]
compare(list1,list2)
# returns None— some items in list1 > list2, and some items in list2 > list1.
``````

I figured I could write the code like the following block, but I don't know if it's the most efficient. My program is going to be calling this method a LOT so I want to streamline this as much as possible.

``````def compare(list1,list2):
gt_found = 0
lt_found = 0
for x in range(len(list1)):
if list1[x] > list2[x]:
gt_found += 1
elif list1[x] < list2[x]:
lt_found += 1
if gt_found > 0 and lt_found > 0:
return None   #(some items >, some items <)
if gt_found > 0:
return 1          #(list1 >= list2)
if lt_found > 0:
return -1         #(list1 <= list2)
return 0              #(list1 == list2)
``````

Is it already as good as it's going to get (big-O of n), or is there a faster way to go about it (or a way that uses system functions instead)?

CLARIFICATION: I expect the case that returns 'None' to happen the most often, so it is important.

### 3 answers

5

Вы знакомы с замечательной `zip`функцией?

``````import itertools

def compare(xs, ys):
all_less = True
all_greater = True

for x, y in itertools.izip(xs, ys):
if not all_less and not all_greater:
return None
if x > y:
all_less = False
elif x < y:
all_greater = False

if all_less:
return "list 1 is <= list 2"
elif all_greater:
return "list 1 is >= list 2"
return None  # all_greater might be set False on final iteration
``````

Zip принимает два списка ( `xs`и `ys`в этом случае, но назвать их все , что вы хотите) и создает итератор для последовательности кортежей.

``````izip([1,2,3,4], [4,3,2,1]) == [(1,4), (2,3), (3,2), (4,1)]
``````

Таким образом, вы можете перебирать в обоих списках одновременно и сравнивать каждое значение в тандеме. Трудоемкость должно быть О (п), где п размер ваших списков.

Он вернется рано в тех случаях, когда ни> = или <= условие удовлетворяется.

### Обновить

Как Джеймс Матта указывает, `itertools.izip`работает лучше , чем стандарт `zip`в Python 2. Это не верно в Python 3, где стандарт `zip`работает так , как `izip`делает в более старых версиях.

0

For anyone interested in the performance of the two methods, I named the iterative method 'tortoise' and the numpy method 'hare', and tested it with the code below.

At first, the 'tortoise' won [.009s [T] vs .033s [H]], but I checked it and found that asarray() was being called more often than it need to be. With that fix, the 'hare' won again, [.009s [T] vs .006s [H]].

The data is here: http://tny.cz/64d6e5dc
It consists of 28 lines of about 950 elements in length. Four of the lines collectively >= all the others.

It might be interesting to see how the performance works on larger data sets.

``````import itertools, operator, math
import cProfile
import numpy as np

data = #SEE PASTEBIN

def tortoise(xs, ys):
all_less = True
all_greater = True

for x, y in zip(xs, ys):
if not all_less and not all_greater:
return None
if x > y:
all_less = False
elif x < y:
all_greater = False

if all_greater and all_less:
return 0
if all_greater:
return 1
if all_less:
return -1
return None  # all_greater might be set False on final iteration

hare = lambda b,a : np.all(b >= a)

def find_uniques_tortoise():
include_list = range(len(data))
current_list_index = 0
while current_list_index < len(data):
if current_list_index not in include_list:
current_list_index += 1
continue
for x in range(current_list_index+1,len(data)):
if x not in include_list:
continue
result = tortoise(data[current_list_index], data[x])
if result is None: #no comparison
continue
elif result == 1 or result == 0: # this one beats the other one
include_list.remove(x)
continue
elif result == -1: #the other one beats this one
include_list.remove(current_list_index)
break
current_list_index +=1
return include_list

def find_uniques_hare():
include_list = range(len(data))
current_list_index = 0
#do all asarray()s beforehand for max efficiency
for x in range(len(data)):
data[x] = np.asarray(data[x])
while current_list_index < len(data):
if current_list_index not in include_list:
current_list_index += 1
continue
for x in range(current_list_index+1,len(data)):
if x not in include_list:
continue
if hare(data[current_list_index], data[x]): # this one beats the other one, or it's a tie
include_list.remove(x)
#   print x
continue
elif hare(data[x], data[current_list_index]): #the other one beats this one
include_list.remove(current_list_index)
#   print current_list_index
break
else: #no comparison
continue
current_list_index +=1
return include_list

cProfile.run('find_uniques_tortoise()')
cProfile.run('find_uniques_hare()')

print find_uniques_tortoise()
print
print find_uniques_hare()
``````
5

Вы можете рассмотреть Numpy на основе векторизованного сравнения.

``````import numpy as np

a = [1,1,1,2]
b = [2,2,4,3]

all_larger = np.all(np.asarray(b) > np.asarray(a))  # true if b > a holds elementwise

print all_larger

True
``````

Очевидно, что вы можете спроектировать вещь, чтобы иметь свой ответ.

``````all_larger = lambda b,a : np.all(np.asarray(b) > np.asarray(a))

if all_larger(b,a):
print "b > a"
elif all_larger(a,b):
print "a > b"

else
print "nothing!"
``````

Каждый тип сравнения , такие , как `<, >, <=, >=,`может быть сделано.