Time complexity: UPDATE … WHERE vs UPDATE ALL

Refresh

April 2019

Views

421 time

1

I am having a database table DuplicatesRemoved with possibly large number of records. I execute certain operations to remove duplicates of users in my application, and every time I remove the duplicates, I keep a track of the duplicate UserID's by storing them in this table DuplicatesRemoved.

This table contains a bit field HistoryRecord. I need to update this field at the end of every "RemoveDuplicates" operation.

I do NOT have any indexes on DuplicatesRemoved.

I am wondering which of these would be better?

1.

UPDATE DuplicatesRemoved SET HistoryRecord=1 WHERE HistoryRecord<>1

OR

2.

UPDATE DuplicatesRemoved SET HistoryRecord=1

Will Query #1 take less time than Query #2?

I have referred this question but still am not sure about which one would be better for me.

2 answers

1

In the first option:

UPDATE DuplicatesRemoved SET HistoryRecord=1 WHERE HistoryRecord<>1

You have to find those records and update only those.

In the second option:

UPDATE DuplicatesRemoved SET HistoryRecord=1

You have to update the entire table.

So first option will be better assuming you find the records quickly, and also minimizes the number of locks acquired during the time of the update, and the total size of the transaction that the engine writes to the log file (i.e the records that we need to be able to rollback).

Showing the execution plan will help in this decision.

0

In databases, you measure the number of disk accesses to evaluate the complexity of a query, since the time to read something from the external memory is order of magnitute greater than the time to perform few operations in main memory.

The two queries, if no index is present, have the same number of disk accesses, since both require the complete scan of the relation.