Python list big o

Performance of Python Types

Now that you have a general understanding of big O notation, we’re going to spend some time discussing the big O performance for the most commonly-used operations supported by Python lists and dictionaries. The efficiencies of these data types are important because we’ll be using them to implement other abstract data structures for the remainder of this book.

This section is intended to give you some intuitive understanding of why the performances are what they are, but you won’t fully appreciate these reasons until later, when we explore how lists and dictionaries can be implemented.

Keep in mind that there is a difference between the Python language and a Python implementation. Our discussion below assumes the use of the CPython implementation.

Lists

The designers of the Python list data type had many choices to make during implementation. Each choice affected how quickly the list could perform operations. One decision they made was to optimize the list implementation for common operations.

Indexing & Assigning

Two common operations are indexing and assigning to an index position. In Python lists, values are assigned to and retrieved from specific, known memory locations. No matter how large the list is, index lookup and assignment take a constant amount of time and are thus O ( 1 ) O(1) O ( 1 ) .

Appending & Concatenating

Another common programming need is to grow a list. There are two ways to do this: you can use the append method or the concatenation operator ( + ).

The append method is “amortized” O ( 1 ) O(1) O ( 1 ) . In most cases, the memory required to append a new value has already been allocated, which is strictly O ( 1 ) O(1) O ( 1 ) . Once the C array underlying the list has been exhausted, it must be expanded in order to accomodate further append s. This periodic expansion process is linear relative to the size of the new array, which seems to contradict our claim that append ing is O ( 1 ) O(1) O ( 1 ) .

However, the expansion rate is cleverly chosen to be three times the previous size of the array; when we spread the expansion cost over each additional append afforded by this extra space, the cost per append is O ( 1 ) O(1) O ( 1 ) on an amortized basis.

On the other hand, concatenation is O ( k ) O(k) O ( k ) , where k k k is the size of the concatenated list, since k k k sequential assignment operations must occur.

Popping, Shifting & Deleting

Popping from a Python list is typically performed from the end but, by passing an index, you can pop from a specific position. When pop is called from the end, the operation is O ( 1 ) O(1) O ( 1 ) , while calling pop from anywhere else is O ( n ) O(n) O ( n ) . Why the difference?

Читайте также:  Last value of array python

When an item is taken from the front of a Python list, all other elements in the list are shifted one position closer to the beginning. This is an unavoidable cost to allow O ( 1 ) O(1) O ( 1 ) index lookup, which is the more common operation.

For the same reasons, inserting at an index is O ( n ) O(n) O ( n ) ; every subsequent element must be shifted one position closer to the end to accomodate the new element. Unsurprisingly, deletion behaves the same way.

Iteration

Iteration is O ( n ) O(n) O ( n ) because iterating over n n n elements requires n n n steps. This also explains why the in operator in Python is O ( n ) O(n) O ( n ) : to determine whether an element is in a list, we must iterate over every element.

Slicing

Slice operations require more thought. To access the slice [a:b] of a list, we must iterate over every element between indices a and b . So, slice access is O ( k ) O(k) O ( k ) , where k k k is the size of the slice. Deleting a slice is O ( n ) O(n) O ( n ) for the same reason that deleting a single element is O ( n ) O(n) O ( n ) : n n n subsequent elements must be shifted toward the list’s beginning.

Multiplying

To understand list multiplication, remember that concatenation is O ( k ) O(k) O ( k ) , where k k k is the length of the concatenated list. It follows that multiplying a list is O ( n k ) O(nk) O ( n k ) , since multiplying a k k k -sized list n n n times will require k ( n − 1 ) k(n — 1) k ( n − 1 ) appends.

Reversing

Reversing a list is O ( n ) O(n) O ( n ) since we must reposition each element.

Sorting

Finally (and least intuitively), sorting in Python is O ( n log n ) O(n\log) O ( n lo g n ) and beyond the scope of this book to demonstrate.

For reference, we’ve summarized the performance characteristics of Python’s list operations in the table below:

Operation Big O Efficiency
index [] O ( 1 ) O(1) O ( 1 )
index assignment O ( 1 ) O(1) O ( 1 )
append O ( 1 ) O(1) O ( 1 )
pop() O ( 1 ) O(1) O ( 1 )
pop(i) O ( n ) O(n) O ( n )
insert(i, item) O ( n ) O(n) O ( n )
del operator O ( n ) O(n) O ( n )
iteration O ( n ) O(n) O ( n )
contains (in) O ( n ) O(n) O ( n )
get slice [x:y] O ( k ) O(k) O ( k )
del slice O ( n ) O(n) O ( n )
reverse O ( n ) O(n) O ( n )
concatenate O ( k ) O(k) O ( k )
sort O ( n log n ) O(n\log) O ( n lo g n )
multiply O ( n k ) O(nk) O ( n k )

Dictionaries

The second major Python data type is the dictionary. As you might recall, a dictionary differs from a list in its ability to access items by key rather than position. For now, the most important characteristic to note is that “getting” and “setting” an item in a dictionary are both O ( 1 ) O(1) O ( 1 ) operations.

Читайте также:  display

We won’t try to provide an intuitive explanation for this now, but rest assured that we’ll discuss dictionary implementations later. For now, simply remember that dictionaries were created specifically to get and set values by key as fast as possible.

contains

Another important dictionary operation is checking whether a key is present in a dictionary. This “contains” operation is also O ( 1 ) O(1) O ( 1 ) because checking for a given key is implicit in getting an item from a dictionary, which is itself O ( 1 ) O(1) O ( 1 ) .

Iterating & Copying

Iterating over a dictionary is O ( n ) O(n) O ( n ) , as is copying the dictionary, since n n n key/value pairs must be copied.

We’ve summarized the efficencies of all dictionary operations in the table below:

Operation Big O Efficiency
copy O ( n ) O(n) O ( n )
get item O ( 1 ) O(1) O ( 1 )
set item O ( 1 ) O(1) O ( 1 )
delete item O ( 1 ) O(1) O ( 1 )
contains (in) O ( 1 ) O(1) O ( 1 )
iteration O ( n ) O(n) O ( n )

The “Average Case”

The efficiences provided in the above tables are performances in the average case. In rare cases, “contains”, “get item” and “set item” can degenerate into O ( n ) O(n) O ( n ) performance but, again, we’ll discuss that when we talk about different ways of implementing a dictionary.

Python is still an evolving language, which means that the above tables could be subject to change. The latest information on the performance of Python data types can be found on the Python website. As of this writing, the Python wiki has a nice time complexity page that can be found at the Time Complexity Wiki.

Practical Algorithms and Data Structures

Источник

User

This page documents the time-complexity (aka «Big O» or «Big Oh») of various operations in current CPython. Other Python implementations (or older or still-under development versions of CPython) may have slightly different performance characteristics. However, it is generally safe to assume that they are not slower by more than a factor of O(log n).

Generally, ‘n’ is the number of elements currently in the container. ‘k’ is either the value of a parameter or the number of elements in the parameter.

list

The Average Case assumes parameters generated uniformly at random.

Internally, a list is represented as an array; the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move). If you need to add/remove at both ends, consider using a collections.deque instead.

collections.deque

A deque (double-ended queue) is represented internally as a doubly linked list. (Well, a list of arrays rather than objects, for greater efficiency.) Both ends are accessible, but even looking at the middle is slow, and adding to or removing from the middle is slower still.

Читайте также:  List string си шарп

See dict — the implementation is intentionally very similar.

  • As seen in the source code the complexities for set difference s-t or s.difference(t) (set_difference()) and in-place set difference s.difference_update(t) (set_difference_update_internal()) are different! The first one is O(len(s)) (for every element in s add it to the new set, if not in t). The second one is O(len(t)) (for every element in t remove it from s). So care must be taken as to which is preferred, depending on which one is the longest set and whether a new set is needed.
  • To perform set operations like s-t, both s and t need to be sets. However you can do the method equivalents even if t is any iterable, for example s.difference(l), where l is a list.

dict

The Average Case times listed for dict objects assume that the hash function for the objects is sufficiently robust to make collisions uncommon. The Average Case assumes the keys used in parameters are selected uniformly at random from the set of all keys.

Note that there is a fast-path for dicts that (in practice) only deal with str keys; this doesn’t affect the algorithmic complexity, but it can significantly affect the constant factors: how quickly a typical program finishes.

Notes

[1] = These operations rely on the «Amortized» part of «Amortized Worst Case». Individual actions may take surprisingly long, depending on the history of the container. [2] = Popping the intermediate element at index k from a list of size n shifts all elements after k by one slot to the left using memmove. n — k elements have to be moved, so the operation is O(n — k). The best case is popping the second to last element, which necessitates one move, the worst case is popping the first element, which involves n — 1 moves. The average case for an average value of k is popping the element the middle of the list, which takes O(n/2) = O(n) operations. [3] = For these operations, the worst case n is the maximum size the container ever achieved, rather than just the current size. For example, if N objects are added to a dictionary, then N-1 are deleted, the dictionary will still be sized for N objects (at least) until another insertion is made.

TimeComplexity (last edited 2023-01-19 22:35:03 by AndrewBadr )

Источник

Оцените статью