November 01, 2023
Python Collections
A tangle, a bed, even a knot - there is no universally accepted collective noun for pythons, but these are some commonly used terms. My favourite is 'pit'. None of these words conjure fun images in the mind. But this is not an article about creepy bunches of snakes basking in the sun or sharing food sources. In this article we will talk about Collections, special objects that store data, in Python.
Audience
I have written this article with beginners and new learners in mind.
Prerequisites
- a Python interpreter installed on your machine - if you are new to Python, check out my
- a text editor or IDE. I've recently enjoyed working with , and is always amazing
Introduction
Enterprise developers inevitably need to work with data - bunches of stuff. Python offers several ways for handling data in your program. These are:
- list - a collection of things kept in the order you add them, duplicates allowed. Lists are mutable - you can change them after you have created them. e.g., a list of snake species
- tuple - a collection of things in a certain order. Tuples are immutable - you cannot change them. e.g., the names of weekdays - not going to change any time soon
- set - a collection of things in any order. Sets are mutable but cannot contain duplicates. e.g., a shuffled deck of cards. You can't have more than one Queen of Hearts!
- dictionary - a collection of things in key-value pair format. e.g., "name": "Kaa". Dictionaries are mutable, but the keys must be unique.
In this tutorial we will introduce each of these.
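Before we dig in, here is a quick side-by-side sketch of the four types described above. The variable names and values are my own illustrations and are not part of the steps that follow.

```python
# Illustrative values only: one small example of each collection type.
snakes = ["python", "boa", "python"]     # list: mutable, duplicates allowed
weekdays = ("Monday", "Tuesday")         # tuple: immutable
suits = {"hearts", "spades", "hearts"}   # set: the duplicate is dropped
snake = {"name": "Kaa"}                  # dictionary: key-value pairs

print(type(snakes).__name__, len(snakes))      # list 3
print(type(weekdays).__name__, len(weekdays))  # tuple 2
print(type(suits).__name__, len(suits))        # set 2
print(type(snake).__name__, len(snake))        # dict 1
```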
Step-by-step
Lists
If you have worked with programming languages like Java, you might know this as an 'array'. Let's create a list.
- create a file called collections.py
- edit collections.py and add the code below.
- run collections.py - 'python collections.py'
my_list = ["London", "Paris", "Tokyo", "Berlin", "Madrid", "Mumbai", "Tokyo", "Kiev"]
print(my_list)
['London', 'Paris', 'Tokyo', 'Berlin', 'Madrid', 'Mumbai', 'Tokyo', 'Kiev']
Here, we have created a list containing city names and then printed that list. You will note that the list contains Tokyo twice. Also, when printing the list, the cities are not in alphabetical order; they appear in the order in which we added them. This is because lists can contain duplicates and keep their entries in insertion order rather than sorting them.
Lists are also mutable: we can change them. Let's add a city. To do this, we use the append() method.
- add Brasilia to the list
my_list = ["London", "Paris", "Tokyo", "Berlin", "Madrid", "Mumbai", "Tokyo", "Kiev"]
my_list.append("Brasilia")
print(my_list)
['London', 'Paris', 'Tokyo', 'Berlin', 'Madrid', 'Mumbai', 'Tokyo', 'Kiev', 'Brasilia']
The append() method adds a new entry at the end of a list.
Now let's get rid of that duplicate Tokyo. But before we do, a brief detour on indices. Lists have the concept of indices - a shadow list of numbers, each representing the position of the corresponding entry in the list. The indices start at 0. So for example:
['London', 'Paris', 'Tokyo', 'Berlin', 'Madrid', 'Mumbai', 'Tokyo', 'Kiev', 'Brasilia']
[ 0, 1, 2, 3, 4, 5, 6, 7, 8]
Notice that we have nine cities in our list, but the highest index number is 8. This is because we started at zero. Okay let's use this to remove the second Tokyo entry. Who can guess the index where we will find that entry? Correct, index 6. Python provides the pop() function to remove elements from a list.
- modify collections.py with this code:
my_list = ["London", "Paris", "Tokyo", "Berlin", "Madrid", "Mumbai", "Tokyo", "Kiev"]
my_list.append("Brasilia")
my_list.pop(6)
print(my_list)
['London', 'Paris', 'Tokyo', 'Berlin', 'Madrid', 'Mumbai', 'Kiev', 'Brasilia']
If we didn't provide an index, the 'pop' function would remove the last entry in the list. Last in, first out.
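One detail worth knowing: pop() also hands back the entry it removes, so you can keep hold of it. A minimal sketch, using a shorter list of my own so the output is easy to follow:

```python
cities = ["London", "Paris", "Tokyo"]

last = cities.pop()     # no index: removes and returns the last entry
first = cities.pop(0)   # index 0: removes and returns the first entry

print(last)     # Tokyo
print(first)    # London
print(cities)   # ['Paris']
```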
Okay, I like a bit of order. The sort() function orders the entries in our list in either ascending (default) or descending sequence.
- let's sort our list into alphabetical order and also in reverse.
my_list = ["London", "Paris", "Tokyo", "Berlin", "Madrid", "Mumbai", "Tokyo", "Kiev"]
my_list.append("Brasilia")
my_list.pop(6)
my_list.sort()
print(my_list)
my_list.sort(reverse=True)
print(my_list)
['Berlin', 'Brasilia', 'Kiev', 'London', 'Madrid', 'Mumbai', 'Paris', 'Tokyo']
['Tokyo', 'Paris', 'Mumbai', 'Madrid', 'London', 'Kiev', 'Brasilia', 'Berlin']
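Note that sort() rearranges the list in place and returns None. If you want to keep the original order as well, the built-in sorted() function returns a new, sorted list instead. A small sketch with a shortened list of my own:

```python
cities = ["Paris", "London", "Tokyo"]

ordered = sorted(cities)                       # new list, original untouched
reversed_order = sorted(cities, reverse=True)  # new list, descending

print(cities)          # ['Paris', 'London', 'Tokyo']
print(ordered)         # ['London', 'Paris', 'Tokyo']
print(reversed_order)  # ['Tokyo', 'Paris', 'London']

result = cities.sort()  # sorts the list itself and returns None
print(result)           # None
print(cities)           # ['London', 'Paris', 'Tokyo']
```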
You can access an element in a list by its index. If I want the third element, I use index 2 (remember that the numbering starts at zero).
my_list = ["London", "Paris", "Tokyo", "Berlin", "Madrid", "Mumbai", "Tokyo", "Kiev"]
my_list.append("Brasilia")
my_list.pop(6)
my_list.sort()
my_list.sort(reverse=True)
print(my_list)
city = my_list[2]
print(city)
['Tokyo', 'Paris', 'Mumbai', 'Madrid', 'London', 'Kiev', 'Brasilia', 'Berlin']
Mumbai
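To see the index next to each entry, rather than counting by hand, the built-in enumerate() function pairs them up. The sketch below also shows a negative index, which counts back from the end of the list. The shorter list is my own example.

```python
cities = ["Tokyo", "Paris", "Mumbai", "Madrid"]

# enumerate() yields (index, entry) pairs, starting at 0
for index, city in enumerate(cities):
    print(index, city)

print(cities[-1])    # Madrid: -1 is the last entry
print(len(cities))   # 4: one more than the highest index, 3
```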
Tuples
In the previous section, we were playing fast and loose with our data, adding and dropping cities in any order with impunity. The tuple is another of Python's built-in data types. Tuples run a tighter ship: they are immutable and keep their elements in the order in which they were created.
- create a tuple using parentheses
my_tuple = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
print(my_tuple)
('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday')
- Just as with lists, you can access a tuple's elements by index.
my_tuple = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
print(my_tuple)
day = my_tuple[4]
print(day)
('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday')
Thursday
We cannot change a tuple: there are no methods to add, remove, or sort its elements. Tuples are useful for collections whose values and order we do not expect to change during a program's execution.
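To see that immutability in action, here is a sketch of what happens when we try to overwrite an element of a tuple. I have wrapped the attempt in try/except so the script keeps running; the shortened tuple and the name 'Funday' are my own examples.

```python
my_tuple = ("Sunday", "Monday", "Tuesday")

try:
    my_tuple[0] = "Funday"   # tuples do not support item assignment
except TypeError as error:
    print("TypeError:", error)

# If we need a changed version, we can build a new list from the tuple
days = list(my_tuple)
days[0] = "Funday"
print(days)       # ['Funday', 'Monday', 'Tuesday']
print(my_tuple)   # ('Sunday', 'Monday', 'Tuesday')
```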
Sets
I have described a set as a shuffled deck of playing cards. You can add and remove cards, but it (hopefully) can never contain more than one of the same face/suit combination. The analogy is not perfect, however: you can sort a physical deck of cards, but in Python, sets are unordered.
- create a set with curly braces
my_set={1,2,3,4,5,"hello","tup", 1,"hello"}
print(my_set)
{'tup', 1, 2, 3, 4, 5, 'hello'}
I deliberately repeated 1 and "hello" when I created the set. However, when I printed it, these duplicates were not present. Because sets are unordered, the elements may print in a different order on your machine.
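This makes a set a handy way to strip duplicates from a list, such as the double Tokyo we met earlier. A minimal sketch with a shortened list; I use sorted() at the end only so that the output is predictable, since a set has no order of its own.

```python
my_list = ["London", "Paris", "Tokyo", "Berlin", "Tokyo"]

unique_cities = set(my_list)    # building a set drops the duplicate

print(len(my_list))             # 5
print(len(unique_cities))       # 4
print(sorted(unique_cities))    # ['Berlin', 'London', 'Paris', 'Tokyo']
```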
Set objects are not subscriptable, so we cannot refer to an element by index. The code below will raise an exception:
my_set={1,2,3,4,5,"hello","tup", 1,"hello"}
print(my_set)
print(my_set[2])
TypeError: 'set' object is not subscriptable
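We cannot index into a set, but we can still ask whether it contains something, and we can add or remove elements. A short sketch using a small set of my own:

```python
my_set = {1, 2, 3, "hello"}

print("hello" in my_set)   # True: membership test
print(99 in my_set)        # False

my_set.add(4)              # add a new element
my_set.add(1)              # already present, so nothing changes
my_set.discard("hello")    # remove an element if it is there

print(len(my_set))              # 4
print(my_set == {1, 2, 3, 4})   # True
```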
Dictionaries
Not Collins or Webster's: a dictionary here means data structured as keys and values (commonly referred to as key-value pairs). A dictionary generally holds a single data point with a defined structure rather than a long list of records. Here's an example of a dictionary:
{
"city_name": "Brasilia",
"population": 3500924,
"year_founded": 1960,
"website": "brasilia.df.gov.br"
}
Notice that the different elements in the dictionary can have different types. city_name contains a string while year_founded is an integer.
- create a dictionary as follows
my_dict = {
"city_name": "Brasilia",
"population": 3500924,
"year_founded": 1960,
"website": "brasilia.df.gov.br"
}
print(my_dict)
{'city_name': 'Brasilia', 'population': 3500924, 'year_founded': 1960, 'website': 'brasilia.df.gov.br'}
Dictionary items are ordered and changeable, and their keys are unique. So we can change the value for the population key, but we can only have one key with that name. As of Python 3.7, dictionaries also keep their items in insertion order, so we can expect 'population' to stay in the same place.
We can access an item's value in a dictionary by quoting its key name.
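A quick sketch of what 'unique keys' means in practice: if a key is repeated when the dictionary is created, Python keeps only the last value given for it. The repeated key here is deliberate and is my own example.

```python
my_dict = {
    "city_name": "Brasilia",
    "year_founded": 1970,
    "year_founded": 1960,   # same key again: this value wins
}

print(len(my_dict))   # 2
print(my_dict)        # {'city_name': 'Brasilia', 'year_founded': 1960}
```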
- find the year in which Brasilia was founded
my_dict = {
"city_name": "Brasilia",
"population": 3500924,
"year_founded": 1960,
"website": "brasilia.df.gov.br"
}
print(my_dict["year_founded"])
1960
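If we ask for a key that does not exist, square brackets raise a KeyError. The get() method is a gentler alternative: it returns None, or a default of our choosing, instead. A minimal sketch; the 'mayor' key is a made-up example.

```python
my_dict = {
    "city_name": "Brasilia",
    "year_founded": 1960,
}

print(my_dict.get("year_founded"))      # 1960
print(my_dict.get("mayor"))             # None: no such key
print(my_dict.get("mayor", "unknown"))  # unknown: our default

try:
    my_dict["mayor"]
except KeyError as error:
    print("KeyError:", error)           # KeyError: 'mayor'
```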
Changing a value is easy.
- re-write history and show Brasilia as having been founded in 1970.
my_dict = {
"city_name": "Brasilia",
"population": 3500924,
"year_founded": 1960,
"website": "brasilia.df.gov.br"
}
my_dict["year_founded"] = 1970
print(my_dict["year_founded"])
1970
Another way to update a dictionary is with the update() method.
- let's set the record straight
my_dict = {
"city_name": "Brasilia",
"population": 3500924,
"year_founded": 1960,
"website": "brasilia.df.gov.br"
}
my_dict["year_founded"] = 1970
my_dict.update({"year_founded": 1960})
print(my_dict["year_founded"])
1960
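update() earns its keep when we want to change several items in one go; it will also add any keys that are not there yet. A sketch, in which the 'country' key is my own addition:

```python
my_dict = {
    "city_name": "Brasilia",
    "year_founded": 1970,
}

# Overwrites an existing key and adds a new one in a single call
my_dict.update({"year_founded": 1960, "country": "Brazil"})

print(my_dict)
# {'city_name': 'Brasilia', 'year_founded': 1960, 'country': 'Brazil'}
```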
- add a new item to a dictionary as below:
my_dict = {
"city_name": "Brasilia",
"population": 3500924,
"year_founded": 1960,
"website": "brasilia.df.gov.br"
}
my_dict["official_language"] = "Portuguese"
print(my_dict)
{'city_name': 'Brasilia', 'population': 3500924, 'year_founded': 1960, 'website': 'brasilia.df.gov.br', 'official_language': 'Portuguese'}
Finally, to remove a key, we can use the pop() method - just as we did for lists, except that we pass a key rather than an index.
- let's remove the website key/value pair.
my_dict = {
"city_name": "Brasilia",
"population": 3500924,
"year_founded": 1960,
"website": "brasilia.df.gov.br"
}
my_dict["official_language"] = "Portuguese"
my_dict.pop("website")
print(my_dict)
{'city_name': 'Brasilia', 'population': 3500924, 'year_founded': 1960, 'official_language': 'Portuguese'}
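As with lists, pop() returns what it removes. Unlike lists, popping a key that is missing raises a KeyError unless we supply a default as a second argument. A minimal sketch, with a made-up 'mayor' key:

```python
my_dict = {
    "city_name": "Brasilia",
    "website": "brasilia.df.gov.br",
}

removed = my_dict.pop("website")      # returns the removed value
print(removed)                        # brasilia.df.gov.br

missing = my_dict.pop("mayor", None)  # key absent: the default is returned
print(missing)                        # None

print(my_dict)                        # {'city_name': 'Brasilia'}
```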
Conclusion
A collection of pythons might be a scary prospect, but Python Collections are easy to learn and powerful. In truth, each of Python's built-in collection types warrants its own article.