Dictionaries

A dictionary in Python is a collection of key-value pairs. Here's a breakdown of this concept:

  • Key: A unique identifier used to access a corresponding value.

  • Value: The data associated with a key.

  • Key-Value Pair: A combination of a key and its corresponding value.

Consider the key-value pair '4+': 4433. Here:

  • Key: '4+'

  • Value: 4433

Together, they form the key-value pair '4+': 4433.

Dictionary values can be of any data type: strings, integers, floats, Booleans, lists, and even dictionaries.
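The sketch below builds one dictionary whose values use each of these types. The keys and values (an app called 'Example App' and its details) are hypothetical and chosen only to show the syntax.

```python
# A hypothetical app record whose values have different data types.
app = {
    'name': 'Example App',               # string
    'price': 0.99,                       # float
    'downloads': 1000,                   # integer
    'is_free': False,                    # Boolean
    'genres': ['Games', 'Puzzle'],       # list
    'ratings': {'4+': 4433, '9+': 987},  # nested dictionary
}

# Print each key with the type name of its value.
for key, value in app.items():
    print(key, type(value).__name__)
```

The loop prints str, float, int, bool, list, and dict for the six values, in insertion order.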

Create a dictionary

To create a dictionary that maps the content ratings '4+', '9+', '12+', and '17+' to the numbers 4,433, 987, 1,155, and 622, complete the following:

  • Map each content rating to its corresponding number by following a key:value pattern. For instance, to map a rating of '4+' to the number 4,433, we type '4+': 4433 (notice the colon between '4+' and 4433). To map '9+' to 987, we type '9+': 987, and so on.

  • Type the entire sequence of key:value pairs, separating each pair with a comma: '4+': 4433, '9+': 987, '12+': 1155, '17+': 622.

  • Surround the sequence with curly braces: {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
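Typed out in Python, these steps give the following. The variable name content_ratings is an assumption, since the text does not name the dictionary.

```python
# Map each content rating (key) to its corresponding number (value).
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

print(content_ratings)
# {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}
print(len(content_ratings))  # 4 key-value pairs
```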

Retrieve a Value
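A value is retrieved by writing its key in square brackets after the dictionary. A minimal sketch using the content-rating dictionary, redefined under the assumed name content_ratings so the block runs on its own. The dict.get() call goes beyond the text: it is a built-in method that returns a default instead of raising KeyError, and '21+' is a hypothetical missing key.

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

# Square brackets return the value stored under the key.
nine_plus = content_ratings['9+']
print(nine_plus)  # 987

# A key that isn't in the dictionary raises KeyError...
try:
    content_ratings['21+']  # hypothetical missing key
except KeyError as err:
    print('KeyError:', err)  # KeyError: '21+'

# ...whereas dict.get() returns a default value instead.
missing = content_ratings.get('21+', 0)
print(missing)  # 0
```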

Appending a value to a dictionary
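Python dictionaries have no append() method. Instead, assigning a value to a key that doesn't exist yet adds a new key-value pair. A minimal sketch, assuming the name content_ratings and starting with only three of the four ratings:

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155}

# Assigning to a new key adds a key-value pair at the end.
content_ratings['17+'] = 622
print(content_ratings)
# {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

# An empty dictionary can be filled the same way, one pair at a time.
ratings = {}
ratings['4+'] = 4433
ratings['9+'] = 987
print(ratings)  # {'4+': 4433, '9+': 987}
```

If the key already exists, the same assignment replaces its value instead of adding a new pair.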

Limitations of Keys

While values can be almost anything, keys have some restrictions. Keys can be of many data types, including integers, strings, floats, and Booleans. However, lists and dictionaries cannot be used as keys; attempting to do so raises a TypeError. This is because keys must be "hashable": a hashable object has a hash value that never changes during its lifetime. Lists and dictionaries are mutable, so they are not hashable and can't be used as keys. When we populate a dictionary, Python computes an integer hash value for each key (even if the key is not an integer). This is the same value that the built-in hash() function returns.
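The sketch below shows what hash() returns for a few hashable keys and what happens with unhashable ones. The specific integers in the comments are CPython behavior (small integers hash to themselves, and True hashes like 1). String hashes are randomized per interpreter run, so only their type is shown.

```python
# In CPython, small integers hash to themselves and True hashes like 1.
print(hash(4))               # 4
print(hash(True))            # 1
print(hash(3.0) == hash(3))  # True: equal numbers share a hash
# A string also hashes to an integer, but its exact value changes
# between interpreter runs (hash randomization), so only the type is shown.
print(type(hash('4+')))      # <class 'int'>

# Mutable types such as lists and dictionaries are unhashable.
unhashable = []
for bad_key in (['4+'], {'4+': 4433}):
    try:
        {bad_key: 'value'}  # try to use the object as a key
    except TypeError:
        unhashable.append(type(bad_key).__name__)

print(unhashable)  # ['list', 'dict']
```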

Check if a key exists in the dictionary

An expression of the form a_value in a_dictionary always returns a Boolean value:

  • We get True if a_value exists in a_dictionary as a dictionary key.

  • We get False if a_value doesn't exist in a_dictionary as a dictionary key.
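A minimal sketch using the content-rating dictionary, redefined so the block is self-contained. Because in checks keys only, searching the values requires the .values() view, which goes beyond the text above.

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

print('9+' in content_ratings)   # True: '9+' is a key
print(4433 in content_ratings)   # False: 4433 is a value, not a key
print('21+' in content_ratings)  # False: no such key

# To search the values instead, check the dictionary's .values() view.
print(4433 in content_ratings.values())  # True
```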

Updating Dictionary Values
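Values are updated by assigning to an existing key, either directly with = or with an augmented operator such as +=. A minimal sketch; the new numbers are arbitrary and chosen only to show the syntax.

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

# Assigning to an existing key replaces its value.
content_ratings['4+'] = 1
# Augmented assignment updates a value based on its current value.
content_ratings['9+'] += 13    # 987 + 13 -> 1000
content_ratings['12+'] -= 155  # 1155 - 155 -> 1000
content_ratings['17+'] *= 2    # 622 * 2 -> 1244

print(content_ratings)
# {'4+': 1, '9+': 1000, '12+': 1000, '17+': 1244}
```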

Counting with Dictionaries

We can update dictionary values to count how many times each unique content rating occurs in our dataset. Let's start by considering the list ['4+', '4+', '4+', '9+', '9+', '12+', '17+'], which stores a few content ratings. To use code to count how many times each rating occurs in this short list, let's complete the following:

  • Create a dictionary where the keys are the unique content ratings and the values are all 0: {'4+': 0, '9+': 0, '12+': 0, '17+': 0}.

  • Loop through the list ['4+', '4+', '4+', '9+', '9+', '12+', '17+'], and for each iteration, do the following:

    • Check if the iteration variable exists as a key in the previously created dictionary.

    • If it exists, then increment the dictionary value at that key by 1.
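These steps can be sketched as follows. The names ratings, content_ratings, and c_rating are assumptions for illustration.

```python
ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

# Start with every known rating mapped to a count of 0.
content_ratings = {'4+': 0, '9+': 0, '12+': 0, '17+': 0}

for c_rating in ratings:
    # Only count ratings that already exist as keys.
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1

print(content_ratings)  # {'4+': 3, '9+': 2, '12+': 1, '17+': 1}
```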

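If we don't know the keys in advance, we can create them at run time: start with an empty dictionary, and for each item in the list, add it as a new key with a value of 1 if it isn't a key yet, or increment its value by 1 if it is.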
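A sketch of the run-time version, using the same hypothetical names. The dictionary starts empty, and each key is created the first time its rating appears.

```python
ratings = ['4+', '4+', '4+', '9+', '9+', '12+', '17+']

content_ratings = {}  # no keys known in advance
for c_rating in ratings:
    if c_rating in content_ratings:
        content_ratings[c_rating] += 1  # seen before: increment
    else:
        content_ratings[c_rating] = 1   # first occurrence: create the key

print(content_ratings)  # {'4+': 3, '9+': 2, '12+': 1, '17+': 1}
```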

Change Frequencies into Proportions

To change frequencies into proportions or percentages, we can individually modify the values in the dictionary by doing the necessary math. For example, dividing each value in the dictionary by the total number of apps changes the frequencies into proportions, and multiplying each proportion by 100 gives a percentage.
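A minimal sketch that modifies the values in place. It assumes the total number of apps equals the sum of the four counts, which holds only if every app has exactly one of these ratings. The names total_number_of_apps and percentages are hypothetical.

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

# Assumption: every app has exactly one of these four ratings.
total_number_of_apps = sum(content_ratings.values())  # 7197

# Overwrite each frequency with its proportion of the total.
for rating in content_ratings:
    content_ratings[rating] /= total_number_of_apps

# Percentages are proportions multiplied by 100.
percentages = {rating: proportion * 100
               for rating, proportion in content_ratings.items()}

print(content_ratings)
print(percentages)
```

Changing the values while looping over the keys is safe here because no keys are added or removed during the loop.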

Cleansing data (removing duplicates based on a single highest value)

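We have already saved the highest number of reviews for each app in the reviews_max dictionary. Now we use it to keep only the entry with the highest number of reviews for each app, saving those entries in a separate data set.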
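The reviews_max dictionary can be built with one loop over the data set. This sketch assumes each row is a list with the app name at index 0 and the number of reviews, stored as a string, at index 3. The rows and numbers themselves are hypothetical.

```python
# Hypothetical rows: [name, category, rating, reviews]; reviews are strings.
android = [
    ['Box', 'BUSINESS', '4.2', '1000'],
    ['Box', 'BUSINESS', '4.2', '1000'],
    ['Box', 'BUSINESS', '4.2', '1000'],
    ['Example App', 'TOOLS', '4.0', '100'],
    ['Example App', 'TOOLS', '4.1', '250'],
]

reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    # Store the review count if the name is new or the count is higher.
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

print(reviews_max)  # {'Box': 1000.0, 'Example App': 250.0}
```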

  • We start by initializing two empty lists, android_clean and already_added.

  • We loop through the android data set, and for every iteration:

    • We isolate the name of the app and the number of reviews.

    • We add the current row (app) to the android_clean list, and the app name (name) to the already_added list if:

      • The number of reviews of the current app matches the number of reviews of that app as described in the reviews_max dictionary; and

      • The name of the app is not already in the already_added list. We need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for reviews_max[name] == n_reviews, we'll still end up with duplicate entries for some apps.
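Putting these steps together gives the sketch below. It recomputes reviews_max so that it runs on its own, and it uses the same hypothetical row layout (name at index 0, reviews as a string at index 3) with made-up numbers. The three identical Box rows mimic the tie described above.

```python
# Hypothetical rows: [name, category, rating, reviews]; reviews are strings.
android = [
    ['Box', 'BUSINESS', '4.2', '1000'],
    ['Box', 'BUSINESS', '4.2', '1000'],
    ['Box', 'BUSINESS', '4.2', '1000'],
    ['Example App', 'TOOLS', '4.0', '100'],
    ['Example App', 'TOOLS', '4.1', '250'],
]

# Highest number of reviews for each app name.
reviews_max = {}
for app in android:
    name, n_reviews = app[0], float(app[3])
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

android_clean = []  # one row per app
already_added = []  # names of apps already kept

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    # Keep the row only if it has the highest review count for this app
    # and no row for this app has been kept yet (this handles ties).
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(app)
        already_added.append(name)

print(len(android_clean))  # 2
print(already_added)       # ['Box', 'Example App']
```

For large data sets, a set would make the name not in already_added check faster than a list, but a list keeps the sketch close to the steps described.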
