Useful Functions

If we define our own function with the same name as a built-in function, our version shadows the built-in and runs instead of it.
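A minimal sketch of this shadowing behavior. The custom `sum` below is a made-up function used only for illustration; the original stays reachable through the standard `builtins` module.

```python
import builtins

# Hypothetical custom function that reuses the name of the built-in sum()
def sum(values):
    return "custom sum called"

# Our version runs, not the built-in
result = sum([1, 2, 3])
print(result)

# The built-in is still available through the builtins module
original = builtins.sum([1, 2, 3])
print(original)
```

Shadowing like this is usually accidental, so it is safer to pick names that do not collide with built-ins.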

Filtering Non-English Names

The characters commonly used in English text are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.

The function below does this, using the built-in ord() function to find the encoding number of each character:

def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

The function seems to work fine, but some English app names use emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form. To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters:

def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

Sorted

The built-in sorted() function takes an iterable (like a list, dictionary, tuple, etc.) and returns a new list of its elements sorted in ascending order by default. Passing reverse=True sorts in descending order instead. When given a dictionary, it sorts the keys.
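A short sketch of these behaviors, using made-up values:

```python
scores = [7, 2, 9, 4]

# Ascending order is the default
ascending = sorted(scores)
print(ascending)        # [2, 4, 7, 9]

# reverse=True gives descending order
descending = sorted(scores, reverse=True)
print(descending)       # [9, 7, 4, 2]

# The original list is left unchanged
print(scores)           # [7, 2, 9, 4]

# Sorting a dictionary returns its keys, sorted
genres = {"Games": 58, "Education": 4, "Photo": 5}
sorted_keys = sorted(genres)
print(sorted_keys)      # ['Education', 'Games', 'Photo']
```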

Replace Function

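To use str.replace(), we substitute str with the variable name of the string we want to modify. The method returns a new string rather than changing the original, so we assign the result back to the variable. Let's look at an example in code: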

fav_color = "red is my favorite color"
fav_color = fav_color.replace("red", "blue")
print(fav_color)

blue is my favorite color

Capitalize the First Letter of Each Word

The str.title() method returns a copy of the string with the first letter of each word transformed to uppercase (also known as title case).

Let's look at an example of this method in action with a simple string:

The Cool Thing About This String Is That It Has A Combination Of Uppercase And Lowercase Letters!
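The code that produced this output is not shown here. A sketch that reproduces it, assuming an input string with mixed casing (the exact input is an assumption):

```python
# Assumed input: any mix of upper- and lowercase letters gives the same result
my_string = "The cool thing about this string is that it has a CoMbInAtIoN of UPPERCASE and lowercase letters!"

# title() uppercases the first letter of each word and lowercases the rest
my_string_title = my_string.title()
print(my_string_title)
```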

Split strings

three_peat = "1991-1993"
print(three_peat.split("-"))
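The call above returns a list of substrings. A small sketch of what comes back and one way it might be used; the start_year and end_year names are assumptions for illustration:

```python
three_peat = "1991-1993"

# split() returns a list of the pieces between each separator
years = three_peat.split("-")
print(years)                    # ['1991', '1993']

# The pieces are still strings, so convert them before doing arithmetic
start_year = int(years[0])
end_year = int(years[1])
print(end_year - start_year)    # 2
```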

Convert a value in each row to int

for row in moma:
    birth_date = row[3]
    if birth_date != "":
        birth_date = int(birth_date)
    row[3] = birth_date
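The loop above assumes moma is a list of rows loaded elsewhere, with the birth date stored as a string at index 3. A self-contained sketch with two made-up rows shows the effect, including the empty-string case that is left untouched:

```python
# Made-up rows for illustration; index 3 holds the birth date as a string
moma = [
    ["Title A", "Artist A", "American", "1947"],
    ["Title B", "Artist B", "French", ""],
]

for row in moma:
    birth_date = row[3]
    # Only convert non-empty values; int("") would raise a ValueError
    if birth_date != "":
        birth_date = int(birth_date)
    row[3] = birth_date

print(moma[0][3])   # 1947
print(moma[1][3] == "")   # True
```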

Convert to string

decade = str(age)

String Format Function

We call str.format() on a string, which acts as a template, using braces ({}) to mark where we want to insert variables. We then pass those variables as arguments to the method. Let's look at a few examples:

output = "{}'s favorite number is {}".format("Kylie", 8)
print(output)

Kylie's favorite number is 8

str.format() converts the integer to a string. The variables are inserted into the {} in the order we pass them as arguments.

If we want to control the order or reuse an argument more than once, we can put integer positions inside the braces:

output = "{0}'s favorite number is {1}, {1} is {0}'s favorite number".format("Kylie", 8)
print(output)

Kylie's favorite number is 8, 8 is Kylie's favorite number

Example

We can also name the placeholders and pass the values as keyword arguments, as in this function:

def artist_summary(artist):
    num_artworks = artist_freq[artist]
    template = "There are {num} artworks by {name} in the dataset"
    output = template.format(name=artist, num=num_artworks)
    print(output)

artist_summary("Henri Matisse")
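The function above relies on an artist_freq dictionary built elsewhere. A self-contained sketch, assuming made-up counts and a variant that returns the string instead of printing it:

```python
# Made-up frequency table for illustration; real counts come from the dataset
artist_freq = {
    "Henri Matisse": 10,
    "Pablo Picasso": 12,
}

def artist_summary(artist):
    num_artworks = artist_freq[artist]
    template = "There are {num} artworks by {name} in the dataset"
    # Keyword arguments are matched to placeholders by name, not by order
    return template.format(name=artist, num=num_artworks)

summary = artist_summary("Henri Matisse")
print(summary)
```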

Formatting Numbers Inside Strings

Another powerful use of str.format() is applying formatting to numbers as they are inserted into the string. This can make our data more readable, especially in the case of long decimal numbers. Let's look at a quick example:

num = 32.554865
print("I own {pct}% of the company".format(pct=num))

I own 32.554865% of the company

In most cases, showing six digits after the decimal point (the number's precision) is unnecessary. Instead of a precision of six, we might want to show a precision of two:

I own 32.55% of the company

We specify number formatting, including things like precision, by adding a format specification inside the braces ({}) of our string. The format specification section of the Python documentation covers many options and can be hard to follow, so we'll focus on the most common ones you'll need.

To indicate the precision of two, we specify :.2f after the name or position of our argument:
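As a minimal sketch, applying that specifier to the same value used earlier (redefined here so the block runs on its own) gives the two-decimal output:

```python
num = 32.554865

# {pct:.2f} formats the named argument with two digits after the decimal point
output = "I own {pct:.2f}% of the company".format(pct=num)
print(output)   # I own 32.55% of the company
```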

If you are not specifying a named/positional argument, you just leave that part out:
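A small sketch of the same formatting without a name or position. The value is redefined here so the block runs on its own:

```python
num = 32.554865

# With no name or position, the specification starts directly with the colon
output = "I own {:.2f}% of the company".format(num)
print(output)   # I own 32.55% of the company
```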

Another useful format specification is to add a comma as a thousands separator, which prevents large numbers from being hard to read, as in the example below:

print("The approximate population of {0} is {1}".format("India", 1324000000))

The approximate population of India is 1324000000

To add a comma, you would use the syntax :, inside the braces, after the position or name of the argument you're inserting:
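A minimal sketch of the thousands separator, applied to the population example:

```python
# {1:,} formats the second positional argument with comma separators
output = "The approximate population of {0} is {1:,}".format("India", 1324000000)
print(output)   # The approximate population of India is 1,324,000,000
```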

We can also combine the thousands separator and the precision by specifying them in this order:
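A short sketch of the combined form, with the comma placed before the precision. The balance value and the sentence are made up for illustration:

```python
balance = 12345.678

# The comma comes first, then the precision: {:,.2f}
output = "Your bank balance is {:,.2f}".format(balance)
print(output)   # Your bank balance is 12,345.68
```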

Example

pop_millions = [
    ["China", 1379.302771],
    ["India", 1281.935991],
    ["USA",  326.625791],
    ["Indonesia",  260.580739],
    ["Brazil",  207.353391],
]

template = "The population of {} is {:,.2f} million"

for country in pop_millions:
    name = country[0]
    pop = country[1]
    output = template.format(name, pop)
    print(output)

The population of China is 1,379.30 million
The population of India is 1,281.94 million
The population of USA is 326.63 million
The population of Indonesia is 260.58 million
The population of Brazil is 207.35 million

