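If we define our own function with the same name as a built-in function, Python calls our function instead of the built-in one. Our definition shadows the built-in name, so it's best to avoid reusing names like len, sum, or sorted.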
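A minimal sketch of shadowing, using len as the example name. The function body is invented for illustration; the builtins module is the standard way to reach the original built-in while it is shadowed.

```python
import builtins

# Our own function reuses the built-in name "len".
def len(obj):
    return "my len was called"

shadowed = len([1, 2, 3])
print(shadowed)                 # my len was called

# The original built-in is still reachable through the builtins module.
print(builtins.len([1, 2, 3]))  # 3

# Deleting our definition makes the name refer to the built-in again.
del len
print(len([1, 2, 3]))           # 3
```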
Filtering Non-English Names
The characters commonly used in English text are encoded using the ASCII standard. Each ASCII character is associated with a number between 0 and 127, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.
We build this function below using the built-in ord() function, which returns the number (Unicode code point) that encodes a character. For ASCII characters, this is the same as the ASCII number.
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True
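A quick check of ord() and of this first version of is_english(). The app names are illustrative inputs; the second one shows an English name rejected because of a single ™ symbol.

```python
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

print(ord("a"))  # 97
print(ord("™"))  # 8482, well outside the ASCII range

print(is_english("Instagram"))                      # True
print(is_english("Docs To Go™ Free Office Suite"))  # False, even though the name is English
```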
The function seems to work fine, but some English app names use emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form. To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters:
def is_english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    if non_ascii > 3:
        return False
    else:
        return True
Sorted
The built-in sorted() function takes an iterable (such as a list, dictionary, or tuple) and returns a new list containing its elements in sorted order. The order is ascending by default; passing reverse=True sorts in descending order. The original iterable is left unchanged, and when a dictionary is passed, its keys are what get sorted.
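A minimal sketch of sorted() on a list and a dictionary; the variable names and values are illustrative.

```python
numbers = [3, 1, 2]
ascending = sorted(numbers)                  # [1, 2, 3]
descending = sorted(numbers, reverse=True)   # [3, 2, 1]
print(ascending, descending)
print(numbers)                               # [3, 1, 2], the original list is unchanged

# For a dictionary, sorted() returns a list of its keys in order.
ratings = {"b": 2, "c": 3, "a": 1}
keys_in_order = sorted(ratings)
print(keys_in_order)                         # ['a', 'b', 'c']
```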
Replace Function
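The str.replace() method returns a copy of a string in which every occurrence of one substring is replaced by another. Because strings can't be changed in place, we usually assign the result back to a variable. When calling str.replace(), we substitute str with the name of the variable holding the string we want to modify. Let's look at an example in code: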
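A minimal sketch with an illustrative string. It also uses the optional third argument of str.replace(), which limits how many occurrences are replaced.

```python
text = "red fish, red fish"

all_replaced = text.replace("red", "blue")       # every match is replaced
first_only = text.replace("red", "blue", 1)      # only the first match is replaced

print(all_replaced)  # blue fish, blue fish
print(first_only)    # blue fish, red fish
print(text)          # red fish, red fish (the original string is unchanged)
```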
Capitalize the First Letter of Each Word
Split strings
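A minimal sketch of splitting strings, assuming the str.split() method is what this heading refers to. It returns a list of the pieces between separators; with no argument, it splits on runs of whitespace. The strings are illustrative.

```python
line = "red,green,blue"
colors = line.split(",")
print(colors)  # ['red', 'green', 'blue']

sentence = "split on   any whitespace"
words = sentence.split()
print(words)   # ['split', 'on', 'any', 'whitespace']
```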
Convert a value in each row to int
Convert to string
String Format Function
The str.format() method is called on a string that acts as a template, in which pairs of braces ({}) mark where we want to insert values. We then pass those values as arguments to the method. Let's look at a few examples:
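A minimal sketch with illustrative values: each {} is filled by the next argument, and non-string values such as integers and floats are converted for us.

```python
sum_text = "{} + {} = {}".format(2, 3, 2 + 3)
print(sum_text)    # 2 + 3 = 5

price_text = "{} costs {}".format("Coffee", 2.5)
print(price_text)  # Coffee costs 2.5
```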
str.format() converts non-string arguments, such as integers, to strings for us. The values fill the {} placeholders in the order we pass them as arguments.
If we want to control the order of the inserted values, or insert the same value more than once, we can put each argument's position (counting from 0) inside the braces:
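A minimal sketch of positional placeholders: {0} can be reused, and positions let us insert arguments in a different order than we pass them.

```python
word = "{0}{1}{0}".format("abra", "cad")
print(word)     # abracadabra

swapped = "{1}, {0}".format("first", "second")
print(swapped)  # second, first
```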
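We can also name the placeholders, as in "{name} is {age}", and pass the values as keyword arguments with matching names.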
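A minimal sketch of named placeholders, reusing the Kylie example values. Keyword arguments can be passed in any order, because each value goes to the placeholder with the same name.

```python
template = "{name}'s favorite number is {number}"
output = template.format(number=8, name="Kylie")
print(output)  # Kylie's favorite number is 8
```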
Formatting Numbers Inside Strings
Another powerful use of str.format() is formatting numbers as they are inserted into the string. This makes our data more readable, especially with long decimal numbers. Let's look at a quick example:
In most cases, showing six digits after the decimal point (the number of digits there is called the precision) is unnecessary. Instead of a precision of 6, we might want to show a precision of 2:
We control number formatting, including precision, by adding a format specification inside the braces ({}) of our string. Python's format specification mini-language supports many options, so we'll focus only on the most common ones.
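To set a precision of two, we add :.2f after the name or position of the argument, as in {pct:.2f}. The .2 sets the number of digits after the decimal point, and the f formats the value as a fixed-point number: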
If you aren't specifying a named or positional argument, leave that part out and write just {:.2f}:
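A minimal sketch of the precision specification in all three placeholder styles, using the number from the example above.

```python
num = 32.554865

unnamed = "{:.2f}".format(num)            # no name or position
named = "{pct:.2f}%".format(pct=num)      # with a name
one_digit = "{0:.1f}".format(num)         # with a position and a precision of 1

print(unnamed)    # 32.55
print(named)      # 32.55%
print(one_digit)  # 32.6
```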
Another useful format specification adds a comma as a thousands separator, which keeps large numbers, like the one in the example below, from being hard to read:
To add a comma, use :, inside the braces, after the position or name of the value you're inserting, as in {1:,}:
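A minimal sketch of the thousands separator with no label, a position, and a name, using the population figure from the example in these notes.

```python
population = 1324000000

plain = "{:,}".format(population)
positional = "{0:,}".format(population)
named = "{pop:,}".format(pop=population)

print(plain)       # 1,324,000,000
print(positional)  # 1,324,000,000
print(named)       # 1,324,000,000
```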
We can also combine the thousands separator and the precision by writing them in that order, comma first and then precision, as in {:,.2f}:
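A minimal sketch of the combined specification on an illustrative value, with two different precisions.

```python
value = 1234567.891

two_places = "{:,.2f}".format(value)   # comma separator, then 2 decimal places
no_places = "{:,.0f}".format(value)    # comma separator, rounded to a whole number

print(two_places)  # 1,234,567.89
print(no_places)   # 1,234,568
```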
fav_color = "red is my favorite color"
fav_color = fav_color.replace("red", "blue")
print(fav_color)
blue is my favorite color
The str.title() method returns a copy of the string in which the first letter of each word is uppercase and the remaining letters are lowercase (also known as title case).
Let's look at an example of this method in action with a simple string:
The Cool Thing About This String Is That It Has A Combination Of Uppercase And Lowercase Letters!
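A separate minimal sketch with inputs chosen for illustration. The last line shows a known quirk: title() treats an apostrophe as a word boundary, so the letter after it is capitalized.

```python
simple = "hello world".title()
mixed = "tHe COOL thing".title()
quirk = "they're bill's friends".title()

print(simple)  # Hello World
print(mixed)   # The Cool Thing
print(quirk)   # They'Re Bill'S Friends
```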
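# moma is assumed to be a list of rows loaded earlier; row[3] holds the birth date as a string
for row in moma:
    birth_date = row[3]
    # only convert non-empty values, since int("") raises a ValueError
    if birth_date != "":
        birth_date = int(birth_date)
    row[3] = birth_date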
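A self-contained sketch of the same loop, using two hypothetical rows in place of the moma dataset. It also shows why the loop checks for empty strings first: int("") raises a ValueError.

```python
# Hypothetical rows standing in for the moma dataset; index 3 holds a birth year as a string.
rows = [
    ["Artwork A", "Artist A", "Nationality A", "1869"],
    ["Artwork B", "Artist B", "Nationality B", ""],
]

# int() cannot parse an empty string, which is why the loop checks for "" first.
try:
    int("")
except ValueError as error:
    print("ValueError:", error)

for row in rows:
    birth_date = row[3]
    if birth_date != "":
        birth_date = int(birth_date)
    row[3] = birth_date

print(rows[0][3], type(rows[0][3]))  # 1869 <class 'int'>
print(repr(rows[1][3]))              # '' (empty values stay as strings)
```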
# assumes age holds a number computed earlier; str() converts it to a string
decade = str(age)
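A minimal sketch of why the str() conversion is needed, with a hypothetical birth year: Python won't concatenate an int and a str directly, so the number is converted first.

```python
birth_year = 1993                  # hypothetical value
decade = birth_year // 10 * 10     # 1990, still an int

# Concatenating an int with a str raises TypeError...
try:
    label = decade + "s"
except TypeError:
    print("TypeError: convert the int with str() first")

# ...so we convert the number to a string before concatenating.
label = str(decade) + "s"
print(label)  # 1990s
```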
output = "{}'s favorite number is {}".format("Kylie", 8)
print(output)
Kylie's favorite number is 8
output = "{0}'s favorite number is {1}, {1} is {0}'s favorite number".format("Kylie", 8)
print(output)
Kylie's favorite number is 8, 8 is Kylie's favorite number
# artist_freq is assumed to be a dictionary, built earlier, that maps artist names to artwork counts
def artist_summary(artist):
    num_artworks = artist_freq[artist]
    template = "There are {num} artworks by {name} in the dataset"
    output = template.format(name=artist, num=num_artworks)
    print(output)

artist_summary("Henri Matisse")
num = 32.554865
print("I own {pct}% of the company".format(pct=num))
I own 32.554865% of the company
print("I own {pct:.2f}% of the company".format(pct=num))
I own 32.55% of the company
print("The approximate population of {0} is {1}".format("India",1324000000))
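The approximate population of India is 1324000000
print("The approximate population of {0} is {1:,}".format("India",1324000000))
The approximate population of India is 1,324,000,000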
pop_millions = [
["China", 1379.302771],
["India", 1281.935991],
["USA", 326.625791],
["Indonesia", 260.580739],
["Brazil", 207.353391],
]
template = "The population of {} is {:,.2f} million"
for country in pop_millions:
    name = country[0]
    pop = country[1]
    output = template.format(name, pop)
    print(output)
The population of China is 1,379.30 million
The population of India is 1,281.94 million
The population of USA is 326.63 million
The population of Indonesia is 260.58 million
The population of Brazil is 207.35 million