And what if we need to split columns instead? Here's an efficient way to split one column into two columns using the first space character in a data entry:
# Getting first name from the 'name' column
clients['f_name'] = clients['name'].str.split(' ', n=1, expand=True)[0]
# Getting last name from the 'name' column
clients['l_name'] = clients['name'].str.split(' ', n=1, expand=True)[1]
Now we save the first part of the name as the f_name column and the second part of the name as a separate l_name column.
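To see how splitting on the first space behaves with names that have several parts or no space at all, here is a minimal sketch. The three sample names are made up for illustration and are not part of the dataset used in this article; passing n=1 limits the split to the first space.

```python
import pandas as pd

# Hypothetical sample data, not the article's dataset
clients = pd.DataFrame({"name": ["James Butt", "Mary Ann Smith", "Cher"]})

# n=1 splits on the first space only, so at most two columns are produced
parts = clients["name"].str.split(" ", n=1, expand=True)
clients["f_name"] = parts[0]
clients["l_name"] = parts[1]  # missing when the name contains no space

print(clients)
```

Everything after the first space ends up in l_name, and a single-word name leaves l_name missing, so it may be worth checking for such rows before relying on the result.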
Often, you might need to join several columns with a specific separator. Here's an easy way to do this:
# Joining columns with first and last name
clients['name'] = clients['first_name'] + ' ' + clients['last_name']
As you can see, we combined the first_name and last_name columns into the name column, where the first and last names are separated by a space.
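One thing to watch for is missing data: with the + operator, a missing value in either column makes the combined value missing too. The sketch below, which uses made-up sample rows, compares that behavior with Series.str.cat() and its na_rep parameter as one possible alternative.

```python
import pandas as pd

# Hypothetical sample data with one missing last name
clients = pd.DataFrame({
    "first_name": ["James", "Josephine", "Art"],
    "last_name": ["Butt", None, "Venere"],
})

# Plain concatenation: a missing value on either side makes the result missing
plus_result = clients["first_name"] + " " + clients["last_name"]

# str.cat with na_rep substitutes an empty string for missing values;
# str.strip() then removes the dangling separator
cat_result = clients["first_name"].str.cat(
    clients["last_name"], sep=" ", na_rep=""
).str.strip()

print(plus_result)
print(cat_result)
```

Which behavior you want depends on the task: a missing result makes incomplete records easy to spot, while na_rep keeps whatever part of the name is available.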
There's a standard way to get a list of unique values for a particular column: clients['state'].unique(). However, if you have a huge dataset with millions of entries, you might also try the following option, which returns a Series and can be chained directly with sort_values(). Which of the two is faster depends on your data and pandas version, so it is worth timing both:
# Checking unique values
clients['state'].drop_duplicates(keep="first", inplace=False).sort_values()
This way, you drop all the duplicates and keep only the first occurrence of each value. We've also sorted the results to check that each state is indeed mentioned only once.
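Relative speed can depend on the data, the column's dtype, and the pandas version, so it helps to measure rather than assume. Here is a minimal sketch for doing that, assuming a synthetic Series of state codes (the `states` variable and its contents are made up for illustration).

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 200,000 state codes drawn from a small set
rng = np.random.default_rng(0)
states = pd.Series(rng.choice(["CA", "NY", "TX", "FL", "OH"], size=200_000))

# Both approaches should yield the same set of values
via_unique = sorted(states.unique())
via_drop = states.drop_duplicates().sort_values().tolist()

# Time each approach a few times; the numbers will differ between machines
t_unique = timeit.timeit(lambda: states.unique(), number=5)
t_drop = timeit.timeit(lambda: states.drop_duplicates(), number=5)
print(f"unique(): {t_unique:.3f}s, drop_duplicates(): {t_drop:.3f}s")
```

Running the same comparison on your own column is the most reliable way to pick between the two.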
For illustration, I'm going to use a synthetic dataset with the contact information of 500 fictitious subjects from the US. Let's imagine that this is our client base. Here's what the dataset looks like:
As you can see, it includes information on each person's first name, last name, company name, address, city, county, state, zip code, phone numbers, email, and web address. Our first task is to check for missing data. You can use clients.info() to get an overview of the number of complete entries in each of the columns. However, if you want a clearer picture, here's how you can get the percentage of missing entries for each of the features in descending order:
# Getting percentage of missing data for each column
(clients.isnull().sum()/clients.isnull().count()).sort_values(ascending=False)
As you may recall, isnull() returns a DataFrame of True and False values that indicate whether a given entry is missing or present, respectively. In addition, True is treated as 1 and False as 0 when we pass this boolean object to mathematical operations. Thus, clients.isnull().sum() gives us the number of missing values in each of the columns (the number of True values), while clients.isnull().count() is the total number of values in each column. After we divide the first value by the second and sort our results in descending order, we get the share of missing entries for each column as a fraction between 0 and 1 (multiply by 100 for a percentage), starting with the column that has the most missing values. In our example, we see that the second phone number is missing for 51.6% of our clients.
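Because the mean of a boolean column equals the share of True values, the same result can be obtained with isnull().mean(). The sketch below shows this on a small made-up table; the column names and values are hypothetical and only loosely mirror the client dataset.

```python
import pandas as pd

# Hypothetical sample data, not the article's dataset
clients = pd.DataFrame({
    "first_name": ["James", "Josephine", "Art", "Lenna"],
    "phone2": [None, "810-374-9840", None, None],
    "email": ["jbutt@example.com", None, "art@example.com", "lenna@example.com"],
})

# The mean of a boolean column is the fraction of True values
missing_share = clients.isnull().mean().sort_values(ascending=False)

# Multiply by 100 to express the shares as percentages
print((missing_share * 100).round(1))
```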
How would you print a numbered list of the world's richest people? Maybe you'd consider something like this:
# Inefficient way to get numbered list
the_richest = ['Jeff Bezos', 'Bill Gates', 'Warren Buffett', 'Bernard Arnault & family', 'Mark Zuckerberg']
i = 0
for person in the_richest:
    print(i, person)
    i += 1
However, you can do the same with less code using the enumerate() function:
# Efficient way to get numbered list
the_richest = ['Jeff Bezos', 'Bill Gates', 'Warren Buffett', 'Bernard Arnault & family', 'Mark Zuckerberg']
for i, person in enumerate(the_richest):
    print(i, person)
The enumerate() function can be very useful when you need to iterate through a list while keeping track of the list items' indices.
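By default the numbering starts at 0, which is rarely what you want in a ranking. As a small sketch, the optional start argument of enumerate() shifts the first index; the shortened list here is only for brevity.

```python
the_richest = ['Jeff Bezos', 'Bill Gates', 'Warren Buffett']

# start=1 makes the numbering begin at 1 instead of the default 0
numbered = [f"{rank}. {person}" for rank, person in enumerate(the_richest, start=1)]

for line in numbered:
    print(line)
```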
Now, how would you proceed if you needed to combine several lists with the same length and print out the result? You could loop over the indices, but zip() offers a more generic and "Pythonic" way to get the desired result. Compare the two approaches:
# Inefficient way to combine two lists
the_richest = ['Jeff Bezos', 'Bill Gates', 'Warren Buffett', 'Bernard Arnault & family', 'Mark Zuckerberg']
fortune = ['$112 billion', '$90 billion', '$84 billion', '$72 billion', '$71 billion']
for i in range(len(the_richest)):
    person = the_richest[i]
    amount = fortune[i]
    print(person, amount)

# Efficient way to combine two lists
the_richest = ['Jeff Bezos', 'Bill Gates', 'Warren Buffett', 'Bernard Arnault & family', 'Mark Zuckerberg']
fortune = ['$112 billion', '$90 billion', '$84 billion', '$72 billion', '$71 billion']
for person, amount in zip(the_richest, fortune):
    print(person, amount)
Possible applications of the zip() function include all the scenarios that require mapping of groups (e.g., employees and their wage and department info, or students and their marks).
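Two related details are worth knowing when mapping groups this way: zip() silently stops at the shortest input, and dict(zip(...)) turns two lists into a lookup table. The sketch below illustrates both, using a deliberately shortened fortune list and itertools.zip_longest() as one way to keep unmatched items.

```python
from itertools import zip_longest

the_richest = ['Jeff Bezos', 'Bill Gates', 'Warren Buffett']
fortune = ['$112 billion', '$90 billion']  # deliberately one item short

# dict(zip(...)) builds a lookup table; zip() stops at the shortest input
fortune_by_person = dict(zip(the_richest, fortune))

# zip_longest() keeps every item and fills the gaps with a placeholder
pairs = list(zip_longest(the_richest, fortune, fillvalue='unknown'))

print(fortune_by_person)
print(pairs)
```

If a length mismatch would indicate a bug in your data, it is safer to compare the lengths explicitly before zipping.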
Swapping two variables is one of the first programs every new programmer learns. The most common way is to use a third, temporary variable. However, there are other ways to swap two values, such as tuple unpacking.
# Swapping with a temporary variable
a = 5
b = 6
c = a
a = b
b = c

# Swapping with tuple unpacking
a, b = 5, 6
a, b = b, a
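The tuple-based swap works because Python evaluates the whole right-hand side before assigning anything. As a short sketch, the same idea extends to more than two names and to list elements (the variable names here are purely illustrative).

```python
a, b, c = 1, 2, 3
# The right-hand side is evaluated first, then unpacked left to right
a, b, c = c, a, b

items = ['x', 'y', 'z']
# Swap the first and last elements in place
items[0], items[-1] = items[-1], items[0]

print(a, b, c)
print(items)
```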
import webbrowser

# Open a URL in the default web browser
url = r'https://www.codefires.com'
webbrowser.open(url)
import random

# Generate a random integer between the lower and upper limits (both inclusive)
random1 = random.randint(7, 777)
print(random1)

# Generate a random integer from a stepped range:
# randrange(start, stop, step) excludes stop
random2 = random.randrange(7, 777, 10)
print(random2)
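Two details are easy to miss here: randint() includes both endpoints, while randrange() excludes the stop value, just like range(). The sketch below checks both on many draws and uses random.seed() with an arbitrary value so that repeated runs produce the same sequence.

```python
import random

random.seed(42)  # arbitrary fixed seed so repeated runs give the same sequence

# randint(a, b) includes both endpoints
draws = [random.randint(1, 3) for _ in range(1000)]

# randrange(start, stop, step) excludes stop, like range()
steps = [random.randrange(7, 777, 10) for _ in range(1000)]

print(sorted(set(draws)))
print(min(steps), max(steps))
```

Seeding is handy for tests and tutorials; leave it out when you want different numbers on every run.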
import os

# Get the current working directory
dirpath = os.getcwd()
print(dirpath)
D:\PythonProjects\tips and tricks>
Note: To change the working directory, use the chdir() function of the os module: os.chdir(r'your directory path')
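Since os.chdir() affects the whole process, it is often a good idea to remember the original directory and switch back when you are done. A minimal sketch, assuming a temporary directory from the tempfile module as a stand-in for your own path:

```python
import os
import tempfile

original_dir = os.getcwd()

# Use a temporary directory so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)
    # samefile() compares the actual locations, ignoring symlinks in the path
    changed = os.path.samefile(os.getcwd(), tmp_dir)
    # Switch back before the temporary directory is removed
    os.chdir(original_dir)

print(changed)
print(os.getcwd() == original_dir)
```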