Python

Table of Contents

Pandas

Pandas Conditonal Column

A conditional column is a column in a dataframe that is generated based on other column/s. There are multiple way of doing this in Python but one of the quickest ways is to use the np.where() function. This function is basically the equivalent of an if statement in Excel.

np.where(condition,True action, False action)

1
2
3
4
5
6
7
8
9
#import the two libraries
import numpy as np
import pandas as pd

#generate a pandas dataframe of one column with a 100 random integers
df = pd.DataFrame(np.random.randint(low=0,high = 100,size = (100,1)),columns = ['col1'])

#Generate a conditional column that checks if the number is even or odd
df['odd/even'] = np.where(df['col1'] % 2 == 1, 'odd', 'even')

If we print the first 5 rows of our data frame we get :

   col1 odd/even
0    87      odd
1    85      odd
2    22     even
3    80     even
4     0     even

Applying functions to Pandas columns

We want to manipulate all the values in one column with a function with created. It’s actually quite simple as Pandas series have an .apply method. An example below

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
#import the two libraries
import numpy as np
import pandas as pd

#generate a pandas dataframe of one column with a 100 random integers
df = pd.DataFrame(np.random.randint(low=0,high = 100,size = (100,1)),columns = ['col1'])

#create a function that multiplies a number by two
def multiply_by_2(number):
    val = number * 2
    return val

#test it on single value
print(multiply_by_2(2))

4

#Create a second column that is based on multiplying the first column by 2
df['col2'] = df['col1'].apply(multiply_by_2)

If we print the first 5 rows of our data frame we get :

1
2
3
4
5
6
       col1  col2
    0    81   162
    1    26    52
    2     6    12
    3    80   160
    4    35    70

Converting strings to date or datetime

It would be nice if the whole word worked with ISO dates. Sadly that is not the case, if you’ve read in a dataframe and want to convert a column to date time. For example, you’ve read in the data and the dataframe looks like this:

1
2
3
4
5
6
         date         col2
    0    01/01/2021   162
    1    02/01/2021   123
    2    03/01/2021   87
    3    04/01/2021   170

We can simply convert this to date using pandas.to_datetime function. You simply pass the series and format you need to convert

1
df['date'] = pd.to_datetime(df['date'],format='%d/%m/%Y')

To determine the format, you can use this table

CodeMeaningCodeMeaning
%aAbbreviated weekday%AFull weekday
%bAbbreviated month%BFull month
%cLocale-specific date and time%dDecimal date
%HDecimal hours (24 hour)%IDecimal hours (12 hour)
%jDecimal day of the year%mDecimal month
%MDecimal minute%pLocale-specific AM/PM
%SDecimal second%UDecimal week of the year (starting on Sunday)
%wDecimal Weekday (0=Sunday)%WDecimal week of the year (starting on Monday)
%xLocale-specific Date%XLocale-specific Time
%y2-digit year%Y4-digit year
%zOffset from GMT%ZTime zone (character)