Create Pandas DataFrame with Random Data
Sometimes you need to create a Pandas DataFrame with random data for data analysis and exploration.
This article describes three methods for creating a Pandas DataFrame with random data.
Method 1
The following example demonstrates how to create a Pandas DataFrame with a different kind of random values in each column.
# load packages
import pandas as pd
import numpy as np
# set random seed for reproducibility
np.random.seed(42)
# create random pandas dataframe
df = pd.DataFrame({'col1': np.random.rand(3),
                   'col2': np.random.randint(1, 10, 3),
                   'col3': np.random.randn(3)})
df
       col1  col2      col3
0  0.374540     5 -0.094621
1  0.950714     7 -0.928828
2  0.731994     3 -0.885230
In the above example, we have created a Pandas DataFrame with three columns.
np.random.rand(3) creates three random values between 0 and 1. np.random.randint(1, 10, 3) creates three random integers from 1 to 9 (the upper bound, 10, is excluded). np.random.randn(3) creates three random values from a standard normal distribution.
This example creates a Pandas DataFrame with 3 rows and 3 columns, but you can adjust the size and structure of the DataFrame as needed.
You can also create large random pandas DataFrames as described in this article.
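To change the number of rows, one option is to wrap the same NumPy calls in a small helper. The sketch below is only an illustration: the function name make_random_df and its parameters n_rows and seed are hypothetical and are not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def make_random_df(n_rows, seed=42):
    """Hypothetical helper: build a random DataFrame with n_rows rows."""
    np.random.seed(seed)  # same seed gives the same data on every call
    return pd.DataFrame({
        'col1': np.random.rand(n_rows),            # uniform floats in [0, 1)
        'col2': np.random.randint(1, 10, n_rows),  # integers from 1 to 9
        'col3': np.random.randn(n_rows),           # standard normal floats
    })


df_large = make_random_df(1000)
print(df_large.shape)  # (1000, 3)
```

Because the seed is set inside the helper, calling it twice with the same arguments returns identical DataFrames.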
Method 2
The following example demonstrates how to create a Pandas DataFrame with the same type of values in all columns.
# load packages
import pandas as pd
import numpy as np
# set random seed for reproducibility
np.random.seed(42)
# create random pandas dataframe
df = pd.DataFrame(np.random.randint(0, 50, size=(5, 3)),
                  columns=['col1', 'col2', 'col3'])
df
   col1  col2  col3
0    38    28    14
1    42     7    20
2    38    18    22
3    10    10    23
4    35    39    23
In the above example, we have created a Pandas DataFrame of integer values with three columns. np.random.randint(0, 50, size=(5, 3)) creates a 5x3 array of random integers from 0 to 49 (the upper bound, 50, is excluded).
This example creates a Pandas DataFrame with 5 rows and 3 columns, but you can adjust the size and structure of the DataFrame as needed.
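As a variation on this method, the sketch below generates the column names from the number of columns and uses NumPy's Generator interface (np.random.default_rng) instead of np.random.seed. The variable names n_rows, n_cols, and rng and the 5x4 shape are illustrative assumptions. A seeded Generator produces a different sequence than np.random.seed(42), so the values will not match the output shown above.

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 5, 4  # illustrative shape, change as needed

# Generator-based API; the seed value is arbitrary
rng = np.random.default_rng(42)

# integers() excludes the upper bound by default, like randint()
data = rng.integers(0, 50, size=(n_rows, n_cols))

# build column names col1..colN to match the number of columns
columns = [f'col{i + 1}' for i in range(n_cols)]

df = pd.DataFrame(data, columns=columns)
print(df.shape)  # (5, 4)
```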
Method 3
You can also add a categorical variable while creating a random Pandas DataFrame.
# load packages
import pandas as pd
import numpy as np
# set random seed for reproducibility
np.random.seed(42)
# create random pandas dataframe
df = pd.DataFrame({'col1': np.random.rand(3),
                   'col2': np.random.randint(1, 10, 3),
                   'col3': np.random.choice(['a', 'b'], size=3),
                   })
df
       col1  col2 col3
0  0.374540     5    a
1  0.950714     7    a
2  0.731994     3    a
In the above example, we have created a Pandas DataFrame of numerical values and categorical values.
np.random.rand(3) creates three random values between 0 and 1. np.random.randint(1, 10, 3) creates three random integers from 1 to 9 (the upper bound, 10, is excluded). np.random.choice(['a', 'b'], size=3) creates three categorical values that are randomly chosen from the given list.
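np.random.choice returns ordinary strings, so the resulting column does not use pandas' category dtype. The sketch below shows one way to convert it, and also uses the p argument of np.random.choice to draw the labels with unequal probabilities. The 100-row size, the 0.8/0.2 weights, and the name df_cat are illustrative assumptions.

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# p gives the probability of each label; 'a' is drawn about 80% of the time
labels = np.random.choice(['a', 'b'], size=100, p=[0.8, 0.2])

df_cat = pd.DataFrame({
    'col1': np.random.rand(100),
    'col3': labels,
})

# convert the string column to pandas' category dtype
df_cat['col3'] = df_cat['col3'].astype('category')

print(df_cat['col3'].dtype)           # category
print(df_cat['col3'].value_counts())  # counts per label
```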