Create Pandas DataFrame with Random Data

Sometimes you need to create a Pandas DataFrame with random data for data analysis and exploration.

This article describes three methods for creating a Pandas DataFrame with random data.

Method 1

The following example demonstrates how to create a Pandas DataFrame with a different kind of random values in each column.

# load packages
import pandas as pd
import numpy as np

# set random seed for reproducibility
np.random.seed(42)

# create random pandas dataframe
df = pd.DataFrame({'col1': np.random.rand(3),
                   'col2': np.random.randint(1, 10, 3),
                   'col3': np.random.randn(3)})
df

       col1  col2      col3
0  0.374540     5 -0.094621
1  0.950714     7 -0.928828
2  0.731994     3 -0.885230

In the above example, we have created a Pandas DataFrame with three columns.

np.random.rand(3) creates three random values drawn uniformly from the interval [0, 1). np.random.randint(1, 10, 3) creates three random integers from 1 to 9 (the upper bound, 10, is excluded). np.random.randn(3) creates three random values from a standard normal distribution.
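The value ranges of these three functions are easier to see on a larger sample. The following is a minimal sketch, not part of the original example: the n_rows variable and its value of 1000 are assumptions chosen only for illustration.

```python
# load packages
import numpy as np
import pandas as pd

# set random seed for reproducibility
np.random.seed(42)

# hypothetical sample size, larger than the 3 rows used above
n_rows = 1000

df = pd.DataFrame({'col1': np.random.rand(n_rows),
                   'col2': np.random.randint(1, 10, n_rows),
                   'col3': np.random.randn(n_rows)})

# rand() draws from the half-open interval [0, 1)
print(((df['col1'] >= 0) & (df['col1'] < 1)).all())

# randint(1, 10, ...) returns integers from 1 to 9; 10 never appears
print(df['col2'].min() >= 1 and df['col2'].max() <= 9)

# col1 and col3 are floats, col2 is an integer column
print(df.dtypes)
```

Both checks print True, because the upper bound is excluded in rand() and randint().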

This example creates a Pandas DataFrame with 3 rows and 3 columns, but you can adjust the size and structure of the DataFrame to suit your requirements.
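One way to adjust the number of rows is to keep it in a single variable. The sketch below also swaps in NumPy's newer Generator interface (np.random.default_rng), which is a different technique from the np.random.seed approach used in this article. The n_rows variable is an assumption for illustration, and the generated values will differ from the output shown above even with the same seed.

```python
# load packages
import numpy as np
import pandas as pd

# seeded Generator object instead of the global np.random.seed
rng = np.random.default_rng(42)

# hypothetical number of rows; change this to resize the DataFrame
n_rows = 5

df = pd.DataFrame({'col1': rng.random(n_rows),                # uniform on [0, 1)
                   'col2': rng.integers(1, 10, size=n_rows),  # integers from 1 to 9
                   'col3': rng.standard_normal(n_rows)})      # standard normal
print(df.shape)  # (5, 3)
```

Because the generator is a local object, other code that calls np.random functions does not change the values it produces.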

You can also create large random Pandas DataFrames as described in this article.

Method 2

The following example demonstrates how to create a Pandas DataFrame with the same type of values in all columns.

# load packages
import pandas as pd
import numpy as np

# set random seed for reproducibility
np.random.seed(42)

# create random pandas dataframe
df = pd.DataFrame(np.random.randint(0, 50, size=(5, 3)),
                  columns=['col1', 'col2', 'col3'])
df

   col1  col2  col3
0    38    28    14
1    42     7    20
2    38    18    22
3    10    10    23
4    35    39    23

In the above example, we have created a Pandas DataFrame of integer values with three columns. np.random.randint(0, 50, size=(5, 3)) creates a 5x3 array of random integers from 0 to 49 (the upper bound, 50, is excluded).

This example creates a Pandas DataFrame with 5 rows and 3 columns, but you can adjust the size and structure of the DataFrame to suit your requirements.
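To change the shape without typing every column name, the names can be generated from the number of columns. This is a minimal sketch: the n_rows and n_cols variables and their values are assumptions for illustration.

```python
# load packages
import numpy as np
import pandas as pd

# set random seed for reproducibility
np.random.seed(42)

# hypothetical dimensions; change these to resize the DataFrame
n_rows, n_cols = 4, 6

# build the names col1 ... colN to match the number of columns
columns = [f'col{i + 1}' for i in range(n_cols)]

df = pd.DataFrame(np.random.randint(0, 50, size=(n_rows, n_cols)),
                  columns=columns)
print(df.shape)          # (4, 6)
print(list(df.columns))  # ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']
```

The length of the columns list must match the second value of size, otherwise pandas raises a ValueError.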

Method 3

You can also add a categorical variable while creating a random Pandas DataFrame.

# load packages
import pandas as pd
import numpy as np

# set random seed for reproducibility
np.random.seed(42)

# create random pandas dataframe
df = pd.DataFrame({'col1': np.random.rand(3),
                   'col2': np.random.randint(1, 10, 3),
                   'col3': np.random.choice(['a', 'b'], size=3)})
df

       col1  col2 col3
0  0.374540     5    a
1  0.950714     7    a
2  0.731994     3    a

In the above example, we have created a Pandas DataFrame with both numerical and categorical values.

np.random.rand(3) creates three random values drawn uniformly from the interval [0, 1). np.random.randint(1, 10, 3) creates three random integers from 1 to 9 (the upper bound, 10, is excluded). np.random.choice(['a', 'b'], size=3) creates three categorical values, each chosen at random from the given list.
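By default, np.random.choice picks each value with equal probability, and the resulting column is stored as plain strings. The sketch below shows one possible variation: unequal probabilities through the p parameter, and a conversion to the pandas category dtype with pd.Categorical. The n_rows value and the 0.8/0.2 weights are assumptions chosen for illustration.

```python
# load packages
import numpy as np
import pandas as pd

# set random seed for reproducibility
np.random.seed(42)

# hypothetical sample size and class weights
n_rows = 1000
labels = np.random.choice(['a', 'b'], size=n_rows, p=[0.8, 0.2])

df = pd.DataFrame({'col1': np.random.rand(n_rows),
                   'col3': pd.Categorical(labels, categories=['a', 'b'])})

# share of each category; 'a' should be close to 0.8
print(df['col3'].value_counts(normalize=True))

# the column now uses the category dtype instead of object
print(df['col3'].dtype)
```

The values in p must sum to 1 and must be listed in the same order as the choices.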