Creating Synthetic Traffic Data Using Python

Traffic data underpins road planning, smart-city projects, and self-driving car development. However, real traffic data can be expensive to obtain, incomplete, or subject to privacy concerns. That’s where synthetic traffic data helps: it is artificially generated data designed to mimic the statistical behavior of real traffic. In this post, I’ll show you how to generate your own synthetic traffic data using Python, step by step.

Tools You’ll Need

Python basics: pandas, numpy
Data generation: Faker (for vehicle IDs), random
Visualization: matplotlib, seaborn

You can install any missing libraries using pip:

pip install pandas numpy matplotlib seaborn faker

Step 1: Define Your Traffic Scenario

Let’s simulate a single road segment with vehicles passing through during peak and off-peak hours. We’ll consider:

Vehicle types: Car, Bus, Bike
Speed ranges: Cars (40–80 km/h), Buses (30–60 km/h), Bikes (20–50 km/h)
Time intervals: One vehicle record per minute, starting at 06:00 (1,000 records cover about 16.7 hours)

Step 2: Generate Synthetic Data

Here’s a simple Python script to generate synthetic traffic data:

import random

import pandas as pd
from faker import Faker

fake = Faker()

# Define parameters
vehicle_types = ['Car', 'Bus', 'Bike']
speed_ranges = {'Car': (40, 80), 'Bus': (30, 60), 'Bike': (20, 50)}  # km/h, inclusive
num_records = 1000  # Number of vehicles (one per minute)
# 'min' is the minute frequency alias; the older 'T' alias is deprecated in recent pandas
time_range = pd.date_range(start='2025-06-02 06:00', periods=num_records, freq='min')

# Generate one record per timestamp
data = []
for t in time_range:
    vehicle_type = random.choice(vehicle_types)
    low, high = speed_ranges[vehicle_type]
    speed = random.randint(low, high)
    vehicle_id = fake.unique.license_plate()  # unique plate used as the vehicle ID
    data.append([t, vehicle_id, vehicle_type, speed])

df = pd.DataFrame(data, columns=['Timestamp', 'VehicleID', 'VehicleType', 'Speed'])

# Preview
print(df.head())

This creates a dataframe of vehicles passing through a road segment, with timestamps, vehicle IDs, types, and speeds.
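
The script above emits exactly one vehicle per minute, so it cannot show the peak versus off-peak contrast described in Step 1. Here is a minimal sketch of one way to add that contrast, using only the standard library. It draws exponentially distributed gaps between vehicles, which gives Poisson arrivals whose rate changes by hour. The names PEAK_RATE, OFF_PEAK_RATE, and PEAK_HOURS and their values are assumptions chosen for illustration, not measured figures.

```python
import random
from datetime import datetime, timedelta

# Hypothetical arrival rates in vehicles per minute (assumed, not measured).
PEAK_RATE = 3.0
OFF_PEAK_RATE = 0.5
PEAK_HOURS = {7, 8, 17, 18}  # assumed morning and evening rush hours


def arrival_times(start, hours, rng):
    """Return arrival timestamps from a Poisson process whose rate depends on the hour."""
    arrivals = []
    for h in range(hours):
        hour_start = start + timedelta(hours=h)
        rate = PEAK_RATE if hour_start.hour in PEAK_HOURS else OFF_PEAK_RATE
        offset = rng.expovariate(rate)  # minutes until the first arrival in this hour
        while offset < 60:
            arrivals.append(hour_start + timedelta(minutes=offset))
            offset += rng.expovariate(rate)  # gap to the next vehicle
    return arrivals


rng = random.Random(42)  # fixed seed so the run is repeatable
times = arrival_times(datetime(2025, 6, 2, 6, 0), hours=4, rng=rng)
peak = sum(1 for t in times if t.hour in PEAK_HOURS)
off_peak = len(times) - peak
print(f"peak arrivals: {peak}, off-peak arrivals: {off_peak}")
```

The resulting list could stand in for time_range in the generation loop, so that busy hours contribute more rows than quiet ones.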

Step 3: Visualize Traffic Patterns

A quick way to explore your synthetic traffic data is through simple visualizations:

import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(df['Speed'], bins=20, kde=True)
plt.title('Vehicle Speed Distribution')
plt.xlabel('Speed (km/h)')
plt.ylabel('Frequency')
plt.show()

You can also plot traffic volume by time:

df['Hour'] = df['Timestamp'].dt.hour
sns.countplot(x='Hour', data=df)
plt.title('Vehicle Count by Hour')
plt.xlabel('Hour of the Day')
plt.ylabel('Vehicle Count')
plt.show()

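These plots help you inspect the overall speed distribution and how traffic volume varies by hour. Because the script in Step 2 generates exactly one vehicle per minute, the hourly counts come out flat; to see peak-hour congestion, the arrival rate has to vary over time.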

Step 4: Tips for Realistic Synthetic Data

Match distributions to reality: Use traffic sensor data if available.
Add variability: Introduce random delays, accidents, or unusual events.
Scale smartly: Start small, then scale to simulate entire cities.
Validate: Compare with real-world patterns to ensure usefulness.
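
As a rough illustration of these tips, the sketch below replaces uniform speeds with normally distributed ones clipped to the Step 1 ranges, applies a slowdown during an incident, and compares the synthetic mean against a reference value. The means, standard deviations, and INCIDENT_FACTOR are placeholder assumptions; with real sensor data you would substitute measured values, and clipping is only one simple way to keep speeds in range.

```python
import random
import statistics

# Speed ranges per vehicle type in km/h, taken from Step 1.
SPEED_RANGES = {"Car": (40, 80), "Bus": (30, 60), "Bike": (20, 50)}
# Assumed (mean, standard deviation) per type; placeholders, not measured values.
SPEED_PARAMS = {"Car": (60, 8), "Bus": (45, 6), "Bike": (35, 6)}
INCIDENT_FACTOR = 0.5  # assumed slowdown while an incident affects the road


def sample_speed(vehicle_type, rng, incident=False):
    """Draw a normally distributed speed, clipped to the type's range."""
    mean, std = SPEED_PARAMS[vehicle_type]
    low, high = SPEED_RANGES[vehicle_type]
    speed = min(max(rng.gauss(mean, std), low), high)
    if incident:
        speed *= INCIDENT_FACTOR  # incidents can push speeds below the normal range
    return round(speed, 1)


rng = random.Random(7)  # fixed seed so the run is repeatable
normal = [sample_speed("Car", rng) for _ in range(5000)]
slowed = [sample_speed("Car", rng, incident=True) for _ in range(5000)]

# Validation: compare the synthetic mean with a reference value.
# Here the reference is just the assumed mean; with real data it would be a sensor average.
reference_mean = SPEED_PARAMS["Car"][0]
print(f"normal mean: {statistics.mean(normal):.1f} km/h (reference {reference_mean})")
print(f"incident mean: {statistics.mean(slowed):.1f} km/h")
```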

Conclusion

Synthetic traffic data is a powerful tool for experimentation, modeling, and planning. With Python, you can easily generate realistic datasets, visualize traffic patterns, and experiment with your own traffic simulations.
