Creating Synthetic Traffic Data Using Python

Traffic data matters for planning roads, building smart cities, and developing self-driving cars. However, real traffic data can be expensive to obtain, incomplete, or subject to privacy restrictions. That’s where synthetic traffic data helps: it is artificially generated data designed to mimic the statistical patterns of real traffic. In this post, I’ll show you how to make your own synthetic traffic data using Python, step by step.

Tools You’ll Need

  • Python basics: pandas, numpy
  • Data generation: Faker (for vehicle IDs), random
  • Visualization: matplotlib, seaborn

You can install any missing libraries using pip:

pip install pandas numpy matplotlib seaborn faker

Step 1: Define Your Traffic Scenario

Let’s simulate a single road segment with vehicles passing through during peak and off-peak hours. We’ll consider:

  • Vehicle types: Car, Bus, Bike
  • Speed ranges: Cars (40–80 km/h), Buses (30–60 km/h), Bikes (20–50 km/h)
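  • Time intervals: One vehicle record per minute, starting at 06:00 (the 1,000 records generated below span about 16.7 hours, from 06:00 to 22:39)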
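
Before generating anything, it can help to write the scenario down as a small configuration object so the speed ranges live in one place. This is only a sketch: `VehicleProfile`, `PROFILES`, and `validate_profiles` are illustrative names invented here, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleProfile:
    name: str
    min_speed: int  # km/h, inclusive
    max_speed: int  # km/h, inclusive


# Speed ranges taken from the scenario described above
PROFILES = [
    VehicleProfile("Car", 40, 80),
    VehicleProfile("Bus", 30, 60),
    VehicleProfile("Bike", 20, 50),
]


def validate_profiles(profiles):
    """Raise ValueError if any profile has a negative or inverted speed range."""
    for p in profiles:
        if not (0 <= p.min_speed <= p.max_speed):
            raise ValueError(f"Bad speed range for {p.name}")
    return True


validate_profiles(PROFILES)

# Lookup table: vehicle type -> (min, max) speed in km/h
speed_ranges = {p.name: (p.min_speed, p.max_speed) for p in PROFILES}
print(speed_ranges)
```

Keeping the ranges in one structure means a later change (say, a lower speed limit for buses) only has to be made once.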

Step 2: Generate Synthetic Data

Here’s a simple Python script to generate synthetic traffic data:

import random

import pandas as pd
from faker import Faker

fake = Faker()

# Define parameters
vehicle_types = ['Car', 'Bus', 'Bike']
speed_ranges = {'Car': (40, 80), 'Bus': (30, 60), 'Bike': (20, 50)}  # km/h
num_records = 1000  # Number of vehicles (one per minute)
# 'min' is the minute alias; the older 'T' alias is deprecated in recent pandas versions
time_range = pd.date_range(start='2025-06-02 06:00', periods=num_records, freq='min')

# Generate data
data = []
for t in time_range:
    vehicle_type = random.choice(vehicle_types)
    low, high = speed_ranges[vehicle_type]
    speed = random.randint(low, high)  # randint includes both endpoints
    vehicle_id = fake.unique.license_plate()  # unique plate per record
    data.append([t, vehicle_id, vehicle_type, speed])

df = pd.DataFrame(data, columns=['Timestamp', 'VehicleID', 'VehicleType', 'Speed'])

# Preview
print(df.head())

This creates a dataframe of vehicles passing through a road segment, with timestamps, vehicle IDs, types, and speeds.
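
Before plotting, it can be worth confirming that every vehicle type stayed inside its speed range. The sketch below is one way to do that. So that it runs on its own, it rebuilds a small stand-in frame with NumPy instead of reusing `df`; `make_sample` and `check_speed_ranges` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

SPEED_RANGES = {"Car": (40, 80), "Bus": (30, 60), "Bike": (20, 50)}  # km/h


def make_sample(n, seed=0):
    """Build a small stand-in for df (same columns, minus VehicleID)."""
    rng = np.random.default_rng(seed)
    types = rng.choice(list(SPEED_RANGES), size=n)
    # rng.integers excludes the upper bound, so add 1 to keep it inclusive
    speeds = [
        int(rng.integers(SPEED_RANGES[t][0], SPEED_RANGES[t][1] + 1)) for t in types
    ]
    stamps = pd.date_range("2025-06-02 06:00", periods=n, freq="min")
    return pd.DataFrame({"Timestamp": stamps, "VehicleType": types, "Speed": speeds})


def check_speed_ranges(frame):
    """Return per-type min/max speeds plus a boolean 'InRange' column."""
    summary = frame.groupby("VehicleType")["Speed"].agg(["min", "max"])
    lows = np.array([SPEED_RANGES[t][0] for t in summary.index])
    highs = np.array([SPEED_RANGES[t][1] for t in summary.index])
    summary["InRange"] = (summary["min"].to_numpy() >= lows) & (
        summary["max"].to_numpy() <= highs
    )
    return summary


sample = make_sample(500)
report = check_speed_ranges(sample)
print(report)
```

To check the real dataset, call `check_speed_ranges(df)` on the frame built by the script above.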

Step 3: Visualize Traffic Patterns

A quick way to explore your synthetic traffic data is through simple visualizations:

import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(df['Speed'], bins=20, kde=True)
plt.title('Vehicle Speed Distribution')
plt.xlabel('Speed (km/h)')
plt.ylabel('Frequency')
plt.show()

You can also plot traffic volume by time:

df['Hour'] = df['Timestamp'].dt.hour
sns.countplot(x='Hour', data=df)
plt.title('Vehicle Count by Hour')
plt.xlabel('Hour of the Day')
plt.ylabel('Vehicle Count')
plt.show()

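These plots help you check the speed distribution and the hourly volume. Note that because the script emits exactly one vehicle per minute, every full hour shows the same count; patterns like peak-hour congestion only appear once the arrival rate varies over the day.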
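
One way to introduce a rush hour is to draw the number of vehicles in each minute from a Poisson distribution whose rate is higher during peak hours. The sketch below does that. The rates and the choice of peak hours are assumptions made up for illustration, and `simulate_counts` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Assumed arrival rates (vehicles per minute); tune these to your own road.
PEAK_RATE = 12.0
OFF_PEAK_RATE = 3.0
PEAK_HOURS = [7, 8, 17, 18]  # assumed morning and evening rush hours


def simulate_counts(start="2025-06-02 06:00", minutes=16 * 60, seed=42):
    """Draw a Poisson vehicle count per minute, with a higher rate in peak hours."""
    rng = np.random.default_rng(seed)
    stamps = pd.date_range(start, periods=minutes, freq="min")
    rates = np.where(stamps.hour.isin(PEAK_HOURS), PEAK_RATE, OFF_PEAK_RATE)
    counts = rng.poisson(rates)
    return pd.DataFrame({"Timestamp": stamps, "Vehicles": counts})


per_minute = simulate_counts()

# Total vehicles per hour of the day; peak hours should stand out clearly
per_hour = per_minute.groupby(per_minute["Timestamp"].dt.hour)["Vehicles"].sum()
print(per_hour)
```

Each count can then be expanded into that many vehicle rows (type, speed, ID) using the same logic as the generation script.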

Step 4: Tips for Realistic Synthetic Data

  1. Match distributions to reality: Use traffic sensor data if available.
  2. Add variability: Introduce random delays, accidents, or unusual events.
  3. Scale smartly: Start small, then scale to simulate entire cities.
  4. Validate: Compare with real-world patterns to ensure usefulness.
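
As a rough illustration of tips 1 and 2, the sketch below draws speeds from a normal distribution clipped to each type's range (instead of a uniform one) and then injects a single slow-down event. The means, standard deviations, incident window, and slowdown factor are all assumptions; in practice you would fit them to sensor data. Clipping is a simplification and is not the same as a true truncated normal. `sample_speeds` and `add_incident` are hypothetical helpers.

```python
import numpy as np
import pandas as pd

# Assumed (mean, std, low, high) in km/h; replace with values fitted to sensor data.
SPEED_MODEL = {
    "Car": (60, 10, 40, 80),
    "Bus": (45, 7, 30, 60),
    "Bike": (35, 7, 20, 50),
}


def sample_speeds(types, rng):
    """Draw a normally distributed speed per vehicle, clipped to its type's range."""
    params = np.array([SPEED_MODEL[t] for t in types], dtype=float)
    mean, std, low, high = params.T
    return np.clip(rng.normal(mean, std), low, high)


def add_incident(frame, start, end, slowdown=0.5):
    """Return a copy where speeds inside [start, end) are scaled by `slowdown`."""
    out = frame.copy()
    hit = (out["Timestamp"] >= start) & (out["Timestamp"] < end)
    out.loc[hit, "Speed"] = out.loc[hit, "Speed"] * slowdown
    out["Incident"] = hit
    return out


rng = np.random.default_rng(7)
n = 600
types = rng.choice(list(SPEED_MODEL), size=n)
traffic = pd.DataFrame({
    "Timestamp": pd.date_range("2025-06-02 06:00", periods=n, freq="min"),
    "VehicleType": types,
    "Speed": sample_speeds(types, rng),
})

# Assumed incident: a 30-minute slow-down starting at 08:00
traffic = add_incident(
    traffic, pd.Timestamp("2025-06-02 08:00"), pd.Timestamp("2025-06-02 08:30")
)

# Crude check: average speed inside versus outside the incident window
print(traffic.groupby("Incident")["Speed"].mean())
```

The final group-by is a very crude stand-in for tip 4: the same comparison of summary statistics can be run against real measurements when you have them.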

Conclusion

Synthetic traffic data is a powerful tool for experimentation, modeling, and planning. With a few lines of Python, you can generate a baseline dataset, visualize its traffic patterns, and then refine it toward realism as you experiment with your own traffic simulations.
