Skip to Content

Create a fake e-commerce database

October 20, 2025 by
Create a fake e-commerce database
Joris Geerdes


This notebook guides you in creating a complete and realistic dataset for an online clothing store. We will generate 10,000 simulated orders, each with numerous details (price, customer, logistics), and save everything in a CSV file.


Step 1: Installation and import of libraries

We start by installing Faker and importing all the necessary libraries.

  • pandas: To create and manipulate our data table.
  • Faker: To generate fake data (names, addresses, etc.).
  • random, datetime, hashlib: Python utilities for randomness, date management, and creating unique identifiers.

Python

# Installation de la bibliothèque Faker pour générer des données factices
!pip install Faker

# Import des bibliothèques nécessaires
import pandas as pd
import random
import hashlib
from datetime import datetime
from faker import Faker


Step 2: Product Configuration

This is where we define the universe of our store. We replace watches with clothing that has its own characteristics, prices, and images. We maintain the logic of seasonal periods to simulate sales or winter/summer collections.

Python

# Initialisation de Faker
fake = Faker()

# --- CONFIGURATION DES PRODUITS (VÊTEMENTS) ---
PRODUCTS = {
    'T-shirt Classique Blanc': {
        'sku': 'TS-WH-CL-01', 'price': 25.00, 'price_ex_vat': 20.00, 'price_chf': 22.50,
        'price_ex_vat_chf': 18.00, 'image': 'https://images.pexels.com/photos/428338/pexels-photo-428338.jpeg'
    },
    'Pull en Laine Hivernal': {
        'sku': 'SW-GR-WO-01', 'price': 80.00, 'price_ex_vat': 64.00, 'price_chf': 72.00,
        'price_ex_vat_chf': 57.60, 'image': 'https://images.pexels.com/photos/6770141/pexels-photo-6770141.jpeg'
    },
    'Jean Slim Noir - Premium': {
        'sku': 'JN-BK-SL-PR', 'price': 120.00, 'price_ex_vat': 96.00, 'price_chf': 108.00,
        'price_ex_vat_chf': 86.40, 'image': 'https://images.pexels.com/photos/1082529/pexels-photo-1082529.jpeg'
    },
    'Jean Slim Noir - Standard': {
        'sku': 'JN-BK-SL-ST', 'price': 90.00, 'price_ex_vat': 72.00, 'price_chf': 81.00,
        'price_ex_vat_chf': 64.80, 'image': 'https://images.pexels.com/photos/1598507/pexels-photo-1597507.jpeg'
    },
    'Robe d\'Été Fleurie': {
        'sku': 'DR-FL-SU-01', 'price': 75.00, 'price_ex_vat': 60.00, 'price_chf': 67.50,
        'price_ex_vat_chf': 54.00, 'image': 'https://images.pexels.com/photos/1755428/pexels-photo-1755428.jpeg'
    },
}
PRODUCT_NAMES = list(PRODUCTS.keys())

# --- CONFIGURATION GÉOGRAPHIQUE ---
CITIES = {
    'Asia': ['Tokyo', 'Seoul', 'Shanghai', 'Beijing', 'Singapore'],
    'Europe': ['Paris', 'London', 'Berlin', 'Rome', 'Madrid'],
    'North America': ['New York', 'Los Angeles', 'Toronto'],
    'Others': ['Sydney', 'Cairo', 'Moscow']
}
COUNTRIES = ['CN', 'JP', 'KR', 'FR', 'UK', 'DE', 'IT', 'ES', 'US', 'CA', 'AU', 'EG', 'RU']

# --- CONFIGURATION DES PÉRIODES SAISONNIÈRES (SOLDES, FÊTES) ---
SEASONAL_PERIODS = [
    (datetime(2023, 11, 15), datetime(2024, 1, 15)),
    (datetime(2024, 6, 20), datetime(2024, 8, 20)),
    (datetime(2024, 11, 15), datetime(2025, 1, 15)),
]


Step 3: Business logic to improve data quality


These functions add realism. calculate_price_order adjusts the final selling price based on the product and the period. For example, the Wool Sweater will be more expensive during the holiday seasons, simulating high winter demand.

Python

def generate_seasonal_date():
    """Génère une date, avec une probabilité accrue d'être dans une période saisonnière."""
    if random.random() < 0.5:
        start_date, end_date = random.choice(SEASONAL_PERIODS)
        return fake.date_time_between(start_date=start_date, end_date=end_date)
    return fake.date_time_between(start_date='-3y', end_date='now')

def calculate_price_order(name, product, creation_date):
    """Calcule le prix de vente final en fonction de règles métier."""
    if 'T-shirt' in name:
        time_factor = (datetime.now() - creation_date).days / 365
        sales_multiplier = 1 + random.uniform(0.2, 0.8) * time_factor
        return product['price'] * sales_multiplier * random.uniform(0.9, 1.1)

    elif 'Pull en Laine' in name:
        if any(start <= creation_date <= end for start, end in SEASONAL_PERIODS):
            return product['price'] * random.uniform(1.1, 1.3) # Plus cher en hiver
        else:
            return product['price'] * random.uniform(0.9, 1.1)

    elif 'Jean Slim' in name:
        return product['price'] * random.uniform(0.9, 1.1)

    else: # Pour la robe et autres futurs articles
        time_factor = (datetime.now() - creation_date).days / 365
        sales_reduction = max(1 - random.uniform(0.1, 0.5) * time_factor, 0.7)
        return product['price'] * sales_reduction * random.uniform(0.9, 1.0)


Step 4: Final assembly of the database

The generate_row function is the heart of the script. It builds a complete order by drawing from all our configurations and generating dozens of fields to make each line unique and detailed: customer information, payment, logistics, etc.

Python

def generate_row():
    """Génère un dictionnaire complet représentant une commande."""
    creation_date = generate_seasonal_date()
    name = random.choice(PRODUCT_NAMES)
    product = PRODUCTS[name]

    price_order = calculate_price_order(name, product, creation_date)
    
    # Logique pour la localisation
    region = 'Europe' if 'Jean' in name and random.random() < 0.5 else random.choice(list(CITIES.keys()))
    city = random.choice(CITIES[region])
    country_code = random.choice(COUNTRIES)

    row = {
        'id': fake.unique.random_int(min=1000000000, max=9999999999),
        'creation_date': creation_date,
        'update_date': fake.date_time_between(start_date=creation_date, end_date='now'),
        'status': random.choices(['complete', 'canceled', 'deleted'], [0.85, 0.1, 0.05])[0],
        'price_order': round(price_order, 2),
        'traffic': random.randint(50, 500),
        'city': city,
        'country_code': country_code,
        'total_quantity': random.randint(1, 5),
        'sku': product['sku'],
        'name': name.capitalize(),
        'price_ex_vat': product['price_ex_vat'],
        'store_name': fake.company(),
        'price': product['price'],
        'payment_cc_type': random.choice(['VISA', 'MASTERCARD', 'AMEX']),
        'quantity': random.randint(1, 5),
        'quantity_refunded': random.randint(0, 2),
        'quantity_invoiced': random.randint(1, 5),
        'quantity_returned': random.randint(0, 2),
        'black_friday': any(p[0].month == 11 and 20 < p[0].day < 30 for p in SEASONAL_PERIODS if p[0] <= creation_date <= p[1]),
        'is_customized': random.choices([True, False], [0.1, 0.9])[0],
        'discount': round(random.uniform(0, 25), 2),
        'vat_amount': round(price_order * 0.2, 2),
        'options': random.choice([None, 'Emballage cadeau', 'Retouches simples']),
        'subtotal_invoiced': round(price_order * 0.8, 2),
        'discount_amount': round(price_order * random.uniform(0, 0.15), 2),
        'tax_invoiced': round(price_order * 0.2, 2),
        'options_id': random.choice(['men', 'women', 'unisex']),
        'payment_base_shipping_amount': round(random.uniform(5, 15), 2),
        'payment_amount_paid': round(price_order, 2),
        'order_currency_code': random.choice(['USD', 'EUR', 'GBP']),
        'coupon_code': random.choice([None, 'SUMMER24', 'WINTER24', 'WELCOME10']),
        'payment_method': random.choice(['paypal_express', 'credit_card']),
        'shipping_description': fake.country(),
        'mail_ID': fake.sha256(),
        'shipping': random.choice(['GLS', 'DHL', 'UPS']),
        'payment': random.choice(['paypal', 'credit card']),
        'image': product['image'],
        'price_ex_vat_chf': product['price_ex_vat_chf'],
        'is_combo': random.choices([True, False], [0.2, 0.8])[0],
        'discount_chf': round(random.uniform(0, 20), 2),
        'vat_amount_chf': round(product['price_chf'] * 0.077, 2),
        'creation_date_local': fake.date_time_this_year(),
        'price_chf': product['price_chf']
    }
    return row

# Testons la fonction en générant une seule ligne pour voir le résultat
print(generate_row())


Step 5: Export CSV

This is the final step! We execute the generate_row function 10,000 times to create our dataset, we transform it into a pandas DataFrame, and we save it as a CSV file.

Python

# Nombre de lignes à générer
NUM_ROWS = 10000
print(f"Génération de {NUM_ROWS} lignes de données de commandes...")

# Créer la liste de toutes les lignes
rows = [generate_row() for _ in range(NUM_ROWS)]

# Convertir la liste en DataFrame Pandas
df = pd.DataFrame(rows)

print("DataFrame généré avec succès !")
print(df.head())

# Exporter le DataFrame en CSV
output_csv_path = 'fake_clothing_store_database.csv'
df.to_csv(output_csv_path, index=False)

print(f"\nLes données ont été sauvegardées dans '{output_csv_path}'.")
print("Vous pouvez trouver ce fichier dans le panneau de gauche de Colab (icône dossier).")


The dataset generated by this script is available for viewing and download via this Google Sheets link.

➡️ E-commerce Database (Clothing)

You can import this data into a Power BI Desktop or Looker Studio environment for free to practice data visualization.


Complete code:

Python

# ==============================================================================
# 1. INSTALLATION ET IMPORTS
# ==============================================================================
# !pip install Faker # Décommentez si vous n'exécutez pas dans Colab où Faker est souvent pré-installé.

import pandas as pd
import random
import hashlib
from datetime import datetime
from faker import Faker

# ==============================================================================
# 2. CONFIGURATION DE LA BOUTIQUE ET DES PRODUITS
# ==============================================================================
fake = Faker()

# --- CONFIGURATION DES PRODUITS (VÊTEMENTS) ---
PRODUCTS = {
    'T-shirt Classique Blanc': {
        'sku': 'TS-WH-CL-01', 'price': 25.00, 'price_ex_vat': 20.00, 'price_chf': 22.50,
        'price_ex_vat_chf': 18.00, 'image': 'https://images.pexels.com/photos/428338/pexels-photo-428338.jpeg'
    },
    'Pull en Laine Hivernal': {
        'sku': 'SW-GR-WO-01', 'price': 80.00, 'price_ex_vat': 64.00, 'price_chf': 72.00,
        'price_ex_vat_chf': 57.60, 'image': 'https://images.pexels.com/photos/6770141/pexels-photo-6770141.jpeg'
    },
    'Jean Slim Noir - Premium': {
        'sku': 'JN-BK-SL-PR', 'price': 120.00, 'price_ex_vat': 96.00, 'price_chf': 108.00,
        'price_ex_vat_chf': 86.40, 'image': 'https://images.pexels.com/photos/1082529/pexels-photo-1082529.jpeg'
    },
    'Jean Slim Noir - Standard': {
        'sku': 'JN-BK-SL-ST', 'price': 90.00, 'price_ex_vat': 72.00, 'price_chf': 81.00,
        'price_ex_vat_chf': 64.80, 'image': 'https://images.pexels.com/photos/1598507/pexels-photo-1597507.jpeg'
    },
    'Robe d\'Été Fleurie': {
        'sku': 'DR-FL-SU-01', 'price': 75.00, 'price_ex_vat': 60.00, 'price_chf': 67.50,
        'price_ex_vat_chf': 54.00, 'image': 'https://images.pexels.com/photos/1755428/pexels-photo-1755428.jpeg'
    },
}
PRODUCT_NAMES = list(PRODUCTS.keys())

# --- CONFIGURATION GÉOGRAPHIQUE ---
CITIES = {
    'Asia': ['Tokyo', 'Seoul', 'Shanghai', 'Beijing', 'Singapore'],
    'Europe': ['Paris', 'London', 'Berlin', 'Rome', 'Madrid'],
    'North America': ['New York', 'Los Angeles', 'Toronto'],
    'Others': ['Sydney', 'Cairo', 'Moscow']
}
COUNTRIES = ['CN', 'JP', 'KR', 'FR', 'UK', 'DE', 'IT', 'ES', 'US', 'CA', 'AU', 'EG', 'RU']

# --- CONFIGURATION DES PÉRIODES SAISONNIÈRES (SOLDES, FÊTES) ---
SEASONAL_PERIODS = [
    (datetime(2023, 11, 15), datetime(2024, 1, 15)),
    (datetime(2024, 6, 20), datetime(2024, 8, 20)),
    (datetime(2024, 11, 15), datetime(2025, 1, 15)),
]

# ==============================================================================
# 3. FONCTIONS DE LOGIQUE MÉTIER
# ==============================================================================
def generate_seasonal_date():
    """Génère une date, avec une probabilité accrue d'être dans une période saisonnière."""
    if random.random() < 0.5:
        start_date, end_date = random.choice(SEASONAL_PERIODS)
        return fake.date_time_between(start_date=start_date, end_date=end_date)
    return fake.date_time_between(start_date='-3y', end_date='now')

def calculate_price_order(name, product, creation_date):
    """Calcule le prix de vente final en fonction de règles métier."""
    if 'T-shirt' in name:
        time_factor = (datetime.now() - creation_date).days / 365
        sales_multiplier = 1 + random.uniform(0.2, 0.8) * time_factor
        return product['price'] * sales_multiplier * random.uniform(0.9, 1.1)

    elif 'Pull en Laine' in name:
        if any(start <= creation_date <= end for start, end in SEASONAL_PERIODS):
            return product['price'] * random.uniform(1.1, 1.3) # Plus cher en hiver
        else:
            return product['price'] * random.uniform(0.9, 1.1)

    elif 'Jean Slim' in name:
        return product['price'] * random.uniform(0.9, 1.1)

    else: # Pour la robe et autres futurs articles
        time_factor = (datetime.now() - creation_date).days / 365
        sales_reduction = max(1 - random.uniform(0.1, 0.5) * time_factor, 0.7)
        return product['price'] * sales_reduction * random.uniform(0.9, 1.0)

# ==============================================================================
# 4. FONCTION D'ASSEMBLAGE DE LIGNE
# ==============================================================================
def generate_row():
    """Génère un dictionnaire complet représentant une commande."""
    creation_date = generate_seasonal_date()
    name = random.choice(PRODUCT_NAMES)
    product = PRODUCTS[name]

    price_order = calculate_price_order(name, product, creation_date)
    
    region = 'Europe' if 'Jean' in name and random.random() < 0.5 else random.choice(list(CITIES.keys()))
    city = random.choice(CITIES[region])
    country_code = random.choice(COUNTRIES)

    row = {
        'id': fake.unique.random_int(min=1000000000, max=9999999999),
        'creation_date': creation_date,
        'update_date': fake.date_time_between(start_date=creation_date, end_date='now'),
        'status': random.choices(['complete', 'canceled', 'deleted'], [0.85, 0.1, 0.05])[0],
        'price_order': round(price_order, 2),
        'traffic': random.randint(50, 500),
        'city': city,
        'country_code': country_code,
        'total_quantity': random.randint(1, 5),
        'sku': product['sku'],
        'name': name.capitalize(),
        'price_ex_vat': product['price_ex_vat'],
        'store_name': fake.company(),
        'price': product['price'],
        'payment_cc_type': random.choice(['VISA', 'MASTERCARD', 'AMEX']),
        'quantity': random.randint(1, 5),
        'quantity_refunded': random.randint(0, 2),
        'quantity_invoiced': random.randint(1, 5),
        'quantity_returned': random.randint(0, 2),
        'black_friday': any(p[0].month == 11 and 20 < p[0].day < 30 for p in SEASONAL_PERIODS if p[0] <= creation_date <= p[1]),
        'is_customized': random.choices([True, False], [0.1, 0.9])[0],
        'discount': round(random.uniform(0, 25), 2),
        'vat_amount': round(price_order * 0.2, 2),
        'options': random.choice([None, 'Emballage cadeau', 'Retouches simples']),
        'subtotal_invoiced': round(price_order * 0.8, 2),
        'discount_amount': round(price_order * random.uniform(0, 0.15), 2),
        'tax_invoiced': round(price_order * 0.2, 2),
        'options_id': random.choice(['men', 'women', 'unisex']),
        'payment_base_shipping_amount': round(random.uniform(5, 15), 2),
        'payment_amount_paid': round(price_order, 2),
        'order_currency_code': random.choice(['USD', 'EUR', 'GBP']),
        'coupon_code': random.choice([None, 'SUMMER24', 'WINTER24', 'WELCOME10']),
        'payment_method': random.choice(['paypal_express', 'credit_card']),
        'shipping_description': fake.country(),
        'mail_ID': fake.sha256(),
        'shipping': random.choice(['GLS', 'DHL', 'UPS']),
        'payment': random.choice(['paypal', 'credit card']),
        'image': product['image'],
        'price_ex_vat_chf': product['price_ex_vat_chf'],
        'is_combo': random.choices([True, False], [0.2, 0.8])[0],
        'discount_chf': round(random.uniform(0, 20), 2),
        'vat_amount_chf': round(product['price_chf'] * 0.077, 2),
        'creation_date_local': fake.date_time_this_year(),
        'price_chf': product['price_chf']
    }
    return row

# ==============================================================================
# 5. SCRIPT PRINCIPAL : GÉNÉRATION ET EXPORT
# ==============================================================================
NUM_ROWS = 10000
print(f"Génération de {NUM_ROWS} lignes de données de commandes...")

# Créer la liste de toutes les lignes
rows = [generate_row() for _ in range(NUM_ROWS)]

# Convertir la liste en DataFrame Pandas
df = pd.DataFrame(rows)

print("\nDataFrame généré avec succès ! Voici les 5 premières lignes :")
print(df.head())

# Exporter le DataFrame en CSV
output_csv_path = 'fake_clothing_store_database.csv'
df.to_csv(output_csv_path, index=False)

print(f"\nLes données ont été sauvegardées dans le fichier '{output_csv_path}'.")

Create a fake e-commerce database
Joris Geerdes October 20, 2025
Share this post
Tags
Archive