Introduction

Platform overview and scientific foundations

About This Platform

This Energy Simulations Platform is a comprehensive tool for analyzing solar energy production, energy market dynamics, and battery storage optimization. It combines real-world data from NASA satellites with advanced machine learning algorithms to provide accurate predictions and simulations for energy planning and decision-making.

Solar PV · Machine Learning · Monte Carlo · NASA POWER · Energy Markets

Scientific Methodology

Our platform is built on established scientific principles and peer-reviewed methodologies:

  • Solar radiation modeling: Based on NASA POWER satellite data and pvlib-python algorithms
  • Machine learning: Random Forest and Neural Network models with cross-validation
  • Uncertainty quantification: Confidence intervals using statistical bootstrapping
  • Monte Carlo simulation: Stochastic modeling for battery optimization
Note

All calculations include uncertainty estimates and confidence intervals to help users understand the reliability of predictions.

Solar Energy Production

Physics of photovoltaic energy conversion

Solar Irradiance Fundamentals

Solar irradiance is the power per unit area received from the Sun in the form of electromagnetic radiation. The total solar irradiance (TSI) at Earth's distance is approximately 1,361 W/m² (the "solar constant"), but the amount reaching any specific location on Earth's surface varies based on atmospheric conditions, time of day, and geographic location.

Components of Solar Radiation

  • GHI (Global Horizontal Irradiance): Total solar radiation on a horizontal surface
  • DNI (Direct Normal Irradiance): Direct-beam radiation from the sun's disk, measured on a surface perpendicular to the sun's rays
  • DHI (Diffuse Horizontal Irradiance): Scattered radiation from the sky
Irradiance Relationship

$$GHI = DNI \cdot \cos(\theta_z) + DHI$$

Where $\theta_z$ is the solar zenith angle
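As a minimal sketch of the relationship above, the function below combines DNI and DHI into GHI. The function name and the clamping of the direct term to zero when the sun is below the horizon are assumptions made for illustration, not part of the platform.

```python
import math

def ghi_from_components(dni, dhi, zenith_deg):
    """Global horizontal irradiance (W/m²) from DNI, DHI and the solar zenith angle."""
    # Assumption: when the sun is below the horizon (zenith >= 90°),
    # the direct beam contributes nothing.
    cos_z = max(0.0, math.cos(math.radians(zenith_deg)))
    return dni * cos_z + dhi

# Example: DNI 800 W/m², DHI 100 W/m², zenith angle 60° (cos = 0.5)
ghi = ghi_from_components(dni=800.0, dhi=100.0, zenith_deg=60.0)
print(round(ghi, 1))  # 800 * 0.5 + 100
```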

PV Energy Production

The energy produced by a photovoltaic system depends on the solar irradiance, panel characteristics, and system losses. We use the following model for daily energy production:

Daily Energy Production

$$E_{day} = H_{POA} \cdot A_{panel} \cdot \eta_{panel} \cdot PR$$

| Symbol | Description | Typical Value |
|---|---|---|
| $E_{day}$ | Daily energy production (kWh) | Variable |
| $H_{POA}$ | Plane-of-array irradiation (kWh/m²/day) | 3-7 kWh/m²/day |
| $A_{panel}$ | Panel area (m²) | Variable |
| $\eta_{panel}$ | Panel efficiency | 18-22% |
| $PR$ | Performance Ratio | 0.75-0.85 |
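
A short worked example of the daily energy formula, using values from the typical ranges in the table. The function name and the sample system (20 m² of panels) are hypothetical.

```python
def daily_energy_kwh(h_poa, panel_area, efficiency, performance_ratio):
    """E_day = H_POA * A_panel * eta_panel * PR, in kWh."""
    return h_poa * panel_area * efficiency * performance_ratio

# 5 kWh/m²/day on 20 m² of panels at 20% efficiency with PR = 0.80
e_day = daily_energy_kwh(h_poa=5.0, panel_area=20.0,
                         efficiency=0.20, performance_ratio=0.80)
print(f"{e_day:.1f} kWh")  # 5 * 20 * 0.20 * 0.80
```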

Annual Production Estimation

Specific Yield (kWh/kWp/year)

$$Y_f = \frac{E_{annual}}{P_{installed}} = \sum_{d=1}^{365} \frac{H_{POA,d} \cdot PR}{G_{STC}}$$

Where $G_{STC} = 1$ kW/m² (Standard Test Conditions irradiance)

Typical Values

In Southern Europe (Spain), typical specific yields range from 1,400-1,700 kWh/kWp/year. In Northern Europe (Germany), values typically range from 900-1,100 kWh/kWp/year.
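
The specific yield sum can be sketched as follows. The constant daily irradiation of 4.5 kWh/m²/day is a made-up illustration, not measured data; a real calculation would use one value per day of the year.

```python
G_STC = 1.0  # kW/m², irradiance at Standard Test Conditions

def specific_yield(daily_h_poa, performance_ratio):
    """Y_f in kWh/kWp: sum over days of H_POA,d * PR / G_STC."""
    return sum(h * performance_ratio / G_STC for h in daily_h_poa)

# Hypothetical year with a constant 4.5 kWh/m²/day and PR = 0.80
y_f = specific_yield([4.5] * 365, performance_ratio=0.80)
print(f"{y_f:.0f} kWh/kWp/year")  # 365 * 4.5 * 0.80
```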

Temperature Effects

Photovoltaic cells experience reduced efficiency at higher temperatures. The power output decreases approximately 0.4-0.5% per degree Celsius above the standard test condition temperature (25°C).

Temperature-Corrected Power

$$P_{actual} = P_{STC} \cdot [1 + \gamma \cdot (T_{cell} - 25°C)]$$

Where $\gamma$ is the temperature coefficient (typically -0.004 to -0.005 /°C for crystalline silicon)

Cell Temperature Estimation

$$T_{cell} = T_{ambient} + \frac{NOCT - 20°C}{800 W/m^2} \cdot G_{POA}$$

NOCT = Nominal Operating Cell Temperature (typically 45°C)
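
The two temperature formulas chain together naturally. The sketch below assumes a 400 W module, NOCT = 45°C and $\gamma$ = -0.004 /°C; the function names and example values are illustrative only.

```python
def cell_temperature(t_ambient, g_poa, noct=45.0):
    """Cell temperature (°C) from ambient temperature (°C) and POA irradiance (W/m²)."""
    return t_ambient + (noct - 20.0) / 800.0 * g_poa

def temperature_corrected_power(p_stc, t_cell, gamma=-0.004):
    """Power (W) after applying the temperature coefficient gamma (1/°C)."""
    return p_stc * (1.0 + gamma * (t_cell - 25.0))

# Hot summer day: 30°C ambient, 1000 W/m² on the panel
t_cell = cell_temperature(t_ambient=30.0, g_poa=1000.0)      # 30 + 25/800 * 1000
p_actual = temperature_corrected_power(p_stc=400.0, t_cell=t_cell)
print(f"T_cell = {t_cell:.2f} °C, P = {p_actual:.1f} W")
```

In this example the cell runs about 36°C above the reference temperature, which costs roughly 14.5% of rated power.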

Solar Energy References

Perez, R., Ineichen, P., Seals, R., et al. (1990)
Modeling daylight availability and irradiance components from direct and global irradiance
Solar Energy, 44(5), 271-289
King, D. L., Boyson, W. E., & Kratochvil, J. A. (2004)
Photovoltaic Array Performance Model
Sandia National Laboratories Technical Report SAND2004-3535
Holmgren, W. F., Hansen, C. W., & Mikofski, M. A. (2018)
pvlib python: a python package for modeling solar energy systems
Journal of Open Source Software, 3(29), 884

Machine Learning Models

Algorithms for energy prediction

Random Forest Regression

Random Forest is an ensemble learning method that constructs multiple decision trees during training and outputs the average prediction of individual trees. It is particularly effective for energy prediction due to its ability to capture non-linear relationships and handle feature interactions.

The algorithm works by creating a "forest" of decision trees, each trained on a random subset of the data (bootstrap sampling) and a random subset of features. This randomization helps reduce overfitting and improves generalization. For energy price and production forecasting, Random Forest excels at capturing complex patterns like time-of-day effects, seasonal variations, and weather dependencies.

Random Forest Prediction

$$\hat{y} = \frac{1}{B} \sum_{b=1}^{B} T_b(x)$$

Where $B$ is the number of trees and $T_b(x)$ is the prediction of tree $b$
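
To make the averaging formula concrete, here is a deliberately tiny sketch in pure Python: bootstrap-sampled regression stumps (depth-1 trees) whose predictions are averaged. It is not the platform's implementation. With a single input feature there is no feature subsampling, so strictly speaking this is bagging rather than a full Random Forest, and the irradiance/production data are invented.

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold, one mean per side."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - mean_l) ** 2 for y in left)
               + sum((y - mean_r) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_l, mean_r)
    if best is None:  # every x in this bootstrap sample was identical
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, mean_l, mean_r = best
    return lambda x: mean_l if x < threshold else mean_r

def fit_forest(xs, ys, n_trees, seed=0):
    """Train each stump on its own bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def predict(trees, x):
    """y_hat = (1/B) * sum of T_b(x) over all trees."""
    return sum(tree(x) for tree in trees) / len(trees)

# Invented data: irradiance (W/m²) vs. production (kWh)
irradiance = [100.0 * k for k in range(1, 11)]
production = [0.004 * g for g in irradiance]
trees = fit_forest(irradiance, production, n_trees=50)
low, high = predict(trees, 150.0), predict(trees, 900.0)
print(f"prediction at 150 W/m²: {low:.2f}, at 900 W/m²: {high:.2f}")
```

Each stump alone gives a crude two-level prediction; averaging many stumps trained on different bootstrap samples smooths the result, which is the variance-reduction effect described above.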

Feature Importance

One key advantage of Random Forest is its ability to compute feature importance, helping users understand which variables most influence predictions. In energy applications, this often reveals that solar irradiance, hour of day, and temperature are the most important features for production prediction.

Key Hyperparameters

| Parameter | Description | Typical Range |
|---|---|---|
| n_estimators | Number of trees in the forest | 100-500 |
| max_depth | Maximum depth of each tree | 10-30 or None |
| min_samples_split | Minimum samples to split a node | 2-10 |
| max_features | Features considered at each split | 'sqrt', 'log2', or fraction |

Neural Networks

Deep neural networks can learn complex patterns in energy data, capturing temporal dependencies and non-linear relationships between weather conditions and energy production/prices.

Our platform uses Multi-Layer Perceptron (MLP) architectures with configurable hidden layers. These networks are trained using backpropagation with adaptive learning rate optimizers like Adam. The network learns to map input features (weather, time, historical data) to output predictions (energy production or price) through multiple layers of non-linear transformations.

Unlike simpler models, neural networks can automatically learn feature interactions and complex patterns without explicit feature engineering. However, they require more data and careful hyperparameter tuning to avoid overfitting.

Neural Network Forward Pass

$$h^{(l)} = \sigma(W^{(l)} h^{(l-1)} + b^{(l)})$$

Where $\sigma$ is the activation function (ReLU, tanh, etc.)

ReLU Activation

$$\text{ReLU}(x) = \max(0, x)$$
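
The forward pass can be traced by hand on a very small network. The sketch below uses hand-picked weights (not trained values) and assumes a linear output layer, which is a common choice for regression but is not stated in the text above.

```python
def relu(values):
    """Element-wise ReLU(x) = max(0, x)."""
    return [max(0.0, v) for v in values]

def dense(weights, biases, inputs):
    """Compute W·h + b for one layer; weights is a list of rows."""
    return [sum(w * h for w, h in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(layers, x):
    """Apply each layer in turn: h(l) = sigma(W(l) h(l-1) + b(l))."""
    h = x
    for i, (weights, biases) in enumerate(layers):
        z = dense(weights, biases, h)
        # Assumption: ReLU on hidden layers, linear output for regression.
        h = relu(z) if i < len(layers) - 1 else z
    return h

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),  # hidden layer: 2 inputs -> 2 units
    ([[2.0, 1.0]], [0.5]),                     # output layer: 2 units -> 1 output
]
y_hat = forward(layers, [3.0, 1.0])
print(y_hat)  # hidden = [2.0, 1.0], output = 2*2 + 1*1 + 0.5
```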

Mean Squared Error Loss

$$\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Model Evaluation Metrics

We use multiple metrics to evaluate model performance, each providing different insights into prediction quality. Using several metrics together gives a more complete picture of model behavior.

R² Score (Coefficient of Determination)

$$R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}$$

R² = 1 means perfect prediction; R² = 0 means the model performs no better than always predicting the mean, and negative values mean it performs worse. Values above 0.8 are generally considered good for energy prediction.

Mean Absolute Error (MAE)

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

MAE gives the average error magnitude in the same units as the target variable. For energy prices (€/MWh), MAE represents the average prediction error in euros per MWh.

Root Mean Square Error (RMSE)

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

RMSE penalizes large errors more heavily than MAE. It's useful when large errors are particularly undesirable, such as in financial planning.
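
The three metrics are easy to compute side by side. The sketch below uses four invented price observations (€/MWh); the function names are illustrative.

```python
import math

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

# Invented prices in €/MWh
y_true = [50.0, 60.0, 70.0, 80.0]
y_pred = [52.0, 58.0, 73.0, 79.0]
print(f"R2={r2_score(y_true, y_pred):.3f}  "
      f"MAE={mae(y_true, y_pred):.2f}  RMSE={rmse(y_true, y_pred):.2f}")
```

Here RMSE (about 2.12) exceeds MAE (2.00) because the single 3 €/MWh miss is weighted more heavily once errors are squared.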

Cross-Validation

To ensure our models generalize well to unseen data, we use k-fold cross-validation. The data is split into k subsets, and the model is trained k times, each time using a different subset for validation. For time-series data like energy prices, we use time-based splits to prevent data leakage from future observations.
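
One way to build time-based splits is an expanding window, where every validation block lies strictly after its training data. This is a sketch of the general idea under that assumption; the exact splitting scheme used by the platform is not specified here, and the function name is hypothetical.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, validation_indices); validation always follows training.

    Assumes n_samples >= n_splits + 1 and that samples are in chronological order.
    """
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = fold * k
        # The last fold absorbs any leftover samples.
        val_end = n_samples if k == n_splits else train_end + fold
        yield list(range(train_end)), list(range(train_end, val_end))

splits = list(expanding_window_splits(n_samples=10, n_splits=3))
for train, val in splits:
    print("train:", train, "validate:", val)
```

Unlike shuffled k-fold, no fold ever trains on observations that come after the ones it is validated on.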

Best Practice

Always evaluate your model on data it has never seen during training. Our platform automatically splits data into training (80%) and test (20%) sets.

Machine Learning References

Breiman, L. (2001)
Random Forests
Machine Learning, 45(1), 5-32
Goodfellow, I., Bengio, Y., & Courville, A. (2016)
Deep Learning
MIT Press
Voyant, C., et al. (2017)
Machine learning methods for solar radiation forecasting: A review
Renewable Energy, 105, 569-582

Battery Simulation

Energy storage modeling and optimization

Battery Energy Model

Our battery simulation uses a simplified yet accurate model for energy storage systems, accounting for charging/discharging efficiencies, state of charge limits, and power constraints.

The model simulates hour-by-hour energy flows: when solar production exceeds consumption, excess energy is stored in the battery (subject to charging efficiency losses). When consumption exceeds production, the battery discharges to cover the deficit (subject to discharging efficiency losses). This approach allows accurate estimation of self-consumption rates and grid dependency.

Real lithium-ion batteries have complex characteristics including capacity degradation over time, temperature sensitivity, and non-linear efficiency curves. Our simplified model captures the essential behavior while remaining computationally efficient for optimization and scenario analysis.

State of Charge Update

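$$SOC_{t+1} = SOC_t + \eta_c \cdot P_{charge} \cdot \Delta t - \frac{P_{discharge} \cdot \Delta t}{\eta_d}$$

Here $SOC$ is the stored energy (kWh); dividing by the battery capacity gives the percentage used in the limits below. $P_{charge}$ and $P_{discharge}$ are the charging and discharging powers (kW), and $\Delta t$ is the time step (h).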

Operating Constraints

Battery operation is constrained by physical limits:

  • Capacity limits: SOC must stay between SOC_min and SOC_max
  • Power limits: Maximum charge/discharge rate (C-rate)
  • Energy balance: Cannot discharge more than available energy

| Parameter | Description | Typical Value |
|---|---|---|
| $\eta_c$ | Charging efficiency | 90-95% |
| $\eta_d$ | Discharging efficiency | 90-95% |
| $SOC_{min}$ | Minimum state of charge | 10-20% |
| $SOC_{max}$ | Maximum state of charge | 90-100% |
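
The state of charge update and its constraints can be combined into a single time step. The sketch below is one possible reading of the model, not the platform's code: the function name, the 5 kW power limit, and the sign convention for grid flow are assumptions, and state of charge is tracked in kWh.

```python
def step_battery(soc_kwh, surplus_kw, capacity_kwh, dt_h=1.0,
                 eta_c=0.95, eta_d=0.95, soc_min=0.10, soc_max=0.90, p_max_kw=5.0):
    """Advance the battery by one time step.

    surplus_kw = production - consumption.
    Returns (new_soc_kwh, grid_kw); grid_kw > 0 is import, grid_kw < 0 is export.
    """
    lo, hi = soc_min * capacity_kwh, soc_max * capacity_kwh
    if surplus_kw >= 0:
        # Charging is limited by the surplus, the power rating and the headroom.
        p_charge = min(surplus_kw, p_max_kw, (hi - soc_kwh) / (eta_c * dt_h))
        p_charge = max(p_charge, 0.0)
        return soc_kwh + eta_c * p_charge * dt_h, p_charge - surplus_kw
    # Discharging is limited by the deficit, the power rating and the
    # energy available above the minimum state of charge.
    deficit_kw = -surplus_kw
    p_discharge = min(deficit_kw, p_max_kw, (soc_kwh - lo) * eta_d / dt_h)
    p_discharge = max(p_discharge, 0.0)
    return soc_kwh - p_discharge * dt_h / eta_d, deficit_kw - p_discharge

# 10 kWh battery, half full, one hour with 2 kW of solar surplus
soc, grid = step_battery(soc_kwh=5.0, surplus_kw=2.0, capacity_kwh=10.0)
print(f"after charging: SOC = {soc:.2f} kWh, grid = {grid:.2f} kW")

# Nearly empty battery facing a 3 kW deficit: the rest comes from the grid
soc_low, grid_low = step_battery(soc_kwh=1.5, surplus_kw=-3.0, capacity_kwh=10.0)
print(f"after discharging: SOC = {soc_low:.2f} kWh, grid = {grid_low:.3f} kW")
```

Calling this function once per hour over a year of production and consumption data gives the hour-by-hour energy flows described above.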

Monte Carlo Simulation

Monte Carlo methods are used to evaluate the expected performance of battery systems under uncertainty in solar production and energy prices. By running thousands of simulations with randomly sampled scenarios, we can estimate probability distributions of outcomes.

In energy applications, Monte Carlo is particularly valuable because solar production and prices are inherently uncertain. Rather than providing a single "best estimate," Monte Carlo gives us a range of possible outcomes with associated probabilities, enabling better risk management.

Our implementation varies key parameters like daily solar variability, price volatility, and consumption patterns. Each simulation represents a possible "future year" scenario, and aggregating results provides confidence intervals for financial returns and energy metrics.

Monte Carlo Expectation

$$\mathbb{E}[f(X)] \approx \frac{1}{N} \sum_{i=1}^{N} f(X_i)$$

Where $X_i$ are independent samples from the probability distribution of $X$

Why Monte Carlo Works

The Law of Large Numbers guarantees that as we increase the number of simulations, our estimate converges to the true expected value. The Central Limit Theorem tells us that the distribution of our estimate is approximately normal, allowing us to construct confidence intervals.

Standard Error of Monte Carlo Estimate

Monte Carlo Standard Error

$$SE = \frac{\sigma}{\sqrt{N}}$$

The error decreases in proportion to $1/\sqrt{N}$, so 10,000 simulations give 100× less error than 1 simulation

Simulation Accuracy

With N=10,000 Monte Carlo simulations, the standard error is approximately 1% of the standard deviation, providing highly reliable estimates of expected values and confidence intervals.
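
The estimator and its standard error can be sketched in a few lines. The scenario is invented for illustration: annual specific yield is drawn from a normal distribution (mean 1,500, standard deviation 100 kWh/kWp), and savings assume a hypothetical 5 kWp system at 0.15 €/kWh.

```python
import math
import random
import statistics

def monte_carlo_mean(f, sample, n, seed=42):
    """Estimate E[f(X)] and its standard error from n independent samples."""
    rng = random.Random(seed)
    values = [f(sample(rng)) for _ in range(n)]
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se

def annual_savings(yield_kwh_per_kwp):
    """Hypothetical savings in € for a 5 kWp system at 0.15 €/kWh."""
    return yield_kwh_per_kwp * 5.0 * 0.15

mean, se = monte_carlo_mean(annual_savings,
                            sample=lambda rng: rng.gauss(1500.0, 100.0),
                            n=10_000)
ci_95 = (mean - 1.96 * se, mean + 1.96 * se)
print(f"mean = {mean:.1f} €, SE = {se:.2f} €, "
      f"95% CI = [{ci_95[0]:.1f}, {ci_95[1]:.1f}]")
```

The standard deviation of savings in this setup is 75 €, so with N = 10,000 the standard error should come out near 0.75 €, in line with the 1% figure above.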

Battery Storage References

Xu, B., Oudalov, A., Ulbig, A., et al. (2018)
Modeling of Lithium-Ion Battery Degradation for Cell Life Assessment
IEEE Transactions on Smart Grid, 9(2), 1131-1140
Mongird, K., et al. (2019)
Energy Storage Technology and Cost Characterization Report
Pacific Northwest National Laboratory, PNNL-28866

Energy Markets

Electricity pricing and market dynamics

Day-Ahead Market

European electricity markets operate on a day-ahead basis, where prices are determined through auctions the day before delivery. Prices vary hourly based on supply and demand balance.

In the day-ahead market, generators submit offers to sell electricity at various prices, and consumers/retailers submit bids to buy. The market operator (like OMIE for Spain/Portugal) matches supply and demand using a marginal pricing algorithm, where all accepted generators receive the price set by the most expensive accepted offer.

This market design, known as "pay-as-clear" or marginal pricing, means that price is determined by the last (most expensive) unit needed to meet demand. When renewable generation is high, cheaper sources cover most demand, pushing prices down. When demand is high and renewables are low, expensive gas plants set the price.
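
A toy merit-order clearing illustrates how the marginal unit sets the price. This is a heavily simplified sketch: demand is treated as fixed (no demand bids), there are no network or block-order constraints, and the offers are invented. It is not the algorithm used by OMIE or any other market operator.

```python
def clear_market(offers, demand_mw):
    """Clear a single hour under pay-as-clear pricing.

    offers: list of (price_eur_per_mwh, quantity_mw).
    Returns (clearing_price, accepted_offers).
    """
    accepted, remaining, price = [], demand_mw, None
    for offer_price, quantity in sorted(offers):  # cheapest offers first
        if remaining <= 0:
            break
        take = min(quantity, remaining)
        accepted.append((offer_price, take))
        remaining -= take
        price = offer_price  # the last accepted offer sets the price for all
    if remaining > 0:
        raise ValueError("offers do not cover demand")
    return price, accepted

# Invented offers: solar, wind, gas combined cycle, gas peaker
offers = [(0.0, 300.0), (5.0, 200.0), (60.0, 400.0), (120.0, 300.0)]

price_low_demand, _ = clear_market(offers, demand_mw=450.0)
price_high_demand, _ = clear_market(offers, demand_mw=1000.0)
print(price_low_demand, price_high_demand)
```

With 450 MW of demand, renewables cover everything and the price is set by wind; at 1,000 MW the peaker is needed and every accepted generator receives its offer price.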

Key Price Drivers

  • Renewable generation: High solar/wind production tends to lower prices, sometimes to zero or negative
  • Demand patterns: Peak demand hours (morning 8-10am, evening 7-9pm) typically have higher prices
  • Fuel costs: Natural gas and coal prices determine thermal generation costs, and therefore the market price whenever a thermal plant is the marginal unit
  • Interconnection: Cross-border flows can equalize prices across regions when capacity is available
  • CO₂ prices: Emissions trading costs add to fossil fuel generation costs
Market Volatility

Electricity prices can be highly volatile and sometimes turn negative when renewable supply exceeds demand. In 2023-2024, European day-ahead prices could in principle range between the market's technical limits, roughly -€500 and +€4,000/MWh. Our models account for this volatility in predictions.

Price Prediction Models

Energy price prediction uses features such as historical prices, weather forecasts, demand forecasts, and renewable generation forecasts.

Price Prediction Feature Model

$$P_t = f(\text{Hour}_t, \text{DayType}_t, \text{Solar}_t, \text{Wind}_t, \text{Demand}_t, \text{Temp}_t, P_{t-1}, ...)$$

Common Features for Price Prediction

| Feature Category | Examples |
|---|---|
| Temporal | Hour, day of week, month, holiday indicator |
| Weather | Temperature, solar irradiance, wind speed |
| Supply | Solar production, wind production, thermal capacity |
| Demand | Load forecast, industrial activity index |
| Lagged | Previous hour price, same hour yesterday/week |

Energy Market References

Weron, R. (2014)
Electricity price forecasting: A review of the state-of-the-art with a look into the future
International Journal of Forecasting, 30(4), 1030-1081
ENTSO-E Transparency Platform
European Network of Transmission System Operators - Data Portal
Official EU electricity market data source
OMIE
Operador del Mercado Ibérico de Energía
Spanish-Portuguese electricity market operator

Self-Sufficiency Analysis

Energy independence calculations

Autarky Rate

The autarky rate (or self-sufficiency rate) measures the percentage of total energy consumption that is covered by local generation (solar + battery).

This is a key metric for understanding energy independence. A 70% autarky rate means that 70% of your energy needs are met by your own solar+battery system, while 30% must still come from the grid. Note that autarky is not the same as self-consumption:

  • Autarky: What fraction of your consumption comes from your own production
  • Self-consumption: What fraction of your production you consume yourself

A system can have high self-consumption but low autarky (small solar system relative to consumption) or high autarky but low self-consumption (large system with significant export to grid).

Autarky Rate

$$\text{Autarky} = \frac{E_{self-consumed}}{E_{total-demand}} \times 100\%$$

Self-Consumption Rate

$$\text{Self-Consumption} = \frac{E_{self-consumed}}{E_{solar-produced}} \times 100\%$$

Understanding the Difference

Both metrics are important: autarky tells you about grid independence, while self-consumption tells you how efficiently you use your solar production. Batteries improve both metrics by storing excess daytime production for evening use.
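
The difference between the two metrics shows up clearly on a small example. The sketch below assumes a system without a battery, so the self-consumed energy in each hour is simply the smaller of production and consumption; the six hourly values are invented.

```python
def self_sufficiency_metrics(production, consumption):
    """Return (autarky %, self-consumption %) for hourly series in kWh, no battery."""
    self_consumed = sum(min(p, c) for p, c in zip(production, consumption))
    autarky = 100.0 * self_consumed / sum(consumption)
    self_consumption = 100.0 * self_consumed / sum(production)
    return autarky, self_consumption

# Invented profile: solar peaks at midday, demand peaks in the evening
production = [0.0, 3.0, 6.0, 3.0, 0.0, 0.0]
consumption = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0]

autarky, self_consumption = self_sufficiency_metrics(production, consumption)
print(f"autarky = {autarky:.1f}%, self-consumption = {self_consumption:.1f}%")
```

Only 3 kWh are used directly, out of 8 kWh consumed and 12 kWh produced, so both rates are low. A battery would shift part of the 9 kWh midday surplus to the evening and raise both numbers.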

Battery Sizing

Optimal battery size depends on the mismatch between solar production and consumption patterns. The following heuristics are commonly used:

| Target Autarky | Battery Size (relative to daily consumption) |
|---|---|
| 50-60% | 0.5-1× daily consumption |
| 70-80% | 1-2× daily consumption |
| 90%+ | 3-5× daily consumption (diminishing returns) |

Economic Consideration

Achieving 100% autarky is often economically impractical. Each additional percentage point of autarky requires disproportionately more battery capacity, so the marginal cost rises steeply as autarky approaches 100%.

Weather Impact Analysis

Climate effects on solar production

Cloud Impact Model

Cloud cover significantly reduces solar irradiance. We quantify this with the clearness index (Kt), the ratio of irradiance measured at the surface to extraterrestrial irradiance.

The clearness index is a dimensionless value between 0 and 1 that indicates how much solar radiation reaches the Earth's surface compared to what would arrive without an atmosphere. It accounts for absorption and scattering by clouds, aerosols, and gases.

This index is crucial for solar forecasting because it allows us to normalize solar data across different times of year and latitudes. A Kt of 0.7 on a winter day indicates similarly clear conditions as 0.7 on a summer day, even though absolute irradiance values are very different.

Clearness Index

$$K_t = \frac{GHI}{GHI_0}$$

Where $GHI_0$ is the extraterrestrial horizontal irradiance (calculated from solar geometry)

Interpreting Clearness Index Values

| Clearness Index | Sky Condition |
|---|---|
| 0.7 - 0.8 | Clear sky |
| 0.5 - 0.7 | Partly cloudy |
| 0.3 - 0.5 | Overcast |
| < 0.3 | Heavy clouds/rain |
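
The table translates directly into a small lookup. Two details below are assumptions, since the table does not settle them: values exactly on a boundary are assigned to the clearer category, and values above 0.8 are also labelled clear sky.

```python
def clearness_index(ghi, ghi_extraterrestrial):
    """K_t = GHI / GHI_0 (both in the same units)."""
    return ghi / ghi_extraterrestrial

def sky_condition(kt):
    """Map a clearness index to the sky condition categories in the table."""
    if kt >= 0.7:
        return "Clear sky"
    if kt >= 0.5:
        return "Partly cloudy"
    if kt >= 0.3:
        return "Overcast"
    return "Heavy clouds/rain"

kt = clearness_index(ghi=450.0, ghi_extraterrestrial=900.0)
print(kt, sky_condition(kt))
```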

Temperature Sensitivity

We analyze the correlation between ambient temperature and production efficiency to understand local climate effects on PV performance.

Pearson Correlation Coefficient

$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2} \sqrt{\sum_{i}(y_i - \bar{y})^2}}$$
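
A direct implementation of the formula, applied to invented data in which efficiency falls linearly with temperature, so the coefficient should come out at essentially -1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))

# Invented data: ambient temperature (°C) vs. panel efficiency (%)
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
efficiency = [19.6, 19.2, 18.8, 18.4, 18.0]

r = pearson_r(temperature, efficiency)
print(f"r = {r:.3f}")
```

Real data will be noisier, and irradiance is a confounder: hot days are often sunny days, so production can rise with temperature even while efficiency falls.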

Data Sources

APIs and datasets used

NASA POWER

The NASA Prediction Of Worldwide Energy Resources (POWER) project provides solar and meteorological data derived from satellite observations and reanalysis models.

NASA POWER data is derived from multiple satellite instruments and the MERRA-2 (Modern-Era Retrospective Analysis for Research and Applications) reanalysis system. This provides consistent, gap-free data covering the entire globe, making it ideal for locations without ground-based weather stations.

The data has been validated against ground measurements and is widely used in the solar industry for resource assessment and system design. While ground measurements are more accurate for specific sites, NASA POWER data provides reliable estimates for initial feasibility studies and regional comparisons.

Available Parameters

  • ALLSKY_SFC_SW_DWN: All-sky surface shortwave downward irradiance (GHI) - primary solar resource metric
  • CLRSKY_SFC_SW_DWN: Clear-sky surface shortwave downward irradiance - theoretical maximum without clouds
  • T2M: Temperature at 2 meters - used for PV efficiency corrections
  • WS10M: Wind speed at 10 meters - affects panel cooling
  • RH2M: Relative humidity at 2 meters - secondary weather parameter
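
As an illustration of how these parameters might be requested, the sketch below builds a query URL for a single location without sending it. The endpoint path and query parameter names follow the publicly documented POWER daily point API as best recalled here; treat them as assumptions and check the current POWER API documentation before use. The function name and coordinates are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for daily data at a single point
POWER_DAILY_POINT = "https://power.larc.nasa.gov/api/temporal/daily/point"

def build_power_url(latitude, longitude, start, end,
                    parameters=("ALLSKY_SFC_SW_DWN", "T2M")):
    """Build a request URL; start and end are dates formatted as YYYYMMDD."""
    query = {
        "parameters": ",".join(parameters),
        "community": "RE",  # assumed: renewable energy user community
        "latitude": latitude,
        "longitude": longitude,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    # Keep commas unescaped so the parameter list stays readable.
    return f"{POWER_DAILY_POINT}?{urlencode(query, safe=',')}"

# Hypothetical location near Madrid, calendar year 2023
url = build_power_url(latitude=40.4, longitude=-3.7,
                      start="20230101", end="20231231")
print(url)
```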

Data Characteristics

| Characteristic | Value |
|---|---|
| Temporal resolution | Hourly |
| Spatial resolution | 0.5° × 0.5° (≈55 km at equator) |
| Historical coverage | 1981 - present |
| Geographic coverage | Global (land and ocean) |
| Update frequency | Monthly (with ~2 month lag) |

Data Resolution

NASA POWER provides hourly data at 0.5° × 0.5° spatial resolution, covering the entire globe from 1981 to present. For best results, we recommend using at least 10 years of data for resource assessment to capture interannual variability.

Data Source References

NASA POWER Project
Prediction Of Worldwide Energy Resources (POWER)
NASA Langley Research Center
Gelaro, R., et al. (2017)
The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2)
Journal of Climate, 30(14), 5419-5454

Bibliography

Complete reference list

Solar Energy & Photovoltaics

Duffie, J. A., & Beckman, W. A. (2013)
Solar Engineering of Thermal Processes (4th Edition)
Wiley - The definitive reference for solar energy engineering
Luque, A., & Hegedus, S. (2011)
Handbook of Photovoltaic Science and Engineering
Wiley - Comprehensive PV technology reference
IEC 61724-1:2017
Photovoltaic system performance - Part 1: Monitoring
International Electrotechnical Commission

Machine Learning & Forecasting

Hastie, T., Tibshirani, R., & Friedman, J. (2009)
The Elements of Statistical Learning
Springer - Free online textbook
Pedregosa, F., et al. (2011)
Scikit-learn: Machine Learning in Python
Journal of Machine Learning Research, 12, 2825-2830

Energy Systems & Markets

IEA (2023)
World Energy Outlook 2023
International Energy Agency
IRENA (2023)
Renewable Power Generation Costs in 2022
International Renewable Energy Agency

Software & Tools

pvlib-python Development Team
pvlib-python Documentation
Open-source solar photovoltaic modeling library
NREL
PVWatts Calculator
National Renewable Energy Laboratory - Online PV estimation tool