Introduction
Platform overview and scientific foundations
About This Platform
This Energy Simulations Platform is a comprehensive tool for analyzing solar energy production, energy market dynamics, and battery storage optimization. It combines real-world data from NASA satellites with advanced machine learning algorithms to provide accurate predictions and simulations for energy planning and decision-making.
Scientific Methodology
Our platform is built on established scientific principles and peer-reviewed methodologies:
- Solar radiation modeling: Based on NASA POWER satellite data and pvlib-python algorithms
- Machine learning: Random Forest and Neural Network models with cross-validation
- Uncertainty quantification: Confidence intervals using statistical bootstrapping
- Monte Carlo simulation: Stochastic modeling for battery optimization
All calculations include uncertainty estimates and confidence intervals to help users understand the reliability of predictions.
Solar Energy Production
Physics of photovoltaic energy conversion
Solar Irradiance Fundamentals
Solar irradiance is the power per unit area received from the Sun in the form of electromagnetic radiation. The total solar irradiance (TSI) at Earth's distance is approximately 1,361 W/m² (the "solar constant"), but the amount reaching any specific location on Earth's surface varies based on atmospheric conditions, time of day, and geographic location.
Components of Solar Radiation
- GHI (Global Horizontal Irradiance): Total solar radiation on a horizontal surface
- DNI (Direct Normal Irradiance): Direct-beam radiation from the sun's disk, measured on a surface perpendicular to the sun's rays
- DHI (Diffuse Horizontal Irradiance): Scattered radiation from the sky
$$GHI = DNI \cdot \cos(\theta_z) + DHI$$
Where $\theta_z$ is the solar zenith angle
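As a minimal sketch of the decomposition above, the following combines the direct and diffuse components into GHI. The function name and the input values are illustrative assumptions, and clamping the cosine at zero (so the direct beam vanishes when the sun is below the horizon) is a convention added here, not something stated by the platform.

```python
import math

def ghi_from_components(dni: float, dhi: float, zenith_deg: float) -> float:
    """Return GHI (W/m²) from DNI, DHI (W/m²) and the solar zenith angle (degrees)."""
    # Below the horizon (zenith > 90°) the direct beam contributes nothing.
    cos_z = max(0.0, math.cos(math.radians(zenith_deg)))
    return dni * cos_z + dhi

# Hypothetical mid-morning values
ghi = ghi_from_components(dni=800.0, dhi=100.0, zenith_deg=60.0)
print(f"GHI = {ghi:.1f} W/m²")
```

With a zenith angle of 60° only half of the direct beam is projected onto the horizontal plane, so the diffuse component carries relatively more weight than it would at noon.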
PV Energy Production
The energy produced by a photovoltaic system depends on the solar irradiance, panel characteristics, and system losses. We use the following model for daily energy production:
$$E_{day} = H_{POA} \cdot A_{panel} \cdot \eta_{panel} \cdot PR$$
| Symbol | Description | Typical Value |
|---|---|---|
| $E_{day}$ | Daily energy production (kWh) | Variable |
| $H_{POA}$ | Plane-of-array irradiation (kWh/m²/day) | 3-7 kWh/m²/day |
| $A_{panel}$ | Panel area (m²) | Variable |
| $\eta_{panel}$ | Panel efficiency | 18-22% |
| $PR$ | Performance Ratio | 0.75-0.85 |
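The daily production model can be evaluated directly. This is a minimal sketch using values from the typical ranges in the table; the function name and the 20 m² array are assumptions chosen for illustration.

```python
def daily_energy_kwh(h_poa: float, area_m2: float,
                     efficiency: float, performance_ratio: float) -> float:
    """E_day = H_POA * A_panel * eta_panel * PR, in kWh."""
    return h_poa * area_m2 * efficiency * performance_ratio

# Hypothetical rooftop array: 20 m² of 20%-efficient panels on a 5 kWh/m² day
e_day = daily_energy_kwh(h_poa=5.0, area_m2=20.0,
                         efficiency=0.20, performance_ratio=0.80)
print(f"Daily production: {e_day:.1f} kWh")
```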
Annual Production Estimation
$$Y_f = \frac{E_{annual}}{P_{installed}} = \sum_{d=1}^{365} \frac{H_{POA,d} \cdot PR}{G_{STC}}$$
Where $G_{STC} = 1$ kW/m² (Standard Test Conditions irradiance)
In Southern Europe (Spain), typical specific yields range from 1,400-1,700 kWh/kWp/year. In Northern Europe (Germany), values typically range from 900-1,100 kWh/kWp/year.
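The sketch below illustrates why panel area and efficiency do not appear in the specific yield formula: if the installed power is taken as $P_{installed} = A_{panel} \cdot \eta_{panel} \cdot G_{STC}$ (the rated output at Standard Test Conditions), both factors cancel. The function names and the flat irradiation profile are assumptions for illustration only.

```python
G_STC = 1.0  # kW/m², irradiance at Standard Test Conditions

def specific_yield(daily_h_poa, performance_ratio):
    """Annual specific yield Y_f (kWh/kWp) from daily H_POA values (kWh/m²/day)."""
    return sum(h * performance_ratio / G_STC for h in daily_h_poa)

def specific_yield_from_energy(daily_h_poa, area_m2, efficiency, performance_ratio):
    """Same quantity computed as E_annual / P_installed."""
    e_annual = sum(h * area_m2 * efficiency * performance_ratio for h in daily_h_poa)
    p_installed = area_m2 * efficiency * G_STC  # rated power in kWp
    return e_annual / p_installed

# Hypothetical site with a constant 4.5 kWh/m²/day (real data varies by day)
daily = [4.5] * 365
y_f = specific_yield(daily, performance_ratio=0.80)
y_f_check = specific_yield_from_energy(daily, 20.0, 0.20, 0.80)
print(f"Specific yield: {y_f:.0f} kWh/kWp/year")
```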
Temperature Effects
Photovoltaic cells experience reduced efficiency at higher temperatures. The power output decreases approximately 0.4-0.5% per degree Celsius above the standard test condition temperature (25°C).
$$P_{actual} = P_{STC} \cdot [1 + \gamma \cdot (T_{cell} - 25°C)]$$
Where $\gamma$ is the temperature coefficient (typically -0.004 to -0.005 /°C for crystalline silicon)
$$T_{cell} = T_{ambient} + \frac{NOCT - 20°C}{800 W/m^2} \cdot G_{POA}$$
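NOCT is the Nominal Operating Cell Temperature (typically 45°C), measured at 800 W/m² irradiance and 20°C ambient temperature, which is where the constants in the formula come from.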
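The two temperature formulas chain together: the cell temperature is estimated first and then used to derate the rated power. A minimal sketch, assuming a hypothetical 400 W module, $\gamma = -0.004$ /°C, and a hot day at full irradiance:

```python
def cell_temperature(t_ambient_c: float, g_poa_w_m2: float, noct_c: float = 45.0) -> float:
    """T_cell = T_ambient + (NOCT - 20) / 800 * G_POA, in °C."""
    return t_ambient_c + (noct_c - 20.0) / 800.0 * g_poa_w_m2

def power_at_temperature(p_stc_w: float, t_cell_c: float, gamma: float = -0.004) -> float:
    """P_actual = P_STC * [1 + gamma * (T_cell - 25)], in W."""
    return p_stc_w * (1.0 + gamma * (t_cell_c - 25.0))

t_cell = cell_temperature(t_ambient_c=30.0, g_poa_w_m2=1000.0)
p_actual = power_at_temperature(p_stc_w=400.0, t_cell_c=t_cell)
print(f"Cell temperature: {t_cell:.2f} °C, output: {p_actual:.1f} W")
```

In this example the cell runs about 36°C above the 25°C reference, which costs roughly 14.5% of the rated output.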
Machine Learning Models
Algorithms for energy prediction
Random Forest Regression
Random Forest is an ensemble learning method that constructs multiple decision trees during training and outputs the average prediction of individual trees. It is particularly effective for energy prediction due to its ability to capture non-linear relationships and handle feature interactions.
The algorithm works by creating a "forest" of decision trees, each trained on a random subset of the data (bootstrap sampling) and a random subset of features. This randomization helps reduce overfitting and improves generalization. For energy price and production forecasting, Random Forest excels at capturing complex patterns like time-of-day effects, seasonal variations, and weather dependencies.
$$\hat{y} = \frac{1}{B} \sum_{b=1}^{B} T_b(x)$$
Where $B$ is the number of trees and $T_b(x)$ is the prediction of tree $b$
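To make the averaging formula concrete, here is a toy bagged ensemble written from scratch. It is a sketch, not the platform's implementation: each "tree" is a one-split stump, there is a single feature (so random feature subsetting is omitted), and the data points are invented for illustration.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) that minimises squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # all x values identical: fall back to the mean
        fallback = mean(ys)
        return lambda x: fallback
    _, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x <= threshold else r_mean

def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def predict(trees, x):
    """Ensemble prediction: the average of the individual tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)

# Hypothetical data: irradiance (W/m²) versus production (kW)
irradiance = [100, 200, 300, 400, 500, 600, 700, 800]
production = [0.4, 0.9, 1.3, 1.8, 2.2, 2.7, 3.1, 3.6]
forest = fit_bagged_stumps(irradiance, production)
low, high = predict(forest, 150), predict(forest, 750)
print(f"Prediction at 150 W/m²: {low:.2f} kW, at 750 W/m²: {high:.2f} kW")
```

Each stump on its own can only output two values, but because the bootstrap samples place the split at different thresholds, the averaged prediction varies more smoothly across the input range.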
Feature Importance
One key advantage of Random Forest is its ability to compute feature importance, helping users understand which variables most influence predictions. In energy applications, this often reveals that solar irradiance, hour of day, and temperature are the most important features for production prediction.
Key Hyperparameters
| Parameter | Description | Typical Range |
|---|---|---|
| n_estimators | Number of trees in the forest | 100-500 |
| max_depth | Maximum depth of each tree | 10-30 or None |
| min_samples_split | Minimum samples to split a node | 2-10 |
| max_features | Features considered at each split | 'sqrt', 'log2', or fraction |
Neural Networks
Deep neural networks can learn complex patterns in energy data, capturing temporal dependencies and non-linear relationships between weather conditions and energy production/prices.
Our platform uses Multi-Layer Perceptron (MLP) architectures with configurable hidden layers. These networks are trained using backpropagation with adaptive learning rate optimizers like Adam. The network learns to map input features (weather, time, historical data) to output predictions (energy production or price) through multiple layers of non-linear transformations.
Unlike simpler models, neural networks can automatically learn feature interactions and complex patterns without explicit feature engineering. However, they require more data and careful hyperparameter tuning to avoid overfitting.
$$h^{(l)} = \sigma(W^{(l)} h^{(l-1)} + b^{(l)})$$
Where $\sigma$ is the activation function (ReLU, tanh, etc.)
$$\text{ReLU}(x) = \max(0, x)$$
$$\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
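The layer equation, ReLU activation, and mean squared error loss can be traced by hand on a very small network. The sketch below shows only the forward pass with hand-picked weights (an assumption for illustration); training by backpropagation is not shown.

```python
def relu(x: float) -> float:
    return max(0.0, x)

def dense(inputs, weights, biases, activation=None):
    """One layer: h = sigma(W h_prev + b), where weights is a list of rows."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

def mse(y_true, y_pred):
    """Mean squared error loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical network: 2 inputs -> 2 hidden units (ReLU) -> 1 linear output
W1, b1 = [[0.5, -0.2], [0.1, 0.3]], [0.0, -1.0]
W2, b2 = [[1.0, 2.0]], [0.5]

def forward(x):
    hidden = dense(x, W1, b1, activation=relu)
    return dense(hidden, W2, b2)[0]

pred = forward([2.0, 1.0])
loss = mse([1.0], [pred])
print(f"prediction = {pred:.2f}, loss = {loss:.3f}")
```

The second hidden unit receives a negative pre-activation and is zeroed by ReLU, so only the first unit contributes to the output for this input.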
Model Evaluation Metrics
We use multiple metrics to evaluate model performance, each providing different insights into prediction quality. Using several metrics together gives a more complete picture of model behavior.
$$R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}$$
R² = 1 means perfect prediction; R² = 0 means the model performs no better than always predicting the mean, and negative values mean it performs worse than that baseline. Values above 0.8 are generally considered good for energy prediction.
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
MAE gives the average error magnitude in the same units as the target variable. For energy prices, MAE is the average prediction error in €/MWh.
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
RMSE penalizes large errors more heavily than MAE. It's useful when large errors are particularly undesirable, such as in financial planning.
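The three metrics are short enough to compute by hand. A minimal sketch on four invented hourly prices (€/MWh), which also shows RMSE exceeding MAE when one error is larger than the others:

```python
import math

def r2_score(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

# Hypothetical actual and predicted prices in €/MWh
actual = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 73.0, 79.0]

print(f"R²   = {r2_score(actual, predicted):.3f}")
print(f"MAE  = {mae(actual, predicted):.2f} €/MWh")
print(f"RMSE = {rmse(actual, predicted):.2f} €/MWh")
```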
Cross-Validation
To ensure our models generalize well to unseen data, we use k-fold cross-validation. The data is split into k subsets, and the model is trained k times, each time using a different subset for validation. For time-series data like energy prices, we use time-based splits to prevent data leakage from future observations.
Always evaluate your model on data it has never seen during training. Our platform automatically splits data into training (80%) and test (20%) sets.
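One common way to build time-based splits is an expanding window, where each validation fold lies strictly after its training data. The sketch below is an assumption about how such a split could look, not a description of the platform's internal splitter.

```python
def expanding_window_splits(n_samples: int, n_splits: int):
    """Yield (train_indices, validation_indices) pairs in chronological order.

    Every validation index is later than every training index, so the model
    is never validated on observations that precede its training data.
    """
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = fold * k
        # The last fold absorbs any remainder.
        val_end = fold * (k + 1) if k < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, val_end))

splits = list(expanding_window_splits(n_samples=10, n_splits=4))
for train, val in splits:
    print(f"train={train} validate={val}")
```

Shuffled k-fold would mix future and past hours in the same training set, which is the leakage that the time-based approach avoids.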
Battery Simulation
Energy storage modeling and optimization
Battery Energy Model
Our battery simulation uses a simplified yet accurate model for energy storage systems, accounting for charging/discharging efficiencies, state of charge limits, and power constraints.
The model simulates hour-by-hour energy flows: when solar production exceeds consumption, excess energy is stored in the battery (subject to charging efficiency losses). When consumption exceeds production, the battery discharges to cover the deficit (subject to discharging efficiency losses). This approach allows accurate estimation of self-consumption rates and grid dependency.
Real lithium-ion batteries have complex characteristics including capacity degradation over time, temperature sensitivity, and non-linear efficiency curves. Our simplified model captures the essential behavior while remaining computationally efficient for optimization and scenario analysis.
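$$SOC_{t+1} = SOC_t + \eta_c \cdot P_{charge} \cdot \Delta t - \frac{P_{discharge} \cdot \Delta t}{\eta_d}$$
Where $SOC_t$ is the stored energy (kWh) at time step $t$, $P_{charge}$ and $P_{discharge}$ are the charging and discharging power (kW), and $\Delta t$ is the time step (h). Dividing by the battery capacity expresses SOC as a percentage.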
Operating Constraints
Battery operation is constrained by physical limits:
- Capacity limits: SOC must stay between SOC_min and SOC_max
- Power limits: Maximum charge/discharge rate (C-rate)
- Energy balance: Cannot discharge more than available energy
| Parameter | Description | Typical Value |
|---|---|---|
| $\eta_c$ | Charging efficiency | 90-95% |
| $\eta_d$ | Discharging efficiency | 90-95% |
| $SOC_{min}$ | Minimum state of charge | 10-20% |
| $SOC_{max}$ | Maximum state of charge | 90-100% |
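The state-of-charge update and the operating constraints can be combined into an hourly loop. This is a simplified sketch under stated assumptions: stored energy is tracked in kWh, the battery starts at its lower limit, surplus that cannot be stored is exported, and all names and input profiles are hypothetical.

```python
def simulate_battery(production_kw, consumption_kw, capacity_kwh,
                     eta_c=0.9, eta_d=0.9, soc_min=0.1, soc_max=0.9,
                     max_power_kw=float("inf"), dt_h=1.0):
    """Return (final stored energy, grid import, grid export), all in kWh."""
    e_min, e_max = soc_min * capacity_kwh, soc_max * capacity_kwh
    stored = e_min  # assumption: the battery starts at its lower limit
    grid_import = grid_export = 0.0
    for prod, load in zip(production_kw, consumption_kw):
        net = prod - load
        if net >= 0:
            # Charge with the surplus, limited by power rating and headroom.
            p_charge = min(net, max_power_kw, (e_max - stored) / (eta_c * dt_h))
            stored += eta_c * p_charge * dt_h
            grid_export += (net - p_charge) * dt_h
        else:
            # Discharge to cover the deficit, limited by power and usable energy.
            deficit = -net
            p_discharge = min(deficit, max_power_kw, (stored - e_min) * eta_d / dt_h)
            stored -= p_discharge * dt_h / eta_d
            grid_import += (deficit - p_discharge) * dt_h
    return stored, grid_import, grid_export

# Hypothetical four-hour profile (kW) with a 10 kWh battery
stored_end, imported, exported = simulate_battery(
    production_kw=[3.0, 3.0, 0.0, 0.0],
    consumption_kw=[1.0, 1.0, 2.0, 2.0],
    capacity_kwh=10.0,
)
print(f"stored={stored_end:.2f} kWh, import={imported:.2f} kWh, export={exported:.2f} kWh")
```

In this example 4 kWh of surplus is sent to the battery, but after charging and discharging losses only 3.24 kWh reaches the load, so 0.76 kWh still has to be imported.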
Monte Carlo Simulation
Monte Carlo methods are used to evaluate the expected performance of battery systems under uncertainty in solar production and energy prices. By running thousands of simulations with randomly sampled scenarios, we can estimate probability distributions of outcomes.
In energy applications, Monte Carlo is particularly valuable because solar production and prices are inherently uncertain. Rather than providing a single "best estimate," Monte Carlo gives us a range of possible outcomes with associated probabilities, enabling better risk management.
Our implementation varies key parameters like daily solar variability, price volatility, and consumption patterns. Each simulation represents a possible "future year" scenario, and aggregating results provides confidence intervals for financial returns and energy metrics.
$$\mathbb{E}[f(X)] \approx \frac{1}{N} \sum_{i=1}^{N} f(X_i)$$
Where $X_i$ are independent samples from the probability distribution of $X$
Why Monte Carlo Works
The Law of Large Numbers guarantees that as we increase the number of simulations, our estimate converges to the true expected value. The Central Limit Theorem tells us that the distribution of our estimate is approximately normal, allowing us to construct confidence intervals.
Standard Error of Monte Carlo Estimate
$$SE = \frac{\sigma}{\sqrt{N}}$$
The error is proportional to $1/\sqrt{N}$, so 10,000 simulations give 100× less error than a single simulation.
With N=10,000 Monte Carlo simulations, the standard error is approximately 1% of the standard deviation, providing highly reliable estimates of expected values and confidence intervals.
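The estimator and its standard error can be sketched with the standard library. The outcome model here (annual savings proportional to a normally distributed solar factor with 10% standard deviation) is an invented stand-in for a full battery simulation.

```python
import math
import random
import statistics

def monte_carlo_mean(sample_fn, n: int, seed: int = 42):
    """Estimate E[f(X)] and its standard error from n independent draws."""
    rng = random.Random(seed)
    outcomes = [sample_fn(rng) for _ in range(n)]
    estimate = statistics.fmean(outcomes)
    std_error = statistics.stdev(outcomes) / math.sqrt(n)
    return estimate, std_error

def annual_savings(rng: random.Random) -> float:
    """Hypothetical outcome: 1,000 € scaled by a random solar-resource factor."""
    return 1000.0 * rng.gauss(1.0, 0.1)

mean_small, se_small = monte_carlo_mean(annual_savings, n=100)
mean_large, se_large = monte_carlo_mean(annual_savings, n=10_000)

# Approximate 95% confidence interval, justified by the Central Limit Theorem
ci_95 = (mean_large - 1.96 * se_large, mean_large + 1.96 * se_large)
print(f"N=100:    {mean_small:.1f} ± {se_small:.2f}")
print(f"N=10000:  {mean_large:.1f} ± {se_large:.2f}")
```

Going from 100 to 10,000 draws multiplies the cost by 100 but shrinks the standard error only by a factor of about 10, which is the $1/\sqrt{N}$ behaviour.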
Energy Markets
Electricity pricing and market dynamics
Day-Ahead Market
European electricity markets operate on a day-ahead basis, where prices are determined through auctions the day before delivery. Prices vary hourly based on supply and demand balance.
In the day-ahead market, generators submit offers to sell electricity at various prices, and consumers/retailers submit bids to buy. The market operator (like OMIE for Spain/Portugal) matches supply and demand using a marginal pricing algorithm, where all accepted generators receive the price set by the most expensive accepted offer.
This market design, known as "pay-as-clear" or marginal pricing, means that price is determined by the last (most expensive) unit needed to meet demand. When renewable generation is high, cheaper sources cover most demand, pushing prices down. When demand is high and renewables are low, expensive gas plants set the price.
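A simplified merit-order clearing illustrates how the marginal offer sets the price. This sketch assumes fixed (price-inelastic) demand and ignores network constraints and block bids, all of which a real market algorithm handles; the offers are invented.

```python
def clear_market(offers, demand_mw: float):
    """Accept the cheapest offers until demand is met.

    offers: list of (price in €/MWh, quantity in MW).
    Returns (clearing_price, accepted), where every accepted offer is paid
    the clearing price (pay-as-clear).
    """
    accepted = []
    remaining = demand_mw
    clearing_price = None
    for price, quantity in sorted(offers):
        if remaining <= 0:
            break
        take = min(quantity, remaining)
        accepted.append((price, take))
        remaining -= take
        clearing_price = price  # the last accepted offer sets the price
    if remaining > 0:
        raise ValueError("Offered supply cannot cover demand")
    return clearing_price, accepted

# Hypothetical offers: solar, wind, nuclear, gas
offers = [(0.0, 300.0), (5.0, 200.0), (30.0, 400.0), (90.0, 500.0)]

high_demand_price, _ = clear_market(offers, demand_mw=1000.0)
low_demand_price, _ = clear_market(offers, demand_mw=450.0)
print(f"High demand: {high_demand_price} €/MWh, low demand: {low_demand_price} €/MWh")
```

At 1,000 MW of demand the gas offer is needed and sets the price for everyone; at 450 MW the renewables cover demand on their own and the price collapses.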
Key Price Drivers
- Renewable generation: High solar/wind production tends to lower prices, sometimes to zero or negative
- Demand patterns: Peak demand hours (morning 8-10am, evening 7-9pm) typically have higher prices
- Fuel costs: Natural gas and coal prices affect thermal generation costs and set floor prices
- Interconnection: Cross-border flows can equalize prices across regions when capacity is available
- CO₂ prices: Emissions trading costs add to fossil fuel generation costs
Electricity prices can be highly volatile, sometimes even becoming negative when renewable supply exceeds demand. In 2023-2024, European prices have ranged from -€500 to +€4,000/MWh. Our models account for this volatility in predictions.
Price Prediction Models
Energy price prediction uses features such as historical prices, weather forecasts, demand forecasts, and renewable generation forecasts.
$$P_t = f(\text{Hour}_t, \text{DayType}_t, \text{Solar}_t, \text{Wind}_t, \text{Demand}_t, \text{Temp}_t, P_{t-1}, ...)$$
Common Features for Price Prediction
| Feature Category | Examples |
|---|---|
| Temporal | Hour, day of week, month, holiday indicator |
| Weather | Temperature, solar irradiance, wind speed |
| Supply | Solar production, wind production, thermal capacity |
| Demand | Load forecast, industrial activity index |
| Lagged | Previous hour price, same hour yesterday/week |
Self-Sufficiency Analysis
Energy independence calculations
Autarky Rate
The autarky rate (or self-sufficiency rate) measures the percentage of total energy consumption that is covered by local generation (solar + battery).
This is a key metric for understanding energy independence. A 70% autarky rate means that 70% of your energy needs are met by your own solar+battery system, while 30% must still come from the grid. Note that autarky is not the same as self-consumption:
- Autarky: What fraction of your consumption comes from your own production
- Self-consumption: What fraction of your production you consume yourself
A system can have high self-consumption but low autarky (small solar system relative to consumption) or high autarky but low self-consumption (large system with significant export to grid).
$$\text{Autarky} = \frac{E_{\text{self-consumed}}}{E_{\text{total demand}}} \times 100\%$$
$$\text{Self-Consumption} = \frac{E_{\text{self-consumed}}}{E_{\text{solar produced}}} \times 100\%$$
Both metrics are important: autarky tells you about grid independence, while self-consumption tells you how efficiently you use your solar production. Batteries improve both metrics by storing excess daytime production for evening use.
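Both ratios share the same numerator and differ only in the denominator. A minimal sketch for a system without a battery, where the self-consumed energy in each hour is the smaller of production and consumption (with a battery, energy discharged to the load would be added to it); the hourly profiles are invented:

```python
def self_sufficiency_metrics(production_kwh, consumption_kwh):
    """Return (autarky %, self-consumption %) from hourly energy profiles."""
    self_consumed = sum(min(p, c) for p, c in zip(production_kwh, consumption_kwh))
    autarky = 100.0 * self_consumed / sum(consumption_kwh)
    self_consumption = 100.0 * self_consumed / sum(production_kwh)
    return autarky, self_consumption

# Hypothetical profile: production peaks midday, consumption peaks in the evening
production = [0.0, 3.0, 5.0, 0.0]
consumption = [2.0, 2.0, 2.0, 4.0]

autarky, self_consumption = self_sufficiency_metrics(production, consumption)
print(f"Autarky: {autarky:.0f}%, self-consumption: {self_consumption:.0f}%")
```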
Battery Sizing
Optimal battery size depends on the mismatch between solar production and consumption patterns. The following heuristics are commonly used:
| Target Autarky | Battery Size (relative to daily consumption) |
|---|---|
| 50-60% | 0.5-1× daily consumption |
| 70-80% | 1-2× daily consumption |
| 90%+ | 3-5× daily consumption (diminishing returns) |
Achieving 100% autarky is often economically impractical. The marginal cost of additional battery capacity rises steeply as autarky approaches 100%, because each extra kWh of storage is needed on fewer days of the year.
Weather Impact Analysis
Climate effects on solar production
Cloud Impact Model
Cloud cover significantly reduces solar irradiance. We use the clearness index (Kt) to quantify the ratio of measured irradiance to extraterrestrial irradiance.
The clearness index is a dimensionless value between 0 and 1 that indicates how much solar radiation reaches the Earth's surface compared to what would arrive without an atmosphere. It accounts for absorption and scattering by clouds, aerosols, and gases.
This index is crucial for solar forecasting because it allows us to normalize solar data across different times of year and latitudes. A Kt of 0.7 on a winter day indicates conditions about as clear as a Kt of 0.7 on a summer day, even though the absolute irradiance values are very different.
$$K_t = \frac{GHI}{GHI_0}$$
Where $GHI_0$ is the extraterrestrial horizontal irradiance (calculated from solar geometry)
Interpreting Clearness Index Values
| Clearness Index | Sky Condition |
|---|---|
| 0.7 - 0.8 | Clear sky |
| 0.5 - 0.7 | Partly cloudy |
| 0.3 - 0.5 | Overcast |
| < 0.3 | Heavy clouds/rain |
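The table can be applied as a simple lookup. In this sketch the boundary values are assigned to the clearer category and values above 0.8 are treated as clear sky; both choices are assumptions, since the table ranges share their endpoints.

```python
def clearness_index(ghi: float, ghi_0: float) -> float:
    """K_t = GHI / GHI_0, with both values in the same units."""
    if ghi_0 <= 0:
        raise ValueError("Extraterrestrial irradiance must be positive (daytime only)")
    return ghi / ghi_0

def sky_condition(kt: float) -> str:
    """Map a clearness index to the sky-condition categories in the table."""
    if kt >= 0.7:
        return "Clear sky"
    if kt >= 0.5:
        return "Partly cloudy"
    if kt >= 0.3:
        return "Overcast"
    return "Heavy clouds/rain"

# Hypothetical daily totals in kWh/m²
kt = clearness_index(ghi=4.2, ghi_0=10.0)
print(f"K_t = {kt:.2f}: {sky_condition(kt)}")
```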
Temperature Sensitivity
We analyze the correlation between ambient temperature and production efficiency to understand local climate effects on PV performance.
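$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2} \sqrt{\sum_{i}(y_i - \bar{y})^2}}$$
Where $x_i$ is the ambient temperature and $y_i$ the production efficiency of observation $i$, and $r$ is the Pearson correlation coefficient (between -1 and 1)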
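The correlation coefficient can be computed directly from its definition. The temperature and efficiency values below are invented and perfectly linear, so the result is the limiting case of a fully negative correlation; real measurements would scatter around the trend.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - x_bar) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - y_bar) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

# Hypothetical observations: ambient temperature (°C) and panel efficiency
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
efficiency = [0.200, 0.196, 0.192, 0.188, 0.184]

r = pearson_r(temperature, efficiency)
print(f"r = {r:.3f}")
```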
Data Sources
APIs and datasets used
NASA POWER
The NASA Prediction Of Worldwide Energy Resources (POWER) project provides solar and meteorological data derived from satellite observations and reanalysis models.
NASA POWER data is derived from multiple satellite instruments and the MERRA-2 (Modern-Era Retrospective Analysis for Research and Applications) reanalysis system. This provides consistent, gap-free data covering the entire globe, making it ideal for locations without ground-based weather stations.
The data has been validated against ground measurements and is widely used in the solar industry for resource assessment and system design. While ground measurements are more accurate for specific sites, NASA POWER data provides reliable estimates for initial feasibility studies and regional comparisons.
Available Parameters
- ALLSKY_SFC_SW_DWN: All-sky surface shortwave downward irradiance (GHI) - primary solar resource metric
- CLRSKY_SFC_SW_DWN: Clear-sky surface shortwave downward irradiance - theoretical maximum without clouds
- T2M: Temperature at 2 meters - used for PV efficiency corrections
- WS10M: Wind speed at 10 meters - affects panel cooling
- RH2M: Relative humidity at 2 meters - secondary weather parameter
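The parameters above are passed by name when requesting data. The sketch below only builds a request URL and does not contact the service. The endpoint path, the `community=RE` value, and the query field names reflect the public POWER hourly point API as best understood here and should be treated as assumptions to verify against the official NASA POWER documentation; the coordinates and dates are arbitrary.

```python
from urllib.parse import urlencode

# Assumed endpoint for hourly point queries; verify against the POWER API docs.
POWER_HOURLY_URL = "https://power.larc.nasa.gov/api/temporal/hourly/point"

def build_power_url(latitude: float, longitude: float,
                    start: str, end: str, parameters: list[str]) -> str:
    """Build a request URL for the given site, date range (YYYYMMDD), and parameters."""
    query = {
        "parameters": ",".join(parameters),
        "community": "RE",  # assumed: renewable energy user community
        "longitude": longitude,
        "latitude": latitude,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    # Keep commas unescaped so the parameter list stays readable.
    return f"{POWER_HOURLY_URL}?{urlencode(query, safe=',')}"

# Hypothetical request: one week of irradiance and temperature near Madrid
url = build_power_url(40.42, -3.70, "20230101", "20230107",
                      ["ALLSKY_SFC_SW_DWN", "T2M"])
print(url)
```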
Data Characteristics
| Characteristic | Value |
|---|---|
| Temporal resolution | Hourly |
| Spatial resolution | 0.5° × 0.5° (≈55 km at equator) |
| Historical coverage | 1981 - present |
| Geographic coverage | Global (land and ocean) |
| Update frequency | Monthly (with ~2 month lag) |
NASA POWER provides hourly data at 0.5° × 0.5° spatial resolution, covering the entire globe from 1981 to present. For best results, we recommend using at least 10 years of data for resource assessment to capture interannual variability.