EC2 Windows Unable to RDP: Winlogon Failure Due to Memory Exhaustion
When an EC2 Windows instance passes its status checks and responds to ping, but RDP connections fail and a Stop & Start makes the problem go away, don't look only at the network. Resource exhaustion inside Windows can also cause critical processes such as Winlogon to crash.
Symptoms
- Instance is in running state.
- System status checks and instance status checks pass.
- Security group allows port 3389.
- Instance responds to ping.
- RDP cannot connect.
- Stop & Start resolves the issue.
This behavior closely resembles a network issue, but the logs may point to insufficient system resources.
Key Logs
The event log may contain:
- Not enough storage is available to process this command.
- System.OutOfMemoryException
You may also see Winlogon-related events, such as Winlogon crashing or failing to create a logon session.
Here, "storage" does not necessarily mean disk space. In Windows error messages, it can also mean insufficient memory or other system resources.
Root Cause
When physical memory and the page file are both exhausted, committed memory reaches the commit limit (roughly RAM plus page file size). At that point the system can no longer allocate memory for critical processes. Components that RDP logon depends on, such as Winlogon, LSASS, and Remote Desktop Services, may then fail.
A Stop & Start clears memory, so the issue disappears temporarily. Unless the instance size or the applications' memory usage is addressed, the problem will recur.
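The sketch below shows what "exhausted" means in numbers. Windows limits the total memory that processes can commit to the commit limit, which is approximately physical RAM plus the page file size. This is a simplified model under that assumption: it ignores page files that grow on demand and memory the OS reserves. The helper names and the 8 GiB / 2 GiB figures are hypothetical. The percentage it computes is the same ratio that the Memory object's "% Committed Bytes In Use" performance counter reports (Committed Bytes divided by Commit Limit).

```python
"""Simplified commit-charge model (assumption: commit limit ~= RAM + page file)."""

GIB = 1024 ** 3


def approx_commit_limit(ram_bytes: int, pagefile_bytes: int) -> int:
    """Approximate the commit limit as physical memory plus page file size."""
    return ram_bytes + pagefile_bytes


def committed_percent(committed_bytes: int, commit_limit_bytes: int) -> float:
    """Committed bytes as a percentage of the commit limit.

    This is the ratio the Memory object's '% Committed Bytes In Use' counter reports.
    """
    if commit_limit_bytes <= 0:
        raise ValueError("commit limit must be positive")
    return 100.0 * committed_bytes / commit_limit_bytes


def commit_headroom_bytes(committed_bytes: int, commit_limit_bytes: int) -> int:
    """Bytes that can still be committed before new allocations fail."""
    return max(commit_limit_bytes - committed_bytes, 0)


if __name__ == "__main__":
    # Hypothetical instance: 8 GiB RAM, 2 GiB page file, 9.5 GiB committed.
    limit = approx_commit_limit(8 * GIB, 2 * GIB)
    committed = int(9.5 * GIB)
    print(f"{committed_percent(committed, limit):.1f}% of the commit limit in use")
    print(f"{commit_headroom_bytes(committed, limit) / GIB:.2f} GiB left before allocations fail")
```

Once the headroom reaches zero, further commit requests fail. That is the point at which errors such as OutOfMemoryException or "Not enough storage is available" can start to appear.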
Troubleshooting Direction
1. Check Event Logs
Focus on the period around the failure in:
- Application.evtx
- System.evtx
- Setup.evtx
Look for:
- OutOfMemoryException
- Not enough storage is available
- Winlogon errors
- Security software or monitoring agent anomalies
- Windows Update-related anomalies
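Once events have been exported, the first part of this review can be partly automated. This is a minimal sketch under several assumptions:

- The events from Application.evtx, System.evtx, and Setup.evtx have already been exported and parsed into dicts with the hypothetical keys time, log, source, and message. Export and parsing are not shown.
- The one-hour window around the failure is an arbitrary example.

The sketch only matches the OutOfMemoryException, "Not enough storage is available", and Winlogon markers. Security-software, monitoring-agent, and Windows Update anomalies still need a manual read.

```python
"""Filter exported Windows event records for memory-exhaustion markers.

Assumption: records are dicts with hypothetical keys 'time' (datetime),
'log', 'source', and 'message'; exporting and parsing the .evtx files is not shown.
"""
from datetime import datetime, timedelta

# Markers from the checklist, compared in lower case.
MARKERS = (
    "outofmemoryexception",
    "not enough storage is available",
    "winlogon",
)


def matched_markers(record: dict) -> list[str]:
    """Return the markers found in the record's source or message (case-insensitive)."""
    text = f"{record.get('source', '')} {record.get('message', '')}".lower()
    return [m for m in MARKERS if m in text]


def find_suspect_events(records, failure_time: datetime, window: timedelta):
    """Yield (record, markers) for events within +/- window of the failure that match a marker."""
    start, end = failure_time - window, failure_time + window
    for record in sorted(records, key=lambda r: r["time"]):
        if not start <= record["time"] <= end:
            continue
        hits = matched_markers(record)
        if hits:
            yield record, hits


if __name__ == "__main__":
    failure = datetime(2024, 1, 1, 3, 0)
    # Hypothetical sample records.
    sample = [
        {"time": datetime(2024, 1, 1, 2, 50), "log": "Application",
         "source": ".NET Runtime", "message": "System.OutOfMemoryException ..."},
        {"time": datetime(2024, 1, 1, 2, 55), "log": "System",
         "source": "Service Control Manager", "message": "Service started."},
        {"time": datetime(2023, 12, 31, 9, 0), "log": "System",
         "source": "Winlogon", "message": "Outside the window."},
    ]
    for rec, hits in find_suspect_events(sample, failure, timedelta(hours=1)):
        print(rec["time"], rec["log"], rec["source"], hits)
```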
2. Check Instance Size
Confirm that the instance's memory covers peak workload demand. If memory usage consistently runs near the limit, move to a larger instance size or optimize the application.
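One way to make "consistently near the limit" measurable is to look at both the peak and the share of samples above a threshold. The sketch below assumes memory utilization samples in percent, for example from the CloudWatch Agent. The 85% threshold and the 25% "sustained" share are illustrative values, not AWS guidance, and the function names and sample data are hypothetical.

```python
"""Decide whether memory is 'consistently near the limit' from utilization samples."""
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100] of a non-empty sequence."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def assess_memory(samples, threshold=85.0, sustained_share=0.25):
    """Summarize peak, p95, and the share of samples above the threshold (percent values)."""
    if not samples:
        raise ValueError("no samples")
    share = sum(1 for s in samples if s > threshold) / len(samples)
    return {
        "peak": max(samples),
        "p95": nearest_rank_percentile(samples, 95),
        "share_above_threshold": share,
        # Illustrative rule: flag when at least sustained_share of samples exceed threshold.
        "consider_resize": share >= sustained_share,
    }


if __name__ == "__main__":
    hourly = [62, 70, 78, 88, 91, 93, 87, 90, 76, 68]  # hypothetical samples (%)
    print(assess_memory(hourly))
```

A single spike raises the peak but not the share. That distinguishes a one-off burst from a workload that genuinely needs more memory.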
3. Deploy OS Metrics Monitoring
By default, CloudWatch does not collect memory metrics from inside the guest OS, because the standard EC2 metrics come from the hypervisor. Install the CloudWatch Agent on the instance to collect:
- Memory utilization
- Pagefile utilization
- Disk utilization
- Key process metrics
Then configure alarms on these metrics, for example one that triggers when memory utilization exceeds 85%.
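This is a minimal sketch of an agent configuration and a matching alarm, written as Python that emits the agent's JSON config.

- The Memory, Paging File, LogicalDisk, and Process counter names are standard Windows performance counters.
- The process list (lsass), the alarm naming scheme, and the SNS topic ARN are hypothetical placeholders.
- The metric name "Memory % Committed Bytes In Use" and the objectname dimension are assumptions about how the agent names Windows counters.

Confirm the metric name and dimensions in the CloudWatch console before relying on the alarm. An alarm whose metric name or dimensions do not match exactly never receives data.

```python
"""Build a CloudWatch agent config (Windows) and parameters for a memory alarm."""
import json

NAMESPACE = "CWAgent"


def build_agent_config(processes=("lsass",), interval=60):
    """Return an agent config dict collecting memory, page file, disk, and process counters.

    The process names are hypothetical examples (perf counter instance names, no .exe).
    """
    def counter(measurements, resources=None):
        entry = {"measurement": list(measurements),
                 "metrics_collection_interval": interval}
        if resources is not None:
            entry["resources"] = list(resources)
        return entry

    return {
        "metrics": {
            "namespace": NAMESPACE,
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "Memory": counter(["% Committed Bytes In Use", "Available MBytes"]),
                "Paging File": counter(["% Usage"], ["*"]),
                "LogicalDisk": counter(["% Free Space"], ["*"]),
                "Process": counter(["Private Bytes", "Working Set"], processes),
            },
        }
    }


def build_memory_alarm(instance_id, topic_arn, threshold=85.0):
    """Keyword arguments for boto3's CloudWatch put_metric_alarm.

    Assumption: the agent publishes the counter as 'Memory % Committed Bytes In Use'
    with InstanceId and objectname dimensions; verify before use.
    """
    return {
        "AlarmName": f"high-memory-{instance_id}",  # hypothetical naming scheme
        "Namespace": NAMESPACE,
        "MetricName": "Memory % Committed Bytes In Use",
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "objectname", "Value": "Memory"},
        ],
        "Statistic": "Average",
        "Period": 300,            # 5-minute datapoints
        "EvaluationPeriods": 3,   # three consecutive periods above the threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


if __name__ == "__main__":
    print(json.dumps(build_agent_config(), indent=2))
```

To use the sketch:

1. Save the printed JSON as the agent's configuration file.
2. Load it with the agent's control script (amazon-cloudwatch-agent-ctl.ps1 -a fetch-config -m ec2 -s -c file:<path>).
3. Pass the alarm dict as boto3.client("cloudwatch").put_metric_alarm(**build_memory_alarm(...)).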
Recommended Actions
Short-Term Recovery
A Stop & Start releases memory and temporarily restores the ability to log on, but it is not a permanent fix.
Medium-Term Optimization
Identify the applications, monitoring agents, and security software with high memory consumption. For each one, determine whether it has a memory leak or an overly heavy configuration.
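A memory leak usually shows up as a process whose private memory keeps rising instead of leveling off. The sketch below fits a least-squares slope to per-process samples of (hours, private bytes), such as the Process object's Private Bytes counter. It then flags processes growing faster than a threshold. The 50 MiB/hour threshold, the function names, and the process names are hypothetical. A steady upward slope is a reason to investigate, not proof of a leak, because caches and warm-up also grow.

```python
"""Flag processes whose private memory keeps growing (a possible leak)."""

MIB = 1024 ** 2


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    if den == 0:
        raise ValueError("need samples at two or more distinct times")
    return num / den


def growth_mib_per_hour(samples):
    """Trend of (hours_since_start, private_bytes) samples, in MiB per hour."""
    hours = [h for h, _ in samples]
    mib = [b / MIB for _, b in samples]
    return slope(hours, mib)


def suspected_leaks(per_process, min_growth_mib_per_hour=50.0):
    """Return {process: MiB/hour} for processes growing at least as fast as the threshold."""
    flagged = {}
    for name, samples in per_process.items():
        if len(samples) < 3:
            continue  # too few points to call it a trend
        growth = growth_mib_per_hour(samples)
        if growth >= min_growth_mib_per_hour:
            flagged[name] = round(growth, 1)
    return flagged


if __name__ == "__main__":
    # Hypothetical hourly Private Bytes samples for two hypothetical processes.
    samples = {
        "agent": [(0, 500 * MIB), (1, 600 * MIB), (2, 700 * MIB), (3, 800 * MIB)],
        "app": [(0, 300 * MIB), (1, 310 * MIB), (2, 295 * MIB), (3, 305 * MIB)],
    }
    print(suspected_leaks(samples))
```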
Long-Term Solution
If peak workloads genuinely require more memory, move to a larger instance size. Before changing the instance type, create an AMI as a backup. Also confirm that the target type is compatible with the instance's ENA and NVMe configuration.
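The pre-resize checks can be scripted against EC2 API responses. This sketch assumes the following inputs:

- ena_enabled is the EnaSupport field that DescribeInstances returns for the instance.
- target is one entry of InstanceTypes from DescribeInstanceTypes, using its NetworkInfo.EnaSupport, EbsInfo.NvmeSupport, and MemoryInfo.SizeInMiB fields.
- os_has_nvme_driver is something you have confirmed inside Windows yourself, because the API cannot see guest drivers.

The function name and the example type data are hypothetical.

```python
"""Pre-flight checks before moving a Windows instance to a larger type."""


def preflight_issues(target, current_memory_mib, ena_enabled, os_has_nvme_driver):
    """Return a list of problems found; an empty list means no blocker was detected.

    target: one entry of InstanceTypes from EC2 DescribeInstanceTypes.
    ena_enabled: the instance's EnaSupport flag from DescribeInstances.
    os_has_nvme_driver: confirmed manually inside Windows (not visible to the API).
    """
    issues = []
    memory = target.get("MemoryInfo", {}).get("SizeInMiB", 0)
    ena = target.get("NetworkInfo", {}).get("EnaSupport", "unsupported")
    nvme = target.get("EbsInfo", {}).get("NvmeSupport", "unsupported")
    if memory <= current_memory_mib:
        issues.append(f"target has {memory} MiB, not more than the current {current_memory_mib} MiB")
    if ena == "required" and not ena_enabled:
        issues.append("target type requires ENA, but ENA is not enabled on this instance")
    if nvme == "required" and not os_has_nvme_driver:
        issues.append("target type presents EBS volumes as NVMe; install the AWS NVMe driver first")
    return issues


if __name__ == "__main__":
    # Hypothetical DescribeInstanceTypes entry, trimmed to the fields used above.
    target = {
        "InstanceType": "example.xlarge",
        "MemoryInfo": {"SizeInMiB": 16384},
        "NetworkInfo": {"EnaSupport": "required"},
        "EbsInfo": {"NvmeSupport": "required"},
    }
    for issue in preflight_issues(target, 8192, ena_enabled=True, os_has_nvme_driver=False):
        print("-", issue)
```

In practice, the surrounding boto3 calls would be:

- target: boto3.client("ec2").describe_instance_types(InstanceTypes=[...])["InstanceTypes"][0].
- Backup AMI: create_image(InstanceId=..., Name=...).
- Type change: modify_instance_attribute(InstanceId=..., InstanceType={"Value": ...}) while the instance is stopped.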
Summary
An RDP login failure is not necessarily caused by port 3389, the security group, or a network ACL. If the instance still responds to ping and passes its status checks, examine the Windows event logs as well.
If the logs show out-of-memory errors, insufficient-resource errors, and Winlogon anomalies, the root cause is most likely memory exhaustion. Stop & Start provides only temporary recovery. The long-term fix is to monitor memory and adjust the instance size or application configuration.
