We recently started storing user session data in Amazon ElastiCache for Redis. I was a little worried about speed and latency, because previously the session data was stored in the web server's memory, but to my surprise Redis is very quick, and I don't think we really took a performance hit.
We are looking to add more web servers, and we already have a load balancer (though at the moment it's mostly for security), so we wanted to store user session data somewhere else. That way, if users get directed to a different server between requests, they won't notice anything.
About a month after releasing this solution, we got a few reports of timeouts from users. It doesn't happen very often, and we've only had a few reports, but the users were very frustrated because the work they had been doing on those pages was completely lost.
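The timeout matches the operationTimeoutInMilliseconds setting of 5000 ms in our configuration (the timeout="60" on sessionState is the session lifetime in minutes, not the Redis operation timeout):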
```xml
<sessionState mode="Custom" customProvider="Custom_RedisSessionStateStore" timeout="60">
  <providers>
    <add name="Custom_RedisSessionStateStore"
         type="Microsoft.Web.Redis.RedisSessionStateProvider"
         settingsClassName="AWS.SessionStateRedisSettings"
         settingsMethodName="ConnectionString"
         operationTimeoutInMilliseconds="5000"
    />
  </providers>
</sessionState>
```
The SessionStateRedisSettings class supplies the connection information for the session store, because that information is stored in AWS Secrets Manager and is pulled when the web application starts up.
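```csharp
namespace AWS
{
    // Supplies the Redis connection string to RedisSessionStateProvider
    // (referenced by settingsClassName/settingsMethodName in web.config).
    public static class SessionStateRedisSettings
    {
        public static string RedisConnectionString = string.Empty;

        // Called once at application start-up, after SecretsCache has been
        // populated from AWS Secrets Manager.
        public static void Initialize()
        {
            RedisConnectionString = string.Format(
                "{0}:{1},password={2},ssl=True",
                SecretsCache.SecretsDictonary["RedisHost"],
                SecretsCache.SecretsDictonary["RedisPort"],
                SecretsCache.SecretsDictonary["RedisPass"]);
        }

        // Invoked by the session state provider to obtain the connection string.
        public static string ConnectionString()
        {
            return RedisConnectionString;
        }
    }
}
```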
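This won't explain intermittent timeouts by itself, but a variant of the builder above that fails fast at start-up can rule out a malformed connection string. A minimal sketch, assuming the secrets are available as an IReadOnlyDictionary<string, string> (the actual type of SecretsCache.SecretsDictonary isn't shown here); RedisConnectionStringFactory is a hypothetical name, and the output has the same host:port,password=...,ssl=True shape as the original.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (assumes secrets are exposed as IReadOnlyDictionary<string, string>):
// builds the same "host:port,password=...,ssl=True" string as SessionStateRedisSettings,
// but throws a clear error naming the missing or invalid secret.
public static class RedisConnectionStringFactory
{
    public static string Build(IReadOnlyDictionary<string, string> secrets)
    {
        string host = Require(secrets, "RedisHost");
        string port = Require(secrets, "RedisPort");
        string password = Require(secrets, "RedisPass");

        // Reject a non-numeric or out-of-range port at start-up rather than
        // at the first session access.
        if (!int.TryParse(port, out int portNumber) || portNumber <= 0 || portNumber > 65535)
        {
            throw new InvalidOperationException($"Secret 'RedisPort' is not a valid port: '{port}'.");
        }

        return $"{host}:{portNumber},password={password},ssl=True";
    }

    // Returns the secret value, or throws if the key is absent or the value is blank.
    private static string Require(IReadOnlyDictionary<string, string> secrets, string key)
    {
        if (!secrets.TryGetValue(key, out string value) || string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException($"Secret '{key}' is missing or empty.");
        }
        return value;
    }
}
```

Initialize() could then assign RedisConnectionString from RedisConnectionStringFactory.Build(...), so a bad secret surfaces as a start-up error instead of a later connection failure.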
I used the link in the error message to try to find the root cause: https://stackexchange.github.io/StackExchange.Redis/Timeouts
Are you getting network or CPU bound?
I don't think it's the network or the CPU; the ElastiCache metrics around the time of the timeout look normal.
Are there commands taking a long time to process on the redis-server?
Using CloudWatch Logs Insights to query the ElastiCache slow log, we can check this:
First query, to get all the log entries (there are over 1,000 at the moment):
```
fields @timestamp, @message
```
Second query, to get all entries whose Command matches EVALSHA, which is almost the entire data set:
```
fields @timestamp, @message
| filter Command like /EVALSHA/
```
Third query, to get all entries whose Command does NOT match EVALSHA:
```
fields @timestamp, @message
| filter Command not like /EVALSHA/
```
Fourth query, to see which entries took longer than 5 seconds. Duration (us) is measured in microseconds, so 5 seconds is 5,000,000; nothing is longer than that:
```
fields @timestamp, @message
| filter `Duration (us)` > 5000000
```
Fifth query, to prove that the duration filter does work: with a threshold of 50,000 µs (50 ms) it returns entries, and the longest command took about 0.3 seconds:
```
fields @timestamp, @message
| filter `Duration (us)` > 50000
```
Was there a big request preceding several small requests to the Redis that timed out?
I suspected this initially, but since the qs value in the error (the number of operations already sent to the server and still awaiting a response) is 0, that doesn't seem to be the case.
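Because the reports are rare, it may help to record these counters every time the exception is logged and compare them across occurrences. A minimal sketch, assuming the message lists integer counters as comma-separated "name: value" fields (as in qs: 0); TimeoutMessageCounters is a hypothetical name, and the exact set of fields varies between StackExchange.Redis versions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the integer "name: value" counters (e.g. qu, qs, in)
// from a StackExchange.Redis timeout message. Assumes the comma-separated format;
// non-integer fields such as "aw: False" or "mgr: 10 of 10 available" are skipped.
public static class TimeoutMessageCounters
{
    // A field starts the message or follows ", "; its value must be a whole
    // integer that ends at the next comma or at the end of the message.
    private static readonly Regex Field = new Regex(
        @"(?<=^|, )(?<key>[A-Za-z][\w-]*): (?<value>\d+)(?=,|$)",
        RegexOptions.Compiled);

    public static Dictionary<string, long> Parse(string message)
    {
        var counters = new Dictionary<string, long>(StringComparer.Ordinal);
        foreach (Match match in Field.Matches(message))
        {
            // A repeated key overwrites the earlier value instead of throwing.
            counters[match.Groups["key"].Value] = long.Parse(match.Groups["value"].Value);
        }
        return counters;
    }
}
```

Wherever the timeout exception is logged, TimeoutMessageCounters.Parse(ex.Message).TryGetValue("qs", out long qs) would give the qs value when that field is present.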
Are you seeing a high number of busyio or busyworker threads in the timeout exception?
It does not seem that way:
The IOCP (Runtime Global Thread Pool IO Threads) pool has 0 busy threads.
The WORKER (Runtime Global Thread Pool Worker Threads) pool has 13 busy threads out of a maximum of 32767.
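The timeout page compares Busy against Min rather than against Max: once Busy exceeds Min, the thread pool adds new threads only gradually, so 13 busy worker threads could matter if Min is below 13 (by default Min is typically the processor count). The Min value appears in the same IOCP/WORKER section of the exception message. Alternatively, a sketch like the following can log the same numbers from inside the application; it uses only the System.Threading.ThreadPool APIs, and PoolStats and ThreadPoolSnapshot are hypothetical names.

```csharp
using System.Threading;

// Hypothetical type: one thread pool's numbers in the same terms as the
// IOCP/WORKER section of a StackExchange.Redis timeout message.
public sealed class PoolStats
{
    public PoolStats(int busy, int min, int max)
    {
        Busy = busy;
        Min = min;
        Max = max;
    }

    public int Busy { get; }
    public int Min { get; }
    public int Max { get; }

    // The case the timeout page warns about: more busy threads than the minimum,
    // beyond which the pool injects new threads only gradually.
    public bool BusyExceedsMin => Busy > Min;

    public override string ToString() => $"Busy={Busy},Min={Min},Max={Max}";
}

// Hypothetical helper: captures worker and IOCP stats at (roughly) one point in time.
public sealed class ThreadPoolSnapshot
{
    private ThreadPoolSnapshot(PoolStats worker, PoolStats iocp)
    {
        Worker = worker;
        Iocp = iocp;
    }

    public PoolStats Worker { get; }
    public PoolStats Iocp { get; }

    public static ThreadPoolSnapshot Take()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);
        ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIocp);

        // Busy threads = maximum minus currently available.
        return new ThreadPoolSnapshot(
            new PoolStats(maxWorker - freeWorker, minWorker, maxWorker),
            new PoolStats(maxIocp - freeIocp, minIocp, maxIocp));
    }

    public override string ToString() => $"IOCP: ({Iocp}), WORKER: ({Worker})";
}
```

Logging ThreadPoolSnapshot.Take() next to each timeout would show whether Worker.BusyExceedsMin is true at those moments; if it regularly is, the thread pool section of the linked page is the next thing to read.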
Overall, the metrics on the Redis server, the slow log, and the values in the error message all look fine, so I'm not sure where to look next for the source of the problem.
I did end up optimizing our session usage on the specific page that was getting the Redis timeouts, since that page was riddled with session variables and I thought reducing them would help.