I am redesigning a project I built a year ago, when I was just starting to learn how to code. I used the MEAN stack back then and want to convert it to a PERN stack now. My AWS knowledge has also grown a bit, and I'd like to expand on these new skills.

The application receives real-time data from an API. I clean this data up, write it to a database, and broadcast it to connected clients.

To better conceptualize this question I will refer to the following items:

api-m1: receives the incoming data and passes it through my schema; I then send it to my socket-server.

socket-server: handles the WSS connection to the application's front-end clients. It also writes the data it gets from the scraper and api-m1 to a Postgres database. Since I am using Node.js, I would like to eventually run this as a cluster and incorporate Redis. I will then run it behind an ALB, using sticky sessions etc., across multiple EC2 instances.

RDS: Postgres table to which socket-server writes incoming scraper and api-m1 data. RDS is used to fetch the most recent data stored, along with user profile config data. NOTE: RDS main data table will have max 120-150 UID records with 6-7 columns.

To help visualize this, see the image below.

[Image not available]

From a database perspective, what would be the quickest way to write my data to RDS, assuming that during peak times we have 20-40 records/s from api-m1 plus another 20-40 records/s from the scraper? At the end of each day I tear down the database using a Lambda function and start again (the data is only temporary and does not need to be kept for any prolonged period).

1. Should I INSERT each record using a SERIAL id, then fetch the most recent rows from the frontend based on the UID?

2.a Should I UPDATE each UID, so I'd have a fixed N rows of data which I just search and update? (I can see this bottlenecking with my Postgres client.)

2.b Still use UPDATE but do BATCHED updates? (What issues will I run into if I use multiple clusters, i.e. will I run into concurrency problems where table record XYZ has an older value overwrite a more recent value because I'm using BATCH UPDATE with Node clusters?)

My concern is that UPDATEs are slower than INSERTs, and I want to make this as fast as possible. This section of the application isn't CPU heavy, and the real-time data isn't that intensive.
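
To make the concern in 2.b concrete, here is a minimal sketch of a batched upsert that cannot overwrite a newer value with an older one. It is only an illustration under assumptions: the `readings` table, the `observed_at` and `payload` columns, and the function names are hypothetical, and it assumes each record carries a timestamp and that `uid` has a unique constraint. The batch is collapsed to one row per UID first, because Postgres rejects an `ON CONFLICT DO UPDATE` statement that would touch the same row twice.

```typescript
// Hypothetical record shape: `uid` comes from the question; `observedAt` and
// `payload` are assumptions made for this sketch.
interface Reading {
  uid: string;
  observedAt: number; // epoch milliseconds, assigned when the record arrives
  payload: string;
}

// Collapse a batch so each uid appears once, keeping only its newest reading.
function coalesceLatest(batch: Reading[]): Reading[] {
  const latest = new Map<string, Reading>();
  for (const r of batch) {
    const seen = latest.get(r.uid);
    if (seen === undefined || r.observedAt >= seen.observedAt) {
      latest.set(r.uid, r);
    }
  }
  return Array.from(latest.values());
}

// Build one parameterized multi-row upsert. The WHERE clause on DO UPDATE makes
// Postgres skip the write when the stored row is already as new or newer.
function buildUpsert(rows: Reading[]): { text: string; values: (string | number)[] } {
  if (rows.length === 0) {
    throw new Error("buildUpsert needs at least one row");
  }
  const values: (string | number)[] = [];
  const tuples = rows.map((r, i) => {
    values.push(r.uid, r.observedAt, r.payload);
    const base = i * 3;
    return `($${base + 1}, $${base + 2}, $${base + 3})`;
  });
  const text =
    `INSERT INTO readings (uid, observed_at, payload) VALUES ${tuples.join(", ")} ` +
    `ON CONFLICT (uid) DO UPDATE ` +
    `SET observed_at = EXCLUDED.observed_at, payload = EXCLUDED.payload ` +
    `WHERE readings.observed_at < EXCLUDED.observed_at`;
  return { text, values };
}
```

The resulting `text` and `values` could be handed to whatever Postgres client is in use. Because the newer-than check runs inside the database, several Node workers flushing batches at different times would not overwrite a more recent value with a stale one.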

  • "NOTE: RDS main data table will have max 120-150 UID records with 6-7 columns" - then don't use Postgres at all, but just Redis..? Commented Feb 5, 2023 at 15:52
  • So use an in-memory cache of the data? Commented Feb 5, 2023 at 15:56
  • Well, based on your description you're gaining nothing from SQL semantics (relations and tuples and all), so there's no good reason to use an SQL datastore; using Redis you don't need to think about whether INSERT or UPDATE is faster. Commented Feb 5, 2023 at 15:58
  • true, no need for a RDS at all on this. Thanks @AKX, will explore this path. It does make sense and I appreciate the input. Cheers! Commented Feb 5, 2023 at 16:24

1 Answer

To make my comments an answer:

You don't seem to need SQL semantics for anything here, so I'd just toss RDS and use e.g. Redis (or DynamoDB, I guess) for that data store.
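
As a rough sketch of the key-value shape this implies (one entry per UID holding the latest record, so INSERT versus UPDATE stops being a question), something like the following could work. Everything here is an assumption for illustration: the `KeyValueStore` interface is hypothetical and is not any Redis client's API, the `latest:` key prefix and the `observedAt` field are made up, and an in-memory `Map` stands in for the real store so the example runs on its own.

```typescript
// Minimal key-value contract this sketch relies on. Hypothetical interface,
// not a real Redis client API; a real (asynchronous) client would sit behind it.
interface KeyValueStore {
  set(key: string, value: string): void;
  get(key: string): string | undefined;
}

// In-memory stand-in so the example is self-contained.
class InMemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  set(key: string, value: string): void {
    this.data.set(key, value);
  }
  get(key: string): string | undefined {
    return this.data.get(key);
  }
}

// Assumed record shape: one of these per UID (120-150 in total per the question).
interface LatestRecord {
  uid: string;
  observedAt: number; // epoch milliseconds
  payload: string;
}

const keyFor = (uid: string): string => `latest:${uid}`;

// Store the record unless a newer one is already present.
// Returns true when the write happened.
function saveLatest(store: KeyValueStore, rec: LatestRecord): boolean {
  const raw = store.get(keyFor(rec.uid));
  if (raw !== undefined) {
    const current = JSON.parse(raw) as LatestRecord;
    if (current.observedAt > rec.observedAt) {
      return false; // stale update, keep the newer value
    }
  }
  store.set(keyFor(rec.uid), JSON.stringify(rec));
  return true;
}

// Fetch the latest record for a UID, or undefined if none is stored.
function loadLatest(store: KeyValueStore, uid: string): LatestRecord | undefined {
  const raw = store.get(keyFor(uid));
  return raw === undefined ? undefined : (JSON.parse(raw) as LatestRecord);
}
```

Note that the read-then-write in `saveLatest` is only safe within a single process. With several Node workers writing to the same Redis, that check would need to run atomically on the server, for example in a Lua script or a transaction.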

4 Comments

Accepting this as the answer. I will be storing user profiles, so some form of DB will still be needed, but for speed and efficiency it would be best to keep this real-time data in a Redis cache. I did some reading yesterday and will set up a Redis environment via AWS and see how it goes. Thanks again AKX.
Redis isn't just a cache, though – if your dataset is small enough to fit in memory, it's a great key-value database too, and yes, it does have persistence.
Yes, I did read that it is technically a database; I guess what I meant was in-memory storage, as it can still snapshot the DB state in case of a server reboot (at least from my understanding). Now, if my understanding is correct, should I set it up within my EC2 instance, or use a cloud-managed Redis account within the EC2's VPC for the best performance and then connect via whatever endpoint the cloud manager provides? Will look a bit more this evening after work.
Since your architecture seems to be very tied to AWS, just use their ElastiCache Redis offering.
