
So, I collect Open/Close data for a lot of tickers, process it in multiple ways, and use it for an ARIMA model. Right now, I do this through Excel, but I'm running into a lot of data capacity problems.

I've got a Python terminal set up against my IBKR TWS so that I can collect data more efficiently, and I'm considering using a SQL database to store the data. The complication is that I'm collecting new data every day, so on top of storage filling up fast, I need to reference the full dataset (including the newest daily data) to build a dynamic model.

Would SQL be the right call for me? Things are really slow in Excel, but it "works" (not 100% of the time, since it crashes sometimes). I'm not very confident with SQL, so taking on the endeavor is a little scary without knowing whether that sort of data demand can even be met by SQL in the first place. Would creating a SQL database for my daily-collected, auto-updating data be beneficial?


2 Answers


QuestDB is designed specifically for this use case. You can store the raw market data in QuestDB, create materialized views (if needed) to keep up-to-date OHLC or whatever aggregations you want, and set TTL expiration on tables or materialized views to drop older data once it's no longer needed.
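For illustration, a minimal sketch of that setup driven from Python (the `daily_bars` table, its column names, and the `localhost:9000` endpoint are my assumptions, not anything from the question; TTL is a relatively recent feature, so check the syntax against your QuestDB version):

```python
import requests

QUESTDB_EXEC = "http://localhost:9000/exec"  # assumed default local REST port

# Daily partitions; partitions older than ~1 year are dropped automatically.
# The DDL is plain QuestDB SQL sent over the REST API.
requests.get(QUESTDB_EXEC, params={"query": """
    CREATE TABLE IF NOT EXISTS daily_bars (
        ticker SYMBOL,
        open   DOUBLE,
        close  DOUBLE,
        ts     TIMESTAMP
    ) TIMESTAMP(ts) PARTITION BY DAY TTL 365 DAYS;
"""})
```

Materialized views are created with similar `CREATE MATERIALIZED VIEW ... SAMPLE BY ...` DDL; see the QuestDB docs for the exact form your version supports.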

You can query QuestDB using SQL with time-series extensions, and performance is excellent: on a small server you can easily ingest hundreds of thousands of records per second while simultaneously querying the data to display real-time charts.
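A hedged sketch of both sides, using the official `questdb` Python client for ingestion and the REST `/exec` endpoint for a `SAMPLE BY` query (again assuming the hypothetical `daily_bars` table and a default local server):

```python
import requests
from questdb.ingress import Sender, TimestampNanos

# Ingest one daily bar; the timestamp lands in the table's designated
# timestamp column (ts in the sketch above).
with Sender.from_conf("http::addr=localhost:9000;") as sender:
    sender.row(
        "daily_bars",
        symbols={"ticker": "AAPL"},
        columns={"open": 189.50, "close": 191.24},
        at=TimestampNanos.now(),
    )

# SAMPLE BY is one of the time-series SQL extensions: resample the
# daily bars into weekly open/close on the fly.
resp = requests.get("http://localhost:9000/exec", params={"query": """
    SELECT ts, ticker, first(open) AS open, last(close) AS close
    FROM daily_bars
    WHERE ticker = 'AAPL'
    SAMPLE BY 7d
"""})
print(resp.json())
```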


In general, a SQL database is a suitable choice for data storage - certainly a reasonable next step when you've outgrown Excel. I would recommend SQLite because it's the simplest to work with. While there are various specialized time-series databases, they introduce additional complexity compared to SQLite and are overkill if you're just upgrading from Excel. Keep it simple (but not as simple as Excel).
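For concreteness, a minimal sketch with Python's built-in sqlite3 module (the schema and file name are illustrative assumptions, not a prescription):

```python
import sqlite3

conn = sqlite3.connect("prices.db")

# One row per ticker per trading day; the composite primary key makes
# the daily re-collection idempotent (re-running an update won't
# duplicate rows).
conn.execute("""
    CREATE TABLE IF NOT EXISTS daily_bars (
        ticker TEXT NOT NULL,
        date   TEXT NOT NULL,   -- ISO-8601, e.g. '2024-05-31'
        open   REAL,
        close  REAL,
        PRIMARY KEY (ticker, date)
    )
""")

# Upsert today's bar: INSERT OR REPLACE keeps the newest value if a
# correction arrives for an already-stored date.
conn.execute(
    "INSERT OR REPLACE INTO daily_bars (ticker, date, open, close) "
    "VALUES (?, ?, ?, ?)",
    ("AAPL", "2024-05-31", 191.44, 192.25),
)
conn.commit()
```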

That said, good performance requires more than just putting your data in a SQL database. How you design the tables, how much data you put in them, how you query them, what indexes you create on them - these can mean the difference between excellent and subpar performance. Expect a learning curve.
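As one example of that learning curve (continuing the hypothetical `daily_bars` schema above): the composite primary key doubles as an index for per-ticker scans, and `EXPLAIN QUERY PLAN` is how you verify a query actually uses it rather than scanning the whole table:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect("prices.db")

# Pull one ticker's closing-price history for the model; the
# (ticker, date) primary key keeps this fast as the table grows.
series = pd.read_sql_query(
    "SELECT date, close FROM daily_bars WHERE ticker = ? ORDER BY date",
    conn,
    params=("AAPL",),
    parse_dates=["date"],
    index_col="date",
)

# Check whether SQLite is using the index or falling back to a scan.
for row in conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT date, close FROM daily_bars WHERE ticker = 'AAPL' ORDER BY date"
):
    print(row)
```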

I would recommend a data source other than IBKR. Collecting data from the IBKR API is not for the faint of heart. There are quite a few hassles to contend with (downtime windows, rate throttling, timeouts, etc.), and if you make a mistake, you can end up with missing data that will throw off your models but be hard to detect. You also have to be careful to handle splits properly, or your data will have erroneous jumps.
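A cheap sanity check for that silent-gap failure mode might look like the sketch below (assuming the hypothetical SQLite table above; it ignores exchange holidays, so treat hits as candidates to investigate rather than confirmed gaps):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect("prices.db")
dates = pd.read_sql_query(
    "SELECT DISTINCT date FROM daily_bars WHERE ticker = 'AAPL'",
    conn,
    parse_dates=["date"],
)["date"]

# Business days between the first and last stored bar that have no row:
# each is either a market holiday or a collection failure.
expected = pd.bdate_range(dates.min(), dates.max())
missing = expected.difference(pd.DatetimeIndex(dates))
print(f"{len(missing)} possible gaps:", list(missing[:5]))
```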

Depending on where you fall on the "save time vs save money" spectrum, you might want to look at QuantRocket, which gives you a Python/JupyterLab environment, built-in data collection (stored in SQL databases, but with no need to learn SQL), and connectivity to IBKR (which you wouldn't actually need if you just require pricing for lots of tickers, as that's included). This would give you flexibility and performance without the need to become a database admin. (Disclaimer: I'm affiliated with QuantRocket.)

