Let's say I have an e-commerce website, with products I want to sell.
In addition to analytics, I decide to log a select few actions myself, such as visits to the product detail page while logged in.
So I want to store things like user id, product id, timestamp, etc.
How do I actually store this?
My naive approach is to stick it in a table.
The DBA yelled at me and asked how long I need the data.
I said at least a month.
They said OK, and I think they moved all older data to a different table (set up a job for it?).
How do real people store these logs?
How long do you keep them?
Unless you’re at huge volume, you can totally do this in a Postgres table. Even if you are, you can partition that table by date (or whatever other attributes make sense) so that you don’t have to deal with massive indexes.
I once did this, and we didn’t even need to think about partitioning until we hit a billion rows or so. (But partition sooner than that; it wasn’t a pleasant experience.)
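For what it’s worth, a minimal sketch of the partition-by-date approach in Postgres (10+, declarative partitioning); table and column names here are made up:

```sql
-- Range-partitioned event log, one partition per month.
CREATE TABLE product_view (
    user_id    bigint      NOT NULL,
    product_id bigint      NOT NULL,
    viewed_at  timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (viewed_at);

CREATE TABLE product_view_2023_08 PARTITION OF product_view
    FOR VALUES FROM ('2023-08-01') TO ('2023-09-01');

CREATE TABLE product_view_2023_09 PARTITION OF product_view
    FOR VALUES FROM ('2023-09-01') TO ('2023-10-01');

-- Retention becomes cheap: dropping a whole partition is instant,
-- unlike DELETE FROM ... WHERE viewed_at < ... on a giant table.
-- DROP TABLE product_view_2023_08;
```

The nice part is that “keep a month of data” stops being a job that moves rows around and becomes a `DROP TABLE` on the oldest partition.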
An analytics database (ClickHouse, BigQuery, ...) is better.
They can do aggregations much faster and can deal with sparse/many columns (the "paid" event has an "amount" attribute, the "page_view" event has a "url" attribute, ...).
We've got 13 years' worth of data stored in MySQL (5 million visitors/year). It's a pain to query there, so we keep a copy in ClickHouse as well (which is a joy to query).
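To make the sparse-columns point concrete, here’s a rough sketch of what such an event table can look like in ClickHouse (names are invented, not from the comment above):

```sql
CREATE TABLE events (
    event_type LowCardinality(String),
    user_id    UInt64,
    ts         DateTime,
    -- per-event-type attributes; Nullable columns you don't fill
    -- compress to almost nothing in a columnar store
    amount     Nullable(Decimal(18, 2)),  -- only set on "paid" events
    url        Nullable(String)           -- only set on "page_view" events
) ENGINE = MergeTree
ORDER BY (event_type, ts)
TTL ts + INTERVAL 1 YEAR;  -- optional built-in retention
```

Aggregations like “views per product per day” scan only the columns they touch, which is why this kind of schema stays fast even at billions of rows.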
I only track visits to a product detail page so far.
Basically,
some basic metadata about the user (logged in only),
some metadata about the product,
and basic "auditing" columns -- created by, created date, modified by, modified date
(although why I have modified by and modified date makes no sense to me; I don't anticipate ever editing these rows, they're only there for "standardization". I don't like it, but I can only fight so many battles at a time).
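Spelled out, the table described above might look roughly like this (SQL Server flavor, to match the `select top 1` query below; every name and type here is a guess):

```sql
CREATE TABLE TrackProductView (
    Id           bigint IDENTITY PRIMARY KEY,
    UserId       bigint        NOT NULL,  -- logged-in user metadata
    ProductId    bigint        NOT NULL,  -- product metadata
    CreatedBy    nvarchar(64)  NOT NULL,  -- "auditing" columns
    CreatedDate  datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
    ModifiedBy   nvarchar(64)  NULL,      -- never used; kept for "standardization"
    ModifiedDate datetime2     NULL
);
```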
I am approaching 1.5 million rows in under two months.
Thankfully, my DBA is kind, generous, and infinitely patient.
Clickhouse looks like a good approach.
I'll have to look into that.
> select count(*) from trackproductview;
> 1498745
> select top 1 createddate from TrackProductView order by createddate asc;
> 2023-08-18 11:31:04.000
What is the maximum number of rows in a ClickHouse table?
Is there such a limit?
I use Postgres with TimescaleDB. Works unless your e-commerce site is amazon.com. The great thing with TimescaleDB is that it takes care of creating materialized views with the aggregates you care about (like product views per hour, etc.), and you can even choose to "throw away" the events themselves and just keep the aggregations (to avoid a huge DB if you have a lot of events).
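A sketch of that setup (TimescaleDB 2.x syntax; the table and column names are assumptions, not from the comment):

```sql
-- Turn an ordinary Postgres table into a hypertable.
SELECT create_hypertable('product_view', 'viewed_at');

-- Continuous aggregate: product views per hour, maintained for you.
CREATE MATERIALIZED VIEW product_views_hourly
WITH (timescaledb.continuous) AS
SELECT product_id,
       time_bucket('1 hour', viewed_at) AS bucket,
       count(*) AS views
FROM product_view
GROUP BY product_id, bucket;

-- "Throw away" raw events older than a month; the hourly
-- aggregates in the view survive the raw-data retention.
SELECT add_retention_policy('product_view', INTERVAL '1 month');
```

This is essentially the same lifecycle the DBA set up by hand (keep a month, age out the rest), but with the aggregation and retention handled by the database.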