The new forums will be named Coin Return (based on the most recent vote)! You can check on the status and timeline of the transition to the new forums here.
The Guiding Principles and New Rules document is now in effect.
For work I'm designing a database to hold data from our programs in relational form. Some tables require an index column that matches their index in the programs' internal collections because there is no other way to preserve order. This index column is the table's primary key.
This works okay until we need to add points to the middle of the collection. This involves changing the indices of all the points at a greater index than the ones being added so the ones being added can be given the correct indices. I can't seem to do this as a single UPDATE, since it violates the primary key constraint, so it seems I'm forced to change every point that needs to be changed one by one in descending order. The problem is our collections can get very large (in the 10's of millions), and it takes a prohibitive amount of time to add points near the beginning of large collections doing it this way.
We're using SQLite, so I can't drop the primary key before doing the add and then add it back after. Do any of you have any suggestions of a more efficient way of doing this? I'm not necessarily tied to SQLite, but my boss wants whatever DB we end up using to be embedded and available at a per developer license or better, and SQLite is the best we've found so far that meets those criteria.
You can't have one column that's the actual unique primary key, and another that is the value used to determine order? Granted you have a lot of data, but I can't imagine the way you're doing it is in the spirit of normal PK usage.
Thanks guys, those both sound like workable solutions. I'll run both by my boss and see which one he likes better. I don't know what I was thinking when I did it the way I have it now.
You're going to do collections in the 10s of millions, with SQLite? I hope those collections are really small, byte-wise. I hear SQLite is great for a lot of things, but from what I've read scaling to that kind of size isn't one of them. From their own site, they don't advise using it for datasets beyond 20-30GB. If a collection of 30 million is possible with your application, you'd be limited to 1 KB per entry or RAM usage could get quite intense, and I'd see overall performance as a big question mark at that scale. You might want to evaluate the size of your datasets, and their performance with your chosen tool, before you sink a lot of development time into it.
For reference, I work in an Oracle shop that's moving to SQLite on some of our edge systems, which I think is absolutely the right choice for those units (the only client for that DB is on the same system, the datasets are well under 20-30GB, it's a lot cheaper than Oracle, etc.). Our central servers, which gather data from those edge systems, run Oracle, and that's also the right choice for that point in the system (there are multiple clients for the server, the datasets get up over 40GB on the server for large sites, it scales well when tuned properly, etc.). There's definitely a place for both, but you have to make sure you're putting the right tool in the right spot.
Posts
You could also add a couple of other fields, the row that comes before, (and the row that comes after, optionally) by primary key.
This way, when you update you only have a to change either two or four fields, depending on if you want a double linked list or a single one.
For reference, I work in an Oracle shop that's moving to SQLite on some of our edge systems, which I think is absolutely the right choice for those units (the only client for that DB is on the same system, the datasets are well under 20-30GB, it's a lot cheaper than Oracle, etc.). Our central servers, which gather data from those edge systems, run Oracle, and that's also the right choice for that point in the system (there are multiple clients for the server, the datasets get up over 40GB on the server for large sites, it scales well when tuned properly, etc.). There's definitely a place for both, but you have to make sure you're putting the right tool in the right spot.