I have a MySQL query question. Let me first explain the table setup:
I have two tables, posts and comments. Every post has a unique post_id as well as other information such as title, text, etc. Every comment has a unique comment_id as well as a reference to the post_id of the post it belongs to.
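(For concreteness, the two tables look roughly like this; any column beyond post_id and comment_id is just an example name:)

CREATE TABLE posts (
    post_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title   VARCHAR(255),
    body    TEXT
    -- ...plus whatever other post columns
);

CREATE TABLE comments (
    comment_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    post_id    INT NOT NULL,  -- the post this comment belongs to
    body       TEXT
);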
So my question to all you SQL gurus is: is there a single query that could return a table where the first column is the post_id and the second column is the number of rows in comments that have that respective post_id?
My current solution is to just do a "SELECT COUNT(*) FROM comments WHERE post_id=id" query in each iteration of the loop that is used to list the posts. It works fine, but I always appreciate an elegant solution, as well as an opportunity to improve my query skills.
I also realize that I could just add a "number of comments" column to the posts table and make sure that it is updated when comments are made, but that just seems like no fun.
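(Just to spell that alternative out, it would mean something like a hypothetical num_comments column that the application keeps in sync:)

ALTER TABLE posts ADD COLUMN num_comments INT NOT NULL DEFAULT 0;

-- run whenever a comment is added to a post (42 is just a placeholder post_id):
UPDATE posts SET num_comments = num_comments + 1 WHERE post_id = 42;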
"I believe in the philosophy of a ruling class, especially since I rule." --Randal , Clerks
select post_id, count(*)
from comments
group by post_id
My syntax is a bit rusty but that should work.
Bang on, Zilo!
Though I'm not sure you'd want to pass the wildcard to the count function. Here is what I generally use for this sort of thing:
You only need to count a single column that will always be present and have a value in every row. The auto_increment column is perfect for this.
So, for your table, this should work:
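(assuming comment_id is your auto_increment column:)

select post_id, count(comment_id)
from comments
group by post_id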
You could, in theory, even give the count column an alias so that you can order/sort the rows returned from the database (handy if you need/want to spare some time in language interpreters like PHP/Perl/etc.), like so...
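(count_total is just an example alias, of course:)

select post_id, count(comment_id) as count_total
from comments
group by post_id
order by count_total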
And you can optionally add "DESC" to the end of the query to have the order sorted with the highest values on top. =]
As a long-winded side note, it also sounds like you might be able to leverage some joins if you want to consolidate your queries per iteration (see the sketch below). I know they opened my eyes to a world of possibilities when I first discovered them. Feel free to PM me or, I suppose, we can continue the discussion here if you like.
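Something along these lines, for example (just a sketch using your table names); the LEFT JOIN means posts with zero comments still show up with a count of 0, which you wouldn't get by grouping comments alone:

select p.post_id, p.title, count(c.comment_id) as count_total
from posts p
left join comments c on c.post_id = p.post_id
group by p.post_id, p.title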
Cheers, mate!
=]
-- CF
MySQL (and most RDBMSes) optimize the special case of COUNT(*) so that it's the fastest way to count.
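For what it's worth, these two should return the same per-post counts here (comment_id is never NULL), but the first one leaves the server free to count rows however is cheapest:

select post_id, count(*) from comments group by post_id;
select post_id, count(comment_id) from comments group by post_id;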
I did not know that, I was just being lazy ;-)
Spiffy! I had no idea. It even seems to save about .34s on my query of ~2,518,199 records.
Not a ton for that query, but it certainly adds up as the volume of data grows.
Thanks, jmillikin!
Is this a good way to do it in general? I realize my database is so small that it doesn't matter much in my case.
It seems that in MSSQL, if you use an aggregate function in your SELECT statement, then every column that is not part of an aggregate function must appear in the GROUP BY. That's a problem because in MSSQL you can't GROUP BY certain field types.
Perhaps this is just a difference between the two.
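For example, a rough MSSQL version of the same idea would have to list every non-aggregated column in the GROUP BY:

select p.post_id, p.title, count(c.comment_id) as num_comments
from posts p
left join comments c on c.post_id = p.post_id
group by p.post_id, p.title
-- and if title were a text/ntext column, MSSQL wouldn't let you GROUP BY it at all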
I tried to duplicate your query with one of my databases and came up with
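(something along these lines, anyway; a correlated sub-query using your table names:)

select p.post_id,
       (select count(*) from comments c where c.post_id = p.post_id) as num_comments
from posts p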
I'm not exactly sure what the word is on the efficiency of sub-queries versus joins, but this returns basically the same thing.
It's my understanding, based on the MySQL docs, that joins can be more efficient than subqueries, but it's going to depend on how well (or badly) the query is written, table optimization, etc. On small tables it's hardly likely to be noticeable in any case, I would think.
At least in this case, the bigger thing to consider is whether the MySQL version running even supports subqueries. MySQL didn't do subqueries prior to 4.1.<something>; it might not have even been 4.1.0, if I remember right. There are still a good number of MySQL 4.0.x boxes running about.
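(If you're not sure what a given box is running, a quick check is:)

select version();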
What are the bad things I should look for in an EXPLAIN? I ran it, and here's what it returned:
table     type  possible_keys  key   key_len  ref   rows  Extra
posts     ALL   NULL           NULL  NULL     NULL  4     Using temporary; Using filesort
comments  ALL   NULL           NULL  NULL     NULL  11
See the rows column? That's going to destroy performance if either of those tables gets large. You're currently comparing each row in posts to each row in comments, so 4 x 11 = 44 row combinations here. Right now it doesn't really matter, but just think how much that's going to suck when your posts table is 200,000 rows and the comments table is 3,000,000 rows. Of course, with the LIMIT 4 on there, if that's permanent, it may never be a terribly big deal.
I would try reversing the tables in your query.
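I don't know exactly how your query is written, but assuming it's a plain join with posts listed first, reversing it would mean going from something like this:

select posts.post_id, count(*) as num_comments
from posts
join comments on comments.post_id = posts.post_id
group by posts.post_id
limit 4

to something like this:

select comments.post_id, count(*) as num_comments
from comments
join posts on posts.post_id = comments.post_id
group by comments.post_id
limit 4

(STRAIGHT_JOIN in place of JOIN will force MySQL to use the listed table order, if you want to test both ways.)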
Here's the difference from one of my project DBs, where there's a many->one relationship between content and content_categories, just like the many->one relationship you've got between comments and posts.
The second one there only has to hit one row in content_categories for each row in content, whereas the other way around it had to hit 4 rows in content_categories for each row in content.
It's saved me from doing stupid things at work many times... or at least let me know I was going to do something stupid so I could be sure I really wanted to do it.