MySQL question

gnarledgnarled Registered User regular
edited September 2007 in Help / Advice Forum
I have a MySQL query question. Let me first explain the table setup:

I have two tables, posts and comments. Every post has a unique post_id as well as other information such as title, text, etc. Every comment has a unique comment_id and then also a reference to the post_id of the post it belongs to.

So my question to all you SQL gurus is: Is there a single query that could return a table where the first column is the post_id and the second column is the number of rows in comments that have that respective post_id?

My current solution is to just do a "SELECT COUNT(*) FROM comments WHERE post_id= id" query in each iteration of the loop that is used to list the posts. It works fine, but I always appreciate an elegant solution, as well as an opportunity to improve my query skills.

I also realize that I could just add a "number of comments" column to the posts table and make sure that it is updated when comments are made, but that just seems like no fun.

"I believe in the philosophy of a ruling class, especially since I rule." --Randal , Clerks
gnarled on

Posts

  • zilozilo Registered User regular
    edited August 2007
    All post_ids with comment counts?

    select post_id, count(*)
    from comments
    group by post_id

    My syntax is a bit rusty but that should work.

    zilo on
  • Crit_FailCrit_Fail Registered User regular
    edited August 2007
    zilo wrote: »
    All post_ids with comment counts?

    select post_id, count(*)
    from comments
    group by post_id

    My syntax is a bit rusty but that should work.

    Bang on, Zilo!

    Though I'm not sure you'd want to pass the wildcard to the count function. Here is what I generally use for this sort of thing:
    select username, count(track_ID) from usage_track group by username
    

    You only need to count one column in the rows that will always have a value and be present. The auto_increment column is perfect for this.

    So, for your table, this should work:
    select post_id, count(post_id) from comments group by post_id
    

    You could, in theory, even give the count column an alias so that you could order/sort the rows returned from the database (This is handy if you need/want to spare some time in language interpreters -PHP/Perl/Etc.]. ) like so...
    SELECT post_id, count(post_id) AS num FROM comments GROUP BY post_id ORDER BY num
    

    And you can optionally add "DESC" to the end of the query to have the order sorted with the highest values on top. =]

    As a long-winded side note, as well, it sounds like you might be able to leverage some joins, if you want to consolidate your queries per iteration. I know that they opened my eyes to a world of possibilities when I first discovered them. Feel free to PM me or, I suppose, we can continue the discussion here if you like.

    Cheers, mate!

    =]

    -- CF

    Crit_Fail on
    "The world is a tragedy to those who feel, but a comedy to those who think." - Horace Walpole
  • JaninJanin Registered User regular
    edited August 2007
    Crit_Fail wrote: »
    zilo wrote: »
    All post_ids with comment counts?

    select post_id, count(*)
    from comments
    group by post_id

    My syntax is a bit rusty but that should work.

    Bang on, Zilo!

    Though I'm not sure you'd want to pass the wildcard to the count function.

    MySQL (and most RDBMSes) optimize the special case of COUNT(*) so that it's the fastest way to count.

    Janin on
    [SIGPIC][/SIGPIC]
  • zilozilo Registered User regular
    edited August 2007
    jmillikin wrote: »
    Crit_Fail wrote: »
    zilo wrote: »
    All post_ids with comment counts?

    select post_id, count(*)
    from comments
    group by post_id

    My syntax is a bit rusty but that should work.

    Bang on, Zilo!

    Though I'm not sure you'd want to pass the wildcard to the count function.

    MySQL (and most RDBMSes) optimize the special case of COUNT(*) so that it's the fastest way to count.

    I did not know that, I was just being lazy ;-)

    zilo on
  • Crit_FailCrit_Fail Registered User regular
    edited August 2007
    jmillikin wrote: »

    MySQL (and most RDBMSes) optimize the special case of COUNT(*) so that it's the fastest way to count.

    Spiffy! I had no idea. It even seems to save about .34s on my query of ~2,518,199 records.
    mysql> select username, count(username) as num from usage_track group by username order by num DESC;
    
    111 rows in set (3.45 sec)
    
    mysql> select username, count(*) as num from usage_track group by username order by num DESC;
    
    111 rows in set (3.10 sec)
    
    

    Not a ton for that query, but certainly adds up as the volume of data grows exponentially.

    Thanks, jmillikin!

    Crit_Fail on
    "The world is a tragedy to those who feel, but a comedy to those who think." - Horace Walpole
  • gnarledgnarled Registered User regular
    edited September 2007
    Thanks for the help. I ended up using the following query to do my lookup of posts and comment counts at the same time.
    SELECT posts.id, posts.title, posts.text, posts.time, COUNT(comments.post_id) AS comm_count FROM posts
    LEFT JOIN comments ON posts.id = comments.post_id
    GROUP BY posts.id
    ORDER BY posts.time DESC
    LIMIT 4
    

    Is this a good way to do it in general? I realize I have such a small database size that it doesn't matter that much in my case.

    gnarled on
    "I believe in the philosophy of a ruling class, especially since I rule." --Randal , Clerks
  • JasconiusJasconius sword criminal mad onlineRegistered User regular
    edited September 2007
    Does that actually work?

    It seems that in MSSQL, if you use an aggregate function in your SELECT statement then everything that is not part of an aggregate function must be part of the GROUP BY which is a problem because in MSSQL, you can't GROUP BY certain field types.

    Perhaps this is just a difference between the two.

    I tried to duplicate your query with one of my databases and came up with
    SELECT TOP 5 tblPost.*,
    (SELECT COUNT(tblComment.PostID) FROM tblComment WHERE tblComment.PostID = tblPost.PostID)'comment_total'
    FROM tblPost
    

    I'm not exactly sure what the word is on the efficiency of sub-queries versus joins, but this returns basically the same thing.

    Jasconius on
  • Jimmy KingJimmy King Registered User regular
    edited September 2007
    Off the top of my head, it looks like that should work, gnarled. I would run an EXPLAIN on it just to be sure it's not doing something extra stupid. To do so, just put 'EXPLAIN' at the beginning of your query. I use it all the time after killing our production system twice at work by putting in a really ugly query that hit multiple ~3 million row tables and did full table scans of each one for each record in the other. oops.
    Jasconius wrote:
    I'm not exactly sure what the word is on the efficiency of sub-queries versus joins, but this returns basically the same thing.
    It's my understanding, based on MySQL docs, that joins can be more efficient than subqueries, but it's going to depend on how good (or bad) you wrote the query, table optimization, etc. On small tables it's hardly likely to be noticeable in any case, I would think.

    At least in this case, the bigger thing to consider is if the MySQL version running even supports subqueries. MySQL didn't do subqueries prior to 4.1.<something>, it might not have even been 4.1.0 if I remember right. There are still a good number of MySQL 4.0.x boxes running about.

    Jimmy King on
  • gnarledgnarled Registered User regular
    edited September 2007
    It definitely works.

    What are the bad things I should look for in an EXPLAIN. I ran it and heres what it returned


    table type possible_keys key key_len ref rows Extra
    posts ALL NULL NULL NULL NULL 4 Using temporary; Using filesort
    comments ALL NULL NULL NULL NULL 11

    gnarled on
    "I believe in the philosophy of a ruling class, especially since I rule." --Randal , Clerks
  • Jimmy KingJimmy King Registered User regular
    edited September 2007
    gnarled wrote: »
    It definitely works.

    What are the bad things I should look for in an EXPLAIN. I ran it and heres what it returned


    table type possible_keys key key_len ref rows Extra
    posts ALL NULL NULL NULL NULL 4 Using temporary; Using filesort
    comments ALL NULL NULL NULL NULL 11

    See rows column? That's going to destroy performance if either of those tables gets large. You're currently comparing each row in post to each row in comments. Right now it doesn't really matter, but just think how much that's going to suck when your posts table is 200,000 rows and the comments table is 3,000,000 rows. Of course with the LIMIT 4 on there, if that's permanent, it may never be a terribly big deal.

    I would try reversing the tables in your query.
    SELECT posts.id, posts.title, posts.text, posts.time, COUNT(comments.post_id) AS comm_count FROM comments
    LEFT JOIN posts ON posts.id = comments.post_id
    GROUP BY posts.id
    ORDER BY posts.time DESC
    LIMIT 4
    

    Here's the difference from one of my project db's where there's a many->one relationship between content and content_categories just like you've got a many->one relationship between posts and comments.
    BAD:
    mysql> EXPLAIN SELECT * FROM content_categories LEFT JOIN content ON content_categories.category_id = content.category_id;                    +----+-------------+--------------------+------+---------------+------+---------+------+------+-------+
    | id | select_type | table              | type | possible_keys | key  | key_len | ref  | rows | Extra |
    +----+-------------+--------------------+------+---------------+------+---------+------+------+-------+
    |  1 | SIMPLE      | content_categories | ALL  | NULL          | NULL |    NULL | NULL |    4 |       |
    |  1 | SIMPLE      | content            | ALL  | NULL          | NULL |    NULL | NULL |   13 |       |
    +----+-------------+--------------------+------+---------------+------+---------+------+------+-------+
    2 rows in set (0.00 sec)
    
    GOOD:
    mysql> EXPLAIN SELECT * FROM content LEFT JOIN content_categories ON content.category_id = content_categories.category_id;                    +----+-------------+--------------------+--------+---------------+---------+---------+--------------------------------+------+-------+
    | id | select_type | table              | type   | possible_keys | key     | key_len | ref                            | rows | Extra |
    +----+-------------+--------------------+--------+---------------+---------+---------+--------------------------------+------+-------+
    |  1 | SIMPLE      | content            | ALL    | NULL          | NULL    |    NULL | NULL                           |   13 |       |
    |  1 | SIMPLE      | content_categories | eq_ref | PRIMARY       | PRIMARY |       4 | bash-shell.content.category_id |    1 |       |
    +----+-------------+--------------------+--------+---------------+---------+---------+--------------------------------+------+-------+
    2 rows in set (0.00 sec)
    

    The second one there only has to hit one row in content_categories for each row in content whereas the other way around it had to hit 4 rows in content categories for each row in content.

    Jimmy King on
  • gnarledgnarled Registered User regular
    edited September 2007
    I tested out the query you suggested and I realized it has a problem. If any thread has 0 comments then it will not show up because of the order the left join is in.

    gnarled on
    "I believe in the philosophy of a ruling class, especially since I rule." --Randal , Clerks
  • Jimmy KingJimmy King Registered User regular
    edited September 2007
    bah, you're right. Didn't think about there being no comments. I assume the explanation still makes the output of EXPLAIN clear, though, right? At least gives you something to test and play with to see if you can optimize your queries any.

    It's saved me from doing stupid things at work many times... or at least let me know I was going to do something stupid so I could be sure I really wanted to do it.

    Jimmy King on
Sign In or Register to comment.