
MySQL server hosting to support massive loads...

Disrupter Registered User regular
edited June 2011 in Help / Advice Forum
Hey all. I am currently developing a project that could potentially need to support a MySQL database that would be hit with millions of queries per minute.

Realistically, the load will likely never get that big, and if it did, I would first need to have a server capable of handling thousands of queries per minute.

Right now I have a shared host, which obviously won't cut it. I don't know that much about server management. I contacted a buddy of mine who once ran his own webhost out of a rack in Florida. I am waiting to hear back from him.

Every few seconds a user will be hitting the server with two queries and then updating a row. I can likely do some query caching to help alleviate the load, but the nature of the project means that most queries will be unique.

Will a dedicated server handle a few hundred users doing this at once? What about a few thousand? Obviously beyond that, if millions of folks are using it, I'd likely have to hire someone to manage the servers and run it ourselves, correct?

Or do companies exist which handle this kind of scaling from minimal use all the way up to millions of queries at once?
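To put very rough numbers on that workload (the per-user interval below is a guess on my part, since "every few seconds" could mean anything):

```python
# Back-of-envelope load math: each active user fires roughly 3 statements
# (2 reads + 1 update) every few seconds. The 5-second interval is a guess.
def peak_qps(active_users, statements_per_user=3, interval_seconds=5):
    """Queries per second the database would see at a given user count."""
    return active_users * statements_per_user / interval_seconds

print(peak_qps(1000))    # around a thousand users: ~600 qps
print(peak_qps(30000))   # "millions per minute" territory: ~18,000 qps
```

So a few hundred concurrent users is well within one box, while "millions of queries per minute" (over 16,500/sec) only kicks in somewhere around 28,000+ simultaneous users at this rate.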

Disrupter on

Posts

  • schuss Registered User regular
    edited June 2011
    I'd worry more about table contention on that sort of thing, as you'll have to set row-level data locks (or everything will asplode). These sorts of requests are often done as non database transactions (at least at the outset) so you can serialize the work via processing queues of xmls or similar. I also imagine mysql will choke, though I don't have much experience with it.
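    A bare-bones Python sketch of that queue idea (everything here is illustrative; the list stands in for the database, and real code would execute an UPDATE in the worker):

```python
# Serialize writes through a queue so a single worker applies them one at a
# time; request handlers never contend for row locks with each other.
import queue
import threading

write_queue = queue.Queue()
applied = []  # stand-in for the database table

def writer_worker():
    """Drain queued updates in order; in real code, run the UPDATE here."""
    while True:
        item = write_queue.get()
        if item is None:      # sentinel: shut down
            break
        applied.append(item)
        write_queue.task_done()

t = threading.Thread(target=writer_worker)
t.start()

# Many request handlers can enqueue concurrently without blocking each other.
for row_id in range(5):
    write_queue.put({"row": row_id, "score": row_id * 10})

write_queue.put(None)  # tell the worker to stop
t.join()
```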

    schuss on
  • Jimmy King Registered User regular
    edited June 2011
    Millions of queries per minute? That's a whole lot: over 16,500 per second. I'm not an expert by any means, but part of my job is pretending to be a DBA and sysadmin. If you're going to use MySQL you'll want (probably need) a multi-server setup: a master DB that handles the writes/updates and a few slave DBs in a load-balanced pool (this may just be round-robin DNS internally) to handle the read-only queries. You'll also need some combo of a ton of RAM, quite a bit of (preferably easily expandable) HDD space, and a really smart DB schema and data archiving, because that's going to result in a fairly good-sized DB pretty quickly.
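    That read/write split can be sketched in a few lines (hostnames invented; a real setup would put a connection pool or proxy behind this):

```python
# Route writes to the master and round-robin reads across the slave pool.
import itertools

MASTER = "db-master.example.com"
SLAVES = ["db-slave1.example.com", "db-slave2.example.com"]
_slave_cycle = itertools.cycle(SLAVES)

def pick_host(sql):
    """Only SELECTs may hit a read-only slave; everything else hits the master."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb == "SELECT":
        return next(_slave_cycle)  # round-robin, like the DNS pool mentioned above
    return MASTER
```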

    I see you already mentioned it, but you'll probably want to look into caching to greatly reduce the number of queries. Memcached is the standard on the back end for storing frequently used objects. It's high performance for what it is, and multiple servers can share the same cache, which they check before hitting the DB. On the front end you can use something like Varnish for caching entire rendered web pages, or pay a CDN like Akamai who basically does this for you. I recommend taking a serious look at how much you really can or can't cache; sometimes caching can make a bigger difference than you'd expect.
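    Cache-aside in miniature (a plain dict stands in for the memcached client, and get_user_from_db is a made-up placeholder for the real query):

```python
# Check the cache first; only fall through to the database on a miss.
cache = {}
db_hits = {"count": 0}

def get_user_from_db(user_id):
    """Placeholder for the real SELECT; counts how often the DB is touched."""
    db_hits["count"] += 1
    return {"id": user_id, "name": "user%d" % user_id}

def get_user(user_id):
    key = "user:%d" % user_id
    if key in cache:               # real code: client.get(key)
        return cache[key]
    user = get_user_from_db(user_id)
    cache[key] = user              # real code: client.set(key, user, ttl)
    return user

get_user(7)
get_user(7)
get_user(7)  # three lookups, but only the first touches the "database"
```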

    I've never done anything on the scale of millions of queries per minute. My platform at work is currently running MySQL on Debian 5 in VMware with 6GB RAM and 4 CPUs. The actual hardware is a dual Intel Xeon E5530 box with 40GB RAM and 15k RPM SAS drives, also running a few other VMs. The most I have seen it handle was around 1000 qps, and it wasn't breaking a sweat.

    You've got a number of options for doing this. If you're sure it's not going to go crazy right away, you could run it all on a single dedicated server you've got somewhere. How much it can handle depends on the server's specs, what else it's doing, etc. You can get dedicated servers ranging from little 2GHz/512MB RAM boxes on up through stuff similar to what I'm running my VMware machines on. You can also buy your own servers, pay someone to manage them as you mentioned, and rent rack space at a datacenter; if you don't need someone full time, you can pay a 3rd-party company to manage the hardware, etc. You can also use something like EC2 or whatever Rackspace's cloud offering is, which limits you in server configurations but lets you very quickly add more servers as needed if things go really well for you.

    Jimmy King on
  • wmelon Registered User regular
    edited June 2011
    I'd probably recommend looking at vps.net. You can dynamically change the amount of power you've got on the fly, so you could start low and move up as you need to.

    wmelon on
  • Disrupter Registered User regular
    edited June 2011
    Thanks for the tips. I will look into the cloud services recommended. Realistically, the numbers I quoted won't happen, but they are a best-case (or worst, as far as load is concerned) scenario.

    How does Amazon work? They have a price per hour. Do they track your usage and then charge you based on the hours the server was working on your shit? So in a day you'd range anywhere from 0 to 24 depending on how hard the server was hit?
    schuss wrote: »
    I'd worry more about table contention on that sort of thing, as you'll have to set row-level data locks (or everything will asplode). These sorts of requests are often done as non database transactions (at least at the outset) so you can serialize the work via processing queues of xmls or similar. I also imagine mysql will choke, though I don't have much experience with it.

    Well, almost all of the interaction with the data will be reading or creating. A small subset will be updating, but even that may be eliminated or drastically limited. Reading or creating wouldn't require locks, correct? To be honest, I've never developed anything with this potential scale, so I have a bit to learn. Researching potential hosting solutions for the present and future is just one part. Also, I believe both Facebook and Twitter use MySQL. Though I suppose just because MySQL CAN do it doesn't mean it's the best choice.

    Disrupter on
    616610-1.png
  • schuss Registered User regular
    edited June 2011
    You'd want to do your reads with UR (uncommitted read) to prevent data page/row locking. Creating should lock down part of the table, but if you know the exact volume you'll write each time, I think there's a way to only lock that portion (note: I'm a systems analyst/design guy, not a coder/DBA). Facebook uses a columnar database dealio that's massively peer-to-peer (I believe), so it's not always accurate (i.e. not all nodes have the most recent version of a page), but it is fast.

    schuss on
  • Jasconius sword criminal mad online Registered User regular
    edited June 2011
    What are you building? What is your experience? What are you storing? What are you delivering?

    "millions of requests per minute" is a pretty incredulous statement for someone who doesn't know the basics of database administration.

    The amount of juice you would need to support that requires significant amounts of money, even with a cloud based solution.

    Looking into places like Amazon is a decent place to start, but really this entire line of questioning is moot. If you actually get that much traffic, not only will Amazon call you up and start asking questions, they are also going to ask for money.

    I think you need to supply more information about what you are doing to get a sophisticated answer.

    Jasconius on
    this is a discord of mostly PA people interested in fighting games: https://discord.gg/DZWa97d5rz

    we also talk about other random shit and clown upon each other
  • Disrupter Registered User regular
    edited June 2011
    Jasconius wrote: »
    I think you need to supply more information about what you are doing to get a sophisticated answer.

    Well, the sophisticated answer isn't something I need right now. So far the info in this thread has been pretty helpful, giving me a place to start. Realistically, if I ever got a load anywhere close to what I predict as the potential peak, I would need to hire someone who knows a shit ton more about this stuff than I do.

    I mean, if the project had that kind of traffic, I'd likely both need to and be able to afford to pay people a lot smarter than me to redesign it from the ground up.

    I just wanted to have some information when I began so I wasn't going in blind, and to find a good host that would scale well with me from "nobody is using it" to "holy shit, that's actually a good amount of requests!". Realistically it won't ever get to "OMG! Millions per second!", but it is possible in theory. And in that case I'd likely have hired folks long before then.

    My experience so far has been designing PHP applications for a web design company and intranet systems for a regional business. I have developed a few minor social networks which never really got off the ground when I worked with said web design company, both from scratch and using various open source applications. But I have never had to really think about how to handle a system that would have thousands of users making multiple requests every few seconds. I'm reading up on a lot and asking a lot of questions to help prepare me for that.

    I realize this project is potentially way out of my league in regards to handling the load if it got massively popular. But I also realize it likely won't ever get there, so mainly it's a learning experience.

    Disrupter on
  • Jasconius sword criminal mad online Registered User regular
    edited June 2011
    Well, every site in theory can get millions of requests, but you don't build for that. Any seasoned developer of large-scale web services will tell you that you optimize when you *need* to, and you have hardware to tide you over until you can respond to increased load.

    That's why cloud services like Amazon are popular, because you can make a junky PHP site, and if you grow from 100 users to 100,000 overnight, your site will not completely shit itself.

    In general, your first line of defense when dealing with scale on the web is SOFTWARE FIRST, hardware second. Throwing more chips at a scale problem is the last answer.

    Given that, a lot of your planning has to do with general topics in database design and administration. That's why what you are building is important.

    How you query things, when you query things, why, and for whom are all important questions you have to ask when designing a system that may realistically get a gajillion users.

    In the subject of large scale web apps, you will inevitably get into topics such as:

    Horizontal segmentation
    Database denormalization
    Master/slave SQL servers
    HTTP load balancers

    and so on
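    To give one concrete flavor of the list above, horizontal segmentation (sharding) can start as simply as hashing a key to pick which database owns a row (shard names invented):

```python
# Stable key-to-shard mapping: the same user always lands on the same shard.
import hashlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]

def shard_for(user_id):
    """Hash the key so rows spread evenly and deterministically across shards."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

    The catch, and the reason this is a design topic rather than a hosting topic, is that cross-shard queries and resharding get painful, which is exactly why the shape of your data matters.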

    I would recommend trying to dig up some of the PowerPoint decks and lecture material from Cal Henderson, who is the lead engineer for Flickr and an old veteran of the PHP/MySQL era, and who knows about a lot of this stuff.

    The general point is, the vast majority of database scale is NOT about hardware. That's why details about your project are relevant if you are serious about preparing for the future. You buy services like Amazon as insurance, not as a solution.

    Jasconius on
  • mrt144 King of the Numbernames Registered User regular
    edited June 2011
    He's making a money shot website that requires a SQL backend for all the metadata of the content on the site. That's what I got from it.

    mrt144 on
  • Jimbo down under Registered User regular
    edited June 2011
    Jasconius wrote: »

    I would recommend trying to dig up some of the PowerPoint decks and lecture material from Cal Henderson, who is the lead engineer for Flickr and an old veteran of the PHP/MySQL era, and who knows about a lot of this stuff.

    Is the PHP/MySQL era over?

    Not being sarcastic or anything, just curious about what the current tech is. The school I am going to still teaches PHP and MySQL.

    Jimbo on
  • Jimmy King Registered User regular
    edited June 2011
    Jimbo wrote: »
    Jasconius wrote: »

    I would recommend trying to dig up some of the PowerPoint decks and lecture material from Cal Henderson, who is the lead engineer for Flickr and an old veteran of the PHP/MySQL era, and who knows about a lot of this stuff.

    Is the PHP/MySQL era over?

    Not being sarcastic or anything, just curious about what the current tech is? The school I am going to still teaches PHP and MySQL.
    Yes and no. There's a lot of stuff still done using PHP and MySQL. Most of it is amateur projects or smaller businesses, though. Big corporate stuff still uses a lot of .NET or Java, as has been the case for a long time. Python (generally via Django or Pylons) and Ruby via RoR have been gaining popularity for several years. Python/Django is seeing a pretty big push in my local area.

    Jimmy King on
  • Hurtdog Registered User regular
    edited June 2011
    Jasconius wrote: »
    I would recommend trying to dig up some of the PowerPoint decks and lecture material from Cal Henderson, who is the lead engineer for Flickr and an old veteran of the PHP/MySQL era, and who knows about a lot of this stuff.

    Cal Henderson actually has an O'Reilly book called "Building Scalable Websites" which goes over everything you need to know about making web apps that scale. I recommend the OP reads it; it's a light read, only about 320 pages. First book I've read on this topic.
    Jimbo wrote: »
    Jasconius wrote: »

    I would recommend trying to dig up some of the PowerPoint decks and lecture material from Cal Henderson, who is the lead engineer for Flickr and an old veteran of the PHP/MySQL era, and who knows about a lot of this stuff.

    Is the PHP/MySQL era over?

    PHP/MySQL aren't going away any time soon.

    Some people may say "Ruby on Rails", but Ruby on Rails runs slower than PHP, and you can do anything in PHP that you could do in Rails. People just like Rails because it enforces MVC ideology, but that's not exclusive to Rails (you can do that in PHP too).

    Hurtdog on
  • schuss Registered User regular
    edited June 2011
    SQL is ubiquitous and useful for most items. It's only when you get to humongous volumes of requests that you truly NEED to shift to a more code-centric transactional dealio, but reporting on that will be complicated. Needless to say, if you're asking on a non-expert web forum about this, you probably don't need to worry about it.

    schuss on
  • Jasconius sword criminal mad online Registered User regular
    edited June 2011
    There was a period of time (say, the late '90s and early '00s) when PHP and MySQL were really the only option available to do web work without any startup costs.

    That's not true anymore. That's all I meant.

    Also, we're talking about a language that only got namespacing like a year ago.

    Jasconius on
  • Seguer of the Void Sydney, Australia Registered User regular
    edited June 2011
    Just my 2c on PHP and Ruby on Rails: Rails is a framework for Ruby, so it's a bit dodgy to compare a framework to the language itself :P (not that anyone has been, just FYI). Apples to oranges and all that.

    OP, depending on what type of site you're doing, you might want to look into NoSQL databases (CouchDB, MongoDB) as they scale horizontally very well (in comparison to traditional RDBMSs).

    Also look into things like Memcache, eAccelerator, etc, if you're really worried about performance/optimisation.

    Otherwise, I think the previous suggestions are good: get your software working first, optimise later (unless you KNOW you'll get x requests/min, i.e. you're rewriting an existing site).

    Seguer on