Saturday, January 11, 2014

Buy and Resize AWS RIs from Amazon's RI Marketplace

Anyone using Amazon Web Services (AWS) knows it gets expensive quickly. Reserved Instances (RIs) allow AWS to predict capacity, and AWS customers to significantly lower costs. RIs aren't a free ride, however, often requiring a hefty upfront price. What if you could buy RIs from other AWS customers who no longer need them, and for a lower upfront cost? You can, of course, and the possibilities expanded again in Fall 2013.

First, a brief history:

September 2012 saw the introduction of the Amazon EC2 Reserved Instance Marketplace for the buying and selling of excess capacity.

Up until recently, purchasing RIs from the RI Marketplace was fairly restrictive. To take advantage of the Marketplace, you had to find an RI that exactly matched your desired EC2 instance type, availability zone, and network platform (EC2-Classic or EC2-VPC).

In September 2013, AWS announced the ability to modify the availability zone and network platform of an RI. Then, in October 2013, AWS announced that the instance type of an RI could also be modified. Instance types can only be modified within the same instance family (see Fig 1 for the normalization factors).

Fig 1: Instance size normalization factors within a family (e.g., medium = 2, large = 4, xlarge = 8)
Now, returning to the Reserved Instance Marketplace... What if I want to purchase reserved instances covering two m1.large instances? First, keeping the new RI modification capabilities in mind, I perform an open search for all RI offerings in my region.


I find that there are no "3rd Party" offerings currently available under the m1.large instance type. Now, keeping in mind that I can later modify the instance type, I perform the same search, but with instance type m1.xlarge.
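For those who prefer scripting the lookup, here is a minimal sketch in Python, shown with the boto3 EC2 client interface (any SDK version with the equivalent calls will do). The region, product description, and offering type are assumptions taken from this walkthrough, and the offering id in the purchase call is a placeholder:

```python
import boto3

# Region, product, and offering type from this walkthrough; adjust to taste.
ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.describe_reserved_instances_offerings(
    InstanceType="m1.xlarge",
    ProductDescription="Linux/UNIX",
    OfferingType="Heavy Utilization",
    IncludeMarketplace=True,
)

# Keep only the "3rd Party" (Marketplace) offerings.
for offer in resp["ReservedInstancesOfferings"]:
    if offer["Marketplace"]:
        print(
            offer["ReservedInstancesOfferingId"],
            offer.get("AvailabilityZone"),
            offer["Duration"],            # remaining term, in seconds
            offer["FixedPrice"],          # upfront price
            offer.get("PricingDetails"),  # count available at each price
        )

# Purchasing works the same way once an offering is chosen;
# the offering id below is a placeholder, not a real one.
# ec2.purchase_reserved_instances_offering(
#     ReservedInstancesOfferingId="offering-id-placeholder",
#     InstanceCount=1,
# )
```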


This time, I find one that fits the bill:


So I purchase this 3rd Party reserved instance. Once the RI becomes active, it becomes a candidate for modification. What if I really need an m1.large in us-east-1b and an m1.large in us-east-1c? All of that is available through the RI modification interface.


In this case, I had a Heavy Utilization RI. Notice the m1.xlarge is composed of 8/8 units. Referring to Fig 1, this means the xlarge instance type has a normalization factor of 8, while a large instance type has a normalization factor of 4. Therefore, I can make two m1.large RIs from one m1.xlarge RI (4 + 4 = 8).
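The same split can be requested through the API. Here is a sketch along the same lines, again with the boto3 client interface; the RI id and client token are placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder id -- substitute the id of the purchased, now-active RI.
ri_id = "11111111-2222-3333-4444-555555555555"

# One m1.xlarge (normalization factor 8) becomes two m1.large
# (factor 4 each): 4 + 4 = 8, so the overall footprint is unchanged.
ec2.modify_reserved_instances(
    ClientToken="split-xlarge-into-two-large",  # idempotency token
    ReservedInstancesIds=[ri_id],
    TargetConfigurations=[
        {"AvailabilityZone": "us-east-1b", "InstanceType": "m1.large",
         "Platform": "EC2-Classic", "InstanceCount": 1},
        {"AvailabilityZone": "us-east-1c", "InstanceType": "m1.large",
         "Platform": "EC2-Classic", "InstanceCount": 1},
    ],
)

# The request returns a ReservedInstancesModificationId that can be polled
# with describe_reserved_instances_modifications() until it is fulfilled.
```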

So, to recap: You can now purchase a reserved instance from the RI Marketplace, change its type, move it to a new availability zone, and change its network platform. The ways to save money on AWS just multiplied!

Bear in mind: I used the AWS Console for the screenshots contained herein. One can also use the API to perform the lookups and changes. I am considering writing a tool that searches by units within a family rather than by instance type, allowing these searches to be done in one step; here's hoping AWS beats me to it.
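In the meantime, here is a rough sketch of what such a tool might look like: it walks the sizes of one family, pulls the Marketplace offerings, and ranks them by upfront price per normalized unit (factors from Fig 1). The region and product are assumptions, and a real tool would also weigh remaining term length and recurring charges:

```python
import boto3

# Normalization factors within the m1 family (see Fig 1).
M1_FACTORS = {"m1.small": 1, "m1.medium": 2, "m1.large": 4, "m1.xlarge": 8}

def marketplace_offers_by_unit(region="us-east-1", product="Linux/UNIX"):
    """Return Marketplace offerings for the m1 family, cheapest per unit first."""
    ec2 = boto3.client("ec2", region_name=region)
    results = []
    for itype, factor in M1_FACTORS.items():
        resp = ec2.describe_reserved_instances_offerings(
            InstanceType=itype,
            ProductDescription=product,
            IncludeMarketplace=True,
        )
        for offer in resp["ReservedInstancesOfferings"]:
            if not offer["Marketplace"]:
                continue
            results.append({
                "id": offer["ReservedInstancesOfferingId"],
                "type": itype,
                "az": offer.get("AvailabilityZone"),
                "upfront_per_unit": offer["FixedPrice"] / factor,
            })
    return sorted(results, key=lambda o: o["upfront_per_unit"])

if __name__ == "__main__":
    for offer in marketplace_offers_by_unit():
        print(offer)
```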

Saturday, January 4, 2014

AWS EC2 + MySQL with Complex, Highly-Concurrent Queries

I recently had the challenge of migrating one of our MySQL servers from a dedicated host to an AWS EC2 instance. Based on my research, the general consensus is that Amazon Web Services (AWS) Elastic Compute Cloud (EC2) is unable to power MySQL except for very light workloads.

Several newer developments gave me hope that acceptable MySQL performance could be attained:
  1. 4,000 Provisioned IOPS - With the May 2013 introduction of 4,000 PIOPS EBS volumes, EC2 instances could finally be configured with persistent storage (as opposed to ephemeral storage) fast enough for RDBMS performance (a minimal provisioning sketch follows this list). 
  2. cc2.8xlarge - This monstrosity provides 60GB of RAM, 32 virtual CPUs, and 10 gigabit networking. Since our implementation, AWS has added the c3.8xlarge at the same price, running the next generation of Intel Xeon with a similar configuration but more overall compute power, plus two 320GB SSDs. 
  3. MySQL thread pool - I wrote about this in my previous post. By pre-allocating a pool of threads available for handling connections, Percona Server is able to prevent the CPU from spending the majority of its cycles on context switching.
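For item 1, provisioning such a volume is only a couple of API calls. A minimal sketch, again with the boto3 client interface; the availability zone, size, and instance id are placeholders, and provisioned-IOPS volumes are subject to an IOPS-to-size ratio limit:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder values; the size must satisfy the IOPS-to-size ratio limit
# that applies to provisioned-IOPS volumes.
vol = ec2.create_volume(
    AvailabilityZone="us-east-1b",
    Size=400,            # GiB
    VolumeType="io1",    # provisioned IOPS SSD
    Iops=4000,
)

# Wait for the volume, then attach it to the database instance
# ("i-0123456789abcdef0" is a placeholder instance id).
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
ec2.attach_volume(
    VolumeId=vol["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",
)
```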
When testing our implementation on a cc2.8xlarge with a 4,000 PIOPS EBS volume, I found that the MySQL server was still unable to handle the query load necessary to migrate to AWS. In fact, queries began queueing up in the processlist, and all CPU threads would spike to 100% within minutes. 

After adding the thread pool support from Percona Server 5.5, I was able to sustain more than 600 qps of complex aggregate queries, all while CPU load remained within healthy thresholds. I did not perform extensive A/B benchmarking for this problem, but it would seem that the overhead the EC2 hypervisor adds to CPU scheduling makes an instance much more likely to overload under a highly concurrent MySQL workload.

Wednesday, June 12, 2013

The Percona Server Thread Pool Feature

I recently had the opportunity to explore, test, and implement Percona's new thread pool option. This feature has been included as of Percona Server 5.5.29-30.0. Percona ported the feature from MariaDB - more information about the original commit can be found here. To be fair, this feature is an alternative implementation of the Thread Pool Plugin from Oracle MySQL Enterprise Edition. To be completely fair, thread pool functionality was originally added, but never released to the public, back in 2007 - some interesting history. Vadim Tkachenko - co-author of High Performance MySQL, Percona's CTO, and all-around smart guy - has already explained thread pools better than I can, so there's no need to duplicate that effort. :)

I suspect that most MySQL DBAs have reviewed the numerous benchmarks and analyses regarding MySQL performance under high concurrency. If you haven't, refer to Dimitri Kravtchuk's blog for a benchmarking shot in the arm. Simply put: MySQL query performance drops substantially once the number of concurrent queries surpasses the number of CPU threads available. Multitenant servers can see load spikes, and MySQL will come to a grinding halt while the OS kernel scheduler spends almost all its time context-switching processes rather than executing the instructions in each process. This is the very phenomenon I experience at TrackVia and wish to prevent. I have tuned my database servers to handle the usual workload; it is the bursty traffic that can put them into an unhealthy state requiring automated, and at times manual, mitigation.

I have been testing thread pooling with some of our workload and, thus far, have seen marked improvement overall. As one would expect, CPU load during peak usage now maxes out at about 100% per CPU thread. The observed side effect: ordinarily light and fast queries take longer to return results, due to being queued within the thread pool. Percona provides a few knobs for thread pool tuning - the priority policy is interesting, but unhelpful in my case (all application connections come from the same db user). One option that has helped is thread_pool_stall_limit, which allows the DBA to adjust how long a query runs before Percona Server switches it out in favor of the next connection in the queue.
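For reference, here is a minimal sketch of inspecting and adjusting these knobs from Python using PyMySQL. The connection parameters are placeholders, thread_handling itself can only be set in my.cnf at server startup, and the stall-limit value shown is purely illustrative:

```python
import pymysql

# Placeholder credentials -- point this at a Percona Server 5.5.30+ instance
# started with thread_handling = pool-of-threads in my.cnf.
conn = pymysql.connect(host="127.0.0.1", user="dba", password="secret",
                       autocommit=True)

with conn.cursor() as cur:
    # Confirm the pool-of-threads scheduler is active and list its settings.
    cur.execute("SHOW GLOBAL VARIABLES LIKE 'thread%'")
    for name, value in cur.fetchall():
        print(name, "=", value)

    # Let queries run a bit longer before being switched out for the next
    # connection in the queue. The value is illustrative; check the Percona
    # documentation for the units and default in your version.
    cur.execute("SET GLOBAL thread_pool_stall_limit = 100")

conn.close()
```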

I expect to write a follow-up at some point in the future, once I have a better grasp of how this changes the overall behavior of Percona Server's query processing. For now, I wanted to put it out there for those who may not have read about or considered thread pooling. Bear in mind that, at the time of this post, Percona classifies this feature as beta quality - so do your own testing before relying on it.

I would love to see an implementation of Mark Callaghan's suggestion, a step closer to QoS in MySQL. But then, I'd love to see MySQL Proxy go from alpha to GA... :)