Speedy ML

10x your ML models using AWS databases and an EC2 instance.

I only have 4 cores on my laptop. That is severely restrictive when running an ML or stochastic model.

To speed things along, I upload my data to an AWS database and train my models in R on an AWS cloud computing instance. I find these are complementary workflows that ensure my data is securely hosted and my models execute quickly.

This workflow costs a few dollars, but that is well worth it to me: most models execute in a minute or less.

Part 1

  1. Create an account on Amazon Web Services. AWS is its own ecosystem, so feel free to explore what it offers. Come back when you’re ready to continue.
  2. Create an Amazon relational database (RDS). Note the region of your database. For example, my RDS is in availability zone us-west-1a, so its region is us-west-1.
  3. In R, install RMariaDB, a DBI-compliant package for connecting to MariaDB and MySQL databases from R.
  4. Make sure you can connect to your RDS from RStudio on your local machine. I keep a test script for this very purpose. It should only take a few seconds to connect.
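library(RMariaDB)
# mtcars is already a data frame, so it can be used directly as sample data
mydata <- mtcars
# Open a connection to the RDS instance
con <- dbConnect(MariaDB(),
                 user = 'username',
                 password = 'password',
                 host = 'mydbinstance.xxx.region.rds.amazonaws.com', # copy "Endpoint" from the database page
                 dbname = 'dbname')
# Upload the data frame as table 'carsdata', replacing any existing table of that name
dbWriteTable(conn = con, name = 'carsdata', value = mydata, overwrite = TRUE)
# Close the connection when you are finished
dbDisconnect(con)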
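
If you would rather not type credentials into the script itself, one option is to read them from environment variables (for example, set in `~/.Renviron`). The following is only a sketch: the helper `db_args_from_env()` and the variable names `RDS_USER`, `RDS_PASSWORD`, `RDS_HOST` and `RDS_DBNAME` are hypothetical names chosen for illustration, not part of RMariaDB.

```r
# Hypothetical helper: collect connection settings from environment variables.
# The variable names below are assumptions; use whatever names you prefer.
db_args_from_env <- function() {
  vars <- c(user = "RDS_USER", password = "RDS_PASSWORD",
            host = "RDS_HOST", dbname = "RDS_DBNAME")
  vals <- Sys.getenv(vars, unset = NA, names = FALSE)
  if (anyNA(vals)) {
    stop("Missing environment variables: ",
         paste(vars[is.na(vals)], collapse = ", "))
  }
  # Return a named list matching the dbConnect() argument names used above
  as.list(setNames(vals, names(vars)))
}

# Usage (needs RMariaDB and a reachable database, so it is left commented out):
# con <- do.call(dbConnect, c(list(MariaDB()), db_args_from_env()))
```

The helper stops with an error that names any variable that is not set, so a missing setting is caught before the connection attempt.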

You may also want to install [MySQL Workbench](https://dev.mysql.com/doc/refman/8.0/en/) for a user-friendly SQL GUI. You can use MySQL Workbench to create new databases and keep them organized.

That’s the end of part 1.

Part 2

  1. Launch an EC2 instance from the maintained RStudio AMI: http://www.louisaslett.com/RStudio_AMI/ Be sure to create the EC2 instance in the same region as the database instance.
  2. So that your EC2 instance can connect to your database, navigate to Security Groups and add an inbound rule to the database's security group: Type = MYSQL/Aurora, Protocol = TCP, Port Range = 3306, Source = [EC2 Private IP]/32
  3. Log in to your RStudio EC2 instance. Re-run the R code above, but instead of dbWriteTable(), use
mydata <- dbReadTable(conn = con, name = 'carsdata')

Then run your models on however many cores your instance type provides; mine has 64.
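
R does not use the extra cores automatically; the work has to be split up explicitly. As a minimal sketch of one way to do that, the example below uses the base `parallel` package to fit a bootstrapped linear model across all available cores. It uses `mtcars` directly as a stand-in for the table read from the database, the model `mpg ~ wt` and the helper `fit_one()` are made up for illustration, and `mclapply()` relies on forking, so this assumes a Linux instance such as the Ubuntu one used here.

```r
library(parallel)

# Number of cores visible to R; fall back to 1 if it cannot be detected
n_cores <- max(1L, detectCores(), na.rm = TRUE)

# Illustrative task: refit a simple model on one bootstrap resample
fit_one <- function(i) {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))
}

# Use a parallel-safe random number generator so results are reproducible
RNGkind("L'Ecuyer-CMRG")
set.seed(1)

# Run 200 resamples, spread across the available cores
fits <- mclapply(1:200, fit_one, mc.cores = n_cores)

# One row per resample, one column per coefficient
boot_coefs <- do.call(rbind, fits)
```

Many modelling packages also accept a core or thread count directly, so check your package's documentation before writing your own loop.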

Ultimately, you may want to download your trained models for further analysis/report creation.

  1. Navigate to Network & Security -> Key Pairs. Create, name, and download the .pem file.
  2. Install PuTTY and use PuTTYgen to convert the .pem file into a PuTTY private key (.ppk).
  3. Use FileZilla to access your EC2 instance’s file directory, following the instructions here to log in. (The default username is “ubuntu” and there is no password.)
  4. Don’t forget to stop your EC2 instance. Amazon keeps billing while it is running!
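
Before a trained model can be downloaded, it has to be written to a file on the instance. A minimal sketch using base R's `saveRDS()` and `readRDS()` follows; the model and the file name `fit.rds` are placeholders, and `tempdir()` is used only so the example runs anywhere. On the instance you would save to a folder you can reach over SFTP.

```r
# Stand-in for a trained model
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical file location; replace with a path in your home directory
model_path <- file.path(tempdir(), "fit.rds")

# Write the model object to disk on the EC2 instance
saveRDS(fit, model_path)

# After downloading the file, load it back into R on your local machine
restored <- readRDS(model_path)

# Check that the restored model matches the original
all.equal(coef(fit), coef(restored))
```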