Amazon Machine Image (AMI) SOLR Search Engine
Posted: April 10th, 2010 | Author: Julio Hernandez-Miyares | Filed under: Amazon Web Services, Cloud Computing |
Needing a search engine for a mobile application currently under development, I decided to go the route of an Amazon Machine Image (AMI) running SOLR, which is a search server built on Lucene with a nice XML-over-HTTP interface. This is part 1 of a multi-part post as I meander my way to a truly production-ready, fully stacked search engine capability, including the persistent data stores that will feed the search engine’s indexing operations. A good reason for this blog is simply to document what I have done to get there. There is plenty of documentation surrounding Amazon Web Services, including Amazon’s official documentation, but at least for me it appears disjointed at times, dated, and not always correct with respect to how the current APIs and services work. Getting something done as a “newbie” (and I am stretching the definition of that term) can often be grueling, or at least more time-consuming than it has to be.
The first part of the series will conclude with an instance of SOLR powered by the version of Jetty that comes with the SOLR build. I will follow up with a Tomcat-powered version of SOLR in part 2 or 3. For now, I just wanted to make the exercise about getting your own functional AMI configured, defined, registered and launchable.
So here goes nothing:
First, the JittrSolr AMI I built (Jittr is my company) was initially based on the Fedora Core 8 base image (AMI ID: ami-b232d0db):
Minimal Fedora Core 8, 32-bit architecture, Apache 2.0, and Amazon EC2 AMI Tools.
I started by downloading to my local machine the two packages I needed to add to the base Fedora instance:
jre-6u19-linux-i586-rpm.bin for Java and apache-solr-1.3.0.tar for SOLR.
*NOTE – I could have skipped the download-and-upload cycle by using “wget” from the command line of the Fedora instance, but I did not have the exact URLs for the packages. That will be for a future exercise. It would be a time saver of sorts, because the upload to the instance was slow. We are talking minutes, not hours, but during those minutes you are simply waiting, and at 12 cents an hour, though you are not going to break the bank, every minute does count.
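For that future exercise, the wget route could look roughly like the sketch below. The function name fetch_package and the PACKAGE_URL variable are placeholders of mine; I am deliberately not supplying a real download URL, since I did not have one at the time.

```shell
# Sketch: pull a package straight onto the instance instead of uploading it.
# fetch_package and PACKAGE_URL are hypothetical names; no real URL is implied.
fetch_package() {
    # -P sets the directory the downloaded file is saved into
    wget -P /root "$1"
}

# Example (run on the instance, with a URL you have verified yourself):
# fetch_package "$PACKAGE_URL"
```

This would be run from the ssh session on the instance, so the file lands in /root without passing through your local machine.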
Next I fired up a machine instance through the AWS console and obtained the secure shell command string to connect to it:
ssh -i imac.pem root@ec2-174-129-66-51.compute-1.amazonaws.com
With that, I was ready to upload the two packages to the instance.
*NOTES – The ssh command string will differ depending on the public DNS name your instance has been allotted and the key pair you assigned to the instance when launching it. For convenience, and since I don’t need anything else, I just use the native Terminal program that comes with Mac OS X Snow Leopard.
Also, make sure the security group you choose when configuring the machine instance has port 8983 open for HTTP. This is the port SOLR uses to listen for search requests.
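If you would rather open the port from the command line than from the console, the EC2 API tools have ec2-authorize for this. A minimal sketch, assuming the API tools are installed on your local machine with your credentials configured, and that the instance was launched in a security group called default (an assumption; substitute your own group). The function name open_solr_port is mine.

```shell
# Sketch: open Solr's port in a security group with the EC2 API tools.
# Assumes ec2-authorize is on the PATH and credentials are already set up.
SOLR_PORT=8983

# open_solr_port GROUP [CIDR]
# Allows inbound TCP on port 8983. The default source 0.0.0.0/0 means
# "anyone"; pass a narrower CIDR block as the second argument if you can.
open_solr_port() {
    ec2-authorize "$1" -P tcp -p "$SOLR_PORT" -s "${2:-0.0.0.0/0}"
}

# Example:
# open_solr_port default
```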
I use secure copy (scp) from my local machine’s terminal session to upload the two packages:
scp -i ~/Desktop/ec2/imac.pem ~/Desktop/ec2/packages/java/jre-6u19-linux-i586-rpm.bin root@ec2-174-129-66-51.compute-1.amazonaws.com:/root
scp -i ~/Desktop/ec2/imac.pem ~/Desktop/ec2/packages/solr/apache-solr-1.3.0.tar root@ec2-174-129-66-51.compute-1.amazonaws.com:/root
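Since the host name and key pair change with every instance you launch, it can help to keep them in variables so they only need editing in one place. A small sketch along those lines; the variable names and the upload_packages function are my own, and the values are the ones from this post.

```shell
# Sketch: keep the per-instance details in one place.
# EC2_HOST, EC2_KEY and upload_packages are illustrative names.
EC2_HOST="ec2-174-129-66-51.compute-1.amazonaws.com"
EC2_KEY="$HOME/Desktop/ec2/imac.pem"

# upload_packages FILE...
# Copies every file given to /root on the instance in a single scp call.
upload_packages() {
    scp -i "$EC2_KEY" "$@" "root@$EC2_HOST:/root"
}

# Example:
# upload_packages ~/Desktop/ec2/packages/java/jre-6u19-linux-i586-rpm.bin \
#                 ~/Desktop/ec2/packages/solr/apache-solr-1.3.0.tar
```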
Both packages are now in the /root directory of the instance. After connecting with the ssh command above, a little housekeeping from /root:
- chmod +x jre-6u19-linux-i586-rpm.bin
makes the installer executable. I am now ready for a quick and dirty setup of the base software to get a simple Jetty-powered SOLR operational.
- ./jre-6u19-linux-i586-rpm.bin
will unpack and install the Java runtime package under /usr/java on the instance. A symbolic link in /usr/bin points to the java executable:
lrwxrwxrwx 1 root root 26 Apr 9 15:40 bin/java -> /usr/java/default/bin/java
java -version returns
Java(TM) SE Runtime Environment (build 1.6.0_19-b04)
Java HotSpot(TM) Client VM (build 16.2-b04, mixed mode, sharing)
- tar xvf apache-solr-1.3.0.tar
- mv apache-solr-1.3.0 /usr/local/.
The first command untars the SOLR package; the second moves it to /usr/local.
To validate that the stock, simple SOLR works:
- cd /usr/local/apache-solr-1.3.0/example
- java -jar start.jar
starts up the SOLR instance.
- Entering “http://ec2-174-129-66-51.compute-1.amazonaws.com:8983/solr/”
in a browser should now give you the “Welcome to Solr” page with a link to the Solr admin panel.
Note – as stated above, you must have port 8983 open for HTTP in your security group for this to work.
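The same check can be done from a terminal instead of a browser. A rough sketch using curl from your local machine; the function names solr_url and solr_is_up are mine, and the 10-second timeout is an arbitrary choice.

```shell
# Sketch: check from the command line that Solr is answering.
# solr_url and solr_is_up are illustrative names.
SOLR_PORT=8983

# solr_url HOST: print the Solr welcome page URL for a host.
solr_url() {
    printf 'http://%s:%s/solr/\n' "$1" "$SOLR_PORT"
}

# solr_is_up HOST: succeed if the welcome page can be fetched.
# -f fails on HTTP errors, -s keeps curl quiet, and --max-time stops
# curl from hanging when the port is closed in the security group.
solr_is_up() {
    curl -fs --max-time 10 -o /dev/null "$(solr_url "$1")"
}

# Example:
# solr_is_up ec2-174-129-66-51.compute-1.amazonaws.com && echo "Solr is up"
```

If the call times out rather than failing immediately, the security group is the first thing I would look at.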
From the terminal session of the instance, it is now time to package your configuration and system software, and then register it as a private machine image.
- ec2-bundle-vol -d /tmp -k /root/pk-???.pem -u [Your AWS AccountID] -s 2048 -c /root/cert-???.pem
will bundle the machine image into the /tmp directory. I did have to upload both my private key and certificate to the instance. Various documentation stated I did not need the certificate, but it would not work without it. Also, though I used the -s option to set the image size, some documentation I reviewed said it was best to let EC2 decide. There was a lot of trial and error; it worked with -s 2048. The -u option is your AWS Account ID, which you can look up on your AWS Account web page if you have not memorized it. I removed the “-”s, though I did see comments from people who fixed problems with the command by putting the “-”s back. It worked for me with no dashes.
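To take the guesswork out of the dashes, the account ID can be cleaned up before it is handed to the bundling command. A sketch that mirrors what worked for me (no dashes); the function names are my own, and the example account ID is made up.

```shell
# Sketch: strip the dashes from the account ID, then bundle.
# normalize_account_id and bundle_volume are illustrative names.

# normalize_account_id ID: print the ID with every "-" removed.
normalize_account_id() {
    printf '%s\n' "$1" | tr -d '-'
}

# bundle_volume ACCOUNT_ID PRIVATE_KEY_FILE CERT_FILE
# Runs the same ec2-bundle-vol command as above, on the instance.
bundle_volume() {
    ec2-bundle-vol -d /tmp -k "$2" -c "$3" \
        -u "$(normalize_account_id "$1")" -s 2048
}

# Example (made-up account ID; key and cert paths left as in the post):
# bundle_volume 1234-5678-9012 /root/pk-???.pem /root/cert-???.pem
```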
- ec2-upload-bundle -b jittrsolr -m /tmp/image.manifest.xml -a YOUR_ACCESS_KEY -s YOUR_SECRET_ACCESS_KEY
will upload the machine image to Amazon’s S3, into the bucket named by the -b argument. The bucket will be created if it doesn’t exist.
The -a and -s arguments are your Access Key and Secret Access Key respectively, which are also available from your AWS Account web page in the Access Credentials section.
- ec2-register jittrsolr/image.manifest.xml -n jittrsolr_small
registers your new machine image. Run this from your local terminal session, not the instance’s terminal session.
- ec2-describe-images -o self
If all has gone as planned, you should see your new private image listed.
It will also be listed on the AWS EC2 Console in the Private AMI tab.
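If you want just the new AMI ID rather than the full listing, the output can be filtered. A sketch, assuming each image line starts with the word IMAGE followed by the AMI ID and the manifest location (that column layout is my assumption about the tool's output, so check it against what you actually see). The names MANIFEST and find_ami_id are mine.

```shell
# Sketch: pull the AMI ID for our manifest out of the listing.
# Assumes lines of the form: IMAGE <ami-id> <manifest-location> ...
MANIFEST="jittrsolr/image.manifest.xml"

# find_ami_id: print the ID of the image registered from $MANIFEST.
find_ami_id() {
    ec2-describe-images -o self |
        awk -v m="$MANIFEST" '$1 == "IMAGE" && $3 == m { print $2 }'
}

# Example:
# find_ami_id
```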
