
jldaws

CAUTION: This package is undergoing heavy development at the moment.
Expect things to be unstable until release 1.0.0.

Overview

This project consists of scripts for working with Amazon AWS. The open-source code can be found on GitHub: https://github.com/jldupont/jldaws and the Ohloh project page can be found here.

Related Projects

Installation

sudo easy_install jldaws

Set up the environment variables:
  • export AWS_ACCESS_KEY_ID=<access key>
  • export AWS_SECRET_ACCESS_KEY=<secret key>
  • export AWS_ACCOUNT_ID=<account id>
You can obtain these values from the AWS management console.

Common Options

 Option  Name                  Description
 bs      batch size            Number of messages/tasks to process per polling interval
 e       propagate error       Send JSON encoded errors also to stdout
 cp      check path            See jldleader
 sp      source path
 mp      move path
 p       polling interval      Polling interval in seconds
 fq      flush queue
 qn      queue name
 mn      python module name
 fn      python function name


Scripts

jldsqsjob (since 0.3.0)


jldsqsjob [-w] [-fq] [-lne] [-bs batch_size] [-trm msg] [-doe] [-dpt] [-se] [-em] [-ds delay_seconds] -trt topic -qn input_queue_name  -oqn output_queue_name

This script waits for a JSON message of the specified topic on stdin, dequeues a batch of messages from the input SQS queue, sends the messages to stdout, sends the same messages to the output SQS queue and finally deletes the source messages from the input queue.

The JSON message on stdin should contain a field named "topic", e.g. {"topic": "some_topic_string" ... }
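
A trigger line of that shape could be produced as follows. This is only a minimal sketch: the helper name make_trigger and the idea of passing extra fields are hypothetical, and only the "topic" field comes from the description above.

```python
import json


def make_trigger(topic, **extra):
    """Return one JSON line carrying the "topic" field.

    `extra` holds optional additional fields; these are hypothetical and
    are not required by the description above.
    """
    message = {"topic": topic}
    message.update(extra)
    return json.dumps(message)


if __name__ == "__main__":
    # One line on stdout, ready to be piped into the script's stdin.
    print(make_trigger("some_topic_string"))
```

Assuming the sketch is saved as trigger.py, its output could be piped into jldsqsjob started with '-trt some_topic_string'.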


jldexec

jldexec [-p polling_interval] [-j] [-bs batch_size] -qn topic  -mn python_module_name


This script creates a private SQS queue and subscribes it to the specified SNS topic.
The SQS queue is polled at polling_interval (seconds).
Optionally with the '-j' switch, the received JSON object can be echoed to stdout.
With the '-bs' option, one can specify the maximum number of messages that can be processed at each interval.
Whenever a message is received on the private SQS queue, the specified python_module_name's "run" callable will be called:

python_module.run(private_queue_name, topic_name, received_JSON_message)
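
A module suitable for '-mn' might look like the sketch below. Only the name "run" and its three positional arguments come from the signature above; the summary it builds is purely illustrative, and it is an assumption that the message may arrive either as a JSON string or as an already decoded object.

```python
import json


def run(private_queue_name, topic_name, received_json_message):
    """Hypothetical handler: summarize the message handed over by jldexec."""
    # Assumption: accept both a JSON string and an already decoded object.
    if isinstance(received_json_message, str):
        payload = json.loads(received_json_message)
    else:
        payload = received_json_message

    fields = sorted(payload) if isinstance(payload, dict) else []
    summary = {
        "queue": private_queue_name,
        "topic": topic_name,
        "fields": fields,
    }
    # Diagnostic output; a real handler would do its work here instead.
    print(json.dumps(summary))
    return summary
```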



jlds3notify

jlds3notify [ -a ] [ -j ] [-p polling_interval] [-r prefix] -bn bucket_name [-mn module_name]

OR

jlds3notify @config.txt

This script monitors a specific S3 bucket along with a prefix.
Changes to the bucket/prefix are reported through the "run" function of the specified python module.
The '-a' option is for "always call the run function" even if there are no changes.
The '-j' option is for echoing the generated JSON object to stdout.

python_module.run(bucket_name, prefix, keys={ }, changes=[ ])

Where 'keys' is a dictionary of { S3 object name : boto.s3.Key } pairs.
Where 'changes' is a list of the S3 object names that changed between polling intervals.
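
A module for '-mn' could be sketched as follows. The function name and parameters follow the signature above; the report it builds is hypothetical, and None is used for the defaults only to avoid sharing mutable default values between calls.

```python
def run(bucket_name, prefix, keys=None, changes=None):
    """Hypothetical handler: build a small report about one polling interval."""
    keys = keys if keys is not None else {}
    changes = changes if changes is not None else []

    report = {
        "bucket": bucket_name,
        "prefix": prefix,
        "tracked": len(keys),        # number of objects currently known
        "changed": sorted(changes),  # object names reported as changed
    }
    # With '-a' the function is also called when nothing changed,
    # so only print when there is something to report.
    if changes:
        print("%d change(s) in %s/%s" % (len(changes), bucket_name, prefix))
    return report
```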


jldtxsqs

jldtxsqs [-fq] [-w] [-r] -qn queue_name

OR  jldtxsqs @config.txt

This script sends the JSON string objects received on stdin to an SQS queue and echoes them to stdout at the same time.
The '-fq' option is useful for flushing the queue at startup.
Use '-w' to receive arbitrary strings.
Use '-r' to keep retrying operations without throwing an exception after a number of retries.



jldrxsqs

jldrxsqs [-e] [-fq] [-w] [-r] [-p polling_interval] [-bs batch_size] [-tr] [-trm string] -qn queue_name

OR jldrxsqs @config.txt

This script receives string/JSON objects from an SQS queue and echoes them to stdout.
The '-fq' option is useful for flushing the queue at startup.
Use '-w' to receive arbitrary strings.
Use '-e' to propagate SQS errors down stdout: { "error": "some exception message" }
Use '-r' to keep retrying operations without throwing an exception after a number of retries.
Use '-tr' to trigger the processing of 1 interval: the script waits for 1 line on stdin. The format of that line is unimportant.
Use '-trm' to specify a string to send on stdout (trigger mode only) when no message is available. If a JSON string is required, be sure to encode it before specifying it on the command line.

Note that the error message rate is at most 1 per polling period.
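
A downstream consumer reading this script's stdout might separate errors from regular messages as sketched below. It is an assumption that each object arrives on its own line and that any JSON object carrying an "error" field is an error report; the function name classify is hypothetical.

```python
import json


def classify(line):
    """Classify one stdout line as ("error", text), ("message", obj) or ("raw", text)."""
    text = line.rstrip("\n")
    try:
        obj = json.loads(text)
    except ValueError:
        # Not JSON: e.g. arbitrary strings received with '-w'.
        return ("raw", text)
    if isinstance(obj, dict) and "error" in obj:
        # Assumption: matches the { "error": "..." } form emitted with '-e'.
        return ("error", obj["error"])
    return ("message", obj)
```

A consumer would typically call classify on every line read from its own stdin and handle the three cases separately.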


Note on configuration files

It is possible to provide a configuration file in place of command line options for most scripts.
The file path must be preceded by @ in order to be treated as a configuration file.

The configuration file must contain 1 option per line.
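
The @file mechanism resembles the fromfile_prefix_chars feature of Python's argparse module. The sketch below assumes, without confirmation from the source, that the scripts behave like argparse's default: each line of the file is one command line argument, so an option and its value sit on separate lines. The parser, the option subset and the default value are hypothetical; the real scripts may expect a different layout.

```python
import argparse
import os
import tempfile


def build_parser():
    # fromfile_prefix_chars="@" makes argparse replace "@file" with the
    # arguments listed in that file, one argument per line.
    parser = argparse.ArgumentParser(prog="example", fromfile_prefix_chars="@")
    parser.add_argument("-qn", dest="queue_name", required=True)
    parser.add_argument("-p", dest="polling_interval", type=int, default=30)
    parser.add_argument("-fq", dest="flush_queue", action="store_true")
    return parser


def parse_config(text):
    """Write `text` to a temporary file and parse it through "@file"."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(text)
        path = handle.name
    try:
        return build_parser().parse_args(["@" + path])
    finally:
        os.remove(path)


# Hypothetical configuration file content, one argument per line.
args = parse_config("-fq\n-p\n5\n-qn\nmy_queue\n")
```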



jlds3upload

jlds3upload [-p polling_interval] [ -t ] [-e]  [-a] [-r prefix]  [-d] -bn bucket_name -sp source_path [-mp move_path] [-bs batch_size]

OR  jlds3upload @config.txt

This script monitors source_path in the local filesystem for files to upload to S3 in bucket_name.
The S3 filename is generated as follows (assuming an input file accessible at source_path/file.ext):

prefix/file.ext

For nested paths, the generated filenames omit the prefix common to all paths, i.e. source_path. E.g.

source_path: /tmp
path to process:  /tmp/some_dir/some_file.ext

generated filename: prefix/some_dir/some_file.ext
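
The naming rule above can be sketched as follows. The function name s3_key_for is hypothetical and this is not the script's actual code; it only reproduces the mapping described here, assuming POSIX-style paths.

```python
import posixpath


def s3_key_for(source_path, file_path, prefix):
    """Map a local file below source_path to its name under prefix."""
    # Drop the part of the path shared with source_path ...
    relative = posixpath.relpath(file_path, source_path)
    # ... and put what remains under the S3 prefix.
    return posixpath.join(prefix, relative)


if __name__ == "__main__":
    print(s3_key_for("/tmp", "/tmp/some_dir/some_file.ext", "prefix"))
```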

Once a file has been uploaded successfully to S3, it is moved to move_path (using the '-mp' option) in the filesystem or optionally deleted ('-d' option).
The result of the operation is logged (accessible through stderr) and a JSON object detailing the operation is sent on stdout.
The '-a' option instructs the script to always output a status on stdout even if no files were processed: useful as a "heartbeat".

To test the script before launching a job, use the '-t' option: it prints what would be done, i.e. it simulates processing.

NOTE: the files must be writable by all. This is required in order to move or delete the file after processing. This condition is checked prior to uploading each file.



jlds3up

Simple S3 file uploader.

jlds3up [-t] [-do] -bn bucket_name -pr bucket_prefix -sp source_path [-dp dest_path]

Where '-t' (optional) means "enable simulation" i.e. list the proposed actions without performing them.
Where '-do' (optional) means "delete old files".
Where '-dp' (optional) means "destination path" i.e. filename in the bucket/prefix.

This script can optionally delete "old files" from the bucket, provided the uploaded file name contains a version number in one of the following formats:

filename-x.ext
filename-x.y.ext
filename-x.y.z.ext 

Use the '-t' option to simulate what would happen to the "old files" in case of doubt.
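
How such version numbers could be recognized and compared is sketched below. This is not the script's actual logic: it assumes x, y and z are non-negative integers, that "old" means a lower version with the same base name and extension, and the function names are hypothetical.

```python
import re

# name-x.ext, name-x.y.ext or name-x.y.z.ext (assumption: numeric parts only)
_VERSIONED = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+){0,2})\.(?P<ext>[^.]+)$"
)


def parse_version(filename):
    """Return (name, version_tuple, ext) or None if no version is found."""
    match = _VERSIONED.match(filename)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version, match.group("ext")


def older_files(uploaded, candidates):
    """Return the candidates with the same name/ext but a lower version."""
    parsed = parse_version(uploaded)
    if parsed is None:
        return []
    name, version, ext = parsed
    result = []
    for candidate in candidates:
        other = parse_version(candidate)
        if other and other[0] == name and other[2] == ext and other[1] < version:
            result.append(candidate)
    return result
```

Comparing the versions as integer tuples keeps 1.10.0 above 1.9.0, which a plain string comparison would get wrong.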



jldleader

Distributed leader election protocol robot. For more information on this protocol, consult the following presentation.

This robot uses Amazon's SNS and SQS services to elect a leader amongst a group of potential leaders.

When a node is elected leader, the local file dest_path is created; otherwise it is deleted (if it exists). This file can serve as a signal for other robots. The election status output on stdout can also serve to signal other robots.
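
Another robot could use that file as a gate, as in the sketch below. The function names are hypothetical; the only assumption taken from the text is that the file exists exactly while the local node is leader.

```python
import os


def is_leader(dest_path):
    """True while the leader file written by jldleader exists."""
    return os.path.exists(dest_path)


def run_if_leader(dest_path, task):
    """Run task() only on the leader; return (ran, result)."""
    if not is_leader(dest_path):
        return (False, None)
    return (True, task())
```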


jldleader [-p interval] [-force] [-dq] [-id node_id] -dp dest_path  [-tn topic_name] [-qn queue_name] [-ms module_subscribe] [-mp module_publish] 

Where '-p' is the time interval on which the protocol operates.
Where '-dp' specifies the filesystem path where the "leader" status of the node will be written to.
Where '-dq' deletes the SQS queue upon termination.
Where '-force' specifies that the script should assume it is the leader.
Where '-id' specifies an integer/string node id: usually this shouldn't be set as a random unique id will be generated and reported.
Where '-tn' specifies the shared topic name used on the shared notification queue.
Where '-qn' optionally specifies the private SQS queue name (else a unique one is generated).
Where '-ms' (optional) specifies the Python module to use as a substitute for the private Amazon SQS queue subscribed to the topic name.
Where '-mp' (optional) specifies the Python module to use as substitute for the Amazon SNS topic bus.

An example configuration file follows:



Substitution of Amazon components (TODO)

The robot can either use Amazon's SNS & SQS services to manage the election protocol or rely on substitute services.
The substitute services must be implemented as Python modules.  

The subscribe part of the protocol must be implemented in a module where the following function is callable:

subscribe_message(topic_name)

and must return a tuple:

("ok", message)   : signals the availability of a message

("error", error_message) : signals an error condition

("ok", None) : signals that no message is available

The publish part of the protocol must be implemented in a module where the following function is callable:

publish_message(topic_name, message)
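
The contract above could be satisfied by something like the in-memory sketch below. It only illustrates the expected function names and return tuples: both functions live in one module here for brevity, messages never leave the current process, and the return value of publish_message is not specified by the text, so the sketch returns nothing.

```python
from collections import defaultdict, deque

# One FIFO of pending messages per topic (in-process only, illustration).
_TOPICS = defaultdict(deque)


def publish_message(topic_name, message):
    """Queue a message for the given topic."""
    _TOPICS[topic_name].append(message)


def subscribe_message(topic_name):
    """Return ("ok", message), ("ok", None) or ("error", error_message)."""
    try:
        pending = _TOPICS[topic_name]
        if not pending:
            return ("ok", None)       # no message available
        return ("ok", pending.popleft())
    except Exception as exc:          # report the failure instead of raising
        return ("error", str(exc))
```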


jlds3download

jlds3download [-p polling_interval]  [-r prefix] -b bucket_name -dp dest_path [-cp check_path] [-jk] [-bs batch_size] [-dd] [-e]

Where '-cp' (optional) check path instructs the script to only run when the specified path exists.
Where '-jk' (optional) means "just keys" i.e. just get all the keys from the bucket and write each key in a separate file under dest_path.
Where '-dd' (optional) means "don't download if the file exists in dest_path".
Where '-e' (optional) means "propagate S3 access errors down stdout as JSON object".

This script downloads files (batch_size at a time) from the S3 bucket named bucket_name with the specified prefix. Optionally, only the corresponding keys are written to the filesystem path dest_path, i.e. 0-length files are created with file names corresponding to the S3 keys in the bucket/prefix.

Status of the operation will be provided through stdout in the form of a JSON object.


Stuff in the pipeline
What follows are scripts being worked on currently.




