CSC/ECE 517 Fall 2014/OSS S1454 ccc: Difference between revisions

From Expertiza_Wiki
Latest revision as of 03:34, 29 October 2014

Sahana Eden: Extract The Progress Details

Introduction to Sahana Eden

Sahana is a software foundation with the express intent of “saving lives by providing information management solutions that enable organizations and communities to better prepare for and respond to disasters.” [1] It pursues this by developing free and open source software that makes disaster response coordination more efficient. Eden (Emergency Development ENvironment for Rapid Deployment Humanitarian Response Management), one of its main products, is a feature-rich and rapidly customizable humanitarian platform whose modules can be specialized for particular organizations and management needs. Eden has been used in response to wildfires in Chile, earthquakes and tsunamis in Japan, flooding in Colombia, Venezuela, and Pakistan, and a hurricane in Veracruz, Mexico [2].

The idea is to mix and match Eden's modules to best suit the needs and context of an organization. The main modules are:

  • Organization Registry: creates databases of organizations to help facilitate coordination.
  • Project Tracking: tells who is doing what, where, and when, and identifies who is working on similar projects to help facilitate collaboration.
  • Human Resources: tracks and manages the people involved, including the skills those people have.
  • Inventory Management: records and automates shipments and deliveries of supplies.
  • Asset Management: manages assets such as vehicles, communication equipment, and generators, and tracks to whom each is assigned.
  • Assessments: collects and analyzes information.
  • Shelter Management: manages temporary shelters, including required resources, staff and volunteer assignments, and check-in/check-out systems.
  • Scenario and Event Planning: supports planning for various emergency scenarios.
  • Mapping: enables location-based visualization on maps.
  • Messaging: handles communications over various protocols and social media channels.

Eden’s WebSetup is the tool that handles the setup of a new Eden project. That setup involves many moving parts, so detecting bugs and troubleshooting issues in the process can be tricky. A developer created a New Enhancement ticket requesting the ability to extract the progress details of the setup into a single local file, not dissimilar to a log file. [3]

Web2Py Tasks and Task Scheduler

Sahana Eden uses Web2Py, an application development framework written in Python that follows a typical MVC pattern. To manage large and complicated work in the background, Web2Py provides a scheduler that allows control over processes. A task is a function defined in a model:

def task_getParity(i):
    if i % 2 == 0:
        return "even"
    return "odd"
Queuing Tasks

Tasks can then be queued with additional meta-parameters that control how the function is run, such as when the task should die, how long it may run before a timeout occurs, how many times to repeat it, and what parameters to send to it (if applicable).

scheduler.queue_task('task_getParity', [8], …)
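As a rough illustration of the meta-parameters mentioned above, the sketch below wraps the call in a hypothetical helper, `queue_parity_check`. It assumes `scheduler` is a `gluon.scheduler.Scheduler` instance and that the keyword names `pargs`, `timeout`, `repeats`, and `period` match the installed web2py version; the values are illustrative and are not Eden's settings.

```python
def queue_parity_check(scheduler, number):
    """Queue task_getParity with explicit meta-parameters.

    Hypothetical helper: `scheduler` is assumed to be a
    gluon.scheduler.Scheduler instance; the values are illustrative only.
    """
    return scheduler.queue_task(
        'task_getParity',  # name of the task function to run
        pargs=[number],    # positional arguments passed to the task
        timeout=30,        # seconds the task may run before it times out
        repeats=1,         # run the task a single time
        period=60,         # seconds between runs when a task repeats
    )
```

Naming the meta-parameters, rather than passing them positionally, keeps the call readable when only a few of them differ from their defaults.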
Scheduler Creation

The scheduler manages the tasks and the workers. Creating a scheduler requires, at a minimum, a database in which to store the states of tasks. The scheduler can also be instantiated with additional parameters that specify the time between checks of the queue, the names of the functions that may be queued, the number of times the queue is checked before a worker must be terminated, and so on.

from gluon.scheduler import Scheduler
scheduler = Scheduler(database [,…])
Task Lifecycle [4]

Like a Java thread or a Linux process, a Web2Py task (the terms "task" and "process" are used interchangeably here) moves through a lifecycle with the following states:

  • Queued: the task is waiting in the queue to be picked up by a worker.
  • Expired: the task died in the queue (the time after which the task should die has passed).
  • Assigned: a worker has been assigned to the task.
  • Running: the task is being carried out.
  • Timeout: the task timed out (the time allowed for processing has passed).
  • Failed: an error was detected or an exception was thrown.
  • Completed: the task finished correctly.
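The lifecycle above can be read as a small state machine. The sketch below is only an illustration: the transition table is inferred from the state descriptions rather than taken from the Web2Py source, the uppercase state names are an assumption about how the scheduler stores them, and repeating tasks (which may be queued again) are ignored.

```python
# Assumed transitions, inferred from the state descriptions; not taken
# from the Web2Py source. Repeating tasks are deliberately ignored.
TRANSITIONS = {
    "QUEUED":    {"ASSIGNED", "EXPIRED"},
    "ASSIGNED":  {"RUNNING"},
    "RUNNING":   {"COMPLETED", "FAILED", "TIMEOUT"},
    "EXPIRED":   set(),
    "COMPLETED": set(),
    "FAILED":    set(),
    "TIMEOUT":   set(),
}

def is_terminal(state):
    """A state is terminal when no further transition leaves it."""
    return not TRANSITIONS[state]

def can_move(current, target):
    """Return True if the assumed table allows current -> target."""
    return target in TRANSITIONS[current]
```

A progress logger can use `is_terminal` to decide when further status checks are pointless.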
Worker Management

Workers can also be controlled, much as Linux processes can be managed through jobs; a worker plays the same role for Web2Py tasks. Workers can be controlled with the following functions:

disable()   # Put the worker to sleep
terminate() # The worker dies gracefully as soon as possible
kill()      # The worker dies immediately

Sahana Eden-specific Tasks

The Sahana Eden developers created a wrapper class for the Web2Py tasks. The wrapper, s3task.py, is very “thin” and adds only minimal functionality to the Web2Py tasks. Specifically, it allows a task to be run asynchronously, lets the database handle CRUD actions (including setting defaults and hiding unnecessary fields), checks for duplicate tasks, determines whether at least one worker is alive, and requeues failed tasks. At the request of the mentors for this ticket, the functionality requested in the ticket was generalized to all processes. Because the details of a task's progress are not available from outside the task, the logging is based on the status that the scheduler exposes. Explicit logging of individual steps would have to be done on a per-function basis, with each function logging its own progress.

The logging functionality was added to s3task.py, and a timer was added to s3utils.py. If report_progress is set to true when a new task is scheduled (the default is false), a log file is created whose name contains the date and time the task was started and the task name. A timer is then created. While this timer is running, it checks the status of the task and appends that status to the log file. The current default is to log once per second for ten seconds.
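To make the naming scheme concrete, here is a minimal sketch of how such a log file name could be built. `build_log_name` is a hypothetical helper introduced for illustration; the format string mirrors the one used in the timer code later in this page.

```python
import datetime

def build_log_name(task_name, started=None):
    """Return '<yy-mm-dd-HH-MM>_<task name>.txt' for the given start time.

    Hypothetical helper; `started` defaults to the current local time.
    """
    if started is None:
        started = datetime.datetime.now()
    return started.strftime("%y-%m-%d-%H-%M") + "_" + task_name + ".txt"

# A task started on 29 October 2014 at 03:34:
# build_log_name("task_getParity", datetime.datetime(2014, 10, 29, 3, 34))
# returns "14-10-29-03-34_task_getParity.txt"
```

Because the name is only precise to the minute, two runs of the same task started within the same minute would append to the same file.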

The Repeated Timer
from threading import Timer

class RepeatedTimer(object):
    # Create the timer and start it immediately. Any additional arguments
    # are passed through to the function on every call
    def __init__(self, interval, function, *args, **kwargs):
        self._timer     = None
        self.interval   = interval
        self.function   = function
        self.args       = args
        self.kwargs     = kwargs
        self.is_running = False
        self.start()
    # Called by the timer when the interval elapses: re-arm the timer for
    # the next interval, then call the function
    def _run(self):
        self.is_running = False
        self.start()
        self.function(*self.args, **self.kwargs)
    # Arm the timer if it is not already running
    def start(self):
        if not self.is_running:
            self._timer = Timer(self.interval, self._run)
            self._timer.start()
            self.is_running = True
    # Cancel the pending timer
    def stop(self):
        self._timer.cancel()
        self.is_running = False
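One design note: because RepeatedTimer arms a fresh Timer on every tick, a stop() that arrives while _run is re-arming can, in principle, cancel the old timer and leave the new one running. The sketch below shows one alternative under that assumption. `EventTimer` is a hypothetical class, not part of Eden, that uses a single thread and a `threading.Event` so that stop() is unambiguous.

```python
import threading

class EventTimer(object):
    """Hypothetical alternative to RepeatedTimer: one thread, one stop flag."""

    def __init__(self, interval, function, *args, **kwargs):
        self.interval = interval
        self.function = function
        self.args = args
        self.kwargs = kwargs
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._loop)
        self._thread.daemon = True  # do not keep the interpreter alive
        self._thread.start()

    def _loop(self):
        # Event.wait returns False when the interval elapses (keep ticking)
        # and True once stop() has set the flag (leave the loop)
        while not self._stop_event.wait(self.interval):
            self.function(*self.args, **self.kwargs)

    def stop(self):
        # Signal the loop to end and wait for any in-flight call to finish
        self._stop_event.set()
        self._thread.join()
```

The constructor takes the same arguments as RepeatedTimer, so it could be swapped in without changing the caller.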
Logging the Status of a Task
def check_status(user_id, log_name, task_id, scheduler, task_name):
    # Note: the timer code invokes this as self.check_status(...); if this is
    # defined as a method, the first parameter receives the instance
    import os
    from gluon import DAL
    # The log file's parent directory
    log_path = "/home/dev/web2py/applications/eden/logs/tasks/"
    # Connect to the task status database
    db = DAL('sqlite://storage.db', folder='applications/eden/databases', auto_import=True)
    # Grab the status of the current task from the database connection
    table = db.scheduler_task
    query = (table.id == task_id)
    task_status = db(query).select(table.status).first().status
    # Make the log directory if it doesn't exist
    if not os.path.exists(log_path):
        os.makedirs(log_path)
    # Open the log in append mode and write the status
    with open(os.path.join(log_path, log_name), "a") as log:
        log.write('%s is currently in the %s state\n' % (task_name, task_status))
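The function above hard-codes the log directory and mixes the database lookup with the file write. As a sketch of how the file-writing half could be separated, the hypothetical helper `append_status` below takes the directory as a parameter, which also makes it easy to exercise without a Web2Py database.

```python
import os

def append_status(log_dir, log_name, task_name, task_status):
    """Append one status line to log_dir/log_name and return the file path.

    Hypothetical helper; the line format matches check_status above.
    """
    # Create the log directory on first use
    if not os.path.isdir(log_dir):
        os.makedirs(log_dir)
    log_file = os.path.join(log_dir, log_name)
    # Append mode keeps earlier status lines intact
    with open(log_file, "a") as log:
        log.write("%s is currently in the %s state\n" % (task_name, task_status))
    return log_file
```

Called once per timer tick, this yields one line per check, so the file reads as a timeline of the task's states.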
Constructing and Running the Timer
if report_progress:
    import datetime
    from time import sleep
    # Construct the name of the log from the current date and time and the task name
    log_name = datetime.datetime.now().strftime("%y-%m-%d-%H-%M") + "_" + task + ".txt"
    # Create the timer that runs check_status on the task every 1 second
    rt = RepeatedTimer(1, self.check_status, log_name, record.id, self.scheduler, task)
    # Let the timer run for 10 seconds, then stop it. Note that sleep()
    # blocks the caller for the whole window
    try:
        sleep(10)
    finally:
        rt.stop()
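The fixed ten-second window keeps logging even after the task has finished, and stops early if the task runs longer. A possible refinement, sketched below, is to poll until the task reaches a final state or a maximum number of checks is used up. `poll_status`, `get_status`, and `record` are hypothetical names, and the uppercase state names are an assumption about how the scheduler stores them.

```python
import time

# Assumed names of the final states in the task lifecycle
TERMINAL_STATES = ("COMPLETED", "FAILED", "TIMEOUT", "EXPIRED")

def poll_status(get_status, record, interval=1.0, max_checks=10, sleep=time.sleep):
    """Record the task status until it is final or max_checks is reached.

    get_status: callable returning the current status string
    record:     callable that stores one status (e.g. writes a log line)
    sleep:      injectable so the loop can be exercised without waiting
    """
    seen = []
    for _ in range(max_checks):
        status = get_status()
        record(status)
        seen.append(status)
        if status in TERMINAL_STATES:
            break  # nothing further will change; stop polling
        sleep(interval)
    return seen
```

With the defaults, this behaves like the original (at most ten checks, one second apart) but returns as soon as the task is done.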