CSC/ECE 517 Spring 2015/ch1a 7 SA

From Expertiza_Wiki
Source: http://www.w7cloud.com/7-reasons-to-use-amazon-s3-cloud-computing-online-storage/

Amazon Simple Storage Service (Amazon S3) is a remote, scalable, secure, and cost-efficient storage service provided by Amazon. Users can access their storage on Amazon S3 from the web via REST <ref>Wikipedia: REST[1]</ref>, HTTP <ref>[2]</ref>, or SOAP <ref>[3]</ref>, making their data accessible from virtually anywhere in the world. Amazon S3 implements redundancy across multiple devices in multiple facilities in order to safeguard against application failure and data loss and to minimize downtime <ref>[4]</ref>. Some of the most prominent users of Amazon S3 include Netflix, SmugMug, WeTransfer, Pinterest, and NASDAQ <ref>[5]</ref>.

Writing Assignment 1a


Background

Amazon S3 launched in the United States in March of 2006 <ref>[6]</ref> and in Europe in November of 2007 <ref>[7]</ref>. Since its inception, Amazon S3 has reported tremendous growth. In July of 2006, S3 hosted 800 million objects <ref>[8]</ref>; April of 2007, 5 billion objects <ref>[9]</ref>; October of 2007, 10 billion <ref>[10]</ref>; January 2008, 14 billion <ref>[11]</ref>; October 2008, 29 billion <ref>[12]</ref>; March 2009, 52 billion <ref>[13]</ref>; August 2009, 64 billion <ref>[14]</ref>. As of April 2013, S3 hosted more than 2 trillion objects and served, on average, 1.1 million requests every second <ref>[15]</ref>.

Design

S3 is an object store, not a traditional hierarchical file system. It exposes a deliberately simple feature set to improve robustness, and all data in S3 is accessed in terms of objects and buckets.
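The difference from a hierarchical file system can be sketched with a small in-memory model. The example below is only an illustration, not SDK code: the `list_keys` helper and the sample keys are hypothetical. It shows that a bucket is a flat map from keys to data, and that a folder-like view is derived from a prefix and a delimiter in the key names.

```ruby
# In-memory sketch: a bucket is a flat map from key to data.
# "Folders" are only a naming convention built from a delimiter.
bucket = {
  'photos/2015/jan.jpg' => 'jpeg bytes',
  'photos/2015/feb.jpg' => 'jpeg bytes',
  'photos/index.html'   => '<html></html>',
  'readme.txt'          => 'hello'
}

# Hypothetical helper (not an SDK call): list the keys directly under a
# prefix and group deeper keys into "common prefixes", as a folder view would.
def list_keys(bucket, prefix: '', delimiter: '/')
  keys = []
  common_prefixes = []
  bucket.keys.sort.each do |key|
    next unless key.start_with?(prefix)
    rest = key[prefix.length..-1]
    cut = rest.index(delimiter)
    if cut
      common_prefixes << prefix + rest[0..cut]
    else
      keys << key
    end
  end
  { keys: keys, common_prefixes: common_prefixes.uniq }
end

p list_keys(bucket)                     # top-level view
p list_keys(bucket, prefix: 'photos/')  # view "inside" photos/
```

No directory objects exist in this model; deleting every key that starts with `photos/` makes the "folder" disappear.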

Objects

Objects are the basic units of storage in Amazon S3. Each object is composed of object data and metadata. S3 supports a size of up to 5 terabytes per object. The metadata is a set of name-value pairs that describe the object, such as the date it was last modified; users can also store custom data about the object in its metadata. Every object is identified within its bucket by a user-defined key and, when versioning is enabled on the bucket, by a version ID <ref>[16]</ref>. An object consists of the following: Key, Version ID, Value, Metadata, Subresources, and Access Control Information <ref>[17]</ref>.
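As a rough illustration of these parts, an object can be modeled as a plain Ruby struct. `S3ObjectSketch` and the sample values are hypothetical and are not part of any SDK; the field names simply mirror the list above.

```ruby
# Hypothetical in-memory model of the parts of an S3 object.
# This is not an SDK class; the fields mirror the list in the text.
S3ObjectSketch = Struct.new(:key, :version_id, :value, :metadata,
                            :subresources, :acl)

obj = S3ObjectSketch.new(
  'reports/2015/summary.txt',          # key: identifies the object in its bucket
  nil,                                 # version ID: only set when versioning is enabled
  'Quarterly summary',                 # value: the object data itself
  { 'Content-Type' => 'text/plain' },  # metadata: name-value pairs
  {},                                  # subresources
  :private                             # access control information
)

puts obj.key
puts obj.metadata['Content-Type']
```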

Buckets

A bucket is a container for objects, and every object must be part of a bucket. Any number of objects can be part of a bucket. Buckets can be configured to be hosted in a particular region (US, EU, Asia Pacific, etc.) in order to optimize latency. By default, S3 limits the number of buckets per account to 100. <ref>[18]</ref>

Keys and Metadata

A user assigns a key to an object on creation, and the key uniquely identifies the object within the bucket. A key can be at most 1024 bytes long.
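Because the limit is expressed in bytes rather than characters, a key made of multi-byte characters reaches it sooner. A minimal sketch of a client-side check, assuming keys are UTF-8 encoded (`valid_key?` and `MAX_KEY_BYTES` are hypothetical names, not SDK calls):

```ruby
MAX_KEY_BYTES = 1024

# Hypothetical helper: true when the key is non-empty and its UTF-8
# encoding fits within the 1024-byte limit.
def valid_key?(key)
  bytes = key.encode('UTF-8').bytesize
  bytes > 0 && bytes <= MAX_KEY_BYTES
end

puts valid_key?('photos/2015/jan.jpg') # short ASCII key
puts valid_key?('a' * 1025)            # one byte over the limit
puts valid_key?('é' * 600)             # 600 characters, but 1200 bytes
```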

There are two kinds of metadata for an object: system metadata and user-defined metadata. System metadata is used by S3 for object management; for example, Date and Content-Type are stored as system metadata. User-defined metadata is optional and lets the user attach additional metadata to an object when it is created. <ref>[19]</ref>
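In the REST interface, user-supplied metadata is sent as HTTP headers whose names begin with `x-amz-meta-`. The sketch below only builds such a header set and performs no request; `metadata_headers` and the sample values are hypothetical.

```ruby
# Hypothetical helper: combine system metadata headers with user metadata,
# assuming the REST convention that user metadata uses the x-amz-meta- prefix.
def metadata_headers(system:, user:)
  headers = system.dup
  user.each do |name, value|
    headers["x-amz-meta-#{name.to_s.downcase}"] = value.to_s
  end
  headers
end

headers = metadata_headers(
  system: { 'Content-Type' => 'text/plain' },
  user:   { author: 'larry', project: 'poetry' }
)
headers.each { |name, value| puts "#{name}: #{value}" }
```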

Regions

Regions allow a user to specify the geographical region where buckets will be stored. This can be used to optimize latency and minimize costs. S3 supports the following regions: US Standard, US West (Oregon), US West (N. California), EU (Ireland), EU (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), and South America (Sao Paulo) <ref>[20]</ref>.
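API calls and SDK configuration usually refer to regions by short codes rather than by these display names. A small lookup sketch follows; the `REGION_CODES` constant and `region_code` helper are illustrative names, not part of the SDK.

```ruby
# Display names from the list above mapped to their AWS region codes.
REGION_CODES = {
  'US Standard'                => 'us-east-1',
  'US West (Oregon)'           => 'us-west-2',
  'US West (N. California)'    => 'us-west-1',
  'EU (Ireland)'               => 'eu-west-1',
  'EU (Frankfurt)'             => 'eu-central-1',
  'Asia Pacific (Singapore)'   => 'ap-southeast-1',
  'Asia Pacific (Tokyo)'       => 'ap-northeast-1',
  'Asia Pacific (Sydney)'      => 'ap-southeast-2',
  'South America (Sao Paulo)'  => 'sa-east-1'
}.freeze

# Hypothetical helper: look up a code, failing loudly on unknown names.
def region_code(name)
  REGION_CODES.fetch(name) { raise ArgumentError, "unknown region: #{name}" }
end

puts region_code('EU (Frankfurt)')
```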

Versioning

Versioning keeps multiple versions of an object in the same bucket and can be used to retrieve and restore every version of an object. It is disabled by default and is configured at the bucket level, not for individual objects. Once versioning is enabled, every change to an object (create, modify, delete) results in a separate version of the object, which can later be used for restoring or recovery. A version-enabled bucket cannot be returned to the unversioned state; versioning can only be suspended. <ref>[21]</ref>
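The behavior of a bucket with versioning turned on can be sketched with a small in-memory model. `VersionedBucketSketch` is hypothetical and greatly simplified; it only illustrates that writes append versions and that a delete adds a marker instead of erasing earlier data.

```ruby
require 'securerandom'

# Hypothetical in-memory model of a versioning-enabled bucket.
class VersionedBucketSketch
  Version = Struct.new(:version_id, :data, :delete_marker)

  def initialize
    @versions = Hash.new { |hash, key| hash[key] = [] }
  end

  # Every write appends a new version instead of overwriting.
  def put(key, data)
    add(key, data, false)
  end

  # A delete only appends a delete marker; older versions remain.
  def delete(key)
    add(key, nil, true)
  end

  # Without a version ID, return the latest version (nil if deleted).
  def get(key, version_id = nil)
    list = @versions.fetch(key, [])
    version = version_id ? list.find { |v| v.version_id == version_id } : list.last
    return nil if version.nil? || version.delete_marker
    version.data
  end

  private

  def add(key, data, delete_marker)
    id = SecureRandom.hex(8)
    @versions[key] << Version.new(id, data, delete_marker)
    id
  end
end

bucket = VersionedBucketSketch.new
v1 = bucket.put('notes.txt', 'first draft')
bucket.put('notes.txt', 'second draft')
bucket.delete('notes.txt')

p bucket.get('notes.txt')     # latest version is a delete marker
p bucket.get('notes.txt', v1) # the first version is still retrievable
```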

Access Permissions

All resources (buckets, objects, etc.) in Amazon S3 are private by default. Only the resource owner can access a resource, and the owner can grant access to other users. There are two types of access policies in S3: resource-based policies and user policies. Resource-based policies are attached to a particular resource, and user policies are attached to a particular user. <ref>[22]</ref>
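A bucket policy is one kind of resource-based policy and is written as a JSON document. The sketch below builds a policy that allows anyone to read objects in a bucket; the helper name `public_read_policy`, the `Sid` value, and the bucket name are illustrative assumptions, and the code only prints the document without applying it.

```ruby
require 'json'

# Hypothetical helper: build a bucket policy document that lets anyone
# read (GET) objects in the given bucket.
def public_read_policy(bucket_name)
  {
    'Version'   => '2012-10-17',
    'Statement' => [
      {
        'Sid'       => 'PublicRead',
        'Effect'    => 'Allow',
        'Principal' => '*',
        'Action'    => ['s3:GetObject'],
        'Resource'  => ["arn:aws:s3:::#{bucket_name}/*"]
      }
    ]
  }
end

puts JSON.pretty_generate(public_read_policy('my-new-bucket'))
```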

Data Protection

Objects are redundantly stored on multiple devices across multiple facilities within a region for durability. To improve durability further, a write request does not return success until the data has been stored across multiple facilities. Checksums are also used to verify data integrity; if any corruption is detected, it is repaired using the redundant data. <ref>[23]</ref>
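The idea behind checksum verification can be illustrated on the client side. The sketch below computes a base64-encoded MD5 digest, the form used by the HTTP `Content-MD5` header, and compares it against a stored value. The helper names are hypothetical, and this is not a description of how S3 implements its internal checks.

```ruby
require 'digest/md5'

# Hypothetical helper: base64 of the raw 16-byte MD5 digest.
def content_md5(data)
  [Digest::MD5.digest(data)].pack('m0')
end

# Hypothetical helper: true when the data still matches its checksum.
def intact?(data, expected_md5)
  content_md5(data) == expected_md5
end

original = 'Hello World!'
checksum = content_md5(original)

puts intact?(original, checksum)       # unchanged data matches
puts intact?('Hello World?', checksum) # a single changed byte is detected
```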

Ruby and Amazon S3

Amazon Web Services (AWS) provides an SDK (<ref>download</ref>) that works with Ruby for many Amazon web services, including Amazon S3. Developers new to the AWS SDK should begin with version 2, as it includes many built-in features such as waiters, automatically paginated responses, and a streamlined plugin-style architecture. Version 2 of the SDK has two packages, also referred to as "gems" <ref>Wikipedia: Ruby Gems[24]</ref>:

  • aws-sdk-core - provides a direct mapping to the AWS APIs including automatic response paging, waiters, parameter validation, and Ruby type support
  • aws-sdk-resources - provides an object-oriented abstraction over the low-level interfaces in the core to reduce the complexity of using them; resource objects reference other objects, such as an Amazon S3 instance, and expose attributes and actions as instance variables and methods.


It should also be noted that version 1 of the AWS SDK lacks some "convenience features" that are available in version 2. For more information, see the <ref>Ruby Development Blog</ref>.

Examples

Version 1 of the AWS SDK for Ruby has three key classes <ref>[25]</ref>:

  • AWS::S3 - Denotes an interface to Amazon S3 for the Ruby SDK. It has the #buckets instance method for creating new buckets or accessing existing buckets.
  • AWS::S3::Bucket - Denotes an Amazon S3 Bucket. It provides the #objects instance method to access existing objects and also other methods to get information about a bucket.
  • AWS::S3::S3Object - Denotes an Amazon S3 Object. It provides methods for getting information about the object, setting access permissions, and copying, deleting, and uploading objects.

Creating a connection to S3 server

require 'aws/s3' # provided by the aws-s3 gem, which the examples cited from ceph.com use

AWS::S3::Base.establish_connection!(
        :server            => 'objects.example.com',
        :use_ssl           => true,
        :access_key_id     => 'my-access-key',
        :secret_access_key => 'my-secret-key'
)
Source: http://ceph.com/docs/master/radosgw/s3/ruby/

Listing all buckets you own

AWS::S3::Service.buckets.each do |bucket|
        puts "#{bucket.name}\t#{bucket.creation_date}"
end

Expected output:

mybuckat1   2011-04-21T18:05:39.000Z
mybuckat2   2011-04-21T18:05:48.000Z
mybuckat3   2011-04-21T18:07:18.000Z
Source: http://ceph.com/docs/master/radosgw/s3/ruby/

Listing a bucket's contents

new_bucket = AWS::S3::Bucket.find('my-new-bucket')
new_bucket.each do |object|
        puts "#{object.key}\t#{object.about['content-length']}\t#{object.about['last-modified']}"
end

Expected output:

file1.filex 251262  2011-08-08T21:35:48.000Z
file2.filex 262518  2011-08-08T21:38:01.000Z
Source: http://ceph.com/docs/master/radosgw/s3/ruby/

Deleting a bucket

Note: The target bucket must be empty!
AWS::S3::Bucket.delete('my-new-bucket')
Source: http://ceph.com/docs/master/radosgw/s3/ruby/

Forced removal of non-empty buckets

AWS::S3::Bucket.delete('my-new-bucket', :force => true)
Source: http://ceph.com/docs/master/radosgw/s3/ruby/

Creating an object

AWS::S3::S3Object.store(
        'hello.txt',
        'Hello World!',
        'my-new-bucket',
        :content_type => 'text/plain'
)
Source: http://ceph.com/docs/master/radosgw/s3/ruby/

Change an object's ACL (access control list)

policy = AWS::S3::S3Object.acl('hello.txt', 'my-new-bucket')
policy.grants = [ AWS::S3::ACL::Grant.grant(:public_read) ]
AWS::S3::S3Object.acl('hello.txt', 'my-new-bucket', policy)

policy = AWS::S3::S3Object.acl('secret_plans.txt', 'my-new-bucket')
policy.grants = []
AWS::S3::S3Object.acl('secret_plans.txt', 'my-new-bucket', policy)
Source: http://ceph.com/docs/master/radosgw/s3/ruby/

Download an object to a folder

Note: This downloads the object poetry.pdf and saves it in /home/larry/documents/
File.open('/home/larry/documents/poetry.pdf', 'wb') do |file| # 'wb' writes the PDF in binary mode
        AWS::S3::S3Object.stream('poetry.pdf', 'my-new-bucket') do |chunk|
                file.write(chunk)
        end
end
Source: http://ceph.com/docs/master/radosgw/s3/ruby/

Deleting an object

Note: This deletes the object goodbye.txt
AWS::S3::S3Object.delete('goodbye.txt', 'my-new-bucket')
Source: http://ceph.com/docs/master/radosgw/s3/ruby/

Generating object download urls

puts AWS::S3::S3Object.url_for(
        'hello.txt',
        'my-new-bucket',
        :authenticated => false
)

puts AWS::S3::S3Object.url_for(
        'secret_plans.txt',
        'my-new-bucket',
        :expires_in => 60 * 60
)

Expected Output:

http://objects.dreamhost.com/my-bucket-name/hello.txt
http://objects.dreamhost.com/my-bucket-name/secret_plans.txt?Signature=XXXXXXXXXXXXXXXXXXXXXXXXXXX&Expires=1316027075&AWSAccessKeyId=XXXXXXXXXXXXXXXXXXX
Source: http://ceph.com/docs/master/radosgw/s3/ruby/
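The query parameters in the second URL (Signature, Expires, AWSAccessKeyId) match the query-string form of S3's Signature Version 2 authentication. The following sketch shows how such a URL could be assembled by hand, assuming that scheme: the `presigned_get_url` helper is hypothetical, the host and credentials are the placeholder values from the connection example, and the key is assumed to need no URL escaping. In practice the library's url_for method performs this work.

```ruby
require 'openssl'
require 'uri'

# Hypothetical helper, assuming Signature Version 2 query-string auth:
# the signature is base64(HMAC-SHA1(secret key, string to sign)).
def presigned_get_url(host:, bucket:, key:, access_key_id:, secret_access_key:, expires_at:)
  resource = "/#{bucket}/#{key}"
  # Fields: HTTP verb, Content-MD5 (empty), Content-Type (empty),
  # expiry as seconds since the epoch, then the resource path.
  string_to_sign = "GET\n\n\n#{expires_at}\n#{resource}"
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'),
                                secret_access_key, string_to_sign)
  signature = [digest].pack('m0') # base64 without line breaks
  query = URI.encode_www_form(
    'AWSAccessKeyId' => access_key_id,
    'Expires'        => expires_at,
    'Signature'      => signature
  )
  "http://#{host}#{resource}?#{query}"
end

url = presigned_get_url(
  host: 'objects.example.com',
  bucket: 'my-new-bucket',
  key: 'secret_plans.txt',
  access_key_id: 'my-access-key',
  secret_access_key: 'my-secret-key',
  expires_at: 1_316_027_075
)
puts url
```

Because the expiry time is part of the signed string, changing the Expires parameter in the URL invalidates the signature.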

Upload a file to Amazon S3

Under the Apache License v2.0, the following code may be reproduced and redistributed with the following <ref>license</ref>.
# Copyright 2011-2013 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.

require 'aws-sdk'

(bucket_name, file_name) = ARGV
unless bucket_name && file_name
  puts "Usage: upload_file.rb <BUCKET_NAME> <FILE_NAME>"
  exit 1
end

# get an instance of the S3 interface using the default configuration
s3 = AWS::S3.new

# create a bucket
b = s3.buckets.create(bucket_name)

# upload a file
basename = File.basename(file_name)
o = b.objects[basename]
o.write(:file => file_name)

puts "Uploaded #{file_name} to:"
puts o.public_url

# generate a presigned URL
puts "\nUse this URL to download the file:"
puts o.url_for(:read)

puts "(press Enter to delete the object)"
$stdin.getc

o.delete

For the AWS SDK documentation, see <ref>AWS SDK for Ruby</ref>.

References

<references/>