Documentation for Database Anonymization
We use database anonymization to scrub the database, replacing real Expertiza records with fake data, so that the data can be used for development without compromising the integrity of users' data.
==Initial Setup for Anonymization==
We use the <code>faker</code> gem to generate fake names, which replace the actual user data.
In the <code>Gemfile</code>, add:
<pre>
gem 'faker', '1.9.3', group: [:test, :development], require: false
</pre>
In <code>config/locales</code>, rename <code>en-US.yml</code> to <code>en.yml</code>.
Inside the file, replace <code>en-US:</code> with <code>en:</code>.
(This avoids translation errors thrown by i18n.)
You also need a database named <code>expertiza_development</code> that contains real records from the Expertiza database.
==Anonymization Script==
===Rake task to run the script===
In <code>lib/tasks/scrub_database.rake</code> the task <code>scrub</code> is defined inside the nested namespaces <code>db</code> and <code>data</code>, so it is invoked as <code>db:data:scrub</code>.
<pre>
namespace :db do
  namespace :data do
    desc 'Scrubs the database of user information'
    task scrub: :environment do
      require './db/data_migrations/scrub_database.rb' # Require the data migration class
      ScrubDatabase.run! # Run the method that contains the scrubbing logic
    end
  end
end
</pre>
===Logic of the script===
<code>db/data_migrations/scrub_database.rb</code> contains the main logic for anonymizing the records.
<pre>
require 'i18n'
require 'faker'

class ScrubDatabase
  I18n.default_locale = :en # Set the default locale to be used
  I18n.reload!

  def self.run! # This is the method called from the rake task
    User.find_each do |user|
      # logic for anonymizing the records
    end
  end
end
</pre>
===Format for anonymizing===
For students:
<pre>
user.name = "<fname>_<lname_subS><num>"
user.fullname = "<lname>, <fname>"
user.email = "expertiza@mailinator.com"
user.password = "password"
</pre>
For all other roles:
<pre>
user.name = "<role>_<fname>_<lname_subS><num>"
user.fullname = "<fname>, <role>"
user.email = "expertiza@mailinator.com"
user.password = "password"
</pre>
where
<code>fname</code> : First name generated by the Faker gem
<code>lname</code> : Last name generated by the Faker gem
<code>lname_subS</code> : The first 4 characters of the last name
<code>role</code> : Role of the actual user (e.g. teaching_assistant, instructor)
<code>num</code> : A random number between 1 and 9
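The format above can be sketched in Ruby as follows. This is a minimal illustration, not the script's actual code: the helper <code>anonymized_attributes</code>, its keyword arguments, and the role string <code>'student'</code> are assumptions made for this example. The names are passed in directly here, where the real script would take them from the faker gem. The sketch also assumes that both 1 and 9 are possible values of <code>num</code>.

```ruby
# Hypothetical helper illustrating the naming format described above.
# In the real script the first and last names would come from the faker gem
# (for example Faker::Name.first_name and Faker::Name.last_name).
def anonymized_attributes(role:, first_name:, last_name:, num: rand(1..9))
  suffix = "#{last_name[0, 4]}#{num}" # <lname_subS><num>

  if role == 'student' # assumed role string for students
    name     = "#{first_name}_#{suffix}"
    fullname = "#{last_name}, #{first_name}"
  else
    name     = "#{role}_#{first_name}_#{suffix}"
    fullname = "#{first_name}, #{role}"
  end

  { name: name, fullname: fullname,
    email: 'expertiza@mailinator.com', password: 'password' }
end

student = anonymized_attributes(role: 'student', first_name: 'Jane',
                                last_name: 'Smithson', num: 7)
puts student[:name]     # Jane_Smit7
puts student[:fullname] # Smithson, Jane

ta = anonymized_attributes(role: 'teaching_assistant', first_name: 'Omar',
                           last_name: 'Lee', num: 3)
puts ta[:name]          # teaching_assistant_Omar_Lee3
puts ta[:fullname]      # Omar, teaching_assistant
```

Because <code>num</code> has only nine possible values and only four characters of the last name are kept, two users can end up with the same <code>name</code>, which is why a duplicate check is worth running after the script.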
==Running the anonymization script==
This script anonymizes the records present in the <code>expertiza_development</code> database.
To run this script, first switch to the superuser with the command <code>sudo su</code>, then run:
<pre>
cd lib/tasks
</pre>
<pre>
rake db:data:scrub --trace
</pre>
After the task completes, log in to MySQL and inspect a few of the anonymized records:
<pre>
mysql -u root -p
use expertiza_development;
select name, fullname, email from users limit 10;
</pre>
To check for duplicate entries, run this SQL query:
<pre>
select name, count(name) from users group by name having count(name) > 1;
</pre>
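The same check can be expressed in plain Ruby, which may be convenient if the names have already been loaded into memory. This is only a sketch: the method <code>duplicate_names</code> and the sample names are hypothetical and are not part of the script.

```ruby
# Plain-Ruby counterpart of: GROUP BY name HAVING COUNT(name) > 1.
# Returns a hash that maps each duplicated name to its number of occurrences.
def duplicate_names(names)
  counts = names.each_with_object(Hash.new(0)) { |name, acc| acc[name] += 1 }
  counts.select { |_name, count| count > 1 }
end

# Hypothetical sample data; only Jane_Smit7 appears more than once.
names = %w[Jane_Smit7 Omar_Lee3 Jane_Smit7 Ana_Diaz1]
duplicate_names(names).each do |name, count|
  puts "#{name} appears #{count} times"
end
```

Inside a Rails console, a query along the lines of <code>User.group(:name).having('COUNT(name) > 1').count</code> should return a similar hash directly from the database, assuming the <code>User</code> model maps to the <code>users</code> table.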
'''*Note down the duplicated names and add them to the blacklist, since we would not be able to log in to these accounts.'''
'''*Ensure there are no more than 8 duplicate entries for students and none for other roles. Otherwise, re-run the script.'''
Latest revision as of 07:54, 13 March 2023