Documentation for Database Anonymization

From Expertiza_Wiki
Revision as of 07:46, 13 March 2023

We use database anonymization to scrub the database and replace real Expertiza records with fake data, so that the data can be used for development without compromising the integrity of the real records.

Initial Setup for Anonymization

We use the faker gem to generate fake names, which replace the actual data. In the Gemfile, add:

gem 'faker', '1.9.3', group: [:test, :development], require: false

In config/locales, rename en-US.yml to en.yml.

Inside the file, replace en-US: with en: (this avoids translation errors raised by i18n).

Make sure a database named expertiza_development exists and contains real records from the Expertiza database.

Anonymization Script

Rake task to run the script

In lib/tasks/scrub_database.rake we define a namespace db containing a nested namespace data, inside which the task scrub is defined. The task is therefore invoked as db:data:scrub.

namespace :db do
  namespace :data do
    desc 'Scrubs the database of user information'
    task scrub: :environment do
      require './db/data_migrations/scrub_database.rb' # Require the data migration class
      ScrubDatabase.run! # Run the function which contains the logic
    end
  end
end

Logic of the script

In db/data_migrations/scrub_database.rb we have the main logic that anonymizes the database.

require 'i18n'
require 'faker'

class ScrubDatabase
  I18n.default_locale = :en # here, we set the default locale to be used
  I18n.reload!
  def self.run! # this is the function called in the rake file
    User.find_each do |user|
      # logic for anonymizing the records
    end
  end
end

Format for anonymizing

For Students:

user.name = "<fname>_<lname_subS><num>"
user.fullname = "<lname>, <fname>"
user.email = "expertiza@mailinator.com"
user.password = "password"

For Others:

user.name = "<role>_<fname>_<lname_subS><num>"
user.fullname = "<fname>, <role>"
user.email = "expertiza@mailinator.com"
user.password = "password"

where

fname : First name generated by the Faker gem

lname : Last name generated by the Faker gem

lname_subS : The first 4 characters of the last name

role : Role of the actual user (e.g. teaching_assistant, instructor)

num : A random number between 1 and 9
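
The format above can be sketched in plain Ruby as follows. This is only an illustrative sketch, not the actual Expertiza implementation: the helper name anonymized_attributes is hypothetical, and it assumes that the student role is stored as the string 'student'. The fake names and the number are passed in as arguments so the example runs without the faker gem.

```ruby
# Hypothetical helper (not part of the Expertiza code base) that builds the
# anonymized attributes described above from already-generated fake values.
# Assumption: the student role is represented by the string 'student'.
def anonymized_attributes(fname:, lname:, role:, num:)
  lname_sub = lname[0, 4] # first 4 characters of the last name

  if role == 'student'
    name     = "#{fname}_#{lname_sub}#{num}"
    fullname = "#{lname}, #{fname}"
  else
    name     = "#{role}_#{fname}_#{lname_sub}#{num}"
    fullname = "#{fname}, #{role}"
  end

  {
    name: name,
    fullname: fullname,
    email: 'expertiza@mailinator.com',
    password: 'password'
  }
end

student    = anonymized_attributes(fname: 'Ada', lname: 'Lovelace', role: 'student', num: 7)
instructor = anonymized_attributes(fname: 'Ada', lname: 'Lovelace', role: 'instructor', num: 7)
puts student[:name]
puts instructor[:name]
```

In the real script the inputs would presumably come from the gem, for example Faker::Name.first_name, Faker::Name.last_name and rand(1..9). Because num only ranges over nine values, two users can end up with the same generated name, which is why a duplicate check is needed after the script has run.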


Running the anonymization script

This script anonymizes the records present in the expertiza_development database.

To run this script, first switch to the root user with the command sudo su, then run:

cd lib/tasks

rake db:data:scrub --trace

After the task has completed, log in to MySQL and inspect a few records:

mysql -u root -p

use expertiza_development;

select name, email from users limit 10;

To check for any duplicate entries, run this SQL command:

select name, count(name) from users group by name having count(name) >1;

Note down the duplicated names and add them to the blacklist, since it will not be possible to log in to these accounts. Ensure there are no more than 8 duplicate entries among students and none among other roles.
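
The same duplicate check can be sketched in plain Ruby, which may be convenient when the names are already loaded in a console. This is a minimal sketch under the assumption that the user names are available as an array of strings; the helper name duplicate_names is hypothetical.

```ruby
# Hypothetical helper: returns a hash of name => count for every name that
# occurs more than once, mirroring the GROUP BY ... HAVING query above.
def duplicate_names(names)
  counts = names.each_with_object(Hash.new(0)) { |name, acc| acc[name] += 1 }
  counts.select { |_name, count| count > 1 }
end

# Example with made-up names; in a Rails console the input could instead be
# something like User.pluck(:name) (assumption: the model is called User,
# as in the script above).
sample = %w[Ada_Love7 Bob_Smit3 Ada_Love7 Eve_Jone1]
p duplicate_names(sample)
```

Any names returned here are the ones to note down for the blacklist.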