CSC/ECE 517 Fall 2018- Project E1846. OSS Project Navy: Character Issues: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 23: Line 23:
   
   
  | Table    | Create Table |   
  | Table    | Create Table |   
 
  | versions | CREATE TABLE `versions` (
  | versions | CREATE TABLE `versions` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `id` int(11) NOT NULL AUTO_INCREMENT,
Line 35: Line 34:
   KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
   KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
  ) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=latin1 |
  ) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=latin1 |
  1 row in set (0.01 sec)
  1 row in set (0.01 sec)


Line 41: Line 39:
   
   
  mysql> ALTER TABLE versions CONVERT TO CHARACTER SET utf8;
  mysql> ALTER TABLE versions CONVERT TO CHARACTER SET utf8;
Output of show create table now is


  mysql> show create table versions;
  mysql> show create table versions;
  | Table    | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                         
  | Table    | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                         
|
  | versions | CREATE TABLE `versions` (
  | versions | CREATE TABLE `versions` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `id` int(11) NOT NULL AUTO_INCREMENT,
Line 56: Line 55:
   KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
   KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
  ) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=utf8 |
  ) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=utf8 |
1 row in set (0.01 sec)


1 row in set (0.01 sec)
This solved the problem. Therefore, we created the migration VersionTableSupportUTF8 to change version's characterset.


This solved the problem. Therefore, we created the migration VersionTableSupportUTF8 to change version's characterset.
rails g migration VersionTableSupportUTF8
 
def change
  execute "ALTER TABLE versions CONVERT TO CHARACTER SET utf8"
end


If we want to fix this in all tables, we can do following for each database via script or add migration for each table.<br>
If we want to fix this in all tables, we can do following for each database via script or add migration for each table.<br>
Line 67: Line 71:


== Testing Details==
== Testing Details==
We have manually tested the fix.


=== RSpec ===
=== RSpec ===

Revision as of 00:11, 5 November 2018

E1846. OSS Project Navy: Character Issues Fall 2018, CSC/ECE 517.

Problem Statement

1. In the existing Expertiza setup, the database supports only UTF-8 characters. Hence, if a user enters a non UTF-8 character, the database throws an error. This further leads to loss of data while refreshing or going back to the input page as data wasn't saved in database, effectively leading to loss of entire review if there's even a single non UTF-8 character. We need to solve the problem by removing such unsupported characters.

2. The existing expertiza stores the HTML formatting tags (Like <b> for bold) as a string. However, while rendering the string these tags are not escaped, resulting in no formatting. We need to solve the issue and display proper formatting.

Files Refactored

The following files were modified for this project namely:
1. application_controller
2. self_review_popup
3. Created a new migration - VersionTableSupportUTF8

Solution Approach

One of the solutions proposed was filtering out the non-UTF8 characters before saving the input in the database. Since the non-UTF8 input can come from any view, we implemented a filter_non_UTF8 method in application controller to do just that and adhere to DRY principle.

However, while experimenting with the fix, we found out that not all tables support UTF8 formatting. For E.g., the versions table which has the latin charset.

mysql> show create table versions;

| Table    | Create Table |   
| versions | CREATE TABLE `versions` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `item_type` varchar(255) NOT NULL,
 `item_id` int(11) NOT NULL,
 `event` varchar(255) NOT NULL,
 `whodunnit` varchar(255) DEFAULT NULL,
 `object` mediumtext,
 `created_at` datetime DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=latin1 |
1 row in set (0.01 sec)

We changed the charset with the command:

mysql> ALTER TABLE versions CONVERT TO CHARACTER SET utf8;

Output of show create table now is

mysql> show create table versions;
| Table    | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                         
| versions | CREATE TABLE `versions` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `item_type` varchar(255) NOT NULL,
 `item_id` int(11) NOT NULL,
 `event` varchar(255) NOT NULL,
 `whodunnit` varchar(255) DEFAULT NULL,
 `object` mediumtext,
 `created_at` datetime DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=utf8 |
1 row in set (0.01 sec)

This solved the problem. Therefore, we created the migration VersionTableSupportUTF8 to change version's characterset.

rails g migration VersionTableSupportUTF8
def change
 execute "ALTER TABLE versions CONVERT TO CHARACTER SET utf8"
end 

If we want to fix this in all tables, we can do following for each database via script or add migration for each table.

SELECT CONCAT('ALTER TABLE ', TABLE_NAME, ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;') FROM information_schema.TABLES WHERE 
TABLE_SCHEMA = 'expertiza_development';

Testing Details

We have manually tested the fix.

RSpec

We found no existing test cases for the above two problem statements. So we are to define test cases which generate the list of unwanted Chinese non-UTF8 characters and also HTML tags in the views and then refactor our code so that it automatically removes them without any failing of functionalities.

References

  1. Link to the videos to see the steps followed for testing and resolving the problems: [1]