CSC/ECE 517 Fall 2018- Project E1846. OSS Project Navy: Character Issues: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
Line 16: Line 16:
=== Solution Approach ===
=== Solution Approach ===


The following changes were made in the app/controllers/application_controller.rb file:-
One of the solutions proposed was filtering out the non-UTF8 characters before saving the input in the database. Since the non-UTF8 input can come from any view, we implemented a ''filter_non_UTF8'' method in application controller to do just that and adhere to DRY principle.


1.We defined the following before_action: filter_ascii
However, while experimenting with the fix, we found out that not all tables support UTF8 formatting. For E.g., the versions table which has the latin charset.  
2. The following methods were defined by us<br>
  i. filter_ascii<br>
  This method makes a recursive call to the remove_ascii method.  


  ii. remove_ascii<br>
mysql> show create table versions;
   This method is a recursive hashing function that checks the input params and then removes the Chinese non-UTF-8 comma or any such characters and then saves this value.  
| Table   | Create Table | 
 
  | versions | CREATE TABLE `versions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `item_type` varchar(255) NOT NULL,
  `item_id` int(11) NOT NULL,
  `event` varchar(255) NOT NULL,
  `whodunnit` varchar(255) DEFAULT NULL,
  `object` mediumtext,
  `created_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
  ) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=latin1 |


  iii. self.verify<br>
1 row in set (0.01 sec)
  This function is just as its name suggests to verify the valid input.  


3. The above changes were also made to make the code follow ruby style and guidelines.
We changed the charset with the command: <br>
ALTER TABLE versions CONVERT TO CHARACTER SET utf8;


The following changes were made in the apps/views/popup/self_review_popup.html.erb file:- 
Which solved the problem. Thus, we created the migration VersionTableSupportUTF8 to change version's characterset.


1.Adding the functionality of ‘sanitize’ to specific html popups such as answer, comments and additional_comment which strips them of the html tags and thus ensures in the solution of our 2nd problem.
If we want to fix this in all tables, we can do following for each database via script or add migration for each table.<br>
SELECT CONCAT('ALTER TABLE ', TABLE_NAME, ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;') FROM information_schema.TABLES WHERE
TABLE_SCHEMA = 'expertiza_development';


== Testing Details==
== Testing Details==

Revision as of 00:04, 4 November 2018

E1846. OSS Project Navy: Character Issues Fall 2018, CSC/ECE 517.

Problem Statement

1. In the existing Expertiza setup, the database supports only UTF-8 characters. Hence, if a user enters a non UTF-8 character, the database throws an error. This further leads to loss of data while refreshing or going back to the input page as data wasn't saved in database, effectively leading to loss of entire review if there's even a single non UTF-8 character. We need to solve the problem by removing such unsupported characters.

2. The existing expertiza stores the HTML formatting tags (Like <b> for bold) as a string. However, while rendering the string these tags are not escaped, resulting in no formatting. We need to solve the issue and display proper formatting.

Files Refactored

The following files were modified for this project namely:
1. application_controller
2. self_review_popup
3. Created a new migration - VersionTableSupportUTF8

Solution Approach

One of the solutions proposed was filtering out the non-UTF8 characters before saving the input in the database. Since the non-UTF8 input can come from any view, we implemented a filter_non_UTF8 method in application controller to do just that and adhere to DRY principle.

However, while experimenting with the fix, we found out that not all tables support UTF8 formatting. For E.g., the versions table which has the latin charset.

mysql> show create table versions;

| Table    | Create Table |   
 
| versions | CREATE TABLE `versions` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `item_type` varchar(255) NOT NULL,
 `item_id` int(11) NOT NULL,
 `event` varchar(255) NOT NULL,
 `whodunnit` varchar(255) DEFAULT NULL,
 `object` mediumtext,
 `created_at` datetime DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=latin1 |
1 row in set (0.01 sec)

We changed the charset with the command:

ALTER TABLE versions CONVERT TO CHARACTER SET utf8;

Which solved the problem. Thus, we created the migration VersionTableSupportUTF8 to change version's characterset.

If we want to fix this in all tables, we can do following for each database via script or add migration for each table.

SELECT CONCAT('ALTER TABLE ', TABLE_NAME, ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;') FROM information_schema.TABLES WHERE 
TABLE_SCHEMA = 'expertiza_development';

Testing Details

RSpec

We found no existing test cases for the above two problem statements. So we are to define test cases which generate the list of unwanted Chinese non-UTF8 characters and also HTML tags in the views and then refactor our code so that it automatically removes them without any failing of functionalities.

References

  1. Link to the videos to see the steps followed for testing and resolving the problems: [1]