CSC/ECE 517 Fall 2018 - Project E1846. OSS Project Navy: Character Issues

From Expertiza_Wiki
Revision as of 03:21, 8 November 2018

E1846. OSS Project Navy: Character Issues Fall 2018, CSC/ECE 517.

Problem Statement

As part of this project, we were tasked with fixing the following two issues.

1. Issue 1 (https://github.com/expertiza/expertiza/issues/927): In the existing Expertiza setup, the database supports only UTF-8 characters. If a user enters a non-UTF-8 character, the database throws an error and the input is not saved. Refreshing or going back to the input page then loses the data, so a single non-UTF-8 character can cost the user an entire review. We need to solve this problem by removing such unsupported characters.

2. Issue 2 (https://github.com/expertiza/expertiza/issues/962): Expertiza stores HTML formatting tags (such as <b> for bold) as part of the text string. When the string is rendered, however, these tags are escaped and displayed as literal text instead of being applied, so no formatting appears. We need to solve this issue and display the intended formatting.

Both are important usability issues and need to be fixed.

Solution Approach

Files Created or Refactored

The following files were created or modified for this project:
1. Refactored application_controller
2. Refactored self_review_popup
3. Created a new migration, VersionTableSupportUTF8
4. Created the RSpec file application_controller_spec.rb

Solution

One proposed solution was to filter out non-UTF-8 characters before saving the input to the database. Because non-UTF-8 input can come from any view, we implemented a filter_non_UTF8 method in the application controller so that all controllers share it, in keeping with the DRY principle. The alternative, removing the problematic characters inside each individual function, would repeat the same code in many places and violate DRY.
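The body of filter_non_UTF8 is not shown here, so the following is only a minimal sketch of what such a shared filter could look like, assuming "non-UTF-8 characters" means byte sequences that are not valid UTF-8. The module name NonUtf8Filter and the method filter_non_utf8 are hypothetical. The sketch relies on Ruby's built-in String#scrub, which replaces invalid byte sequences (here with an empty string, i.e. removes them).

```ruby
# Hypothetical sketch; not the actual filter_non_UTF8 from Expertiza.
module NonUtf8Filter
  # Returns a copy of +value+ with invalid UTF-8 byte sequences removed.
  # Nested hashes and arrays (e.g. form parameters) are filtered recursively.
  def self.filter_non_utf8(value)
    case value
    when String
      # Treat the raw bytes as UTF-8, then drop every invalid sequence.
      value.dup.force_encoding(Encoding::UTF_8).scrub('')
    when Hash
      value.each_with_object({}) { |(key, val), out| out[key] = filter_non_utf8(val) }
    when Array
      value.map { |item| filter_non_utf8(item) }
    else
      value # numbers, nil, etc. pass through unchanged
    end
  end
end

comment = "Great work\xFF!"                  # the byte 0xFF never occurs in valid UTF-8
puts NonUtf8Filter.filter_non_utf8(comment)  # prints: Great work!
puts NonUtf8Filter.filter_non_utf8('café ✓') # valid UTF-8 text is kept as-is
```

Keeping one such helper in the application controller and calling it on incoming input before saving is what makes the DRY argument work. Note that Rails' params object is not necessarily a plain Hash, so a real version would have to handle that type as well.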

While experimenting with the fix, however, we found that not all tables use a UTF-8 character set. For example, the versions table uses the latin1 charset:

mysql> show create table versions;

| Table    | Create Table |   
| versions | CREATE TABLE `versions` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `item_type` varchar(255) NOT NULL,
 `item_id` int(11) NOT NULL,
 `event` varchar(255) NOT NULL,
 `whodunnit` varchar(255) DEFAULT NULL,
 `object` mediumtext,
 `created_at` datetime DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=latin1 |
1 row in set (0.01 sec)
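Why a latin1 table rejects some input can be illustrated in plain Ruby. This sketch approximates MySQL's latin1 with Ruby's ISO-8859-1 encoding (MySQL's latin1 is actually closer to Windows-1252, so edge cases differ), and the method name latin1_representable? is hypothetical.

```ruby
# Returns true if every character of +text+ exists in ISO-8859-1 (Latin-1),
# used here as an approximation of what a latin1 MySQL column can store.
def latin1_representable?(text)
  text.encode(Encoding::ISO_8859_1)
  true
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  false
end

puts latin1_representable?('café')   # prints: true  (é is in Latin-1)
puts latin1_representable?('✓ done') # prints: false (U+2713 is not)
```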

We changed the charset with the command:

mysql> ALTER TABLE versions CONVERT TO CHARACTER SET utf8;

The output of SHOW CREATE TABLE is now:

mysql> show create table versions;
| Table    | Create Table |
| versions | CREATE TABLE `versions` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `item_type` varchar(255) NOT NULL,
 `item_id` int(11) NOT NULL,
 `event` varchar(255) NOT NULL,
 `whodunnit` varchar(255) DEFAULT NULL,
 `object` mediumtext,
 `created_at` datetime DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=utf8 |
1 row in set (0.01 sec)

This fixed the error. To apply the same change in every environment, we created the migration VersionTableSupportUTF8, which converts the versions table's character set.

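rails g migration VersionTableSupportUTF8
# up/down instead of change: raw SQL run via execute in change cannot be rolled back
def up
  execute "ALTER TABLE versions CONVERT TO CHARACTER SET utf8"
end
def down
  execute "ALTER TABLE versions CONVERT TO CHARACTER SET latin1" # original charset
end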

To fix this for all tables, we can either add a migration for each table or, for each database, run a script such as the following query, which generates an ALTER TABLE statement for every table in the schema:

SELECT CONCAT('ALTER TABLE ', TABLE_NAME, ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'expertiza_development';
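The SQL query only generates the ALTER TABLE statements; they still have to be executed. As a hypothetical Ruby equivalent, the helper below builds the same statements from a list of table names. In a Rails console, ActiveRecord::Base.connection.tables returns the table names and ActiveRecord::Base.connection.execute runs a SQL string, but the helper itself is plain Ruby, and its name and the example table names are assumptions.

```ruby
# Hypothetical helper: one ALTER TABLE ... CONVERT TO statement per table.
# Statements are built without a trailing semicolon so each can be passed
# directly to a single-statement execute call.
def utf8_conversion_statements(table_names, charset: 'utf8', collation: 'utf8_unicode_ci')
  table_names.map do |name|
    "ALTER TABLE `#{name}` CONVERT TO CHARACTER SET #{charset} COLLATE #{collation}"
  end
end

# Example table names only; in Rails they could come from
# ActiveRecord::Base.connection.tables.
utf8_conversion_statements(%w[versions other_table]).each { |sql| puts sql }
```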


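The HTML formatting issue is caused by a security feature of Rails: when a view renders a string, Rails escapes any HTML it contains unless the string is marked as safe, so the stored tags appear as literal text. To resolve this, we used the sanitize helper, which strips every tag and attribute that is not on its allowlist and returns the result as safe HTML. Standard formatting tags such as <b> are therefore rendered, while disallowed markup such as <script> is removed. We applied sanitize on the pages that display this text.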
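To illustrate why the tags show up as text, the sketch below uses Ruby's standard ERB::Util.html_escape, which performs the same kind of escaping Rails applies to unsafe strings in views. The sanitize fix itself needs Rails, so it appears only as a comment, and the variable @review_comment is a hypothetical name.

```ruby
require 'erb'

# A review comment as stored in the database, including a formatting tag.
stored_comment = 'Nice work on <b>section 2</b>.'

# Rails HTML-escapes strings that are not marked html_safe before putting
# them in a page; ERB::Util.html_escape performs the same kind of escaping.
escaped = ERB::Util.html_escape(stored_comment)
puts escaped # prints: Nice work on &lt;b&gt;section 2&lt;/b&gt;.

# The browser shows that literally as "<b>section 2</b>" instead of bold text.
# Hypothetical view change (needs Rails, so it is only a comment here):
#   <%= sanitize @review_comment %>
# sanitize keeps allowlisted tags such as <b>, removes the rest, and returns
# an html_safe string, so Rails no longer escapes it.
```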

Test Plan

The HTML tag issue was tested by checking that the affected pages render the stored formatting correctly.
For the character issue, we implemented RSpec tests that check both that a given non-UTF-8 character is removed and that a valid UTF-8 character is preserved.

Both RSpec tests pass.
