CSC/ECE 517 Fall 2018- Project E1846. OSS Project Navy: Character Issues: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
No edit summary
Line 39: Line 39:


We changed the charset with the command: <br>
We changed the charset with the command: <br>
  ALTER TABLE versions CONVERT TO CHARACTER SET utf8;
   
mysql> ALTER TABLE versions CONVERT TO CHARACTER SET utf8;


Which solved the problem. Thus, we created the migration VersionTableSupportUTF8 to change version's characterset.
mysql> show create table versions;
| Table    | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                       
|
| versions | CREATE TABLE `versions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `item_type` varchar(255) NOT NULL,
  `item_id` int(11) NOT NULL,
  `event` varchar(255) NOT NULL,
  `whodunnit` varchar(255) DEFAULT NULL,
  `object` mediumtext,
  `created_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=utf8 |
 
1 row in set (0.01 sec)
 
This solved the problem. Therefore, we created the migration VersionTableSupportUTF8 to change version's characterset.


If we want to fix this in all tables, we can do following for each database via script or add migration for each table.<br>
If we want to fix this in all tables, we can do following for each database via script or add migration for each table.<br>

Revision as of 00:06, 5 November 2018

E1846. OSS Project Navy: Character Issues Fall 2018, CSC/ECE 517.

Problem Statement

1. In the existing Expertiza setup, the database supports only UTF-8 characters. Hence, if a user enters a non UTF-8 character, the database throws an error. This further leads to loss of data while refreshing or going back to the input page as data wasn't saved in database, effectively leading to loss of entire review if there's even a single non UTF-8 character. We need to solve the problem by removing such unsupported characters.

2. The existing expertiza stores the HTML formatting tags (Like <b> for bold) as a string. However, while rendering the string these tags are not escaped, resulting in no formatting. We need to solve the issue and display proper formatting.

Files Refactored

The following files were modified for this project namely:
1. application_controller
2. self_review_popup
3. Created a new migration - VersionTableSupportUTF8

Solution Approach

One of the solutions proposed was filtering out the non-UTF8 characters before saving the input in the database. Since the non-UTF8 input can come from any view, we implemented a filter_non_UTF8 method in application controller to do just that and adhere to DRY principle.

However, while experimenting with the fix, we found out that not all tables support UTF8 formatting. For E.g., the versions table which has the latin charset.

mysql> show create table versions;

| Table    | Create Table |   
 
| versions | CREATE TABLE `versions` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `item_type` varchar(255) NOT NULL,
 `item_id` int(11) NOT NULL,
 `event` varchar(255) NOT NULL,
 `whodunnit` varchar(255) DEFAULT NULL,
 `object` mediumtext,
 `created_at` datetime DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=latin1 |
1 row in set (0.01 sec)

We changed the charset with the command:

mysql> ALTER TABLE versions CONVERT TO CHARACTER SET utf8;
mysql> show create table versions;
| Table    | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                         
|
| versions | CREATE TABLE `versions` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `item_type` varchar(255) NOT NULL,
 `item_id` int(11) NOT NULL,
 `event` varchar(255) NOT NULL,
 `whodunnit` varchar(255) DEFAULT NULL,
 `object` mediumtext,
 `created_at` datetime DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `index_versions_on_item_type_and_item_id` (`item_type`,`item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=142423 DEFAULT CHARSET=utf8 |

1 row in set (0.01 sec)

This solved the problem. Therefore, we created the migration VersionTableSupportUTF8 to change version's characterset.

If we want to fix this in all tables, we can do following for each database via script or add migration for each table.

SELECT CONCAT('ALTER TABLE ', TABLE_NAME, ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;') FROM information_schema.TABLES WHERE 
TABLE_SCHEMA = 'expertiza_development';

Testing Details

RSpec

We found no existing test cases for the above two problem statements. So we are to define test cases which generate the list of unwanted Chinese non-UTF8 characters and also HTML tags in the views and then refactor our code so that it automatically removes them without any failing of functionalities.

References

  1. Link to the videos to see the steps followed for testing and resolving the problems: [1]