CSC/ECE 517 Fall 2024 - G2402 Refactor Graphql API endpoint for repositories
GitHub Miner
GitHub Miner is a versatile tool designed to interact with GitHub's APIs, allowing users to query and retrieve valuable insights and detailed information about GitHub profiles, repositories, and activities. It supports a range of functions to facilitate data extraction, such as:
- Tracking a user’s contributions over time to understand their activity levels and areas of focus.
- Compiling comments made by a user, which can shed light on their interactions, discussions, and contributions to collaborative work.
- Listing repositories owned by the user, providing a clear view of their projects, technical interests, and areas of expertise.
Beyond these core functions, GitHub Miner includes several other methods aimed at delivering in-depth analytics from GitHub data, making it useful for developers, recruiters, and researchers who need to analyze GitHub activity. Comprehensive documentation on the available methods and their usage can be found on the GitHub repository[1].
Project Description
This project implements a Flask-based GraphQL API that provides access to GitHub user contribution metrics. It serves as a bridge between client applications and the GitHub GraphQL API, offering a streamlined interface for querying user activity data.
- GraphQL Integration: Utilizes the flask-graphql library to seamlessly integrate GraphQL functionality into a Flask application
- GitHub API Interaction: Communicates with the GitHub GraphQL API to fetch user contribution data, including repository information and activity metrics
- Customizable Queries: Supports flexible queries allowing clients to request specific user data and metrics
- Authentication: Implements secure authentication using GitHub personal access tokens
- Scalable Architecture: Designed with modularity in mind, allowing for easy expansion of query types and addition of new features
Problem Statement
Phase 1
Get Familiar with the Code:
- Start by exploring the codebase to get a clear understanding of its structure and purpose. The project uses a Flask-based GraphQL API to pull data from GitHub repositories and user contributions, and the README is a helpful guide to the modules, including authentication and the query classes.
- To see the code in action, run the demo.py file—it’ll show you what kind of responses to expect and how the different parts work together.
- Refactor the queries to improve modularity and efficiency. Use the constants module to handle field names and query nodes—this will help make the code more maintainable and reduce duplication.
- Organize the queries based on data type (e.g., repository commits, contributors), which will make the code cleaner and easier to reuse across different GitHub API endpoints.
Phase 2
Create an intuitive frontend interface using React with TypeScript to display information from both GitHub and GitLab repositories. This user-friendly GUI will allow users to view repository data easily. A Flask backend will handle the data processing, providing APIs that retrieve and send the information to the frontend.
Key Objectives
API Endpoint Development:
- Implement a /graphql endpoint using Flask and flask-graphql and then configure the endpoint to handle POST requests containing GraphQL queries.
- Implement resolvers for each query type
- Develop a middleware to authenticate requests using GitHub personal access tokens
- Implement comprehensive error handling for API requests and responses
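The authentication and error-handling objectives above can be sketched without committing to a particular Flask layout. The following is a minimal illustration using only the Python standard library. The helper names (`extract_token`, `build_github_request`, `AuthError`) are hypothetical, and the public `https://api.github.com/graphql` endpoint is an assumption, since a GitHub Enterprise instance would use a different URL.

```python
import json
import urllib.request

# Assumption: public GitHub GraphQL endpoint; a GitHub Enterprise host differs.
GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"


class AuthError(Exception):
    """Raised when a request carries no usable personal access token."""


def extract_token(headers):
    """Return the token from an 'Authorization: Bearer <token>' header."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise AuthError("Missing or malformed Authorization header")
    return token.strip()


def build_github_request(token, query, variables=None):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In a Flask view, the incoming `request.headers` mapping could be passed to `extract_token`, and an `AuthError` could be translated into a 401 response. Sending the request (for example with `urllib.request.urlopen`) is left out so that the sketch has no network dependency.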
Testing and Validation:
- Conduct thorough testing (unit tests and integration tests) to verify the correctness and reliability of the implemented endpoints. Validate API functionality across different scenarios to ensure robustness and reliability.
- Develop integration tests to verify the interaction between Flask, GraphQL, and the GitHub API.
- Test the complete request-response cycle for each implemented query.
Project Components
This project focuses on retrieving and refining GitHub-specific data for a user’s repository information. The objective is to efficiently fetch a detailed array of repository-related data, such as:
- Repository contributors
- Contributions from each contributor
- Commit history and details for contributors
The goal is to present a comprehensive view of a user’s activities within GitHub, using the GitHub GraphQL API for optimized data extraction.
Frontend GUI (React + TypeScript):
- Design a responsive and user-friendly interface to display GitHub repository information
- Implement interactive features such as searching, sorting, and filtering so that users can navigate and manage the displayed data
- Focus on a clean and visually appealing design that enhances the user experience
Backend API (Flask):
- Develop Flask APIs that send GraphQL queries to GitHub and retrieve essential repository and contribution information.
- Set up endpoints that correspond to specific queries (e.g., contributors, commit details) and ensure a well-organized data flow.
- Integrate authentication measures using GitHub personal access tokens to secure access to sensitive information.
Workflow:
- The React frontend communicates with the Flask backend via structured APIs.
- Flask acts as the middle layer, executing GraphQL queries against GitHub to obtain relevant data on repositories and user contributions.
- The retrieved information is processed and delivered to the frontend, where it is organized and displayed for easy user interaction.
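The processing step in the workflow above could, for example, aggregate commit nodes into per-contributor rows before they are sent to the React frontend. This is a minimal sketch. The function name `commits_per_contributor`, the row keys, and the input shape (a list of commit nodes with `author.name` and `author.user.login`, as in GitHub's public schema) are assumptions rather than the project's actual code.

```python
from collections import Counter


def commits_per_contributor(commit_nodes):
    """Count commits per contributor and return rows sorted by commit count."""
    counts = Counter()
    for commit in commit_nodes:
        # 'author' or 'user' may be missing or null in the response.
        author = commit.get("author") or {}
        user = author.get("user") or {}
        label = user.get("login") or author.get("name") or "unknown"
        counts[label] += 1
    return [
        {"contributor": name, "commits": total}
        for name, total in counts.most_common()
    ]
```

Rows in this shape can be serialized to JSON directly and map naturally onto a sortable table in the frontend.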
Design Pattern and Best Practices
The design patterns and best practices we plan to use in this project are:
- DRY Principle: We will build the user interface from reusable React components. Composing larger components out of small utility components is one of the major advantages of React.
- Observer Pattern: React hooks such as 'useEffect' let us observe changes in the interface state and respond with suitable interactions.
- HOC Pattern: We will use higher-order components in React to create layouts for all our pages and to provide routing between the pages of the app.
- Mediator Pattern: We will use the mediator pattern to convert data coming from the backend into a format suited for display in React.
- Provider Pattern: We will use the provider pattern to make data available to components further down the tree without explicitly passing props through every level.
Components
Components are the basic building blocks in React. This section specifies the components that we plan to create for this project.
- Repository Statistics Page: This will be the main page, acting as a home page for all repository-related statistics. These statistics will be based on the commit history of the repository.
- Contributor Commit Section: This section will contain all the commit-related data of a specific author in a specific repository.
- Contributor List Section: This section will contain the list of all the authors contributing to a particular repository. We will also show author-related data in this section. We might need more components to show all the author details.
- Contribution Section: This section will show all the contribution-related information for a particular contributor to a particular repository.
Implementation
Refactoring for Improved Code Quality
We have made significant improvements to our codebase to enhance readability, maintainability, and consistency.
Constant Usage for Field and Node Names
We have introduced a new constants module to replace string literals in our query structure and data extraction processes. This change brings several benefits:
- Improved code consistency
- Reduced risk of typos
- Easier maintenance and updates
Streamlined Query Arguments
We have optimized our query structure by parameterizing the arguments. This enhancement allows for more flexible and reusable queries.
- Enhanced Query Flexibility: Queries are more adaptable, allowing parameter changes without altering the underlying structure.
- Improved Code Maintainability: A centralized query structure minimizes code duplication.
- Easier Caching and Reuse: Because the query structure stays consistent and only the argument values vary, results are easier to cache and reuse.
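One way to illustrate the parameterization described above is with GraphQL variables, where the query text stays fixed and only the variable values change per call. This is a sketch under assumptions: the query follows GitHub's public schema, the name `build_payload` and the default page size are hypothetical, and GitHub Miner itself may substitute arguments differently.

```python
# Assumption: a commit-history query shaped like GitHub's public GraphQL schema.
COMMITS_QUERY = """
query($owner: String!, $name: String!, $pg_size: Int!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: $pg_size) {
            nodes { authoredDate author { name user { login } } }
          }
        }
      }
    }
  }
}
"""


def build_payload(owner, repo_name, pg_size=50):
    """Pair the fixed query text with per-call variable values."""
    # GitHub limits the 'first' argument of a connection to 1..100.
    if not 1 <= pg_size <= 100:
        raise ValueError("pg_size must be between 1 and 100")
    return {
        "query": COMMITS_QUERY,
        "variables": {"owner": owner, "name": repo_name, "pg_size": pg_size},
    }
```

Two calls for different repositories share an identical `query` string, which is what makes the structure reusable. Only the `variables` mapping differs.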
Front end
The interface for displaying GitHub repository information should be responsive and visually appealing, focusing on a minimalist and user-friendly design. At the top, a placeholder input field prompts users to enter the owner and repository name, keeping the layout clean and straightforward. Once the repository is loaded, the page displays information in an organized and interactive manner, with sortable columns (e.g., stars, forks, and issues) and filter options for attributes like programming language or activity status. Repository details are presented in expandable cards or rows, allowing users to view additional information without cluttering the interface.



/repositories
- repository_commits


The refactored code improves readability, maintainability, and scalability by replacing hardcoded strings with descriptive constants (e.g., NODE_REPOSITORY instead of "repository") and encapsulating query arguments as reusable variables (ARG_OWNER, ARG_NAME). This abstraction ensures consistency and minimizes the risk of errors when updating fields or arguments. If the GraphQL schema changes, such as renaming "repository", only the constant NODE_REPOSITORY needs to be updated in the refactored code, ensuring minimal disruption across the codebase.
* Introduced a new constants module and replaced all string literals in the query structure and data extraction with these constants, which cover node names, field names, and argument names.
* Modified the `__init__` method to accept `owner`, `repo_name`, and `pg_size` as parameters, which makes class initialization more explicit.
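The changes to repository_commits can be sketched as follows. This is an illustration, not the project's actual code: the constants `NODE_REPOSITORY`, `NODE_AUTHOR`, `FIELD_NAME`, `ARG_OWNER`, and `ARG_NAME` are named on this page, while the remaining constants, the `node` helper, the default `pg_size`, and the body of the class are hypothetical stand-ins.

```python
import json

# Names that appear in the write-up.
NODE_REPOSITORY = "repository"
NODE_AUTHOR = "author"
FIELD_NAME = "name"
ARG_OWNER = "owner"
ARG_NAME = "name"
# Hypothetical additions, needed only to make this sketch complete.
NODE_DEFAULT_BRANCH_REF = "defaultBranchRef"
NODE_TARGET = "target"
NODE_ON_COMMIT = "... on Commit"
NODE_HISTORY = "history"
NODE_NODES = "nodes"
ARG_FIRST = "first"


def node(name, *children, args=None):
    """Render one GraphQL node with optional arguments and child selections."""
    arg_text = ""
    if args:
        pairs = ", ".join(f"{key}: {json.dumps(value)}" for key, value in args.items())
        arg_text = f"({pairs})"
    return f"{name}{arg_text} {{ {' '.join(children)} }}"


class RepositoryCommits:
    """Stand-in for the refactored query class with explicit parameters."""

    def __init__(self, owner, repo_name, pg_size=10):
        self.owner = owner
        self.repo_name = repo_name
        self.pg_size = pg_size

    def query(self):
        """Assemble the query string from constants instead of literals."""
        history = node(
            NODE_HISTORY,
            node(NODE_NODES, node(NODE_AUTHOR, FIELD_NAME)),
            args={ARG_FIRST: self.pg_size},
        )
        branch = node(
            NODE_DEFAULT_BRANCH_REF,
            node(NODE_TARGET, node(NODE_ON_COMMIT, history)),
        )
        repository = node(
            NODE_REPOSITORY,
            branch,
            args={ARG_OWNER: self.owner, ARG_NAME: self.repo_name},
        )
        return node("query", repository)
```

For example, `RepositoryCommits("octocat", "Hello-World", pg_size=5).query()` yields a single-line query in which every node, field, and argument name comes from a constant, so a schema rename would touch only the constant definitions.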
- repository_contributors_contribution


The refactored code improves modularity, readability, and maintainability by replacing hardcoded strings with constants, which enhances clarity and reduces the likelihood of errors. It also introduces structured comments that clarify the purpose of each query component. For example, the refactored code uses NODE_AUTHOR and FIELD_NAME to represent the author and their name, making the logic easier to follow than in the original code, where "author" and "name" are hardcoded directly. This abstraction makes updates and debugging more straightforward, because a change to a query field requires an edit only in the constants module rather than throughout the code.
* Removed the `RepositoryContributorsContribution` class and its associated methods. The module now focuses on `RepositoryContributors` and `extract_unique_author`.
* Dictionary keys are now accessed through the new constants, which makes the code more consistent.
- repository_contributors


The refactored code introduces constants for field and node names, replacing hardcoded strings to enhance maintainability and readability. This abstraction reduces potential errors, centralizes definitions, and promotes consistency across the codebase. For example, constants such as NODE_REPOSITORY, FIELD_NAME, and FIELD_LOGIN replace hardcoded strings like "repository", "name", and "login". Additionally, the refactored version uses get() for safer dictionary key access, so a missing key yields None instead of raising a KeyError at runtime.
* Introduced a new constants module and replaced all string literals in the query structure and data extraction with these constants.
* Modified `extract_unique_author` to use the safer `get()` method when accessing dictionary keys.
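The safer key access described above might look roughly like the sketch below. It is not the project's implementation: the input shape (a list of commit nodes with `author.name` and `author.user.login`, as in GitHub's public schema), the `NODE_USER` constant, and the returned dictionary layout are assumptions made for illustration.

```python
NODE_AUTHOR = "author"
NODE_USER = "user"  # hypothetical constant, not named in the write-up
FIELD_NAME = "name"
FIELD_LOGIN = "login"


def extract_unique_author(commit_nodes):
    """Collect unique (name, login) pairs, skipping commits without author data."""
    seen = set()
    authors = []
    for commit in commit_nodes:
        # get() returns None for a missing key; 'or {}' also covers JSON null.
        author = commit.get(NODE_AUTHOR) or {}
        user = author.get(NODE_USER) or {}
        name = author.get(FIELD_NAME)
        login = user.get(FIELD_LOGIN)
        if name is None and login is None:
            continue
        key = (name, login)
        if key not in seen:
            seen.add(key)
            authors.append({FIELD_NAME: name, FIELD_LOGIN: login})
    return authors
```

With direct indexing such as `commit["author"]["user"]["login"]`, a commit whose author is not linked to a GitHub account would raise an exception. Here it simply produces an entry with a `None` login.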
/Constants
Field Constants

Node Constants

Argument Constants

/tests
- Testing with pytest
Test Discovery: pytest automatically discovers tests in the specified files (those starting with test_ or ending with _test.py) and runs them.
- Fixtures:
Fixtures are defined using the @pytest.fixture decorator. They provide reusable setup code for tests. In this case, mock_raw_data_single_author and mock_raw_data_multiple_authors provide simulated responses from the GitHub API for single and multiple authors.
- Assertions:
Each test function uses assert statements to check if the actual outputs match the expected results, helping verify the correctness of the code.
- Execution:
Run the tests with the command `python -m pytest -v [file name]` in the terminal, and it will show which tests passed or failed.
Testing RepositoryContributors
- test_repository_commits
- test_repository_contributors_contribution

Query Structure Test:
The test_repository_contributors_query_structure checks if the generated GraphQL query string from the RepositoryContributors class matches the expected query template by stripping whitespace and comparing the two strings.
Unique Author Extraction:
The test_single_author function verifies that the extract_unique_author method correctly extracts the name and login of a single author from the provided mock data. The test_multiple_authors function checks that multiple authors are accurately extracted, ensuring that their names and logins are included in the results.
This structured approach ensures that both the query generation and author extraction functionalities of the RepositoryContributors class are tested effectively.
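The whitespace-insensitive comparison used by the query structure test can be sketched as below. The `normalize` helper, the expected query text, and the `generated_query` stand-in are hypothetical. In the real test the generated string would come from the RepositoryContributors class.

```python
EXPECTED_QUERY = """
query {
  repository(owner: "octocat", name: "Hello-World") {
    name
  }
}
"""


def normalize(query):
    """Remove all whitespace so formatting differences do not affect equality."""
    return "".join(query.split())


def generated_query(owner, repo_name):
    """Stand-in for the string the query class would produce."""
    return f'query {{ repository(owner: "{owner}", name: "{repo_name}") {{ name }} }}'


def test_query_structure():
    """Plain assert-based test; pytest discovers it through the test_ prefix."""
    actual = generated_query("octocat", "Hello-World")
    assert normalize(actual) == normalize(EXPECTED_QUERY)
```

Note that this normalization also removes whitespace inside string literals, which is acceptable for a structural comparison but would hide differences in argument values that contain spaces.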
- test_repository_contributor

Workflow Diagram
OSS project phase 1:

OSS project phase 2 after the changes were made:

Team
Mentor:
Jialin Cui
Team Members:
Manan Manojkumar Tiwari
Harsh Shelar
Russel Andrew Lobo
GitHub Repository
original: https://github.ncsu.edu/jcui9/GitHub_Miner
forked: https://github.ncsu.edu/hshelar/GitHub_Miner/tree/refactor_repository