CSC/ECE 517 Spring 2024 - G2402 Implement REST client, REST API, and Graphql API endpoint for repositories - Part 2: Difference between revisions

From Expertiza_Wiki
(20 intermediate revisions by 4 users not shown)
Line 4: Line 4:
<br>
<br>
Subsequently, in phase two of the project, each API endpoint established in the initial phase will be complemented with a React-based frontend. This frontend will be specifically designed to visualize and present the GitHub data retrieved through these endpoints in an intuitive user interface. The goal is to seamlessly integrate backend functionality with frontend design to enhance user interaction and data presentation.
==Demo and Source code links==
[https://www.youtube.com/watch?v=5g_WtZqLE9A YouTube Link]
[https://github.ncsu.edu/slimbur/GH_Miner Source Code]


==Below is the MVC architecture of our GitHub project:==
Line 14: Line 20:
The plan includes the following key steps:


1. <b>Setup React Environment</b>:  
1. <b>Set up React Environment</b>:  
<ul>
<ul>
<li>Selecting Tooling and Libraries: Choose an appropriate React setup such as Create React App for rapid development or Next.js for server-side rendering and enhanced performance.</li>
Line 56: Line 62:
7. <b>Deploy and Document</b>: Finally, we will deploy the React frontend to a hosting platform, ensuring it is accessible and easily usable by end-users. Comprehensive documentation will be provided, including instructions for setup, usage, and any relevant information for future maintenance and development.
<br>
<br>
==Frontend==
As shown below, we plan to display contributors' individual contribution and commit count. We will be implementing the UI using ReactJS
We implemented the frontend with ReactJS and Vite to complement the endpoints developed in Part 1. The interface is designed to be user-friendly, and its primary goal is to present the data retrieved from the endpoints clearly and intuitively.

To achieve this, the frontend parses the JSON responses received from the endpoints, extracts the relevant fields, and presents them using a combination of Cards and Tables.

Cards showcase key information so that users get quick insights at a glance. Tables display more detailed information in a structured format, which improves readability and makes comparisons easier.
 
The landing page of GitHub Miner is the entry point to the application. A login button is positioned at the center of the screen; clicking it directs users to the authentication page, where they can log in with their GitHub credentials.
 
<br>
[[File:LoginPage.png|900px|Image : 900 pixels]]
<br>
<br>
[[File:GitHub contributors 1.png|900px|Image : 900 pixels]]
 
On clicking the login button, the user will get redirected to the GitHub OAuth Login Page as shown below:
<br>
<br>
Commits page with number of commits
[[File:OAuthLogin.png|900px|Image : 900 pixels]]
<br>
<br>
[[File:Linux_commits.png|900px|Image : 900 pixels]]


Upon login, the user will be redirected to the Home Page. The Navbar at the top allows the user to try out various functionalities.
<br>
[[File:HomePage.png|900px|Image : 900 pixels]]
<br>
Upon clicking the "Profile" tab, we retrieve the user's information using an endpoint developed in the previous part. The JSON response is parsed, and details such as the user's name, username, email address, and profile picture are displayed in a Card as shown below:
<br>
[[File:OODD4.png|900px|Image : 900 pixels]]
<br>
The Repo Contributors tab in the GitHub Miner app allows users to explore the contributors of a selected repository. The user enters the name of the repository and the username of its owner; the app then fetches the unique repository contributors, parses the JSON response, and presents each contributor's information as a card. If the input is incorrect, the error message "No data found." is displayed.
[[File:OODD5.png|900px|Image : 900 pixels]]
<br>
The Repo Commits tab lists all commits in a repository along with details for each commit: the timestamp, the additions and deletions, the commit message, and the number of files changed. The JSON data obtained from the backend endpoint is parsed and shown in tabular format. The table can also be downloaded as a PDF.
[[File:OODD6.png|900px|Image : 900 pixels]]


==GraphQL Endpoints==
Line 287: Line 323:
To ensure reliability and robustness, comprehensive testing strategies, including unit tests and integration tests, have been implemented. These tests validate the functionality of the API endpoints across different scenarios, ensuring the system's correctness and stability.


== Implementation Details ==
=== Implementation ===
In the API, we have 2 separate endpoints to retrieve the same data: one for REST and the other using GraphQL queries. To manage this, we have made use of Flask Blueprints where all the "api/graphql" requests get routed to the graphql variants and the "api/rest" requests get routed to the REST variants.


Line 297: Line 330:
== Testing ==


To ensure the reliability and robustness of the GitHub Miner, comprehensive testing strategies, including unit tests and integration tests, have been implemented. These tests validate the functionality of the API endpoints across different scenarios, ensuring the system's correctness and stability. The testing suite covers the following areas:
<p>To ensure the reliability and robustness of the GitHub Miner, comprehensive testing strategies, including unit tests and integration tests, have been implemented. These tests validate the functionality of the API endpoints across different scenarios, ensuring the system's correctness and stability. The testing suite covers the following areas:</p>
 
<b>1. Unit Tests:</b>
 
<p>Unit tests have been meticulously developed to evaluate the individual components of the GitHub Miner in isolation. Each test is designed to ensure that every component functions correctly by itself without dependencies, thereby identifying issues in the smallest units of code. This section delves into both the GitHub GraphQL integration and the specific queries used for data retrieval, ensuring comprehensive coverage.</p>
 
<b>2. GitHub GraphQL Tests:</b>
<ul>
  <li><strong>test_authentication.py</strong>: Tests the robustness of the authentication system by verifying token handling and refresh mechanisms, critical for maintaining secure access to GitHub data.</li>
  <li><strong>test_client.py</strong>: Assesses the initialization and operational integrity of the GraphQL client, including its error handling capabilities and the accuracy of its response parsing, ensuring the client can reliably communicate with the GitHub API.</li>
  <li><strong>test_query.py</strong>: Evaluates the formation and execution of GraphQL queries and their error handling, confirming that queries are constructed correctly and yield expected results or manage errors gracefully.</li>
</ul>
 
<b>3. Queries Tests:</b>
<ul>
  <li><strong>Comments</strong>: Confirms the precise retrieval and parsing of comments from GitHub repositories, ensuring data integrity and accuracy for user interactions displayed on the platform.</li>
  <li><strong>Contributions</strong>: Verifies the accuracy and completeness of data regarding user contributions such as commits and pull requests, which are pivotal for analytics and tracking user engagement.</li>
  <li><strong>Costs</strong>: Tests the system's ability to calculate API usage costs efficiently, which is essential for managing rate limits and optimizing resource consumption.</li>
  <li><strong>Profiles</strong>: Ensures thorough and accurate fetching and parsing of GitHub user profiles, critical for displaying detailed user data and analytics.</li>
  <li><strong>Repositories</strong>: Validates the accurate retrieval of comprehensive repository information, essential for repository management and data display.</li>
  <li><strong>Time Range Contributions</strong>: Assesses the accuracy of fetching and aggregating contributions within specified time ranges, vital for historical data analysis and trend assessment.</li>
</ul>
 
<b>4. Integration Tests:</b>
 
<p>Integration tests are designed to verify the effective cooperation between various components of the GitHub Miner. These tests ensure that components work seamlessly together, which is essential for the smooth operation of the system. This testing phase is crucial for assessing the interaction between the frontend and backend components, ensuring data flows correctly through the application and that user actions trigger the appropriate responses in real-time.</p>
 
<ul>
  <li><strong>TestClient</strong>: Tests the GitHub client's ability to initialize correctly, generate appropriate headers, handle retries, execute queries efficiently, and manage paginated responses, ensuring robustness and reliability in accessing GitHub data.</li>
  <li><strong>Authentication Tests</strong>: Validates the generation of correct authorization headers and proper handling of personal access tokens, crucial for maintaining secure and authorized access to the GitHub API.</li>
</ul>
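The retry handling that TestClient exercises can be pictured with a minimal sketch. This is not the project's `_retry_request`: the names `retry_call`, `attempts`, `delay_seconds`, and `make_flaky` are hypothetical, and the built-in `TimeoutError` stands in for `requests.exceptions.Timeout` so that the example needs only the standard library.

```python
import time


def retry_call(func, attempts, delay_seconds=0.0):
    """Call func() up to `attempts` times, re-raising the last TimeoutError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TimeoutError:
            if attempt == attempts:
                raise  # attempts exhausted: surface the timeout to the caller
            time.sleep(delay_seconds)  # brief pause before the next try


def make_flaky(failures):
    """Hypothetical stand-in for a request that times out `failures` times, then succeeds."""
    state = {"calls": 0}

    def request():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TimeoutError("simulated timeout")
        return {"data": "success"}

    return request
```

With one simulated timeout and two attempts the call succeeds on the second try; with two timeouts and two attempts the `TimeoutError` propagates, which mirrors the two retry scenarios covered by the test suite.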
 


=== Unit Tests ===

Unit tests have been developed to test individual components in isolation, ensuring that each part functions correctly on its own. This includes testing the initialization, argument formatting, field formatting, string representation, and equality of query nodes, as well as the correct behavior of the query builders and authenticators. Examples of unit tests include:

- **TestQueryNode**: Validates the initialization, argument formatting, and field formatting of query nodes.
- **TestQuery**: Ensures correct query initialization, argument substitution, and time formatting.
- **TestQueryNodePaginator**: Tests the functionality of the paginator, including initialization, updating, and resetting.
- **TestPaginatedQuery**: Validates the initialization and execution of paginated queries.

=== Integration Tests ===

Integration tests verify the interaction between different components of the system, ensuring that they work together as expected. This includes testing the behavior of the client when making actual requests to the GitHub API, handling authentication, executing queries, and processing paginated responses. Examples of integration tests include:

- **TestClient**: Tests the GitHub client's initialization, header generation, retry logic, query execution, and handling of paginated queries.
- **Authentication Tests**: Validate the generation of correct authorization headers and the handling of personal access tokens.

These tests collectively ensure that the GitHub Miner operates reliably, providing accurate and timely data from GitHub's APIs. By covering a wide range of scenarios, from successful queries to error handling and rate limiting, the tests ensure that the application can be used confidently in production environments.

==Live Demo & Source Code==
[http://152.7.177.239:5000/auth/login Link]
(Please login using your personal GitHub accounts, not NCSU accounts)

[https://github.ncsu.edu/slimbur/GH_Miner Source Code]

<b>5. Testing Examples:</b>

<p>Here, we provide specific examples of test cases implemented in our project. You can see how we approach testing for both unit and integration levels, focusing on key functionalities and integration points.</p>

<pre><code>
import pytest
import requests_mock
from unittest.mock import MagicMock
from datetime import datetime, timezone
from requests.exceptions import Timeout
from backend.app.services.github_query.github_graphql.client import Client, InvalidAuthenticationError, QueryFailedException
from backend.app.services.github_query.github_graphql.authentication import PersonalAccessTokenAuthenticator
from backend.app.services.github_query.github_graphql.query import Query, PaginatedQuery
from backend.app.services.github_query.github_rest.client import RESTClient


@pytest.fixture
def valid_token():
    return "valid_token_123"


@pytest.fixture
def authenticator(valid_token):
    return PersonalAccessTokenAuthenticator(token=valid_token)


@pytest.fixture
def github_client(authenticator):
    return Client(authenticator=authenticator)


class TestClient:
    def test_client_without_authenticator(self):
        """Test that client raises error when no authenticator is provided"""
        with pytest.raises(InvalidAuthenticationError):
            Client()  # No authenticator provided

    def test_client_initialization(self, github_client):
        """Test that the client is correctly initialized with the given authenticator"""
        assert github_client._authenticator is not None, "Authenticator should be set."

    def test_client_base_path(self, github_client):
        """Test that the base path is correctly constructed"""
        assert "api.github.com" in github_client._base_path(), "Base path should include the host."

    def test_generate_headers(self, github_client, authenticator):
        """Test that headers are correctly generated including authorization and additional headers."""
        additional_headers = {"Custom-Header": "CustomValue"}
        expected_headers = authenticator.get_authorization_header()
        expected_headers.update(additional_headers)

        assert github_client._generate_headers(**additional_headers) == expected_headers, "Headers should include both authenticator and additional headers."

    def test_retry_success(self, github_client, requests_mock):
        """Test that retry_request succeeds after a retry."""
        # Mock the request to timeout once then succeed
        requests_mock.register_uri('POST', github_client._base_path(), [
            {'exc': Timeout},
            {'json': {'data': 'success'}, 'status_code': 200}
        ])
       
        response = github_client._retry_request(2, 1, "query { viewer { login }}", {})
        assert response.json() == {'data': 'success'}, "Should succeed on the second attempt."
 
    def test_retry_timeout(self, github_client, requests_mock):
        """Test that retry_request gives up after attempts are exhausted."""
        # Mock the request to timeout
        requests_mock.register_uri('POST', github_client._base_path(), [
            {'exc': Timeout},
            {'exc': Timeout}
        ])
       
        with pytest.raises(Timeout):
            github_client._retry_request(2, 1, "query { viewer { login }}", {})
 
    def test_execute_success(self, github_client, requests_mock):
        """Test successful execution of a query."""
        # Mock the rate limit pre-check and the actual query execution
        requests_mock.post(github_client._base_path(), [
            {'json': {"data": {"rateLimit": {"cost": 1, "remaining": 5000, "resetAt": "2021-01-01T00:00:00Z"}}}, 'status_code': 200},
            {'json': {"data": "query success"}, 'status_code': 200}
        ])
        response = github_client._execute("query { viewer { login }}", {})
        assert response == "query success", "Execute should return success on valid response."
 
    def test_execute_rate_limit_exceeded(self, github_client, requests_mock):
        """Test execution of a query leading to waiting for rate limit reset."""
        # Set specific values for cost, remaining, and resetAt
        mock_cost = 10
        mock_remaining = 14  # Ensure remaining - 5 < mock_cost
        # mock_reset_at = datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')
        mock_reset_at = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
 
        # Mock rate limit response
        mock_rate_limit_response = {
            "data": {
                "rateLimit": {
                    "cost": mock_cost,
                    "remaining": mock_remaining,
                    "resetAt": mock_reset_at
                }
            }
        }
 
        # Setup requests_mock to simulate the rate limit response and any subsequent requests
        base_path = github_client._base_path()
        requests_mock.post(base_path, [
            {'json': mock_rate_limit_response, 'status_code': 200},
            {'json': {"data": "query success"}, 'status_code': 200}
        ])
        response = github_client._execute("query { viewer { login }}", {})
        assert response == "query success", "Execute should return success on valid response."
 
    def test_execute_query_failed(self, github_client, requests_mock):
        """Test execution of a query leading to QueryFailedException with retries."""
        # Mock a failed query response for each retry attempt
        requests_mock.post(github_client._base_path(), [
            {'json': {"data": {"rateLimit": {"cost": 1, "remaining": 5000, "resetAt": "2021-01-01T00:00:00Z"}}}, 'status_code': 200},
            {"json": {"error": "bad request"}, "status_code": 400}
        ])
        # Expecting QueryFailedException after all retries have been exhausted
        with pytest.raises(QueryFailedException):
            github_client._execute("query { viewer { login }}", {})
   
    def test_execution_generator(self, github_client):
        """Test that _execution_generator correctly handles paginated responses."""
        # Setup a mock paginated query
        query = MagicMock()
        query.paginator.has_next.side_effect = [True, True, False]  # Simulate 2 pages of results, then stop
        query.path = []  # Example path, adjust based on your actual usage
 
        # Mock the _execute method to return simulated page results
        github_client._execute = MagicMock()
        github_client._execute.side_effect = [
            {"pageInfo": {"endCursor": "cursor1", "hasNextPage": True}, "nodes": [{"edges": "data1"}]},
            {"pageInfo": {"endCursor": "cursor2", "hasNextPage": False}, "nodes": [{"edges": "data2"}]}
        ]
 
        # Mock the update_paginator method to reflect the changing state of pagination
        query.paginator.update_paginator = MagicMock()
 
        # Collect all results from the generator
        results = list(github_client._execution_generator(query, {}))
 
        # Assertions
        assert len(results) == 2, "Should yield two results for the two pages"
        assert results[0]['nodes'][0]['edges'] == "data1", "First result should match first mocked response"
        assert results[1]['nodes'][0]['edges'] == "data2", "Second result should match second mocked response"
 
        # Ensure update_paginator was called correctly
        assert query.paginator.update_paginator.call_count == 2, "update_paginator should be called twice, once per page"
        query.paginator.update_paginator.assert_called_with(False, "cursor2")  # Last call should reflect the end of pagination
 
    def test_client_execute_success(self, github_client, requests_mock):
        """Test successful execution of a query"""
        requests_mock.post(github_client._base_path(), [
            {'json': {"data": {"rateLimit": {"cost": 1, "remaining": 5000, "resetAt": "2021-01-01T00:00:00Z"}}}, 'status_code': 200},
            {'json': {"data": "query success"}, 'status_code': 200}
        ])
        response = github_client.execute(Query("query { viewer { login }}"), {})
        assert response == "query success", "Execute should return success on valid response."
 
    def test_client_execute_failed(self, github_client, requests_mock):
        """Test that a failed query raises QueryFailedException"""
        requests_mock.post(github_client._base_path(), [
            {'json': {"data": {"rateLimit": {"cost": 1, "remaining": 5000, "resetAt": "2021-01-01T00:00:00Z"}}}, 'status_code': 200},
            {"json": {"error": "bad request"}, "status_code": 400}
        ])
        with pytest.raises(QueryFailedException) as excinfo:
            github_client.execute(Query("query { viewer { login }}"), {})
        assert "Query failed with code" in str(excinfo.value), "QueryFailedException should contain the right error message."
</code></pre>


<p>These tests collectively ensure that the GitHub Miner operates reliably, providing accurate and timely data from GitHub's APIs. By covering a wide range of scenarios, from successful queries to error handling and rate limiting, the tests ensure that the application can be used confidently in production environments.</p>
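The rate-limit scenario in the tests sets `remaining - 5 < cost` to force a wait until `resetAt`. A minimal sketch of that decision follows, assuming the reserve of 5 points implied by the test comment; the function name `seconds_to_wait` and its parameters are hypothetical, and the real client may handle this differently.

```python
from datetime import datetime, timezone


def seconds_to_wait(cost, remaining, reset_at, now, reserve=5):
    """Return how long to pause before running a query of the given cost.

    reset_at uses the timestamp format from the mocked rateLimit responses,
    e.g. "2021-01-01T00:00:00Z". The reserve of 5 is an assumption taken
    from the comment in test_execute_rate_limit_exceeded.
    """
    if remaining - reserve >= cost:
        return 0.0  # enough budget left, run immediately
    reset = datetime.strptime(reset_at, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, (reset - now).total_seconds())
```

Passing `now` explicitly keeps the function deterministic, so it can be tested without mocking the clock.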


== Team Members ==
== Team Members ==

Latest revision as of 00:37, 3 May 2024

Project Objective

This project aims to create a robust tool enabling users to access and analyze their GitHub metrics conveniently. The phase one objective was to develop API endpoints leveraging GitHub GraphQL and GitHub REST queries within a Python Flask framework. This involved integrating Flask with existing code to expose GraphQL queries as API endpoints, building a REST client, and establishing REST endpoints for streamlined data retrieval.
Subsequently, in phase two of the project, each API endpoint established in the initial phase will be complemented with a React-based frontend. This frontend will be specifically designed to visualize and present the GitHub data retrieved through these endpoints in an intuitive user interface. The goal is to seamlessly integrate backend functionality with frontend design to enhance user interaction and data presentation.

Demo and Source code links

YouTube Link: https://www.youtube.com/watch?v=5g_WtZqLE9A

Source Code: https://github.ncsu.edu/slimbur/GH_Miner


Below is the MVC architecture of our GitHub project:


[Image: MVC architecture diagram]

Plan of Work

In the second phase of the project, we will focus on developing a robust and user-friendly frontend using React. The frontend will serve as an intuitive interface for displaying and interacting with the data retrieved from the GitHub API endpoints implemented in the first phase.

The plan includes the following key steps:

1. Set up React Environment:

  • Selecting Tooling and Libraries: Choose an appropriate React setup such as Create React App for rapid development or Next.js for server-side rendering and enhanced performance.
  • Dependency Management: Use npm or Yarn to manage project dependencies, including React, ReactDOM, and additional libraries for state management (Redux, Recoil), routing (React Router), and styling (styled-components, Tailwind CSS).
  • Project Structure: Organize the project structure with components, containers, services, and utility folders to maintain a scalable and maintainable codebase.

2. Design UI Components:

  • Translate the data from the API endpoints into interactive React components, focusing on usability and accessibility standards.
  • Component Composition: Create reusable and composable UI components following component-based design principles, leveraging hooks (useState, useEffect) for managing state and side effects.
  • Styling: Implement consistent styling using CSS-in-JS solutions like styled-components or CSS modules, ensuring modular and scoped styles for each component.

3. Integrate with API Endpoints:

  • HTTP Requests: Use Axios or Fetch API to handle asynchronous data fetching from the backend API endpoints implemented in phase one.
  • State Management: Implement state management (e.g., Redux, Context API) to store and manage fetched data centrally across components, ensuring efficient data flow and reactivity.
  • Optimizing Performance: Implement memoization techniques (e.g., useMemo, useCallback) to optimize re-renders and reduce unnecessary API calls.

4. Implement Pagination:

  • Pagination Logic: Develop pagination logic to manage and display large datasets in manageable chunks, adhering to RESTful API standards for page-based navigation.
  • UI Components: Create pagination controls (e.g., buttons, page indicators) to allow users to navigate through paginated data smoothly, updating the UI dynamically based on user interactions.
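The page-based logic described in this step can be sketched as follows. The frontend itself is written in React; this illustration uses Python, the language of the backend, purely to show the arithmetic. The function name `paginate` and the returned keys are hypothetical.

```python
import math


def paginate(items, page, page_size):
    """Return one page of items plus the metadata pagination controls need."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    total_pages = max(1, math.ceil(len(items) / page_size))
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    start = (page - 1) * page_size
    return {
        "items": items[start:start + page_size],
        "page": page,
        "total_pages": total_pages,
        "has_previous": page > 1,
        "has_next": page < total_pages,
    }
```

The `has_previous` and `has_next` flags are what previous/next buttons would typically use to decide whether they are enabled.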

5. Enhance User Experience:

  • We will focus on improving the user experience by adding features such as search functionality, filtering, sorting, and other relevant enhancements based on the data being displayed. These enhancements will empower users to interact with the data more effectively and derive valuable insights.
  • Interactive Visualizations: Integrate interactive charts (e.g., using Chart.js, D3.js) to visually represent data trends and insights, enhancing user engagement and understanding.

6. Test and Debug:

  • Unit Testing: Write unit tests using Jest and React Testing Library to ensure individual components render correctly and exhibit expected behavior based on props and state.
  • Integration Testing: Conduct integration tests to validate API integrations, data flow, and interactions between frontend components.
  • Cross-Browser Compatibility: Perform cross-browser testing (e.g., Chrome, Firefox, Safari) and responsive testing to ensure consistent rendering and functionality across different browsers and devices.

7. Deploy and Document: Finally, we will deploy the React frontend to a hosting platform, ensuring it is accessible and easily usable by end-users. Comprehensive documentation will be provided, including instructions for setup, usage, and any relevant information for future maintenance and development.

Frontend

We implemented the frontend with ReactJS and Vite to complement the endpoints developed in Part 1. The interface is designed to be user-friendly, and its primary goal is to present the data retrieved from the endpoints clearly and intuitively.

To achieve this, the frontend parses the JSON responses received from the endpoints, extracts the relevant fields, and presents them using a combination of Cards and Tables.

Cards showcase key information so that users get quick insights at a glance. Tables display more detailed information in a structured format, which improves readability and makes comparisons easier.

The landing page of GitHub Miner is the entry point to the application. A login button is positioned at the center of the screen; clicking it directs users to the authentication page, where they can log in with their GitHub credentials.


[Image: LoginPage.png]

On clicking the login button, the user is redirected to the GitHub OAuth login page as shown below:

[Image: OAuthLogin.png]

Upon login, the user will be redirected to the Home Page. The Navbar at the top allows the user to try out various functionalities.


[Image: HomePage.png]

Upon clicking the "Profile" tab, we retrieve the user's information using an endpoint developed in the previous part. The JSON response is parsed, and details such as the user's name, username, email address, and profile picture are displayed in a Card as shown below:


[Image: OODD4.png]
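As a rough illustration of the parsing step behind the Profile card, the sketch below picks the displayed fields out of a user payload shaped like the REST sample output in this document (`name`, `login`, `email`, `avatar_url`). The function name `profile_card` and the "Not public" fallback text are hypothetical; the actual frontend does this in React.

```python
def profile_card(user):
    """Pick the fields shown on the Profile card from a REST user payload."""
    return {
        # Fall back to the login when the account has no display name.
        "name": user.get("name") or user.get("login", ""),
        "username": user.get("login", ""),
        # GitHub returns null for a hidden email; the fallback text is an assumption.
        "email": user.get("email") or "Not public",
        "avatar_url": user.get("avatar_url", ""),
    }
```

Handling the `null` email explicitly matters because the sample profile in this document has `"email": null`.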

The Repo Contributors tab in the GitHub Miner app allows users to explore the contributors of a selected repository. The user enters the name of the repository and the username of its owner; the app then fetches the unique repository contributors, parses the JSON response, and presents each contributor's information as a card. If the input is incorrect, the error message "No data found." is displayed.

[Image: OODD5.png]

The Repo Commits tab lists all commits in a repository along with details for each commit: the timestamp, the additions and deletions, the commit message, and the number of files changed. The JSON data obtained from the backend endpoint is parsed and shown in tabular format. The table can also be downloaded as a PDF.

[Image: OODD6.png]

GraphQL Endpoints

Get current user login:

/api/graphql/current-user-login

Sample Output:

{
  "viewer": {
    "login": "<your-username>"
  }
}


Get specific user login:

/api/graphql/user-login/<username>

Get list of all commits in a repo:

/api/graphql/specific-user-commits/<owner>/<repo_name>

Sample Output:

{
  "repository": {
    "defaultBranchRef": {
      "target": {
        "history": {
          "nodes": [
            {
              "additions": 0,
              "author": {
                "email": "61797592+Atharva7007@users.noreply.github.com",
                "name": "Atharva Pansare",
                "user": {
                  "login": "Atharva7007"
                }
              },
              "authoredDate": "2020-04-03T09:30:17Z",
              "changedFilesIfAvailable": 1,
              "deletions": 0,
              "message": "Add files via upload",
              "parents": {
                "totalCount": 1
              }
            }
          ],
          "pageInfo": {
            "endCursor": "98ba34a6c62ff6fe7c4d4de5c342a194f72d66e4 0",
            "hasNextPage": true
          },
          "totalCount": 6
        }
      }
    }
  }
}
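To show how a response of this shape can feed the commits table, here is a minimal sketch that flattens the nested `history.nodes` list into one row per commit. The function name `commit_rows` and the row keys are hypothetical; the field names read from the response are the ones in the sample output.

```python
def commit_rows(response):
    """Turn a specific-user-commits response into flat rows for a table."""
    history = response["repository"]["defaultBranchRef"]["target"]["history"]
    rows = []
    for node in history["nodes"]:
        author = node.get("author") or {}
        user = author.get("user") or {}
        rows.append({
            # Prefer the GitHub login; fall back to the git author name.
            "author": user.get("login") or author.get("name", ""),
            "date": node["authoredDate"],
            "message": node["message"],
            "additions": node["additions"],
            "deletions": node["deletions"],
            "files_changed": node["changedFilesIfAvailable"],
        })
    return rows
```

The fallback to the author name covers commits whose author is not linked to a GitHub account, in which case `user` is null in the response.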

Get details of all contributors in a repo:

/api/graphql/repository-contributors/<owner>/<repo_name>

Sample Output: {

   "repository": {
     "defaultBranchRef": {
       "target": {
         "history": {
           "nodes": [
             {
               "author": {
                 "email": "61797592+Atharva7007@users.noreply.github.com",
                 "name": "Atharva Pansare",
                 "user": {
                   "login": "Atharva7007"
                 }
               }
             }
           ],
           "pageInfo": {
             "endCursor": "98ba34a6c62ff6fe7c4d4de5c342a194f72d66e4 0",
             "hasNextPage": true
           },
           "totalCount": 6
         }
       }
     }
   }
 }
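
Because this response contains one node per commit, the same author can appear many times. One possible way to reduce the nodes to unique contributors is sketched below; the function name `unique_contributors` and the choice to key on the login (falling back to the email address) are assumptions made for illustration.

```python
def unique_contributors(nodes: list) -> list:
    """Return one entry per contributor, keeping the first occurrence of each."""
    seen = {}
    for node in nodes:
        author = node.get("author") or {}
        user = author.get("user") or {}
        # Assumption: the login identifies a contributor; use the email if there is no login.
        key = user.get("login") or author.get("email")
        if key is not None and key not in seen:
            seen[key] = {
                "login": user.get("login"),
                "name": author.get("name"),
                "email": author.get("email"),
            }
    return list(seen.values())


# Two commits by the same author plus one by a made-up second author.
nodes = [
    {"author": {"name": "Atharva Pansare", "user": {"login": "Atharva7007"}}},
    {"author": {"name": "Atharva Pansare", "user": {"login": "Atharva7007"}}},
    {"author": {"name": "Example Dev", "email": "dev@example.com", "user": None}},
]
print([c["name"] for c in unique_contributors(nodes)])
```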

REST API Endpoints

Get current user login:

/api/rest/current-user-login-rest

This endpoint retrieves the login information of the currently authenticated user. It fetches the user's profile data from the GitHub API and returns details such as their username, avatar URL, and other relevant information.

Sample Output:

{
  "avatar_url": "https://avatars.githubusercontent.com/u/61797592?v=4",
  "bio": null,
  "blog": "",
  "company": null,
  "created_at": "2020-03-04T16:49:06Z",
  "email": null,
  "events_url": "https://api.github.com/users/Atharva7007/events{/privacy}",
  "followers": 1,
  "followers_url": "https://api.github.com/users/Atharva7007/followers",
  "following": 3,
  "following_url": "https://api.github.com/users/Atharva7007/following{/other_user}",
  "gists_url": "https://api.github.com/users/Atharva7007/gists{/gist_id}",
  "gravatar_id": "",
  "hireable": null,
  "html_url": "https://github.com/Atharva7007",
  "id": 61797592,
  "location": null,
  "login": "Atharva7007",
  "name": "Atharva Pansare",
  "node_id": "MDQ6VXNlcjYxNzk3NTky",
  "organizations_url": "https://api.github.com/users/Atharva7007/orgs",
  "public_gists": 0,
  "public_repos": 11,
  "received_events_url": "https://api.github.com/users/Atharva7007/received_events",
  "repos_url": "https://api.github.com/users/Atharva7007/repos",
  "site_admin": false,
  "starred_url": "https://api.github.com/users/Atharva7007/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/Atharva7007/subscriptions",
  "twitter_username": null,
  "type": "User",
  "updated_at": "2024-03-14T19:03:46Z",
  "url": "https://api.github.com/users/Atharva7007"
}


Get list of all commits in a repo:

/api/rest/specific-user-commits/<owner>/<repo_name>

This endpoint retrieves a list of all commits made to a specific repository. It takes the repository owner and repository name as parameters. The endpoint fetches the commit history from the GitHub API and returns details about each commit, such as the author, commit message, and changes made.

Sample Output:

[
  {
    "repository": {
      "defaultBranchRef": {
        "target": {
          "history": {
            "nodes": [
              {
                "additions": 0,
                "author": {
                  "email": "61797592+Atharva7007@users.noreply.github.com",
                  "name": "Atharva Pansare",
                  "user": {
                    "login": "Atharva7007"
                  }
                },
                "authoredDate": "2020-04-03T09:30:17Z",
                "changedFilesIfAvailable": 1,
                "deletions": 0,
                "message": "Add files via upload",
                "parents": {
                  "totalCount": 1
                }
              }
            ],
            "pageInfo": {
              "endCursor": "98ba34a6c62ff6fe7c4d4de5c342a194f72d66e4 0",
              "hasNextPage": true
            },
            "totalCount": 6
          }
        }
      }
    }
  }
]


Get details of all contributors in a repo:

/api/rest/repository-contributors/<owner>/<repo_name>

This endpoint fetches the details of all contributors to a specific repository. It takes the repository owner and repository name as parameters. The endpoint retrieves the list of contributors from the GitHub API and returns information about each contributor, such as their username, avatar URL, and permissions within the repository.

Sample Output:

[
 {
   "avatar_url": "https://avatars.githubusercontent.com/u/61797592?v=4",
   "events_url": "https://api.github.com/users/Atharva7007/events{/privacy}",
   "followers_url": "https://api.github.com/users/Atharva7007/followers",
   "following_url": "https://api.github.com/users/Atharva7007/following{/other_user}",
   "gists_url": "https://api.github.com/users/Atharva7007/gists{/gist_id}",
   "gravatar_id": "",
   "html_url": "https://github.com/Atharva7007",
   "id": 61797592,
   "login": "Atharva7007",
   "node_id": "MDQ6VXNlcjYxNzk3NTky",
   "organizations_url": "https://api.github.com/users/Atharva7007/orgs",
   "permissions": {
     "admin": true,
     "maintain": true,
     "pull": true,
     "push": true,
     "triage": true
   },
   "received_events_url": "https://api.github.com/users/Atharva7007/received_events",
   "repos_url": "https://api.github.com/users/Atharva7007/repos",
   "role_name": "admin",
   "site_admin": false,
   "starred_url": "https://api.github.com/users/Atharva7007/starred{/owner}{/repo}",
   "subscriptions_url": "https://api.github.com/users/Atharva7007/subscriptions",
   "type": "User",
   "url": "https://api.github.com/users/Atharva7007"
 }

]


Implementation Details

The GitHub Miner project consists of two main components: the backend API and the frontend user interface.

The backend API is implemented using Flask, a lightweight Python web framework. It exposes two sets of endpoints: one for GraphQL queries and another for REST API queries. Flask Blueprints are used to route requests to the appropriate endpoints based on the URL path.

The GraphQL implementation utilizes the `graphene` library, which provides a way to define GraphQL schemas and resolvers. The resolvers are responsible for fetching data from the GitHub API using the `pygithub` library.

The REST implementation uses the `requests` library to make HTTP requests to the GitHub API and process the responses.

The API exposes two separate endpoints for each piece of data: one backed by REST calls and one backed by GraphQL queries. Flask Blueprints keep the two sets apart: requests under "api/graphql" are routed to the GraphQL variants, and requests under "api/rest" are routed to the REST variants.
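
The routing split described above can be sketched with two Blueprints that differ only in their URL prefix. This is a minimal illustration, not the project's actual module layout: the blueprint names, the handler names, and the placeholder payloads are all assumptions, and a real handler would call GitHub instead of returning a fixed value.

```python
from flask import Blueprint, Flask, jsonify

# One blueprint per API style; the url_prefix decides which variant handles a request.
graphql_bp = Blueprint("graphql", __name__, url_prefix="/api/graphql")
rest_bp = Blueprint("rest", __name__, url_prefix="/api/rest")


@graphql_bp.route("/current-user-login")
def graphql_current_user_login():
    # Placeholder payload; a real handler would run a GraphQL query against GitHub.
    return jsonify({"viewer": {"login": "<your-username>"}})


@rest_bp.route("/current-user-login-rest")
def rest_current_user_login():
    # Placeholder payload; a real handler would call GitHub's REST API.
    return jsonify({"login": "<your-username>"})


def create_app() -> Flask:
    """Build the Flask app and attach both blueprints."""
    app = Flask(__name__)
    app.register_blueprint(graphql_bp)
    app.register_blueprint(rest_bp)
    return app


if __name__ == "__main__":
    create_app().run()
```

With this layout, adding a new endpoint means adding a route to one of the two blueprints; the prefix never has to be repeated in individual routes.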

Design Patterns used

  1. The REST API client (/backend/app/services/github_query/github_rest/client.py) is implemented as a Singleton, so a single client instance is shared across the application.
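
For readers unfamiliar with the pattern, here is one common way to write a Singleton in Python. It is a generic sketch rather than a copy of the project's client: the class name `RESTClientSketch` and its `token` attribute are hypothetical.

```python
class RESTClientSketch:
    """Hypothetical stand-in for a REST client that should exist only once."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the object on the first call and reuse it on every later call.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, token=None):
        # __init__ runs on every call, so only set state the first time.
        if not hasattr(self, "_initialized"):
            self.token = token
            self._initialized = True


first = RESTClientSketch(token="abc")
second = RESTClientSketch(token="xyz")
print(first is second)
```

Both names refer to the same object, and the token passed on the second call is ignored because the instance was already initialized.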

Testing

To ensure the reliability and robustness of the GitHub Miner, comprehensive testing strategies, including unit tests and integration tests, have been implemented. These tests validate the functionality of the API endpoints across different scenarios, ensuring the system's correctness and stability. The testing suite covers the following areas:

1. Unit Tests:

Unit tests evaluate the individual components of the GitHub Miner in isolation. Each test checks that a component works correctly on its own, without its dependencies, so that issues are caught in the smallest units of code. The unit tests cover both the GitHub GraphQL integration and the specific queries used for data retrieval.

2. GitHub GraphQL Tests:

  • test_authentication.py: Tests the robustness of the authentication system by verifying token handling and refresh mechanisms, critical for maintaining secure access to GitHub data.
  • test_client.py: Assesses the initialization and operational integrity of the GraphQL client, including its error handling capabilities and the accuracy of its response parsing, ensuring the client can reliably communicate with the GitHub API.
  • test_query.py: Evaluates the formation and execution of GraphQL queries and their error handling, confirming that queries are constructed correctly and yield expected results or manage errors gracefully.

3. Queries Tests:

  • Comments: Confirms the precise retrieval and parsing of comments from GitHub repositories, ensuring data integrity and accuracy for user interactions displayed on the platform.
  • Contributions: Verifies the accuracy and completeness of data regarding user contributions such as commits and pull requests, which are pivotal for analytics and tracking user engagement.
  • Costs: Tests the system's ability to calculate API usage costs efficiently, which is essential for managing rate limits and optimizing resource consumption.
  • Profiles: Ensures thorough and accurate fetching and parsing of GitHub user profiles, critical for displaying detailed user data and analytics.
  • Repositories: Validates the accurate retrieval of comprehensive repository information, essential for repository management and data display.
  • Time Range Contributions: Assesses the accuracy of fetching and aggregating contributions within specified time ranges, vital for historical data analysis and trend assessment.
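
To make the last item more concrete, the sketch below aggregates commit nodes that fall inside a time range. It assumes nodes shaped like the commit sample output earlier on this page; the function names and the half-open interval convention are our own assumptions, not the project's query code.

```python
from datetime import datetime, timezone


def parse_github_time(value: str) -> datetime:
    """Parse a timestamp such as '2020-04-03T09:30:17Z' into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def contributions_in_range(nodes, start, end):
    """Count commits and sum additions/deletions authored in [start, end)."""
    totals = {"commits": 0, "additions": 0, "deletions": 0}
    for node in nodes:
        authored = parse_github_time(node["authoredDate"])
        if start <= authored < end:
            totals["commits"] += 1
            totals["additions"] += node.get("additions", 0)
            totals["deletions"] += node.get("deletions", 0)
    return totals


# The first node mirrors the sample output; the other two are made up.
nodes = [
    {"authoredDate": "2020-04-03T09:30:17Z", "additions": 0, "deletions": 0},
    {"authoredDate": "2020-05-01T00:00:00Z", "additions": 10, "deletions": 2},
    {"authoredDate": "2021-01-01T00:00:00Z", "additions": 5, "deletions": 5},
]
start = datetime(2020, 4, 1, tzinfo=timezone.utc)
end = datetime(2020, 6, 1, tzinfo=timezone.utc)
print(contributions_in_range(nodes, start, end))
```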

4. Integration Tests:

Integration tests verify that the components of the GitHub Miner work together correctly. They check the interaction between the frontend and backend components, confirming that data flows correctly through the application and that user actions trigger the appropriate responses.

  • TestClient: Tests the GitHub client's ability to initialize correctly, generate appropriate headers, handle retries, execute queries efficiently, and manage paginated responses, ensuring robustness and reliability in accessing GitHub data.
  • Authentication Tests: Validates the generation of correct authorization headers and proper handling of personal access tokens, crucial for maintaining secure and authorized access to the GitHub API.
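
The header generation that these tests exercise can be sketched as follows. The class and function names are hypothetical, and the `Bearer` scheme is an assumption on our part; the project's authenticator may format the header differently.

```python
class TokenAuthenticatorSketch:
    """Hypothetical authenticator that wraps a personal access token."""

    def __init__(self, token: str):
        if not token:
            raise ValueError("A personal access token is required.")
        self._token = token

    def get_authorization_header(self) -> dict:
        # Assumption: the token is sent using the Bearer scheme.
        return {"Authorization": f"Bearer {self._token}"}


def generate_headers(authenticator, **additional_headers) -> dict:
    """Merge the authorization header with any extra headers supplied by the caller."""
    headers = authenticator.get_authorization_header()
    headers.update(additional_headers)
    return headers


auth = TokenAuthenticatorSketch("valid_token_123")
print(generate_headers(auth, **{"Custom-Header": "CustomValue"}))
```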


5. Testing Examples:

The following test cases from our project illustrate how we approach testing at both the unit and integration levels, focusing on key functionality and integration points.

<code>
import pytest
import requests_mock
from unittest.mock import MagicMock
from datetime import datetime, timezone
from requests.exceptions import Timeout
from backend.app.services.github_query.github_graphql.client import Client, InvalidAuthenticationError, QueryFailedException
from backend.app.services.github_query.github_graphql.authentication import PersonalAccessTokenAuthenticator 
from backend.app.services.github_query.github_graphql.query import Query, PaginatedQuery
from backend.app.services.github_query.github_rest.client import RESTClient


@pytest.fixture
def valid_token():
    return "valid_token_123"

@pytest.fixture
def authenticator(valid_token):
    return PersonalAccessTokenAuthenticator(token=valid_token)

@pytest.fixture
def github_client(authenticator):
    return Client(authenticator=authenticator)

class TestClient:
    def test_client_without_authenticator(self):
        """Test that client raises error when no authenticator is provided"""
        with pytest.raises(InvalidAuthenticationError):
            Client()  # No authenticator provided

    def test_client_initialization(self, github_client):
        """Test that the client is correctly initialized with the given authenticator"""
        assert github_client._authenticator is not None, "Authenticator should be set."

    def test_client_base_path(self, github_client):
        """Test that the base path is correctly constructed"""
        assert "api.github.com" in github_client._base_path(), "Base path should include the host."
    
    def test_generate_headers(self, github_client, authenticator):
        """Test that headers are correctly generated including authorization and additional headers."""
        additional_headers = {"Custom-Header": "CustomValue"}
        expected_headers = authenticator.get_authorization_header()
        expected_headers.update(additional_headers)

        assert github_client._generate_headers(**additional_headers) == expected_headers, "Headers should include both authenticator and additional headers."

    def test_retry_success(self, github_client, requests_mock):
        """Test that retry_request succeeds after a retry."""
        # Mock the request to timeout once then succeed
        requests_mock.register_uri('POST', github_client._base_path(), [
            {'exc': Timeout},
            {'json': {'data': 'success'}, 'status_code': 200}
        ])
        
        response = github_client._retry_request(2, 1, "query { viewer { login }}", {})
        assert response.json() == {'data': 'success'}, "Should succeed on the second attempt."

    def test_retry_timeout(self, github_client, requests_mock):
        """Test that retry_request gives up after attempts are exhausted."""
        # Mock the request to timeout
        requests_mock.register_uri('POST', github_client._base_path(), [
            {'exc': Timeout},
            {'exc': Timeout}
        ])
        
        with pytest.raises(Timeout):
            github_client._retry_request(2, 1, "query { viewer { login }}", {})

    def test_execute_success(self, github_client, requests_mock):
        """Test successful execution of a query."""
        # Mock the rate limit pre-check and the actual query execution
        requests_mock.post(github_client._base_path(), [
            {'json': {"data": {"rateLimit": {"cost": 1, "remaining": 5000, "resetAt": "2021-01-01T00:00:00Z"}}}, 'status_code': 200},
            {'json': {"data": "query success"}, 'status_code': 200}
        ])
        response = github_client._execute("query { viewer { login }}", {})
        assert response == "query success", "Execute should return success on valid response."

    def test_execute_rate_limit_exceeded(self, github_client, requests_mock):
        """Test execution of a query leading to waiting for rate limit reset."""
        # Set specific values for cost, remaining, and resetAt
        mock_cost = 10
        mock_remaining = 14  # Ensure remaining - 5 < mock_cost
        mock_reset_at = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

        # Mock rate limit response
        mock_rate_limit_response = {
            "data": {
                "rateLimit": {
                    "cost": mock_cost,
                    "remaining": mock_remaining,
                    "resetAt": mock_reset_at
                }
            }
        }

        # Setup requests_mock to simulate the rate limit response and any subsequent requests
        base_path = github_client._base_path()
        requests_mock.post(base_path, [
            {'json': mock_rate_limit_response, 'status_code': 200},
            {'json': {"data": "query success"}, 'status_code': 200}
        ])
        response = github_client._execute("query { viewer { login }}", {})
        assert response == "query success", "Execute should return success on valid response."

    def test_execute_query_failed(self, github_client, requests_mock):
        """Test execution of a query leading to QueryFailedException with retries."""
        # Mock a failed query response for each retry attempt
        requests_mock.post(github_client._base_path(), [
            {'json': {"data": {"rateLimit": {"cost": 1, "remaining": 5000, "resetAt": "2021-01-01T00:00:00Z"}}}, 'status_code': 200},
            {"json": {"error": "bad request"}, "status_code": 400}
        ])
        # Expecting QueryFailedException after all retries have been exhausted
        with pytest.raises(QueryFailedException):
            github_client._execute("query { viewer { login }}", {})
    
    def test_execution_generator(self, github_client):
        """Test that _execution_generator correctly handles paginated responses."""
        # Setup a mock paginated query
        query = MagicMock()
        query.paginator.has_next.side_effect = [True, True, False]  # Simulate 2 pages of results, then stop
        query.path = []  # Example path, adjust based on your actual usage

        # Mock the _execute method to return simulated page results
        github_client._execute = MagicMock()
        github_client._execute.side_effect = [
            {"pageInfo": {"endCursor": "cursor1", "hasNextPage": True}, "nodes": [{"edges": "data1"}]},
            {"pageInfo": {"endCursor": "cursor2", "hasNextPage": False}, "nodes": [{"edges": "data2"}]}
        ]

        # Mock the update_paginator method to reflect the changing state of pagination
        query.paginator.update_paginator = MagicMock()

        # Collect all results from the generator
        results = list(github_client._execution_generator(query, {}))

        # Assertions
        assert len(results) == 2, "Should yield two results for the two pages"
        assert results[0]['nodes'][0]['edges'] == "data1", "First result should match first mocked response"
        assert results[1]['nodes'][0]['edges'] == "data2", "Second result should match second mocked response"

        # Ensure update_paginator was called correctly
        assert query.paginator.update_paginator.call_count == 2, "update_paginator should be called twice, once per page"
        query.paginator.update_paginator.assert_called_with(False, "cursor2")  # Last call should reflect the end of pagination

    def test_client_execute_success(self, github_client, requests_mock):
        """Test successful execution of a query"""
        requests_mock.post(github_client._base_path(), [
            {'json': {"data": {"rateLimit": {"cost": 1, "remaining": 5000, "resetAt": "2021-01-01T00:00:00Z"}}}, 'status_code': 200},
            {'json': {"data": "query success"}, 'status_code': 200}
        ])
        response = github_client.execute(Query("query { viewer { login }}"), {})
        assert response == "query success", "Execute should return success on valid response."

    def test_client_execute_failed(self, github_client, requests_mock):
        """Test that a failed query raises QueryFailedException"""
        requests_mock.post(github_client._base_path(), [
            {'json': {"data": {"rateLimit": {"cost": 1, "remaining": 5000, "resetAt": "2021-01-01T00:00:00Z"}}}, 'status_code': 200},
            {"json": {"error": "bad request"}, "status_code": 400}
        ])
        with pytest.raises(QueryFailedException) as excinfo:
            github_client.execute(Query("query { viewer { login }}"), {})
        assert "Query failed with code" in str(excinfo.value), "QueryFailedException should contain the right error message."
</code>
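
The rate limit test above relies on the comment "remaining - 5 < mock_cost" to force a wait. A minimal sketch of such a check is shown below. It is inferred from that comment only: the function name `seconds_to_wait`, the `reserve` parameter, and the exact comparison are assumptions and may differ from the client's real logic.

```python
from datetime import datetime, timezone


def seconds_to_wait(cost, remaining, reset_at, now=None, reserve=5):
    """Return how long to pause before sending a query, or 0.0 if the budget suffices."""
    # Assumption: keep a small reserve of points, as hinted at by the test comment.
    if remaining - reserve >= cost:
        return 0.0
    reset = datetime.strptime(reset_at, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, (reset - current).total_seconds())


just_before_reset = datetime(2020, 12, 31, 23, 59, 30, tzinfo=timezone.utc)
print(seconds_to_wait(1, 5000, "2021-01-01T00:00:00Z"))
print(seconds_to_wait(10, 14, "2021-01-01T00:00:00Z", now=just_before_reset))
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to test without patching the clock.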

Together, these tests help verify that the GitHub Miner retrieves accurate data from GitHub's APIs. They cover a range of scenarios, from successful queries to error handling and rate limiting, which gives us confidence that the application behaves reliably.

Team Members

Atharva Pansare

Sumedh Limburkar

Viraj Sanap

Mengning Li

Mentor: Jialin Cui

References

GitHub REST API documentation - https://docs.github.com/en/rest?apiVersion=2022-11-28