CSC/ECE 517 Spring 2024 - G2402 Implement REST client, REST API, and Graphql API endpoint for repositories

# Students' Pre-class GitHub Contribution Research

This is a research project about students' pre-class GitHub contributions and their impact on students' in-class performance.

__TOC__
== GitHub Miner ==
This is a convenient tool for querying a user's GitHub metrics. The project develops API endpoints for GitHub GraphQL queries and GitHub REST queries using Python Flask. It integrates existing code with Flask to expose GraphQL queries as API endpoints, develops a REST client, and creates REST endpoints that query the same data sets. The project also includes thorough testing and documentation of the endpoints.

==Live Demo & Source Code==
[http://152.7.177.239:5000/auth/login Link]
(Please log in using your personal GitHub account, not your NCSU account.)

[https://github.ncsu.edu/slimbur/GH_Miner Source Code]

# Python Version

We provide a convenient tool to query a user's GitHub metrics.

==Installation==

**IN ORDER TO USE THIS TOOL, YOU NEED TO PROVIDE YOUR OWN .env FILE**, because we use the [dotenv](https://pypi.org/project/python-dotenv/) package to load environment variables.
**YOU ALSO NEED TO PROVIDE YOUR GITHUB PERSONAL ACCESS TOKEN (PAT) IN YOUR .env FILE,**
i.e. GITHUB_TOKEN = 'yourGitHubPAT'
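
As a minimal sketch of how the token might be read at startup: the helper name `read_github_token` is hypothetical and not part of the project, and the `try`/`except` around the import exists only so the sketch still runs where python-dotenv is not installed. `load_dotenv()` is the python-dotenv call that copies entries from a `.env` file into the process environment.

```python
import os

try:
    # python-dotenv: load_dotenv() reads a .env file into os.environ
    # (by default it does not override variables that are already set).
    from dotenv import load_dotenv
except ImportError:  # assumption: fall back to the plain environment
    load_dotenv = None


def read_github_token(var_name="GITHUB_TOKEN"):
    """Return the PAT from the environment, or fail with a clear message.

    Hypothetical helper for illustration; the project may load it differently.
    """
    if load_dotenv is not None:
        load_dotenv()
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return token
```

Failing early with an explicit error tends to be easier to debug than a rejected request later on.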


We recommend using a virtual environment. Steps to set one up:

```shell
cd path/to/your/project/directory
python -m venv venv
```

On macOS and Linux:

```shell
source venv/bin/activate
```

On Windows (Command Prompt):

```shell
.\venv\Scripts\activate
```

On Windows (PowerShell):

```shell
.\venv\Scripts\Activate.ps1
```

Next, install all the necessary libraries:

```shell
pip install -r requirements.txt
```
 
## Execution
 
TBD
 
### authentication — Basic authenticator class
 
Source code: [github_graphql/authentication.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/github_graphql/authentication.py)
 
This module provides the basic authentication mechanism. The user needs to provide a valid GitHub PAT with the correct scope to run queries.
A PersonalAccessTokenAuthenticator object is created with the PAT that the user provided. Its get_authorization_header method returns an
authentication header that is used when sending requests to the GitHub GraphQL server.
 
<span style="font-size: larger;">Authenticator Objects</span>
 
Parent class of PersonalAccessTokenAuthenticator. Serves as the base class for all authenticators.
 
<span style="font-size: larger;">PersonalAccessTokenAuthenticator Objects</span>
 
Handles personal access token authentication method for GitHub clients.
 
`class PersonalAccessTokenAuthenticator(token)`
 
- The `token` argument is required. This is the user's GitHub personal access token with the scope needed to execute the queries that the user requires.
 
Instance methods:
 
`get_authorization_header()`
 
- Returns the authentication header as a dictionary i.e. {"Authorization": "your_access_token"}.
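
The two classes described above can be sketched as follows. This is an illustrative re-implementation, not the code in `authentication.py`. The `Bearer ` prefix is an assumption based on the header format GitHub documents for its API; the actual module may build the value differently.

```python
class Authenticator:
    """Base class for authenticators (sketch; defines only the interface)."""

    def get_authorization_header(self):
        raise NotImplementedError


class PersonalAccessTokenAuthenticator(Authenticator):
    """Wraps a GitHub personal access token (PAT)."""

    def __init__(self, token):
        if not token:
            raise ValueError("a GitHub personal access token is required")
        self._token = token

    def get_authorization_header(self):
        # Assumption: the token is sent behind a "Bearer " prefix.
        return {"Authorization": f"Bearer {self._token}"}
```

Keeping the header construction behind a base class lets another authentication scheme be swapped in without touching the client.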
 
### query — Classes for building GraphQL queries
 
Source code: [github_graphql/query.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/github_graphql/query.py)
 
This module provides a framework for building GraphQL queries using Python classes. The code defines four classes: QueryNode, QueryNodePaginator, Query, and PaginatedQuery.
QueryNode represents a basic building block of a GraphQL query.
QueryNodePaginator is a specialized QueryNode for paginated requests.
Query represents a terminal query node that can be executed.
PaginatedQuery represents a terminal query node designed for paginated requests.
 
- You can find more information about GitHub GraphQL API here: [GitHub GraphQL API documentation](https://docs.github.com/en/graphql)
- You can use GitHub GraphQL Explorer to try out queries: [GitHub GraphQL API Explorer](https://docs.github.com/en/graphql/overview/explorer)
 
<span style="font-size: larger;">QueryNode Objects</span>
 
The QueryNode class provides a framework for constructing GraphQL queries using Python classes.
It allows for building complex queries with nested fields and supports pagination for paginated requests.
 
`class QueryNode(name, fields, args)`
 
- `name` is the name of the QueryNode
- `fields` is a List of fields in the QueryNode
- `args` is a Map of arguments in the QueryNode.
 
Private methods:
 
`_format_args()`
 
- \_format_args method takes the arguments of a QueryNode instance and formats them as a string representation in the form of key-value pairs. The formatting depends on the type of the argument value, with special handling for strings, lists, dictionaries, booleans, and the default case for other types. The method then returns the formatted arguments as a string enclosed within parentheses.
 
`_format_fields()`
 
- \_format_fields method takes the list of fields within a QueryNode instance and formats them as a single string representation.
 
Instance methods:
 
`get_connected_nodes()`
 
- get_connected_nodes method returns a list of connected QueryNode instances within a QueryNode instance. It iterates over the fields attribute of the QueryNode instance and checks if each field is an instance of QueryNode. The resulting list contains all the connected QueryNode instances found.
 
`__str__()`
 
- \_\_str\_\_ method defines how the QueryNode object should be represented as a string. It combines the object's name, formatted arguments, and formatted fields to construct the string representation in a specific format.
 
`__repr__()`
 
- Debug method.
 
`__eq__(other)`
 
- \_\_eq\_\_ method defines how two QueryNode objects are compared for equality.
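
To make the formatting rules concrete, here is a simplified sketch of a node class with the same method names. It is an assumption-laden illustration rather than the code in `query.py`: the default node name, the exact rendering of lists and dictionaries, and the string-based equality are choices made for this example only.

```python
class QueryNode:
    """Simplified sketch of a GraphQL query building block."""

    def __init__(self, name="query", fields=None, args=None):
        self.name = name
        self.fields = fields or []
        self.args = args or {}

    def _format_args(self):
        """Render args as "(key: value, ...)", or "" when there are none."""
        if not self.args:
            return ""
        parts = []
        for key, value in self.args.items():
            if isinstance(value, bool):
                rendered = str(value).lower()  # True -> true
            elif isinstance(value, str):
                rendered = value  # "$placeholders" and enums stay unquoted
            elif isinstance(value, list):
                rendered = "[" + ", ".join(str(v) for v in value) + "]"
            elif isinstance(value, dict):
                inner = ", ".join(f"{k}: {v}" for k, v in value.items())
                rendered = "{" + inner + "}"
            else:
                rendered = str(value)
            parts.append(f"{key}: {rendered}")
        return "(" + ", ".join(parts) + ")"

    def _format_fields(self):
        """Join plain field names and nested nodes into one string."""
        return " ".join(str(field) for field in self.fields)

    def get_connected_nodes(self):
        return [f for f in self.fields if isinstance(f, QueryNode)]

    def __str__(self):
        return f"{self.name}{self._format_args()} {{ {self._format_fields()} }}"

    def __eq__(self, other):
        # Assumption: two nodes are equal when they render identically.
        return isinstance(other, QueryNode) and str(self) == str(other)
```

For instance, `str(QueryNode("user", args={"login": "$user"}, fields=["login"]))` renders as `user(login: $user) { login }` in this sketch.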
 
<span style="font-size: larger;">Query Objects</span>
 
The Query class is a subclass of QueryNode and represents a terminal QueryNode that can be executed.
It provides a substitute method to substitute values in the query using keyword arguments.
 
Static methods:
 
`test_time_format(time_string)`
 
- test_time_format is a static method that validates whether a given time string is in the expected format "%Y-%m-%dT%H:%M:%SZ".
 
`convert_dict(data)`
 
- convert_dict is a static method that takes a dictionary (data) as input and returns a modified dictionary with certain value conversions.
- If the value is of type bool, it converts it to a lowercase string representation.
- If the value is a nested dictionary, it converts it to a string representation enclosed in curly braces.
- If the value is a string and passes the test_time_format check, it wraps it in double quotes.
- For other value types, it keeps the value unchanged.
 
Instance methods:
 
`substitute(**kwargs)`
 
- This method substitutes the placeholders in the query string with specific values provided as keyword arguments.
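
The conversion and substitution steps described above could look roughly like this. It is a sketch under assumptions: the function names `is_github_timestamp`, `convert_values`, and `substitute_placeholders` are hypothetical stand-ins for `test_time_format`, `convert_dict`, and `substitute`, and the use of `string.Template` for `$name` placeholders is a guess at the mechanism, not a confirmed detail of `query.py`.

```python
from datetime import datetime
from string import Template

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def is_github_timestamp(value):
    """Return True if value is a string in the expected time format."""
    try:
        datetime.strptime(value, TIME_FORMAT)
        return True
    except (TypeError, ValueError):
        return False


def convert_values(data):
    """Apply the value conversions listed for convert_dict."""
    converted = {}
    for key, value in data.items():
        if isinstance(value, bool):
            converted[key] = str(value).lower()
        elif isinstance(value, dict):
            inner = ", ".join(f"{k}: {v}" for k, v in value.items())
            converted[key] = "{" + inner + "}"
        elif isinstance(value, str) and is_github_timestamp(value):
            converted[key] = f'"{value}"'
        else:
            converted[key] = value
    return converted


def substitute_placeholders(query_text, **kwargs):
    """Replace $name placeholders in query_text with converted values."""
    return Template(query_text).substitute(**convert_values(kwargs))
```

Timestamps are quoted because GraphQL expects `DateTime` literals as strings, while booleans must appear as bare `true` or `false`.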
 
<span style="font-size: larger;">QueryNodePaginator Objects</span>
 
The QueryNodePaginator class extends the QueryNode class and adds pagination-related functionality.
It keeps track of pagination state, appends pagination fields to the existing fields,
provides methods to check for a next page and update the pagination state,
and includes a method to reset the pagination state.
 
#### NOTE: We only implemented single-level pagination, because multi-level pagination behavior is not well defined across scenarios. For example, suppose you want to query all the pull requests a user made to all of their repositories. You might write a query that retrieves all of the user's repositories as the first pagination level and all pull requests to each repository as the second level. However, the repositories do not necessarily have the same number of pull requests. We leave it to users to decide how to handle multi-level pagination.
 
`class QueryNodePaginator(name, fields, args)`
 
- `name` is the name of the QueryNode.
- `fields` is a List of fields in the QueryNode.
- `args` is a Map of arguments in the QueryNode.
 
Instance methods:
 
`update_paginator(has_next_page, end_cursor)`
 
- update_paginator updates the paginator arguments with the provided has_next_page and end_cursor values. It adds the end cursor to the arguments using the key "after", enclosed in double quotes.
 
`has_next()`
 
- The has_next method checks if there is a next page by returning the value of has_next_page.
 
`reset_paginator()`
 
- The reset_paginator method resets the QueryNodePaginator by removing the "after" key from the arguments and setting has_next_page to None.
 
`__eq__(other)`
 
- \_\_eq\_\_ method overrides the equality comparison for QueryNodePaginator objects. It compares the object against another object of the same class, returning True if they are equal based on the parent class's equality comparison (`super().__eq__(other)`).
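
The pagination state handling can be illustrated in isolation. The class below is a hypothetical stand-in named `PaginatorState`; unlike QueryNodePaginator it does not extend QueryNode, and it only mirrors the three state methods described above.

```python
class PaginatorState:
    """Tracks the cursor arguments for a single paginated field (sketch)."""

    def __init__(self, args=None):
        self.args = dict(args or {})
        self.has_next_page = None  # unknown until the first response arrives

    def update_paginator(self, has_next_page, end_cursor):
        """Record the latest pageInfo values from a response."""
        self.has_next_page = has_next_page
        if end_cursor is not None:
            # The cursor is a GraphQL string, so it is stored in double quotes.
            self.args["after"] = f'"{end_cursor}"'

    def has_next(self):
        return self.has_next_page

    def reset_paginator(self):
        """Forget the cursor so the query can be reused from the first page."""
        self.args.pop("after", None)
        self.has_next_page = None
```

Resetting matters when the same query object is reused for a different user, since a stale `after` cursor would otherwise skip results.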
 
<span style="font-size: larger;">PaginatedQuery Objects</span>
 
`class PaginatedQuery(name, fields, args)`
 
- `name` is the name of the QueryNode
- `fields` is a List of fields in the QueryNode
- `args` is a Map of arguments in the QueryNode.
- The \_\_init\_\_ method initializes a PaginatedQuery object with the provided name, fields, and arguments. It calls the parent class's `__init__` method and then extracts the path to the pageInfo node using the extract_path_to_pageinfo_node static method.
 
`extract_path_to_pageinfo_node(paginated_query)`
 
- The extract_path_to_pageinfo_node static method is used to extract the path to the QueryNodePaginator node within the query. It takes a PaginatedQuery object as input and traverses the query fields to find the QueryNodePaginator. It returns a tuple containing the path to the QueryNodePaginator node and the QueryNodePaginator node. If the QueryNodePaginator node is not found, it raises an InvalidQueryException.
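
One way to picture the traversal is a depth-first search over nested fields. The sketch below uses hypothetical minimal classes (`Node`, `PaginatorNode`) instead of the real QueryNode types, and it assumes the returned path lists node names from the first level below the root; the actual method may represent the path differently.

```python
class InvalidQueryException(Exception):
    """Raised when a paginated query contains no paginator node."""


class Node:
    def __init__(self, name, fields=()):
        self.name = name
        self.fields = list(fields)


class PaginatorNode(Node):
    """Marks the field whose results are paginated."""


def extract_path_to_paginator(root):
    """Return (path_of_names, paginator_node) found by depth-first search."""
    stack = [([], root)]
    while stack:
        path, node = stack.pop()
        for field in node.fields:
            if isinstance(field, PaginatorNode):
                return path + [field.name], field
            if isinstance(field, Node):
                stack.append((path + [field.name], field))
            # plain string fields have no children and are skipped
    raise InvalidQueryException("query has no paginator node")
```

The path is what lets a client find `pageInfo` inside each response without knowing the query's shape in advance.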
 
### client — Main GitHub GraphQL client
 
Source code: [github_graphql/client.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/github_graphql/client.py)
 
This class represents the main GitHub GraphQL client.
 
`class Client(protocol, host, is_enterprise, authenticator)`

- `protocol`: Protocol used for server communication.
- `host`: Host server domain or IP.
- `is_enterprise`: Boolean to check if the host is running on GitHub Enterprise.
- `authenticator`: The authentication handler for the client.
 
Private methods:
 
`_base_path(self)`:
 
- Returns the base path for a GraphQL request based on whether the client is connected to GitHub Enterprise.
 
`_generate_headers(self, **kwargs)`:
 
- Generates headers for an HTTP request, including authentication headers and other additional headers passed as keyword arguments.
 
`_retry_request(self, retry_attempts, timeout_seconds, query, substitutions)`:
 
- Wrapper method to retry requests. Takes in the number of attempts, timeout duration, the query, and the substitutions for the query.
 
`_execute(self, query, substitutions)`:
 
- Executes a GraphQL query after performing the required substitutions. Handles possible request errors and rate limiting.
 
`_execution_generator(self, query, substitutions)`:
 
- Executes a PaginatedQuery by repeatedly querying until all pages have been fetched. Yields each response.
 
Instance methods:
 
`execute(self, query, substitutions)`:
 
- Executes a query, which can be a simple Query or a PaginatedQuery. Utilizes the \_execute method or the \_execution_generator method based on the type of query.
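
The paginated execution loop can be sketched without any network access. Everything here is illustrative: `iterate_pages`, `fake_fetch`, and the canned `FAKE_PAGES` data are hypothetical names, and the real `_execution_generator` works on PaginatedQuery objects rather than a fetch callback.

```python
def iterate_pages(fetch_page, path_to_connection):
    """Yield each response until pageInfo.hasNextPage is false.

    fetch_page(cursor) performs one request; cursor is None for the first page.
    path_to_connection lists the keys leading to the paginated field.
    """
    cursor = None
    while True:
        response = fetch_page(cursor)
        yield response
        connection = response
        for key in path_to_connection:
            connection = connection[key]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        cursor = page_info["endCursor"]


# Stand-in transport with two canned pages, used instead of a real HTTP call.
FAKE_PAGES = {
    None: {"user": {"issueComments": {
        "nodes": [{"body": "first"}],
        "pageInfo": {"hasNextPage": True, "endCursor": "cursor-1"}}}},
    "cursor-1": {"user": {"issueComments": {
        "nodes": [{"body": "second"}],
        "pageInfo": {"hasNextPage": False, "endCursor": None}}}},
}


def fake_fetch(cursor):
    return FAKE_PAGES[cursor]
```

Calling `list(iterate_pages(fake_fetch, ["user", "issueComments"]))` walks both canned pages. Using a generator means a caller can stop early without fetching the remaining pages.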
 
### user_login — Query for user basic login info
 
Source code: [queries/profile/user_login.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/queries/profile/user_login.py)
 
The `UserLoginViewer` class represents a GraphQL query that retrieves the login information of the currently authenticated user.
The query is defined using the Query class, and the viewer field is requested with the login field nested inside it.

<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>

```
query {
  viewer {
    login
  }
}
```

</td>
<td>

```python
class UserLoginViewer(Query):
    def __init__(self):
        super().__init__(
            fields=[
                QueryNode(
                    "viewer",
                    fields=["login"]
                )
            ]
        )
```

</td>
</tr>
</table>

The `UserLogin` class represents a GraphQL query that retrieves detailed information about a user.
The query accepts a variable called $user of type String!, which represents the user's login. The user field is requested with the login argument set to the value of the $user variable. Inside the user field, additional fields like login, name, id, email, and createdAt are requested.

<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>

```
query ($user: String!){
    user(login: $user){
        login
        name
        id
        email
        createdAt
    }
}
```

</td>
<td>

```python
class UserLogin(Query):
    def __init__(self):
        super().__init__(
            fields=[
                QueryNode(
                    "user",
                    args={
                        "login": "$user"
                    },
                    fields=[
                        "login",
                        "name",
                        "id",
                        "email",
                        "createdAt"
                    ]
                )
            ]
        )
```

</td>
</tr>
</table>

Next, set the PYTHONPATH to include the project directory and its backend directory.

On Windows:

    set PYTHONPATH=%PYTHONPATH%;path/to/your/project
    set PYTHONPATH=%PYTHONPATH%;path/to/your/project/backend

On Unix or macOS:

    export PYTHONPATH=$PYTHONPATH:/path/to/your/project
    export PYTHONPATH=$PYTHONPATH:/path/to/your/project/backend

You can run the app from your terminal by executing the following command:

    python backend\run.py

==GraphQL Endpoints==

Get current user login:

    /api/graphql/current-user-login

Sample Output:

    {
      "viewer": {
        "login": "<your-username>"
      }
    }

Get specific user login:

    /api/graphql/user-login/<username>

Get list of all commits in a repo:

    /api/graphql/specific-user-commits/<owner>/<repo_name>

Sample Output:

    {
      "repository": {
        "defaultBranchRef": {
          "target": {
            "history": {
              "nodes": [
                {
                  "additions": 0,
                  "author": {
                    "email": "61797592+Atharva7007@users.noreply.github.com",
                    "name": "Atharva Pansare",
                    "user": {
                      "login": "Atharva7007"
                    }
                  },
                  "authoredDate": "2020-04-03T09:30:17Z",
                  "changedFilesIfAvailable": 1,
                  "deletions": 0,
                  "message": "Add files via upload",
                  "parents": {
                    "totalCount": 1

### user_profile_stats — Query for user detailed profile info

Source code: [queries/profile/user_profile_stats.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/queries/profile/user_profile_stats.py)

<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>

```
query ($user: String!){
    user(login: $user){
        login
        name
        id
        email
        createdAt
    }
}
```
 
</td>
<td>
 
```python
class UserProfileStats(Query):
    def __init__(self):
        super().__init__(
            fields=[
                QueryNode(
                    "user",
                    args={"login": "$user"},
                    fields=[
                        "login",
                        "name",
                        "email",
                        "createdAt",
                        "bio",
                        "company",
                        "isBountyHunter",
                        "isCampusExpert",
                        "isDeveloperProgramMember",
                        "isEmployee",
                        "isGitHubStar",
                        "isHireable",
                        "isSiteAdmin",
                        QueryNode("watching", fields=["totalCount"]),
                        QueryNode("starredRepositories", fields=["totalCount"]),
                        QueryNode("following", fields=["totalCount"]),
                        QueryNode("followers", fields=["totalCount"]),
                        QueryNode("gists", fields=["totalCount"]),
                        QueryNode("issues", fields=["totalCount"]),
                        QueryNode("projects", fields=["totalCount"]),
                        QueryNode("pullRequests", fields=["totalCount"]),
                        QueryNode("repositories", fields=["totalCount"]),
                        QueryNode("repositoryDiscussions", fields=["totalCount"]),
                        QueryNode("gistComments", fields=["totalCount"]),
                        QueryNode("issueComments", fields=["totalCount"]),
                        QueryNode("commitComments", fields=["totalCount"]),
                        QueryNode("repositoryDiscussionComments", fields=["totalCount"]),
                    ]
                )
            ]
        )
```
 
</td>
</tr>
</table>
 
### metrics — Query for user's total contribution metrics
 
Source code: [queries/metrics.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/queries/metrics.py)
 
`UserMetrics` class represents a GraphQL query that retrieves various metrics and information about a user.
It is designed to fetch information such as the user's login, name, email, creation date, bio, company, and several other metrics related to their GitHub activity.
The root field in the query is "user", indicating that information about a specific user will be retrieved. The "user" field accepts an argument called "login", which represents the user's login.
Inside the "user" field, various other fields are requested, including "login", "name", "email", "createdAt", "bio", "company", and several other metrics related to the user's GitHub activity.
Some fields, such as "watching", "starredRepositories", "following", and "followers", have additional nested fields, specifically the "totalCount" field.
These nested fields allow you to retrieve the total count of certain metrics, such as the number of repositories a user is watching or the number of followers they have.
 
<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>
 
```
query ($user: String!) {
    user(login: $user) {
        login
        name
        email
        createdAt
        bio
        company
        isBountyHunter
        isCampusExpert
        isDeveloperProgramMember
        isEmployee
        isGitHubStar
        isHireable
        isSiteAdmin
        watching {
             totalCount
        }
        starredRepositories {
            totalCount
        }
        following {
            totalCount
        }
        followers {
            totalCount
        }
        gists {
            totalCount
        }
        gistComments {
            totalCount
        }
        issueComments {
            totalCount
        }
        issues {
            totalCount
        }
        projects {
            totalCount
        }
        pullRequests {
            totalCount
        }
        repositories {
            totalCount
        }
        repositoryDiscussionComments {
            totalCount
        }
        repositoryDiscussions {
            totalCount
        }
    }
}
```
 
</td>
<td>
 
```python
class UserMetrics(Query):
    def __init__(self):
        super().__init__(
            fields=[
                QueryNode(
                    "user",
                    args={"login": "$user"},
                    fields=[
                        "login",
                        "name",
                        "email",
                        "createdAt",
                        "bio",
                        "company",
                        "isBountyHunter",
                        "isCampusExpert",
                        "isDeveloperProgramMember",
                        "isEmployee",
                        "isGitHubStar",
                        "isHireable",
                        "isSiteAdmin",
                        # Connection fields: only the total count is requested
                        QueryNode("watching", fields=["totalCount"]),
                        QueryNode("starredRepositories", fields=["totalCount"]),
                        QueryNode("following", fields=["totalCount"]),
                        QueryNode("followers", fields=["totalCount"]),
                        QueryNode("gists", fields=["totalCount"]),
                        QueryNode("gistComments", fields=["totalCount"]),
                        QueryNode("issueComments", fields=["totalCount"]),
                        QueryNode("issues", fields=["totalCount"]),
                        QueryNode("projects", fields=["totalCount"]),
                        QueryNode("pullRequests", fields=["totalCount"]),
                        QueryNode("repositories", fields=["totalCount"]),
                        QueryNode("repositoryDiscussionComments", fields=["totalCount"]),
                        QueryNode("repositoryDiscussions", fields=["totalCount"]),
                    ]
                )
            ]
        )
```
 
</td>
</tr>
</table>
 
### commits — Query for user's contribution metrics within a specified time range
 
Source code: [queries/commits.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/queries/commits.py)
 
UserCommits represents a GraphQL query for retrieving commit-related contributions of a user within a specified time range.
Inside the "user" field, there is a nested field called "contributionsCollection". This field represents the collection of contributions made by the user.
The "contributionsCollection" field accepts two additional arguments, "from" and "to", with values of "$start" and "$end" respectively.
These variables represent the start and end dates of the time range for which the contributions are requested.
Inside the "contributionsCollection" field, several other fields are requested,
such as "startedAt", "endedAt", "hasActivityInThePast", "hasAnyContributions", "hasAnyRestrictedContributions", "restrictedContributionsCount", and various other commit-related metrics.
By including these fields in the query, you can retrieve information about the user's commit contributions, issue contributions, pull request contributions, and other related metrics within the specified time range.
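
The `$start` and `$end` values must be timestamps in the "%Y-%m-%dT%H:%M:%SZ" format mentioned earlier. As a small sketch, a helper like the hypothetical `contribution_window` below (not part of the library) could build both values for a window ending at a given UTC time.

```python
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def contribution_window(end, days):
    """Return the start and end strings for a window of `days` ending at `end`.

    `end` is expected to be a datetime in UTC.
    """
    start = end - timedelta(days=days)
    return {
        "start": start.strftime(TIME_FORMAT),
        "end": end.strftime(TIME_FORMAT),
    }


# Example: a 30-day window ending on 31 January 2024 (UTC).
window = contribution_window(datetime(2024, 1, 31, tzinfo=timezone.utc), 30)
```

The resulting dictionary can be merged with the `user` value and passed as the substitutions for the query.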
 
<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>
 
```
query ($user: String!, $start: DateTime!, $end: DateTime!) {
    user(login: $user){
        contributionsCollection(from: $start,to: $end){
            startedAt
            endedAt
            hasActivityInThePast
            hasAnyContributions
            hasAnyRestrictedContributions
            restrictedContributionsCount
            totalCommitContributions
            totalIssueContributions
            totalPullRequestContributions
            totalPullRequestReviewContributions
            totalRepositoriesWithContributedCommits
            totalRepositoriesWithContributedIssues
            totalRepositoriesWithContributedPullRequestReviews
            totalRepositoriesWithContributedPullRequests
            totalRepositoryContributions
        }
    }
}
```
 
</td>
<td>
 
```python
class UserCommits(Query):
    def __init__(self):
        super().__init__(
            fields=[
                QueryNode(
                    "user",
                    args={"login": "$user"},
                    fields=[
                        QueryNode(
                            "contributionsCollection",
                            # $start and $end bound the time range
                            args={"from": "$start", "to": "$end"},
                            fields=[
                                "startedAt",
                                "endedAt",
                                "hasActivityInThePast",
                                "hasAnyContributions",
                                "hasAnyRestrictedContributions",
                                "restrictedContributionsCount",
                                "totalCommitContributions",
                                "totalIssueContributions",
                                "totalPullRequestContributions",
                                "totalPullRequestReviewContributions",
                                "totalRepositoriesWithContributedCommits",
                                "totalRepositoriesWithContributedIssues",
                                "totalRepositoriesWithContributedPullRequestReviews",
                                "totalRepositoriesWithContributedPullRequests",
                                "totalRepositoryContributions",
                            ]
                        ),
                    ]
                )
            ]
        )
```
 
</td>
</tr>
</table>
 
### comments — Query for retrieving comments made by a user
 
Source code: [queries/comments.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/queries/comments.py)
 
UserComments represents a GraphQL query for retrieving comments made by a user. The query structure includes a "user" field with the user's login as an argument. Inside the "user" field, there is a nested field "QueryNodePaginator". This field represents the pagination of comments made by the user.
The "QueryNodePaginator" field accepts two additional arguments: "$comment_type" and "$pg_size". These arguments control the type of comments to retrieve and the pagination size, respectively. The value of "$comment_type" determines the type of comments to fetch, and the value of "$pg_size" sets the number of comments to retrieve per page.
Inside the "QueryNodePaginator" field, there are several requested fields. These fields include "totalCount" to get the total count of comments, "nodes" to retrieve the comment nodes with their body and creation timestamps, and "pageInfo" to fetch pagination information such as the end cursor and whether there are more pages available.
 
#### NOTE: comment_type can be commitComments, gistComments, issueComments, and repositoryDiscussionComments.
 
<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>
 
```
query($user: String!, $pg_size: Int!){
    user(login: $user){
        login
        issueComments(first: $pg_size){
            totalCount
            pageInfo{
                hasNextPage
                endCursor
            }
            nodes{
                body
                createdAt
            }
        }
    }
}
```
 
</td>
<td>
 
```python
def __init__(self):
    super().__init__(
        fields=[
            QueryNode(
                "user",
                 args={"login": "$user"},
                 fields=[
                    "login",
                    QueryNodePaginator(
                        "$comment_type",
                        args={"first": "$pg_size"},
                        fields=[
                            "totalCount",
                            QueryNode(
                                "nodes",
                                fields=["body", "createdAt"]
                            ),
                            QueryNode(
                                "pageInfo",
                                fields=["endCursor", "hasNextPage"]
                            )
                        ]
                    )
                 ]
            )
        ]
    )
```
 
</td>
</tr>
</table>
 
### contributions — Query for retrieving contributions made by a user
 
Source code: [queries/contributions.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/queries/contributions.py)
 
UserContributions represents a GraphQL query for retrieving contributions made by a user. The query structure includes a "user" field with the user's login as an argument. Inside the "user" field, there is a nested field "QueryNodePaginator".
This field represents the pagination of contributions made by the user. The "QueryNodePaginator" field accepts two additional arguments: "$contribution_type" and "$pg_size". These arguments control the type of contributions to retrieve and the pagination size, respectively.
The value of "$contribution_type" determines the type of contributions to fetch, such as "issues", "pullRequests", or any other valid contribution type. The value of "$pg_size" sets the number of contributions to retrieve per page.
Inside the "QueryNodePaginator" field, there are several requested fields. These fields include "totalCount" to get the total count of contributions, "nodes" to retrieve the contribution nodes with their creation timestamps, and "pageInfo" to fetch pagination information such as the end cursor and whether there are more pages available.
 
#### NOTE: contribution_type can be any valid contribution type such as "issues" or "pullRequests"
 
<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>
 
```
query($user: String!, $pg_size: Int!){
    user(login: $user){
        login
        issues(first:$pg_size){
            totalCount
            pageInfo{
                hasNextPage
                endCursor
            }
            nodes{
                createdAt
            }
        }
    }
}
```
 
</td>
<td>
 
```python
def __init__(self):
    super().__init__(
        fields=[
            QueryNode(
                 "user",
                args={"login": "$user"},
                fields=[
                    "login",
                    QueryNodePaginator(
                        "$contribution_type",
                        args={"first": "$pg_size"},
                        fields=[
                            "totalCount",
                            QueryNode(
                                "nodes",
                                fields=["createdAt"]
                            ),
                            QueryNode(
                                "pageInfo",
                                fields=["endCursor", "hasNextPage"]
                            )
                        ]
                    )
                ]
            )
        ]
    )
```
 
</td>
</tr>
</table>
 
### repositories — Query for retrieving repositories owned or contributed to by a user
 
Source code: [queries/repositories.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/queries/repositories.py)
 
UserRepositories represents a GraphQL query for retrieving repositories owned or contributed to by a user. The query structure includes a "user" field with the user's login as an argument. Inside the "user" field, there is a nested field "QueryNodePaginator".
This field represents the pagination of repositories. The "QueryNodePaginator" field accepts several arguments that allow for filtering and ordering the repositories. These arguments include "$pg_size" to set the pagination size, "$is_fork" to filter by whether the repository is a fork, "$ownership" to filter by owner affiliations, and "$order_by" to specify the field and direction for ordering the repositories.
Inside the "QueryNodePaginator" field, there are several requested fields. These fields include "nodes" to retrieve information about the repositories. Each repository node includes various details such as the repository name, whether it is empty, creation and update timestamps, fork count, stargazer count, total watcher count, primary programming language, and information about the languages used in the repository.
The "languages" field provides information about the languages used in the repository. It accepts additional arguments for filtering and ordering the languages. The requested fields within the "languages" field include "totalSize" to get the total size of the languages used, "totalCount" to get the count of distinct languages, and "edges" to retrieve detailed information about each language, including its size and name.
 
#### NOTE: isFork can be "True" or "False", ownerAffiliation can be "OWNER" or "COLLABORATOR"
 
<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>
 
```
query($user: String!, $pg_size: Int!, $isFork: Boolean!, $ownerAffiliations: [RepositoryAffiliation!]!) {
    user(login: $user) {
        repositories(first: $pg_size, isFork: $isFork, ownerAffiliations: $ownerAffiliations, orderBy: { field: CREATED_AT, direction: ASC }) {
            totalCount
            pageInfo {
                hasNextPage
                endCursor
            }
            nodes {
                name
                isEmpty
                createdAt
                updatedAt
                forkCount
                stargazerCount
                watchers {
                    totalCount
                }
                primaryLanguage {
                    name
                }
                languages(first: 100) {
                    totalSize
                    totalCount
                    edges {
                        size
                        node {
                            name
                        }
                    }
                }
            }
        }
    }
}
```


</td>
<td>


```python
def __init__(self):
    super().__init__(
        fields=[
            QueryNode(
                "user",
                args={"login": "$user"},
                fields=[
                    QueryNodePaginator(
                        "repositories",
                        args={"first": "$pg_size",
                              "isFork": "$is_fork",
                              "ownerAffiliations": "$ownership",
                              "orderBy": "$order_by"},
                        fields=[
                            QueryNode(
                                "nodes",
                                fields=[
                                    "name",
                                    "isEmpty",
                                    "createdAt",
                                    "updatedAt",
                                    "forkCount",
                                    "stargazerCount",
                                    QueryNode("watchers", fields=["totalCount"]),
                                    QueryNode("primaryLanguage", fields=["name"]),
                                    QueryNode(
                                        "languages",
                                        args={"first": 100,
                                              "orderBy": {"field": "SIZE",
                                                          "direction": "DESC"}},
                                        fields=[
                                            "totalSize",
                                            "totalCount",
                                            QueryNode(
                                                "edges",
                                                fields=[
                                                    "size",
                                                    QueryNode("node", fields=["name"])
                                                ]
                                            )
                                        ]
                                    )
                                ]
                            ),
                            QueryNode(
                                "pageInfo",
                                fields=["endCursor", "hasNextPage"]
                            )
                        ]
                    ),
                ]
            )
        ]
    )
```
 
</td>
</tr>
</table>
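
To make the mapping between the two columns concrete, the sketch below shows how a nested node description could be rendered into GraphQL text. `ToyNode` and `format_value` are hypothetical stand-ins written for this illustration; they are not the library's `QueryNode` implementation, which may format arguments and indentation differently.

```python
def format_value(value):
    """Render an argument value. Strings are emitted verbatim, which suits
    GraphQL variables ("$pg_size") and enum values ("SIZE")."""
    if isinstance(value, dict):
        inner = ", ".join(f"{k}: {format_value(v)}" for k, v in value.items())
        return "{" + inner + "}"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


class ToyNode:
    """Hypothetical, simplified stand-in for a query node (illustration only)."""

    def __init__(self, name, args=None, fields=()):
        self.name = name
        self.args = args or {}
        self.fields = list(fields)

    def render(self, depth=0):
        pad = "  " * depth
        head = self.name
        if self.args:
            rendered = ", ".join(
                f"{k}: {format_value(v)}" for k, v in self.args.items()
            )
            head += f"({rendered})"
        if not self.fields:
            return pad + head
        lines = [pad + head + " {"]
        for field in self.fields:
            if isinstance(field, ToyNode):
                lines.append(field.render(depth + 1))
            else:
                lines.append("  " * (depth + 1) + field)
        lines.append(pad + "}")
        return "\n".join(lines)


# Mirrors the "languages" part of the query above.
languages = ToyNode(
    "languages",
    args={"first": 100, "orderBy": {"field": "SIZE", "direction": "DESC"}},
    fields=[
        "totalSize",
        "totalCount",
        ToyNode("edges", fields=["size", ToyNode("node", fields=["name"])]),
    ],
)
print(languages.render())
```

Keeping the arguments as plain strings means a variable such as `"$pg_size"` passes through untouched and can be substituted later, when the query is executed.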
 
### repository_contributors — Query for retrieving contributors of a repository
 
Source code: [queries/repository_contributors.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/queries/repository_contributors.py)
 
This GraphQL query retrieves the contributors of a specified repository.
It walks the commit history of the repository's default branch and extracts the login name of each commit's author.
 
<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>
 
```
query ($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history{
            nodes {
              author {
                user {
                  login
                }
              }
            }
          }
        }
      }
    }
  }
}
```

</td>
<td>


```python
def __init__(self):
    super().__init__(
        fields=[
            QueryNode(
                "repository",
                args={"owner": "$owner",
                      "name": "$repo_name"},
                fields=[
                    QueryNode(
                        "defaultBranchRef",
                        fields=[
                            QueryNode(
                                "target",
                                fields=[
                                    QueryNode(
                                        "... on Commit",
                                        fields=[
                                            QueryNode(
                                                "history",
                                                fields=[
                                                    QueryNode(
                                                        "nodes",
                                                        fields=[
                                                            QueryNode(
                                                                "author",
                                                                fields=[
                                                                    QueryNode(
                                                                        "user",
                                                                        fields=[
                                                                            "login"
                                                                        ]
                                                                    )
                                                                ]
                                                            )
                                                        ]
                                                    )
                                                ]
                                            )
                                        ]
                                    )
                                ]
                            )
                        ]
                    )
                ]
            )
        ]
    )
```


</td>
</tr>
</table>


### repository_contributors_contribution — Query for retrieving contributions of a contributor made to a repository


Source code: [queries/repository_contributors_contribution.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/queries/repository_contributors_contribution.py)


This GraphQL query is designed to retrieve the commit history of a specified author ($id)
in the default branch of a specified repository ($owner and $name).
It returns key metrics like the total count of commits, the date each commit was authored, the number of changed files,
additions, and deletions for each commit, along with the author's login name.


<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>


```
query ($owner: String!, $name: String!, $id: ID!, $pg_size: Int!){
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history(author: { id: $id }, first: $pg_size) {
            totalCount
            nodes {
              authoredDate
              changedFilesIfAvailable
              additions
              deletions
              author {
                user {
                  login
                }
              }
            }
            pageInfo {
              endCursor
              hasNextPage
            }
          }
        }
      }
    }
  }
}
```


</td>
<td>


```python
    def __init__(self):
        super().__init__(
            fields=[
                QueryNode(
                    "repository",
                    args={"owner": "$owner",
                          "name": "$repo_name"},
                    fields=[
                        QueryNode(
                            "defaultBranchRef",
                            fields=[
                                QueryNode(
                                    "target",
                                    fields=[
                                        QueryNode(
                                            "... on Commit",
                                            fields=[
                                                QueryNode(
                                                    "history",
                                                    args={"author": "$id"},
                                                    fields=[
                                                        "totalCount",
                                                        QueryNode(
                                                            "nodes",
                                                            fields=[
                                                                "authoredDate",
                                                                "changedFilesIfAvailable",
                                                                "additions",
                                                                "deletions",
                                                                QueryNode(
                                                                    "author",
                                                                    fields=[
                                                                        QueryNode(
                                                                            "user",
                                                                            fields=[
                                                                                "login"
                                                                            ]
                                                                        )
                                                                    ]
                                                                )
                                                            ]
                                                        )
                                                    ]
                                                )
                                            ]
                                        )
                                    ]
                                )
                            ]
                        )
                    ]
                )
            ]
        )
```


</td>
</tr>
</table>
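
As a rough illustration of how the returned metrics might be consumed, the sketch below totals additions, deletions, and changed files over the `history` object of one response page. The function name `summarize_history` and the sample values are made up for this example. Note that `changedFilesIfAvailable` can be null, so it is treated as 0 here, and that the sums cover only the nodes on the fetched page while `totalCount` covers the whole history.

```python
def summarize_history(history):
    """Aggregate per-commit metrics from one page of a `history` object."""
    nodes = history["nodes"]
    return {
        "commits": history["totalCount"],
        "additions": sum(node["additions"] for node in nodes),
        "deletions": sum(node["deletions"] for node in nodes),
        # changedFilesIfAvailable may be None when GitHub cannot compute it.
        "changed_files": sum(node["changedFilesIfAvailable"] or 0 for node in nodes),
    }


# Made-up sample data shaped like the query result above.
sample_history = {
    "totalCount": 2,
    "nodes": [
        {"authoredDate": "2020-04-03T09:30:17Z", "changedFilesIfAvailable": 1,
         "additions": 10, "deletions": 2,
         "author": {"user": {"login": "example-user"}}},
        {"authoredDate": "2020-04-04T11:00:00Z", "changedFilesIfAvailable": None,
         "additions": 5, "deletions": 0,
         "author": {"user": {"login": "example-user"}}},
    ],
}

print(summarize_history(sample_history))
```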


### repository_commits — Query for retrieving the commits made to a repository

Source code: [queries/repository_commits.py]()

This GraphQL query is structured to retrieve commits from the default branch of a specified repository. For each commit, it fetches the authored date, the number of changed files (if available), the number of additions and deletions, the commit message, and details about the commit's author.

<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>


```
query ($owner: String!, $repo_name: String!, $pg_size: Int!) {
  repository(owner: $owner, name: $repo_name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: $pg_size) {
            totalCount
            nodes {
              authoredDate
              changedFilesIfAvailable
              additions
              deletions
              message
              parents (first: 2) {
                totalCount
              }
              author {
                name
                email
                user {
                  login
                }
              }
            }
            pageInfo {
              endCursor
              hasNextPage
            }
          }
        }
      }
    }
  }
}
```


</td>
<td>


```python
    def __init__(self):
        super().__init__(
            fields=[
                QueryNode(
                    "repository",
                    args={"owner": "$owner",
                          "name": "$repo_name"},
                    fields=[
                        QueryNode(
                            "defaultBranchRef",
                            fields=[
                                QueryNode(
                                    "target",
                                    fields=[
                                        QueryNode(
                                            "... on Commit",
                                            fields=[
                                                QueryNodePaginator(
                                                    "history",
                                                    args={"first": "$pg_size"},
                                                    fields=[
                                                        'totalCount',
                                                        QueryNode(
                                                            "nodes",
                                                            fields=[
                                                                "authoredDate",
                                                                "changedFilesIfAvailable",
                                                                "additions",
                                                                "deletions",
                                                                "message",
                                                                QueryNode(
                                                                    "parents (first: 2)",
                                                                    fields=[
                                                                        "totalCount"
                                                                    ]
                                                                ),
                                                                QueryNode(
                                                                    "author",
                                                                    fields=[
                                                                        'name',
                                                                        'email',
                                                                        QueryNode(
                                                                            "user",
                                                                            fields=[
                                                                                "login"
                                                                            ]
                                                                        )
                                                                    ]
                                                                )
                                                            ]
                                                        ),
                                                        QueryNode(
                                                            "pageInfo",
                                                            fields=["endCursor", "hasNextPage"]
                                                        )
                                                    ]
                                                )
                                            ]
                                        )
                                    ]
                                )
                            ]
                        )
                    ]
                )
            ]
        )
```


</td>
</tr>
</table>

### ratelimit — Query for retrieving the current rate limit status

Source code: [queries/rate_limit.py](https://github.com/JialinC/GitHub_GraphQL/blob/main/python_github_query/queries/rate_limit.py)

<table>
<tr>
<th>GraphQL</th>
<th>Python</th>
</tr>
<tr>
<td>


```
query ($dryrun: Boolean!){
  rateLimit (dryRun: $dryrun){
    cost
    limit
    remaining
    resetAt
    used
  }
}
```

</td>
<td>

```python
```

</td>
</tr>
</table>
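
One way a caller might use the `rateLimit` fields is to decide whether to pause before sending the next query. The sketch below is a minimal illustration under that assumption: the function name `seconds_to_wait` and the sample values are hypothetical, and the project's client may handle rate limiting differently.

```python
from datetime import datetime, timezone


def seconds_to_wait(rate_limit, now=None):
    """Return how long to pause before the next query, in seconds.

    `rate_limit` is the `rateLimit` object from a response. If enough points
    remain to cover the query cost, no wait is needed.
    """
    if rate_limit["remaining"] >= rate_limit["cost"]:
        return 0.0
    now = now or datetime.now(timezone.utc)
    # resetAt is an ISO 8601 UTC timestamp such as "2024-04-07T21:00:00Z".
    reset_at = datetime.fromisoformat(rate_limit["resetAt"].replace("Z", "+00:00"))
    return max(0.0, (reset_at - now).total_seconds())


# Made-up sample: the budget is exhausted and resets 12 minutes after `now`.
sample_rate_limit = {
    "cost": 1,
    "limit": 5000,
    "remaining": 0,
    "resetAt": "2024-04-07T21:00:00Z",
    "used": 5000,
}
sample_now = datetime(2024, 4, 7, 20, 48, tzinfo=timezone.utc)
print(seconds_to_wait(sample_rate_limit, now=sample_now))
```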

Latest revision as of 20:48, 7 April 2024

GitHub Miner

This is a convenient tool to query a user's GitHub metrics. The project aims to develop API endpoints for GitHub GraphQL queries and GitHub REST queries using Python Flask. It involves integrating existing code with Flask to expose GraphQL queries as API endpoints, developing a REST client, and creating REST endpoints for querying the same data sets. The project also includes thorough testing and documentation of the endpoints.

Live Demo & Source Code

Live demo: http://152.7.177.239:5000/auth/login (please log in with your personal GitHub account, not your NCSU account)

Source code: https://github.ncsu.edu/slimbur/GH_Miner

Installation

We recommend using a virtual environment. To set one up:

cd path/to/your/project/directory
python -m venv venv

On macOS and Linux:

source venv/bin/activate

On Windows (Command Prompt):

.\venv\Scripts\activate


On Windows (PowerShell):

.\venv\Scripts\Activate.ps1

Next, install all the necessary libraries:

pip install -r requirements.txt

Next, add the project root and the backend directory to PYTHONPATH.

On Windows:

set PYTHONPATH=%PYTHONPATH%;path/to/your/project
set PYTHONPATH=%PYTHONPATH%;path/to/your/project/backend

On Unix or macOS:

export PYTHONPATH=$PYTHONPATH:/path/to/your/project
export PYTHONPATH=$PYTHONPATH:/path/to/your/project/backend

You can run the app from your terminal by executing the following command:

python backend/run.py

GraphQL Endpoints

Get current user login:

/api/graphql/current-user-login

Sample Output:

{
  "viewer": {
    "login": "<your-username>"
  }
}


Get specific user login:

/api/graphql/user-login/<username>

Get list of all commits in a repo:

/api/graphql/specific-user-commits/<owner>/<repo_name>

Sample Output:

{
  "repository": {
    "defaultBranchRef": {
      "target": {
        "history": {
          "nodes": [
            {
              "additions": 0,
              "author": {
                "email": "61797592+Atharva7007@users.noreply.github.com",
                "name": "Atharva Pansare",
                "user": {
                  "login": "Atharva7007"
                }
              },
              "authoredDate": "2020-04-03T09:30:17Z",
              "changedFilesIfAvailable": 1,
              "deletions": 0,
              "message": "Add files via upload",
              "parents": {
                "totalCount": 1
              }
            }
          ],
          "pageInfo": {
            "endCursor": "98ba34a6c62ff6fe7c4d4de5c342a194f72d66e4 0",
            "hasNextPage": true
          },
          "totalCount": 6
        }
      }
    }
  }
}

Get details of all contributors in a repo:

/api/graphql/repository-contributors/<owner>/<repo_name>

Sample Output:

{
  "repository": {
    "defaultBranchRef": {
      "target": {
        "history": {
          "nodes": [
            {
              "author": {
                "email": "61797592+Atharva7007@users.noreply.github.com",
                "name": "Atharva Pansare",
                "user": {
                  "login": "Atharva7007"
                }
              }
            }
          ],
          "pageInfo": {
            "endCursor": "98ba34a6c62ff6fe7c4d4de5c342a194f72d66e4 0",
            "hasNextPage": true
          },
          "totalCount": 6
        }
      }
    }
  }
}
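
Because this response contains one node per commit, a consumer usually has to reduce it to a list of distinct contributors. The sketch below shows one way to do that. The function name `contributor_logins` is hypothetical, and the second node in the sample (a commit whose author has no linked GitHub account, so `user` is null) is invented to show that case.

```python
def contributor_logins(response):
    """Return the distinct author logins found in one response page,
    in order of first appearance."""
    nodes = response["repository"]["defaultBranchRef"]["target"]["history"]["nodes"]
    logins = []
    for node in nodes:
        # `user` is None when the commit author is not linked to an account.
        user = (node.get("author") or {}).get("user")
        if user and user["login"] not in logins:
            logins.append(user["login"])
    return logins


sample_response = {
    "repository": {
        "defaultBranchRef": {
            "target": {
                "history": {
                    "nodes": [
                        {"author": {"name": "Atharva Pansare",
                                    "user": {"login": "Atharva7007"}}},
                        # Invented node: author without a linked account.
                        {"author": {"name": "Unlinked Author", "user": None}},
                        {"author": {"name": "Atharva Pansare",
                                    "user": {"login": "Atharva7007"}}},
                    ],
                    "pageInfo": {"endCursor": "cursor", "hasNextPage": False},
                    "totalCount": 3,
                }
            }
        }
    }
}

print(contributor_logins(sample_response))
```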

REST API Endpoints

Get current user login:

/api/rest/current-user-login-rest

Sample Output:

{
  "avatar_url": "https://avatars.githubusercontent.com/u/61797592?v=4",
  "bio": null,
  "blog": "",
  "company": null,
  "created_at": "2020-03-04T16:49:06Z",
  "email": null,
  "events_url": "https://api.github.com/users/Atharva7007/events{/privacy}",
  "followers": 1,
  "followers_url": "https://api.github.com/users/Atharva7007/followers",
  "following": 3,
  "following_url": "https://api.github.com/users/Atharva7007/following{/other_user}",
  "gists_url": "https://api.github.com/users/Atharva7007/gists{/gist_id}",
  "gravatar_id": "",
  "hireable": null,
  "html_url": "https://github.com/Atharva7007",
  "id": 61797592,
  "location": null,
  "login": "Atharva7007",
  "name": "Atharva Pansare",
  "node_id": "MDQ6VXNlcjYxNzk3NTky",
  "organizations_url": "https://api.github.com/users/Atharva7007/orgs",
  "public_gists": 0,
  "public_repos": 11,
  "received_events_url": "https://api.github.com/users/Atharva7007/received_events",
  "repos_url": "https://api.github.com/users/Atharva7007/repos",
  "site_admin": false,
  "starred_url": "https://api.github.com/users/Atharva7007/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/Atharva7007/subscriptions",
  "twitter_username": null,
  "type": "User",
  "updated_at": "2024-03-14T19:03:46Z",
  "url": "https://api.github.com/users/Atharva7007"
}


Get list of all commits in a repo:

/api/rest/specific-user-commits/<owner>/<repo_name>

Sample Output:

[
  {
    "repository": {
      "defaultBranchRef": {
        "target": {
          "history": {
            "nodes": [
              {
                "additions": 0,
                "author": {
                  "email": "61797592+Atharva7007@users.noreply.github.com",
                  "name": "Atharva Pansare",
                  "user": {
                    "login": "Atharva7007"
                  }
                },
                "authoredDate": "2020-04-03T09:30:17Z",
                "changedFilesIfAvailable": 1,
                "deletions": 0,
                "message": "Add files via upload",
                "parents": {
                  "totalCount": 1
                }
              }
            ],
            "pageInfo": {
              "endCursor": "98ba34a6c62ff6fe7c4d4de5c342a194f72d66e4 0",
              "hasNextPage": true
            },
            "totalCount": 6
          }
        }
      }
    }
  }
]


Get details of all contributors in a repo:

/api/rest/repository-contributors/<owner>/<repo_name>

Sample Output:

[
  {
    "avatar_url": "https://avatars.githubusercontent.com/u/61797592?v=4",
    "events_url": "https://api.github.com/users/Atharva7007/events{/privacy}",
    "followers_url": "https://api.github.com/users/Atharva7007/followers",
    "following_url": "https://api.github.com/users/Atharva7007/following{/other_user}",
    "gists_url": "https://api.github.com/users/Atharva7007/gists{/gist_id}",
    "gravatar_id": "",
    "html_url": "https://github.com/Atharva7007",
    "id": 61797592,
    "login": "Atharva7007",
    "node_id": "MDQ6VXNlcjYxNzk3NTky",
    "organizations_url": "https://api.github.com/users/Atharva7007/orgs",
    "permissions": {
      "admin": true,
      "maintain": true,
      "pull": true,
      "push": true,
      "triage": true
    },
    "received_events_url": "https://api.github.com/users/Atharva7007/received_events",
    "repos_url": "https://api.github.com/users/Atharva7007/repos",
    "role_name": "admin",
    "site_admin": false,
    "starred_url": "https://api.github.com/users/Atharva7007/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/Atharva7007/subscriptions",
    "type": "User",
    "url": "https://api.github.com/users/Atharva7007"
  }
]

Implementation Details

Implementation

The API exposes two families of endpoints that return the same data: one backed by GitHub's REST API and one backed by GraphQL queries. We use Flask Blueprints to keep them apart: all "api/graphql" requests are routed to the GraphQL variants, and all "api/rest" requests are routed to the REST variants.
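
The routing described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the blueprint names, handler names, and `create_app` factory are assumptions, and the handlers return placeholder data instead of calling GitHub. Only the URL paths are taken from the endpoint list above.

```python
from flask import Blueprint, Flask, jsonify

# One blueprint per API flavour; the URL prefix does the routing.
graphql_bp = Blueprint("graphql", __name__, url_prefix="/api/graphql")
rest_bp = Blueprint("rest", __name__, url_prefix="/api/rest")


@graphql_bp.route("/repository-contributors/<owner>/<repo_name>")
def graphql_repository_contributors(owner, repo_name):
    # Placeholder: a real handler would run the GraphQL query here.
    return jsonify({"source": "graphql", "owner": owner, "repo": repo_name})


@rest_bp.route("/repository-contributors/<owner>/<repo_name>")
def rest_repository_contributors(owner, repo_name):
    # Placeholder: a real handler would call the REST client here.
    return jsonify({"source": "rest", "owner": owner, "repo": repo_name})


def create_app():
    """Hypothetical application factory that registers both blueprints."""
    app = Flask(__name__)
    app.register_blueprint(graphql_bp)
    app.register_blueprint(rest_bp)
    return app
```

With this layout, adding a new data set means adding one route to each blueprint, and neither family needs to know about the other.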

Design Patterns used

  1. The REST API client (/backend/app/services/github_query/github_rest/client.py) is a Singleton, so the whole application shares a single client instance.
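
For readers unfamiliar with the pattern, here is a minimal sketch of a Singleton in Python. The class name `RestClient` and its `token` attribute are hypothetical and chosen only for illustration; the project's client.py may implement the pattern differently.

```python
import threading


class RestClient:
    """Hypothetical Singleton: every construction returns the same object."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls, token=None):
        # The lock keeps two threads from creating separate instances.
        with cls._lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance.token = token  # set once, on first construction
                cls._instance = instance
        return cls._instance


first = RestClient(token="first-token")
second = RestClient(token="ignored-token")
print(first is second)
```

Note the trade-off: arguments passed after the first construction are silently ignored, so the token has to be supplied the first time the client is created.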

Testing

To ensure the reliability and robustness of the GitHub Miner, comprehensive testing strategies, including unit tests and integration tests, have been implemented. These tests validate the functionality of the API endpoints across different scenarios, ensuring the system's correctness and stability. The testing suite covers the following areas:

Unit Tests

Unit tests have been developed to test individual components in isolation, ensuring that each part functions correctly on its own. This includes testing the initialization, argument formatting, field formatting, string representation, and equality of query nodes, as well as the correct behavior of the query builders and authenticators. Examples of unit tests include:

- **TestQueryNode**: Validates the initialization, argument formatting, and field formatting of query nodes.

- **TestQuery**: Ensures correct query initialization, argument substitution, and time formatting.

- **TestQueryNodePaginator**: Tests the functionality of the paginator, including initialization, updating, and resetting.

- **TestPaginatedQuery**: Validates the initialization and execution of paginated queries.
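
To illustrate the behavior that a paginated-query test exercises, the sketch below follows `endCursor` until `hasNextPage` is false and checks the result against a stubbed data source. The names `fetch_all`, `fake_fetch`, and `FAKE_PAGES` are hypothetical; this is not the project's test code.

```python
def fetch_all(fetch_page):
    """Collect the nodes from every page.

    `fetch_page(cursor)` must return a dict with "nodes" and "pageInfo",
    where `cursor` is None for the first page.
    """
    nodes, cursor = [], None
    while True:
        page = fetch_page(cursor)
        nodes.extend(page["nodes"])
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            return nodes
        cursor = info["endCursor"]


# Stub that stands in for the network call, keyed by the cursor it receives.
FAKE_PAGES = {
    None: {"nodes": ["c1", "c2"],
           "pageInfo": {"endCursor": "cursor-1", "hasNextPage": True}},
    "cursor-1": {"nodes": ["c3"],
                 "pageInfo": {"endCursor": "cursor-2", "hasNextPage": False}},
}


def fake_fetch(cursor):
    return FAKE_PAGES[cursor]


assert fetch_all(fake_fetch) == ["c1", "c2", "c3"]
```

Stubbing the page source keeps such a test fast and independent of GitHub's availability and rate limits.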

Integration Tests

Integration tests verify the interaction between different components of the system, ensuring that they work together as expected. This includes testing the behavior of the client when making actual requests to the GitHub API, handling authentication, executing queries, and processing paginated responses. Examples of integration tests include:

- **TestClient**: Tests the GitHub client's initialization, header generation, retry logic, query execution, and handling of paginated queries.

- **Authentication Tests**: Validate the generation of correct authorization headers and the handling of personal access tokens.

Together, these tests cover a wide range of scenarios, from successful queries to error handling and rate limiting, and give confidence that the GitHub Miner returns accurate data from GitHub's APIs.

Team Members

Atharva Pansare

Sumedh Limburkar

Viraj Sanap

Mengning Li

Mentor: Jialin Cui

References

GitHub REST API documentation - https://docs.github.com/en/rest?apiVersion=2022-11-28