CSC/ECE 517 Fall 2023 - G2350. Add GitLab support for using GraphQL to query user metrics 1

From Expertiza_Wiki
Revision as of 04:49, 16 November 2023 by Srajara4 (talk | contribs) (Commit links added)

This wiki page describes the changes made for the OSS assignment G2350, "Add GitLab support for using GraphQL to query user metrics 1", in CSC/ECE 517, Fall 2023.

Overview of GitLab

GitLab is a web-based platform that provides a comprehensive set of tools for managing the complete software development lifecycle. It is widely used for version control, continuous integration, issue tracking, code review, and more. GitLab supports both private and public repositories, allowing individuals and teams to collaborate on software projects.

Here's an overview of key features and components of GitLab:

1. Version Control (Git):

  • GitLab is built on top of the Git version control system, allowing users to manage and track changes in their source code efficiently.
  • It supports branching, merging, and distributed workflows, enabling collaborative development.

2. Web-Based Git Repository Management:

  • GitLab provides a web-based interface for managing Git repositories.
  • Users can create, clone, and fork repositories, and they can view the commit history, branches, and tags.

3. Issue Tracking:

  • GitLab includes an integrated issue tracking system that enables teams to manage tasks, bugs, and other project-related issues.
  • Issues can be assigned, labeled, and tracked through various stages of development.

4. Continuous Integration (CI/CD):

  • GitLab CI/CD is a powerful continuous integration and delivery system integrated into GitLab.
  • It allows developers to automate the building, testing, and deployment of their code.
  • CI/CD pipelines are defined using a .gitlab-ci.yml file in the project repository.

5. Code Review:

  • GitLab facilitates code review through its merge request (MR) system.
  • Developers can submit MRs to propose changes, and other team members can review, comment, and approve the changes before merging.

6. Wiki and Documentation:

  • GitLab includes a built-in wiki for each project, providing a space for documentation and collaboration.
  • The wiki can be used to document project-specific information, guidelines, and more.

7. Code Quality and Security:

  • GitLab integrates with various code quality and security tools to analyze code for potential issues.
  • Static code analysis, dependency scanning, and container scanning are among the security features.

8. Permissions and Access Control:

  • GitLab provides fine-grained access control, allowing project owners to manage user roles and permissions.
  • Access can be granted at different levels, including repositories, branches, and CI/CD pipelines.

9. Integration with Third-Party Tools:

  • GitLab integrates with various third-party tools and services, such as Slack, Jira, Jenkins, and more.
  • These integrations enhance collaboration and streamline workflows.

10. Container Registry:

  • GitLab includes a container registry to store and manage Docker images.
  • Teams can utilize this feature to package and distribute their applications using containerization.

11. Monitoring and Analytics:

  • GitLab provides monitoring and analytics features, including performance metrics, error tracking, and operational insights.

GitLab is available in different editions, including a free and open-source Community Edition, as well as a more feature-rich Enterprise Edition for larger organizations. The platform can be self-hosted on-premises or used through GitLab's cloud-based service.

Overview of GraphQL

GraphQL is a query language and runtime for APIs (Application Programming Interfaces) that was developed by Facebook. It provides a more efficient, powerful, and flexible alternative to traditional REST APIs. GraphQL enables clients to request only the data they need, and it allows for more dynamic and intuitive interactions between clients and servers.

Here's an overview of key concepts and features of GraphQL:

1. Declarative Data Fetching:

  • Clients can specify exactly what data they need, and they receive only that data in response.
  • This eliminates over-fetching (receiving more data than necessary) and under-fetching (not receiving enough data) issues commonly associated with REST APIs.

2. Hierarchical Structure:

  • GraphQL queries are hierarchical and mirror the structure of the data being requested.
  • Clients can traverse nested relationships to retrieve related data in a single request.

3. Strongly Typed Schema:

  • GraphQL APIs are defined by a schema that specifies the types of data that can be queried and the relationships between them.
  • The schema serves as a contract between the client and the server, providing a clear definition of what data can be requested.

4. Single Endpoint:

  • Unlike REST APIs, which may have multiple endpoints for different resources, GraphQL typically exposes a single endpoint for all interactions.
  • Clients can request the specific data they need, reducing the need for multiple endpoints.

5. Mutations for Write Operations:

  • While queries are used for read operations, GraphQL uses mutations for write operations such as creating, updating, or deleting data.
  • Mutations are explicitly defined in the schema and are executed sequentially.

6. Real-time Data with Subscriptions:

  • GraphQL supports subscriptions, allowing clients to receive real-time updates when data changes on the server.
  • This is useful for implementing features like live notifications or real-time collaboration.

7. Introspection:

  • GraphQL provides introspection, allowing clients to query the schema itself to discover what types and operations are available.
  • This makes it easy to explore and understand the capabilities of a GraphQL API.

8. Tooling and Ecosystem:

  • GraphQL has a rich ecosystem of tools, libraries, and integrations for various programming languages.
  • Popular GraphQL clients include Apollo Client, Relay, and urql, while server implementations include Apollo Server, Express GraphQL, and others.

9. Security and Efficiency:

  • GraphQL allows servers to define rate limits and depth limits to prevent malicious or inefficient queries.
  • The client has more control over the data it receives, reducing the risk of overloading the network with unnecessary data.

10. Wide Adoption:

  • GraphQL has gained widespread adoption in the tech industry and is used by major companies such as Facebook, GitHub, Shopify, and others.
  • It is actively supported by a large and growing community.

GraphQL is well-suited for scenarios where clients have specific data requirements, and it excels in scenarios such as mobile app development, where minimizing data transfer is crucial. Its flexibility and efficiency make it a popular choice for modern API development.

Project Description

This project is dedicated to retrieving GitLab-specific data associated with a user by their username. Our objective was to fetch a comprehensive array of user-related information, including:

  • User commit comments
  • User issue comments
  • User gist comments
  • User repository discussion comments
  • User profile stats
  • User contribution collection

Using the powerful GraphQL APIs provided by GitLab, we aimed to streamline the extraction process and obtain a holistic view of a user's activity and contributions within the GitLab ecosystem.

Problem Statement

Phase 1:

  • Go through the GitHub queries present in the repo.
  • Execute the demo.py file to check how the code runs for the GitHub queries.
  • Go through the GitLab documentation and write the GitLab queries that correspond to the existing GitHub queries.
  • Indicate which queries are not available in GitLab and which GitLab query was used instead.
  • Ensure the code does not break.


Phase 2:

  • Create a UI to display the data fetched from GitHub/GitLab.
  • Use Flask to create a backend REST API to serve the data.
  • Use React to create a front-end UI.

Workflow

The workflow of the project, as defined by the flow diagram, is as follows:

  • The user first requests user data from GitLab.
  • The request is transformed into a GraphQL query by the GraphQL query builder.
  • The GraphQL query builder sends the query to the GraphQL API.
  • The GraphQL API in turn queries GitLab for the required data.
  • GitLab returns the response to the GraphQL API.
  • The GraphQL API formats the data into a JSON response.
  • The JSON response is returned to the user.
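
The "query builder sends the query to the GraphQL API" step can be sketched as below. This is only an illustration, not the project's client code: the function name `build_graphql_request` and the dummy token are assumptions, and the public endpoint `https://gitlab.com/api/graphql` would be replaced by the host of a self-hosted instance. The request is built but not sent, so the sketch runs without network access.

```python
import json
import urllib.request

# Public GitLab GraphQL endpoint; a self-hosted instance would use its own host.
GITLAB_GRAPHQL_URL = "https://gitlab.com/api/graphql"


def build_graphql_request(query: str, token: str) -> urllib.request.Request:
    """Wrap a GraphQL query string in an HTTP POST request (nothing is sent)."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GITLAB_GRAPHQL_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical username and token, used only to show the call.
request = build_graphql_request(
    '{ user(username: "example-user") { username } }', "dummy-token"
)
# urllib.request.urlopen(request) would send it and return the JSON response.
```

Keeping request construction separate from sending makes the payload and headers easy to check in unit tests with a dummy authenticator.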


File(s) Modified / Added

| File Name | Rationale | Commit link |
|---|---|---|
| 1. commit_comments.py | Queries the user commit comments from GitLab. | https://github.ncsu.edu/jcui9/G2370/commit/bf24212eb90c7e6c1cd6f7b30a9d904ca5c9fc6d |
| 2. discussion_comments.py | Queries the user discussion comments stats from GitLab. | https://github.ncsu.edu/jcui9/G2370/commit/16a2d2a9b8ba131890f0c272550148ddbdd04208 |
| 3. issue_comments.py | Queries the issue comment stats from GitLab. | https://github.ncsu.edu/jcui9/G2370/commit/c0a181edf7f93ea8dbc1b39226670104c327d54c |
| 4. snippet_comments.py | Queries the snippet comments stats from GitLab. | https://github.ncsu.edu/jcui9/G2370/commit/12d348e54d2c8aae37bb3d8064ae79f21964a094 |
| 5. user_login.py | Queries the user login info from GitLab. | https://github.ncsu.edu/jcui9/G2370/commit/16a2d2a9b8ba131890f0c272550148ddbdd04208 |
| 6. user_profile_stats.py | Queries the user profile stats from GitLab. | https://github.ncsu.edu/jcui9/G2370/commit/16a2d2a9b8ba131890f0c272550148ddbdd04208 |
| 7. user_contribution_collections.py | Queries the user contribution stats from GitLab. | https://github.ncsu.edu/jcui9/G2370/commit/ffeb97c2652c4d6f1944d0d03b302c74657e08f7 |

Solutions/Details of Changes Made

1. user_login - Query for basic user login info
Source code: [1](https://github.ncsu.edu/srajara4/G2370/blob/main/gitlab_query/queries/profile/user_login.py)

The `UserLogin` class represents a GraphQL query that retrieves the login information of the user identified by a given username. The query is defined using the `Query` class. Inside the "user" field, the "username" field is requested alongside an "emails" field, whose "nodes" each carry an "email".

GraphQL:

   user(username: "$username") {
       username
       emails {
           nodes {
               email
           }
       }
   }

Python:

   class UserLogin(Query):
       def __init__(self):
           super().__init__(
               fields=[
                   # Root field: look the user up by username.
                   QueryNode(
                       "user",
                       args={"username": "$username"},
                       fields=[
                           "username",
                           # Email addresses are returned as a list of nodes.
                           QueryNode("emails", fields=[
                               QueryNode("nodes", fields=["email"])
                           ])
                       ]
                   )
               ]
           )
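
To make the mapping between the Python definition and the GraphQL text concrete, here is a minimal sketch of how a node tree could be rendered into a query string. `SketchNode` is a hypothetical stand-in written for this page; it is not the project's `QueryNode`, whose actual implementation and output formatting may differ.

```python
from typing import Dict, List, Optional, Union


class SketchNode:
    """Hypothetical stand-in for the project's QueryNode (not its real code)."""

    def __init__(self, name: str,
                 fields: Optional[List[Union[str, "SketchNode"]]] = None,
                 args: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.fields = fields or []
        self.args = args or {}

    def render(self) -> str:
        # Arguments become `name(key: "value", ...)`.
        head = self.name
        if self.args:
            pairs = ", ".join(f'{key}: "{value}"' for key, value in self.args.items())
            head += f"({pairs})"
        # Plain strings are leaf fields; nested nodes render recursively.
        body = " ".join(
            item if isinstance(item, str) else item.render() for item in self.fields
        )
        return f"{head} {{ {body} }}" if body else head


login = SketchNode("user", args={"username": "$username"}, fields=[
    "username",
    SketchNode("emails", fields=[SketchNode("nodes", fields=["email"])]),
])
rendered = login.render()
```

Here `rendered` is the single-line form of the GraphQL query for the login info, with `$username` still left as a placeholder.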


2. user_profile_stats - Query for detailed user profile info
Source code: [2](https://github.ncsu.edu/srajara4/G2370/blob/main/gitlab_query/queries/profile/user_profile_stats.py)

The `UserProfileStats` class represents a GraphQL query that retrieves various metrics and information about a user, such as the username, name, email, creation date, bio and several other metrics related to their GitLab activity. The root field in the query is "user", which accepts a "username" argument identifying the user. Inside the "user" field, the scalar fields "username", "name", "email", "createdAt", "bio" and "organization" are requested. Next to them is a "snippets" field. Inside "snippets" there is a "nodes" field, and each node requests its "id" and a "commenters" field that contains "count".

The "projectMemberships" field contains a "nodes" field. Each node requests its "id" and a "project" field. Inside "project" are the "name" field and a nested "issues" field. Inside "issues" there is a "count" field and a "nodes" field, and each issue node has a "commenters" field that contains "count".

The "starredProjects" and "authoredMergeRequests" fields each request "count". The remaining fields are scalar profile attributes of the user, such as "avatarUrl", "bot", "discord", "linkedin" and "twitter".

GraphQL:

   user(username: "$username") {
       username
       name
       email
       createdAt
       bio
       organization
       snippets {
           nodes {
               id
               commenters {
                   count
               }
           }
       }
       projectMemberships {
           nodes {
               id
               project {
                   name
                   issues {
                       count
                       nodes {
                           commenters {
                               count
                           }
                       }
                   }
               }
           }
       }
       starredProjects {
           count
       }
       authoredMergeRequests {
           count
       }
       avatarUrl
       bot
       commitEmail
       discord
       gitpodEnabled
       groupCount
       id
       jobTitle
       linkedin
       location
       name
       preferencesGitpodPath
       profileEnableGitpodPath
       pronouns
       publicEmail
       state
       twitter
       webPath
       webUrl
   }



Python:

   class UserProfileStats(Query):
       def __init__(self):
           super().__init__(
               fields=[
                   QueryNode(
                       "user",
                       args={"username": "$username"},
                       fields=[
                           "username",
                           "name",
                           "email",
                           "createdAt",
                           "bio",
                           "organization",
                           # Commenter counts on the user's snippets.
                           QueryNode("snippets", fields=[
                               QueryNode("nodes", fields=[
                                   "id",
                                   QueryNode("commenters", fields=["count"])
                               ])
                           ]),
                           # Issue activity in each project the user belongs to.
                           QueryNode("projectMemberships", fields=[
                               QueryNode("nodes", fields=[
                                   "id",
                                   QueryNode("project", fields=[
                                       "name",
                                       QueryNode("issues", fields=[
                                           "count",
                                           QueryNode("nodes", fields=[
                                               QueryNode("commenters", fields=["count"])
                                           ])
                                       ])
                                   ])
                               ])
                           ]),
                           QueryNode("starredProjects", fields=["count"]),
                           QueryNode("authoredMergeRequests", fields=["count"]),
                           # Remaining scalar profile attributes.
                           "avatarUrl",
                           "bot",
                           "commitEmail",
                           "discord",
                           "gitpodEnabled",
                           "groupCount",
                           "id",
                           "jobTitle",
                           "linkedin",
                           "location",
                           "name",
                           "preferencesGitpodPath",
                           "profileEnableGitpodPath",
                           "pronouns",
                           "publicEmail",
                           "state",
                           "twitter",
                           "webPath",
                           "webUrl"
                       ]
                   )
               ]
           )


3. IssueComments - Query for retrieving issue comments made by a user
Source code: [3](https://github.ncsu.edu/srajara4/G2370/blob/main/gitlab_query/queries/comments/issue_comments.py)

`IssueComments` represents the GitLab counterpart of the query for issue comments made by a user. In GitLab we were only able to get the number of commenters on each issue, not the comments themselves. The query structure includes a "user" field with the username as an argument. Inside "user" there is a nested "projectMemberships" field, which contains "nodes", and each node contains a "project" field. "issues" is nested inside "project" and requests a "count" field and a "nodes" field. Each issue node requests "createdAt", the creation date of the issue, and a nested "commenters" field. "commenters" requests "count" and a nested "pageInfo" field with the fields "endCursor" and "hasNextPage".

GraphQL:


   user(username: "$username"){
       projectMemberships {
           nodes {
               project {
                   issues {
                       count
                       nodes {
                           createdAt
                           commenters {
                               count
                               pageInfo {
                                   endCursor
                                   hasNextPage
                               }
                           }
                       }
                   }
               }
           }
       }
   }



Python:

   class IssueComments(Query):
       def __init__(self):
           super().__init__(
               fields=[
                   QueryNode(
                       "user",
                       args={"username": "$username"},
                       fields=[
                           QueryNode("projectMemberships", fields=[
                               QueryNode("nodes", fields=[
                                   QueryNode("project", fields=[
                                       QueryNode("issues", fields=[
                                           "count",
                                           QueryNode("nodes", fields=[
                                               "createdAt",
                                               # Only the number of commenters is available.
                                               QueryNode("commenters", fields=[
                                                   "count",
                                                   QueryNode("pageInfo", fields=[
                                                       "endCursor",
                                                       "hasNextPage"
                                                   ])
                                               ])
                                           ])
                                       ])
                                   ])
                               ])
                           ])
                       ]
                   )
               ]
           )
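
Because the GitLab query only exposes commenter counts per issue, a consumer typically has to aggregate them. The sketch below sums `commenters.count` over a response shaped like the query above. The function name `total_issue_commenters` and the sample response are made up for illustration; real responses come from the GitLab API.

```python
from typing import Any, Dict


def total_issue_commenters(response: Dict[str, Any]) -> int:
    """Sum commenters.count over every issue of every project in the response."""
    total = 0
    user = response.get("data", {}).get("user") or {}
    memberships = (user.get("projectMemberships") or {}).get("nodes") or []
    for membership in memberships:
        project = membership.get("project") or {}
        issues = (project.get("issues") or {}).get("nodes") or []
        for issue in issues:
            total += (issue.get("commenters") or {}).get("count", 0)
    return total


# Made-up response with the same shape as the IssueComments query.
sample = {"data": {"user": {"projectMemberships": {"nodes": [
    {"project": {"issues": {"count": 2, "nodes": [
        {"createdAt": "2023-01-01T00:00:00Z", "commenters": {"count": 3}},
        {"createdAt": "2023-02-01T00:00:00Z", "commenters": {"count": 1}},
    ]}}},
    {"project": {"issues": {"count": 0, "nodes": []}}},
]}}}}

total = total_issue_commenters(sample)
```

The `or {}` and `or []` fallbacks let the helper return 0 when the user is unknown or a level of the response is null, instead of raising.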


4. UserContributionsCollection - Query for retrieving a user's contributions in a time range
Source code: [4](https://github.ncsu.edu/srajara4/G2370/blob/main/gitlab_query/queries/time_range_contributions/user_contribution_collections.py)

`UserContributionsCollection` represents a GraphQL query for retrieving a user's contributions within a time range. The query structure includes a "user" field with the username as an argument. Inside the "user" field there is a nested field "groupMemberships", which contains "nodes", and each node contains a "group" field. Inside "group" there is a "contributions" field, which takes the arguments "from" and "to" (filled from $start and $end). The "nodes" field nested inside "contributions" requests the fields "issuesClosed", "totalEvents", "issuesCreated", "mergeRequestsApproved", "mergeRequestsClosed", "mergeRequestsCreated", "mergeRequestsMerged" and "repoPushed", plus a "user" field that requests "name". Inside "contributions" there is also a "pageInfo" field with the fields "hasNextPage" and "endCursor".

GraphQL:


{

   user(username: "$username") {
       groupMemberships {
           nodes {
               group {
                   contributions(from: "$start", to: "$end") {
                       nodes {
                           issuesClosed
                           totalEvents
                           issuesCreated
                           mergeRequestsApproved
                           mergeRequestsClosed
                           mergeRequestsCreated
                           mergeRequestsMerged
                           repoPushed
                           user {
                               name
                           }
                       }
                       pageInfo {
                           hasNextPage
                           endCursor
                       }
                   }
               }
           }
       }
   }

}



Python:

   class UserContributionsCollection(Query):
       def __init__(self):
           super().__init__(
               fields=[
                   QueryNode(
                       "user",
                       args={"username": "$username"},
                       fields=[
                           QueryNode("groupMemberships", fields=[
                               QueryNode("nodes", fields=[
                                   QueryNode("group", fields=[
                                       # Contributions between the $start and $end dates.
                                       QueryNode("contributions", args={"from": "$start", "to": "$end"}, fields=[
                                           QueryNode("nodes", fields=[
                                               "issuesClosed",
                                               "totalEvents",
                                               "issuesCreated",
                                               "mergeRequestsApproved",
                                               "mergeRequestsClosed",
                                               "mergeRequestsCreated",
                                               "mergeRequestsMerged",
                                               "repoPushed",
                                               QueryNode("user", fields=["name"])
                                           ]),
                                           QueryNode("pageInfo", fields=["hasNextPage", "endCursor"])
                                       ])
                                   ])
                               ])
                           ])
                       ]
                   )
               ]
           )
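
The `$username`, `$start` and `$end` placeholders have to be replaced with concrete values before the query is sent. One way to do this is Python's `string.Template`, sketched below. Whether the project substitutes its placeholders this way is an assumption, and `fill_query`, the trimmed query text and the sample values are illustrative only.

```python
from string import Template

# Trimmed copy of the contributions query; placeholders use the $name form.
QUERY_TEMPLATE = Template(
    '{ user(username: "$username") { groupMemberships { nodes { group { '
    'contributions(from: "$start", to: "$end") { nodes { totalEvents } } '
    '} } } } }'
)


def fill_query(username: str, start: str, end: str) -> str:
    """Return the query text with all three placeholders substituted."""
    # substitute() raises KeyError if a placeholder is left without a value.
    return QUERY_TEMPLATE.substitute(username=username, start=start, end=end)


# Hypothetical username and date range.
query = fill_query("example-user", "2023-01-01", "2023-03-31")
```

Using `substitute` rather than `safe_substitute` means a forgotten placeholder fails loudly instead of being sent to the API as literal `$start`.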

5. CommitComments - Query for retrieving commit comments made by a user
Source code: [5](https://github.ncsu.edu/srajara4/G2370/blob/main/gitlab_query/queries/comments/commit_comments.py)

`CommitComments` represents the GitLab counterpart of the query for commit comments made by a user. The query structure includes a "user" field with the username as an argument. Inside the "user" field there is a nested field "groupMemberships". Nested inside "groupMemberships" are, in order, "nodes", "group" and "releases". There is a "nodes" field nested inside "releases" and a "commit" field nested inside "nodes". The "commit" field requests "authorEmail", "committedDate", "committerName" and "author". The "author" field requests "name".

GraphQL:


{

   user(username: "$username"){
       username
       groupMemberships {
           nodes {
               group {
                   releases {
                       nodes {
                           commit {
                               authorEmail
                               committedDate
                               committerName
                               author {
                                   name
                               }
                           }
                       }
                   }
               }
           }
       }
   }

}



Python:

   class CommitComments(Query):
       def __init__(self):
           super().__init__(
               fields=[
                   QueryNode(
                       "user",
                       args={"username": "$username"},
                       fields=[
                           QueryNode("groupMemberships", fields=[
                               QueryNode("nodes", fields=[
                                   QueryNode("group", fields=[
                                       QueryNode("releases", fields=[
                                           QueryNode("nodes", fields=[
                                               # Commit attached to each release of the group.
                                               QueryNode("commit", fields=[
                                                   "authorEmail",
                                                   "committedDate",
                                                   "committerName",
                                                   QueryNode("author", fields=["name"])
                                               ])
                                           ])
                                       ])
                                   ])
                               ])
                           ])
                       ]
                   )
               ]
           )



6. SnippetComments - Query for retrieving snippet comments made by a user
Source code: [6](https://github.ncsu.edu/srajara4/G2370/blob/main/gitlab_query/queries/comments/snippet_comments.py)

`SnippetComments` represents the GitLab counterpart of the query for gist comments made by a user; snippets are GitLab's equivalent of gists. The query structure includes a "user" field with the username as an argument. Inside the "user" field there is a node named "snippets". Inside "snippets" there are 2 fields, "nodes" and "pageInfo". Inside "nodes" there is a field named "commenters", which has another field named "count". Inside "pageInfo" there are 2 more fields, "hasNextPage" and "endCursor".

GraphQL:


   user(username: "$username"){
       snippets {
           nodes {
               commenters {
                   count
               }
           }
           pageInfo {
               hasNextPage
               endCursor
           }
       }
   }



Python:

   class SnippetComments(Query):
       def __init__(self):
           super().__init__(
               fields=[
                   QueryNode(
                       "user",
                       args={"username": "$username"},
                       fields=[
                           QueryNode("snippets", fields=[
                               # Number of commenters on each snippet.
                               QueryNode("nodes", fields=[
                                   QueryNode("commenters", fields=["count"])
                               ]),
                               # Cursor information for fetching further pages.
                               QueryNode("pageInfo", fields=["hasNextPage", "endCursor"])
                           ])
                       ]
                   )
               ]
           )
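
The "pageInfo" fields exist so that a client can walk through results page by page. The sketch below shows the general cursor loop, with a fake in-memory page source standing in for real API calls. `iterate_nodes`, `fetch_page` and `FAKE_PAGES` are hypothetical names. Note also that the query as written above does not pass a cursor, so a real implementation would additionally have to supply `endCursor` as the `after` argument of the "snippets" connection.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def iterate_nodes(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield nodes from every page, following pageInfo.endCursor."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["nodes"]
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            break
        cursor = info["endCursor"]


# Fake two-page connection standing in for real API calls.
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {"nodes": [{"commenters": {"count": 2}}],
           "pageInfo": {"hasNextPage": True, "endCursor": "c1"}},
    "c1": {"nodes": [{"commenters": {"count": 5}}],
           "pageInfo": {"hasNextPage": False, "endCursor": None}},
}

counts: List[int] = [
    node["commenters"]["count"] for node in iterate_nodes(FAKE_PAGES.__getitem__)
]
```

Passing the page source in as a callable keeps the loop independent of how pages are fetched, which also makes it straightforward to unit test.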


7. DiscussionComments - Query for retrieving discussion comments made by a user
Source code: [7](https://github.ncsu.edu/srajara4/G2370/blob/main/gitlab_query/queries/comments/discussion_comments.py)

`DiscussionComments` represents a GraphQL query for retrieving discussion comments made by a user. The query structure includes a "user" field with the username as an argument. Inside the "user" field there is a nested field named "groupMemberships". Inside "groupMemberships" there is a "nodes" field, and each node contains a "group" field, which has a nested field named "epic". Inside "epic" there is a nested field named "discussions". There is a "nodes" field nested inside "discussions", which requests the fields "id" and "createdAt".

GraphQL:


   user(username: "$username"){
       username
       groupMemberships {
           nodes {
               group {
                   epic {
                       discussions {
                           nodes {
                               id
                               createdAt
                           }
                       }
                   }
               }
           }
       }
   }


Python:

   class DiscussionComments(Query):
       def __init__(self):
           super().__init__(
               fields=[
                   QueryNode(
                       "user",
                       args={"username": "$username"},
                       fields=[
                           "username",
                           QueryNode("groupMemberships", fields=[
                               QueryNode("nodes", fields=[
                                   QueryNode("group", fields=[
                                       QueryNode("epic", fields=[
                                           # Discussions attached to the group's epic.
                                           QueryNode("discussions", fields=[
                                               QueryNode("nodes", fields=[
                                                   "id",
                                                   "createdAt"
                                               ])
                                           ])
                                       ])
                                   ])
                               ])
                           ])
                       ]
                   )
               ]
           )


Test Plan

For the testing of this project, we added several unit tests along with manual testing. For manual testing, we ran the queries against several popular GitLab users with many commits, contributions, etc.

For the unit tests, we have a file called conftest_gitlab.py in which we test the client details, that is, the base path, headers and authentication. For that, we created a dummy authenticator and tested all the client methods.

Given below is the screenshot of the tests passing for the same.

We then wrote individual unit test methods for each query inside test_gitlab.py, namely tests for user login, user profile stats, issue comments, commit comments, discussion comments and user contributions collection.

Given below is the screenshot which shows that all the tests pass.

We have also added a few helper methods that check for an invalid date format, an invalid date range, and whether a value is in a valid GitLab format.
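
The date checks mentioned above could look roughly like the following. This is a sketch, not the project's helper code: the names `parse_date` and `is_valid_range` and the `YYYY-MM-DD` format are assumptions.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed format; the project's helpers may differ


def parse_date(text: str) -> datetime:
    """Parse a YYYY-MM-DD string, raising ValueError on a bad format."""
    return datetime.strptime(text, DATE_FORMAT)


def is_valid_range(start: str, end: str) -> bool:
    """Return True when both dates parse and start is not after end."""
    try:
        return parse_date(start) <= parse_date(end)
    except ValueError:
        return False
```

Checks like these can run before the `from` and `to` arguments of the contributions query are filled in, so that malformed ranges are rejected without calling the API.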


Relevant Links


Team Mentor

  • Jialin Cui

Team Members

  • Saikrishna Rajaraman (srajara4)
  • Siddharth Anand (sanand8)
  • Yash Chandrani (ychandr)