One of my recurring activities is searching GitHub for interesting projects: I am often pleasantly surprised to find projects, big and small, that are very useful (or fascinating, if sometimes useless).
Very often I discover these projects by accident, and some of them would be virtually impossible to find just by browsing in a linear way.
Drawing on the techniques I know and use in my work at Elif Lab, I wanted to give chance a helping hand. So I built a small tool that lets me fish in the sea of repositories not completely blindly.
The idea is simple (more or less): if a project interests me, it is also likely to interest a group of people whose tastes are similar to mine.
In short: I tried to see which informal networks form around a repository.
GitHub gives rise to different types of networks, through the different ways in which users interact and the different actions they can perform on repositories (star, fork, watch, etc.).
For this experiment I decided to use the fork, perhaps the most basic action a GitHub user can perform.
Forking lets a GitHub user create a copy of a repository in order to develop the code independently.
Forks can enrich the original project, when their changes are merged back, or they can end up splitting the active community itself.
So why use the fork? Because this action indicates interest in the content of a project.
Where to start?
I decided to start from one of the repositories that best help you navigate other projects: Awesome Rust, https://github.com/rust-unofficial/awesome-rust.
The various “Awesome xyz” repositories are curated lists of projects: they are often well made and are excellent starting points for exploring a topic.
Awesome Rust is dedicated to the Rust programming language, an interesting language in which, among other things, the Redox operating system is written.
I started from all the accounts that had forked the awesome-rust project and collected the list of repositories associated with each of them.
To simplify the process: I built the network formed by users and projects (so users are linked by the projects they share as forks), took the subnetwork of the nodes with the highest PageRank and split it into communities.
In short, I used a few techniques under the hood, with an arbitrary choice of some parameters 🙂
The result is an interesting collection of small bipartite networks that, as I had hoped, bring together several projects I like and that are fun to analyse.
Here are some details about the ones I found most interesting:
As one might expect, there is a large community dedicated to Rust, made up of the main projects that revolve around the language: https://github.com/rust-lang/rust
This community is strongly oriented towards deep learning, data science and machine learning.
Notably, it includes important projects written in languages other than Rust, such as Caffe, TensorFlow and neuraltalk.
Interesting repositories here are:
The great book that shows some interesting applications of Bayesian models: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
Awesome Artificial Intelligence: https://github.com/owainlewis/awesome-artificial-intelligence
And the very useful https://github.com/BurntSushi/xsv, which lets you work with very large CSV files from the command line.
This community is more closely tied to Rust: here the Rust port of Shadowsocks, https://github.com/shadowsocks/shadowsocks-rust, is central.
Here you can also find the curious project https://github.com/nvbn/thefuck, which corrects errors in the commands you type in the console, and a Chinese project that is looking for an English translation (the README asks for one, in English): https://github.com/rustcc/RustPrimer
This community contains a large collection of “awesome” projects, including https://github.com/sindresorhus/awesome, the linchpin of the movement.
Repositories worth noting:
https://github.com/alex/what-happens-when “This repository is an attempt to answer the age old interview question ‘What happens when you type google.com into your browser’s address box and press enter?’”;
https://github.com/avelino/awesome-go, the awesome list for the Go language, which is in some ways similar to Rust;
http://github.com/souravchk/cheatsheets-ai “Essential Cheat Sheets for deep learning and machine learning researchers”;
https://github.com/Miserlou/Zappa “Serverless Python Web Services”;
And https://github.com/sympy/sympy, a computer algebra system written in pure Python.
Some notable projects in other communities:
https://github.com/BurntSushi/ripgrep “a line-oriented search tool that recursively searches your current directory for a regex pattern while respecting your gitignore rules”.
I started from the repository: https://github.com/rust-unofficial/awesome-rust
Using the GitHub API, I collected the list of accounts that had forked the repository and, for each account, the list of all of its repositories.
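For anyone who wants to try something similar, here is a minimal sketch of the collection step in Python, using only the standard library. It relies on two public GitHub REST API endpoints, `GET /repos/{owner}/{repo}/forks` and `GET /users/{username}/repos`, paginated with `per_page` and `page`. The function names are hypothetical, and keying repositories by their bare name (so that forks of the same project coincide) is my assumption, not necessarily what the original tool did.

```python
import json
import urllib.request
from typing import Callable, Dict, List

API = "https://api.github.com"
Fetch = Callable[[str], list]


def fetch_json(url: str) -> list:
    """GET one page from the GitHub REST API (unauthenticated, so heavily rate-limited)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_all(path: str, get: Fetch = fetch_json) -> list:
    """Request page 1, 2, ... (100 items each) until an empty page comes back."""
    items: list = []
    page = 1
    while True:
        batch = get(f"{API}{path}?per_page=100&page={page}")
        if not batch:
            return items
        items.extend(batch)
        page += 1


def repos_of_forkers(owner: str, repo: str, get: Fetch = fetch_json) -> Dict[str, List[str]]:
    """Map each account that forked owner/repo to the names of all its repositories."""
    forks = fetch_all(f"/repos/{owner}/{repo}/forks", get)
    logins = sorted({fork["owner"]["login"] for fork in forks})
    # Assumption (hypothetical choice): repositories are keyed by bare name, so
    # "alice/xsv" and "bob/xsv" (two forks of the same project) become one node.
    return {
        login: sorted(r["name"] for r in fetch_all(f"/users/{login}/repos", get))
        for login in logins
    }
```

A call such as `repos_of_forkers("rust-unofficial", "awesome-rust")` would start the crawl; because `get` is a parameter, a stub can replace the network for offline testing. Keying by name is a cheap approximation: unrelated repositories with common names (say, `dotfiles`) would also merge, and resolving each fork to its real upstream would cost one extra request per repository. Unauthenticated requests are limited to 60 per hour, so a real run would need an access token.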
I then generated the network of repositories.
In this network, nodes represent repositories, and each edge is weighted by the number of accounts that have both of the repositories it connects.
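The weighting just described is a co-occurrence count, which can be sketched in a few lines. This assumes the input maps each account to a list of repository names, as collected through the GitHub API; `repo_network` is a hypothetical name, and edges are stored as sorted `(a, b)` pairs in a `Counter`.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def repo_network(repos_by_user: Dict[str, List[str]]) -> Counter:
    """Weighted co-occurrence network of repositories.

    Key (a, b) with a < b is an undirected edge; its weight is the number of
    accounts whose repository list contains both a and b.
    """
    weights: Counter = Counter()
    for repos in repos_by_user.values():
        # Deduplicate and sort so each unordered pair is counted once per account.
        for a, b in combinations(sorted(set(repos)), 2):
            weights[(a, b)] += 1
    return weights
```

For example, if alice has `rust`, `xsv` and `ripgrep` and bob has `rust` and `xsv`, the edge `("rust", "xsv")` gets weight 2 and the other two edges weight 1. An account with n repositories contributes n(n−1)/2 pairs, so a few very prolific accounts can dominate both the cost and the edge weights.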
I computed PageRank to find the most important nodes (drawn with a larger radius in the image) and extracted the subnetwork formed by the most central ones.
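Weighted PageRank and the top-k cut can be sketched as follows: a plain power iteration on the undirected repository network, where each node spreads its rank to its neighbours in proportion to edge weight. The damping factor of 0.85, the iteration limits and the cut-off `k` are illustrative choices, not the parameters behind the image, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edges = Dict[Tuple[str, str], float]


def pagerank(edges: Edges, damping: float = 0.85, iters: int = 200, tol: float = 1e-12) -> Dict[str, float]:
    """Weighted PageRank by power iteration on an undirected graph.

    Assumes every edge weight is positive, so no node is dangling.
    """
    nbrs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (a, b), w in edges.items():
        nbrs[a][b] = w  # undirected: store both directions
        nbrs[b][a] = w
    n = len(nbrs)
    strength = {v: sum(ws.values()) for v, ws in nbrs.items()}
    rank = {v: 1.0 / n for v in nbrs}
    for _ in range(iters):
        # Teleport term, then each node spreads its rank in proportion to edge weight.
        new = {v: (1.0 - damping) / n for v in nbrs}
        for v, ws in nbrs.items():
            share = damping * rank[v] / strength[v]
            for u, w in ws.items():
                new[u] += share * w
        converged = sum(abs(new[v] - rank[v]) for v in nbrs) < tol
        rank = new
        if converged:
            break
    return rank


def top_subnetwork(edges: Edges, rank: Dict[str, float], k: int) -> Edges:
    """Keep the k highest-ranked nodes and the edges among them."""
    keep = set(sorted(rank, key=rank.get, reverse=True)[:k])
    return {(a, b): w for (a, b), w in edges.items() if a in keep and b in keep}
```

On a star with three leaves, the hub ends up with about 0.48 of the total rank. In practice a library routine such as networkx's `pagerank` (with `alpha` as the damping factor and `weight` naming the edge attribute) does the same job.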
Finally, I applied the Louvain algorithm, with a standard probabilistic model, to bring out the communities.
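Louvain maximizes modularity, which compares the weight inside each community with what a random null model would predict. I assume the “standard probabilistic model” is the usual configuration-model null model, where the expected weight between i and j is k_i k_j / 2m. The sketch below implements only Louvain’s first phase: single nodes greedily move to the neighbouring community with the largest modularity gain. The full algorithm then collapses each community into one node and repeats. Function names are hypothetical; recent versions of networkx provide a complete implementation as `louvain_communities`.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edges = Dict[Tuple[str, str], float]


def modularity(edges: Edges, community: Dict[str, int]) -> float:
    """Q = sum over communities c of L_c / m - (d_c / 2m)^2 (configuration-model null)."""
    m = sum(edges.values())
    inside: Dict[int, float] = defaultdict(float)  # L_c: edge weight inside c
    degree: Dict[int, float] = defaultdict(float)  # d_c: total strength of c
    for (a, b), w in edges.items():
        degree[community[a]] += w
        degree[community[b]] += w
        if community[a] == community[b]:
            inside[community[a]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def louvain_local_moving(edges: Edges, max_passes: int = 100) -> Dict[str, int]:
    """First phase of Louvain: greedily move single nodes between communities."""
    nbrs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (a, b), w in edges.items():
        nbrs[a][b] = w
        nbrs[b][a] = w
    two_m = 2 * sum(edges.values())
    strength = {v: sum(ws.values()) for v, ws in nbrs.items()}
    community = {v: i for i, v in enumerate(nbrs)}  # start from singletons
    total = {community[v]: strength[v] for v in nbrs}  # sum of strengths per community
    for _ in range(max_passes):
        moved = False
        for v, ws in nbrs.items():
            old = community[v]
            total[old] -= strength[v]  # take v out of its community
            links: Dict[int, float] = defaultdict(float)
            for u, w in ws.items():
                links[community[u]] += w
            # Gain of joining c, up to a constant factor: k_v,in(c) - total(c) * k_v / 2m
            best = old
            best_gain = links[old] - total[old] * strength[v] / two_m
            for c, k_in in links.items():
                gain = k_in - total[c] * strength[v] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            total[best] += strength[v]
            community[v] = best
            moved = moved or best != old
        if not moved:
            break
    return community
```

On two triangles joined by a single edge, the local-moving phase separates the two triangles, and the resulting partition has modularity 5/14. Only strictly positive improvements trigger a move, so each pass can only increase modularity.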
Interested in this or other data analysis techniques?
Need some help with your project? Contact us at firstname.lastname@example.org