Rewire Annotation project


For the past month, I have been participating in Rewire’s abusive speech detection project as an annotator. It’s been an amazing few weeks in which I learned what such a project looks like from the inside and met very interesting people from the NLP field.

About Rewire

Rewire is a start-up established by two researchers from the Alan Turing Institute, Bertie Vidgen and Paul Röttger, who were both involved in similar projects before deciding to start their own company. Rewire delivers socially responsible AI solutions for online safety. Their proprietary algorithm is trained for various problems, ranging from hate speech to sexism detection or, in this case, abuse identification.

Why Did I Participate?

I learned about this project through one of my colleagues at the Social Informatics Research Group and, since it was related to my current research, I decided to apply. I wanted to learn what such a project looks like from the inside, explore the team’s working dynamics and see what is expected from the participants. On top of that, I was hoping to meet people from the AI field so that I could incorporate some of their expertise into my own work.

How did it go?

Throughout the project, I enjoyed the working environment created by the company. From the beginning, we were given a clear picture of what the whole project would look like and what was expected from us. I was surprised by how much our input was considered in the entire annotation process. We were given a chance to discuss harder cases and could always count on help from the expert annotators, who provided explanations on top of the answers. That way, the entire process ran smoothly and deadlines could be easily met. The Rewire team was also really flexible when it came to working conditions, which made working for them an amazing experience. Moreover, we were able to discuss the results and any doubts during weekly meetings, which gave the team the feeling of being involved in the process and, in my opinion, resulted in greater productivity.

During the project, I learned how severe the problem of abuse on the internet really is. After the first batch of entries, I immediately understood the importance of such work. Users are getting more sophisticated, writing their abusive entries in ways that traditional safety systems fail to catch. Working towards something with a positive impact gave me the motivation to excel at the job.

What Have I Learned?


First of all, I explored how such a project progresses. If I were to use a similar method in the future for my research or even my PhD, I now know where to start, what tools I can use for communication and what remuneration is appropriate for such a task. The knowledge I gained will definitely help me set up a similar task in the future, and it has made me more confident about working in a large team environment.

As for my second objective, I was able to establish contact with the company’s CTO, Paul Röttger, who was kind enough to organise a meeting to discuss the algorithms used by the company and to share some insights about the project itself and his personal experience in the field. Hopefully, I will be able to discuss my work with him and hear feedback from an experienced AI expert. I want my research to be as impactful as possible for LM experts, policymakers and, most importantly, job seekers. To do that, I need to ensure that all of the inefficiencies are addressed and that the methods I use lead to the best possible results. Having a chance to consult someone with this experience at such an early stage of research is extraordinary and likely to improve the entire process.

Thank you, Rewire, for such a great experience. I am looking forward to your future projects!

 

How to Read Machine Learning Papers

The Importance of Literature Review

The literature review is an essential part of every research project. It helps to identify potential areas of development and limitations in the current research and, most importantly, to gain essential knowledge of the topic. While many papers are easy to absorb even for people from outside the area of expertise, others, with their domain-specific jargon, can prove difficult to digest even for people from within the field.

In my case, it is usually the ML papers that I have a hard time understanding. With their heavy reliance on mathematical theory and complex explanations of the models applied, they usually require multiple reads to fully comprehend. Based on information from other blogs and websites, I managed to develop a system that makes reading and understanding ML papers easier for me.

Why Can ML Papers Be Hard to Understand?


The main reason behind the “challenging” nature of many ML papers is possibly their interdisciplinary character. To deliver an effective ML model, it is necessary to consider statistics, mathematics, programming and, in some cases, economics and finance theory. Given this complexity, it is normal for such papers to be difficult to understand. Therefore, the main recommendation for reading ML papers is:

Do not get frustrated if you can’t grasp all of the concepts on the first read

This matters when reading any paper, but in my opinion it is crucial when it comes to understanding ML papers and using my guide.

Secondly:

Take regular breaks: at least 5 minutes for every hour of work

It is crucial to give your brain a rest, especially when working on a computer and looking at the screen all the time. Make yourself a coffee, meditate, sit down somewhere else: anything that makes you get up from your chair and change your environment (even for a short while).

Summary for Reading ML Papers

The guide below is a summary of Andrew Ng’s lecture on how to read ML papers, with the addition of my personal tips and other information I have found online on the topic (the rest of the sources are listed in the references section). A “manual” on how to read ML papers might prove useful to some of you, so I decided to collect all of this information in one place in the form of a structured guide.

First Pass- Title -> Abstract -> Graphs

Reading every ML paper should begin with identifying the Context of the research, which can be easily inferred from the paper’s title and abstract. Moreover, people generally process images faster than text and, more importantly, are more likely to remember information presented in pictures. So when it comes to ML papers, familiarizing yourself with the figures can complement the information from the abstract.

Second Pass- Introduction and Conclusion (ONLY!)

The main idea behind skipping the rest of the text and focusing on the introduction and conclusion is that they are likely to contain all of the information on the author’s Motivation and Results. Understanding those two is essential to comprehend the tools and techniques used in an article. In my experience, they often make novel and complex concepts easier to grasp by providing the reader with the bigger picture of the paper. Although they usually contain less domain-specific jargon, they mention the concepts and terms that are crucial to the topic. You can easily spot challenging expressions and learn about them before tackling the whole document.

However, remember that if you are new to a field, even the introduction and conclusion might require some more time to understand.

Third Pass- Read the Whole Article

Provided you have already gone through the first two steps, you should have sufficient knowledge to understand how the authors have Implemented their methodology. Very often, ML papers (the good ones should) will include mathematical equations to explain the basis of the theory or model. In my experience, if you have read a formula a couple of times and still have a hard time understanding it, just skip it (for now). Do not get stuck reading it over and over again in the hope of finally understanding the concept. As you progress through the text, you are more likely to understand the maths behind it, so keep reading and don’t get frustrated with complex mathematical concepts.

Fourth Pass- Read the Article Again 😉 

Are you still there? Good, now go over the whole thing again! As harsh as it might sound, it is often necessary to read an article a couple of times to fully understand it. However, bear in mind that if there are still areas that you have a hard time understanding, it is better to skip them. It is very likely that you will not understand all of the concepts mentioned in the article. As long as you feel comfortable with what the authors did and how they did it, that should be enough. It is all part of the learning process, and we are unable to comprehend everything in one go. Try exploring other authors’ work or related research to learn which concepts are crucial and which no longer are.

What about the Code?

Some ML papers include the code that the authors used. In some cases, authors put their code on GitHub, sometimes with thorough explanations of their programming choices. If you are interested in using the authors’ approach, you can download the code from the linked repository and try to run it yourself, or recreate it using the methods you know.

Conclusion

Hopefully, reading ML articles will now be less of a struggle and more of a joy for you. Remember that ML is a rapidly developing discipline, so it can initially be confusing and hard to understand. However, I hope that this guide will make the whole process more pleasant.


References: