There will be signs that the second phase is coming to an end. First, your monthly gains will start to diminish. You'll begin to see trade-offs between metrics: some will rise and others will fall in certain experiments. This is where it gets interesting. Because gains are harder to achieve, the machine learning has to become more sophisticated. One caveat: this section has more blue-sky rules than earlier sections. We've seen many teams go through the happy days of Phase I and Phase II machine learning; once Phase III is reached, teams must find their own path. In total, the document contains 43 rules.

Usually, the problem that machine learning is trying to solve is not entirely new. There is an existing ranking or classification system for the problem you are trying to solve.
This means that there are a number of existing rules and heuristics. These same heuristics can give you a lift when tweaked with machine learning. Your heuristics should be mined for whatever information they contain, for two reasons. First, the transition to a machine-learned system will be smoother. Second, those rules usually contain a lot of intuition about the system that you don't want to throw away. There are four ways you can use an existing heuristic: preprocess using the heuristic, create a feature from it, mine its raw inputs, or modify the label.

Quality ranking is a fine art, but spam filtering is a war. The signals you use to determine high-quality posts will become obvious to those who use your system, and they will tweak their posts to have those properties. Therefore, your quality ranking should focus on ranking content that is posted in good faith. You should not heavily penalize the quality-ranking learner for ranking spam highly. Similarly, “racy” content should be handled separately from quality ranking. Spam filtering is a different story.
You should expect the features you need to generate to change constantly. Often there will be obvious rules that you put into the system (if a post has more than three spam votes, do not retrieve it, and so on; a sketch of this pattern follows below). Any learned model will have to be updated daily, if not faster. The reputation of the content's creator will play a big role.

Martin's work is extensive and informative, and I am very pleased to take an in-depth look at it. The idea behind the document is to provide a set of guidelines and style rules for practical machine learning, similar to popular guides to practical programming. The original document is linked here. I would suggest reading the document from start to finish.
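Returning to the spam-filtering rules mentioned above, here is a minimal Python sketch of the pattern. All names here are hypothetical illustrations, not from Zinkevich's document: hand-coded rules act as a hard filter, while the old heuristic's score and the creator's reputation are mined as features for a frequently retrained model.

```python
# A minimal sketch (all names hypothetical): hard-coded rules filter first,
# then a learned model scores whatever survives.

def passes_hard_rules(post):
    """Hand-coded rule that sits outside the learned model: if a post has
    more than three spam votes, do not retrieve it."""
    return post["spam_votes"] <= 3

def extract_features(post, heuristic_score):
    """Mine the existing heuristic and creator reputation as model features."""
    return {
        "heuristic_quality_score": heuristic_score,  # old system's output
        "creator_reputation": post["creator_reputation"],
    }

def rank_candidates(posts, model, heuristic):
    """Filter with hard rules first, then score the survivors with the
    (frequently retrained) model and sort by descending score."""
    candidates = [p for p in posts if passes_hard_rules(p)]
    scored = [(model.predict(extract_features(p, heuristic(p))), p)
              for p in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]
```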
The following are my reading notes on the document from beginning to end; think of them as a compressed version of the original. This piece is a continuation of the first part of this series and provides an in-depth look at Martin Zinkevich's work. These rules are very tactical and require a high degree of familiarity with machine learning-based product development. To be honest, even after reviewing the document thoroughly, I can only understand many of the rules conceptually, because my daily work is not directly related to building models.

If you really want user feedback, use user experience methodologies. Create user personas (one description can be found in Bill Buxton's Sketching User Experiences) early in a process and do usability testing (one description can be found in Steve Krug's Don't Make Me Think) later. User personas involve creating a hypothetical user. For example, if your team is all male, it might help to design a 35-year-old female user persona (complete with user features) and look at the results it generates, rather than 10 results for 25-to-40-year-old males. Bringing in real people to watch their reactions to your site (locally or remotely) in usability testing can also give you a fresh perspective.

This document contains various references to Google products.
To provide more context, I will briefly describe the most common examples below.

Batch processing is different from online processing. In online processing, you must handle each request as it arrives (for example, you must do a separate lookup for each query), whereas in batch processing you can combine tasks (for example, making a join). At serving time you are doing online processing, whereas training is a batch-processing task. However, there are some things you can do to reuse code. For example, you can create an object that is particular to your system, where the result of any queries or joins can be stored in a very human-readable way and errors can be tested easily. Then, once you have gathered all the information, during serving or training you run a common method to bridge between the human-readable object specific to your system and whatever format the machine learning system expects. This eliminates a source of training-serving skew. As a corollary, try not to use two different programming languages between training and serving; that decision will make it nearly impossible to share code.
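The following Python sketch illustrates this idea under assumed names (QueryContext, to_model_input, and all fields are my own illustrations, not from the original): a single human-readable object is populated during both serving (online) and training (batch), and one shared method converts it into the learner's format.

```python
# A minimal sketch: one readable intermediate object, one shared bridge
# method, used by both the serving path and the training pipeline.

from dataclasses import dataclass, asdict

@dataclass
class QueryContext:
    """Human-readable result of all lookups/joins for one example."""
    query: str
    doc_id: str
    doc_age_days: float
    creator_reputation: float

    def validate(self):
        # Errors are easy to test for in the readable representation.
        assert self.doc_age_days >= 0, "negative document age"
        assert 0.0 <= self.creator_reputation <= 1.0

def to_model_input(ctx: QueryContext) -> dict:
    """The single bridge used by BOTH serving and training, which removes
    one source of training-serving skew."""
    ctx.validate()
    return asdict(ctx)

# Serving: build a QueryContext from per-request lookups, then call
# to_model_input. Training: build QueryContext rows from a batch join,
# then call the same to_model_input.
```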
Use a simple model for ensembling that takes only the output of your “base” models as input. You also want to enforce properties on these ensemble models. For example, an increase in the score produced by a base model should not decrease the score of the ensemble. It is also best if the incoming models are semantically interpretable (for example, calibrated), so that changes in the underlying models do not confuse the ensemble model.
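One simple way to get the monotonicity property described above (my own illustration, not prescribed by the document) is to combine calibrated base-model scores with non-negative weights, as in this sketch:

```python
# A minimal sketch of an ensemble that only consumes base-model outputs.
# Non-negative weights guarantee that raising any base model's (calibrated)
# score can never lower the ensemble score. Weights are illustrative.

def ensemble_score(base_scores, weights, bias=0.0):
    """Weighted combination of calibrated base-model probabilities."""
    assert len(base_scores) == len(weights)
    assert all(w >= 0 for w in weights), "non-negative weights => monotone"
    return bias + sum(w * s for w, s in zip(weights, base_scores))

# Example: three calibrated base models scoring the same item.
print(ensemble_score([0.9, 0.4, 0.7], weights=[0.5, 0.2, 0.3]))  # 0.74
```

If the combining weights are themselves learned (for example, by a logistic regression over base scores), constraining them to be non-negative preserves the same guarantee.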