Abhishek Thakur

Norway
154K followers 500+ connections

About

Get my book: Approaching (Almost) Any Machine Learning Problem for FREE:…

Publications

  • Approaching (Almost) Any Machine Learning Problem

    Abhishek Thakur

    This is not a traditional book.
    The book has a lot of code. If you don't like the code-first approach, do not buy this book. Making code available on GitHub is not an option.

    This book is for people who have some theoretical knowledge of machine learning and deep learning and want to dive into applied machine learning. The book doesn't explain the algorithms but is more oriented towards how and what you should use to solve machine learning and deep learning problems. The book is not for you if you are looking for pure basics. The book is for you if you are looking for guidance on approaching machine learning problems. The book is best enjoyed with a cup of coffee and a laptop/workstation where you can code along.

    Table of contents:
    - Setting up your working environment
    - Supervised vs unsupervised learning
    - Cross-validation
    - Evaluation metrics
    - Arranging machine learning projects
    - Approaching categorical variables
    - Feature engineering
    - Feature selection
    - Hyperparameter optimization
    - Approaching image classification & segmentation
    - Approaching text classification/regression
    - Approaching ensembling and stacking
    - Approaching reproducible code & model serving

    There are no sub-headings. Important terms are written in bold.

  • AutoCompete: A Framework for Machine Learning Competitions

    AutoML @ ICML

    In this paper, we propose AutoCompete, a highly automated machine learning framework for tackling machine learning competitions. The framework has been learned, validated and improved over a period of more than two years by participating in online machine learning competitions. It aims at minimizing human interference required to build a first useful predictive model and to assess the practical difficulty of a given machine learning challenge. The proposed system helps in identifying data types, choosing a machine learning model, tuning hyper-parameters, avoiding over-fitting and optimization for a provided evaluation metric. We also observe that the proposed system produces better (or comparable) results with less runtime as compared to other approaches.

  • Computer Vision for Head Pose Estimation: Review of a Competition

    Scandinavian Conference on Image Analysis

    This paper studies the prediction of head pose from still
    images, and summarizes the outcome of a recently organized competition,
    where the task was to predict the yaw and pitch angles of an image
    dataset with 2790 samples with known angles. The competition received
    292 entries from 52 participants, the best ones clearly exceeding the
    state-of-the-art accuracy. In this paper, we present the key methodologies
    behind selected top methods, summarize their prediction accuracy and
    compare with the current state of the art.

  • Parallel Processing Architecture for ECG Signal Analysis

    International Journal of Machine Learning and Computing

    Research in detecting QRS peaks in ECG signals
    has progressed to an acceptable extent and hence has gained
    adequate confidence with respect to the validity of the outputs
    produced. In view of the dynamics associated with ECG signals,
    their variants among subjects owing to varied types of problems
    encountered; it has become essential to, continuously, expand
    the scope of analysis to provide more and useful information
    from the ECG data. This warrants for a flexible architecture for
    ECG signal analysis. This paper presents one such flexible
    architecture. The authors are working towards identification of
    appropriate interfaces and their definitions.


Patents

  • Classification of keywords

    Issued US US9798820B1

    A computer-implemented method of classifying a keyword in a network comprises: identifying a plurality of candidate categories, comprising: converting a plurality of search results related to the keyword into a plurality of search vectors, wherein each of the plurality of search results indicates a related resource in the network; converting a plurality of resources into a plurality of category vectors, wherein each of the plurality of resources is classified in one or more categories of a set of categories; and determining, for the plurality of category vectors, a plurality of similarity values indicating similarity to the plurality of search vectors; processing the plurality of candidate categories; and classifying the keyword by selecting the candidate category having a highest similarity value within the plurality of similarity values, a corresponding system, computing device and non-transitory computer-readable storage medium.
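
    The similarity step in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the bag-of-words encoding, the cosine similarity, the averaging over search vectors, and all names (`to_vector`, `cosine`, `classify_keyword`, the sample data) are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def to_vector(text):
    """Hypothetical encoding: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_keyword(search_results, category_resources):
    """Pick the category whose vector is most similar to the search vectors.

    search_results: texts of search results returned for the keyword.
    category_resources: dict mapping category name -> list of resource texts.
    Averaging the similarities is an assumption made for this sketch.
    """
    search_vectors = [to_vector(r) for r in search_results]
    best_category, best_score = None, -1.0
    for category, resources in category_resources.items():
        category_vector = to_vector(" ".join(resources))
        sims = [cosine(category_vector, s) for s in search_vectors]
        score = sum(sims) / len(sims) if sims else 0.0
        if score > best_score:
            best_category, best_score = category, score
    return best_category, best_score


# Made-up sample data for the keyword "jaguar".
search_results = ["jaguar car dealer prices", "buy jaguar luxury car"]
category_resources = {
    "automotive": ["car dealer reviews", "luxury car prices"],
    "wildlife": ["big cat habitat", "rainforest animals"],
}
category, score = classify_keyword(search_results, category_resources)
print(category, round(score, 3))
```

    With this sample data the keyword lands in the "automotive" category, because none of the "wildlife" resources share a term with the search results.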

  • Classification of search queries

    Issued US US9767182B1

    A computer-implemented method of classifying a search query in a network comprises: classifying a plurality of search queries into categories, comprising: applying predetermined rules to each of the plurality of search queries, wherein the predetermined rules are indicative of the categories and each of the plurality of search queries is associated with search results in the network; determining, for each of the plurality of search queries, similarity values indicating similarity to each of the categories based on the applied predetermined rules; and training a machine learning module, comprising: applying the machine learning module to a plurality of training sets, wherein each of the plurality of training sets is based on one of the plurality of classified search queries and at least one of the respective one or more similarity values, a corresponding system, computing device and non-transitory computer-readable storage medium.


Courses

  • Computer Animation
  • Foundations of Graphics
  • Foundations of Vision and Audio
  • Image Processing: Retrieval and Analysis
  • Intelligent Information Systems
  • Mobile Robots
  • Network Security
  • Pearls of Algorithms
  • Temporal Information Systems
  • User Centered Software Design

Honors & Awards

  • Winner: Naive Bees Classification Challenge

    DrivenData.org, Metis

    Developed a deep learning algorithm to distinguish between bumblebees and honeybees using images.
    The model scored 0.9956 area under the ROC curve on the private set.

  • 2nd/2225 Springleaf Marketing Response Challenge

    Kaggle.com

  • 3rd / 3514 Otto Group Product Classification Challenge

    Kaggle.com

    Our team ranked 3rd out of 3500+ participants. It was the largest Kaggle competition to date.

  • Rank 3rd - Countable Care: Modeling Women's Health Care Decisions

    DrivenData.org

    Recent literature suggests that the demand for women’s health care will grow over 6% by 2020. Given how rapidly the health landscape has been changing over the last 15 years, it’s increasingly important that we understand how these changes affect what care people receive, where they go for it, and how they pay. Through the National Survey of Family Growth, the CDC provides one of the few nationally representative datasets that dives deep into the questions that women face when thinking about their health.

    The task was to predict what drives women’s health care decisions in America.

  • Winner - Box Plots for Education

    DrivenData.org

  • Rank 10th - KDD Cup 2014

    KDD Cup is the annual Data Mining and Knowledge Discovery competition organized by ACM Special Interest Group on Knowledge Discovery and Data Mining, the leading professional organization of data miners.

  • Rank 7th - Acquire Valued Shoppers Challenge

    ### Predict which shoppers will become repeat buyers

    Ranked 7th out of ~900 participants

    The Acquire Valued Shoppers Challenge asked participants to predict which shoppers are most likely to repeat purchase. To aid with algorithmic development, they had provided complete, basket-level, pre-offer shopping history for a large set of shoppers who were targeted for an acquisition campaign. The incentive offered to that shopper and their post-incentive behavior was also provided.

    This challenge provided almost 350 million rows of completely anonymised transactional data from over 300,000 shoppers. It was one of the largest problems run on Kaggle to date.

  • Rank 10th - The Random Number Grand Challenge

    Decode a sequence of pseudorandom numbers

  • Rank 4th - Crowdflower Partly Sunny with a Chance of Hashtags

    In this competition you are provided a set of tweets related to the weather. The challenge is to analyze the tweet and determine whether it has a positive, negative, or neutral sentiment, whether the weather occurred in the past, present, or future, and what sort of weather the tweet references.

  • Rank 6th - StumbleUpon Evergreen Classification Challenge

    StumbleUpon is a user-curated web content discovery engine that recommends relevant, high quality pages and media to its users, based on their interests. While some pages we recommend, such as news articles or seasonal recipes, are only relevant for a short period of time, others maintain a timeless quality and can be recommended to users long after they are discovered. In other words, pages can either be classified as "ephemeral" or "evergreen". The ratings we get from our community give us strong signals that a page may no longer be relevant - but what if we could make this distinction ahead of time? A high quality prediction of "ephemeral" or "evergreen" would greatly improve a recommendation system like ours.
    Many people know evergreen content when they see it, but can an algorithm make the same determination without human intuition? Your mission is to build a classifier which will evaluate a large set of URLs and label them as either evergreen or ephemeral. Can you out-class(ify) StumbleUpon?

  • Rank 10th - Cause Effect Challenge by CHALEARN, 2013

    The problem of attributing causes to effects is pervasive in science, medicine, the economy, and almost every aspect of our everyday life involving human reasoning and decision making. What affects your health? the economy? climate changes? The gold standard to establish causal relationships is to perform randomized controlled experiments. However, experiments are costly while non-experimental "observational" data collected routinely around the world are readily available. Unraveling potential cause-effect relationships from such observational data could save a lot of time and effort.
    Consider for instance a target variable B, like occurrence of "lung cancer" in patients. The goal would be to find whether a factor A, like "smoking", might cause B. The objective of the challenge is to rank pairs of variables {A, B} to prioritize experimental verifications of the conjecture that A causes B.
    As is known, "correlation does not mean causation". More generally, observing a statistical dependency between A and B does not imply that A causes B or that B causes A; A and B could be consequences of a common cause. But, is it possible to determine from the joint observation of samples of two variables A and B that A should be a cause of B?

  • Rank 16th - Amazon Employee Access Challenge, 2013

    When an employee at any company starts work, they first need to obtain the computer access necessary to fulfill their role. This access may allow an employee to read/manipulate resources through various applications or web portals. It is assumed that employees fulfilling the functions of a given role will access the same or similar resources. It is often the case that employees figure out the access they need as they encounter roadblocks during their daily work (e.g. not able to log into a reporting portal). A knowledgeable supervisor then takes time to manually grant the needed access in order to overcome access obstacles. As employees move throughout a company, this access discovery/recovery cycle wastes a nontrivial amount of time and money.
    There is a considerable amount of data regarding an employee’s role within an organization and the resources to which they have access. Given the data related to current employees and their provisioned access, models can be built that automatically determine access privileges as employees enter and leave roles within a company. These auto-access models seek to minimize the human involvement required to grant or revoke employee access.

    Objective:
    The objective of this competition was to build a model, learned using historical data, that will determine an employee's access needs, such that manual access transactions (grants and revokes) are minimized as the employee's attributes change over time. The model will take an employee's role information and a resource code and will return whether or not access should be granted.

Languages

  • English

    Full professional proficiency

  • German

    Limited working proficiency

  • Hindi

    Full professional proficiency

  • Python

    Full professional proficiency

  • Norwegian

    Elementary proficiency
