Abstract
Code-mixing (CM) is a frequently observed phenomenon on social media platforms in multilingual societies such as India. While the increase in code-mixed content on these platforms provides a good amount of data for studying various aspects of code-mixing, the lack of automated text analysis tools makes such studies difficult. To overcome this, tools such as language identifiers and part-of-speech (POS) taggers for analysing code-mixed data have been developed. One such tool is Named Entity Recognition (NER), an important Natural Language Processing (NLP) task, which is not only a subtask of Information Extraction but is also needed for downstream NLP tasks such as semantic role labeling. While entity extraction from social media data is generally difficult due to its informal nature, code-mixed data further complicates the problem due to its informal, unstructured, and incomplete information. In this work, we present the first ever corpus for Kannada-English code-mixed social media data with the corresponding named entity tags for NER. We provide strong baselines with machine learning classification models such as CRF, Bi-LSTM, and Bi-LSTM-CRF on our corpus with word, character, and lexical features.