Episodes

  • Episode 003 | June 02, 2020

    Many of us who speak multiple languages switch seamlessly between them in conversations and even mix multiple languages in one sentence. For us humans, this is something we do naturally, but it’s a nightmare for computing systems to understand mixed languages. On this podcast with Kalika Bali and Dr. Monojit Choudhury, we discuss codemixing and the challenges it poses, what makes codemixing so natural to people, some insights into the future of human-computer interaction and more.

    Kalika Bali is a Principal Researcher at Microsoft Research India working broadly in the area of Speech and Language Technology, especially on the use of linguistic models to build technology that offers more natural Human-Computer and Computer-Mediated interactions, and on technology for Low Resource Languages. She studied linguistics and acoustic phonetics at JNU, New Delhi and the University of York, UK, and believes that local language technology, especially with speech interfaces, can help millions of people gain entry into a world that has until now been almost inaccessible to them.

    Dr. Monojit Choudhury is a Principal Researcher at Microsoft Research Lab India, where he has worked since 2007. His research spans many areas of Artificial Intelligence, cognitive science and linguistics. In particular, Dr. Choudhury has been working on technologies for low resource languages, code-switching (mixing of multiple languages in a single conversation), computational sociolinguistics and conversational AI. He has more than 100 publications in international conferences and refereed journals. Dr. Choudhury is an adjunct faculty member at the International Institute of Information Technology Hyderabad and Ashoka University. He also organizes the Panini Linguistics Olympiad for high school children in India and is the founding chair of the Asia-Pacific Linguistics Olympiad. Dr. Choudhury holds B.Tech and PhD degrees in Computer Science and Engineering from IIT Kharagpur.

    Transcript

    Monojit Choudhury: It is quite fascinating that when people become really familiar with a technology, and search engine is an excellent example of such a technology, people really don’t think of it as technology, people think of it as a fellow human and they try to interact with the technology as they would have done in natural circumstances with a fellow human.

    [Music plays]

    Host: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham.

    [Music plays]

    Host: Many of us who speak multiple languages switch seamlessly between them in conversations and even mix multiple languages in one sentence. For us humans, this is something we do naturally, but it’s a nightmare for computing systems to understand mixed languages. On this podcast with Kalika Bali and Monojit Choudhury, we discuss codemixing and the challenges it poses, what makes codemixing so natural to people, some insights into the future of human-computer interaction and more.

    [Music plays]

    Host: Kalika and Monojit, welcome to the podcast, and thank you so much. I know we’ve had trouble getting this together given the COVID-19 situation; we’re all in different places. So, thank you so much for the effort and the time.

    Monojit: Thank you, Sridhar.

    Kalika: Thank you.

    Host: Ok, so, to kick this off, let me ask this question. How did the two of you get into linguistics? It’s a subject that interests me a lot because I just naturally like languages and I find the evolution of languages and anything to do with linguistics quite fascinating. How was it that both of you got into this field?

    Monojit: So, meri kahani mein twist hai (In Hindi- “there is a twist in my story”). In school I was quite a geeky kind of kid, and my interests were the usual: Mathematics, Science, Physics. I wanted to be a scientist or an engineer. I did study languages: I know English and Hindi, which I studied in school, and Bangla is my mother tongue, so of course I know that. I also studied Sanskrit in great detail, and I was interested in the grammar of these languages. Literature never really pulled me, and even language was on the back bench; what I really loved was Science and Mathematics. So naturally I ended up at IIT. I studied Computer Science at IIT Kharagpur for 4 years, and everything was lovely. Then, in our final year, my supervisor was working on what is called a text-to-speech system. This system takes Hindi text and automatically speaks it out, and he was facing a slight problem with it. He asked me if I could solve that problem; I was in my final undergrad year at that time. The problem was how to pronounce Hindi words correctly. At the time it sounded like a very simple problem, because in Hindi the way we write is the way we pronounce, unlike English, where you really have to learn the pronunciations. It turns out it isn’t that simple. Take the words ‘Dhadkane’ and ‘Dhadakne’: you write them in exactly the same way in Devanagari, but one is pronounced ‘Dhadkane’ and the other ‘Dhadakne’. So, this was the issue. My friend, who was also working with me, was all for machine learning, of course. But I was saying there must be a pattern here, and I went through lots and lots of examples myself, and it turned out that there is a very short, simple, elegant rule that perfectly explains the pronunciation of most Hindi words. So, I was excited.
I went to my professor and showed him. He said, “Oh! This is fantastic! Let’s write a paper,” and we got a paper out of it, and all of this was great. But then, when I was presenting the paper, somebody said, “Hey, do you know what problem you solved? It’s called ‘schwa deletion’ in Hindi.” Of course, I wasn’t a linguist, and neither was my professor, so we had no clue what ‘schwa’ or ‘schwa deletion’ was. I dug a little deeper and found that people had written entire books on ‘schwa deletion’, and that what I had found was in line with their research. This got me really excited about linguistics. More interestingly, what I saw was, like you said, language evolution, if you think about why this is there. Hindi uses exactly the same script that we use for Sanskrit, but in Sanskrit there is no ‘schwa deletion’. Yet all the modern Indian languages that came from Sanskrit, like Hindi, Bengali or Oriya, delete schwas to different degrees, so their pronunciation differs from Sanskrit. I won’t get into exactly what ‘schwa deletion’ is; that’s beside the point. But pronunciations evolve away from the original language. The question I eventually got interested in was how and why this happens. I ended up doing a Ph.D. with the same professor on language evolution and how sound change happens across languages. And of course, being a computer scientist, I tried modelling all these things computationally. After that there was no looking back: I went deeper and deeper into language, linguistics and natural language processing.

    Host: That's fascinating. And I know for sure that Kalika has an equally interesting story, right? Kalika, you have an undergrad degree in chemistry?

    Kalika: I do.

    Host: Linguistics doesn’t seem very much like a natural career progression from there.

    Kalika: Yes, it doesn’t. But before I start my story, I have one more interesting thing to say. When Monojit was presenting his ‘schwa deletion’ paper, I was in the audience. I was working somewhere else at the time, and I looked at my colleague and said, “We should get this guy to come and work with us.” So, I was actually there when he presented that particular ‘schwa deletion’ paper. So, yes, I was a Science student studying Chemistry, and after Chemistry, the expectation in my family was that everybody goes on to higher studies. I rebelled. I was one of those difficult children we are now very unhappy about. I said that I didn’t want to study anymore, I definitely didn’t want to do Chemistry, and I was going to be a journalist, like my dad. I had already got a job at a newspaper. Then I went to Jawaharlal Nehru University to pick up a form for my younger sister, looked at the university and said, “This is a nice place, I want to study here.” I flicked through the prospectus, asking myself “what’s interesting?”, and came across this thing called Linguistics, which seemed very fascinating. I had no idea what linguistics was about. There was also ancient history, which I did know about and which seemed interesting. So, I filled in the forms, and before the entrance exam I read a thin layman’s guide to linguistics that I borrowed from the British Council Library. The interesting thing is that the linguistics entrance exam was in the morning and the ancient history exam was in the afternoon. This was peak summer in Delhi, and there were no fans where the exam was being held. So, after taking the linguistics exam, I thought I couldn’t sit another exam in that heat, and I left. So, I only took the linguistics exam. I got through, and no one was more surprised than I was. I saw it as a sign that I should go.
So, I started the course without having any idea what linguistics was and completely fell in love with the subject within the first month. Coming from a science background, I was naturally attracted to phonetics, because to really understand phonetics and the speech science part of linguistics, you need a good understanding of how waves work, the physics of sound. That part came a little naturally to me, so I was drawn towards speech, and the rest, as they say, is history. So, I went from there, basically.

    Host: Nice. So, chemistry’s loss is linguistics’ gain.

    Kalika: Yeah, my gain as well.

    Host: Ok, so, I’ve heard you and Monojit talk at length and many times about this thing called codemixing. What exactly is codemixing?

    Kalika: So, codemixing is when people in a multi-lingual community switch back and forth between two or more languages. All of us here come from multi-lingual communities where, at a community level rather than an individual level, we speak more than one language: two, three, four. It’s very natural for us to keep switching between these languages in a normal conversation. Right now, of course, we are sticking to English, but in a different setting we would probably be switching between Hindi, Bengali and English, because these are three languages that all three of us understand, right.

    Host: That’s true.

    Kalika: That’s what code switching is, when we mix languages that we know, when we talk to each other, interact with each other.

    Host: And how prevalent is it?

    Kalika: “Abhi bhi kar sakte hain” (In Hindi- “we can even do it now”). We can still switch between languages.

    Monojit: Yeah.

    Host: “Korte pari” (In Bangla- “we can do that”). Yeah, Monojit, were you saying something when I interrupted you?

    Monojit: You asked how prevalent it is. Linguists have observed that in all multi-lingual societies, where people know multiple languages at a societal level, people codemix. But there was no quantitative data on how much mixing happens, and one of the first things we did when we started this project was to measure how much mixing really happens. We looked at social media, where people usually talk the way they talk in real life. I mean, they type it, but it’s almost like speech. We studied English-Hindi mixing in India, and one of the interesting things we found is that if you look at public forums on Facebook in India and at sufficiently long threads, let’s say 50 or more comments, then all of them are multi-lingual: you will find at least two comments in two different languages. Sometimes there will be many languages, not just two. Interestingly, if you look at individual comments and measure how many of them are mixed within themselves, that is, a single comment containing multiple languages, it’s as high as 17%. We then extended this study to Twitter, for seven European languages: English, French, Italian, Spanish, Portuguese, German and Turkish, and studied how much codemixing was happening there. Again, interestingly, 3.5% of the tweets from, I would say, the western hemisphere are codemixed. I would guess the number would be much higher for South Asia; we already said 17% for India itself. What’s also interesting is that the amount of codemixing varies a lot across cities. In our study, Istanbul had the largest share of codemixed tweets, as high as 13%. Whereas in some cities in the US, let’s say Houston, or other cities in the southern United States where we know there are a huge number of English-Spanish bilinguals, we see only around 1% codemixing. So, yes, it’s all over the world and it’s very prevalent.
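
The two measurements Monojit describes (how many comments mix languages internally, and whether a thread as a whole is multilingual) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's actual pipeline: it assumes every comment has already been tagged word by word with a language label (the labels "en", "hi" and "univ" for language-neutral tokens such as names or emoji are hypothetical), counts a comment as code-mixed when it contains words from two or more languages, and calls a thread multilingual when at least two of its comments have different dominant languages.

```python
from collections import Counter

NEUTRAL = "univ"  # hypothetical tag for language-neutral tokens (names, emoji, numbers)


def languages_in(comment_tags):
    """Set of languages used in one comment, ignoring neutral tokens."""
    return {tag for tag in comment_tags if tag != NEUTRAL}


def is_codemixed(comment_tags):
    """A comment counts as code-mixed if it uses words from two or more languages."""
    return len(languages_in(comment_tags)) >= 2


def dominant_language(comment_tags):
    """Most frequent non-neutral language in a comment, or None if there is none."""
    counts = Counter(tag for tag in comment_tags if tag != NEUTRAL)
    return counts.most_common(1)[0][0] if counts else None


def mixed_comment_rate(thread):
    """Fraction of comments in a thread that are code-mixed internally."""
    if not thread:
        return 0.0
    return sum(is_codemixed(c) for c in thread) / len(thread)


def is_multilingual_thread(thread):
    """True if at least two comments have different dominant languages."""
    dominant = {dominant_language(c) for c in thread} - {None}
    return len(dominant) >= 2


# Toy thread: each inner list holds the language tag of every word in one comment
thread = [
    ["en", "en", "en"],
    ["hi", "hi", "en", "hi"],
    ["hi", "hi", NEUTRAL],
    ["en", "en", NEUTRAL],
]
print(mixed_comment_rate(thread))      # 0.25
print(is_multilingual_thread(thread))  # True
```

In practice the hard part is the word-level language tagging itself, especially for Hindi typed in Roman script; the counting on top of it is simple.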

    Kalika: Yeah, and I would like to add that there is a mistaken belief that people codemix because they are not proficient in one language: that people switch to their so-called native language or mother tongue when talking in English because they don’t know English well enough, or can’t think of the English word, and therefore switch to, say, Hindi or Spanish or some other language. But that actually is not true. To be able to fluently codemix and code switch between two languages, people actually have to know both languages really well. Otherwise, it’s not mixing or code switching, it is just borrowing from one language into another.

    Host: Right. So, familiarity with multiple languages is what gets you codemixing, whereas if you are forced into it, that’s not codemixing. Codemixing is more intentional and purposeful, is what you are saying.

    Kalika: Exactly.

    Host: Ok. Do you see any particular situations or environments in which codemixing seems to be more prevalent than not?

    Kalika: Yeah, absolutely. In more formal scenarios, we definitely tend to stick to one language. And if you think about it, even if you are mono-lingual, in a formal setting you use a much more structured and very different kind of language than you would in an informal one. As far as codemixing is concerned, some of the first papers linguists published on code switching date from the 1940s, and at that time it was definitely viewed as an informal use of language. But over the decades, as informal language has become much more acceptable in various scenarios, we’ve also started codemixing in a lot more of them. Earlier, if you looked at television, people stuck to just one language at a time: if it was a Hindi broadcast, it was just Hindi; if it was an English broadcast, it was just English. But now television and radio both switch between English and multiple Indian languages when broadcasting. So, though codemixing started as a much more informal use case, it’s now much more prevalent across scenarios.

    Monojit: And to add to that, there is a recent study which says there are all the signs that Hinglish, the mixing of Hindi and English, is a new language altogether rather than just mixing, because there are children who grow up with it as their mother tongue. They hear Hinglish being spoken, or in other words codemixing between these two languages happening all the time in their family, by their parents and others, and they take that as the language, their native language, that they learn. So, it’s quite interesting. At one extreme, as Kalika mentioned earlier, there are words which are borrowed: you borrow them to fill a gap in your language, or because you can’t remember the word, whatever the reason might be. At the other extreme, you have two languages that are fused to give a new language. These are called fused lects, like Hinglish. I would leave it to you to decide whether you consider it a language or not, but there definitely are movies which are entirely in Hinglish, or ads in Hinglish, where you can’t say it’s either Hindi or English. And in between, of course, there is a spectrum of different degrees of integration between the languages.

    Host: This is fascinating. You are saying that something like Hinglish kind of becomes a language that’s natural rather than synthetic.

    Kalika: Yes.

    Monojit: Yes.

    Host: Wow! Ok.

    Kalika: I mean, if you think of your mother tongue as the language that you dream in, and then ask yourself what language you dream in: I dream in Hinglish, so that’s my mother tongue.

    [Music plays]

    Host: How does codemixing come into play or how does it impact the interaction that people have with computers or computing systems and so on?

    Monojit: So, there is another misconception here. In the beginning we said that when people codemix, they know both languages equally well. The misconception is that if I know both Hindi and English, and my system, let’s say a search engine, a speech recognition system or a chat bot, understands only one of the languages, let’s say English, then I will not use the other language and will not mix the two. But we have seen that this is not true. In fact, a long time ago, and by a long time I mean, let’s say, ten years ago, when there was no research on computational processing of codemixing and no systems that could handle it, we already saw people issuing a lot of codemixed queries to Bing. My favorite example is this one: “2017 mein, scorpio rashi ka career ka phal” (In Hindi- “career forecast for the Scorpio sign in 2017”). This is an actual query, and everything is typed in the Roman script. Now, it has mixed languages, mixed scripts and everything. It is quite fascinating that when people become really familiar with a technology, and search engine is an excellent example of such a technology, people really don’t think of it as technology, people think of it as a fellow human and they try to interact with the technology as they would have done in natural circumstances with a fellow human. That’s why, even though we design chat bots or ASR (automatic speech recognition) systems with one particular language in mind, when we deploy them we see everybody mixing languages, often without even realizing it. So, in that sense, all the technologies we build that are user facing, or that analyze user-generated data, should ideally be able to process codemixed input.

    Host: So, you used the word ideally, which obviously means that it’s maybe not happening as often or as much as it should. So, what are the challenges here?

    Kalika: Initially, the challenge was to get people to accept that this happens. But now we have crossed that barrier, and people do accept that a large percentage of the world lives in multi-lingual communities and that this is a problem: if people are to interact naturally with so-called natural language systems, those systems have to process codemixing. But I think the biggest challenge is data, because most language technologies these days are data hungry. They are all based on machine learning and deep neural networks, and we require a huge amount of data to train these systems. It’s not possible to get data for codemixing in the same way as we can for mono-lingual language use, because the variation in codemixing, in where you can switch from one language to another, is very high. Getting enough examples in your data of all the possible ways in which people can mix two languages is a very, very difficult task. And this has implications for almost all the systems we might want to look at, like machine translation and speech recognition, because all of these ultimately rest on language models, and to train these language models we need this data.

    Host: So, are there any ways to address this challenge of data?

    Monojit: So, there are several solutions that we thought of. One was to ask a fundamental question: “Do we really need a new data set for training codemixed systems?” For instance, imagine a human being who knows two languages, let’s say Hindi and English, which the three of us know, and imagine that we have never heard anybody mix these two languages in our lives. A better example might be English and Sanskrit; I really haven’t heard anybody mixing English and Sanskrit. But if somebody does mix these two languages, would I be able to understand? Would I be able to point out that this sounds grammatical and this doesn’t? It turns out that, intuitively at least, this is not a problem for human beings. We have an intuitive notion of what mixing is and which patterns of mixing are acceptable, and we really don’t need to learn a codemixed language as a separate language once we know the two languages involved equally well. This was the starting point for most of our research. So then we thought: instead of creating data in the codemixed language, can we start with mono-lingual data sets or mono-lingual models and somehow combine them to build codemixed models? We took several approaches, and they worked to various degrees. But the most interesting one, which I would like to share, is based on some linguistic theories. These theories say that, given the grammars of the two languages, say English and Hindi, there are only certain ways in which mixing is acceptable, depending on how those grammars work. To give an example, let’s say I say, “I do research on codemixing”. I can codemix this and say, “Main codemixing pe research karta hoon”. It sounds perfectly normal. Or “I do shodh karya on codemixing” (shodh karya is Hindi for “research”): we don’t use it that often, and we probably wouldn’t have heard it, but you might still find it quite grammatical.
But if I say, “Main do codemixing par shodh karya”, does it sound natural to you? Something about it doesn’t sound right, and linguists have theories on why. Starting from those theories, we built models which take data in two languages, that is, parallel data. Or, if you have a translator, you can translate a sentence, let’s say “I do research on codemixing”, using an English-Hindi translator into Hindi: “Main codemixing (I don’t know what the Hindi for codemixing is) par shodh karya karta hoon”. Then, given this pair of parallel sentences, there is a systematic process by which you can generate all the possible ways in which the two sentences can be mixed in a grammatically valid way in Hinglish. The linguistic theories were just theories, so we had to flesh them out and build real systems that could generate these sentences. Once we have that, you can imagine that there is no dearth of data. You can take any data in a single language, any English sentence, and convert it into codemixed Hindi versions, and then you have a lot of data. Whatever you could do for English, you can now do by training the same system on this artificially created data. That was the basic idea with which we could solve a lot of different problems, ranging from translation to part-of-speech tagging, sentiment analysis and parsing.
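
The generation step can be illustrated with a deliberately simplified Python sketch. It is not the theory-based model described in the episode: it swaps in a rough rule loosely inspired by the matrix-language idea, in which Hindi supplies the grammatical frame (word order, postpositions such as "par", verb endings such as "karta hoon") and only aligned content words may surface in English. The word alignment between the English and Hindi sentences is assumed to be given, and the function name `matrix_frame_variants` is hypothetical.

```python
from itertools import product


def matrix_frame_variants(units):
    """Enumerate code-mixed variants of a Hindi sentence (toy rule, not a full theory).

    units: (hindi_text, english_text_or_None) pairs in Hindi word order, taken from
    an assumed English-Hindi word alignment. Units with an English equivalent are
    content words that may switch; units mapped to None stay in Hindi and keep
    the grammatical frame intact.
    """
    options = [(hindi, english) if english else (hindi,) for hindi, english in units]
    variants = []
    for choice in product(*options):
        sentence = " ".join(choice)
        if sentence not in variants:  # preserve order, drop duplicates
            variants.append(sentence)
    return variants


# "I do research on codemixing" / "Main codemixing par shodh karya karta hoon"
units = [
    ("Main", None),
    ("codemixing", None),  # same word in both languages
    ("par", None),
    ("shodh karya", "research"),
    ("karta hoon", None),
]
for sentence in matrix_frame_variants(units):
    print(sentence)
```

On the episode's example this prints the all-Hindi sentence and "Main codemixing par research karta hoon", close to the natural-sounding mix quoted above, and it never produces frame-breaking orders such as "Main do codemixing par shodh karya". A realistic generator would need the syntactic constraints the speakers mention, which this toy rule does not model; with k switchable units it yields up to 2^k variants per sentence.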

    Host: So, what you are saying is that you need a huge amount of data to build out models, but the data is not available, so you just create the data yourself.

    Monojit: Right.

    Host: Wow.

    Kalika: Yes, based on certain linguistic theoretical models which we have made into computational linguistic theoretical models.

    Host: Ok, so, we’ve been talking about codemixing largely as far as textual data is concerned. Are you doing something as far as speech is concerned?

    Kalika: Yes. Speech is slightly more difficult than pure text, primarily because there you have to look at both the acoustic models and the language models. But our colleague Sunayana Sitaram has been working for almost three years now on codemixed automatic speech recognition, and she has come up with a really interesting Hindi-English ASR system that can recognize a person speaking in mixed Hindi-English speech.

    Host: Interesting. And where do you see the application of all the work that you guys have done? I mean, I know you have been working on this stuff for a while now, right?

    Kalika: Take opinion mining as one example. Say you are looking at a lot of user-generated data that is a mix of, say, English and Spanish, and your system can only process and understand English. If it can’t understand the Spanish part or the mixed part, where English and Spanish come together, then chances are you will get a very skewed and most probably incorrect view of what the user is saying or what the user’s opinion is. Therefore, any analysis you do on top of that data is going to be incorrect. I think Monojit has a very good example of that from the work we did on sentiment and codemixing on Twitter, where he looked at how negative sentiment was expressed.

    Monojit: Yeah. That’s actually pretty interesting. This brings us to the question of why people codemix. We said in the beginning that, first, it’s not random, and second, it seems to have a purpose. So what is that purpose? Of course, there are lots of theories and observations from linguists, ranging from humor and sarcasm to reported speech. All of these involve various degrees of codemixing, and there are reasons for this. So we thought that, since there is a lot of codemixing on social media, we could do a systematic and quantitative study of the different features which make people switch from Hindi to English or vice versa. We formulated a whole bunch of hypotheses to test, based on current linguistic theories. Our first hypothesis was that people might be switching from English to Hindi when they move from facts to opinions. It’s a well-known observation that when you are talking about facts, you can use any language, and in the Indian context it is more likely to be English, whereas when you are expressing something emotional or an opinion, you are more likely to switch to your native language. So people might be more likely to switch to Hindi. We tested all these hypotheses, and none of them was statistically significant; we didn’t see strong signals for them in the data. But where we did see a really strong signal is that when people express negative sentiment, they are much more likely, actually nine times more likely, to use Hindi than when they express positive sentiment. It seems English is the preferred language for expressing positive sentiment, whereas Hindi is the preferred language for expressing negative sentiment. We wrote a paper based on these findings, saying that we might praise you in English but gaali to Hindi mein hi denge (In Hindi- “but we will swear only in Hindi”).
So, if you did sentiment analysis in only one language, let’s say English, and tried to do trend analysis of some Indian political issue based on that, it is very likely that you would get a much rosier picture, because in English people would mostly have said positive things, while all the gaalis (cuss words) and other negative things would be in Hindi, which you would be missing. So ideally, when you are analyzing content from a multi-lingual society, you should process all the languages.
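
Why an English-only pipeline paints a rosier picture is a matter of simple arithmetic. The Python sketch below uses made-up labels (not data from the study discussed here) and assumes each comment already carries a language tag and a sentiment label; the function name `negative_share` is hypothetical. Restricting the analysis to English comments halves the measured share of negative comments in this toy example.

```python
def negative_share(comments, languages=None):
    """Share of comments labelled negative, optionally restricted to some languages.

    comments: (language, sentiment) pairs, with sentiment "pos" or "neg".
    languages: set of language tags to keep, or None to keep every comment.
    """
    selected = [s for lang, s in comments if languages is None or lang in languages]
    if not selected:
        return 0.0
    return sum(s == "neg" for s in selected) / len(selected)


# Made-up toy labels, not data from the study discussed in the episode
comments = [
    ("en", "pos"), ("en", "pos"), ("en", "pos"), ("en", "neg"),
    ("hi", "neg"), ("hi", "neg"), ("hi", "neg"), ("hi", "pos"),
]
print(negative_share(comments))          # 0.5  (all languages)
print(negative_share(comments, {"en"}))  # 0.25 (English-only pipeline)
```

The skew grows with the gap between how often negative opinions are voiced in the language you drop and in the language you keep, which is exactly the pattern the speakers describe for Hindi and English.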

    Kalika: Yeah. And this touches a lot on why people codemix, which is a very vast area of research, because people codemix for a lot of reasons. People might codemix because they want to be sarcastic, or because they want to express in-group membership: the three of us can move to Bengali to bond and show that we are part of the group that knows Bengali. Or someone you meet might want to keep you at a distance, and so not talk to you in that language or mix it in. People do it for humor, people do it for reinforcement; there are a lot of reasons why people codemix, and if we miss out on all of that, it’s very hard for us to make any firm claims about why people are saying what they are saying.

    Host: It seems like this is an extremely complex area of research which spans not just computer science or linguistics but also sentiment, opinion, etc. There’s a whole lot of stuff going on here.

    Monojit: Yeah, and in fact most of the computational linguistics work you’d see draws on linguistics in terms of, you know, how grammar works, that is, syntax, and maybe how meaning works, semantics. But codemixing goes much beyond that. We are now talking about what are called pragmatics and sociolinguistics. Pragmatics is how language is used in a particular context or situation, and modelling pragmatics is insanely difficult, because you not only need to know the language but also the speakers, the relationship between them, the context in which they are situated and speaking, and all this information. A typical example: if I ask you, “Could you please pass the water bottle?”, technically it is a question and you could say, “Yes, I can.” But that won’t satisfy me, right? It’s actually a request. That’s how we use language: what we say is not necessarily what we mean. Understanding this hidden intent is very situational, and in different situations the same sentence might mean very different things. Codemixing sits at the boundaries of syntax, semantics and pragmatics. Sociolinguistics, in turn, is the study of how language is used in society, especially how social variables correlate with linguistic variables. Social variables could be somebody’s level of education, age, gender, where they are from, etc. Linguistic variables would be, just to give some examples, whether speech is codemixed or not, and to what degree. And we do see some very strong social factors which determine codemixing behavior. In fact, that’s used a lot in our Hindi movies, Bollywood. We did a study of some 37 or 40 Hindi movie scripts, which are freely available online for research, to see where codemixing happens in Bollywood.
And what we found is that script writers employ codemixing in a very sophisticated way in two particular situations. One is when they want to show a sophisticated urban crowd as opposed to a rural crowd. If you look at movies like “Dum Laga Ke Haisha”, which are set in a small town, in a rural scenario or in the past, they usually have a lot less codemixing. Movies like “Kapoor & Sons” or “Pink”, on the other hand, are typically set in a city among educated, urban people, and codemixing is used heavily in them to show exactly that. The other case where Bollywood uses a lot of codemixing, in fact accented codemixing, is when they want to show that somebody has been to “foreign”, as we would say, meaning abroad, and has come back to India to interact with their poor country cousins. So, it’s used in a lot of different ways in the movies, and that’s the sociolinguistics bit kicking in.

    Kalika: And you know, to add to that, we had touched upon earlier how this usage has changed over time. In the earlier Bollywood movies, this mixing was much less. Not only that, English was mostly used to denote who the villain in the movie was. If you look at movies from the 1960s or ’70s, it’s always the smugglers, the kingpins of the mafia, who spoke a lot of English and mixed English into Hindi. So obviously that kind of change has happened over the years, even in Bollywood movies.

    Host: I would never have thought about all these things. Villains speaking English, ok, in Bollywood!

    [Music plays]

    Host: Where do you see this area of research going in the future? Do you have anything in particular in mind, or are you just exploring to see?

    Kalika: I think one of the things we have been looking at a lot is where codemixing fits in when AI interacts with users, with humans, in this human-AI interaction scenario. There is one aspect, that the user is mixing and the system has to understand it, but does the bot or the AI agent also have to mix or not? And if the AI agent has to mix, then where and when should it mix? So, that’s something that we have been looking at, and we think it is going to play an important role in human-AI interaction going forward. We’ve studied this in some detail and it’s actually very interesting: people have a whole variety of attitudes not only towards codemixing but also towards AI bots interacting with them. And this kind of reflects on what they feel about a bot that will codemix and talk to them in a mixture of languages, irrespective of whether they themselves codemix or not. Our study has shown that some people would look at a bot which codemixes as ‘cool’, in a very positive way, but some people would look at it very negatively. The reason for that is that some people might think codemixing is not the right thing to do, that it’s not a pure language. Other people would think that since it’s a bot, it should talk in a very “proper” way, so it should only talk in English or only talk in Hindi and it shouldn’t be mixing languages. And a certain set of people are freaked out by the fact that the AI is trying to sound more human-like when it mixes. So, there is a wide range of attitudes that people have towards a codemixing AI agent. How can we tackle that? How do we then make a bot that codemixes or doesn’t codemix and pleases the entire crowd, right?

    Host: Is there such a thing as pleasing the entire crowd?

    Kalika: So, we have some ideas about how to go about trying to at least please the crowd.

    Monojit: Yeah. Basically, you have to adapt to the speaker. Essentially the way we please the crowd is through accommodation. When I talk to somebody who is primarily speaking in English, I will try to reciprocate in English, whereas if somebody is speaking in Hindi, I will try to speak in Hindi if I want to please that person. Of course, if I don’t, then I will use the other language to show social distance. This is one of the things described by what we call ‘Linguistic Accommodation Theory’. In general, there are various style components that we incorporate in our day-to-day conversation, mostly unknowingly, based on whether we want to please the other person or not. So, call it sycophancy or whatever, but we want to build bots which model that kind of an attitude. And if we are successful, then the bot will be a world pleaser.

    Kalika: I don’t think it has so much to do with sycophancy. Human beings actually have to cooperate, and that is, in a sense, hardwired to a certain extent into our spine now. For evolutionary reasons, we do need to cooperate: to have a successful interaction, we have to cooperate, and one of the ways we do this is by trying to be more like the person we are talking to, so that both parties converge to a middle ground. That’s what accommodation is all about.

    Host: So, Kalika and Monojit, this has been a very interesting conversation. Are there any final thoughts you’d like to leave with the listeners?

    Kalika: I hope people get an idea through our work on codemixing that human communication is quite intricate. There are many factors that come into play when human beings communicate with each other. There can be social contexts, there can be pragmatic contexts and, of course, the structure of the language and the meaning that you are trying to convey; all of it plays a big role in how we communicate. And by studying codemixing in this context, we are hopefully able to grapple with a lot of these factors, which in general human-human communication become too big to handle all at once.

    Monojit: Yeah. Language is an extremely complicated and multi-dimensional thing. Codemixing is just one of the dimensions, where we are talking of switching between languages, but even within a language there are words and structures, and sometimes you can use features of another language in your own language. It won’t be called codemixing, but essentially you are mixing. For instance, accents: when you speak your own native language in, let’s say, an accent borrowed from another language. Or, in Indian English we use things like “little-little”, as in “those little-little things that we say”. Now “little-little” is not really an English construct; it is a Hindi or Indian-language construct which we are borrowing into English. So, studying all of this at once would be extremely difficult. But on the other hand, codemixing does provide us with a handle on this problem of computational modeling of pragmatics and sociolinguistics, and we can model these things not just for the sake of modeling: there are concrete use-cases. Not only use-cases, they are needs. Users are already codemixing through technology, so technology should respond by understanding codemixing and, if possible, even generating it. So, through this entire research we are trying to close the loop: linguistic theories can be used to build computational models, these computational models can then be taken to users, in all their complications and complexities, and then we learn from the user-technology interaction and feed that back into our models. This entire cycle, from theory to application to deployment, is what we would like to do, or get deeper insight into, in the context of natural language processing.

    Host: And I am looking forward to doing another podcast once you have gone further down the road with your research. Kalika and Monojit, this was a very interesting conversation. Shukriya (In Hindi/Urdu: thank you).

    Kalika: Aapka bhi bahut bahut thank you (In Hindi: many thanks to you too). It was great fun.

    Monojit: Thank you, Sridhar. Khoob enjoy korlam ei conversationta tomar shaathe. Aar ami ekta kotha (In Bangla: “I very much enjoyed this conversation with you, Sridhar. There’s one thing”) I want to tell the audience: never feel apologetic when you codemix. This is all very natural, and don’t think you are speaking an impure language. Thank you.

    Host: Perfect.

    [Music plays]

  • Episode 002 | March 20, 2020

    Enabling Rural Communities to Participate in Crowdsourcing, with Dr. Vivek Seshadri

    Crowdsourcing platforms and the gig economy have been around for a while. But are they equally accessible to all communities? Dr. Vivek Seshadri, a researcher at Microsoft Research India, doesn’t think so, and is trying to change this. On this podcast, Vivek talks about what motivated him to focus on research that can help underserved communities, and in particular, about Project Karya, a new platform to provide digital work to rural communities. The word “Karya” literally means “work” in a number of Indian languages.

    Vivek primarily works with the Technology for Emerging Markets group at Microsoft Research India. He received his bachelor's degree in Computer Science from IIT Madras, and a Ph.D. in Computer Science from Carnegie Mellon University where he worked on problems related to Computer Architecture and Systems. After his Ph.D., Vivek decided to work on problems that directly impact people, particularly in developing economies like India.

    Transcript

    Vivek Seshadri: If you look at crowdsourcing platforms today, there are a number of challenges that actually prevent them from being accessible to people from rural communities. The first one is, most of these platforms contain tasks only in English. And all their task descriptions, everything, is in English which is completely inaccessible to rural communities. Secondly, if you go to rural India today, the notion of digital work is completely alien to them. And finally, there is a logistical challenge here. Most crowdsourcing platforms will assume that the end-user has a computer and constant access to internet. This is actually a luxury in many rural communities in India even today.

    (Music plays)

    Host: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham.

    Crowdsourcing platforms and the gig economy have been around for a while. But are they equally accessible to all communities? Dr. Vivek Seshadri, a researcher at Microsoft Research India, doesn’t think so, and is trying to change this. On this podcast, Vivek talks about what motivated him to focus on research that can help underserved communities, and in particular, about Project Karya, a new platform to provide digital work to rural communities. The word “Karya” literally means “work” in a number of Indian languages.

    Vivek primarily works with the Technology for Emerging Markets group at Microsoft Research India. He received his bachelor's degree in Computer Science from IIT Madras, and a Ph.D. in Computer Science from Carnegie Mellon University where he worked on problems related to Computer Architecture and Systems. After his Ph.D., Vivek decided to work on problems that directly impact people, particularly in developing economies like India.

    (Music plays)

    Host: Vivek, welcome to the podcast.

    Vivek: Thanks, Sridhar. This is the first time I am doing anything like this, so I am really excited and a little bit nervous.

    Host: Oh, I don't think there's anything to be nervous about really here. You guys are used to speaking in public all the time. So, I'm sure it'll be fine.

    Vivek, you are a computer scientist and you did your PhD in Computer Science in Systems, right? What made you gravitate towards research that helps underserved communities, typically the kind of research that one associates with the ICTD space?

    Vivek: So, Sridhar, when I finished my PhD in 2016, I had two decisions to make: should I stay in the US or should I move back to India? And should I stay in the same area that I was doing research in, or should I move to a different field? Both these questions were answered when I visited MSR and had interactions with people like Bill Thies. The kind of research that they were doing impressed me and also influenced me to make the decision to come back to India and work on similar problems that directly impact people.

    Host: That's interesting. So this is something that was brought about by meeting people in the lab here, rather than something that was there in your mind all along.

    Vivek: Absolutely. Actually, when I started my PhD, I wanted to come back and become a professor at places like IIT or IISc. When I moved back, I was introduced to MSR by one of my friends who had visited MSR before me, and I just thought I'd pay a visit. The conversations that I had with people here made my decision absolutely easy.

    Host: And the rest is history, as they say.

    Vivek: Absolutely. It’s been three years since I moved here and I couldn't be happier.

    Host: Great. So Vivek, walk us through this project called Karya, which I know you have been associated with for quite a while. What exactly is project Karya and what are your goals with that project?

    Vivek: So, there are two trends that enable or motivate the need for a project like Karya. The first trend is that there is a digital revolution in the world today, where improvements in technologies like machine learning are allowing people to interact with devices using natural language. The second trend is specific to India, where we are pushing towards a digital future, which is creating a lot of tasks like audio transcription, document digitization, etc. Both these trends are going to result in a huge amount of what we call digital work. The goal of Project Karya is to take this digital work and make it accessible to people from rural communities, who typically have very low incomes today and are predominantly stuck with physical labor. We believe completing these digital tasks and getting paid for them will be a valuable source of supplemental income for people from rural communities.

    Host: Crowdsourcing and crowdsourcing platforms have been around for quite a while now. And they are also well-established methods of gig work. So what's the need for another approach or a different framework like Karya?

    Vivek: That's a great question. If you look at crowdsourcing platforms today, there are a number of challenges that actually prevent them from being accessible to people from rural communities. Specifically, let me describe three challenges. The first one is, most of these platforms contain tasks only in English. All their task descriptions, everything, is in English, which is completely inaccessible to rural communities. Secondly, if you go to rural India today, the notion of digital work is completely alien to them. In fact, when we went to rural communities on our first visit and told them we would pay some money for completing a set of digital tasks, they looked at us in disbelief. They didn't actually believe that we were going to pay them until we did. So, there is this huge issue of awareness. And finally, there is a logistical challenge here. Most crowdsourcing platforms assume that the end-user has a computer and constant access to the internet. This is actually a luxury in many rural communities in India even today.

    Host: So, does Karya enable people to use their existing skillsets and knowledge to earn supplemental or extra income?

    Vivek: So, Sridhar, like I mentioned, there are two sources of digital work that we are looking at currently. One is creating labeled data sets for models like automatic speech recognition and other language-based machine learning models. The second source of digital work that we are looking at is things like speech transcription or document digitization, which the government is extremely interested in. Now, depending on what type of task we are going to do, people may have to be able to read in their regional language or type in their regional language. When it comes to reading, we find that most people from rural communities are adept at reading in their regional language. When it comes to typing, as you can imagine, there are not many good keyboards that will allow you to type in your local language. This is something that most people in rural communities have never done before. In fact, even though most people in rural communities are not familiar with English, they use a very crude form of transliteration into the English script to communicate in their regional languages. That's what we observed: most people used WhatsApp, and when communicating with each other they used transliteration in the English script rather than typing in their native script.

    Host: So, you are saying that there is a large number of people who are actually typing in the English script, but the language that they are representing is their own vernacular.

    Vivek: Exactly. And the transliteration is very crude. They know what sound each English letter corresponds to, and they just put together a bunch of characters next to each other; it's almost like they have created a whole new script for their local language.

    Host: Right.

    Vivek: But something like that wouldn't actually be useful for us. We would want them to type in their local language. For instance, let’s take the example of document digitization. The idea there is that the government has a whole lot of government records which contain handwritten words in the local language. It could be names of people, it could be addresses, etc. When I want to digitize these documents, I may want someone to type out the names that they see in the document in the local language. There, I would want them to use the native script and not some crude form of transliteration.

    Host: Sure.

    Vivek: So, in this particular case, we actually used a keyboard that was developed by IIT Bombay called Swarachakra. And our users actually learnt to use that keyboard within a very short span of time and they were able to perform extremely well in the task that we had assigned them.

    Host: So, it sounds like there is a lot of work that is readily available. What is required is to actually deliver it and make it possible for people to leverage that work in order to earn extra income.

    Vivek: Absolutely. Actually, the government of India has its own crowdsourcing platform, where they outsource text digitization, like I mentioned, to anyone in India who wants to do it. Unfortunately, even that platform is not accessible to rural communities. If I go to rural India and ask anyone about that platform, they wouldn't know anything about it. So, in some sense, there is work that is readily available, but there is this huge gap in access.

    Host: And the gap in access is because these platforms work on the traditional paradigm of needing a desktop computer with an internet connection?

    Vivek: Exactly. In fact, the platform that the Government of India has is a website that you have to access, and you need an internet connection to receive tasks and complete them. Our goal is to eliminate that requirement. In fact, the goal of Project Karya is to enable anyone with just a smartphone to perform digital tasks on their phone.

    (Music plays)

    Host: I know you've already conducted some experiments with Karya, and you've also published a paper at CHI in 2019. Can you walk us through some of the results of the experiments that you've conducted?

    Vivek: So, one of the biggest challenges in creating a platform like Karya is the perceived lack of trust in rural labor. When we spoke to many potential work providers about whether they would be willing to outsource their work to rural workers, one of the first questions they asked was whether they could trust the quality of labor that we get from rural workers. So, in the CHI paper, what we wanted to evaluate was the accuracy and effectiveness with which workers from rural India can complete a specific type of digital task. In that particular paper, we looked at text digitization, where the task is as simple as this: the user is shown an image of handwritten text and all they have to do is type out whatever word they see in the image. Of course, they were given thousands of images to digitize over a period of two weeks. And what we found in the paper was that workers from rural India did fantastically well. In fact, in a crowdsourced setting, they outperformed a professional transcription firm to which we gave the same data set. So, that was very interesting for us.

    Host: That’s really interesting. Do you have any insights into why that might have happened or, how this community of people that you engaged with were able to outperform professional services?

    Vivek: So, with respect to the performance of the transcription firm itself, we could only guess, because it was a black box for us. We just gave them the data set and asked them to provide the results, and the results that we got were not that good. But we can definitely guess why workers from rural communities did so well. First of all, the additional income that workers from rural communities get out of completing these tasks is significant. So, for them there is actually a fear that they may not get paid if they don’t complete the tasks accurately. From that point of view, most users paid extreme attention to completing the tasks accurately. These workers also found it a lot of fun. Like I mentioned before, most of their current work is typically physical labor, such as farming, and many of them are actually unemployed. So, for them, this is a fun activity that they can do together with their friends, where they also get some money. From their point of view, it was both fun and a very valuable source of supplemental income. I think both these were significant factors in the rural workers performing really well in the task that we gave them.

    Host: Your CHI paper was based on text digitization by members of rural communities. But have you looked at other types of tasks that can be completed through Karya?

    Vivek: Yes. Actually, as we were working on the platform, we realized that there is a real need for speech data sets in various languages in India. In fact, in our very lab, Kalika Bali, who is a researcher, is working on a project called Ellora, whose goal is to create voice technologies for all the languages in India. One of the fundamental bottlenecks in achieving this is labeled speech data sets. A labeled speech data set is essentially a data set that contains various audio recordings and the transcripts that correspond to those recordings. We found a mechanism to use Karya to collect such data sets for various languages. In fact, we have an ongoing study where we are collecting hundreds of hours of speech data for languages for which there is almost no data today.

    Host: So, when you give out these speech collection tasks, what is the actual process? How does it work?

    Vivek: So, at the lowest level, the task is essentially for the user to record themselves reading out a sentence. However, to make the task more fun, we made them read out stories: some empowering stories, some stories about the history of our country, some stories about popular figures like the Buddha. Users really liked reading out stories as opposed to reading out random sentences.

    Host: So, we've been talking about Karya as a project in which we are helping or building a new paradigm in crowdsourcing. What are the actual components that go into Karya as a system?

    Vivek: So, Sridhar, if you look at any crowdsourcing platform that is out there today, there are two major components. One is the server that contains all the tasks that have to be completed; that is the component that work providers interact with to submit the tasks that they want completed. The second component is the client that the workers use to complete the tasks. In a typical crowdsourcing platform where an internet connection is assumed, the client talks directly to the server to get the tasks, and the responses are also submitted directly to the server.

    Host: Right.

    Vivek: Now, like I mentioned, most rural communities in India do not have internet connectivity. In fact, two of the three locations that we have worked with have absolutely no connectivity. This means a platform that assumes internet connectivity is going to exclude those people from participating in the platform and getting paid for completing valuable tasks.

    Host: So, how do you bridge that?

    Vivek: So, the way we bridge this gap is by introducing this third component that we are calling a Karya Box. Now, the Karya Box is essentially a device that we will place in the village where we want to work with people. And you can think of the box as a local crowdsourcing server for that particular village.

    Host: Okay.

    Vivek: So, the Karya Box will essentially act as a local crowdsourcing server in the village where we have placed it. Users in the village can directly interact with the box through the Wi-Fi access point that the box will expose. So, anyone with a smartphone can just connect to the Karya box Wi-Fi and then interact with the box to get tasks and submit their responses as well. Now the question is, how does the box communicate with our server?

    Host: Yeah.

    Vivek: So, in most villages which do not have connectivity, what we observe is that there are definitely people who go to nearby cities for work, or even to get digital content that they can bring back to the village. What we need to do is employ someone like that who can carry the box to a location where there is internet connectivity, periodically, maybe once a day or even once a week. At that point, when the box gets connectivity to the server, it can exchange data: it uploads the responses that have already been submitted by the rural workers and downloads any new tasks for the village.

    Host: That seems to be a smart and inexpensive way to get around the lack of connectivity issue.

    Vivek: Absolutely. Actually, I can tell you a story around this.

    Host: Oh, please, we love stories.

    Vivek: When we did our recent study, we deployed the box in a village that actually has really good connectivity. So, we were expecting the box to be in regular contact with our server. But due to various reasons, there was an internet shutdown in the village for the first week after we deployed the box. But there you go: our system still worked, because it does not assume that the box regularly talks to the server.

    Host: I am assuming, and correct me if I'm wrong, that a lot of the people who are interacting with the system, especially the Karya app on the phone, would be doing something of this nature for the first time. How did people typically find working with the Karya app? Were there significant hurdles or issues in the communities where you did your experiments?

    Vivek: So, like I mentioned before, most people actually found doing this kind of an activity a lot of fun. From that point of view, there was not much boredom even though the tasks were extremely repetitive. Now imagine looking at words screen after screen and typing them out, or at sentences screen after screen and reading them out. This is probably a very mundane task for people in urban communities. But people in rural communities, who don't get to do this kind of thing very often or even interact with a smartphone very often, actually found it a lot of fun. In fact, many people found a sense of pride in completing tasks in their local language.

    (Music plays)

    Host: It seems like this kind of digital work has the potential over time to provide people with livelihoods and enhance existing incomes. Do you think there is a potential downside to digital work or a potential downside to the online gig economy?

    Vivek: Definitely, there is a limitation, similar to any other gig economy, like cab-hailing services or food delivery, where it's physical gig work that you are doing. As more people join the platform, the amount of work available to each individual is going to go down. So, in that sense, one should not think of even the digital gig economy as a sustainable source of livelihood. From that point of view, one of the concerns is the excitement that workers in rural communities have for such tasks. These tasks are much easier to complete than the tasks they are involved in right now, and they also pay much better. So there is definitely the possibility that some of them may think this is a much more lucrative job that will provide a full-time income for them. But we have to warn them in advance that this is not the case.

    Host: So expectation setting is going to be key.

    Vivek: Expectation setting is, in fact, a huge part of what we need to do when we scale out the platform. Even in the small studies that we conducted in these villages, which lasted two weeks and during which people may earn, let's say, 3000 rupees, their question at the end of the study is, "When are you going to come back?" Right? So, that sort of enthusiasm is both encouraging and scary, because if you don't have a sustainable source of work that you can provide to these villagers, it can end in disappointment.

    Host: Was there anything that surprised you when you were working with and when you were talking to various communities during the experiments with Karya?

    Vivek: Yes. Actually, two things stood out for us. The first thing is, how inclusive the notion of digital work can be when it comes to employing people from diverse backgrounds. What we observed was, women who were typically not allowed to get out of their house in rural communities for various reasons were able to participate on our platform and actually earn income for the first time in their lives. People with physical disabilities were able to participate on our platform.

    Host: That must have felt extremely empowering for them.

    Vivek: Absolutely. And the second thing that we observed is, like I mentioned before, the sense of pride that they had when they were completing tasks in their local language. This is not something that they get to do often. In fact, in one of our studies where the task involved recording themselves reading out stories, many people went back and did the tasks all over again, just so that they could read the stories to their kids or to the community. This was completely surprising to us. Now imagine whether someone in an urban community would be willing to do that.

    Host: Yeah. That's good food for thought.

    So it certainly seems like your experiments with Karya show that it's got a huge amount of promise and potential. Over time, where do you see, or where do you hope to see, Karya?

    Vivek: So Sridhar, like I mentioned, as language technologies keep improving, the need for creating these technologies for various Indian languages is only going to increase. There are going to be many startups which would want data sets for creating the models that they want in local languages. We believe that, with the insights and solutions we have built for creating a crowdsourcing platform for rural communities, Karya can be the platform that these organizations, both private startups and the government, can come to in order to get their valuable tasks completed.

    Host: Vivek, this has been an extremely interesting conversation. Thank you for your time.

    Vivek: Thanks a lot, Sridhar for giving me this opportunity to both talk about the project and also do my first podcast.

    Host: My pleasure.

    To learn more about Dr. Vivek Seshadri and the Technology for Emerging Markets group, visit Microsoft Research India.

  • Episode 001 | March 06, 2020

    Dr. Eric Horvitz is a Technical Fellow at Microsoft, and is director of Microsoft Research Labs, including research centers in Redmond, Washington; Cambridge, Massachusetts; New York, New York; Montreal, Canada; Cambridge, UK; and Bengaluru, India. He is one of the world’s leaders in AI, and a thought leader in the use of AI in the complexity of the real world.

    On this podcast, we talk to Dr. Horvitz about a wide range of topics, including his thought leadership in AI, his study of AI and its influence on society, the potential and pitfalls of AI, and how useful AI can be in a country like India.

    Transcript

    Eric Horvitz: Humans will always want to make connection with humans, sociologists, social workers, physicians, teachers, we’re always going to want to make human connections and have human contacts.

    I think they’ll be amplified in a world of richer automation, so much so that even when machines can generate art and write music, even music with lyrics that might put a tear in someone’s eye if they didn’t know it was a machine, that will lead us to say, “Is that written by a human? I want to hear a song sung by a human who experienced something, the way I would experience something, not a machine.” And so I think human touch, human experience, human connection will grow even more important in a world of rising automation, and those kinds of tasks and abilities will be even more compensated than they are today.

    (music plays)

    Host: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham.

    Host: Our guest today is Dr. Eric Horvitz, Technical Fellow and director of the Microsoft Research Labs. It’s tremendously exciting to have him as the first guest on the MSR India podcast because of his stature as a leader in research and his deep understanding of the technical and societal impact of AI.

    Among the many honors and recognitions Eric has received over the course of his career are the Feigenbaum Prize and the Allen Newell Prize for contributions to AI, and the CHI Academy honor for his work at the intersection of AI and human-computer interaction. He has been elected fellow of the National Academy of Engineering (NAE), the Association for Computing Machinery (ACM) and the Association for the Advancement of Artificial Intelligence (AAAI), where he also served as president. Eric is also a fellow of the American Association for the Advancement of Science (AAAS), the American Academy of Arts and Sciences, and the American Philosophical Society. He has served on advisory committees for the National Science Foundation, National Institutes of Health, President’s Council of Advisors on Science and Technology, DARPA, and the Allen Institute for AI.

    Eric has been deeply involved in studying the influences of AI on people and society, including issues around ethics, law, and safety. He chairs Microsoft’s Aether committee on AI, effects, and ethics in engineering and research. He established the One Hundred Year Study on AI at Stanford University and co-founded the Partnership on AI. Eric received his PhD and MD degrees at Stanford University.

    On this podcast, we talk to Eric about his journey in Microsoft Research, his own research, the potential and pitfalls he sees in AI, how AI can help in countries like India, and much more.

    Host: Eric, welcome to the podcast.

    Eric Horvitz: It’s an honor to be here. I just heard I am the first interviewee for this new series.

    Host: Yes, you are, and we are really excited about that. I can’t think of anyone better to do the first podcast of the series with! There’s something I’ve been curious about for a long time. Researchers at Microsoft Research come with extremely impressive academic credentials. It’s always intrigued me that you have a medical degree and also a degree in computer science. What was the thinking behind this and how does one complement the other in the work that you do?

    Eric Horvitz: One of the deep shared attributes of folks at Microsoft Research and so many of our colleagues doing research in computer science is deep curiosity, and I’ve always been one of these folks that’s said “why” to everything. I’m sure my parents were frustrated with my sequence of whys, starting with one question and going to another. So I was very curious as an undergraduate. I did deep dives into physics and chemistry, math of course to support it all, and biology, and by the time I was getting ready to go to grad school I was really exploring so many sciences. But the big “why” for me that I could not figure out was the why of human minds, the why of cognition. I just had no intuition as to how the cells, these tangles of cells that we learn about in biology and neuroscience, could have anything to do with my second-to-second experience of being a human being. So I thought, you know what, I have to spend my graduate years diving into the unknowns about this from the scientific side of things. Of course, many people have provided answers over the centuries; some of those answers are the foundations of religious beliefs of various kinds and religious systems.

    So I decided to go get an MD-PhD: why not understand humans deeply, and human minds, as well as the scientific side of nervous systems? But I was still on an arc of learning as I hit grad school at Stanford, and it was great to be at Stanford because the medical school was right next to the computer science department. You could literally walk over, and I found myself sitting in computer science classes, philosophy classes, philosophy-of-mind-oriented classes and cognitive psychology classes. So there, to the side of that kind of grad school life in the MD-PhD program, with its anatomy classes and being socialized into the medical school class, I was delighted by the pursuit of what you might call the philosophical and computational side of mind, and eventually I made the jump, the leap. I said, “You know what, my pursuit is principles; I think that’s the best hope for building insights about what’s going on,” and I turned those principles toward real-world problems. In particular, since I had a foot in the medical school: how do we apply these systems in time-critical settings to help emergency room physicians and trauma surgeons? That meant time-critical action, where computer systems had to act quickly but also had to act precisely when they maybe didn’t have enough time to think all the way through, and this led me to what I think is an interesting direction: models of bounded rationality, which I think describe us all.

    Host: Let’s jump into a topic that seems to be on everybody’s mind today – AI. Everyone seems to have a different idea about what AI actually is and what it means to them. I also constantly keep coming across people who use AI and the term ML or machine learning as synonyms. What does AI mean to you and do you think there’s a difference between AI and ML?

    Eric Horvitz: The scientists and engineers that first used the phrase artificial intelligence did so in a beautiful document that’s so well written in terms of the questions it asks that it could be a proposal today to the National Science Foundation, and it would seem modern given that so many of the problems have not been solved, but they laid out the vision, including the pillars of artificial intelligence.

    There was this notion of perception: building systems that could recognize, perceive, or sense the world. This idea of reasoning, with logic or other methods, to reason about problems and solve problems. Learning: how could these systems become better at what they did with experience and with other kinds of sources of information? And this final notion they focused on as being very much in the realm of human intelligence: language, understanding how to manipulate symbols in streams or sequences to express concepts, and the use of language.

    So learning has always been an important part of artificial intelligence. It’s one of several pillars of work, and it’s grown in importance of late, so much so that people often write AI/ML to refer to machine learning, but it’s one piece, and it has always been an important piece of artificial intelligence.

    Host: I think that clarifies the difference between AI and ML. Today, we see AI all around us. What about AI really excites you and what do you think the potential pitfalls of AI could be?

    Eric Horvitz: So let me first say that AI is a constellation of technologies. It’s not a single technology, although these days there’s quite a bit of focus on the ability to learn how to predict or move or solve problems via machine learning, analyzing large amounts of data, which has become available over the last several decades when it used to be scarce.

    I’m most excited about my initial goals to understand human minds. So whenever I read a paper on AI or see a talk or see a new theorem being proved, my first reaction is: how does it grow my understanding, how does it help to answer the questions that have been long-standing in my mind about the foundations of human cognition? I don’t often say that to anybody, but that’s what I’m thinking.

    Secondly, my sense is: what a great endeavor, to be pushing your whole life to better understand and comprehend human minds. It’s been a slow slog, though insights have come about through advances and how they relate to those questions. But along the way, what a fabulous opportunity to apply the latest advances to enhancing the lives of people, to empowering people in new ways, and to creating new kinds of automation that can lead to new kinds of value and new kinds of experiences for people. The whole notion of augmenting human intellect with machines has been something that’s fascinated me for many decades. So I love the fact that we can now leverage these technologies and apply them even though we’re still very early on in understanding how these ideas relate to what’s going on in our minds.

    Applications include healthcare. There’s so much to do in healthcare with decreasing the cost of medicine while raising the quality of care. There’s this idea of being able to take large amounts of data to build high-quality, high-precision diagnostic systems, systems that can predict outcomes. We recently created a system, for example, that can detect when a patient in a hospital is going to crash unexpectedly with organ system failures, and that can be used to alert physicians and medical teams in advance so they are ready to actually save patients’ lives.

    Then there are applications that we’re now seeing in daily life, like cars that drive themselves. I drive a Tesla, and I’ve been enjoying the experience of the semi-automated driving the system can do, just seeing how far we’ve gotten in a few years with systems that recognize patterns, like the patterns on a road, or that recognize objects in their way for automatic braking. These systems can save thousands of lives. I’m not sure about India, but I know the United States statistics: a little more than 40,000 lives are lost on the highways in the United States per year. Looking at the traffic outside here in Bangalore, I’m guessing that India is at least up there, with tens of thousands of deaths per year. I believe that AI systems can reduce these numbers of deaths by helping people to drive better, even if it’s just through safety-related features.

    Host: The number of fatalities on Indian roads is indeed huge and that’s in fact been one of the motivators for a different research project in the lab on which I hope to do a podcast in the near future.

    Eric Horvitz: I know it’s the HAMS project.

    Host: It is the HAMS project and I’m hoping that we can do a podcast with the researchers on that sometime soon. Now, going back to AI, what do you think we need to look out for or be wary of? People, including industry leaders seem to land on various points on a very broad spectrum ranging from “AI is great for humanity” to “AI is going to overpower and subsume the human race at some point of time.”

    Eric Horvitz: So, what’s interesting to me is that over the last three decades we’ve gone from “AI stands for almost implemented; it doesn’t really work very well; have fun, good luck,” to this idea of just getting things up and running and being so excited that there are no other concerns but to get this thing out the door and have it, for example, help physicians diagnose patients more accurately, to now: “Wait a minute! We are putting these machines in places that historically have always relied upon human intelligence. As these machines for the first time edge into the realm of human intellect, what are the ethical issues coming to the fore? Are there intrinsic biases in the way data is created or collected, some of which might come from the biases of the society that creates the data? What about the safety issues and the harms that can come from these systems when they make a mistake? When will systems be used in ways that could deny people consequential services like a loan or education because of an unfair decision, or a decision that aligns, mysteriously or obviously, with the way society has worked, amplifying deep biases that have come through our history?”

    These are all concerns that many of us are bringing to light, asking for more resources and attention to focus on them, and also trying to cool the jets of some enthusiasts who want to just blast ahead and apply these technologies without thinking deeply about the implications and, I’d say, sometimes about the rough edges of these technologies. Now, I’m very optimistic that we will find pathways to getting incredible amounts of value out of these systems when properly applied, but we need to watch out for all sorts of possible adverse effects when we take our AI and throw it into the complexity of the open world outside of our clean laboratories.

    Host: You’ve teed-up my next question perfectly. Is it incumbent upon large tech companies who are leading the charge as far as AI is concerned to be responsible for what AI is doing, and the ethics and the fairness and all the stuff behind AI which makes it kind of equitable to people at large?

    Eric Horvitz: It’s a good question. There are different points of view on that question. We’ve heard some company leaders issue policy statements along the lines of “We will produce technologies and make them available, and it’s the laws of the country that will help guide how they’re used or regulate what we do. If there are no laws, there’s no reason why we shouldn’t be selling something with a focus on profit and our zeal for technology.”

    Microsoft’s point of view has been that technology created by experts inside its laboratories and by its engineers is sometimes getting ahead of where legislation and regulation need to be, and therefore we bear a responsibility as a company both in informing regulatory agencies and the public at large about the potential downsides of technology and its appropriate uses and misuses, and in looking carefully at what we do when we actually ship our products, make a cloud service available, or build something for a customer.

    Host: Eric, I know that you personally are deeply involved in thinking through AI and its impact on society, how to make it fair, how to make it transparent and so on. Could you talk a little bit about that, especially in the context of what Microsoft is doing to ensure that AI is actually good for everybody?

    Eric Horvitz: You know, this is such a passion for me. I’ve been extremely interested, starting with the technical issues, which I think are really deep and fascinating: when you build a limited system, by definition much simpler than the complex universe it’s going to be immersed in, and you take it from the laboratory into the open world, which I refer to as AI in the open world, you learn a lot about the limitations of the AI. You also learn to ask questions and to extend these systems so they’re humble: they understand their limitations, they understand how accurate they are; you give them a level of self-knowledge. This is a whole area of open-world intelligence that I think really bears upon some of the early questions for me about what humans are doing, what their minds are doing, and potentially what other animals, vertebrates, are doing.

    It started there for me. Back to your question now, we are facing the same kind of things when we take an AI technology and put it in the hands of a judge who might make decisions about criminal justice looking at recommendations based on statistics to help him or her take an action. Now we have to realize we have systems we’re building that work with people. People want explanations. They don’t want to look at a black box with an indicator on it. They will say, why is this system telling me this?

    So at Microsoft we’ve made significant investments, in our research teams, in our engineering teams and in our policy groups, in thinking through the details of problems and solutions across a set of issues, and I’ll just list a few right now. Safety and robustness of AI systems. Transparency and intelligibility of these systems: can they explain themselves? Bias and fairness: how can we build systems that are fair along certain dimensions? Engineering best practices: what does it mean for a team working with tools to understand how to build a system and maintain it over time so that it’s trustworthy? Human-AI collaboration: what are the principles by which we can enable people to work in a more fluid way with systems that might be trying to augment their intelligence, such that there is a back-and-forth and an understanding of when a system is not confident, for example? Even notions about attention and cognition: are these systems being used in ways that might be favorable to advertisers, grabbing your attention and holding it on an application because they’ve learned, mysteriously, how to do that? Should we have a point of view about that?

    So Microsoft Research has stood up teams looking at these questions. We also have stood up an ethics advisory board that we call the Aether Committee to deliberate and provide advice on hard questions that are coming up across the spectrum of these issues and providing guidance to our senior leadership team at Microsoft in how we do our business.

    Host: I know you were the co-founder of the Partnership on AI. Can you talk a little bit about that and what it sought to achieve?

    Eric Horvitz: This vision arose literally at conferences, and, in fact, one of the key meetings was at a pub in New York City after a meeting at NYU, where several computer scientists got together, all passionate about seeing things go well for artificial intelligence technologies by investing in understanding and addressing some of these rough edges. We decided we could bring together the large IT companies, Amazon, Apple, Facebook, Google and Microsoft, to think together about what it might mean to build a nonprofit organization that balanced the IT companies with groups in civil society, academic groups and nonprofit AI research, to think through these challenges and come up with best practices in a way that brought the companies together rather than separating them through a competitive spirit. Actually, this organization was created by the force of the friendships of AI scientists, many of whom go back to being in grad school together across many universities: this invisible college of people united in an interest in understanding how to do AI in the open world.

    Host: Do you think there is a role for governments to play where policies governing AI are concerned, or do you think it’s best left to technology companies, individual thinkers and leaders to figure out what to do with AI?

    Eric Horvitz: Well, AI is evolving quickly, and as with other technologies, governments have a significant role to play in assuring the safety of these technologies, their fairness, their appropriate uses. I see regulatory activity being, of course, largely in the hands of governments, advised by leadership in academia and in industry and by the public, which has a lot to say about these technologies.

    There’s been quite a bit of interest and activity; some of that is part of the enthusiastic energy, you might say, going into thinking through AI right now. Some people say there’s a hype cycle that’s leaking everywhere and into all regimes, including governments, right now, but it’s great to see various agencies writing documents, asking for advice, looking for sets of principles, publishing principles and engaging multi-stakeholder groups across the world.

    Host: There’s been a lot of talk and many conversations about the impact that AI can have on the common man. One of the areas of concern with AI spreading is the loss of jobs at a large scale. What’s your opinion on how AI is going to impact jobs?

    Eric Horvitz: My sense is there’s a lot of uncertainty about this: what kinds of jobs will be created, what kinds of jobs will go away. If you take a segment like driving cars, I was surprised at how large a percentage of the US population makes its living driving trucks. Now, if the long-haul parts of truck driving, the long highway stretches, go away when they become automated, it’s unclear what the ripple effects will be on society and on the economy. It’s interesting; there are various studies underway. I was involved in the international academy study looking at the potential effects of new kinds of automation coming via computer science and other related technologies, and the result of that analysis was that we’re flying in the dark. We don’t have enough data yet to make these decisions or these recommendations, or to have an understanding of how things are going to go. So we see people saying things on all sides right now.

    My own sense is that there’ll be some significant influences of AI on our daily lives and how we make our livings. But I’ll say one thing. One of my expectations, and maybe it’s also a hope, is that as we see more automation in the world, and as that shifts the nature of what we do daily and what we’re paid or compensated to do, what we call work, there’ll be certain aspects of human discourse that we will simply learn, for a variety of reasons, we cannot automate, aren’t able to automate, or shouldn’t automate. The way I refer to this is that in the midst of the rise of new kinds of automation, some of it encroaching on tasks and abilities we would in the past have assumed were the realm of human intellect, we’ll see a concurrent rise of an economy around human caring. Think about this: humans will always want to make connection with humans, sociologists, social workers, physicians, teachers; we’re always going to want to make human connections and have human contacts.

    I think they’ll be amplified in a world of richer automation, so much so that even when machines can generate art and write music, even music with lyrics that might put a tear in someone’s eye if they didn’t know it was a machine, that will lead us to say, “Is that written by a human? I want to hear a song sung by a human who experienced something, the way I would experience something, not a machine.” And so I think human touch, human experience, human connection will grow even more important in a world of rising automation, and those kinds of tasks and abilities will be even more compensated than they are today. So we’ll see even more jobs in this realm of human caring.

    Host: Now, switching gears a bit, you’ve been in Microsoft Research for a long time. How have you seen MSR evolve over time and as a leader of the organization, what’s your vision for MSR over the next few years?

    Eric Horvitz: It’s been such an interesting journey. When I came to Microsoft Research it was 1992, and Rick Rashid and Nathan Myhrvold convinced me, along with two colleagues, to stay. We had just come out of Stanford grad school and had ideas about going into academia. We came up to Microsoft to visit, thinking we were just here for a day to check things out. There were maybe seven or eight people in what was then called Microsoft Research, and we said, “Oh, come on, please.” We didn’t really see a big future. But somehow we took a risk, and we loved this mission statement that starts with “Expand the state of the art.” Period.

    The second part of the mission statement was, “Transfer those technologies as fast as possible into real products and services.” The third part was, “Contribute to the vibrancy of this organization.” I remember, as we committed to doing this and trying it out, seeing in my mind a vision of a lever with the fulcrum at the mountaintop on the horizon. And I thought: how can we make this company ours, our platform, to take our ideas, which were bubbling then (we had so many ideas about what we could do with AI from my graduate work), and move the world? That’s always been my sense of what Microsoft Research has been about. It’s a place where the top intellectual talent in the world, top scholars, often with entrepreneurial bents, who want to get something done can make Microsoft their platform for expressing their creativity and having real influence on enhancing the lives of millions of people.

    Host: Something I’ve heard for many years at Microsoft Research is that finding the right answer is not the biggest thing, what’s important is to ask the right, tough questions. And also that if you succeed in everything you do you are probably not taking enough risks. Does MSR continue to follow these philosophies?

    Eric Horvitz: Well, I’ve said three things about that. First of all, why should a large company have an organization like Microsoft Research? It’s unique. We don’t see that even in competitors. Most competitors are taking experts if they could attract them and they’re embedding them in product teams. Microsoft has had the foresight and we’re reaching 30 years now since we kicked off Microsoft Research to say, if we take top talent and attract this top talent into the company and we give these people time and we familiarize them with many of our problems and aspirations, they can not only come up with new ideas, out-of-the-box directions, they can also provide new kinds of leadership to the company as a whole, setting its direction, providing a weathervane, looking out to the late-breaking changes on the frontiers of computer science and other sciences and helping to shape Microsoft in the world, versus, for example, helping a specific product team do better with an existing current conception of what a product should be.

    Host: Do you see this role of Microsoft Research changing over the next few years?

    Eric Horvitz: Microsoft has changed over its history. One of my interests and reflections, which I shared just last night in an all-hands meeting with MSR India, is how we might continue to think and reflect about being the best we can, given who we are. In fact, some new ideas coming out of a retreat that the Microsoft Research leadership team had in December, just a few months ago, were tried out there. I’ve called it polishing the gem: not breaking it, but polishing it, buffing it out, thinking about what we can do with it to make ourselves even more effective in the world.

    One trend we’ve seen at Microsoft is that over the years we’ve gone from Microsoft Research as this separate tower of intellectual depth, reaching out into the company in a variety of ways (forming teams, advising, working with outside agencies, with students in the world, with universities), to a larger ecosystem of research at Microsoft, where we have pockets of advanced technology, groups around the company doing great work, in some ways doing the kinds of things that Microsoft Research used to be doing, or used to be solely doing, at Microsoft.

    So we see that upping the game as to what a center of excellence should be doing. I’m just asking the question right now, what are our deep strengths, this notion of deep scholarship, deep ability, how can we best leverage that for the world and for the company, and how can we work with other teams in a larger R&D ecosystem, which has come to be at Microsoft?

    Host: You’ve been at the India Lab for a couple of days now. How has the trip been and what do you think of the work that the lab in India is doing?

    Eric Horvitz: You know, we just hit 15 here; this lab is 15 years old, so it’s a teenager, just getting out of adolescence. It seems like just yesterday that I was sitting with Anandan, the first director of this lab, looking at a one-pager he had written about “Standing up a lab in India.” I was sitting in Redmond having coffee, and I tell you, that was a fast 15 years, but it’s been great to see what this lab became and what it does. Each of our labs is unique in so many ways, typically based on the culture it’s immersed in.

    The India lab is famous for its deep theoretical chops and fabulous theorists, the best in the world, and for this interdisciplinary spirit of taking theory and melding it with real-world challenges to create incredible new kinds of services and software. One of the marquee areas of this lab has been this notion of taking a hard, insightful look at emerging markets and Indian culture all up, and thinking about how computing, computing platforms and communications can be harnessed in a variety of ways to enhance the lives of people: how can they be better educated, how can we make farms and agriculture more efficient and productive, how can we think about new economic models and new kinds of jobs, how can we leverage new notions of what it means to do freelance or gig work? So the lab has its own feel, its own texture, and when I immerse myself in it for a few days I just love getting familiar with the latest new hires, the new research fellows, the young folks coming out of undergrad who are just bright-eyed and inject energy into this place.

    So I find Microsoft Research India to have a unique combination of talented researchers and engineers that brings to the table some of the world’s deepest theoretical understanding of hard computer science, including challenges in understanding the foundations of AI systems. There’s a lot of work going on right now in machine learning, as we discussed earlier, but we don’t have a deep understanding, for example, of how these neural network systems work and why they’re working so well. I just came out of a meeting where folks in this lab have come up with some of the first insights into why some of these procedures are working so well. Understanding that, understanding their limitations, and knowing which ways to go and how to navigate these problems is rare, and it takes a deep focus and an ability to understand the complexity arising in these representations and methods.

    At the same time, we have the same kind of focus and intensity in a gaze at culture and at emerging markets. There are some grand challenges in understanding the role of technology in society when it comes to a complex civilization, or I should say set of civilizations, like we see in India today: this mix of futuristic, out-of-the-box advanced technology with rural farms and classical ways of doing things, meshing the old and the new, with so many differences as you move from province to province, state to state. The sociologists and practitioners here who are looking carefully at ethnography, epidemiology and sociology, coupled with computer science, are doing fabulous things at the Microsoft Research India Lab, even coming up with new thinking about how we can mesh opportunistic Wi-Fi with sneakers, Sneakernet, people walking around to share large amounts of data. I don’t think that project would have arisen anywhere but at this lab.

    Host: Right. So you’ve again teed-up my next question perfectly. As you said India’s a very complex place in terms of societal inequities and wealth inequalities.

    Eric Horvitz: And technical inequality; it’s amazing how different things are from place to place.

    Host: That’s right. So, what do you think India can do to utilize AI better and do you think India is a place that can generate new innovative kinds of AI?

    Eric Horvitz: Well, absolutely, the latter is going to be true, because some of the best talent in computer science in the world is being educated and is working in this country, so of course we will see fabulous things, fabulous innovations originating in India, both in the universities and in research labs, including Microsoft Research. As to how to harness these technologies, you know, it takes a special skill to look at the currently available capabilities in a constellation of technologies and to think deeply about how to take them into the open world, the real world, the complex messy world.

    It often takes insights as well as a very caring team of people to stick with an idea, to try things out, to watch it and nurture it, and to involve multiple stakeholders in watching over time, for example, how a deployment works, gathering data about it and so on. So, I think some very promising areas include healthcare. There are some sets of illnesses that are low-hanging fruit for early detection and diagnosis: understanding where we could intervene early on by looking at pre-diabetes states, for example, and guiding patients early to get care so they don’t progress into more serious pathophysiologies; understanding when someone needs to be hospitalized and how long they should be hospitalized. In a resource-limited realm, we have to selectively allocate resources, and doing so more optimally can lead to great effects.

    Then there is education: how to educate people, how to engage them over time, diagnosing early on which students might drop out and alerting teachers to invest more effort, understanding when students don’t understand something and automatically helping them get through a hard concept. We’re seeing interesting breakthroughs now in tutoring systems that can detect these states. Transportation: I mean, it’s funny, we build systems in the United States, and this is what I was doing, to predict traffic and to route cars ideally. Then we come to India and look at the streets here and we say, “I don’t think so, we need a different approach,” but that just raises the stakes on how we can apply AI in new ways. So, the big pillars are education, healthcare and transportation, and even understanding how to guide resources and allocations in the economy. I think we’ll see big effects of insightful applications in this country.

    Host: This has been a very interesting conversation. Before we finish do you want to leave us with some final thoughts?

    Eric Horvitz: Maybe I’ll make a call-out to young folks who are thinking about their careers and what they might want to do, and assure them that it’s worth it. It’s worth investing in taking your classes seriously, in asking lots of questions, in having your curiosities addressed by your teachers, your colleagues, your family. There’s so much excitement and fun in doing research and development, in being able to build things and feel them and see how they work in the world, and maybe most of all in being able to take ideas into reality in ways that let you see the output of your efforts and ideas really delivering value to people in the world.

    Host: That was a great conversation, Eric. Thank you!

    Eric Horvitz: Thank you, it’s been fun.