Episode 7: Deep Learning, Startups and Academia à la Montréal with Nicolas Chapados and JF Gagné

Montréal isn’t just a beautiful city, it’s also a hotbed of activity for deep learning. Fueled by ambitious entrepreneurs and some ground-breaking research coming out of the University of Montréal, the city is at the forefront of machine learning, deep learning, and artificial intelligence. In this episode, Jon Prial is joined by Georgian Partners’ Ben Wilde for a fascinating discussion with two machine learning practitioners: Nicolas Chapados, Chief Science Officer at Imagia, and Jean-François Gagné, Entrepreneur in Residence at Real Ventures. Learn from these experts about what it takes to successfully leverage this technology in your own business. You’ll hear about:

  • Why Montréal is one of the world’s most dynamic cities for machine learning (0:50)
  • The types and stages of companies that Real Ventures typically invests in (7:51)
  • What it takes to incorporate machine learning into an early stage company (8:53)
  • Data acquisition and the transferability of deep learning models to other domains (12:04)
  • The University of Montréal’s use of models for and research around image analysis (15:40)
  • How the University’s research is coming to market and the opportunities that creates (19:17)
  • The availability of data and its implications and opportunities for early stage companies (22:10)
  • How deep learning can facilitate smarter decision-making (24:57)
  • What the open sourcing of Google’s TensorFlow means for deep learning (27:40)
  • Typical applications for Theano (29:45)

Subscribe: iTunes | Google Play | SoundCloud | Stitcher | RSS



Jon Prial: Something you might not be aware of, Montréal, Canada, is a hotbed of research in artificial intelligence. Today we’ll be hearing about the intersection of academia and startups, and what companies are doing, and need to be doing, to leverage the emerging technologies of deep learning and AI.

Welcome to “The Impact Podcast.” I’m Jon Prial.

Today on The Impact Podcast, we’re joined by Ben Wilde from Georgian Partners, who recently sat down with two entrepreneurs from Montréal, both of whom are deep into the machine learning field. Ben, welcome to the podcast.

Ben Wilde:  Thanks Jon. Yes, we were lucky enough to get some time with a couple of machine learning practitioners who have been heavily involved with deep learning, and in particular, the research that’s coming out of the University of Montréal.

Of course, Montréal is the home of Yoshua Bengio, who is considered one of the three fathers of deep learning, along with Geoff Hinton and Yann LeCun.

Jon:  Ben, Geoff’s affiliated with University of Toronto, but also Google, and Yann’s affiliated with Facebook and New York University, so where are we going with all this?

Ben:  There are a couple of things that struck me from the conversation. One was the opportunities for smaller companies to access the deep learning research and talent coming out of Montréal.

What’s interesting here is that the University of Montréal appears to be taking a different approach in terms of linking technology from academia with smaller companies and startups.

But the other thing that was particularly interesting for entrepreneurs to be aware of is that there are some real gotchas that companies need to think about when they’re looking to take some of this cutting edge machine learning, deep learning research out of the lab, and start to create a business around it.

Jon:  Thanks. Let’s have a listen.

Ben:  Thanks guys for taking the time to be on the podcast today. What I’d like to do is start off with a little bit of background on you both. Nicolas, you received your Ph.D. from the University of Montréal in machine learning, I believe. Could you tell us a little bit more about that, what you’re doing now, and how you got here?

Nicolas Chapados:  I did my PhD with Yoshua Bengio at the University of Montréal. Actually, my background is in engineering, which I studied at McGill University. I started my career doing speech recognition for a big lab that Nortel was running back in the day, close to Montréal.

I was a programmer in the speech recognition research lab, the least qualified, least dedicated person in that lab, helping people debug [inaudible 3:11] code. That gave me a taste for machine learning and research, and led me to go back to university to do my Master's and Ph.D. in machine learning with Yoshua Bengio.

Initially, I studied the combination of neural networks with portfolio modelling and financial modelling with Bengio. In the early 2000s, we started a technology transfer company called ApSTAT, whose purpose was to take some of the innovations that we had developed in the lab and academic setting and apply them more broadly in industry. We did work on insurance risk modelling. We also worked in portfolio management.

We co-managed a hedge fund with a big Canadian bank. We also did some signal processing and signal recognition work with the US Department of Defense, and worked on schedule optimization with Jean Francois' previous company, which we will come back to later.

More recently, I've been involved in a new startup called Imagia, whose purpose is to take all the incredible innovations that have been coming out of deep learning over the past few years and apply them in the field of medical image analysis, more specifically detecting cancer tumors.

Ben:  Thanks for that, Nicolas. Jean Francois, you're currently an entrepreneur in residence over at Real Ventures. Could you tell us a little bit about what you were doing previously, and also what you're focusing on now in that new role?

Jean Francois Gagne:  It's been quite a ride. I've been interacting with Real Ventures for a while now, and was part of their first fund, back in 2010. I was also one of the first [inaudible 5:24] in 2012, and sold my last company, which was called Planora.

Like I mentioned, that was how Nicolas and I met and worked together, building some very interesting machine learning technologies and applying them to what Planora used to do around workforce management and workforce optimization tools for the enterprise.

I sold it to a company called RedPrairie in 2012, and from there I was promoted to a chief product officer role. I managed product innovation there for two years, and then decided to go back to smaller companies and startups last July, after spending three years over there.

Real Ventures invited me to join them as an entrepreneur in residence, so I could learn and interact with startups all across Canada.

Ben:  JF, could you explain a little bit more about what an entrepreneur in residence does, day to day?

Jean Francois:  Absolutely. My role, the way we've carved it out here, is to help portfolio companies with knowledge and skills, sharing my approach, being a mentor to them, and helping them go after and tackle the different activities they have in front of them. That's one part, helping portfolio companies.

The second is being part of due diligence for companies that are within my field of expertise: software, AI, machine learning and optimization. I also help Real Ventures build and progress their thinking around the AI and machine learning scene that is really blooming here in Montréal.

Ben:  Is there any particular size of company? You're typically investing slightly earlier in the cycle than Georgian Partners. Could you talk a little bit about the age and stage of your ideal investment?

Jean Francois:  Real Ventures is focused on seed investments, mostly.

We look for companies that are focused on building their product and their team, still investigating exactly what their value prop is and what the right way to message their offering is, up to the point where they've figured all of these things out and are starting to generate revenue and grow. That's often the stage that we refer to in venture capital as the Series A stage, which is when you guys at Georgian start investing.

Ben:  Got it. Nicolas, can you talk a little bit about what it really takes to incorporate machine learning into an early stage company? There’s a lot of talk about this idea that the next 10,000 startups, the idea is you take idea X and you add machine learning.

Can you talk a little bit about the realities of that, and maybe where companies should start thinking and looking, and how organizations like the ones you're involved with, or MILA at the University of Montréal, can help?

Nicolas:  Yes, of course. The first thing to understand is that machine learning is really a consequence of having data. The first essential ingredient is to have data, and usually the more of it you have, the better. I have to say machine learning is an obvious consequence of the big data that has accumulated over the past couple of years.

Now, when you’re a startup, obviously the big problem is that you don’t have big data to work with. Usually, big data is the result of having a lot of customers leaving behind a ton of transaction data, a ton of personal details. When you’re starting out, what do you do? You don’t have those data elements available.

The prerequisite, really, is to have a reasonably clear picture of what you're trying to accomplish in the marketplace and what your product is going to be doing, and to put together the mechanisms to acquire this data, either through well-structured MVPs that save as much as possible of the interactions that early users are generating, or by being creative and getting data from third-party sources that you can leverage in useful ways.

One of the biggest transformative elements in the machine learning community, and more specifically the computer vision community, in the last few years has been the availability of big public data sources. For example, there is a standard database in computer vision called ImageNet that consists of 14 million images. They are publicly available.

A lot of them are sourced from Flickr. They are all tagged with useful labels that describe the contents of the image itself. If you want to train a high-quality image classification model, you can download this public dataset and set a computer to run for one week. After one week, you will get your trained ImageNet model. This simply was not available 10 years ago.

Ben:  Could you talk a little more about that? With Imagia, my understanding is you’re doing some very specific image analysis, in particular in cancer. How transferable is the training that happens on the public dataset of Internet images, and cats, and things like that, to being able to go into another domain?

Is there something that's a characteristic of deep learning, this ability to create models which are quite flexible? Then also, if you've got specific data, could you talk a little bit about where that comes from in the case of Imagia?

Nicolas:  Yes, of course. What we're trying to do first at Imagia is to take the amazing advancements that we have seen in the deep learning community over the past couple of years and translate them into something clinically available that helps patients, helping physicians diagnose things like cancer tumors more effectively, more quickly, and with fewer errors.

Of course, to train useful classification models, especially deep learning models, we need lots of visual data. Medical imaging data, by its nature, tends to be relatively sparse, because you have patient consent issues, and the fact that there are just not that many patients with a very specific type of liver tumor.

On the other hand, anybody can take lots of pictures with their smartphone. There are tons of pictures of general objects available. What people have recognized over the past few years is that deep learning models have this amazing ability to be transferable, to some extent, between fields.

We can take, for example, a model trained on one million general images. We let it train for one week so that it learns to become very good at classifying general images that we see in everyday life. Then we can take this trained model, do a post-processing operation that we call fine-tuning, which is essentially taking the initial trained model, and adjusting its parameters according to a much smaller database that would correspond, in our case, to very specific medical imaging data.

Ben:  So you do some initial training on tens of thousands, or hundreds of thousands, of images. Then you do some fine‑tuning on the dataset that’s more in the hundreds or thousands of images? Is that how it works?

Nicolas:  This is exactly how we do it, yes. The performance compared to training from scratch using only a thousand medical images is much better if we do this pre-training with a million general purpose images first.
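The pretrain-then-fine-tune recipe Nicolas describes can be sketched in a few lines. This is a minimal NumPy illustration, not Imagia's actual pipeline: synthetic data stands in for ImageNet and for the medical images, and a tiny two-layer network stands in for a deep CNN. The key move is the same, though: train everything on the plentiful "general" data, then freeze the learned features and adjust only the final layer on the small "domain" dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(W1, W2, X, y, lr=0.1, epochs=200, freeze_first=False):
    """Train a tiny 2-layer net with plain full-batch gradient descent."""
    for _ in range(epochs):
        h = np.tanh(X @ W1)               # hidden features
        p = 1 / (1 + np.exp(-(h @ W2)))   # sigmoid output probability
        g = (p - y) / len(y)              # gradient of logistic loss wrt logits
        gW2 = h.T @ g
        gW1 = X.T @ ((g @ W2.T) * (1 - h ** 2))
        W2 -= lr * gW2
        if not freeze_first:              # fine-tuning freezes the feature layer
            W1 -= lr * gW1
    return W1, W2

# "General" pretraining data: plentiful (stands in for ImageNet).
Xg = rng.normal(size=(2000, 10))
yg = (Xg[:, :5].sum(axis=1) > 0).astype(float).reshape(-1, 1)

# "Domain" data: tiny, related task (stands in for medical images).
Xd = rng.normal(size=(50, 10))
yd = (Xd[:, :5].sum(axis=1) + 0.3 * Xd[:, 5] > 0).astype(float).reshape(-1, 1)

W1 = rng.normal(scale=0.1, size=(10, 16))
W2 = rng.normal(scale=0.1, size=(16, 1))
W1, W2 = train(W1, W2, Xg, yg)                      # pretrain on plentiful data
W1, W2 = train(W1, W2, Xd, yd, freeze_first=True)   # fine-tune the head only

p = 1 / (1 + np.exp(-(np.tanh(Xd @ W1) @ W2)))
acc = ((p > 0.5) == yd).mean()
print(f"fine-tuned accuracy on domain data: {acc:.2f}")
```

In practice the pretrained model would be a deep CNN with millions of parameters, and fine-tuning often adjusts the last few layers rather than strictly one, but the freeze-then-adapt structure is the idea being described.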

Ben:  That's really interesting. The University of Montreal is particularly known for what types of models and research around image analysis? Could you talk a little bit about those? Not in too much detail, but just give us a bit of the flavor of the particular type of approach that's being used.

Nicolas:  The first thing to know is that Yoshua Bengio is an outstandingly prolific researcher, by far the most productive person I know. Over the span of his more than 20-year career, he has worked on a great variety of models, and a great variety of application domains.

Most recently, the lab has been most known for its incredible breakthroughs in sequence processing models, especially natural language processing, or NLP. Some of the recent work that came out of the lab are automatic neural translation models. Models that can take a sentence in French, and output the translation in English in a completely data-driven fashion.

We don't do the rule-based translation that we used to do in the old days. This is completely a single neural network model that is trained to take an arbitrary sequence of English words as input, and output its translation in a different language.

Ben:  It’s doing the pattern matching rather than requiring a human to define a bunch of rules ahead of time, and then have those rules executed to do the translation?

Nicolas:  Exactly, those models are trained on aligned corpora. Let's say you're in the Canadian Parliament, where all the debates have to be translated into both French and English. Then you're very lucky, because you have this corpus of aligned sentences. You know that one English sentence is the translation of this very specific French sentence, and you have the raw data that you need to train your models to translate from French to English, or from English to French.

Ben:  Got it. In addition to the NLP, because there have been quite a number of breakthroughs around it, is it the application of CNNs, or convolutional neural networks, to image matching? That sounds like what's behind Imagia.

Nicolas:  The types of models that apply more to visual processing are indeed CNNs. What Bengio's lab has been doing that's really significant in the last couple of months is automatic caption generation models: models that are part CNN-based and part sequence-processing based, and that, given an image as input, will generate a complete English sentence, or an English paragraph, that describes the contents of that image.

Some of the most innovative and highest performing caption generation work came out recently from them.

Ben:  Interesting. In terms of how that research is coming to market and showing up in new markets: of the three universities known for deep learning, where the others are the University of Toronto and NYU, Montreal seems to be taking a pretty strong independent approach to its research, and in particular not aligning with any one corporate partner.

Given your background, could you talk a little bit more about what you see as the thinking there, and why that's important from an innovation perspective?

Nicolas:  Yes. I guess that's a fair assessment. It is often said that modern deep learning has three major founding fathers, if you will. We have Geoff Hinton at the University of Toronto, who spends most of his time at Google; Yann LeCun, who has had a long and illustrious career, most recently at NYU, and is now head of Facebook AI Research; and Yoshua Bengio at the University of Montreal.

Bengio really has the last very large academic research lab in deep learning that has no preferred industry attachment. Bengio will take quite a bit of funding from many different companies, but is very strict about the policy that his research should all be put in the public domain and published freely.

Companies that fund the lab get no preferred access to the IP that it generates, as a general policy.

Ben:  That does sound good. It sounds like a great model and it, I think, creates an opportunity in Montreal for smaller companies.

Nicolas:  We see that. We see the ecosystem, and people collaborating. There are also a lot of labs from larger corporations that have moved into Montréal to get access not only to the talent, but also to being close to what's happening.

As some of you may know, there are some interesting things you can learn from papers and research, but it's small in comparison to what you can learn from the people who did the research. Being close to the environment and the ecosystem, and connecting directly, is worth a lot. A lot of people are recognizing that, and are moving and establishing themselves here in Montréal.

Ben:  JF, Nicolas mentioned that machine learning is the obvious consequence of big data. Changing tack for a moment, can you talk a little bit more about what you're seeing in terms of the availability of data today, and what the implications and opportunities around that are for early stage companies?

Jean Francois:  I'll tell you what we see with AI, machine learning and optimization, because I'll group them as one big thing. We used to call this automation, but we try to move away from that word, because as an industry trend or buzzword it sometimes relates to rule-based systems.

In reality, what we're seeing is a trend in the market: whether you talk about the cost of sensors that went down, or all the new information that we can now acquire with our cell phones and all the sensors we have there, we're starting to generate new types of data, new types of insights, that can close the loop in processes that, back in the day, systems had no oversight on.

What's happening is not only that we are getting access to more data, but that we get more information at different points in time in our processes. With that information, we can now make smarter decisions, or simply automate those decisions.

It's just natural, as you're closing the loop, to then compare a metric measuring how well you're performing. Again, if you have all the data points, and the decision-making process is all automated, why not learn from that and get better at it?

I don't want to oversimplify, but basically what we're seeing is that, given all of the tracking and sensors attached to things, and the computing power that is available (all of which has been building for a while now), a startup now has access, very cheaply, to ways to automate processes that haven't been automated in the past.

When you get these opportunities, often what happens is that you can start disrupting the current models, which are based on assumptions that are no longer true.

Ben:  JF, you talked a bit about automation there, but if you look at what Imagia is doing, and what Nicolas is doing, a lot of that is around augmentation as well, isn't it? Nicolas, do you want to talk a little bit about that vision as well? Because this isn't really about just taking humans out of the loop. In a lot of use cases, it's about smarter decisions for humans, right?

Nicolas:  Of course. The whole premise behind Imagia, if you will, is to add actionable information on top of raw medical imaging data. It used to be the case that a radiologist would see a scan, and those scans are very complex, 3D depictions of what goes on inside a human being, and then painstakingly analyze it, slice by slice.

There can be hundreds of those slices that you would need to go through in order to find an anomaly inside the body, where eventually that anomaly is a pathology. What we want to do is overlay some automated analysis that will point the physician in the right direction and say, "Well, out of all the 200 slices that comprise the scan, I think you should go look at slice number 39, because it is the most likely to contain an anomalous lesion."

If we can order cases in this way, and have the caseload of the doctor ordered from the most severe patient down to the likely normal cases, then we can make much more effective use of the doctor's intellect, and ultimately the whole healthcare system becomes more effective as well.
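The triage idea described here can be sketched directly. In this hypothetical example the per-slice anomaly scores are random numbers standing in for the output of a trained model; the point is only the ordering logic: flag the most suspicious slice in each scan, then sort the caseload from most to least severe.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-slice anomaly scores for a few patients' 200-slice scans
# (in practice these would come from a trained CNN, not random numbers).
scans = {f"patient_{i}": rng.random(200) for i in range(4)}

# For each scan, flag the most suspicious slice and record its score.
triage = {
    pid: (int(np.argmax(scores)), float(scores.max()))
    for pid, scores in scans.items()
}

# Order the caseload from most to least severe.
caseload = sorted(triage, key=lambda pid: triage[pid][1], reverse=True)
for pid in caseload:
    idx, score = triage[pid]
    print(f"{pid}: review slice {idx} first (score {score:.2f})")
```

A real system would calibrate these scores and surface more than one candidate slice, but the friction-reducing mechanism is this prioritization.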

That's really our vision: to use those predictive models to remove frictions inside the system and make everybody operate more effectively. More generally, all uses of machine learning that are good at making predictions, and optimization algorithms that take those predictions and turn them into optimal decisions, can be viewed as ways of reducing friction inside any economic system you can think of.

Ben:  The big news recently was that Google had gone ahead and open‑sourced the TensorFlow deep learning framework. Nicolas, could you talk a little bit about your perspective on that? How useful you think it might be, and what it could mean for the commercialization of deep learning into bigger solutions and products.

Nicolas:  Yes. TensorFlow was the big news release of two weeks ago. Google basically took their entire C++ deep learning framework, packaged it very nicely, and released it into the open. Now, to be fair, they released the single-processor, single-computer version of it; the internal TensorFlow at Google is reputed to scale to many thousands of computers. Google released only a portion of that.

I think the nice part about TensorFlow is that the library itself is developed by dozens of well-paid, highly competent software engineers. We get a very high-quality library that’s being released. There are other open source, machine learning libraries available, including one that’s been developed in Bengio’s lab at University of Montréal called Theano that, functionally speaking, does very much the same thing that TensorFlow is doing.

Theano has been available for six or seven years now, so it's very old compared to TensorFlow. It's been maintained and updated on a regular basis, but the number of people working on Theano is tiny compared to the resources that Google can put behind TensorFlow.

Ben:  Going back to Theano, can you explain a little bit more about how that works, and what types of problems it’s particularly suited to?

Nicolas:  Yes, of course. Theano, at its core, is a library that lets you be very creative about machine learning algorithms that can be trained with gradient-based algorithms. This sounds a little bit technical, but basically, it is a very good fit for any neural network model that you can think of.

Theano really started out as a single-processor library that could offload much of its work to graphical processing units, or GPUs, that speed up neural networks by a factor of 1,000.

Theano is very good at that. It’s not very good at scaling out to multiple computers in a cluster. It really hits bottlenecks there. There are some people that are working on distributed Theano but that work remains experimental right now.

That being said, it remains to this day the workhorse of many, many machine learning researchers, a large number of deep learning papers that are being written by academic groups, either here in Montreal or elsewhere across the world use Theano as their go-to tool to train deep learning models.
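To make "gradient-based algorithms" concrete, here is a toy training loop minimizing a least-squares loss. This is plain NumPy, not Theano: where Theano would derive the gradient symbolically from a declared expression (and optionally compile it for a GPU), this sketch approximates it with finite differences. The descent loop itself has the same shape either way.

```python
import numpy as np

def numerical_grad(f, w, eps=1e-6):
    """Central-difference gradient; Theano instead derives this symbolically."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# A toy differentiable "loss": 0.5 * ||A w - b||^2, the kind of expression
# you would declare symbolically in a framework like Theano.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([4.0, 1.0])
loss = lambda w: 0.5 * np.sum((A @ w - b) ** 2)

w = np.zeros(2)
for _ in range(500):
    w -= 0.1 * numerical_grad(loss, w)   # plain gradient descent step

print(w)  # approaches the solution of A w = b, i.e. [2.0, 1.0]
```

Finite differences are far too slow for real networks with millions of parameters, which is exactly why symbolic or automatic differentiation, as in Theano and TensorFlow, matters.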

Ben:  That's a really good point. There seems to be a gap between where the state of the art is in deep learning research and the engineering of software solutions based on deep learning. Could you talk a little bit about that?

Because you're trying to close that gap, obviously, with Imagia. You've done this before with ApSTAT, where you're dropping these ideas and algorithms into larger bodies of software.

Can you just talk a bit about the practicalities of taking what is very clearly effective research, and turning it into amazing products?

Nicolas:  Yes. How can I start with this?


Nicolas:  It’s not easy, let’s just say that. It’s not.

Ben:  That's the thing. In the media, it is somewhat portrayed as a solved problem, but it seems to me that with the nuances around training and tuning, there's still work being done. Yes, we've got a lot more data. Yes, we've got much faster, bigger computers. But the question of how you bring those algorithms to life inside a real software environment, that seems to be where maybe projects like TensorFlow have a stronger role to play.

I think the fact that it’s that hard explains a little bit the rise of salaries that you’ve seen for experts, and why there’s so much pressure in the market to get very highly talented people. I think that gap is obvious, and it shows there.

Jean Francois:  The little-known secret reality of deep learning is that not everything is written in the papers. You will read researchers saying, "We solved this kind of problem using this kind of architecture, trained with this data," and so on and so forth.

What you don't see is that they trained models for weeks and weeks on a cluster of computers to decide how to tune the architectural parameters, what we call the hyper-parameters: why this particular network has 4,096 individual units in this layer, as opposed to that one, which has twice as many.

There is a lot of engineering hidden in training neural networks, and it's a big part of the gap in bringing useful solutions to market. You need to worry about robustness. You need to worry about, "OK, I'm not just writing a paper that will be read by 500 people; I'm actually deploying a net that will be used by doctors every day on flesh-and-blood patients, and it must never fail."

The gap is being able to ensure that your net is robust enough for real-world deployment.

[background music]

Jon:  Ben, to summarize, what would you tell the leadership team of a startup that recognizes the need to leverage some of this technology?

Ben:  The key is really getting access to talent. It's a similar challenge that every organization faces in any of these newer technology areas, and it certainly has been a challenge in the case of applied analytics, getting access to that data science talent.

With deep learning in particular, it’s going to be more complicated because the number of people with the knowledge is smaller, and I think that’s where partnering with academia and transitioning that know‑how and talent into industry is going to be really key. Of course, in addition to those skills, you need data, and you’re probably going to need a lot more data than you do today.

Jon:  As we talked about before, on lots of podcasts, it is all about the data. That’s great, Ben. Thanks for being with us, and thanks to everyone for listening. This is Jon Prial for the Impact Podcast.