Towards Secure and Interpretable AI: Scalable Methods, Interactive Visualizations, & Practical Tools


>>So it's my great pleasure to introduce Polo Chau to Microsoft today. He's an Associate Professor at Georgia Tech, got his PhD at Carnegie Mellon, and wrote some famous papers, like the Polonium paper, which I have definitely read in detail. Today he's going to talk to us about making AI secure and interpretable. So welcome.

>>Okay. Yeah. Thank you, Jay, and also everyone in attendance
and also joining us remotely. So today I'm going to talk about two related topics. One is how to make AI more secure, and the other is how to make AI more interpretable. At first glance you might think they may not be directly related, but hopefully through this talk you'll see that they're highly related. I will first begin with a little overview of what our group does. I like to name things after myself, so I named my group the Polo Club of Data Science, and at the bottom are my students, current students and recently graduated students. At the core, what we've been working on is how to combine machine intelligence, the AI side, which has become very big, with the human intelligence side of things. You can think of it as combining scalable automated methods with the flexibility of human beings. The goal is to develop scalable interactive tools to make sense of large-scale datasets and models, and I would argue these are essential for any kind of analysis that we want to do. There are a number of research thrusts that we have at Georgia Tech; I've summarized five of them. One is human-centered AI: how do people interact with AI systems, and how do we make that easier. Another is security, where we have been working for a long time on cyber security, but these days we also care about the security of AI, which is where adversarial machine learning comes in. Some of my PhD students have also been working on large graph mining: graph and network kinds of data, how to visualize them, and also applications in social good, health, and of course security as well. Today we're going to focus on two of them, particularly how to make AI secure, for example how to defend and protect AI, and, relatedly, how to make AI more interpretable. So why do we focus on these two, and how are they related?
For the secure AI part, I don't think I need to convince you too much, because AI is now often used in safety-critical applications, so it's very important that we study both threats and countermeasures. You have probably seen more examples in the news than you would want: there are these accidents, and people don't really fully understand them until after a lot of investigation, and sometimes even after the investigation you still don't fully know what happened. That brings us to the need to go deeper: when there is an accident, or when someone claims a protection can actually help avoid accidents, how does that really work? And as you know, these days when people mention AI, they often think of it as a black box, especially because AI now often means deep learning models. Treating it as a black box is not really that good. Ideally, we want to open the black box a little bit, maybe not fully, because that may be too much detail, but at least enough for people to understand what is really happening, and when something goes wrong, to know how to fix it. Our way of doing this is through scalable interactive user interfaces. You cannot just dump all the details on people; you need to provide the right interface to help people understand these complex large-scale systems. That brings us to today's agenda. Each of these two is a big topic, secure AI and interpretable AI. I will give you a few examples of projects that we have been working on recently that I'm really excited about, and also some of the latest work that starts to combine the two. So I'll start with
the secure AI side. We will talk about some of our recent work on both the attack and the defense side of AI. You may ask: why do we need to study attacks? The reason is that to develop strong defenses, we need to know how the bad guy thinks. That's why we also play the bad guy and think about how to do the attack. We'll give two examples, one called ShapeShifter and one called Shield; both of them are pretty recent collaborations with Intel. In the second subsection I will focus on two systems, one called ADAGIO and one called MLsploit. Those are tools that make it easier for researchers, students, and practitioners to test these attack and defense methods. If you have been working in this area, you've noticed a lot of interest recently, which also means a lot of work on attacks and defenses. How do you allow people to easily test all these attacks and defenses instead of writing everything from scratch? For that I will show you two examples. On the interpretable AI side, we'll talk about some other recent collaborations: with Facebook, we developed the ActiVis system that helps Facebook scientists understand their models and datasets, and very recently with Google, we developed GAN Lab to help people understand generative adversarial networks, a very hot kind of deep learning model, but also very, very hard to train and understand, even for experts. We're looking at how to help with that. Lastly, we'll wrap up with a high-level survey of the state of the art, and also work that my student did with Microsoft Research last summer. So, on to the secure AI side, specifically attack and defense. Let's begin with a cartoon. You might have seen it: in the not-so-distant future, everything becomes smart, including the toaster. It gets hacked into thinking it's a blender, and [inaudible] well, it turns out that's true. You have smart toasters, smart anything you want; all these devices, anything you can imagine, now have learning capability, AI, in them. So AI security is becoming increasingly important. There are a lot of numbers in the popular press; you have probably seen that there are just more and more of these devices, and there will be a lot more. Naturally, AI is now also used in safety-critical applications where the stakes are really high. Our goal is to study when these systems might break down, their vulnerabilities, and then to develop secure AI methods for these high-stakes problems. Let me give you some examples.
As I said, the first two are collaborations with Intel. The first one is about an attack called ShapeShifter: how to launch a physically realizable attack. The second one is about defense. We'll start with the first one, the attack, which was published pretty recently at ECML-PKDD 2018 and, as I mentioned, in collaboration with Intel. This is also the very first, as we call it, targeted physical attack on object detectors. That's quite a number of technical terms, so what does it mean? Often when people study adversarial machine learning problems, as in attacking a machine learning model, the work is image-based: attacking an image classifier, where the usual task is that you have an image, and even though there might be many things in it, you want one single label. For example, for this image you're looking at: okay, that label is probably 'car.' But for a human being that may not really make sense, or if this is really part of a larger image, you will see that there's not only a car; there are also people, there are buildings, so why don't we also label those things? There's some inherent ambiguity there. On the other hand, if you study this as an object detection problem, the problem matches better what people are really thinking when they look at an image: not only a car, but also people and things like buildings and so on. What that means is that object detection is very different from image classification: it tries to recognize and localize multiple objects at the same time. That's an important distinction, because our attack targets these object detectors, meaning we attack the machine learning model that does this recognition and localization.
We will break that, not just take one image and change its label. A little bit of background in case you're not familiar with adversarial machine learning: it's now pretty well established that machine learning models, and even deep neural network models, are very vulnerable. You've seen many of these examples: take an image of a stop sign; you can change the pixels in some human-imperceptible way, making this part slightly darker and that part slightly brighter, and it will get misclassified as whatever you want, in this case a speed limit sign. Very easy to do digitally.
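A minimal sketch of this kind of digital attack, assuming a PyTorch image classifier; this is the textbook FGSM formulation (FGSM comes up again later in this talk), not the speaker's own code:

```python
import torch
import torch.nn.functional as F

def fgsm(model, image, label, eps=0.03):
    """One-step gradient attack: nudge every pixel in the direction
    that increases the classification loss."""
    image = image.clone().detach().requires_grad_(True)   # shape (1, 3, H, W)
    loss = F.cross_entropy(model(image), label)           # label: shape (1,)
    loss.backward()
    adv = image + eps * image.grad.sign()   # near-imperceptible for small eps
    return adv.clamp(0, 1).detach()
```

For a small eps the change is hard for humans to see, yet it is often enough to flip the predicted label.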
It turns out that, because all our work is done in research, in academia, we often come up with what I would call pretty impractical threat models, where we assume the bad guy knows a lot of information about the system. To do this attack, we assume the bad guys know everything: they know the model parameters, they have access to the data, everything. But in practice, you can imagine the bad guys may not know as much, or it may be much harder. For example, suppose you are building a self-driving car and you want the car to recognize stop signs. You need a system that captures images using some camera, and you likely need to pre-process those images or video, and then do the recognition and all of that. What academic folks have mostly been focusing on is what I call the digital attack, where they assume the bad guy has access to the whole pipeline: access to the data, access to the model, access even to the pre-processing step. If you think about it, all these things are inside the autonomous car system. If I really had access to everything, I probably wouldn't even need to attack the model; I could do other things, like just change the label, and that's it. But nonetheless, that's often what we assume the bad guys would know. What we want to focus on instead is a, I would say, more practical threat model, called a physically realizable adversarial attack, where we don't mess with the system at all. We don't even assume that the bad guys would be able to hack into it. Instead, we manipulate the physical environment: we may be able to do something to the stop sign itself, without ever going into the car. That's the kind of attack we're interested in. We hope this is more realistic, and we want to study whether we can change what the sign is detected as to whatever we want.
It's like a cooking show: I'll show you the cooked chicken before we go into the recipe. So there's a short video, where we have the real stop sign on your left, and then the fake stop sign that the ShapeShifter technique generated, printed out from a printer. My student Shawn drives his car toward the stop signs, and you will see that the real stop sign is correctly detected almost all of the time, while the fake one at first is not detected at all; but as you get closer and closer, it gets detected as a person. It turns out you can make it get detected as anything you want. That's to show you an example of what we call a physically realizable attack: you don't need to hack into the system, you just need to manipulate the environment. This kind of physical attack is not new, but prior work has focused on attacking image classification, where you give the whole image and get one label. For example, there are adversarial glasses that fool face classifiers, you can 3D-print an object that fools an image classifier, and you can even put stickers on a stop sign. But all of these focus on fooling an image classifier. Some recent work has shown that if you apply these image classifier techniques and then run an object detector, those techniques are not able to fool the object detector: here on the left, the sticker attack that worked on an image classifier could not fool the object detector, and similarly on the right-hand side. That led us to develop our technique.
So how do we attack the object detector? In particular, we attack a state-of-the-art object detector called Faster R-CNN. The way it works is that it takes an image and tries to detect which potential regions of the image might have some object in them; we call that figuring out the region proposals. Stage one generates the region proposals: what are the potentially interesting regions? Around the stop sign, for example, you will see many potential regions, and for each potential region it tries to figure out what might be in it. If enough overlapping regions say 'this is probably a stop sign,' then it concludes this is actually a stop sign. That's done through the localization and classification step.
So if you want to attack this object detector, Faster R-CNN, what would the challenges be? One is that, because of the multiple region proposals around the stop sign, attacking it is kind of like attacking an ensemble. If we want this stop sign to be mislabeled as, let's say, a person, then since there are many overlapping regions, you have to fool all the regions. Fooling just one region is not enough, because the other regions would compensate and say, 'hey wait, you say it's a person, but I say it's a stop sign,' and many of them will say stop sign. That's one of the challenges. The second is that, since we're now dealing with physical objects, we need to care about distances, angles, and lighting; for example, a cloud passing by could actually change the detection. We need to account for all of those. So how do we solve these? Our technique is: well, if we need to fool the whole ensemble, then so be it. A lot of this is based on optimization, so instead of optimizing region by region by region, we now optimize the sum of the classification losses over all the regions. That means we consider a manipulation, a change to the colors, and we only accept that change when it's able to fool all the regions. [inaudible] Yes, after the fact. Yes, it actually took a long time to get that working.
The second thing is that we don't just want to change the stop sign arbitrarily; we still want it to look like a stop sign, to human beings at least. So what we do is put a constraint on the part that the ShapeShifter technique can change: in this case we only perturb the red area and leave the white text alone. There's a reason for doing it this way: in human color perception, the eye is less sensitive to changes in darker colors, like the darker red, so you can change quite a lot there and people still cannot see the changes. That's one of the challenges we solved.
challenge that we solved. Then the second challenge which is these distortions due to
real-world. How do we solve it? So as in like in different
environment, different lighting. So how we account for it? So we adapt a technical expectation
over transformation that was originally applied for
the image classification. In this case, we extend it so that it will also work for
attacking after detector. So specifically how it works
is you can think of it, we tried to assimilate many of
these scenario where we say, if it’s already
generated a stop sign, we overlaid it on top there images. We can do the other rotation. We can do this scaling. You can do a different lighting
similar to that lighting, and then we run the optimization
over many of the simulation. So after during this, many many trial of course and then you are able to
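Putting the two ideas together, here is a hedged sketch of the optimization being described (illustrative only, not the released ShapeShifter code; `region_logits` is an assumed helper exposing per-proposal class scores):

```python
import torch
import torch.nn.functional as F

def shapeshifter_step(detector, sign, mask, delta, target, transforms, opt):
    """One optimization step: the perturbation `delta`, confined by `mask`
    to the red area of the sign, is pushed so that the SUM of losses over
    ALL region proposals (the ensemble) favors `target`, in expectation
    over simulated scales, rotations, and lighting changes."""
    opt.zero_grad()
    loss = 0.0
    for t in transforms:                  # sampled real-world distortions
        scene = t(sign + mask * delta)    # overlay perturbed sign on images
        for logits in detector.region_logits(scene):   # assumed helper
            loss = loss + F.cross_entropy(logits.unsqueeze(0), target)
    loss.backward()                       # push every region toward `target`
    opt.step()
```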
Now for a targeted attack: let's say I want to turn the stop sign into a person; that's possible, or any other target you want. Here you'll notice that these perturbations are a lot more conspicuous. Generally, if you want the perturbation to survive as a physical manipulation, it needs to be more conspicuous; if you want to lower the attack strength, it can be subtler. I'll show you an example, a video of an untargeted attack; in this case, that means we want the stop sign to simply disappear. Again, a similar setup: assume you're driving a car toward this stop sign, and in this case it is just not detected at all. Pretty scary. Only at the very, very end does the detector pick anything up. To human eyes from a distance, you can't really tell the two signs apart. This is scary. Yes, scary enough that DARPA agreed.
They have a new program called GARD, specifically about defending AI, and they highlight ShapeShifter as a state-of-the-art attack. The black box is where they show our video, and the program is centered around developing defenses for these physically realizable attacks. We're really happy that there's now support for this line of research. The lead student, Shawn, for whom this work is part of his thesis, will be starting as an assistant professor at National Taiwan University in the spring, so hopefully even more impact in his career. So that's the attack side. I also want to mention
a little bit about defense, so that we are not left hopeless with only attacks; there is wisdom there too. We also work on defenses, including a system called Shield, and the focus there is how to develop practical defenses. You'll notice that a lot of the work in our group is very empirical: we want things that work. Shield won the KDD '18 Audience Appreciation Award; we showed a nice video to convince people this is an interesting problem to study. Shield focuses on defense, in particular fast and practical defense. Of course there's a lot of work on attacks and a lot of work on defenses; our focus is that we want things that work. When we say we want things that work, that means we want things that are fast. We don't want to say, 'wait forever, and then you can deploy it.' We want things that work in real time, at high speed. You have probably already
seen and remember this: currently, the way we attack an image classifier or object detector is to manipulate the image. For example, you have a cat; if the model is correct, it will say it's a cat, and if it's a dog, a dog. The way the attacker devises the perturbation is by taking advantage of gradient information from the model. Knowing how the pixels are used by the model, which is encoded in the gradient information, the attacker can say, 'well, now I follow the gradient so that my perturbation is sufficient to change the model output,' and now it becomes the cat. Naturally, that means you ideally want to prevent the attacker from using that gradient information, and that's the core idea of Shield. Technically, we call it Stochastic Local Quantization, or SLQ. A little technical; I'll explain what it means shortly. At a very high level, it tries to destroy the gradient information, or make it much harder for the attacker to use, denying them access. How do we do that at a high level? We use JPEG compression; actually, everyone knows about JPEG compression. So how does JPEG compression come in? JPEG compression is something everyone uses: you have an image, you want a smaller size to put it on the web. What it is really doing is removing information, information that people cannot see; at a pretty high compression level, you can really start to tell the difference. We're using it because that's exactly what the bad guy is trying to create with the perturbation: 'oh, I'm going to introduce something that human eyes cannot see.' Well, if that's the case, then we'll just use JPEG
compression to remove it. Specifically, we don't use just one compression level; we use multiple compression levels, from lower to higher; here you see a lot more artifacts, and there, fewer. That's also where the randomization comes in: we don't want the attacker to know exactly what level we are using. So for an image, we divide it into blocks, each block randomly samples a compression level, and together the blocks still form the overall image. That is the main idea.
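A minimal sketch of Stochastic Local Quantization with Pillow (the idea as described, not Intel's or the production code):

```python
import io
import random
from PIL import Image

def slq(img, block=64, qualities=(20, 40, 60, 80)):
    """JPEG-compress each block at a randomly sampled quality level,
    then stitch the blocks back into one image."""
    out = img.copy()
    for x in range(0, img.width, block):
        for y in range(0, img.height, block):
            tile = img.crop((x, y, min(x + block, img.width),
                                   min(y + block, img.height)))
            buf = io.BytesIO()
            tile.save(buf, format="JPEG", quality=random.choice(qualities))
            buf.seek(0)
            out.paste(Image.open(buf), (x, y))
    return out
```

Because each block's quality level is sampled at run time, the attacker cannot know which level to optimize against.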
You can think of Shield as the middle part, where you use this real-time compression to protect the neural network. It can be applied to both benign and adversarial images; actually, we don't even care which it is. Whatever comes in just passes through the middle: if there's an attack, hopefully Shield will be able to correct it; if the image is already benign, hopefully it doesn't really damage the image content. At a high level, this is a multi-pronged approach that combines randomization and ensembling across many models, and there's another part I haven't talked about, which is model vaccination: you can actually train, or retrain, your model with these JPEG-compressed images, so the model is used to
seeing these JPEG artifacts. So how well does it work? If you apply our technique to a ResNet-50 model, the dotted line at the bottom is when there's no defense, and the accuracy drops. The vertical axis here is accuracy and the horizontal axis is perturbation strength: more to your right means more pixel changes, more to your left means fewer changes. You see that with more changes the accuracy drops a lot, whereas Shield, in orange here, is still pretty good. If you look at individual JPEG levels, they are not as good as the combination, the ensemble. The attacks here are FGSM and I-FGSM, some common attack methods for manipulating images. As I mentioned, we care about how practical the defense is, and that's another reason we picked JPEG. We looked at running time as well, compared to other competing methods for removing noise, for example the median filter and total variation denoising. JPEG is significantly faster, an order of magnitude faster, and it also doesn't hurt the benign accuracy. Of course, there
are some limitations. One limitation was that we didn't study what we call an adaptive adversary, where the bad guy knows our approach and then tunes their methods. So we extended Shield to also look at that type of attack. Usually, an adaptive attack also means the bad guy knows or has access to your model, so you will usually see that the attack success rate is pretty high. The gray line here is our original Shield model, where all the models we use, JPEG 20, 40, 60, and 80, are known to the attacker; if all the models are known, the attack success rate is about 60 percent. But we found that we can actually lower it further by training all these JPEG 20, 40, 60, 80 models from scratch, not reusing information from the other models, and then we can lower the attack success rate even further. This work just appeared at a KDD 2019 workshop. The idea of compression
is not only helpful for images; you can also apply it to audio as well. There are now audio attacks, and ADAGIO is a system where we let people apply a compression-based defense on audio. Adversarial audio is very similar to adversarial images, but in this case it changes the audio: instead of saying 'open intel dot com,' it now becomes 'open evil dot com.' Again, you can make it whatever you want, but this is again a digital attack, so it doesn't really survive the physical world: once you play the audio out loud and it gets recorded again, that noise is actually gone. But we still study it; it's interesting for academic curiosity. The adversary again uses gradient information, through backpropagation, to attack the model. And here again we apply compression, which means that whenever you have an audio clip, you run it through a compression technique.
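A hedged sketch of that audio defense (not ADAGIO's actual code): round-trip the clip through MP3 so the fragile adversarial noise is discarded before transcription. This uses pydub, which needs ffmpeg installed; the file name is hypothetical:

```python
import io
from pydub import AudioSegment

def mp3_roundtrip(wav_path, bitrate="64k"):
    buf = io.BytesIO()
    AudioSegment.from_wav(wav_path).export(buf, format="mp3", bitrate=bitrate)
    buf.seek(0)
    return AudioSegment.from_file(buf, format="mp3")   # feed this to the ASR

cleaned = mp3_roundtrip("adversarial_clip.wav")        # hypothetical file
```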
You can imagine there are different methods to do the compression. ADAGIO focuses on providing a usable front-end that lets people upload their own audio, choose whatever attack they want, apply whatever defense they like, and, even better, play the audio and hear the difference. So here's a very quick video. You upload a simple, clear audio clip of 'cat,' run the transcription with DeepSpeech, and that's the correct benign transcription. Now you choose an attack, and you can choose whatever phrase you want to change it to. These methods are often really, really slow, so we sped it up 20 times. So now you see: okay, the audio has been manipulated to your target, and now we can apply the defense. MP3, let's say, as it's the most common; you can also do AMR and so on. MP3 is of course much faster, and it fixes the transcription. So it's a very, very cool way to try it. This is actually part
of what I consider do-it-yourself adversarial machine learning. As you saw, we already have ShapeShifter and Shield, and there are actually tons more techniques out there. How do we know which ones work? How do we pit them against each other? ADAGIO is one recent example where we wanted to make that easy for people to do. As a bigger effort, we also developed a system, now open source, called MLsploit, to let people try even more: interactive experimentation with adversarial machine learning research. It's open source, was presented at Black Hat Asia, and was shown in the KDD '19 Showcase. The high-level goal is as I explained: make it easier for students, practitioners, and researchers to try all of these. The way we do it is to encapsulate the research as modules that researchers can hopefully contribute to and try out pretty easily, allowing easier comparison of attacks and defenses; also, importantly, we provide a user-friendly interface that lets people try things easily. Currently there are a number of modules, including Shield, which I showed, and we also have attack and defense techniques on images, and some on malware; for example, AVPass for bypassing Android malware detection, and we have Linux malware attacks and defenses and so on. At a very high level, MLsploit lets you upload a sample, that's step one; you upload it to MLsploit, and then you choose what you want to try, whether an attack or whether
it's a defense you want to try, and then you compare the results. We also tried to make the architecture simple, because we want researchers to contribute to it: a web portal as the front-end, and a RESTful API through which you can schedule jobs, the attack and defense computations you want to run, which execute on worker instances. Installation is currently pretty easy, almost one click, and then you can run your attack.
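For a flavor of what driving such a portal programmatically could look like, here is a hypothetical sketch; the endpoint paths and JSON fields are invented for illustration and may not match the real open-source API:

```python
import requests

BASE = "http://localhost:8000/api"     # assumed local MLsploit deployment

# Step 1: upload a sample.
sample = requests.post(f"{BASE}/files",
                       files={"file": open("stop_sign.png", "rb")}).json()

# Step 2: chain an attack module and a defense module into one pipeline.
job = requests.post(f"{BASE}/pipelines", json={
    "input": sample["id"],
    "steps": [{"module": "shapeshifter", "mode": "attack"},
              {"module": "shield", "mode": "defense"}],
}).json()

# Step 3: compare the results once the worker instances finish.
print(requests.get(f"{BASE}/pipelines/{job['id']}/results").json())
```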
We won't go through those in detail. This is a screenshot of how the system looks. Each row here on your right is what we call a pipeline: if I want to try an attack technique, I create a pipeline, upload data, and pick what I want to run. Each little box is one of the operations you want to try, and you can also see, of course, all the technical details that the research module outputs. We also want to teach MLsploit in classes and have more impact, so it's now being integrated into an Intel AI Academy course. So that's some of the recent projects on secure AI. Now let's switch gears a little bit to the
interpretable AI side. One thing about all the things I've talked about so far: I showed you some accuracy numbers, and I convinced you, even with videos, that they are working. But the dirty secret is that even the people working on them have little idea why they work. It's just: okay, the numbers look good, it must be working. But they cannot tell you which part of the model is being attacked, to what extent, what the footprint is, how you stop it, or when your protection is really protecting. That means interpretable AI is really important: you need to know how these things are working so that you can [inaudible] about it. For that, I will give you a few examples of work we recently did with companies like Facebook and Google, and also including Microsoft, on how to make these systems more interpretable, so that people know what's going on. If you talk to practitioners, you often don't really need
to convince them too much: because of their jobs, they're already very interested in knowing how and why AI works. But how do we do that? It's easy to say what you want to do, but how do we do it? The general medium we use is scalable interactive visualization; we think the visual medium is a very powerful way to achieve this. The reason is that machine learning is great at finding patterns in data, but those patterns are not very easy to digest, whereas visualization is a very effective way to amplify human cognition: it uses our very effective ability to identify patterns visually. In combination, through interacting with the visualization, users can incrementally make sense of the AI. One common perception people may have when
we show a visualization is, 'wow, you made a pretty picture,' and that's it. Not really: visualization is a medium, and through interacting with the visualization, the user can ask questions. They see something, they want to ask about it, and then the visualization shows them more; this incremental aspect is very important. We'll show you a few examples of how we do that. The very first one is work
that we did with Facebook, helping their internal data scientists understand their large-scale models and datasets. ActiVis was published at VIS 2017, the top visualization conference, and ActiVis is now deployed by Facebook on their machine learning platform. All the motivation came from data scientists at Facebook needing visualization tools to interpret complex models. I expect all of you here at Microsoft probably have the same kinds of needs: whenever we look at machine learning models, we want to know when they work, when they don't work, and what's really happening. The practical design challenges, as you can imagine for industry, are a lot of deep and wide models, large datasets, many features, and many different kinds of data: images, text, and so on. ActiVis started from participatory design sessions with a number of researchers, engineers, and data scientists; it was quite a long-term project, over 11 months. Here's a quick screenshot of ActiVis, and I'm showing you some video of how it works. After development, it can now visualize
so now it can visualize the industry scale model
that used by Facebook. So there are a number of challenges
that ActiVis aim to solve. The first as I mentioned is that the many model
parameter or large model. Large model, so many neuron, so many layers, and
they’re particularly interested in deep network. So many of them, what
do we do about it? So the nice thing about talking to practitioner through an
observation we found them all. There are many of it, but
we don’t really need to spend the energy for
every single one of them. So in particular, we
can find out which are the ones that the practitioners
or researcher care more about, and then just focus on those. So for example, this is a
user front-end of ActiVis, and what we do is we still provide
an overview of the full model, like we call the model
architecture overview, but then we allow you to
zoom in to the graph, and then pick the one that they’re really interested
and show them to that. So I click on it and then they can
show the details at the bottom. So a very simple way but then that does solve then the
important problem, so instead of overwhelming people in everything which give
them an overview, that’s actually a pretty
common design technique in human-computer
interaction overview first, detail, so then filter
details on demand. So we’re using that here. The second challenge
is about how they interact with these models and data. Through talking to the data scientists, we learned that there are many different ways they want to do the understanding. At a very high level, we categorize them into what we call instance-level understanding and subset-level investigation. By instance-level, we mean looking at individual instances: as when you're programming or writing a model, you pass specific individual examples, data points, through it and look at how it behaves. As mentioned, that's very, very helpful for debugging, like a unit test if you will. That's one very common way. The other way: practitioners note that instance-level is great, you can get a very deep understanding, but it's often not very scalable. That's why the subset-level is also very common; it's more scalable, doing higher-level categorization of the data. For example, if you're doing classification, your subsets could be at the class level: the whole class of images, the whole class of dog, the whole class of cat. But subsets can also be at smaller granularity: if you're looking at human data, you may want to look at, say, people in a certain country, people at a certain level, and so on. You can define the subsets in different ways. So instance-level and subset-level really cover a spectrum of analysis, all the way down to individual examples and all the way up to a super high level. These turn out, as you may expect, to be complementary; data scientists and researchers really need to do both. So how do we support this in
the ActiVis user interface? In the lower-left corner of ActiVis is what we call the neuron activation view. In this matrix, each column is a neuron, and each row, we're actually combining both things here, can be an instance and can also be a class. For example, the first six rows are high-level classes: we're looking at text data, so we have six categories of text. Say these are Facebook posts we want to classify: is the text about a description, about an entity, about an abbreviation, about a number, and so on. Each row here is one class, and each dot is the neuron activation strength: for a particular class, say description, it activates neuron number one, six, and so on, a lot; a darker dot means higher activation. But it's not limited to classes; the rows don't need to be classes, they can be other things too, like user-defined subsets. Again, we're looking at text data, so we can say: why don't we put together all the text phrases that contain the word 'how'? That becomes one row. How about all the posts that contain 'how many'? That's another. So the user, the data scientist, can define the subsets they're interested in, and again, we can put them into the same matrix. Even better, not limiting ourselves to subsets, we can even add individual instances: on the right-hand side is the instance panel, where you can click some instances, and they get added as additional rows to the matrix. For example, this
particular sentence, 'Where is Prince Edward Island?', you can add it there as well. The nice thing is really the combining, putting everything together, which makes comparison much easier: you can add a few more, you can add instances, you can add subsets, and subsets can be very high-level classes or user-defined groups. The nice thing about having all of those at the same time is that you can now change the sorting order of the columns, which are the neurons. That lets you very easily see, for the correctly classified instances, how well they match the class they're meant to be classified as, and for the misclassified ones at the very bottom, how they differ. Just visually, you can immediately see that the correct ones match really well, and for the wrong ones, you can actually identify which neurons are contributing to the misclassification. Design-wise, you'll see we're not using anything fancy. It's more the conceptual idea of how to unify everything and combine it in an effective way: doing the categorization, providing the right interactions to let people add everything in there, and providing the right sorting order. That's what we're aiming for: nothing very fancy or sophisticated; we actually want people to be able to use it very easily.
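A small sketch of the idea behind the neuron activation view (the concept only, not Facebook's implementation), assuming a PyTorch model and a chosen layer:

```python
import torch

def activation_rows(model, layer, rows):
    """rows: dict mapping a row name (a class, a user-defined subset, or a
    single instance as a batch of one) -> batch of examples. Returns one
    vector per row with the average activation of each neuron (column)."""
    grabbed = {}
    hook = layer.register_forward_hook(
        lambda mod, inp, out: grabbed.update(out=out.detach()))
    matrix = {}
    with torch.no_grad():
        for name, batch in rows.items():
            model(batch)
            matrix[name] = grabbed["out"].flatten(1).mean(0)
    hook.remove()
    return matrix   # sort columns by any row to compare classes vs. instances
```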
One challenge I mentioned earlier but don't have a screenshot to show is scalability. As at any company in industry, scalability is very important; it was actually among the first things the group discussed. How do we achieve scalability? Through a number of techniques. One, we leverage Facebook's computation platform to scale the computation, and we also do selected pre-computation: through talking to Facebook scientists and also looking at log data, we know which layers and which neurons are more likely to get used, which means we can do selected pre-computation beforehand. Also, each data scientist might care about different things, so each user can say, 'oh, I care more about this particular kind of dataset,' and we can do user-guided instance sampling as well, again beforehand. So in real time, once they have their model trained in the internal system, FBLearner, and the pre-computation is done, they can just do one click and ActiVis launches. Facebook's machine learning platform is used by over 25 percent of their engineers, so this makes it really easy for them to try ActiVis. Something you might notice is that in ActiVis we focus
on a single model, so it's not very easy to compare multiple models. But in practice we do want to compare models, or even compare datasets or compare classes, right? So we have a system called Summit, which will appear at VIS 2019, that helps people do these comparisons: class comparison, model comparison. We also allow people to dig deeper into the model. Let's say I have images from the ImageNet 1,000 classes, and the prediction is 'white wolf.' How does the network reach the conclusion that the label is white wolf? In particular, if you look at the whole network, there are many, many layers and many neurons; how do we know which features matter? In the end we know it's a white wolf, but how did it reach that conclusion? Summit is the technique, and also the system, that generates this kind of graph representation, to not only tell you, hey, this is the final prediction, but also show you the path and the specific neurons, which in this case could be legs, could be white fur, could be pointy ears, and all the connections and important neurons that contribute a lot to the final prediction. In other words, we're creating an attribution graph, where you can attribute a final outcome to individual neurons and connections.
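In sketch form, the attribution-graph idea could be assembled like this (a simplified stand-in for Summit's actual aggregation and influence computation; the inputs here are assumed to be precomputed):

```python
import networkx as nx

def attribution_graph(importance, influence, top_k=4):
    """importance[layer][neuron] -> how much that neuron matters for the
    class (e.g. 'white wolf'); influence[(layer, i, j)] -> how strongly
    neuron i in `layer` drives neuron j in `layer + 1`."""
    g = nx.DiGraph()
    kept = {layer: sorted(ns, key=ns.get, reverse=True)[:top_k]
            for layer, ns in importance.items()}
    for layer, ns in kept.items():
        for i in ns:
            g.add_node((layer, i), score=importance[layer][i])
    for (layer, i, j), w in influence.items():
        if i in kept.get(layer, []) and j in kept.get(layer + 1, []):
            g.add_edge((layer, i), (layer + 1, j), weight=w)
    return g   # paths from low-level features up to the final prediction
```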
I'll show you a very brief video of how it works. This is the user interface. In this case, I will show you a scenario where we used Summit and the attribution graph to discover a very interesting kind of faulty scenario. On the right-hand side, yes, on the right-hand side, is the attribution graph extracted for a class of images called tench; a tench is a yellowish fish. We note the accuracy is pretty high, 92 percent, but if you look at the features, you'll notice there are no fish. They're people; there are a lot of faces, actually. Faces, faces, face-like neurons, shown here with some examples pulled from real data. Then, maybe closer to the end, you start to see some fish: some scales, some fish parts. That was surprising, and there are still some people. So maybe a question for the audience: any guess why there are people, or people features, in the tench class?

>>Because Facebook data is mostly people.

>>What did you say there?

>>Because Facebook's data is mostly people.

>>One answer is that because the data is from Facebook, you have people and-

>>Also.

>>Yes.

>>It shows the faces of mostly people holding up their fish.

>>Great guess. It turned out that yes: tench is the kind of fish that you often catch and hold in your arms, kind of like a trophy. So what the model is really detecting, with very accurate accuracy, is: is there a face, are there fingers? If there are fingers, okay, then it's probably a tench. That's surprising, and it should lead you to really think: just looking at the accuracy is not really sufficient. You want to really know what's happening, because the data could change: in the future you might get new data that doesn't have people in it, and then everything could degrade. So that was a very quick example of ActiVis and also Summit, which is related to it.
For a lot of these tools we build, there is already an audience: you might be a data scientist. But before people become data scientists or work in industry, they often need to learn these things, learn these models, and we often just assume that they know them. In practice, actually learning these models may not be easy. For example, one of the most recent and very popular kinds of network, the generative adversarial network, or GAN for short, turned out to be very hard to learn and train, even for experts. So we built a tool called GAN Lab, in collaboration with Google, specifically Google AI's PAIR group based in Cambridge, to make that learning effort easier. In other words, we want to use interactive visualization to help with education, machine learning education. There's actually quite a lot of recent work going in that direction, including TensorFlow Playground and other earlier work on visualizing neural networks, but modern deep learning models are actually quite complex: many layers, many components inside. Specifically for GANs, generative adversarial networks, you've probably seen examples like generated fake faces, or 'This Airbnb Does Not Exist,' something like that: realistic-looking images that are actually fake. This technique is very powerful, with uses in generating synthetic data for a variety of purposes and for improving AI accuracy, but it is very, very hard to understand and to train, even for experts. I'll show you some math; we're not going through the math, just showing you that there are some equations there. Then you need to turn all these equations into an actual implementation, and then the training, putting in data and so on, is a lot harder still. So why are GANs so hard? Besides the math, the reason can be attributed to the fact that a GAN uses two
competing neural networks. I will show you an easy-to-understand example first, before going into the details. A GAN uses two neural networks, one called the generator and the other the discriminator, to produce realistic-looking data. Think of a scenario where we want to make fake bills. The generator in a GAN is like a counterfeiter in the real world trying to make fake bills; the whole goal is to make realistic-looking bills. The discriminator component is like the police: you want to spot the fake bills. You can imagine that in the beginning the counterfeiter is not very good: they generate some bills, but they don't look like real bills. The police do a great job of distinguishing: okay, this must be fake. Of course, the police know the real thing; they have access to the green ones, the real bills, so they can tell the difference. The counterfeiter gets caught, and now they improve: 'okay, you caught me, I'm going to make the bills better.' At the same time, the police also up their game: 'you're getting better, I'm also getting better.' You can imagine that this process repeats, and repeats, and repeats. In the end, the generator is able to produce fake bills so well that the police have trouble distinguishing whether a bill is real or fake. At that point, the GAN training can stop, because now the generator can produce realistic output. Yes,
there is a question.

>>The police spot the fakes and they get better. But the counterfeiter only gets one bit; they only get some of the data, a highly biased sample of the data.

>>That's correct. The counterfeiter doesn't really actively probe the police and ask, 'hey, why did you decide this?' The police look at whatever they have, compare it to the real thing, and update; internally they're updating what we call the decision boundary with the information they have. But you're right. Yes. Yeah.
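In code, the back-and-forth just described is the standard GAN training loop; here is a minimal PyTorch sketch for 2D data (the textbook procedure, with `sample_real_data` as an assumed helper returning points from the target distribution, not GAN Lab's implementation):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))  # counterfeiter
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # police
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(5000):
    real = sample_real_data(64)         # assumed helper: real "bills"
    fake = G(torch.randn(64, 2))        # counterfeiter makes fake "bills"

    # Police step: learn to tell real from fake.
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Counterfeiter step: make the police call the fakes real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```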
So this is very high level, but now imagine we're not really talking about a counterfeiter and the police: these are two neural networks being trained. Even for this toy example, you'll see there are quite a few loops, quite a few connections, and those interactions are what is actually complex and hard for people to understand; we'll show you that shortly. So just conceptually, there is the component interaction, the conceptual understanding of a GAN, that might be tricky. The other part is the training: we already mentioned the iteration, but internally each of the networks, the counterfeiter or generator and the discriminator or police, is itself a neural network, and they are training at the same time. The third one: if we're successful, we want this tool to be accessible by anyone. We don't want it to be a research tool that only a few people use; we want it easily accessible by anyone. So how do we solve
all these challenges? Going back to our scenario: we want to explain this now in a more technical way, the training, looking at the data. How do we go about it? The very first thing you might imagine: okay, we have the components, we have some bills; that means we have the models and we have the data. So what kind of data do we visualize, and how do people understand it? We considered high-dimensional data, like bills as high-dimensional data, but it turned out that that is not really the main problem. It's more about whether you can successfully explain to people the distribution of the data: in this case, how well the fake data, in purple, matches the real thing. That's more important. So we don't really need to go high-dimensional; images are high-dimensional data, but even 2D will be sufficient, so that we can focus more on the main concepts and at the same time very easily visualize the data distribution in 2D. So in version 0.1 of our tool [inaudible], we have the color coding, and we also show the animation through training. You'll see the purple data iteratively become better and better; the real data is what the user wants, what you want to generate, and over many iterations, the fake thing would
look like the real thing. Okay. Yes.

>>That's really interesting. During the iterations it looks like, I don't know much about this, but [inaudible] there's an overall bias; I can see it in each iteration, like all the points shift to the left or right or something.

>>Yeah, there is some randomness. It also depends on the initialization and on the iteration; there is some randomness involved as well. Sometimes the actual training will fail: for example, there is something called mode collapse, or it just fails completely, and even experts don't know when a collapse will happen, which is why it's so tricky. Yes. But for the data, 2D probably seems sufficient, because there's a more important thing to explain, which is the second part, about the interaction. But before that, I want to look
at the individual components. So what do we do about that? How do we explain the generator? The generator is the component, the neural network, that creates the fake bills, the fake data. What really happens in the generator is that it takes some initial input, usually set at random, and then the generator, the counterfeiter, manipulates it and turns it into something that hopefully looks real. More technically, you can think of it as: given an input initialization, you have a mapping, a warping, that changes it into your target. In other words, you can overlay a grid on top of your input and figure out how each of these grid cells gets manipulated, repositioned, stretched, or scaled into the output. That's technically what's really happening. That means the counterfeiter is actually a function, a function that gets updated again, and again, and again, with the goal of changing whatever initialization you have into what you desire, which is fake but hopefully looks real. That means we are really creating a manifold. So how do we show this? This is where we think animation is really helpful: we use an animation of the warping in the middle, which the user can selectively replay and play again, so they know how a point in the original space gets twisted and moved to the end. This together forms the visualization for the generator, the counterfeiter. Similarly, you think about the discriminator, the police. The police are actually easier, because the police are basically a classifier, so you can use a heatmap looking at the distribution: which regions would likely be classified as real, and which as fake.
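A sketch of the two views just described, in Python rather than GAN Lab's TensorFlow.js (the concept only, not GAN Lab's code): warp a grid through the generator, and evaluate the discriminator on a dense grid for the heatmap.

```python
import numpy as np
import torch

def generator_warp(G, n=15):
    """Where does each cell of a latent-space grid land after G?
    Animating z -> G(z) shows the warping of the manifold."""
    xs, ys = np.meshgrid(np.linspace(-2, 2, n), np.linspace(-2, 2, n))
    z = torch.tensor(np.stack([xs.ravel(), ys.ravel()], 1), dtype=torch.float32)
    with torch.no_grad():
        return G(z).numpy()

def discriminator_heatmap(D, n=100):
    """P(real) over a dense 2D grid: the police's decision surface."""
    xs, ys = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
    pts = torch.tensor(np.stack([xs.ravel(), ys.ravel()], 1),
                       dtype=torch.float32)
    with torch.no_grad():
        return torch.sigmoid(D(pts)).reshape(n, n).numpy()
```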
All right. Now, putting all of these together: we know how to visualize real data and fake data, in purple; we know how to visualize the generator; we know how to visualize the discriminator as a heatmap. Now we can overlay all of that into one composite visualization. Originally we thought, 'okay, our job is done; version 0.5, all done. The only thing left: it's a neural network, so just add in some sliders to change the number of layers, and it's done.' This was still version 0.5, and the reason it's 0.5 is that it totally didn't work. I showed it to experts, even, and they said, 'okay, it's super confusing; I have no idea what I'm looking at.' The main reason, in the end, is that people have a mental model in their heads that we were totally not capturing. What we really wanted was for the first slider to correspond to changing something in the generator, and the third slider to correspond to changing something in the discriminator, but there was no such visualization in 0.5. So what we had to add, and this is really important, is to actually capture the flow. You want to have landmarks, mental anchors, in their heads, so they know what the generator is really doing and what the discriminator is really doing, and then we show the composite. So in the end, this is what we show: we have the fake data, we have the manipulation through the generator into something that looks real, and all of these get updated, everything combined. We'll show you a
video of how that works. In this case the user says, 'I want to generate data that looks like a ring.' That becomes the target, and you can set all the parameters interactively: throughout the network you can change the learning rate and everything, and now you can start training. You press 'Train,' and now you can see the flow of the interaction: you can see how everything gets updated, how the generator gets updated and becomes better, how the intermediate samples get updated and become more realistic. You can turn all the layers on or off, look at all of those, and here you can play the animation of the manifold, the warping; for understanding, you really see how things get twisted. You can change the parameters on the fly. One thing about training GANs, if you have ever done it: the dirty secret is that it takes forever. That's why we're playing this 15 times faster. After 2,000, now 3,000 iterations, you get it. In the end it's pretty good: this is the fake data, but it looks really real. Yes. So that's an example of GAN Lab. Typically, when you deploy an interactive visualization tool like this, you often have a backend. Particularly, when
you talk to someone who wants to do a visualization for deep learning, they often think, 'okay, we need a pretty beefy, expensive backend,' and the frontend is something you can run in the browser, usually JavaScript. But we think that's really a problem, because it means the tool is much harder for people to use. So instead, we leverage a pretty recent development: you can actually do the computation in the browser, on a laptop or on a device, using the TensorFlow.js library, which essentially does deep learning in JavaScript, accelerated by WebGL; that means it uses your graphics card, if you have one, rather than just the CPU. So now everything is in JavaScript, which also means everything can just run in your browser, and it's not dangerous; you can just try it in your browser. You can Google, or Bing, I should say, 'GAN Lab,' and it should come up as the first result. It also went viral when we released it: so many likes, 2,000 likes and 800 retweets. A really exciting project. I think I'm going to wrap up with some high-level thinking about interpretable AI
and also for Secure AI. So I’ll show you some examples so far about where we are and
where we want to go. If you're interested in visual analytics or visualization, particularly for deep learning, or you're going to be working more on the machine learning side, I highly recommend a fairly recent survey that we did. I think it's a great service to the community: it looks at what human-computer interaction and visualization research can do to help people advance AI research. The reason I think it's a really nicely written survey, and of course I'm selling it to you, is that it has done a lot of work for you. In particular, it summarizes a lot of the recent visualization research using a 5W1H categorization. For example, for each paper it tells you why they are doing the visualization, why they are doing the explanation, and what it is really explaining, since there are potentially many things you could explain; who they are doing it for, whether model experts, domain users, or developers; and how they are doing it, what their techniques are. All of that.
It also summarizes some key categories, or I'd say takeaways, from analyzing all of those 50-60 papers. First, a lot of current tools are aimed at experts, so they assume a lot of domain knowledge. But at the same time, we should think more and more about users who may be students or beginners: they're using these techniques, and they want to gain deeper insight. So we think it would be really helpful to design workflows for beginners and novices. Second, often for ease of design, a lot of tools are instance-based, which, as you may remember, means looking at individual examples. That's great and easy to design, but in practice, especially in industry, we're often analyzing things at scale, so we would like to see more work focusing on scalability. Third, one very nice observation is that the communities are really working together: we now have machine learning people working in visualization, and we hope, and believe, that this will continue. Fourth, and I think we need a lot more work here, is the current lack of actionability. We're building a lot of tools that help surface some of the problems people might be seeing in their data and models, but then we don't really tell people how to fix them. It would be even better to not only say, "Hey, these are the problems," but also, "Here are solutions one, two, three that you may want to try." Fifth, it's really hard to get around evaluation. In practice, it's very hard to evaluate a visualization tool, mainly because you need to find the right users, the right data, and the right task; but it's important, so the recommendation remains: yes, it's important, so let's do more of it. Sixth, coming back to our secure AI focus: all of these tools currently don't focus much on helping people understand the vulnerabilities of AI, and we think that's a big need, because these models are not robust. We hope that's a focus people will want to develop.
Lastly, I want to point out work coming out of Microsoft with my student Hohman, who interned here last summer working with Rich, Robin, and Steven at MSR. They looked at a very important problem: I would say we don't really have a full understanding of what data scientists actually mean by interpretability. I've been saying "interpretability" this whole talk, but if you talk to different people, they give you different answers depending on which community they're in. They'll say, "Oh, we're visualizing and explaining the internals. We're looking at the maps. We may be looking at input-output mappings. Maybe we're looking at representation visualizations." There isn't really an agreed-upon definition. So what the team did is build a system called Gamut; some of you have seen it already in a demo. They built the tool to figure out what data scientists really need when they say they want to interpret something. The tool itself is not the end goal; they wanted to use it as a design probe to understand interpretability. That's what they did.
There were a lot of very interesting discoveries. In particular, they used a scenario where data scientists use the tool on problems they would like to address, and asked how those questions translate into interpretation capabilities. For example, if someone looking at housing data asks, "Why does this house cost a certain amount of money?", what they're really asking for is a local instance explanation. Or if they're interested in asking what-ifs, those are actually the counterfactuals they want. Or if they're asking, "How do we find similar homes priced similarly?", the interpretation task they really have is nearest neighbors: what are the similar things? These are the capabilities data scientists actually come in for.
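To make that last capability concrete, here is a minimal sketch of a nearest-neighbor lookup behind a question like "which similar homes are priced similarly?". The feature columns and values are hypothetical, not taken from Gamut or its dataset.

```python
# Sketch of the "nearest neighbors" interpretability capability: given one
# home, find the most similar homes so their prices can be compared.
# Feature columns [square_feet, bedrooms, year_built] are hypothetical.
import numpy as np
from sklearn.neighbors import NearestNeighbors

homes = np.array([
    [1500, 3, 1995],
    [1480, 3, 1998],
    [3200, 5, 2010],
    [1550, 2, 1990],
], dtype=float)

# In practice features should be scaled so square footage doesn't dominate.
nn = NearestNeighbors(n_neighbors=3).fit(homes)
distances, indices = nn.kneighbors(homes[:1])   # neighbors of the first home
print(indices[0])  # the query home itself plus its two most similar homes
```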
it’s actually natural, and I’m not sure if we really want to really force it into a one box. Very rigid things. It’s not really rigid concept. Interpretability is more like a collection of things
that you want to consider. Often, when people say
they want to interact, so interpreting things is really
depending on the audience. So if you’re a domain user, you may want to know a
lot more information, but then you don’t want to
know the technical details. The last part I want to
emphasize is the interpretation. Often, people think of it as, “Oh, I’m going to show you some aesthetic visual or a text
and then my job is done.” So through this research with Microsoft research colleagues
is that interaction is the key. So actually, you can’t,
almost impossible, to really do interpretation
without interactivity. The main reason is because
doing interpretation, inherently is question and answering. You see some results. You need to probe further.
How do you probe further? If you’re static, just
on a piece of paper, you can ask us question. So through interaction, that allow the user to
ask those questions, realizing interpretability
and also understanding. So from now on, you can
say, “Okay, so we are doing interpretability explanation.” The first thing you want to ask them, “Do you support interactivity? Hey, do you support
question answering or are you just showing me
the result and that’s it?” So there was work that was just
published at the Chi 2019. Really great and I look forward to more collaboration with
Microsoft colleagues. So I’m going to wrap up. It’s like we overrun. No, I’m not overrun,
I’m good on time. So I’ll talk about two
category of research that we have been working on. Secure AI and also Interpretable AI. So you start to see some
connection between them, and we’re doing a lot more
starting this semester. For example, the semi-tool
that you have seen creating attribution graph to really dig
deeper into their techniques. That’s what we plan to do to
apply it on all these many, many attack and defense techniques. So that we really know when
someone say I’ve a technique, I have a defense method. What is it really protecting? Is it protecting a
particular component of your network or not
really doing anything? So that is the
connection that we have. That is why I say Towards
Secure, Interpretable AI. A lot more work to do and I think
we have some good foundation. I look forward to all
the things that we do and also collaboration that we
have with Microsoft colleague. Yeah. Thank you. Thanks.>>More questions?>>Yes.>>So at the beginning, you talked about these physical
attacks on AI systems, where you printed this stop sign whose background is slightly wonky, but to all of us it's obviously a stop sign. Then you talked about defenses for what seemed like more of the digital attacks. Have you tried applying them to the physical attacks? If I take the pictures you took of the physical attacks and then apply these compression defenses, what happens?>>Great question. The question was whether we tried our
compression-based Shield defense on the physical adversarial attacks there. We actually did. With mild compression, it doesn't work; it could not overcome the attack. The reason is that, as you saw, the ShapeShifter perturbation is actually pretty conspicuous: semantically, overall, you would say it's still a stop sign, as in this one, but those perturbations are strong enough that compression alone cannot overcome them. That's why I briefly mentioned the [inaudible] project, which we're in negotiation for, so hopefully we will work on it. That program is purely focused on defense for these kinds of attacks, where we will need to explore more than compression. One idea we have is to incorporate what humans take for granted: we have a sense of temporal consistency. If you're watching a video while driving toward the sign, as a human being you would not expect one frame to be a stop sign, the next frame a person, and the next frame a bird. That consistency is broken here. For example, why would there all of a sudden be a lot of people floating in the air? That makes no sense. So we want to add some of this into the defense.
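To illustrate the idea, here is a minimal sketch of a frame-level temporal consistency check; the window size, voting rule, and labels are illustrative assumptions, not a description of an implemented defense.

```python
# Illustrative sketch of a temporal-consistency check: a detection is only
# trusted if it agrees with the majority label over a short window of frames.
from collections import Counter, deque

class TemporalFilter:
    def __init__(self, window=5):
        self.history = deque(maxlen=window)   # most recent per-frame labels

    def update(self, label):
        self.history.append(label)
        majority, count = Counter(self.history).most_common(1)[0]
        # Flag frames that disagree with the recent majority: a single "bird"
        # frame sandwiched between "stop sign" frames is suspicious.
        return majority if count > len(self.history) // 2 else None

# Usage: feed per-frame detector outputs; None means "inconsistent, distrust".
f = TemporalFilter()
for label in ["stop sign", "stop sign", "bird", "stop sign", "stop sign"]:
    print(f.update(label))
```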
But to answer your question on compression: compression alone is not sufficient to overcome that. In practice, we expect to use a combination, and we will certainly keep compression in the mix, because compression often actually helps increase model accuracy too; our models currently are not trained on any compressed images. Yeah.
>>Sorry, but I think this is related to the topic, so if you don't mind: I previously did a lot of steganography and steganalysis work, that is, hiding information in digital images and other media. I feel like there are a lot of similarities here. From an electrical engineering perspective, what I'm thinking is that compression, at least standard JPEG compression, is about cutting off the high-frequency DCT components. But where steganography research was heading is that the more effective steganography algorithms used spread-spectrum embeddings: they embed across low, medium, and high frequencies instead of worrying only about the high frequencies. Inspired by that, it seems like the physical perturbation you generated actually has a lot of low-frequency information, and therefore your compression algorithm didn't defeat it, because it only eliminates the high-frequency content. So finally, the question is: have you thought about a spread-spectrum adversarial defense?
>>Yeah. One example related to that, which the audience may have seen, is the composite image where, if you look really closely, it's, I think, Einstein, and if you view it from far away, it becomes Marilyn Monroe. That's playing with the low-frequency and high-frequency content. So yes, that's certainly on our minds. I think the tricky point, and this goes back to why we do the interpretation work, is that right now we don't know why the model is making that detection. It looks like low frequency, but exactly how low frequencies are detected inside the model, we don't know. So we do want to look at the frequency spectrum and also bring in other things. For example, here, for us human beings, you can say there's a shape and then there's text, but a lot of the time these models don't even consider that; it's just end-to-end, throw in the image, out comes "stop sign." So there's the semantic meaning and also the low-frequency component in its own right. We want to do it, but we don't know how yet.
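To illustrate the frequency argument raised in the question, here is a minimal sketch of low-pass DCT filtering, with arbitrary cutoff values, showing why a low-frequency perturbation would survive such filtering.

```python
# Sketch of the frequency argument: JPEG-style compression mostly attenuates
# high-frequency DCT coefficients, so perturbation energy placed in low
# frequencies passes through largely untouched. This uses one global DCT as
# a simplified stand-in for JPEG's blockwise quantization.
import numpy as np
from scipy.fft import dctn, idctn

def keep_low_frequencies(image, cutoff=8):
    coeffs = dctn(image, norm="ortho")     # 2D DCT of a grayscale image
    mask = np.zeros_like(coeffs)
    mask[:cutoff, :cutoff] = 1.0           # keep only the low-frequency corner
    return idctn(coeffs * mask, norm="ortho")

img = np.random.rand(64, 64)
# A perturbation built from low frequencies survives the filtering above
# almost unchanged, which is the questioner's point about spread spectrum.
low_freq_noise = keep_low_frequencies(np.random.rand(64, 64), cutoff=4)
filtered = keep_low_frequencies(img + low_freq_noise)
```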
>>Okay. Thank you.>>Anything else? Yes.>>So actually, I have two questions. One is regarding GAN Lab. A lot of the time, mode collapse happens and training just never converges. I don't know if there's any visual interpretation of why that happens, or what you did there. Did you try to explain it when certain things happen?>>Yeah, great question. We do see it happen in the interface, but we still don't know how to explain it, because even recreating it is hard: because of the randomness, we'd need exactly the same setting. So currently, no, we don't know how to explain it. We talked to some experts on mode collapse, and one user said, "Oh yeah, this is great: I can visualize that kind of progress, at least for that particular setting and initialization, and I know mode collapse is going to happen." But it's very hard to know the full sequence of actions that leads to it. So the answer is no, we don't know. Yes.>>You said that, I assume, you could use GAN Lab for data that has a higher
number of dimensions. So how are you projecting into two dimensions here?>>Actually, it currently supports up to two dimensions; it doesn't go higher. It can do 1D too, but we let it go up to 2D. The deliberate, intentional reason is that 2D is already rich enough for understanding, so we wanted to keep it simple. If you want to use it for high-dimensional data, you do need to project it down yourself, with a dimensionality reduction down to 2D. There will certainly be a lot of information loss, but it's possible, and once the data is in 2D, you can use it.
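For anyone who wants to try that, here is a minimal sketch of projecting higher-dimensional data down to 2D; PCA is just one possible choice of dimensionality reduction, and the shapes are made up.

```python
# Sketch of reducing a higher-dimensional dataset to the 2D that GAN Lab
# accepts; PCA is one arbitrary choice among many reduction methods.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 64)                    # 500 samples, 64 dimensions
X_2d = PCA(n_components=2).fit_transform(X)    # now 500 x 2, with some loss
# X_2d can be treated as a 2D dataset and explored in the tool.
```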
tools that you’ve created, and specifically the
one for Facebook, you mentioned that it’s mostly for mobile developers to
understand their models, but then what about communicating
those models, for example, with product or project
managers or UX designers? Are these visualizations or
results helpful to communicate models or is it only for advanced
users like model developers?>>Very great questions.
For activists also, this is mainly for modal developers
and engineers, if you would. For communication, I think
this might be too complex, like too many moving parts. Some are certainly
helpful for the model, for developer because,
for example, here, they would want to see all
the individual instances, while for communication
maybe you don’t. Maybe you do, but
definitely not that many. So activists currently
is not for that one. I would imagine if you wanted to
build a more communication tool, then it needs to be
significantly simplified, and so to be much more focused. So to be more distilled
from this visualization, but currently, actively,
it doesn’t support that.>>Do you know if there
are any attempts in the research community to target those kinds of audiences, as opposed to model developers?>>Yes. Even here at Microsoft Research: Saleema Amershi, for example, whom I've met with.>>Yeah, I know of her track.>>Yes. Also Steven Drucker, who is the mentor of my student. They have a lot of human-centered AI effort, and now they're also focusing on the end user. As I understand it, they're also advising actual product groups on how to do more of this explanation and communication through the user interface. I think there's great work coming out; I read a lot of it. You'll notice that a lot of the tools I develop are currently geared more towards, I wouldn't say experts, but at least people who are familiar with some of the techniques. But we do want to make more of these tools available to novices. For example, GAN Lab is the direction we're going: it's very accessible, people just come and play with it, it's great for students, it's already used in some classes, and there's a very low barrier to entry. So I imagine that would be the direction. Yes.>>Thank you.>>What's your favorite
JavaScript toolkits these days?>>What's my favorite JavaScript toolkit? I always think JavaScript is an unfortunate coincidence from way back, which I think is true, actually. But now, of course, all my students are doing Python and JavaScript. We use a lot of D3 for the visualization, because it's real programming, so you can do whatever you want. We use TensorFlow.js recently, when we want to be more scalable using the GPU. And then there are also UI libraries like React, partially because of Facebook, which is where we got started; then we went to work with Google, and they have their own similar things. So those are the common ones we use.>>Of all the tools
you showed us today, are they all open source, or only some of them?>>Are they all open source? Yeah, all of these are open source except ActiVis, because it's from Facebook. The rest are open source: ShapeShifter, yeah, all of them. Actually, let me go over to- [inaudible]>>Let me say, as industry, essentially: thank you for all your open source contributions. That's really helpful for the rest of us.>>Yeah, we like to. All four here are open source, everything except ActiVis, and even our survey is open source; we want to add more papers to it. Yes.>>Any more questions?>>Okay. Thank you.
