Best Practices for Using Tableau with Snowflake

Webinar Transcript – Conor Knowles, Product Marketing, Tableau & Kelly Hotta, Sales Engineer, Snowflake

Kelly Hotta: My name is Kelly Hotta. I am a sales engineer at Snowflake; prior to Snowflake, I was actually at Tableau for about five years. So this webinar is really quite dear to my heart, and I know that we've worked with lots of customers using both tools together successfully, so hopefully after the webinar all of you will have a couple of ideas and some tips to take back to your work. Great. So a couple of housekeeping items before we kick off. There is a Q&A box at the bottom of your screen; you should see it there for you to ask questions. Feel free to ask questions as the webinar progresses, and we also have some time at the end to answer your questions. And with that, I will introduce my co-speaker, Conor Knowles from Tableau. He's going to kick us off and talk a little bit about Tableau, we will jointly talk about how the two products fit together, and then Conor will dive into some of the specifics around performance and some tips. All right, Conor.

Conor Knowles: Perfect. Sounds good. Thank you, Kelly. I'm really happy to be joining you today, and thanks everybody for taking the time out of your day to see what Kelly and I have in store for you. So, a little bit about Tableau that I'm going to speak to, and a little intro to myself as well. I always like to start by showing a slide like this, because ever since Tableau became a company in 2003, we've always had the same mission statement, and that is: we help people

see and understand data. And the focal point there is the people. I've been at Tableau for a little over three years now; I'm on the Tableau product marketing team, specifically working with technology partners. I was actually a presales consultant for a couple of years before that, and I've worked with a lot of different customers, and that people statement really holds true. I've seen so many different types of people using Tableau: data scientists, actual scientists, analysts, IT professionals, DBAs, journalists, teachers, students, you name it. It really is for anyone and everyone. And I'm really happy to be speaking to you today on behalf of Tableau about how you can best use Tableau with our great partner Snowflake, to really give your analytics platform the speed and optimization that you require. So, happy to be here. We'll kick into some of the content real quick, but I'll hand it back to Kelly to tell us a little bit about Snowflake.

Kelly Hotta:

Great. Thank you. Yeah, I think for me the biggest thing here is that our mission statements really go well together, right? Tableau is there to help people see and understand their data, and we're here to enable every organization to be data driven. The way that we approach it is more on the backend: we're the engine that hopefully will drive all of those amazing data-driven projects all of you are embarking upon. Okay, I'll keep it short and sweet. Conor, go ahead and give us a little overview of Tableau.

Conor Knowles: Perfect. Thanks, Kelly. So, the way I want to start today is, anytime you're talking about best practices, especially with Tableau, it's a really broad area. There are so many different things you have to keep in mind or consider when trying to make that perfect dashboard or workbook.

So what I want to start by doing is, first of all, tell you a little bit about the Tableau platform, then go over some general performance tips, and then we'll bring in Snowflake as well and see how we can use the two well in conjunction. So very quickly, depending on how familiar some of you are, let me tell you a little bit about the Tableau platform. We deliver an enterprise analytics platform that will empower and elevate your people; it'll really increase the value that they provide your organization.

Conor Knowles: And we also deliver a platform that is going to meet the tough security, governance, and scalability requirements of an organization like yours. The platform consists of four main products. There's Tableau Prep, which helps you prepare your data for analysis by cleaning, shaping, and combining your data. Tableau Desktop, which I'll actually show a little later in the demo, is our authoring tool: you can connect to your data wherever it might be and use powerful drag-and-drop analytics to create visualizations and dashboards. Then there's the sharing component: Tableau Server, where you share your dashboards and findings, deployed either on premises or in the public cloud. And there's also Tableau Online, visually the same experience as Tableau Server, but deployed in the cloud and managed by Tableau. Ultimately, with Tableau, we want to provide you with a lot of choice and flexibility.

Conor Knowles: So you can choose to deploy on premises, in the cloud like AWS, Google Cloud, or Azure, or in a fully hosted environment like Tableau Online. You can also choose to deploy on an operating system of your choice (Windows, Linux, Mac), depending on what you want to use. You can interact with Tableau securely via the browser or a mobile device, or even embed your analytics into other applications. And you can connect to your data easily, whether it's flat files, a fast database like Snowflake, big data, cloud-based data, or any other application data. There's no need to move your data; Tableau can fit into your environment. And then you can choose to query that data either live or via extract; we'll get a little into that later. But ultimately Tableau is all about giving you flexibility, so you can choose what you want to deploy and how you want to do it.

Conor Knowles: And furthermore, as I said, I want to start off with some general performance tips with Tableau.

It'll be a good segue into finding out how do I know if there's something better I could be doing as far as optimization on my workbook, and that will lead really nicely into using it in conjunction with Snowflake. So let's take a look at a couple of tidbits that I like to remind people of when we're talking about performance and Tableau. The first thing I really want to start with is that beginning phase, the connection phase. When we're talking about performance, it's always good to start with the connection, and Tableau supplies a multitude of options for connecting to your data, as you can see by the long list in Tableau Desktop when we connect to a data source. We have a number of native connections; it's over 60 or so.

Conor Knowles: But because data lives in so many places, we can also utilize things like ODBC drivers. Essentially, if you want to connect to something not listed in that long list, you can use an ODBC driver. A good example there is Snowflake: there's a Snowflake ODBC driver, but we also have a native, tuned connection with Snowflake. And it's really important to keep in mind that those native connections mean Tableau has implemented techniques, capabilities, and optimizations that are specific to those data sources. Different engineering and testing activities for those connections really ensure that they're the most robust that Tableau has to offer. So that's why you should always leverage a native connection over an ODBC connector, Snowflake being a prime example there. So it's a good thing to keep in mind: what's my initial connection, am I utilizing a native connection there or not?

Conor Knowles: And then we can also point to design.

This is always important because, regardless of the speed of your connection or what you're using on the backend, it really comes down to: how have I created my workbook, how have I designed this? I always describe Tableau as a blank canvas. There's a lot you can do, a lot of possibilities, specifically when you're trying to make that really perfect dashboard. So structure is very important. When I was a presales consultant at Tableau, I saw a lot of customers trying to use too many sheets on a dashboard, and Tableau has to run a query back and forth for each of those sheets and each mark that you have on the view. A good example there can be maps.

Conor Knowles: You might have millions of data points, and think about the person viewing it as well: is it really useful to have so much clutter, even just from the standpoint of being clear and simple in what you're trying to convey? So having too many sheets on a dashboard might not be great for understanding what's going on, but also, from a performance standpoint, Tableau has to query and render a lot of different things at once. We'll go into a specific example of this in the demo portion as well, but definitely try to keep it down to maybe four or five sheets, depending on their size, and maybe communicate across different dashboards so that you don't have to bring every sheet up at once. You can give the end user the ability to say: if you want more detailed information here, click this action, and an action filter will take you to another dashboard. That helps conserve some space and really makes it a little more organized.


Conor Knowles: And then we can go back to specifics around only working with the data that you need. We're always looking at billions and trillions of rows now; that's really just the new reality with data, right? There's so much around us, and we oftentimes need to analyze all of it or most of it. But it's important to know what data you actually need for the questions you need to ask and answer. So one of the first things you should do once you connect to a table in Tableau is add a data source or extract filter. A good example: your database might have years and years worth of data, but you only need to examine the past month of sales, so you can tell Tableau to only pull the most recent month's worth of data.

Conor Knowles: And that right there is one way to truly optimize all of the subsequent queries that are being run underneath back to the database. Those are going to speed up dramatically because you're only looking at a small subset of the data. Maybe it gets a little more complex, where you do need those years and years worth, but then you can start looking into things like aggregating certain extracts, rolling them up to a certain level. So if you don't want 365 rows of sales data for a year, one row per day, you can roll it up to the year level; now you're looking at only one row for the year. Things like that are really good performance practices to keep in mind, to tidy up what you're querying so that Tableau is only looking through the data you need it to.
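The rollup described here can be sketched in SQL. This is a minimal, hypothetical example (the `orders` table and its columns are assumptions, not from the webinar) that aggregates daily sales to one row per year, so Tableau only ever queries the pre-aggregated result:

```sql
-- Hypothetical table and columns: roll daily sales up to one row per year
CREATE OR REPLACE VIEW yearly_sales AS
SELECT
    DATE_TRUNC('year', order_date) AS order_year,
    SUM(sales_amount)              AS total_sales
FROM orders
GROUP BY 1;
```

Pointing a Tableau data source at a view like this (or applying the equivalent aggregation in an extract) keeps downstream queries scanning one row per year rather than one per day.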

Conor Knowles: And let's also keep going on the layout. This is sort of design and layout, but also a really popular one that I like to talk about a lot, and it comes down to guided analytics, which we like to emphasize at Tableau. There are always instances where filters, specifically those filter dialog boxes, are needed in a dashboard or a worksheet, but every time you're clicking within a filter box, Tableau has to query; it has to go back and get that information, and maybe you have a very long discrete list.

Tableau has to go and find each and every single member of that list. Guided analytics, on the other hand, relies on action filters within a dashboard, and it's much more interactive and intuitive for the consumer of the dashboard, because you might select a high-performing category that you can easily and naturally see through a bar chart, and once you make that selection, the other visualizations filter down accordingly.

Conor Knowles: So it really gives you more insight to influence where your next click or your next question is. With filter boxes, oftentimes when I see them used, they're these long discrete lists, and it's a lot of guesswork: what region do I want to see? What customer do I want to look at? The example we see here is a really great one of clicking to let the data show you where to go next. You can see the detail view in the bottom right showing only a handful of rows; it really helps you find that final answer a lot quicker.

Conor Knowles: And keeping with filters: when you do have a filter box, for example, know how to use it and keep in mind what sort of things are happening depending on how you use it.

So a good example: I've seen a lot of people use "only relevant values" on those long discrete lists. With "only relevant values", one filter selection can actually affect more filters in the view, so it can turn into a really time-consuming query; you're not just selecting one filter, you're selecting multiples that tie into each other. So try to use "only relevant values" sparingly, as it can cause the filter list or range to be refreshed whenever other filters are changed. Maybe consider pointing the filters at a smaller data source for the dimension values and using something like cross-database filters to push the WHERE clause to the main data source.

Conor Knowles: There are also small things like, if it is a long list and you have to select a number of different members, maybe customer names, utilize the apply button. You can add an apply button to a filter so that it only runs the query once you've made your selections and decided, okay, this is the list I want Tableau to go and find; then you select apply, and that can save you from running a query after every single click. Small things like that can really contribute to some noticeable increases in performance.


Conor Knowles:

And these are a couple of phrases that I've lived by. Ideally, when we're using Tableau, we're connecting to data, and we have to think about what the backend is like, right? That's where we'll get into a lot of the Snowflake conversation, as far as making sure we have a really strong backend, because if it's not fast in the database, it's not going to be fast in Tableau; Tableau is not going to magically speed up our query times. And additionally, if it's not fast in Tableau Desktop, right when we first create the workbook or the dashboard, it's not going to be fast in Tableau Server or Online. And I hear customers trying to discover: well, how do I know what the issue is? How do I know if it's phrase number one here or number two?

Conor Knowles: And a good thing to keep in mind is just what’s happening.

Once we click into some of these different workbooks on Tableau Server: what is Tableau actually doing when I try to access a visualization? That's what I'm going to get into next. Once we click into a dashboard, I want to know what's happening underneath the hood. So here's a good chart that I like to show. Basically, when we're in the browser and we click into our workbook, there are a number of steps that happen lightning fast behind the scenes, but sometimes you can experience a slowdown, and it's very important to understand where that's occurring. Like I already said in those general performance tips, it can come down to a lot of things. It can be dashboard design; maybe it's too many marks or sheets taking a while to process.

Conor Knowles: Or maybe Tableau has to calculate a lot of different table calculations within the view. So what we need to keep in mind is that maybe it's also something about wherever the data is coming from. How do we know? Well, that's where Tableau sort of has a mini analyst at our side to help determine the potential cause of workbook slowdowns. That's what I want to demo real quick: it's called the performance recorder, and it's something you can turn on from Tableau Desktop, or in the browser in Tableau Server or Tableau Online. For me it's always a great place to start to decipher where the issue is. So if you open a workbook and it's taking longer than you'd expect to fully render and show you the complete dashboard, this is a good place to start to find out: okay, where do I need to look to identify that performance issue?

So let me go ahead and let’s break into a little demo. Let me show you how we can use the performance recorder and for that will open up a workbook I have.


Conor Knowles: Okay. All right, here we go.

So, just a basic dashboard I made, actually pretty clean; it's got some nice labels up top. But maybe I'm experiencing some sort of performance slowdown. Right now I'm in Tableau Desktop; the process is a little different in Tableau Server and Online, but it all works very similarly. So if I go up to the tabs up top, I can navigate to my settings and performance tab, and you'll notice there's an area down here called "start performance recording". Once I click it, we'll see that nothing is really happening, right?

Conor Knowles: And so once I actually start to click into these different areas, you can see that I'm just going about my normal routine: maybe clicking in to see how a dashboard is working, coming in and changing some of these regions. Then we go a little further, click into one of these action filters, take in another view.

But Tableau is just recording what I'm doing in the background, and it's keeping logs of that data. So now what I can do is go to settings and performance and stop the performance recording.


Conor Knowles: And a workbook will come up. Whoops, that's not the workbook.

Conor Knowles: One sec, let me minimize that. There we go; I think it came up on my other screen. Okay, let's drag that over. Here we go. So Tableau pops open a workbook, and what we can see here is a few different sheets. Tableau is going to show me how long each action took; for every click that I made, Tableau has some data on it. If I pop into the sheet, we can see a nice timeline showing how long everything took, with these Gantt bars sized accordingly.

So I can see a number of different queries that took a little while to run, about 0.8 seconds. That's still not that long, but relative to the other actions I took, these were the things that took longest. If I go back to the dashboard, what's useful here? I can see what took the longest.

So with a lot of queries being run, I can even click into some of those, and I see the SQL down below of what exactly is happening. That can help us understand: okay, why might this have taken so long? I can see the specific worksheet, and as I scroll down we'll start to see some different areas; maybe I'll adjust this filter, since we're not seeing quite all of them.

Conor Knowles: And if I scroll down here, we can see something different from the executing queries: I can see computing layouts. So maybe it's a situation where I have a table calculation or something that's taking a little longer to render. I don't see an example here in the events, but maybe it's a geocoding issue; maybe I have a lot of different marks on a map. So you can start to see how Tableau is telling us a little more about where time is being spent, and it's easy enough to do: we just turn it on in that help tab to help us see where some of those issues are occurring. But that's what I wanted to show for the performance recorder: easy to use, and it shows you, in those beginning stages, where you can identify some of those issues.

So now that we know that, it'll be a good transition into: okay, maybe it is something coming from the backend, maybe I need to speed things up a little more. How can Snowflake help with that? I'll turn it over to Kelly, and she can tell us a little more about the Snowflake side.


Kelly Hotta: Hopefully everyone can see. Cool. So performance is one of the main reasons we start speaking to a lot of customers who have been using Tableau for a while. With Tableau on its own, if you don't have a fast engine underlying it, again, it's only as good as what you have under the covers. From my background coming from Tableau, that's the angle that I was approaching this from. What I would hear from people is that they wanted to build a data warehouse and get a single source of truth that could be really performant, and I'd always, in the back of my mind, think that they were chasing this sort of unicorn that didn't exist.

And the reason for that is really because of what I saw. People struggle with this in many different ways, but the reality that I saw was more like this image, where data is siloed: data is treated as this sort of precious commodity that has to be protected and restricted.

Kelly Hotta: Oh, the audio is not very clear; hopefully it's better now that I'm closer to my mic. So all the people that want access to this data aren't able to have it, right? And the people that are managing the data, it's not that they don't want to give access, it's that they can't; the underlying infrastructure just couldn't handle it. This was actually an image that a customer of ours put up on the screen when they presented to their leadership, to describe the feeling of moving from their old provider to Snowflake. They're also a Tableau customer, and they told us: look, with Snowflake in place, now we can do all of the things we wanted to; we can give access to our data to all the people that want it. So our vision is to really enable a world where people can have fast access to data whenever they need it and make decisions from that information. And our solution is: we built a full data warehouse from scratch to address this particular challenge that we saw.

Kelly Hotta: So, at a really high level, Snowflake is three things. We are a full SQL data warehouse, meaning we are ANSI SQL compliant; if you're coming from any other SQL database, using Snowflake should feel really familiar. You shouldn't have to learn a new language or skill, or hire for those skills. We are just a SQL database, so your data is stored in tables and views, just as you would expect. The second piece is that we were built for the cloud, so we weren't built on any other existing technologies. Our founders spent the first two years in stealth mode building Snowflake from nothing, and the idea they had in mind was: we're going to build it for the cloud, because the cloud is what will give us the elasticity and the scalability that we need in order to really make this thing work. And the last piece is that we are a full service, a data warehouse as a service. Because we are in the cloud, not only do you not have to worry about upgrades and patches and maintenance, you also don't need to do any indexing; there's no vacuuming, there are no manual backups you have to worry about. So as a DBA, you're freed of all these sort of janitorial tasks of managing a database and can really focus on data cleanliness and lineage and cataloging and all of the really important things surrounding those tasks.


So, going a little bit deeper, what really differentiates us as a product and as a platform is our architecture. Again, our founders were from Oracle originally, and when they were there, they saw the constraints of existing technologies: either they were shared-disk systems, where you're bound by the box that you're running on and you start to see contention between compute nodes trying to access a single copy of data, or they were the shared-nothing, distributed-type architectures, where the storage and compute are really tightly coupled together and you have many nodes to run on. The idea there was that it was supposed to be scalable, and what we've seen is: yes, it is scalable, but it's not elastic. Scaling these things is not a matter of just pushing a button.

Kelly Hotta: It involves a lot of time and effort; a customer told us recently it would take them 25 hours to increase the size of their cluster. Realistically, that's not going to be possible, right? So with Snowflake, what we've done is sort of merge the two: we've created a completely new architecture.

We call it the multi-cluster, shared data architecture. We have a single copy of data in our storage tier, and that's completely separated from the compute tier surrounding it. So storage and compute are completely decoupled, and what that does is give you the flexibility to, first, grow each piece as needed, so they're not forced to grow in tandem, and second, spin up compute on demand. You can spin up compute clusters for specific jobs as you need them, and as soon as you don't, you can turn them off.

Kelly Hotta: We call these compute nodes warehouses, so going forward I'll just be using that term. If we take a look at how this might play out with Tableau in your organization, well, it all starts with your data, right? Either you have data that's already in the cloud, which makes it much simpler of course, or, even if it's on premises, getting that data into Snowflake is a matter of simply running a COPY command. And to do that data loading, we need compute power. This is where we would spin up what we call a virtual warehouse. Now, in Snowflake, these virtual warehouses are up to you to decide how big or small they need to be.

And you size them according to what we call t-shirt sizes. Extra small is our smallest warehouse; that's a single node. So in this example here, we might spin up an extra small and then run a COPY command to load data into Snowflake.
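As a rough sketch of this loading step, the warehouse, stage, and table names below are hypothetical, but the commands are standard Snowflake SQL for creating an extra-small warehouse and running a COPY:

```sql
-- Spin up the smallest (single-node) warehouse for loading
CREATE WAREHOUSE IF NOT EXISTS load_wh
    WAREHOUSE_SIZE = 'XSMALL';

-- Load staged CSV files into a target table
COPY INTO trips
FROM @my_stage/trips/
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```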

We could also then think about, of course, once it's in Snowflake, it's in that logical model.

We're going to now query it. So with Tableau, we might say we have some dashboards running in Tableau Online or Tableau Server. We will give those users their own compute power, in this case a medium warehouse, to run those jobs on.

Kelly Hotta: The powerful thing here is, again, the separation of not just the storage and the compute, but the two compute nodes, right? You have the extra-small warehouse doing the data loading while you have the medium warehouse on the right running your operational dashboards. At no point do those two compete for resources, because they're completely isolated and dedicated. The same thing goes for different types of reporting. You might have your operational dashboards running there, but you also have some advanced analytics that you're running in Tableau Desktop. The advanced analytics requires a bit more horsepower, so you're going to give those users a larger warehouse.

Kelly Hotta: Now, because we are in the cloud, not only are we scalable, we're super elastic. There are two main ways that we scale Snowflake. The first is that we can give you a faster engine. The main use case we see for this is people running models or advanced analytics, in Tableau or whatever tool they're using, and essentially wanting it to go faster. So if you run that on a large warehouse to start and you find that it's too slow, what you can do is run a single command and increase the warehouse size on demand. So now I have an extra-large warehouse to run that work on; it's going to be that much faster.

We see linear scaling with warehouse sizes, and when the work is done, I can simply turn the warehouse off, and at that point I'm not even paying for compute at all.
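The "single command" resize mentioned here looks something like this (the warehouse name is hypothetical):

```sql
-- Scale up on demand for a heavy analytics job
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- When the work is done, stop paying for compute
ALTER WAREHOUSE analytics_wh SUSPEND;
```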

Kelly Hotta: Cool. So the other way that we scale is more for a situation like this, and I'm sure there are a lot of people on the call today who have experienced not just traffic, literally, but traffic trying to get access to their data, whether it's in a dashboard or trying to actually analyze it. These are the peaks that happen in every business, whether it's end of quarter, end of financial year, or Monday morning. We had a customer that said their entire sales team would try to get at their dashboards every Monday morning, and they would have to wait in line to get their requests fulfilled. It makes the tool look bad at the end of the day, but really it's the underlying infrastructure that can't handle that peak in concurrency. So what Snowflake can do is enable what we call multi-cluster warehousing.

Kelly Hotta: Now, this gives you the ability to set a maximum number of warehouses to spin up as high concurrency begins to happen. So say, for example, we set this operational dashboards warehouse to have a maximum of five. What that means is, as soon as people start hitting the Tableau dashboards, say all of a sudden a hundred people are trying to access that data and queuing starts to occur, on Snowflake we will automatically spin up a second, a third, a fourth, and up to a fifth warehouse to accommodate that peak. And as soon as the peak subsides, we will spin those down automatically as well.
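Configured in SQL, the five-cluster setup in this example might look like the following (the warehouse name is hypothetical; multi-cluster warehouses are an Enterprise Edition feature):

```sql
-- Allow the dashboards warehouse to scale out to five clusters
-- under high concurrency, and back down as the peak subsides
ALTER WAREHOUSE dashboards_wh SET
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 5;
```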

Kelly Hotta: Okay. So in terms of how you would actually go about doing this, whether you expose it to your analysts or you're just doing it as a DBA, it's really simple. This is just a screenshot of the interface in Snowflake, but scaling up is a matter of just choosing the size of your warehouse, and you can of course alter that size on demand. Scaling out is just a matter of setting the maximum clusters to something more than one.

And then, finally, we also have the ability to automatically stop and start warehouses to really mimic your usage: rather than having to pay for this thing 24/7, you're only paying for it when you're actually using it.
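The automatic stop/start behavior is set per warehouse; a sketch with a hypothetical warehouse name:

```sql
-- Suspend after 5 minutes of inactivity; resume on the next query
ALTER WAREHOUSE dashboards_wh SET
    AUTO_SUSPEND = 300
    AUTO_RESUME  = TRUE;
```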

Kelly Hotta: Okay. So, a couple of points around what we can enable for Tableau users, both the admin and the analyst. The first is that we allow you to ingest all types of data, whether semi-structured or structured. So JSON, XML; the common ones we hear about are things like clickstream data and logs.

Anything that's machine generated, we can ingest as is. What that means from a DBA or data engineering perspective is that you don't have to manage a separate pipeline. From an analyst perspective, it means you can have access to all of this data in one place and not have to pull it in from different sources and different Excel files and data dumps. The second is that we give you the ability to create zero-copy clones: whole clones of databases and tables can be created instantly, without adding cost and without adding any maintenance overhead in terms of managing a separate environment.
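Semi-structured ingestion typically lands the raw records in a VARIANT column, which can then be queried with path syntax. The table, stage, and field names here are hypothetical:

```sql
-- Land raw JSON events as-is in a single VARIANT column
CREATE TABLE IF NOT EXISTS click_events (v VARIANT);

COPY INTO click_events
FROM @my_stage/events/
FILE_FORMAT = (TYPE = 'JSON');

-- Query JSON fields directly, casting to typed columns
SELECT
    v:user_id::STRING     AS user_id,
    v:event_ts::TIMESTAMP AS event_ts
FROM click_events;
```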

So if you wanted to run an experiment on some data and not have to worry about ruining the production copy, you can create a clone for that work specifically. The next is that we have continuous data protection, which essentially allows us to recover from any point in time. It's sort of like the back button in Tableau Desktop, which is probably its most used button. Essentially, if you drop a table, or if you alter a table by mistake, you can recover from that immediately within a specific time period that you set — anywhere from 24 hours all the way up to 90 days.
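Both cloning and continuous data protection (Time Travel) are likewise plain SQL. A sketch — the database and table names below are hypothetical:

```sql
-- Zero-copy clone: an instant copy to experiment on, no extra storage.
CREATE DATABASE analytics_dev CLONE analytics_prod;

-- Continuous data protection (Time Travel): recover from mistakes.
DROP TABLE trips;
UNDROP TABLE trips;                        -- bring it straight back

-- Or query the table as it looked an hour ago.
SELECT * FROM trips AT (OFFSET => -3600);

-- The retention window is configurable, from 1 day up to 90 days.
ALTER TABLE trips SET DATA_RETENTION_TIME_IN_DAYS = 90;
```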

Kelly Hotta: Okay. With that, I will do a quick demo — sharing my screen now. Okay, I'm hoping that worked.

Kelly Hotta: Maybe I'll get a ping if it didn't. So this is the Snowflake interface. Super simple: Databases is where you store your data, and Warehouses is where you decide what compute cluster you want to run. So at this point we have a load warehouse.

It's a small size and it's suspended. But what I can do is go to my worksheet and start running queries, right? So I can do some data loading from this worksheet. I already have some data loaded into this environment. So what I'm going to do is switch to Tableau and log in to this environment, logging in as a different user — of course we have role-based access here. And as soon as I log in, I have this dashboard we created ahead of time, so you don't have to watch me create it.

But essentially, now as I start to interact with this view, all of the queries are going to be running live against Snowflake, because we're utilizing a live connection, right?

Kelly Hotta: So I can start to see, you know, on the weekends how people wake up a little bit later, and I can see some spikes. I can click on a specific point to see where people are taking Citi Bikes around New York City. And while these queries are running, I can also do some analysis on the query time, right? So I can pull in data from what we call the Information Schema to start to analyze: how are my queries running? Are there any slow-running queries? Where was the time spent? And I can take a look at that in more detail. Cool. So the quick demo is just loading data and then pulling more fields into the Tableau view. To do that, first of all, I'm going to create a clone, because again, I'm about to experiment in here.

Kelly Hotta: What we're going to do together is load in some weather information. The weather information I'm going to load in, though, is in JSON format. So I'm going to show you two things. First, I'm going to clone to create a dev environment. It should create instantly — we're cloning the full database — and we're done. So if I refresh on the left-hand side, we now have this dev environment to work in. I'm also going to create a separate warehouse, my dev warehouse, because I don't want to interfere with anything happening on my production warehouse. So we'll do that, and we're done. And then we have this weather data. I'll just show you what it looks like from the source in this JSON format, which Tableau normally doesn't handle particularly well, especially as it starts to get quite complex.

Kelly Hotta: What we can do is load it into Snowflake, first of all, as-is, into a single column by itself. To do the load work, I'm going to give us more horsepower — I'm giving us a faster engine here, an extra-large warehouse — and then I'm going to load that data in. So we're loading just under 40 million records of JSON here, and it shouldn't take more than, I dunno, about 15 seconds or so. So I think it's going, and it's loading into Snowflake here. We're doing some work on it — we can talk through what's happening in just a sec. It's taking a little longer than I thought, but we'll let that load, and as soon as it's loaded, what I'm going to do is essentially make it available to my analyst users. Um, okay, good.
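The load Kelly is narrating boils down to a few statements — roughly something like the following, where the stage name and object names are assumptions for illustration:

```sql
-- Clone prod into a dev database and give the load its own warehouse.
CREATE DATABASE citibike_dev CLONE citibike;
CREATE WAREHOUSE dev_wh WAREHOUSE_SIZE = 'XLARGE';

-- Land the raw JSON as-is into a single VARIANT column.
CREATE TABLE citibike_dev.public.weather_json (v VARIANT);

COPY INTO citibike_dev.public.weather_json
  FROM @weather_stage                 -- hypothetical external stage
  FILE_FORMAT = (TYPE = 'JSON');
```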

Conor Knowles: Yeah,

Kelly Hotta: Of course this had to happen during a recorded demo. I'm going to stop that and see what's going on. All right, let's try that again.

Kelly Hotta: All right, we'll try that again. That one's going now — just keep in mind that we are loading, you know, quite a lot of data here, and this one seems to be going a little bit better. Each query has a query ID associated with it, so I can actually drill down into the query ID and see, as it's running, what it's doing. So as a DBA, I have insight into what's happening: where's the time being spent? How many rows are we working with? And that one worked well — something was off with the first one, but now we've loaded in those 38 million rows and we have them in Snowflake. But they're still in JSON.

Right? So it's still in that JSON format. What we're going to do from this point is query it to parse it out. And the really magical thing we can do in Snowflake here is we can essentially query it with SQL and make it

Conor Knowles: yeah

Kelly Hotta: — and make it look as if it were structured data. So here, the queried JSON data that a second ago looked like gibberish is now in its nice parsed-out table format. Okay. So with this, what I can do is, first of all, switch back to prod. So I'm going to do a clone back to prod, drop our dev environment, and then create a combined view for my analysts in Tableau to be able to use. It's going to combine those Citi Bike trips with the weather information,
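The "query JSON with SQL" step uses Snowflake's path notation on the VARIANT column plus casts. The field names and join key below are invented to illustrate the shape of it:

```sql
-- Query the VARIANT column with plain SQL, casting paths to real types.
CREATE VIEW weather AS
SELECT
  v:time::timestamp    AS observation_time,
  v:temp::float        AS temperature_c,
  v:city.name::string  AS city
FROM weather_json;

-- Combine it with the trips table for Tableau, then grant access.
CREATE VIEW trips_weather AS
SELECT t.*, w.temperature_c
FROM trips t
JOIN weather w
  ON date_trunc('hour', t.start_time) = w.observation_time;

GRANT SELECT ON VIEW trips_weather TO ROLE analyst;
```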

Kelly Hotta: and then I'm going to give them access. Cool. So with that, I can switch over to Tableau. And while we're doing that, I'm going to load in the rest of my data, just so that we have more data to work with. And what I can do from here is go into my connection and simply change the table that I'm viewing to that new combined view.

And then if I go back to, say, looking at the number of rides people are taking here over the years, I can now bring in what the temperature was at that point in time — the average temperature, for example. And I can start to see — if we filter out the days where we don't have weather records and drill down to some of these days — what we'll find is, and it's always boggled my mind, that on the 31st of December it was negative 20 degrees. Sorry, I didn't convert from Celsius to Fahrenheit.

I think they converge around that point. You know, there were six and a half thousand people that took bikes out, and it was that cold in New York — so I don't know what was going on. But in any case, we now have all of the data: 61 million rows. You can still query it here from Tableau and get good response times, and people are usually blown away by how fast a live query against Snowflake runs on that much data.

Kelly Hotta: Okay. That was my demo and I think I can stop sharing now. All right.

Kelly Hotta: Okay, cool. I think we're good. In terms of time, I'm going to just go through a couple more points here around performance — I know we're running a little late, so for questions we can stay late and answer those if people have time; otherwise, we'll follow up with you directly afterwards. Okay. So, performance tuning in Snowflake: I'm going to hit a couple of points that are important to consider when connecting with Tableau. The first is to try connecting live. Again, that demo was a live connection to Snowflake. The reason is that we have a lot of built-in optimization that you simply wouldn't be able to leverage if you weren't doing a live connection.

Kelly Hotta: The second is to remember to isolate workloads. So characterize your workloads according to specific attributes. For example, you might have, first of all, simple dashboards, and you give those a large or maybe even a medium warehouse to work off. Meanwhile, you have a couple of dashboards that you know are going to be really high-concurrency dashboards, and for those you can turn on multi-cluster warehousing. Meanwhile, you maybe have some data science teams, and you know they want their own horsepower and don't want to be held back at all — so give them their own warehouse to work on.

Conor Knowles: Okay.

Kelly Hotta: And then finally, run all your ELT/ETL and all the maintenance jobs on their own cluster as well. This is a bit of a review, so I won't spend too much time, but the two ways we scale are, again, scale up — give yourself a bigger engine if you need it — or scale out if you're finding that the road is not wide enough: add some lanes to that highway by scaling out with a multi-cluster warehouse.
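Workload isolation, concretely, is just one warehouse per workload. The names and sizes below are illustrative:

```sql
-- One warehouse per workload, so no one steps on anyone else.
CREATE WAREHOUSE simple_dash_wh  WAREHOUSE_SIZE = 'MEDIUM';
CREATE WAREHOUSE busy_dash_wh    WAREHOUSE_SIZE = 'MEDIUM'
  MAX_CLUSTER_COUNT = 5;         -- multi-cluster for high concurrency
CREATE WAREHOUSE data_science_wh WAREHOUSE_SIZE = 'XLARGE';
CREATE WAREHOUSE etl_wh          WAREHOUSE_SIZE = 'LARGE';

-- Grant each team usage of its own compute only.
GRANT USAGE ON WAREHOUSE data_science_wh TO ROLE data_scientist;
```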

Kelly Hotta: Now, a little on what happens under the covers here. Snowflake does a lot of performance tuning under the covers that's really transparent to you as a user, but it's important to understand what some of the main pieces are. The first is, if you're seeing queries run slowly, you can check that the optimizer is behaving as expected. When we load data into Snowflake, we're actually creating what we call micro-partitions: we're breaking the data into tiny little blocks and keeping statistics on top of them, so we know where everything lives. Now, when you run a query — say your table has 100,000 of these micro-partitions — what we should see is only a subset of those being scanned.

So query pruning is a big way that we make sure those queries run quickly. The way you can check it is in your query profile, which I showed before — you can check pruning in terms of total partitions versus how many were actually scanned. The next thing you can do is also what we demoed: you can query the Information Schema or the Account Usage tables directly. You can pull that into Tableau and start running dashboards on it, even creating alerts potentially — to see, for example, are queries running over a certain amount of time? Has a warehouse been turned on and not shut off? That kind of thing. And the last piece is more of a knob that you can turn, and this is important with really large data sets. There is a way that we can explicitly cluster, or sort, the data. By default we do this naturally, meaning as you load data into Snowflake, we will cluster it for you. However, for really large tables, or tables that you know you're going to be filtering on specific fields
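Pulling those metrics into Tableau starts with a query like this against the Account Usage share. The column names are standard Snowflake QUERY_HISTORY columns; the one-day window and the limit are arbitrary choices:

```sql
-- The slowest queries of the last day, with pruning statistics.
SELECT query_id,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_s,
       partitions_scanned,
       partitions_total          -- pruning: scanned should be << total
FROM snowflake.account_usage.query_history
WHERE start_time > dateadd('day', -1, current_timestamp())
ORDER BY total_elapsed_time DESC
LIMIT 20;
```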

quite often, you can specify those as a cluster key. The other thing we have coming — it's currently in preview — is materialized views. And with both of these options, the really powerful thing about them is that we handle all the maintenance. So once you create a clustered table, we will handle all of the maintenance of that clustered table and make sure it's always clustered.

Same thing with materialized views: as soon as you create an MV, we will make sure that it's kept up to date.
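Both knobs are one-liners, after which Snowflake maintains the clustering and the view contents automatically. Table and column names below are illustrative:

```sql
-- Explicit cluster key for a very large, frequently filtered table.
ALTER TABLE trips CLUSTER BY (start_time);

-- Materialized view: Snowflake keeps it current as trips changes.
CREATE MATERIALIZED VIEW trips_per_day AS
SELECT date_trunc('day', start_time) AS day,
       count(*)                      AS rides
FROM trips
GROUP BY 1;
```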

Conor Knowles: Okay.

Kelly Hotta: Okay. So all of what I've done has probably made the analysts really excited, because they can do all the things they want to do and they have the freedom to access the data when they need it, hopefully.

But then on the flip side of that, we always have the DBAs, who are usually the people concerned — and rightly so — about the quality of the data, and about making sure it's not just a free-for-all and chaos. And so Tableau has an approach to that, which Conor will jump into now, called Data Server. It's a good way to bring it full circle.

Conor Knowles: The question is essentially: how do we get the right data into the right hands? And Tableau's governance strategy, through Tableau Data Server, really helps us get that situated. Because if a business isn't empowered with data, or there are too many restrictions or hurdles required to access that data,

IT has constant nightmares of users going the route of locally saving sensitive data to perform a quick analysis — that puts your company at risk, and it blinds your organization from knowing how the data is being used. So in a self-service environment, the role of data governance is to enable access, not restrict it, and ultimately allow those users to get the answers they need in a controlled environment. And that's what we can do with Tableau Server. So essentially, it's having the owner control that data: it comes from Snowflake, you bring it into Tableau, publish it to Tableau Server, and then you can do different things with security — through authentication, making sure the right people are accessing it.

Conor Knowles: This can be done through permissioning in Tableau Server to make sure the right groups have access. You might have a marketing group that needs access to one data source, but you don't want the dev team to have access to it. Or you may need to enforce more specific criteria within the data itself, with row-level security.

You can make sure that so-and-so can only see the region they work in, and not an entirely different region belonging to another teammate. So that's where Tableau Server brings this all full circle: once you have that data working really well — it's in Tableau and you have those workbooks running off it quickly and smoothly — the right people are accessing it accordingly. So ultimately, we also want to discuss how this has been done by others: who else has been successful.

Conor Knowles: We definitely want to leave time for Q&A, so I'll talk really quickly about a couple of customer use cases, and then we can dive into some questions. I know we'd love to make sure we have time for that.

A couple of good examples. First, Electronic Arts, one of the largest video game manufacturers in the world, with a lot of recognizable titles that some of you have probably heard of. Initially, EA evolved their data strategy to accommodate a constantly expanding data environment, right? That's a very common situation where you'd see Tableau and Snowflake used together: these companies are working with so much data. Especially at EA — they had 24/7 reporting across the globe, so they really had no time for ETL windows, and it was taking too long just to create Tableau extracts; initially it took around 48 hours just to duplicate data.

Conor Knowles: They were forced into splitting some datasets, which they didn't ideally want to do, and they just weren't able to update those as consistently as they wanted, or get them through to their sales teams quickly enough to analyze: okay, what are we selling well? What can we improve on? And how can we get that out to the business quickly? Once they implemented Tableau and Snowflake, they experienced much shorter refresh times and fresher, more holistic data sets — through a number of the techniques Kelly showed us for how that can be optimized. And now they're able to more easily look back at their historical data in much more granular detail, so they're asking much more in-depth questions than they were able to before, and they're saving a lot of time doing it.

Conor Knowles: So, obviously important there. One last example I'll go through is the University of Notre Dame.

So, specifically, growing business demands at the fundraising office at Notre Dame: they needed a flexible, scalable, and user-friendly data warehouse and BI platform that could keep pace with all of the big data coming in. Fast query performance and support for both concurrency and scalability were top priorities, along with reduced maintenance and overhead. So that department adopted Snowflake's elastic data warehouse — and if you're interested, it's actually a joint story on both Tableau's website and Snowflake's website. Now this team connects directly to Snowflake with Tableau Desktop to see and understand trends and help the university better structure their fundraising efforts. The dashboards and reports that are consumed also leverage direct connectivity to Snowflake, and that allows the end users to see that performance gain as well.

Conor Knowles: So if any of you are curious to learn more about either of those, I recommend going to Tableau's website or Snowflake's website to read up on them. A couple more things to keep in mind: some good further reading is the Best Practices for Using Tableau with Snowflake whitepaper — you can get that on our respective websites as well. And then, obviously, you can access trials of both Tableau and Snowflake.

Try it out for yourself. If you have any questions, reach out to us — we'd love to help. And with that, I know we want to make sure you know where you can get those trials: Tableau's website and Snowflake's as well. Now what we want to do is open it up to questions, and as Kelly said, we'll hang around a little late to make sure that if there are any coming through, we can get to those.

Kelly Hotta: Great. Awesome. Okay, we have tons of questions — I'm just sorting through some of them.

I saw a couple, though, that we could both address: a couple of people were asking about extracts versus live connections. So, from my perspective, I would say start by doing live connections, if only because otherwise you're spending time and money on a platform and then not using it — if you're going to extract everything out of it at the end, right? In reality we see kind of a hybrid; my experience so far is that people do a little bit of both, so there isn't a really clear-cut answer. But what we do see across the board is that live connections to Snowflake, versus to a conventional database, tend to be a lot quicker. Did you have anything you wanted to add, Conor?

Conor Knowles: No, I would agree with that. Whenever I use Tableau, I start live, and I only go the extract route if, say, I'm looking at something that needs to be optimized a little more, or the live connection isn't very fast. Probably the only other thing I'd note is to take an extract if you need to do some sort of offline workbook manipulation — because obviously you can't use a live connection without a connection, although I guess planes have Internet nowadays. But yeah, that's the only other case where I'd recommend an extract: offline work. Yeah, exactly.

Kelly Hotta: Cool. Okay, we had one — this is for me, I guess — about how long clones persist. So, they don't expire. Unlike the data protection I mentioned, which we call Time Travel — that expires depending on how long you've set it for, so 24 hours to 90 days — clones live forever, until you drop them. They're just like any other object in the system. Okay, there's a bunch more here — there was one around Tableau performance over really wide tables. I think that goes toward your first point, Conor, around maybe filtering the data before you start.

Conor Knowles: Yeah, I mean, yes — skinny and tall is obviously what Tableau would prefer to read. So keep the column structure as slimmed down as possible, and utilize data source filters to get exactly what you need. That being said, you can definitely create an extract to see if that speeds things up. But if you have a live connection to a data source with tons and tons of columns, I'd just say toggle between the two and see which works. Anytime you have that many columns, odds are you can pivot it down — I see it a lot with date information. So there are a number of different things you can work through, but the best thing I'd say is: try it and see what you're experiencing.

Kelly Hotta: Cool. Okay, there's a couple for me that I'll answer. One was about security in Snowflake — data security in particular. So in Snowflake, when you load your data in, all the data is by default encrypted at rest and in flight; we handle all of that encryption for you. We essentially have a hierarchical encryption key model that we rotate keys within. And with our edition for sensitive data, we also give you the option of managing your own key — so at any point, if you want to revoke that key, you can. Okay, let's see what else. There was one about how Tableau handles security. Is there anything in particular you want to highlight around security with Tableau Data Server, Conor?

Conor Knowles: Yeah — the Data Server element of that. Depending on the specifics of the question: first, with Tableau you can authenticate through the SSO of your choice, to make sure the right people are getting into your Tableau Server. Then there's authorization and permissioning within Tableau Server — you can permission at the individual user level, though it's recommended more at the group level. So, like in that example, you can set permissions so that your marketing team or your development team only has access to certain projects within Tableau — you can think of those as folders — so that, since I'm personally on the marketing team, I don't have access to go look at, say, our executive folder or our sales team's folder. It's very easy to do that via the different group levels.

Conor Knowles: And then down at the row level — maybe you do need to restrict aspects of the data — there's row-level security, which you can implement with what Tableau calls user filters. So, for example, I work out of our Seattle office, and maybe I can only see the data coming from the west region. You can implement that based on columns from your data source, so it can be very automated and very easy to do — that way, only the right people see the right aspects of specific data within a dashboard, for example. So I guess that's security at a high level.

Kelly Hotta: Cool. Okay, I will take this one: which cloud providers do we support? I know we had a couple of questions about that. So it depends where you live: in the US we're across both AWS and Azure, and in the EU we have AWS support out there too. That just means that if you have an existing relationship with either of those cloud providers, you can run Snowflake in that same network. It does not mean that we run within your own VPC. And the other thing to note is that from a user perspective, Snowflake is the same experience, right? It's just the underlying storage and compute that we utilize from that provider. There was one question, Conor, that I think got missed by mistake, but it was around white labeling. Do you mind answering that?

Conor Knowles: Okay.

Conor Knowles: Yeah — what was it specifically? Was it Tableau Online related, or —

Kelly Hotta: White labeling: is there multitenant capability, and can we white-label it as well?

Conor Knowles: Yes, you can. And typically when I get the multitenancy question, people are talking about Tableau Online, or about wanting to create multiple different sites. So you don't necessarily have to partition at different project levels on the server — you can create different sites altogether, so you might have an entirely different environment within your Tableau Online instance. That's very easy to set up and do.

Conor Knowles: Okay,

Kelly Hotta: Cool. There was one about scheduling SQL queries in Snowflake. So what I was demoing was all just in the UI, in our worksheet, but most of our customers actually connect up to Snowflake either via our command-line tool, SnowSQL, or whatever SQL editor they want. We don't have a built-in scheduler, so you'd schedule that in your own scheduler, or if you have an ETL tool, you can obviously utilize that too. There was a related question around ETL in Snowflake: we do not have an ETL tool, and our whole premise there is similar to the BI side of things — we will plug into whatever tool you have. We have optimized connectors to specific tools, but we also have generic ODBC and JDBC drivers. Okay, I know there's a couple more coming in, but I think we answered most of them. I forgot to mention at the beginning, but we will send out a recording to everyone, I think within 24 hours or so. I'm happy to share the slides too, so we'll look to package those together. But yeah, thanks everyone for joining. If you had questions that we weren't able to get to, we're more than happy to follow up offline. Have a great rest of your day.

Conor Knowles: Thanks, Kelly. Thanks, everyone.

DIGR | Data Analytics done right

Copyright 2018 – DIGR | All Rights Reserved.
