DjangoCon US 2017 – Python & Spreadsheets: 2017 Edition by Kojo Idrissa

(soothing guitar music) – So I am here I am your presenter, Kojo Idrissa

Here are my slides As you can see, I'm very fancy, and I've made very fancy, elaborate slides with white and black text because the man who talks about spreadsheets, of course, I'm very concerned about fancy things

The talk here Python & Spreadsheets, State of the Union, August 2017 It's called that for a reason, and just for information purposes, my name is there, my Twitter handle is also on every slide, so if you have questions you want to tweet at me later, then that's a thing you can do I'm usually pretty available So this is called State of the Union, August 27, because I gave this talk initially in 2013, or a version of it in 2013 when I first started using the library I'll be talking about Right now, I'm a QA specialist for a startup in Houston called Decisio Health

I used to be an accountant, and then I got an MBA, then I ran away to China, and was a college instructor there for a few years, and also taught here in the US The slides and the IPython Notebook I'm using for this talk will be available at the link there, at my Github, Django Con 2017 It's not available at the moment, but it will be soon Note to self, when I turn to look at the screen, turn my body, and not my head, because if I just turn my head, the volume goes down See, that's not good

(audience laughs) This is what happens when you practice this talk a few times, you realize these finer details Basic outline of what the talk's gonna be, I'm gonna talk about how I got here, sort of my secret origin story, demonstrate some of the fundamental data types that OpenPyXL gives you, and that's because there are a few things there that are a little non-obvious, but then, once they're explained, you recognize how helpful they are Then we'll do a basic demo of some basic things you can do with Python and spreadsheets, and then we'll look at some of the problems that you'll run into trying to use spreadsheets with code There are some situations where using a spreadsheet with code is straightforward and simple, but then there are a lot of situations where it's not, and so I just want to highlight some of those Again, the pain of my life as a recovering accountant can hopefully benefit you in some point

So, (sings), secret origin of Kojo I used to be a professional spreadsheet fighter, which is also what's known as an accountant Lots of spreadsheets all the time You'd be terrified by how many large organizations are running spreadsheets, but I'd always had an interest in code, and learning to program, but I never really needed to as someone who wasn't a professional programmer, so I decided in late 2012 to get more serious about teaching myself to code and becoming a professional developer, and I went pro, ie

got my first development job, in December of 2015, so that's sort of how I got to this point My role in the Python community is such that I've always had an interest in trying to make some sort of a contribution to the community, but as someone who didn't come from a traditional, a CS background, or a coding background, I knew I wasn't gonna start just making contributions by writing awesome code on day one, or even early on So I thought, you know, I got decent personal skills, and interpersonal skills, so I can maybe help grow the community, and so, I felt one of the best ways to grow the Python community was not by converting people who used other languages into Python developers, but by taking people who weren't developers, and bringing them into the Python community by showing them how Python can benefit them For one reason, if you're already developing, say, in Java, or C++, or what have you, you already have your built-in biases, but at the some time, there are more people who aren't programmers than there are people who are programmers to convert, so it's just a bigger world factor Also wanting to look at some solutions that are not obvious to people who aren't developers

So there are certain things, if you're a developer, you think about things in a certain way When I first gave this talk, one of the solutions to this problem that someone suggested was, well, just put it in a database Well, if you already know SQL, then you'd have access to databases You wouldn't have the problems that I'm discussing in this talk, so trying to come up with some solutions that are useful to people who aren't already developers Who's the talk for? We've got two sets of people the talk is geared at, and again, well, not again, but if you've read the description, you might've seen that I'm trying to make this talk as beginner-friendly as possible, and so we got two general categories of people

One, people who are using spreadsheets on a regular basis, but want to sort of step their game up, want to be able to do some different things with spreadsheets, in most cases, Excel, so they want to be able to do some things outside of the norm with a spreadsheet Then the next set of people are people who are already Python developers, but they keep being given spreadsheets, they keep being confronted with spreadsheets They're like, "Well, what do I do with these? "There's gotta be some better way to handle these" Hopefully, this will benefit both sets of people The code here is not gonna be very advanced, again, partially because I wanted this talk to be as beginner-friendly as possible, but also because the really interesting things code-wise are gonna be based on your specific application

Pulling data out of the spreadsheet, writing data back to the spreadsheet, those things are fairly straightforward There are some other things that the library will help you do, but the really interesting things are going to be based around your specific use case I don't know what that is, and I can't demonstrate that code, so I'm showing fairly basic code for a couple of basic applications to give you some ideas of what can be done So now it's time for demo This demo sponsored by Jupyter Notebooks, Jupyter Notebooks, purveyor of fine internet notebooks since, like, five years ago

(audience chuckles) Not that long ago, but still very helpful Step one, we've got the Jupyter Notebook Is this readable to people in the back? Yes? Okay, uh-oh, I got a kind of That too big? Is that, yes? All right I'm trying to think, I think I've gotten rid of all those

Yeah So the first thing you have to do is, you have to know your data, and for that, we will take a look at the actual spreadsheet file It's almost impossible to work with a spreadsheet programmatically if you don't know what's in the spreadsheet, so here, what we've got, for my simple example, and let me try to, no, it won't work No, that won't help Want you to do what I say

I want you to do what I mean, not what I say These spreadsheets are trouble already There we go I'm gonna blame my own mistakes on the spreadsheet, that's how that goes What we have here is, we have some data

We have what is some simulated time sheet data What we have here, we have an employee number, and the employee number identifies each employee, so that's the same employee, this is a different employee Employee number, the cost center that that employee worked for, so their specific work group We also have their division within the company, so your company had broken into multiple different divisions, and so each division will have multiple cost centers in it Then who is that employee's manager, and of course, these are fake simulated names

Then the date that they worked So again, this is time sheet data, so we're seeing here, on this date, this employee with this name and this employee number Can you all see the cell moving fairly well? All right This employee worked one hour on a project, and so this is simplified data just for the purposes If you work in a professional services firm, so I've worked in accounting firms, and for engineering companies, and things of that nature, one of the things that they look at is what's called utilization, so how much time do you spend working on an actual billable project versus what's called overhead time where you're at work, and you're doing things, but you're not working on a specific client project that they can bill for

For this example, everybody's working on billable projects, and so we're not doing that level of analysis That's what we've got here, so we've got basic time sheet data This employee worked one hour on a project, and then they worked three hours on another project the next day, so on and so forth I've got about 1,000 rows of data here Examples that I've worked on before had 10, 15, 20,000 rows of data

This is the basic data, and again, it's just like anything else in programming If you're going to be writing code to manipulate data, you need to be familiar with that data Here's the basic data we've got, and then we start with some basics, reading a file and getting some basic data types The library that I'm using here is called OpenPyXL OpenPyXL is a Python library that lets you read from and write to

xlsx files, and for those not familiar, the older xls file format is what Microsoft Word used up until, I believe, '07, and after that, they switched to the xlsx format, and that extra x means that it's based around XML, and it is compatible with the OpenDocument organization's formats At the time, in 2013, when I started looking at this, OpenPyXL was one of the few libraries that would actually work with that file format That's what I had to deal with

Again, very straightforward stuff here Importing OpenPyXL, by a show of hands, how many of you would consider yourselves beginner or novice programmers? Sort of in early stages Okay, so how many of you might know some people who would be beginner or, not you, yourselves of course, but might know some other people who might be beginner or novice programmers that might maybe benefit from hearing things described in that way? Okay, so be sure to tell a friend and share this with them, 'cause I'm trying to step through this in a fairly straightforward way, again, so that you could explain it to someone who's not an experienced developer, and they could actually get some benefit from this You're importing openpyxl, and then from openpyxl, importing workbook, and this gets you a workbook object, and we'll talk about that distinction between workbooks and worksheets and that sort of thing later Then, this line of code, workbook = openpyxl

load_workbook, this load_workbook function, this is the name of the file that we're using, so the spreadsheet that we just saw, this spreadsheet, this is pyxl_demo_djangoconxlsx That should be, that's wrong That should be true I was playing with this earlier

The distinction here is, this data_only=True, is for situations where you'd have a formula in a spreadsheet, which, lots of spreadsheets have formulas in them, if that data only is equal to true, what that's going to do is, that's going to give you the result of the formula, in all cases, as opposed to bringing you back the formula itself That's that, and let me do this, I'm just gonna run this to make sure I've got everything behaving as it should

All right, and so, this is all fairly straightforward Python stuff except for this data_only=True, this is something specific to OpenPyXL Now we will talk about this distinction between workbooks vs worksheets or spreadsheets and tabs Most people tend to use the terms sort of interchangeably They'll say, "Send me a spreadsheet "Oh, no, I want that spreadsheet in the spreadsheet

" In accounting nerd talk, a spreadsheet is a workbook Excel describes them as workbooks A workbook is the actual file itself What we see here is part of a workbook The individual tabs here have clean_data, which is what we're using, then KPop_draft_roster, which is a whole different thing

K-pop is one of my things So each of these tabs is known as a worksheet, and the multiple worksheets make up a workbook That is important because, when you're using Most people will say a spreadsheet or a tab in a spreadsheet, but Excel defines them as a workbook, which is a collection of individual worksheets, and that's important because, when you're accessing the data, you've got a workbook file, so you want to open that file, that's what we're opening up here in this cell, but then you need to know which worksheet you need to get out of that workbook, and they're different objects that have different properties to them Here, we've got this wb is the workbook that we've opened, which is the entire thing For those of you who are, again, newer to Python, or who have friends who are newer, the dir function, the directory function in Python, is helpful because it will give you a list of the attributes that are available for a particular object Now, for a lot of basic Python constructs, if you're already familiar with these, maybe you don't care, but if you are using a new library like this, like OpenPyXL, it has some different data types that you're not going to be familiar with because you haven't seen them before, so using the dir function on them is helpful So we see all the different attributes, and we'll notice here a couple, sheets, so this will show you the sheets that are available

Let's see, copy_worksheet, so you can copy a specific worksheet create_sheet, which we'll see a little bit later get_sheet_by_name, so you can grab a specific worksheet out of a workbook, or if you don't know what sheets are available, get_sheet_names, and so that will tell you what sheets are actually here in this workbook We'll look at those Skipping around

That's that, and then, I'm here, I've got the workbook, and I print workbooksheetnames 'cause I want to know what sheetnames are there Now, you will notice Is that high enough? Yeah, I'm trying to make sure it's high enough on the screen so people can see it You'll notice the workbook has sheetnames When we looked at the actual spreadsheet itself, we saw two worksheets, clean_data and KPop_draft_roster, but when I print the workbook sheetnames, I get three, japan_spending, clean_data, and KPop_draft_roster The japan_spending worksheet is actually hidden As I was working on this talk initially, I was in Japan, and I was trying to figure out where all my money had gone, so I started, I was like, "I shouldn't have spend that much money," so Show Sheet, japan_spending, so that's a hidden worksheet

I point that out because, if you're in a situation where you're trying to hide something from someone, you want to hide a sheet, well, if they just look at it, they won't see it, but if they get a list of the sheetnames, it's still visible there So those are the worksheets we have available to us The one we want is the clean_data worksheet, so here, I'm gonna create this variable called demo_worksheet, and I want to get_sheet_by_name, and then you pass in the worksheet name Now, this wb is a workbook object, and actually, let me do that here The live coding portion

Notice this wb, it's a workbook object, again, which is not something that natively exists in Python, so openpyxlworkbookworkbookWorkbook, so it's a workbook object, and it has those different attributes that we saw before That's why I had the dir, so I ran the directory function on it, and again, here, I've created this demo worksheet, which pulls a specific worksheet, and we see that it is a worksheet object that has a specific set of attributes that we can see with the directory function

So those are the different things you can do with it The two we're gonna focus on most here are, see, we've got cell, so you can get a particular cell from worksheet, but we're gonna look at

I just passed it, columns and rows Each worksheet, again, depending on how familiar you are with the worksheet, columns go up and down, rows go side to side In this case, each row represents a particular record, similar to interrelational database

Those are the different attributes we've got for the worksheet object, and these workbook and worksheet objects, and the cell objects we'll see later, there are a lot of different attributes, a lot of different options There's clearly not time to go through all of them, but I'm just gonna try to go over some of the highlights We've got this worksheet now, this demo_worksheet, and what we want is, we want the data out of this, so we're gonna grab it by rows So demo_worksheetrows, so what are the rows here? You'll notice, it returns this generator object, generator object Worksheet

_cells_by_row, and so what this does is, it gives us generator object Instead of reading every row out of the spreadsheet, it creates a generator object, and if you are not familiar with generators, a simpler way to think about them is that, a generator object is something that has a collection of items, but instead of giving you all those items at once, it will give them to you one at a time, and this helps to save memory So instead of having all 10,000 rows of this spreadsheet, you have this generator object that gives you one row at a time as needed, and that helps to save memory But I point that out because, if you say, "Oh, okay, show me the rows," you're not gonna just get plain rows of plain text or things of that nature You're gonna get this generator object that's gonna give you a row at a time

Has that done that before? What you get is, when you print out this generator object again, this demoworksheet there, so for row in demoworksheetrows, so I want to see what these rows are like, and you'll notice that each row, I've also got it printing the type, and so you've got a generator object, but what it returns, each object it returns is a tuple of cells, and so you've got this cell, clean_data A1, clean_data

A2, so cell A1, cell B1, cell C1 of this clean_data worksheet A1, B1, C1, that's gonna correspond to the first row, and then the next one goes to the second row, and so and so forth, and so you have a tuple This generator object is giving you tuples, and each element of that tuple is a cell Tuple, standard Python data construct A cell is not

A cell is another OpenPyXL data construct, and we'll see why that's important and useful in just a moment So we move on, we'll take a look at cells, and just want to show you some of the differences about cells For cell in next (demo_worksheetrows), so again, demo_worksheetrows is a generator object

If you have a generator object, like I said, it gives you one new item at a time To get one item at a time, you can use this next function I'm pulling one item out of that generator at a time, and so I'm having it print out just cell, and then the cell name Here is cellcolumn, 'cause it tells me what column the cell's in

cellrow tells me what row the cell is in using some string formatting here to make that look sort of nice Then I'm printing the cell itself, which gives me information about the cell, and then the type of cell just to demonstrate that it's this different data type that's not native to Python It's provided by the library So cell A1, cell A1 from clean_data, it's of the type cell, and the value in it, that last line, is going to show you what's actually in that cell

When you look at a spreadsheet, that's what you're actually after You're after celldata, and so we see A1, the value is employee_num, B1, it's cost_center, C1, it's division, so this row 1, these are the headers, and so on and so forth until we move on I just popped off that top row just to demonstrate that So we've got these cell values and types, and now, OpenPyXL is also smart enough to try to take the data that's in a cell, the values that are in a cell, and convert them to the appropriate Python data type

Here, we're looking at demo_worksheet e1value, and so I also do this to demonstrate that, instead of grabbing things just by rows, you can go to a specific cell, if you want to In this case, we're going to this demo worksheet, which is the clean_data worksheet, and we're grabbing cell E1 specifically, and we're getting its value, and we're doing the same thing to cell E2 If we take a look at the spreadsheet, cell E1 is date_worked, E2 is that first date We print these, and these two cells, 180 and 181, are basically the same thing, just shown differently

That's the value, and then that's the type The value here is this date_worked, and it's a string The value here is that first date, so it's showing it, it's displaying it here, but it's also showing you that it's a datetimedatetime object, and so Python knows that, hey, this is a date That's useful

Cell attributes, again, so why is there a cell object? Just like we had the workbook and the worksheet objects, we have these cell objects that have all these different attributes, and the primary reason for that is because, when you're looking at a spreadsheet, it's a cell object, when you're looking at a spreadsheet, there's a lot more going on than just what's in the cell itself than just the value There are other attributes Is it bold? What sort of styling is going on? What other things are happening with the cell? These cell objects contain all those other attributes The one that we're gonna use the most here is the value attribute We'll be doing cellvalue to get the actual value of it That's the actual data we're wanting to work with

However, there's other data in the cell, so if you need to know if a cell is colored a certain way, or uses a certain type of font, or something like that, you can grab that information and do things with it as well, but for our purposes, we'll be focusing mostly on that You look at cell styles, did I do that? When you look at cell styles, there's not enough time, really, for me to do a demo on the cell styles, but there are all sorts of things you can do working with styles OpenPyXL, the documentation is, of course, on Read the Docs Thank you, Eric Holscher, for making things like this available for us There's a lot of style information, and while that might not seem like the most important thing when you're dealing with data, one, you might be in a situation where the spreadsheets that you are given are being styled in a certain way

Maybe a number that's a loss is red, and a number that's a profit is green, or something like that You can actually make use of that information You can actually pull that out If you're needing to write your results to a spreadsheet, you can write them in that fashion as well Making beautiful spreadsheets has been left as an exercise for the viewer

I'll let you all do that on your own Example one, aggregating time sheet info What we basically want to do here is take this time sheet, and instead of having these individual lines for each day, what we want is, we want to see, okay, how much time did each employee work in this month? This is June of 2017, so want to see, okay, how much time did each employee work in this month? We want to aggregate that information A lot of the Python here is just, it's not particularly impressive, but I wanted to point out the spots where we use regular Python mixed with things that are specific to OpenPyXL Here, you can use a for loop versus a set comprehension

I used a set comprehension because I wanted a set of the employee IDs I didn't want every occurrence of an employee ID because you'd have multiple, so I used a set comprehension here Trey Hunter, who is here at the conference, does an exceptional talk on what he calls comprehensible comprehensions, and so you can, and if you see Trey, ask him, and tell him I told you that Is Trey here? No, Trey? He's probably in that room If you see Trey, ask him about comprehensions, and tell him that Kojo told you to ask him

He'll do a better job of explaining comprehensions than I will, but what you would do with a for loop, in a lot of cases, you can do with a comprehension Here, I'm creating this set comprehension of employee IDs just so I have a list of the unique employee IDs That's what this looks like here Then I'm using that set, that I called employee_ids1, and I'm using that to create a dictionary that takes the hours, I'm using list comprehensions here, list insight comprehensions here, I want all the hours for that employee I want a set comprehension of the cost center

Each employee should only work in one cost center, so here, we've got a set comprehension of one cost center, a set comprehension of divisions Again, each employee should only work for one division Each employee should only have one manager, so this is, again, the part about knowing your data So I got set comprehensions that are building those things You'll notice here, I got row[6]

value for row in demo_worksheetrows In this case, demo_worksheetrows is, again, that generator objects that's giving us a row at a time, so here, I'm saying, I want row index six for hours If we take a look back at our spreadsheet, we can see one, two, three, four, five, six, seven, because, spreadsheets, and this is where it gets a little tricky, we'll see this later, Python indexes from zero, so if you have a list of four items, Python will count them as zero, one, two, three

Spreadsheets index from one, and so, we'll see in the code a little bit later, we get to make that adjustment Here, index zero, one, two, three, four, whoa, I can't use my key Zero, one, two, three, four, five, six, so I'm getting the hours there, but I'm using that row index six, which is gonna be a cell object in the row, then value, so I'm pulling the value out of that, and that's what's getting me my hours Doing the same thing to get the cost center, the division, and the manager

(dog barks) Uh oh I do QA where I work now, and I also used to be an auditor, so I tend to try to want to test things and things of that nature, so I've got a little assertion here, because I know there should only be one cost center, one division, and one manager for employee, so I just do that there Then I build this employee_aggregate object Again, a lot of this is regular Python Here, I've got some of the specifics to OpenPyXL, and then I print this employee_aggregate object, just a pretty printed, it's a dictionary, just so it's clear

What you end up with is an employee ID as the key, and then the cost center, the division, and then the number of hours for that employee This lets me take this spreadsheet, turn it into a dictionary, which could also be used as a JSON object I know from some filtering of the spreadsheets, the manual filtering of the spreadsheet, that I should have 49 employees, and so that's what's happening here Now, here, we've looked at reading data from a spreadsheet, and then processing it, and turning it into something in Python The next thing becomes, what if you've already got a Python program that's running, and you want those results to be written to a spreadsheet? Well, you can do the same thing with OpenPyXL

First, you need to create a workbook, and I've given it the very creative name of output_book I've created this new workbook object called output_book, and then you need to create, well, let's see, you can create a specific sheet Here, I'm creating a sheet called output_sheet So output_bookcreate_sheet, which, this create_sheet is a method that belongs to the workbook object, and I'm giving it a name here, Aggregate Time, and I'm also giving it this argument of 0

The 0 argument means it's going to be the zeroth item in the workbook Otherwise, by default, when you create a workbook, it will have a sheet one as the first object, so here, I'm saying, "Make this the first item "in the sheet that we see" Then, when we look at output_book, we see that it's a workbook type object Then I decide to build a header because I don't want to just write the raw data to the spreadsheet I also want some sort of a header so the spreadsheet looks sort of organized when I give it to someone else, so they can understand it, and so I'm just building a header here, basically by just copying the values out of the demo worksheet

Again, I'm just accessing those cells directly Then I'm printing the header to make sure it's what I want, and so I've got that same header Then, for output data, I build this table It's a list of lists, and I move through this employee_aggregate, and I'm building these new rows So I want the row with the employee, the cost center, the division, their manager, and their number of hours, but here, it's gonna be the number of hours that were aggregated from the earlier dictionary that we saw

Then now, I'm assigning those values that are in this output data construct, I'm writing them to the output sheet, and I've got nested for loops here because I'm writing them by row and by cell Here, we see (clears throat) Excuse me

We see this row index that I've got here, and the column number, but I've gotta use +1 because the indexes that come from Python start with 0 On a spreadsheet, they start with 1 That writes that stuff in, so here is the output data construct that I built, and so you see the first list is the header, and the next list is the aggregate numbers for each employee, so this employee worked 160 hours in a month, so on and so forth So, there we go Now, I can save that

So output_booksave, and then I give it a file name, so this is the file name of the file that it's being written to We open that All right Open

I want done This is the result This doesn't take a huge amount of time, but it's a small amount of data

So there's this I've got this aggregate time sheet with those times, and I can check the totals The total there is, well, you can't see it at the bottom It's 5,386 hours, and if I go back to the original spreadsheet I've got the same total, 5,386 hours, which is here at the bottom, perhaps visible to people in the front row Again, this isn't the most complicated of things, but it gives you an idea of, if you have a lot more spreadsheets to work with, 10 or 100, or you have a lot more data The last thing we do is, I can take that object, that aggregate_time object that I created, and I can write it out as a JSON file Then, what I end up with is

This, where is my JSON file? There we go (mumbles) So I get that as a JSON file that I can then use to configure something else, or to do other processing, if I'd like The problems that you'll run into

Reading the data, writing the data, that sort of thing, not terribly complicated The problems that you'll run into is that, often, a spreadsheet is gonna used as a visual medium Someone wants a spreadsheet to look nice, and so that spreadsheet might not make sense to code

If the only spreadsheet you ever get is one that looks like this, then you'll be fine because you've got fairly well-structured data, but the reality is, the boss wants the spreadsheet to look a certain way, and so it's been laid out, they've tried to do desktop publishing with it, or whatever, and you've got to go through it In those situations, you might be able to access individual cells, or you may be able to convince them to maybe change some of the formatting with the idea that, hey, we can speed this up by literally 100 times You might have visual input, or you might have a visual output requirement, and that's where styling can help you You can make new friends by helping teach a coworker how to automate some of their simple tasks with Python Again, the Python here that I demonstrated wasn't terribly complicated

The comprehensions were probably the most complicated thing You can teach them to read data, and do some things with it, and write the data back out That's what I've got I am Transition on Twitter, if you have questions The slides and the code will be available in this Github repository very shortly

Thank you for your time (audience claps) – [Man] With the Python CSV module, you're able to use DictReader and actually get named columns in and out Does OpenPyXL support that? – I'm not sure if it supports the named column Here, I read things in by rows, but you can also do the same things by columns – [Man] You mentioned that visual spreadsheets would not be a good candidate for this sort of approach

Are there any types of actual data structures in spreadsheets that would not be good for programmatic analysis this way? – Data structures in spreadsheets Hmm Let's see, so if you have a lot of computations happening in macros, and that's something I have not worked with very much, but if you've got a lot of macro calculations going on, that might be a little tricky, but you should be able to grab either the data that's going into that macro, or the results of the macro calculation That's probably what you'd want There's a whole different approach that involves being able to run Python inside of a spreadsheet

I might add that to this talk and update it later, but this far, I've focused on just the files themselves – [Man] Is there any way to make pivot tables in it? I know you can do Pandas, you can do a pivot table in Pandas and write it statically to the Excel sheet, but is there any way to – I believe there is

– [Man] If you're looking to, 'cause in my mind, all this stuff kind of gets rid of VBA – If you go to the OpenPyXL documentation, I believe there is a pivot table function that's there Off the top of my head, I can't recall I personally, I have sort of a love-hate relationship with pivot tables – [Man] Just before I ask my question, I've worked a lot with POI, and if you make a pivot table, and then use a range, you can actually use something like this to populate your pivot table, so you can sort of have a pivot table

Question for you is, streaming and stability Have you noticed any bugs or any stability problems? Then, when you're dealing with large amounts of data, is there any streaming interface that you're familiar with, and anything you, comments on that? – Not familiar with the streaming interface I haven't used this recently with huge amounts of data, so I couldn't really speak to that I think the fact that it's not pulling an enti- I've seen large spreadsheets that have caused Excel itself to slow down and run slowly, so I haven't run this with those same, because those spreadsheets, unfortunately, were visually formatted, but I think the fact that this is creating this generator object and returning the data in small pieces at a time would help alleviate some of that issue, but I haven't actually had a chance to test it I need to create a fake thing with a bunch of data and try that

– [Man] Hi, so your Excel was pretty nicely, obviously, formatted Would it deal with cells or rows that are merged in the middle of the sheets? For example, a merged cell of days of the week and so on? – It has some capacity to deal with that, but again, that's sort of a knowing your data type of thing I haven't specifically tried things with merged cells, but you can access individual cells, so that might be a situation where you might need to access an individual cell, because I'm not sure how OpenPyXL views that Visually, just like with the hidden worksheet, you can't see the hidden worksheet, but OpenPyXL can see it on a list of sheetnames, so I'm not sure how OpenPyXL sees those, because the merging is just a visual thing It's not an actual data thing

– [Man] Have you ever used xlrd or xlwt, and how does it compare? – This is one of the more common questions I get with this talk I've used those a little bit, but when I started using this library, at the time, this was 2013, so those two libraries, xlrd and xlwt, they wouldn't work with xlsx files I think, I believe, now they do, but at the time, they wouldn't I've played with them a little bit, but I was like, "Oh, well, I can't

" I either had to take every spreadsheet, convert it to an xls file, and then use this, or I can just use a library that supports it natively – [Man] Thank you I'm gonna reveal my ignorance real quick, but what tool were you using to run your Python in a browser and show us the output? – That was Jupyter Notebooks This talk is sponsored by Jupyter Notebooks, purveyor of fine

So a Jupyter Notebook – [Man] All right, seeing none Thank you, Kojo

(audience claps) – Over there, there was one more (logo whirs)