cleaning up excel data

10
http://breakingintowallstreet.com Formatting: Cleaning Up Data Hello and welcome to our lesson on cleaning up data, which is one of the final ones in this Formatting Module. What we’re going to be doing in this lesson is using some of these functions that you see onscreen. So trim, trim with spaces, proper to properly capitalize. Well, not really capitalize, but capitalize each word in text. Clean to remove non-printable characters, and then value and text possibly, to change around the formatting of text and numbers. We’ll be using those along with some of the copy and paste functions in Excel to take our address data over here in this Customer Order due diligence file that shows us customer orders of the past 10 years or over a 10-year period. We’re going to go in and fix all these addresses. You can see the problem right now that some parts of them, like the state abbreviations are properly capitalized whereas the cities are not, and the addresses themselves are also not properly capitalized. [01:00] So we’re going to go in and fix all that using these functions. What I’m going to do here is since we’re really just going over functions that we’ve learned previously, I’m going to do it the first time, very quickly. Then you’re going to try it on your own via an exercise and then if you don’t get it or you forget the keys or you can’t reproduce it, then we’ll walk through it more slowly together, and I’ll go through each step and all the shortcut keys in detail. So here is what we’re going to do first. The first thing I like to do here is create some extra columns, so that we have actual space to do this. Now here we can see that we basically have four fields. We have the address itself; then we have the city, then the state, and then the zip code. So we’re going to need four columns. ‘CTRL + Spacebar’, and then ‘CTRL + SHFT + plus’. We can change the width here with ‘ALT + OCW’. I’ll change that to 10, and then ‘CTRL +

Upload: mayor78

Post on 17-Jul-2016

9 views

Category:

Documents


2 download

DESCRIPTION

Excel Data

TRANSCRIPT

http://breakingintowallstreet.com

Formatting: Cleaning Up Data

Hello and welcome to our lesson on cleaning up data, which is one of the final ones in this

Formatting Module. What we’re going to be doing in this lesson is using some of these

functions that you see onscreen. So trim, trim with spaces, proper to properly capitalize. Well,

not really capitalize, but capitalize each word in text. Clean to remove non-printable characters,

and then value and text possibly, to change around the formatting of text and numbers.

We’ll be using those along with some of the copy and paste functions in Excel to take our

address data over here in this Customer Order due diligence file that shows us customer orders

of the past 10 years or over a 10-year period. We’re going to go in and fix all these addresses.

You can see the problem right now that some parts of them, like the state abbreviations are

properly capitalized whereas the cities are not, and the addresses themselves are also not

properly capitalized.

[01:00]

So we’re going to go in and fix all that using these functions. What I’m going to do here is since

we’re really just going over functions that we’ve learned previously, I’m going to do it the first

time, very quickly. Then you’re going to try it on your own via an exercise and then if you don’t

get it or you forget the keys or you can’t reproduce it, then we’ll walk through it more slowly

together, and I’ll go through each step and all the shortcut keys in detail.

So here is what we’re going to do first. The first thing I like to do here is create some extra

columns, so that we have actual space to do this. Now here we can see that we basically have

four fields. We have the address itself; then we have the city, then the state, and then the zip

code. So we’re going to need four columns. ‘CTRL + Spacebar’, and then ‘CTRL + SHFT + plus’.

We can change the width here with ‘ALT + OCW’. I’ll change that to 10, and then ‘CTRL +

http://breakingintowallstreet.com

Spacebar’, ‘CTRL + SHFT + plus’. Press ‘F4’ to do that a few more times or ‘CMD + Y’ on the

Mac.

[02:00]

‘ALT + OCW’ to change the width, and then what we’re going to do here is press ‘SHFT + CTRL +

down arrow key’, and then we’re going to use the text to columns feature in Excel that we went

over briefly in the import/export lesson. Press ‘ALT + A’ to go to data or ‘ALT + D’ for data in

Excel 2003 and older versions, ‘ALT + AE’ for text to columns. On the Mac remember there’s no

shortcut key, so you have to use the mouse to go up to the ribbon to do this.

We want to select delimited here, and then next, and we want commas, and then for the

destination what we want to do is select F3 instead. So I actually use the mouse, you could just

press ‘F2’ or ‘CTRL + U’ on the Mac to exit out and go and do that. And then I’m going to say

finish, and then you can see how all of these are separate fields now.

[03:00]

Now we still have a problem, which is that the states here, these are not correct. We have

some issues here and we have some extra spaces, so we’re going to have to fix that. But what

we can do now is go in and insert some extra columns. Now before I even get started with this,

make sure that you go to ‘ALT + TO’ in Excel or ‘CMD + comma’ on the Mac.

Go to formulas or calculations. Make sure that workbook calculations are set to ‘automatic’. If

they’re not then this is not going to work correctly the first time around, so make sure these are

set to ‘automatic’. And so what I’m going to do here is use the trim and proper functions to get

rid of all these extra spaces, and then to get the capitalization correct.

http://breakingintowallstreet.com

So I’m going to say equals proper trim and we’re going to target this text. We’re going to copy

this formula all the way down with ‘CTRL + C’ or ‘CMD + C’ on the Mac; ‘CTRL + down arrow

key’, and then ‘ALT + ESF’ or ‘CTRL + ALT + VF’ or ‘CMD + CTRL + V’ and then ‘CMD + F’ on the

Mac. Then we can just copy this over with ‘CTRL + C’ and ‘CTRL + V’ or ‘CMD + C’ and ‘CMD + V’

on the Mac. We have that.

[04:00]

And then we have all of these in place, so all of these are correct now. What I am now going to

do is copy and paste values. So I’m going to use the ‘SHFT + right arrow keys’, ‘SHFT + CTRL +

down’, ‘CTRL + C’ or ‘CMD + C’ on the Mac, then ‘ALT + ESV’ or ‘CTRL + ALT + VV’ or ‘CMD + V’

on the Mac. So you have those. And then what I can do now is delete the unnecessary columns.

So I’m going to highlight this first one, which does not have the correct capitalization, ‘CTRL +

Spacebar’, ‘CTRL + minus’.

Do the same thing for the cities, ‘CTRL + Spacebar’, ‘CTRL + minus’, and then do the same thing

for the states and zip codes; ‘CTRL + Spacebar’, ‘CTRL + minus’. Now let’s auto-fit these column

widths so we can see everything. So ‘SHFT + CTRL’, and then ‘SHFT + right arrow key’, ‘ALT +

OCA’. On the Mac you cannot do this. You have to go to the home ribbon, and go to format and

then auto-fit column width instead.

[05:00]

So let’s take a look at this. The addresses here look much better. The cities look much better,

but these states and zip codes are still not quite right. What can we do to fix this? The easiest

thing to do here is to actually insert a few extra columns, so I’m going to press ‘CTRL +

spacebar’, and then ‘CTRL + SHFT + plus’ to insert those two columns twice. Now what we can

do here is use the text to columns feature, just on this part.

http://breakingintowallstreet.com

So I’m going to highlight this whole thing, ‘SHFT + CTRL + down arrow key’, and then ‘ALT + AE’

or ‘ALT + DE’ in older versions of Excel. On the Mac you have to use the mouse, go up to data,

delimited. We are separating these by a space, so let’s do that. And then for the destination cell

let’s say I3, and we’ll say OK to this warning message error that Excel gives us, so we have that.

[06:00]

So we have all of these in place now, and so what we need to do very quickly is just uppercase,

capitalize properly everything in these state abbreviations. So let’s go through and do that

quickly. I’m going to delete the contents of this column, because we don’t need it anymore.

And then we’re just going to use the upper function. Remember this converts a text string to all

uppercase letters.

Use that on the states, and then ‘CTRL + down arrow key’, and then ‘SHFT + CTRL + up’, then

‘ALT + ESF’ or ‘CTRL + ALT + VF’ or ‘CMD + CTRL + V’ and then ‘CMD + F’ on the Mac. We have

this. And then once again, we can just copy and paste these values using the same shortcut

combinations, except we select values under paste special now and then ‘CTRL + spacebar’ to

highlight this column, and then ‘CTRL + minus sign’ to delete it. So now if we go and look at

some of these, you can see that it’s much better overall.

[07:00]

Now there are still some issues here. For example, we’re missing a trailing zero in front of this

zip code. And if you really look and dig into the data here, you’ll see that there are some other

problems. For example, whenever we have NE or SW here, or something of that nature. Well,

we have an issue because it’s not quite displaying correctly. What’s going on here is that the NE

that is capital N, and lowercase E rather than all caps there.

http://breakingintowallstreet.com

So there are some issues and with these zip codes, similar problems apply. And if you really

want to, if you want an added challenge you could go in and fix this. Probably the easiest way

to do is to look in this column. Look for anything that only has two letters in it, and then make

sure you apply proper to that. So you can do a search and then apply the upper function to

that. So that’s probably the easiest thing to do here, if you want an added challenge, but we’re

just going to leave it at this, because I don’t want this lesson to go on for too long.

[08:00]

So what you should do now is replicate everything that I just showed you. So take this data and

convert it into the columns that you see here using the steps that I just went through. What I’m

going to do now is just delete all of this data, so that you do not see it anymore, and then you

can go in and try it yourself, and maybe you’ll even find a slightly better way to do it, over what

I showed you just now. So I’m going to delete this. I am going to remove these columns.

So pause this video right now. Give it a shot yourself. If you get confused or you can’t quite get

it, un-pause it, then we’ll walk through it, and I’ll go through it a bit more slowly this time

around and show you all the shortcut keys. Okay, good. So here’s what we’re going to do. Once

again, we’re going to start by inserting extra columns over here. So I’m going to press ‘CTRL +

spacebar’, and then ‘CTRL + SHFT + plus’ a few times to insert some extra columns. I’m going to

change the width of all these columns by going ‘ALT + OCW’. I’ll change it to 10.

[09:00]

And then what we’re going to do is select this whole area and say, ‘SHFT + CTRL + Down’, and

‘ALT + AE’ for text to columns or ‘ALT + DE’ in older versions of Excel or use the mouse on the

Mac. We want delimited. So we’ll say next, and then we are going to have a space, actually

http://breakingintowallstreet.com

we’re going to have a comma separating all of these values for now. So we have a comma in

between all of these. I’m going to hit next once again using ‘ALT + N’ or the mouse.

We’re going to stick to general. For destination, I’m going to say F3 right here, and then we’re

going to press ‘ALT + F’ for finish or just click this. And Excel is giving this warning. We’ll just say

OK. So we have all of this in place now, and we can actually delete this data, because we don’t

need it anymore. So ‘SHFT + CTRL + down arrow key’, and then hit the delete key. Now this time

around I’m going to do some more thinking about how to properly do this, and maybe how to

do it a little bit more efficiently.

[10:00]

So what we can do now is use the proper function and I’m just going to apply it to these two

columns, because clearly we don’t need to apply a proper to these state abbreviations. So that

can save us some time this time around. So let’s say proper and then trim. Trim is to get rid of

the extra spaces in between here on this data. So we have that, and we can see it’s already

looking much better.

And then we’ll insert another column here, so ‘CTRL + spacebar’, and then ‘CTRL + SHFT + plus’,

and then we’re going to apply the same function right here to these cities. So now let’s copy

these formulas down, so ‘CTRL + C’, and then ‘CTRL + down arrow key’, and then ‘SHFT + CTRL +

up arrow key’, and then we can use ‘ALT + ESF’, and then ‘CMD + CTRL + V’ and ‘CMD + F’ on

the Mac to paste all of these in, paste these formulas. We have that.

[11:00]

And then we’re going to do the same thing for these others over here. So just take this formula,

‘CTRL + C’ or ‘CMD + C’ on the Mac, ‘CTRL + Down arrow key’, and then ‘SHFT + CTRL + up arrow

key’, and then ‘ALT + ESF’ or ‘CMD + CTRL + V’, ‘CMD + F’ to paste formulas, and we have all of

http://breakingintowallstreet.com

these in place, and the capitalization is fixed for the most part. Now what we can do is just

paste all these as values and get rid of the unnecessary data. So I’m going to use the ‘SHFT +

right arrow key’, ‘CTRL + C’ or ‘CMD + C’, and then ‘SHFT + CTRL + up’. So copy this whole area,

‘CTRL + C’ or ‘CMD + C’.

And then special paste as values, so ‘ALT + ESV’ on the PC or ‘CMD + CTRL + V’, ‘CMD + V’ on the

Mac. So we have that, and all of these are in place now. And so now what we can do is get rid

of some of the data that we don’t need with poor capitalization, so this column for example.

‘CTRL + spacebar’, ‘CTRL + minus key’ gets rid of it. And then the same thing right over here.

[12:00]

Now these state abbreviations, what do we actually need to fix here? Well, the capitalization is

already all correct, but we want to do two things. Number one is we want to separate the

states and the zip codes, and then number two is we want to get rid of all these extra trailing

spaces. So to do that what I can do is go to the top, and enter the trim function, equals trim.

Apply it to these state abbreviations, state, and zip codes really, so we have that.

And then same idea ‘CTRL + C’, go to the bottom and then ‘ALT + ESF’ or ‘CMD + CTRL + V’,

‘CMD + F’ on the Mac, paste formulas, so we have that. And then once again, we can copy and

paste this whole area, ‘SHFT + CTRL + up arrow key’, and then ‘CTRL + C’, ‘ALT + ESV’ or ‘CMD +

CTRL + V’ or ‘CMD + V’ on the Mac. And so we have this, and you can see how we’ve gotten rid

of the extra trailing spaces, and possibly the extra following spaces here.

[13:00]

So let’s delete this unnecessary column now. And then the last thing I’m going to do is just

separate the state and zip code here. So I’m going to insert a new column, ‘CTRL + spacebar’,

http://breakingintowallstreet.com

and then ‘CTRL + SHFT + plus sign’ to do that. Highlight this whole with ‘SHFT + CTRL + down

arrow’, ‘ALT + AE’ for text to columns or ‘ALT + DE’ in older versions of Excel.

We want delimited. This time we have a space separating both of these, so we want that. So we

selected space, and then ‘ALT + N’ or just click on next. And then destination cell, we want H3

right here. So I’m going to go finish or ‘ALT + F’ for that. We’re going to ignore this warning and

you can just say OK, and so we have that. We can see that once again, it’s not perfect, some of

our zip codes here are not quite right. So this zero for example, Excel has deleted this

inadvertently, but overall the data is in much better shape. So we have that.

[14:00]

And now as the final step what we can do is just write address at the top here, and then city,

state, zip, and we can auto-fit all of these columns ‘SHFT + CTRL + down’, hold down ‘SHFT’ and

release ‘CTRL’ and then right arrow key a few times. And then a couple of ways we can auto-fit.

We can go to ‘ALT + OCW’ or ‘ALT + H’ for home. O for format. and I for auto-fit. Column width

on the Mac you have to use the mouse, go to the home ribbon, format, and then do the same

things, and we have that set and formatted now.

So again, this is not perfect. Some of the data here will still need to be cleaned up. For example,

the issue that I mentioned before, the abbreviations for the addresses not quite being right.

There’s the issue with the zip codes here, and how some of the zeros have been deleted. So if

you want an extra challenge, you can go in now and try to use some of the functions we went

over before, like length for example, to fix this issue with the zip codes. And do the same thing

really with the addresses here. So as an extra challenge if you want to further fix the data, you

can do that.

[15:00]

http://breakingintowallstreet.com

But for now we’re going to stop, because this is already looking much better. It’s much more

readable, and understandable, and easier to manipulate in Excel. So that’s it for our lesson on

how to clean up data by using all of these functions; trim, proper, clean, and then the normal,

copy and paste formulas and values, and text to column functions; as well as the basic

navigation control arrow keys, and ‘CTRL + SHFT + arrow keys’ in Excel.

Now you should have a better idea of how to use this, and how you can use it in real life to

clean up data. So that’s it for this lesson. Coming up next we’re going to be jumping into sorting

and filtering data, and then go into a few more advanced topics on the subject of formatting

within Excel.

http://breakingintowallstreet.com