cleaning up excel data
DESCRIPTION
Excel DataTRANSCRIPT
http://breakingintowallstreet.com
Formatting: Cleaning Up Data
Hello and welcome to our lesson on cleaning up data, which is one of the final ones in this
Formatting Module. What we’re going to be doing in this lesson is using some of these
functions that you see onscreen. So trim, trim with spaces, proper to properly capitalize. Well,
not really capitalize, but capitalize each word in text. Clean to remove non-printable characters,
and then value and text possibly, to change around the formatting of text and numbers.
We’ll be using those along with some of the copy and paste functions in Excel to take our
address data over here in this Customer Order due diligence file that shows us customer orders
of the past 10 years or over a 10-year period. We’re going to go in and fix all these addresses.
You can see the problem right now that some parts of them, like the state abbreviations are
properly capitalized whereas the cities are not, and the addresses themselves are also not
properly capitalized.
[01:00]
So we’re going to go in and fix all that using these functions. What I’m going to do here is since
we’re really just going over functions that we’ve learned previously, I’m going to do it the first
time, very quickly. Then you’re going to try it on your own via an exercise and then if you don’t
get it or you forget the keys or you can’t reproduce it, then we’ll walk through it more slowly
together, and I’ll go through each step and all the shortcut keys in detail.
So here is what we’re going to do first. The first thing I like to do here is create some extra
columns, so that we have actual space to do this. Now here we can see that we basically have
four fields. We have the address itself; then we have the city, then the state, and then the zip
code. So we’re going to need four columns. ‘CTRL + Spacebar’, and then ‘CTRL + SHFT + plus’.
We can change the width here with ‘ALT + OCW’. I’ll change that to 10, and then ‘CTRL +
http://breakingintowallstreet.com
Spacebar’, ‘CTRL + SHFT + plus’. Press ‘F4’ to do that a few more times or ‘CMD + Y’ on the
Mac.
[02:00]
‘ALT + OCW’ to change the width, and then what we’re going to do here is press ‘SHFT + CTRL +
down arrow key’, and then we’re going to use the text to columns feature in Excel that we went
over briefly in the import/export lesson. Press ‘ALT + A’ to go to data or ‘ALT + D’ for data in
Excel 2003 and older versions, ‘ALT + AE’ for text to columns. On the Mac remember there’s no
shortcut key, so you have to use the mouse to go up to the ribbon to do this.
We want to select delimited here, and then next, and we want commas, and then for the
destination what we want to do is select F3 instead. So I actually use the mouse, you could just
press ‘F2’ or ‘CTRL + U’ on the Mac to exit out and go and do that. And then I’m going to say
finish, and then you can see how all of these are separate fields now.
[03:00]
Now we still have a problem, which is that the states here, these are not correct. We have
some issues here and we have some extra spaces, so we’re going to have to fix that. But what
we can do now is go in and insert some extra columns. Now before I even get started with this,
make sure that you go to ‘ALT + TO’ in Excel or ‘CMD + comma’ on the Mac.
Go to formulas or calculations. Make sure that workbook calculations are set to ‘automatic’. If
they’re not then this is not going to work correctly the first time around, so make sure these are
set to ‘automatic’. And so what I’m going to do here is use the trim and proper functions to get
rid of all these extra spaces, and then to get the capitalization correct.
http://breakingintowallstreet.com
So I’m going to say equals proper trim and we’re going to target this text. We’re going to copy
this formula all the way down with ‘CTRL + C’ or ‘CMD + C’ on the Mac; ‘CTRL + down arrow
key’, and then ‘ALT + ESF’ or ‘CTRL + ALT + VF’ or ‘CMD + CTRL + V’ and then ‘CMD + F’ on the
Mac. Then we can just copy this over with ‘CTRL + C’ and ‘CTRL + V’ or ‘CMD + C’ and ‘CMD + V’
on the Mac. We have that.
[04:00]
And then we have all of these in place, so all of these are correct now. What I am now going to
do is copy and paste values. So I’m going to use the ‘SHFT + right arrow keys’, ‘SHFT + CTRL +
down’, ‘CTRL + C’ or ‘CMD + C’ on the Mac, then ‘ALT + ESV’ or ‘CTRL + ALT + VV’ or ‘CMD + V’
on the Mac. So you have those. And then what I can do now is delete the unnecessary columns.
So I’m going to highlight this first one, which does not have the correct capitalization, ‘CTRL +
Spacebar’, ‘CTRL + minus’.
Do the same thing for the cities, ‘CTRL + Spacebar’, ‘CTRL + minus’, and then do the same thing
for the states and zip codes; ‘CTRL + Spacebar’, ‘CTRL + minus’. Now let’s auto-fit these column
widths so we can see everything. So ‘SHFT + CTRL’, and then ‘SHFT + right arrow key’, ‘ALT +
OCA’. On the Mac you cannot do this. You have to go to the home ribbon, and go to format and
then auto-fit column width instead.
[05:00]
So let’s take a look at this. The addresses here look much better. The cities look much better,
but these states and zip codes are still not quite right. What can we do to fix this? The easiest
thing to do here is to actually insert a few extra columns, so I’m going to press ‘CTRL +
spacebar’, and then ‘CTRL + SHFT + plus’ to insert those two columns twice. Now what we can
do here is use the text to columns feature, just on this part.
http://breakingintowallstreet.com
So I’m going to highlight this whole thing, ‘SHFT + CTRL + down arrow key’, and then ‘ALT + AE’
or ‘ALT + DE’ in older versions of Excel. On the Mac you have to use the mouse, go up to data,
delimited. We are separating these by a space, so let’s do that. And then for the destination cell
let’s say I3, and we’ll say OK to this warning message error that Excel gives us, so we have that.
[06:00]
So we have all of these in place now, and so what we need to do very quickly is just uppercase,
capitalize properly everything in these state abbreviations. So let’s go through and do that
quickly. I’m going to delete the contents of this column, because we don’t need it anymore.
And then we’re just going to use the upper function. Remember this converts a text string to all
uppercase letters.
Use that on the states, and then ‘CTRL + down arrow key’, and then ‘SHFT + CTRL + up’, then
‘ALT + ESF’ or ‘CTRL + ALT + VF’ or ‘CMD + CTRL + V’ and then ‘CMD + F’ on the Mac. We have
this. And then once again, we can just copy and paste these values using the same shortcut
combinations, except we select values under paste special now and then ‘CTRL + spacebar’ to
highlight this column, and then ‘CTRL + minus sign’ to delete it. So now if we go and look at
some of these, you can see that it’s much better overall.
[07:00]
Now there are still some issues here. For example, we’re missing a trailing zero in front of this
zip code. And if you really look and dig into the data here, you’ll see that there are some other
problems. For example, whenever we have NE or SW here, or something of that nature. Well,
we have an issue because it’s not quite displaying correctly. What’s going on here is that the NE
that is capital N, and lowercase E rather than all caps there.
http://breakingintowallstreet.com
So there are some issues and with these zip codes, similar problems apply. And if you really
want to, if you want an added challenge you could go in and fix this. Probably the easiest way
to do is to look in this column. Look for anything that only has two letters in it, and then make
sure you apply proper to that. So you can do a search and then apply the upper function to
that. So that’s probably the easiest thing to do here, if you want an added challenge, but we’re
just going to leave it at this, because I don’t want this lesson to go on for too long.
[08:00]
So what you should do now is replicate everything that I just showed you. So take this data and
convert it into the columns that you see here using the steps that I just went through. What I’m
going to do now is just delete all of this data, so that you do not see it anymore, and then you
can go in and try it yourself, and maybe you’ll even find a slightly better way to do it, over what
I showed you just now. So I’m going to delete this. I am going to remove these columns.
So pause this video right now. Give it a shot yourself. If you get confused or you can’t quite get
it, un-pause it, then we’ll walk through it, and I’ll go through it a bit more slowly this time
around and show you all the shortcut keys. Okay, good. So here’s what we’re going to do. Once
again, we’re going to start by inserting extra columns over here. So I’m going to press ‘CTRL +
spacebar’, and then ‘CTRL + SHFT + plus’ a few times to insert some extra columns. I’m going to
change the width of all these columns by going ‘ALT + OCW’. I’ll change it to 10.
[09:00]
And then what we’re going to do is select this whole area and say, ‘SHFT + CTRL + Down’, and
‘ALT + AE’ for text to columns or ‘ALT + DE’ in older versions of Excel or use the mouse on the
Mac. We want delimited. So we’ll say next, and then we are going to have a space, actually
http://breakingintowallstreet.com
we’re going to have a comma separating all of these values for now. So we have a comma in
between all of these. I’m going to hit next once again using ‘ALT + N’ or the mouse.
We’re going to stick to general. For destination, I’m going to say F3 right here, and then we’re
going to press ‘ALT + F’ for finish or just click this. And Excel is giving this warning. We’ll just say
OK. So we have all of this in place now, and we can actually delete this data, because we don’t
need it anymore. So ‘SHFT + CTRL + down arrow key’, and then hit the delete key. Now this time
around I’m going to do some more thinking about how to properly do this, and maybe how to
do it a little bit more efficiently.
[10:00]
So what we can do now is use the proper function and I’m just going to apply it to these two
columns, because clearly we don’t need to apply a proper to these state abbreviations. So that
can save us some time this time around. So let’s say proper and then trim. Trim is to get rid of
the extra spaces in between here on this data. So we have that, and we can see it’s already
looking much better.
And then we’ll insert another column here, so ‘CTRL + spacebar’, and then ‘CTRL + SHFT + plus’,
and then we’re going to apply the same function right here to these cities. So now let’s copy
these formulas down, so ‘CTRL + C’, and then ‘CTRL + down arrow key’, and then ‘SHFT + CTRL +
up arrow key’, and then we can use ‘ALT + ESF’, and then ‘CMD + CTRL + V’ and ‘CMD + F’ on
the Mac to paste all of these in, paste these formulas. We have that.
[11:00]
And then we’re going to do the same thing for these others over here. So just take this formula,
‘CTRL + C’ or ‘CMD + C’ on the Mac, ‘CTRL + Down arrow key’, and then ‘SHFT + CTRL + up arrow
key’, and then ‘ALT + ESF’ or ‘CMD + CTRL + V’, ‘CMD + F’ to paste formulas, and we have all of
http://breakingintowallstreet.com
these in place, and the capitalization is fixed for the most part. Now what we can do is just
paste all these as values and get rid of the unnecessary data. So I’m going to use the ‘SHFT +
right arrow key’, ‘CTRL + C’ or ‘CMD + C’, and then ‘SHFT + CTRL + up’. So copy this whole area,
‘CTRL + C’ or ‘CMD + C’.
And then special paste as values, so ‘ALT + ESV’ on the PC or ‘CMD + CTRL + V’, ‘CMD + V’ on the
Mac. So we have that, and all of these are in place now. And so now what we can do is get rid
of some of the data that we don’t need with poor capitalization, so this column for example.
‘CTRL + spacebar’, ‘CTRL + minus key’ gets rid of it. And then the same thing right over here.
[12:00]
Now these state abbreviations, what do we actually need to fix here? Well, the capitalization is
already all correct, but we want to do two things. Number one is we want to separate the
states and the zip codes, and then number two is we want to get rid of all these extra trailing
spaces. So to do that what I can do is go to the top, and enter the trim function, equals trim.
Apply it to these state abbreviations, state, and zip codes really, so we have that.
And then same idea ‘CTRL + C’, go to the bottom and then ‘ALT + ESF’ or ‘CMD + CTRL + V’,
‘CMD + F’ on the Mac, paste formulas, so we have that. And then once again, we can copy and
paste this whole area, ‘SHFT + CTRL + up arrow key’, and then ‘CTRL + C’, ‘ALT + ESV’ or ‘CMD +
CTRL + V’ or ‘CMD + V’ on the Mac. And so we have this, and you can see how we’ve gotten rid
of the extra trailing spaces, and possibly the extra following spaces here.
[13:00]
So let’s delete this unnecessary column now. And then the last thing I’m going to do is just
separate the state and zip code here. So I’m going to insert a new column, ‘CTRL + spacebar’,
http://breakingintowallstreet.com
and then ‘CTRL + SHFT + plus sign’ to do that. Highlight this whole with ‘SHFT + CTRL + down
arrow’, ‘ALT + AE’ for text to columns or ‘ALT + DE’ in older versions of Excel.
We want delimited. This time we have a space separating both of these, so we want that. So we
selected space, and then ‘ALT + N’ or just click on next. And then destination cell, we want H3
right here. So I’m going to go finish or ‘ALT + F’ for that. We’re going to ignore this warning and
you can just say OK, and so we have that. We can see that once again, it’s not perfect, some of
our zip codes here are not quite right. So this zero for example, Excel has deleted this
inadvertently, but overall the data is in much better shape. So we have that.
[14:00]
And now as the final step what we can do is just write address at the top here, and then city,
state, zip, and we can auto-fit all of these columns ‘SHFT + CTRL + down’, hold down ‘SHFT’ and
release ‘CTRL’ and then right arrow key a few times. And then a couple of ways we can auto-fit.
We can go to ‘ALT + OCW’ or ‘ALT + H’ for home. O for format. and I for auto-fit. Column width
on the Mac you have to use the mouse, go to the home ribbon, format, and then do the same
things, and we have that set and formatted now.
So again, this is not perfect. Some of the data here will still need to be cleaned up. For example,
the issue that I mentioned before, the abbreviations for the addresses not quite being right.
There’s the issue with the zip codes here, and how some of the zeros have been deleted. So if
you want an extra challenge, you can go in now and try to use some of the functions we went
over before, like length for example, to fix this issue with the zip codes. And do the same thing
really with the addresses here. So as an extra challenge if you want to further fix the data, you
can do that.
[15:00]
http://breakingintowallstreet.com
But for now we’re going to stop, because this is already looking much better. It’s much more
readable, and understandable, and easier to manipulate in Excel. So that’s it for our lesson on
how to clean up data by using all of these functions; trim, proper, clean, and then the normal,
copy and paste formulas and values, and text to column functions; as well as the basic
navigation control arrow keys, and ‘CTRL + SHFT + arrow keys’ in Excel.
Now you should have a better idea of how to use this, and how you can use it in real life to
clean up data. So that’s it for this lesson. Coming up next we’re going to be jumping into sorting
and filtering data, and then go into a few more advanced topics on the subject of formatting
within Excel.