Columbia University
Department of Psychology
Data Management Workshop
Speaker: Ingrid Richter
Friday, February 12, 2016
2:00 p.m., Schermerhorn 200B
Abstract: Learn more about data management and digital organization. Topics covered will include: organizing files on multiple computers with multiple users, online solutions, software tools, and workflow management tips.
DATA MANAGEMENT
Index
ACQUIRING DATA
• General
• Storage
• Formats
• Workflow
ACCESSING DATA
• Security
• Directories
• Files/Folders
• Tools
• Sharing
ARCHIVING DATA
• Backups
• Cold Storage
• Columbia
• Notes
GENERAL
Figure: Data Lifecycle by Mushonz, Wikimedia Commons
Definition:
“Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets.” – DAMA
Notes:
• Don’t lose what you currently have
• Back everything up first
• Work from a copy, not the original
• Document everything
• Be consistent with naming
STORAGE
Figure: Mac Storage Reporting vs. Windows Storage Reporting
Notes:
• Keep at least 10% of your storage space free
• Uninstall programs you no longer use
• Empty your browser caches regularly
• Keep your desktop clean: only store aliases or shortcuts
• Keep your downloads folder clean: only store software downloads there, and empty it regularly
Location:
• Pick a central location for all files/folders/software
• Have one master folder with many sub-folders
• Map network drives to this location if it is on a different computer/server
• Don’t rely on Spotlight or Windows Search to find your files
Why?
Computers and servers act strangely when they’re running out of space. A clean desktop gives you a blank slate and concentrates your focus on the most important project. A central storage location means you’ll always know where to look, and it makes backup easier. Spotlight and Windows Search are not as accurate as drilling down to the proper location.
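The 10% guideline above can be checked with a few lines of Python. This is a minimal sketch, not part of the workshop material: the function names are made up for illustration, and the only library call is the standard `shutil.disk_usage`.

```python
import shutil

def free_fraction(path="."):
    """Return the fraction of the volume holding `path` that is still free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / usage.total

def has_enough_free_space(path=".", minimum=0.10):
    """True when at least `minimum` (10% by default) of the volume is free."""
    return free_fraction(path) >= minimum

if __name__ == "__main__":
    fraction = free_fraction(".")
    status = "OK" if has_enough_free_space(".") else "running low"
    print(f"{fraction:.1%} free: {status}")
```

Pass the path of a mapped network drive instead of "." to check a server volume.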
FORMATS:
Figure: Magnetic tapes & CD-Rs
Notes:
• Be cautious of proprietary hardware & software
• Don’t let your data become obsolete and unreadable
• Act sooner rather than later
• As you move away from older technology (magnetic tape, CD-R/DVD-R), it becomes harder to convert the data stored on it into readable forms
Readable Formats:
• Default to common formats for your working files when possible
• Quick test – can your file be opened by dragging it into a web browser?
• Easily readable formats include: TXT, PDF, JPG, GIF, PNG, WAV, MP3, MP4
Storing Data:
• For data that needs to be linear, use a document
• For data that needs to be sorted, use a spreadsheet
• For data that needs to be searched, use a database
Why?
Readable formats are easy to work with, are cross-platform, and can be uploaded and linked to directly from the web.
WORKFLOW:
Figure: Email folders & archived Work Notes
Notes:
• Keep a daily work log with a “to do” list
• Keep a monthly projects list
Email:
• Keep your personal and work e-mail accounts separate
• Use your inbox as part of your “to do” list
• Immediately move messages that you’ve responded to into folders
• Keep your inbox under one screen/page
• Archive messages if you haven’t responded to them in a month
Why?
The work log and projects list give you something to check back on from time to time. This keeps the data and workflow manageable, and you can trace your progress. A clean email inbox helps with focus.
SECURITY:
Figure: FileVault vs. BitLocker
Notes:
• Who needs access to your files & folders?
• Password-protect your computer and screensaver
• Does your data need to be encrypted or password-protected?
• Remember the password or encryption key
Encryption:
• For Mac Encryption, use FileVault: Apple -> System Preferences -> Security and Privacy -> FileVault
• For Windows Encryption (Pro & Enterprise), use BitLocker: Start -> Control Panel -> Security -> BitLocker Drive Encryption. To encrypt: Right-click the drive and choose Turn on BitLocker
• For everything else, use TrueCrypt: No longer maintained, but still available: https://truecrypt.ch
Password Protection:
• Password-protect Acrobat PDFs (File -> Properties -> Security -> Password Protect)
• Password-protect Word/Excel documents (Word -> Preferences -> Security)
• Password-protect Windows Folders through WinZip (WinZip -> Option -> Password)
• Password-protect Mac Folders through Disk Utility: http://blogs.cuit.columbia.edu/ikr2107/mac-password-protect-folders/
DIRECTORIES:
Figure: Mac: Quick Print directory contents by cut and paste
Notes:
• Directory lists can provide a quick index to the contents of your projects
• They keep a complete inventory of all files on your computer
Instructions:
• Quick print directory contents on a Mac: Open Finder, expand the folder you want to inventory, select all the files (Command+A) and copy (Command+C). Open TextEdit (or Notepad), change to Plain Text (Format -> Plain Text) and paste (Command+V) the contents into the plain text document.
• Print directory contents in Mac/Linux: Open Terminal, change to folder and type: ls -R > inventory.txt
• Print directory contents in Windows: Open command prompt, change to folder and type: dir /s /b > inventory.txt
Tip:
Drag a file or folder into Terminal (Mac) or the Command prompt (Windows) instead of typing the full path to change into that folder (cd <dragged_link>)
Why?
It’s easier and faster to search a text file than files and folders. You’ll have a current working inventory of everything on the computer (or server) that you can store elsewhere. You can also import this list into Excel or a database for reporting.
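For an inventory that imports directly into Excel or a database, the same idea can be sketched in Python. This is an illustrative alternative to the ls and dir commands above, not part of the workshop material. The function name `write_inventory` and the two-column layout (relative path, size in bytes) are assumptions.

```python
import csv
import os

def write_inventory(root, out_path):
    """Walk `root` and write one CSV row per file: relative path and size in bytes."""
    rows = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            rows.append((os.path.relpath(full, root), os.path.getsize(full)))
    rows.sort()  # alphabetical by path, so successive inventories are easy to compare
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "bytes"])
        writer.writerows(rows)
    return len(rows)
```

Writing the CSV to a location outside the inventoried folder keeps the inventory from listing itself on the next run.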
FILES&FOLDERS:
Figure: Mac Finder folders vs. Windows Explorer folders
Notes:
• Use unique names for all files and folders
• Don’t rely on the date created/modified field
Files:
• Add the full date, year first: Notes_2016_0212.txt
• Keep orders of magnitude in mind (More than 10? More than 100?)
• Add zeroes to your numbers: use page001 instead of page1
• Never spell out numbers or dates in the filename
Folders:
• Use simple, 1-2 word names for folders
• Don’t use spaces in folder names (use underscores or hyphens)
• Consider top-level alphabetic folders (A, B, C) or years/semesters (2016, 2015_Fall)
Why?
These rules create a cleaner filing system with fewer names to look at and avoid future problems with merging folders. Folder names without spaces make it easier to upload files to the web and to specify data paths. An underscore (_) or hyphen (-) makes the path more readable than %20, the URL encoding of a space.
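The naming rules above (year-first dates, zero-padded numbers) can be generated rather than typed by hand. This is a minimal sketch, and the helper names `dated_name` and `numbered_name` are made up for illustration.

```python
from datetime import date

def dated_name(label, day, extension):
    """Build a name like Notes_2016_0212.txt: label, then year, then month and day."""
    return f"{label}_{day:%Y_%m%d}.{extension}"

def numbered_name(prefix, number, expected_total, extension):
    """Zero-pad `number` to the width of the largest number you expect to need."""
    width = len(str(expected_total))
    return f"{prefix}{number:0{width}d}.{extension}"

print(dated_name("Notes", date(2016, 2, 12), "txt"))  # Notes_2016_0212.txt
print(numbered_name("page", 1, 999, "jpg"))           # page001.jpg
```

Padded names sort correctly as plain text: page002 comes before page010, whereas page10 would sort before page2.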
TOOLS:
Figure: NameChanger vs. Bulk Rename Utility
Notes:
• Use bulk-rename utilities to standardize names in a folder
• Use scripts to perform consistent, repeated actions (e.g. Photoshop for images)
• Use de-duping tools to identify and remove duplicate files and folders
Software:
• NameChanger (Mac)
https://mrrsoftware.com/namechanger/
NameChanger is designed for the sole purpose of renaming a list of files.
• Bulk Rename Utility (Windows)
http://www.bulkrenameutility.co.uk/
Bulk Rename Utility allows you to easily rename files and entire folders based upon extremely flexible criteria. Add date/time stamps, replace numbers, insert text, convert case, add auto-numbers, process folders and sub-folders.... plus a whole lot more!
• DupeGuru (Mac & Windows)
https://www.hardcoded.net/dupeguru/
dupeGuru is a tool to find duplicate files on your computer. It can scan either filenames or contents. Not only can you delete duplicate files dupeGuru finds, but you can also move or copy them elsewhere.
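To show what scanning by contents means, here is a minimal Python sketch that groups files with identical bytes. It is not how dupeGuru itself is implemented. The function names are made up for illustration, and the sketch only reports duplicates and never deletes anything.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file's contents, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root):
    """Return lists of paths under `root` whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Review each reported group by hand before removing anything, and work from a backed-up copy.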
Why?
Streamline your file names and remove duplicates. Helpful for renaming photo imports (e.g. DSCF4119.JPG into NYC_2016_0212–Columbia.jpg) or removing/adding extensions to filenames.
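A photo-import rename like the one above can also be scripted. The sketch below is a hypothetical stand-in for a bulk-rename utility, not a description of NameChanger or Bulk Rename Utility. The function names and the prefix_YYYY_MMDD_NNN.jpg pattern are assumptions chosen to match the naming advice in this handout.

```python
import os
from datetime import date

def plan_renames(names, prefix, day):
    """Map each .jpg name (in sorted order) to prefix_YYYY_MMDD_NNN.jpg."""
    photos = sorted(n for n in names if n.lower().endswith(".jpg"))
    width = max(3, len(str(len(photos))))  # at least three digits, more if needed
    return [
        (old, f"{prefix}_{day:%Y_%m%d}_{index:0{width}d}.jpg")
        for index, old in enumerate(photos, start=1)
    ]

def apply_renames(folder, plan):
    """Rename files in `folder`, refusing to overwrite anything that already exists."""
    for old, new in plan:
        target = os.path.join(folder, new)
        if os.path.exists(target):
            raise FileExistsError(target)
        os.rename(os.path.join(folder, old), target)

plan = plan_renames(["DSCF4120.JPG", "DSCF4119.JPG", "notes.txt"], "NYC", date(2016, 2, 12))
for old, new in plan:
    print(old, "->", new)
```

Printing the plan first, as done here, lets you check the new names before calling `apply_renames` on a real folder.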
SHARING:
Figure: Columbia Blog vs. Columbia Wiki
Notes:
• Document everything (compose in Word, date it, save & share as PDF)
• Store everything public on the web (use robots.txt to block search-engine crawlers)
• Use YouTube for audio and video storage
• Use Amazon Cloud Drive, Apple iCloud, Flickr, Google+, etc. for photo storage
Blogs:
• Use blogs for searching/tagging data and as a user-friendly database
• Columbia offers a free WordPress blog at: http://blogs.cuit.columbia.edu
Wikis:
• Use Wikis for collaborative documents and extended discussions
• Columbia Wiki (general): https://wikis.cuit.columbia.edu/
• Columbia Wikispaces (for courses): https://www.wikispaces.columbia.edu/home
• Columbia Wikischolars (for collaborative research): http://www.wikischolars.columbia.edu/
Why?
Complete file access across all browser platforms (including phones and tablets) at all times. If you’re nervous about content or images getting indexed by search engines, you can tell the search engine spiders not to index folders or content on your server through a robots.txt file.
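A robots.txt file is a short plain-text file at the top level of a web site. The sketch below shows a hypothetical set of rules and uses Python's standard `urllib.robotparser` to confirm what they permit. The folder names /data/ and /images/ and the example.org address are placeholders, not paths from this workshop.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask every crawler to stay out of /data/ and /images/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /data/
Disallow: /images/
"""

def is_allowed(robots_text, url, user_agent="*"):
    """Report whether `robots_text` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(ROBOTS_TXT, "https://example.org/data/results.pdf"))  # False
print(is_allowed(ROBOTS_TXT, "https://example.org/index.html"))        # True
```

Note that robots.txt is only a request that well-behaved crawlers honor. It does not password-protect or hide anything, so sensitive files still need real access controls.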
BACKUPS:
Figure: Time Machine vs. Windows Backup
Notes:
• Assume the worst will happen
• Test your backups regularly (hard drives fail)
• Identify your irreplaceable data
• Consider the backup rule of three (3 copies in 2 formats with 1 located offsite)
Software:
• Automate Mac Backup with Time Machine (Apple -> System Preferences -> Time Machine). The backup destination doesn’t have to be an external hard drive – you can use another computer as a Time Machine backup with the purchase of OS X Server software from the App Store for $20. Time Machine requires 2-3 times the storage of the files being backed up.
• Automate Windows Backup with Backup and Restore (Start -> Control Panel -> Backup and Restore). Not as convenient as Time Machine.
Why?
Not only to have a second copy (or third, for irreplaceable data), but also to give you the flexibility of moving to a different computer if yours crashes. Backups are essential for disaster recovery.
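One way to test a backup is to compare it against the source, file by file. The sketch below assumes the backup is a plain mirrored folder (for example, a copy on an external drive). It does not read Time Machine or Backup and Restore archives, which use their own formats. The function name `verify_backup` is made up for illustration.

```python
import filecmp
import os

def verify_backup(source_root, backup_root):
    """Return relative paths that are missing from the backup or differ from the source."""
    problems = []
    for folder, _subfolders, files in os.walk(source_root):
        for name in files:
            source = os.path.join(folder, name)
            relative = os.path.relpath(source, source_root)
            copy = os.path.join(backup_root, relative)
            # shallow=False compares file contents, not just size and timestamp
            if not os.path.isfile(copy) or not filecmp.cmp(source, copy, shallow=False):
                problems.append(relative)
    return sorted(problems)
```

An empty list means every source file has an identical copy in the backup folder. Anything listed needs to be copied again.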
COLDSTORAGE:
Figure: Amazon Glacier vs. Google Cloud Storage Nearline
Notes:
• Use cold storage for projects that are no longer active
• Cold storage charges for storing, placing and retrieving data
• Backup is fairly robust and handled by the company
Options:
• Amazon Glacier: https://aws.amazon.com/glacier/
Amazon Glacier is a secure, durable, and extremely low-cost storage service for data archiving and long-term backup. To keep costs low, Amazon Glacier is optimized for infrequently accessed data where a retrieval time of several hours is suitable.
    • 1 GB = $0.007/month
    • 1,000 requests = $0.05
• Microsoft Azure: https://azure.microsoft.com/en-us/
Microsoft Azure is a growing collection of integrated cloud services for moving faster, achieving more, and saving money.
    • 1 GB = $0.02/month
    • 1 transaction = $0.0036 per unit/month
• Google Cloud Storage Nearline: https://cloud.google.com/storage/docs/nearline
Google Cloud Storage Nearline is a low-cost, highly durable storage service for data archiving, online backup, and disaster recovery.
    • 1 GB = $0.01/month
    • Data retrieval incurs a cost of $0.01 per GB
Why?
Frees up space on computers and servers. Archives projects (at an ongoing cost).
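Because cold storage is an ongoing cost, it helps to estimate a yearly figure before archiving. The sketch below uses only the per-GB monthly prices listed above (as of this 2016 handout, so check current pricing) and leaves out request, transaction and retrieval fees. The 500 GB project size is an arbitrary example.

```python
# Per-GB monthly storage prices as listed in this handout (2016); check current pricing.
STORAGE_PRICE_PER_GB_MONTH = {
    "Amazon Glacier": 0.007,
    "Microsoft Azure": 0.02,
    "Google Cloud Storage Nearline": 0.01,
}

def yearly_storage_cost(gigabytes, price_per_gb_month):
    """Storage-only cost for 12 months; request and retrieval fees are excluded."""
    return gigabytes * price_per_gb_month * 12

for service, price in STORAGE_PRICE_PER_GB_MONTH.items():
    print(f"{service}: ${yearly_storage_cost(500, price):.2f} per year for 500 GB")
```

At these prices, 500 GB comes to $42.00, $120.00 and $60.00 per year respectively, before any retrieval charges.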
COLUMBIASTORAGE:
Figure: Columbia’s Academic Commons
Notes:
• Monitor Columbia’s resources to see if you can use them
Options:
• Academic Commons: http://academiccommons.columbia.edu/
Academic Commons is Columbia University’s digital repository where faculty, students, and staff of Columbia and its affiliate institutions can deposit the results of their scholarly work and research. Content in Academic Commons is freely available to the public.
    • There is no overall limit on the number of items you can deposit.
    • Files up to 100 MB in size.
    • If you have files over 100 MB in size, please contact cuac@libraries.cul.columbia.edu and we will arrange an alternative way to transfer them.
• Research Storage Service Pilot: http://rss.cuit.columbia.edu/
Currently not accepting new users.
    • $1 per GB storage per year (100 GB min to 2 TB max: $100-2,000)
    • An eligible researcher can purchase personal storage space (2 TB) and a maximum of two group storage spaces (+4 TB).
Why?
Academic Commons is completely free. Research Storage Service Pilot might be a good testing ground (when it starts accepting new users).
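Before depositing a project in Academic Commons, it can help to list the files that exceed the 100 MB limit mentioned above. This is a minimal sketch. The function name is made up, and treating 100 MB as 100 * 1024 * 1024 bytes is an assumption, since the handout does not say how the limit is measured.

```python
import os

# Assumption: "100 MB" is taken as 100 * 1024 * 1024 bytes.
LIMIT_BYTES = 100 * 1024 * 1024

def files_over_limit(root, limit=LIMIT_BYTES):
    """Return paths under `root` whose size in bytes is greater than `limit`."""
    oversized = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if os.path.getsize(path) > limit:
                oversized.append(path)
    return sorted(oversized)
```

Any file this reports would need the alternative transfer arrangement described in the list above.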
NOTES: