Columbia University
Department of Psychology

Data Management Workshop
Speaker: Ingrid Richter

Friday, February 12, 2016
2:00 p.m.
Schermerhorn 200B

Abstract: Learn more about data management and digital organization. Topics covered will include: organizing files on multiple computers with multiple users, online solutions, software tools, and workflow management tips.

DATA MANAGEMENT

Index

ACQUIRING DATA

• General
• Storage
• Formats
• Workflow

ACCESSING DATA

• Security
• Directories
• Files/Folders
• Tools
• Sharing

ARCHIVING DATA

• Backups
• Cold Storage
• Columbia
• Notes

GENERAL

[Figure: Data Lifecycle, by Mushonz, Wikimedia Commons]

Definition:

“Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets.” - DAMA

Notes:

• Don’t lose what you currently have
• Back everything up first
• Work from a copy, not the original
• Document everything
• Be consistent with naming
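Example:

The "work from a copy, not the original" note can be scripted so the copy is made the same way every time. The following is a minimal Python sketch, not part of the workshop material; the function name make_working_copy and the "_working" suffix are assumptions chosen for illustration.

```python
import shutil
from pathlib import Path


def make_working_copy(original, suffix="_working"):
    """Copy `original` next to itself and return the path of the copy.

    The "_working" suffix is a hypothetical convention for this sketch.
    """
    source = Path(original)
    copy_path = source.with_name(f"{source.stem}{suffix}{source.suffix}")
    if copy_path.exists():
        # Refuse to overwrite an existing working copy by accident.
        raise FileExistsError(f"{copy_path} already exists")
    # copy2 also preserves timestamps where the platform allows it.
    shutil.copy2(source, copy_path)
    return copy_path
```

Calling make_working_copy on data.csv would leave data.csv untouched and create data_working.csv in the same folder; edits then go into the copy.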

STORAGE

[Figure: Mac Storage Reporting vs. Windows Storage Reporting]

Notes:

• Keep at least 10% of your storage space free
• Uninstall programs you no longer use
• Empty your browser caches regularly
• Keep your desktop clean: only store aliases or shortcuts
• Keep your Downloads folder clean: only store software downloads and empty it regularly

Location:

• Pick a central location for all files/folders/software
• Have one master folder with many sub-folders
• Map network drives to this location if it is on a different computer/server
• Don’t rely on Spotlight or Windows Search to find your files

Why?

Computers and servers act strangely when they’re running out of space. The clean desktop gives you a blank slate and concentrates your focus on the most important project. The central storage location means you’ll always know where to look and it makes backup easier. Spotlight and Search are not as accurate as drilling down to the proper location.
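Example:

The 10% guideline can be checked with a few lines of Python rather than by reading the storage report by eye. This is a sketch only; the function names are made up for illustration, and shutil.disk_usage comes from the Python standard library.

```python
import shutil


def free_fraction(path="."):
    """Return the fraction of the volume holding `path` that is free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total


def is_low_on_space(path=".", minimum_free=0.10):
    """True when less than `minimum_free` (10% by default) is free."""
    return free_fraction(path) < minimum_free
```

Pointing is_low_on_space at a mapped network drive or the master folder reports on the volume that holds it.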

FORMATS:

[Figure: Magnetic tapes & CD-Rs]

Notes:

• Be cautious of proprietary hardware & software
• Don’t let your data become obsolete and unreadable
• Act sooner rather than later
• As you move away from older technology (magnetic tape, CD-R/DVD-R), it becomes harder to convert the data into readable forms

Readable Formats:

• Default to common formats for your working files when possible
• Quick test: can your file be opened by dragging it into a web browser?
• Easily readable formats include: TXT, PDF, JPG, GIF, PNG, WAV, MP3, MP4

Storing Data:

• For data that needs to be linear, use a document
• For data that needs to be sorted, use a spreadsheet
• For data that needs to be searched, use a database

Why?

Readable formats are easy to work with, are cross-platform, and can be uploaded and linked to directly from the web.
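Example:

To find working files that are not in one of the easily readable formats, the list above can be turned into a small check. This Python sketch assumes the file extension is a fair stand-in for the format; the function names and the example file names are hypothetical.

```python
from pathlib import Path

# Extensions taken from the "easily readable formats" list above.
READABLE_EXTENSIONS = {".txt", ".pdf", ".jpg", ".gif", ".png", ".wav", ".mp3", ".mp4"}


def is_readable_format(filename):
    """True if the file extension is on the readable-formats list."""
    return Path(filename).suffix.lower() in READABLE_EXTENSIONS


def needs_conversion(filenames):
    """Return the names whose extension is not on the list."""
    return [name for name in filenames if not is_readable_format(name)]
```

For instance, needs_conversion(["a.pdf", "b.sav"]) returns ["b.sav"], flagging the file that may be worth exporting to a more common format.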

WORKFLOW:

[Figure: Email folders & archived Work Notes]

Notes:

• Keep a daily work log with a “to do” list
• Keep a monthly projects list

Email:

• Keep your personal and work e-mail accounts separate
• Use your inbox as part of your “to do” list
• Immediately move messages that you’ve responded to into folders
• Keep your inbox under one screen/page
• Archive messages if you haven’t responded to them in a month

Why?

The work log and projects list give you something to check back on from time to time. This keeps the data and workflow manageable, and you can trace your progress. The clean email inbox helps with focus.

SECURITY:

[Figure: FileVault vs. BitLocker]

Notes:

• Who needs access to your files & folders?
• Password-protect your computer and screensaver
• Does your data need to be encrypted or password-protected?
• Remember the password or encryption key

Encryption:

• For Mac Encryption, use FileVault: Apple -> System Preferences -> Security and Privacy -> FileVault
• For Windows Encryption (Pro & Enterprise), use BitLocker: Start -> Control Panel -> Security -> BitLocker Drive Encryption. To encrypt: Right-click the drive and choose Turn on BitLocker
• For everything else, use TrueCrypt: No longer maintained, but still available: https://truecrypt.ch

Password Protection:

• Password-protect Acrobat PDFs (File -> Properties -> Security -> Password Protect)
• Password-protect Word/Excel documents (Word -> Preferences -> Security)
• Password-protect Windows Folders through WinZip (WinZip -> Option -> Password)
• Password-protect Mac Folders through Disk Utility: http://blogs.cuit.columbia.edu/ikr2107/mac-password-protect-folders/

DIRECTORIES:

[Figure: Mac: Quick Print directory contents by cut and paste]

Notes:

• Directory lists can provide a quick index to the contents of your projects
• Keep a complete inventory of all files on your computer

Instructions:

• Quick print directory contents on a Mac: Open Finder, expand the folder you want to inventory, select all the files (Command+A) and copy (Command+C). Open TextEdit (or Notepad), change to Plain Text (Format -> Plain Text) and paste (Command+V) the contents into the plain text document.
• Print directory contents in Mac/Linux: Open Terminal, change to folder and type: ls -R > inventory.txt
• Print directory contents in Windows: Open command prompt, change to folder and type: dir /s /b > inventory.txt

Tip:

Drag a file or folder into Terminal (Mac) or the Command prompt (Windows) instead of typing the full path to change into that folder (cd <dragged_link>)

Why?

It’s easier and faster to search a text file than files and folders. You’ll have a current working inventory of everything on the computer (or server) that you can store elsewhere. You can also import this list into Excel or a database for reporting.
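Example:

If the inventory is headed for Excel or a database, it can help to write it as a CSV file directly instead of reformatting the ls or dir output. The Python sketch below is one possible approach, not the workshop's method; the function name write_inventory and the column names are assumptions.

```python
import csv
import os


def write_inventory(root, output_csv):
    """Write one CSV row per file under `root`: relative path and size in bytes.

    Returns the number of files listed.
    """
    count = 0
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "size_bytes"])
        for folder, subfolders, files in os.walk(root):
            subfolders.sort()  # visit sub-folders in a stable order
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                relative = os.path.relpath(full_path, root)
                writer.writerow([relative, os.path.getsize(full_path)])
                count += 1
    return count
```

Save the CSV outside the folder being inventoried, otherwise the inventory file lists itself.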

FILES & FOLDERS:

[Figure: Mac Finder folders vs. Windows Explorer folders]

Notes:

• Use unique names for all files and folders
• Don’t rely on the date created/modified field

Files:

• Add the full date, year first: Notes_2016_0212.txt
• Keep orders of magnitude in mind (More than 10? More than 100?)
• Add zeroes to your numbers: use page001 instead of page1
• Never spell out numbers or dates in the file name

Folders:

• Use simple, 1-2 word names for folders
• Don’t use spaces in folder names (use underscores or hyphens)
• Consider top-level alphabetic folders (A, B, C) or years/semesters (2016, 2015_Fall)

Why?

To create a cleaner filing system with fewer names to look at and to avoid future problems with merging folders. Avoiding spaces in folder names makes it easier to upload files to the web and/or specify data paths. An underscore (_) or hyphen (-) makes the path more readable than a %20 sign.
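Example:

The naming rules above are easy to apply consistently when the names are generated rather than typed. The Python sketch below is illustrative only; the function names are assumptions, and urllib.parse.quote is used just to show where the %20 comes from.

```python
from datetime import date
from urllib.parse import quote


def dated_name(prefix, day, extension):
    """Build a name like Notes_2016_0212.txt (full date, year first)."""
    return f"{prefix}_{day.strftime('%Y_%m%d')}.{extension}"


def numbered_name(prefix, number, expected_total, extension):
    """Zero-pad `number` so that names sort correctly up to `expected_total`."""
    width = len(str(expected_total))
    return f"{prefix}{number:0{width}d}.{extension}"


def url_path(path):
    """Show how a path looks once it is percent-encoded for a URL."""
    return quote(path)
```

dated_name("Notes", date(2016, 2, 12), "txt") gives Notes_2016_0212.txt, and numbered_name("page", 1, 999, "jpg") gives page001.jpg. url_path("My Folder/notes.txt") gives My%20Folder/notes.txt, while My_Folder/notes.txt comes back unchanged.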

TOOLS:

[Figure: NameChanger vs. Bulk Rename Utility]

Notes:

• Use bulk-rename utilities to standardize names in a folder
• Use scripts to perform consistent, repeated actions (e.g., Photoshop for images)
• Use de-duping tools to identify and remove duplicate files and folders

Software:

• NameChanger (Mac): https://mrrsoftware.com/namechanger/
  NameChanger is designed for the sole purpose of renaming a list of files.
• Bulk Rename Utility (Windows): http://www.bulkrenameutility.co.uk/
  Bulk Rename Utility allows you to easily rename files and entire folders based upon extremely flexible criteria. Add date/time stamps, replace numbers, insert text, convert case, add auto-numbers, process folders and sub-folders....plus a whole lot more!
• DupeGuru (Mac & Windows): https://www.hardcoded.net/dupeguru/
  dupeGuru is a tool to find duplicate files on your computer. It can scan either filenames or contents. Not only can you delete duplicate files dupeGuru finds, but you can also move or copy them elsewhere.

Why?

Streamline your file names and remove duplicates. Helpful for renaming photo imports (e.g., DSCF4119.JPG into NYC_2016_0212–Columbia.jpg) or removing/adding extensions to file names.
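Example:

For a sense of what a contents-based duplicate scan involves, the Python sketch below groups files by a SHA-256 hash of their contents. It is not how dupeGuru itself works internally; the function names are assumptions, and the sketch only reports duplicates without deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content; return groups with 2+ files."""
    by_digest = defaultdict(list)
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a list of paths with identical contents; review the groups by hand before removing anything.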

SHARING:

[Figure: Columbia Blog vs. Columbia Wiki]

Notes:

• Document everything (compose in Word, date it, save & share as PDF)
• Store everything public on the web (use robots.txt to block search engine indexing)
• Use YouTube for audio and video storage
• Use Amazon Cloud Drive, Apple iCloud, Flickr, Google+, etc. for photo storage

Blogs:

• Use blogs for searching/tagging data and as a user-friendly database
• Columbia offers a free WordPress blog at: http://blogs.cuit.columbia.edu

Wikis:

• Use Wikis for collaborative documents and extended discussions
• Columbia Wiki (general): https://wikis.cuit.columbia.edu/
• Columbia Wikispaces (for courses): https://www.wikispaces.columbia.edu/home
• Columbia Wikischolars (for collaborative research): http://www.wikischolars.columbia.edu/

Why?

Complete file access across all browser platforms (including phones and tablets) at all times. If you’re nervous about content or images getting indexed by search engines, you can tell the search engine spiders not to index folders or content on your server through a robots.txt file.
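Example:

A robots.txt file is a short plain-text file at the top level of a web site. The Python sketch below holds a two-line example and uses the standard library's urllib.robotparser to show which URLs those rules would block. The /private/ folder and the example.org address are hypothetical, as is the function name.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the /private/ folder is a hypothetical name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, user_agent="*"):
    """True if the rules in `robots_txt` let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, is_crawlable("https://example.org/private/data.pdf") is False and is_crawlable("https://example.org/public/index.html") is True. Keep in mind that robots.txt is a request to well-behaved crawlers, not a password: anyone with the link can still open the file.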

BACKUPS:

[Figure: Time Machine vs. Windows Backup]

Notes:

• Assume the worst will happen
• Test your backups regularly (hard drives fail)
• Identify your irreplaceable data
• Consider the backup rule of three (3 copies in 2 formats with 1 located offsite)

Software:

• Automate Mac Backup with Time Machine (Apple -> System Preferences -> Time Machine). It doesn’t have to be an external hard drive: you can use another computer as a Time Machine backup with the purchase of OS X Server software from the App Store for $20. Time Machine requires 2-3 times the storage of the files being backed up.
• Automate Windows Backup with Backup and Restore (Start -> Control Panel -> Backup and Restore). Not as convenient as Time Machine.

Why?

Not only to have a second copy (or third, for irreplaceable data), but also to give you the flexibility of moving to a different computer if yours crashes. Backups are essential for disaster recovery.
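Example:

One simple way to test a backup that is a plain folder copy is to compare it, file by file, with the original. The Python sketch below does that with the standard library's filecmp module. It assumes the backup mirrors the source folder layout, so it does not apply to Time Machine's own backup format; the function name is made up for illustration.

```python
import filecmp
import os


def find_backup_problems(source, backup):
    """Compare `source` with `backup`; return (relative path, problem) pairs
    for files missing from the backup or whose contents differ."""
    problems = []
    for folder, subfolders, files in os.walk(source):
        subfolders.sort()  # visit sub-folders in a stable order
        for name in sorted(files):
            original = os.path.join(folder, name)
            relative = os.path.relpath(original, source)
            copy = os.path.join(backup, relative)
            if not os.path.isfile(copy):
                problems.append((relative, "missing"))
            elif not filecmp.cmp(original, copy, shallow=False):
                # shallow=False compares the actual file contents.
                problems.append((relative, "different"))
    return problems
```

An empty list means every file in the source was found in the backup with identical contents.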

COLD STORAGE:

[Figure: Amazon Glacier vs. Google Cloud Storage Nearline]

Notes:

• Use cold storage for projects that are no longer active
• Cold storage charges for storing, placing and retrieving data
• Backup is fairly robust and handled by the company

Options:

• Amazon Glacier: https://aws.amazon.com/glacier/
  Amazon Glacier is a secure, durable, and extremely low-cost storage service for data archiving and long-term backup. To keep costs low, Amazon Glacier is optimized for infrequently accessed data where a retrieval time of several hours is suitable.
  • 1 GB = $0.007/month
  • 1,000 requests = $0.05

• Microsoft Azure: https://azure.microsoft.com/en-us/
  Microsoft Azure is a growing collection of integrated cloud services for moving faster, achieving more, and saving money.
  • 1 GB = $0.02/month
  • 1 transaction = $0.0036 per unit/month

• Google Cloud Storage Nearline: https://cloud.google.com/storage/docs/nearline
  Google Cloud Storage Nearline is a low-cost, highly durable storage service for data archiving, online backup, and disaster recovery.
  • 1 GB = $0.01/month
  • Data retrieval incurs a cost of $0.01 per GB

Why?

Frees up space on computers and servers. Archives projects (at an ongoing cost).
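Example:

The per-GB prices above are easier to compare once they are multiplied out for a real project. The Python sketch below uses only the storage prices listed in this handout (2016 figures) and leaves out request, transaction, and retrieval fees; the function name is an assumption.

```python
from decimal import Decimal

# Storage prices per GB per month as listed above (2016 figures).
PRICE_PER_GB_MONTH = {
    "Amazon Glacier": Decimal("0.007"),
    "Microsoft Azure": Decimal("0.02"),
    "Google Cloud Storage Nearline": Decimal("0.01"),
}


def storage_cost(gigabytes, months=12):
    """Return {service: cost in dollars} for storing `gigabytes` for `months`.

    Request, transaction, and retrieval fees are not included.
    """
    return {
        service: price * gigabytes * months
        for service, price in PRICE_PER_GB_MONTH.items()
    }
```

For 500 GB kept for a year, storage_cost(500) works out to $42 for Amazon Glacier, $120 for Microsoft Azure, and $60 for Google Cloud Storage Nearline.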

COLUMBIA STORAGE:

[Figure: Columbia’s Academic Commons]

Notes:

• Monitor Columbia’s resources to see if you can use them

Options:

• Academic Commons: http://academiccommons.columbia.edu/
  Academic Commons is Columbia University’s digital repository where faculty, students, and staff of Columbia and its affiliate institutions can deposit the results of their scholarly work and research. Content in Academic Commons is freely available to the public.
  • There is no overall limit on the number of items you can deposit.
  • Files up to 100 MB in size.
  • If you have files over 100 MB in size, please contact cuac@libraries.cul.columbia.edu and we will arrange an alternative way to transfer them.

• Research Storage Service Pilot: http://rss.cuit.columbia.edu/
  Currently not accepting new users.
  • $1 per GB storage per year (100 GB min to 2 TB max: $100-2,000)
  • An eligible researcher can purchase personal storage space (2 TB) and a maximum of two group storage spaces (+4 TB).

Why?

Academic Commons is completely free. Research Storage Service Pilot might be a good testing ground (when it starts accepting new users).

NOTES: