Intermission: Code Review (mainly Audience Profiling)
Project Structure
• A basic README would be nice – structure not obvious
• One sbt multi-project build ?
• Ant builds for TeamCity are no longer needed
Tests
• @RunWith(classOf[JUnitRunner]) ?
• Test names should be more informative:
• "Apply" is not a good test name
• "Correct classification of urls with a tree" -> "URLs are classified correctly with a tree"
• Hopefully last session was useful
Naming Conventions
• extensions -> model ?
• object classifyNewUrls – use UpperCamelCase; an object name should be a noun, not a verb: NewUrlClassifier
• Convention for object with main method is Tool
Naming Conventions
• def apply(key: List[K]): Option[V] = map.get(key.reverse)
• def applyrev(reversekey: List[K]): Option[V] = map.get(reversekey)
• Semantics: apply -> T; get -> Option[T]
• length -> size
• apply is very different from applyrev
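The apply/get convention above can be sketched as follows. This is a minimal illustration, not the reviewed class: ReversedKeyMap, its entries field and getReversed are hypothetical names, and a plain Map is assumed as the backing store.

```scala
// Hypothetical stand-in for the reviewed tree; only the lookup API is sketched.
final case class ReversedKeyMap[K, V](entries: Map[List[K], V]) {
  // get: total lookup, absence is expressed in the return type.
  def get(key: List[K]): Option[V] = entries.get(key.reverse)

  // apply: partial lookup, throws NoSuchElementException for a missing key,
  // mirroring Map.apply in the standard library.
  def apply(key: List[K]): V = entries(key.reverse)

  // An explicit name instead of applyrev: the key is already in stored (reversed) order.
  def getReversed(reversedKey: List[K]): Option[V] = entries.get(reversedKey)

  // size rather than length, matching Map and Set.
  def size: Int = entries.size
}
```

With this split, a caller who sees apply knows a missing key is an error, and a caller who sees get knows an Option has to be handled.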
Public APIs
//use += when building a tree, this checks the existing tree for mother branches
def +=(key: List[K], properties: V): CumulativePrefixTree[K, V] = ...
//use + when reading a built tree, otherwise it will map properties incorrectly
def +(key: List[K], properties: V): CumulativePrefixTree[K, V] = ...
Mm!?
It should be impossible for me to destroy your data structure!
+= implies mutation, you do not mutate !!
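One way to make misuse impossible is to expose a single insertion method that returns a new instance and to hide the constructor. The sketch below is a simplified, hypothetical PrefixStore, not CumulativePrefixTree itself; the "mother branch" bookkeeping of the original is not modelled, and longestPrefix is only an assumed example query.

```scala
// Hypothetical simplification: the only way to add entries is `+`,
// which returns a new instance and leaves the receiver untouched.
final class PrefixStore[K, V] private (private val entries: Map[List[K], V]) {
  // Single insertion method; any bookkeeping needed to keep the structure
  // consistent would live here, so callers cannot skip it.
  def +(kv: (List[K], V)): PrefixStore[K, V] =
    new PrefixStore[K, V](entries + kv)

  def get(key: List[K]): Option[V] = entries.get(key)

  // Value stored under the longest prefix of the key, if any.
  def longestPrefix(key: List[K]): Option[V] =
    key.inits.map(entries.get).collectFirst { case Some(v) => v }

  def size: Int = entries.size
}

object PrefixStore {
  def empty[K, V]: PrefixStore[K, V] = new PrefixStore[K, V](Map.empty)
}
```

Because the constructor is private and there is no second "raw" insert, there is no comment for the caller to forget to read.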
CumulativePrefixTree
• Looks like a case class? Use case classes!
• Not obvious how it works
• Hallway usability testing ?
Designing Classes
Refactoring
[Figure: labels "Quality" and "Perfection"]
Embrace refactoring !
Readability
implicit def flatten2[A, B, C](t: (A, (B, C))): (A, B, C) = (t._1, t._2._1, t._2._2)
new CategoryList(List(l1.catscores, l2.catscores).flatten.groupBy(w => w._1).mapValues(x => (x.map(w => w._2).sum, x.map(w => w._3).sum)).toList.map(x => flatten2(x)))
Wow, so utils, very much
• object Utils is not the best way to go; add the methods to the class or to its companion object
• Putting utils in package objects also looks weird
Too many tuples ?
• type CatScore = (Int, Double, Int) // ._1._2._3 hell !!
def addLists(l1: List[(Int, Int)], l2: List[(Int, Int)]): List[(Int, Int)] = {
  List(l1, l2).flatten.groupBy(w => w._1).mapValues(_.map(_._2).sum).toList
}
• val categorylist: List[(Int, Int)] -> List[CategoryWeight] ?
Use types!
• type Weight = Double // even this is OK, though maybe excessive
• def function(category: Int, user: Int, banner: Int)
• Introduce classes, less confusion!
• Next session on types ?
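The tuple-to-class suggestion above might look like the following sketch. CategoryWeight and its field names are assumptions; the original only shows (Int, Int) pairs, so which Int is the category and which is the weight is guessed from the variable names.

```scala
// Hypothetical field names; the original only shows (Int, Int).
final case class CategoryWeight(category: Int, weight: Int)

object CategoryWeights {
  // Sums the weights per category across both lists.
  def addLists(l1: List[CategoryWeight], l2: List[CategoryWeight]): List[CategoryWeight] =
    (l1 ++ l2)
      .groupBy(_.category)
      .map { case (category, ws) => CategoryWeight(category, ws.map(_.weight).sum) }
      .toList
      .sortBy(_.category) // groupBy does not guarantee an order, so sort explicitly
}
```

The logic is the same as in addLists on the slide, but every ._1 and ._2 now has a name, and the compiler rejects a list of unrelated Int pairs.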
Catching throwable
def parse(line: String): Option[AdmantxUrl] = try {
  val c = line.split('\t')
  Some(new AdmantxUrl( c(0), c(1) ))
} catch {
  case e: Throwable => None
}
Catching throwable
• Prefer Try {} match Success/Failure
• Investigate scalding traps
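A minimal sketch of the Try-based alternative, assuming AdmantxUrl takes two strings as on the previous slide. The field names url and payload and the AdmantxUrlParser object are hypothetical.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical field names; the original constructor only shows two string arguments.
final case class AdmantxUrl(url: String, payload: String)

object AdmantxUrlParser {
  // Try catches only non-fatal exceptions, so errors such as
  // OutOfMemoryError still propagate instead of being swallowed.
  def parseTry(line: String): Try[AdmantxUrl] = Try {
    val c = line.split('\t')
    AdmantxUrl(c(0), c(1))
  }

  def parse(line: String): Option[AdmantxUrl] = parseTry(line) match {
    case Success(url) => Some(url)
    case Failure(_)   => None // e.g. ArrayIndexOutOfBoundsException for a line without a tab
  }
}
```

If the failure reason is never needed, parseTry(line).toOption is a shorter equivalent of the match.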
Vertica UDFs
• Implements Java interfaces, hard to write idiomatic Scala
.. but we can try!
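for (i <- 0 to argCols.size() - 1) {
  outputWriter.copyFromInput(i, inputReader, argCols.get(i))
}
// assumes argCols is a java.util.List; on Scala 2.13+ import scala.jdk.CollectionConverters._ instead
import scala.collection.JavaConverters._
argCols.asScala.zipWithIndex.foreach { case (arg, idx) =>
  outputWriter.copyFromInput(idx, inputReader, arg)
}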
(for loops are too mainstream?)
Pattern matching
if (result == None) {
resWriter.setStringNull()
} else {
resWriter.setString(result.getOrElse(""))
}
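The if/else above could be written as a match on the Option. In this sketch ResultWriter is a hypothetical stand-in for the Vertica writer that models only the two calls shown on the slide, and WriteResult.write is an assumed helper name.

```scala
// Hypothetical stand-in for the Vertica result writer; only the two
// calls used on the slide are modelled.
trait ResultWriter {
  def setString(s: String): Unit
  def setStringNull(): Unit
}

object WriteResult {
  // Each case handles exactly one shape of the Option, so no
  // getOrElse("") fallback is needed.
  def write(result: Option[String], resWriter: ResultWriter): Unit = result match {
    case Some(value) => resWriter.setString(value)
    case None        => resWriter.setStringNull()
  }
}
```

The match also makes the compiler check that both cases are covered.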
Types !!!!!111
case class Vars(
productId: Option[String], step: Option[String], categoryId: Option[String]
)
Why is device_type_id a VARCHAR(1) !?
val returnVarcharLength = 1
returnType.addVarchar(returnVarcharLength)
process-logs-rtb
• Generated code -> macros
• Records -> tuples, HLists ?
• Pull request for MultiSourceTap
• Testing for MultiSourceTap
• Catching throwables
Future Scala Sessions
• Saulius: types, macros
• Edgaras: algebird
• Dima: functional hipsterism ?
• Alex: scalaz (streams, etc.) ?
• Ed: Java/Scala collections
• Others: !?!
• Topics: Actors, Parallelism, Library overviews ? More diversification ?