A Hadoop Technical Discussion - Presented by Rapleaf
October 21, 2008, 6:45PM
In the next Rapleaf technical discussion, we'll be focusing on distributed computing and Hadoop. Members of the Hadoop community, including Rapleaf's Bryan Duxbury, Stefan Groschupf from 101tec, and Arun C Murthy of Yahoo! and the Apache Hadoop PMC, will present and discuss various Hadoop-related topics.
Food and drinks will be served.
THIS EVENT IS EXCLUSIVELY FOR THE TECHNICAL/ENGINEERING COMMUNITY.
1. The Collector - A Tool to Have Multi-Writer Appends into HDFS
by Bryan Duxbury, Software Engineer at Rapleaf
Bryan will cover The Collector, a tool built by Rapleaf that facilitates multi-writer appends into the Hadoop Distributed File System (HDFS). This talk will explain why multi-writer appends are an important workflow component, and will cover the performance characteristics and some gotchas of implementing such a system.
2. Katta - Distributed Lucene Index in Production
by Stefan Groschupf, Founder/CTO at 101tec Inc. and Co-Founder at Scale Unlimited Inc.
Stefan will describe an in-production system that processes millions of events, producing trend alerts and reports, using Hadoop and Katta, a distributed indexing system.
3. Debugging and Tuning Map-Reduce Applications
by Arun C Murthy, Principal Engineer at Yahoo! and Member of Apache Hadoop PMC
Arun will cover simple home-made remedies for peering into the Hadoop Map-Reduce framework as it crunches your data. This wide-ranging discussion will cover topics such as using debuggers/profilers on your applications, using Map-Reduce Counters, other simple ways to tune your applications, and how to avoid common pitfalls.
6:45-7:00PM | Food, drinks, and seating
7:00-7:15PM | Presentation #1
7:15-7:20PM | Break
7:20-7:50PM | Presentation #2
7:50-8:00PM | Break
8:00-8:30PM | Presentation #3
8:30-9:00PM | Networking