Events/Portland2009/Sessions
From OpenSQLCamp
The sessions will be chosen in an un-conference style on the day of the event, but session topics should be posted here to give other people an idea of what to expect. If you would like to lead a session, please describe it here. This may be a more formal presentation with slides, or possibly just leading a group discussion on a hot topic. Also feel free to toss ideas out to the mailing list to get feedback in case you are not sure. We want to encourage people from all levels to lead sessions, share your thoughts!
Virtual Pivot Tables in SQLite - Brian Duggan
Pivot tables are a useful representation of data when the columns of a table cannot be specified a priori. This talk will be about making use of two of SQLite's distinctive features -- virtual tables and duck typing -- to query and maintain pivot tables.
BLOB handling and storage using MySQL/Drizzle and the PBMS engine - Barry Leslie
Traditionally MySQL has not been very good at handling BLOB data. This has lead to the often-repeated phrase “BLOBs do not belong in databases” but what is really meant is that “BLOBs do not belong in MySQL databases”. The good news is that since the creation of the PBMS storage engine this phrase can be retired.
I will describe the inner workings of the PBMS engine (in as much detail as people want) and how it works with other engines to efficiently handle and store BLOB data. I would also like to discuss with people the future direction of this engine and what people need or would like to see in it.
Checking MySQL's replication sync with Ma'atkit tools - Sheeri K. Cabral
The theory has been around since an article by Baron Schwartz in October 2008. Sheeri K. Cabral tackled this daunting task for clients of the Pythian Group and found a few surprises -- from options that don't work as they are supposed to, to learning that a lot more data was in sync than originally guessed. Sheeri will lead folks through the theory of setting up continual replication sync, and the actual process, including the important aspects to be aware of when doing this on your own systems.
Speak HTTP to your database: discoveries from mod_ndb - J.D. Duncan
mod_ndb, an integration of MySQL Cluster's NDB API with the Apache web server, is now three years old. Its history is a story of good (and not-so-good) ideas for using pure HTTP, REST, and JSON in a database API. In this 45-minute session J.D. Duncan will talk about the problems of using a database over HTTP, like how to map URLs to database objects; the advantages, like being able to use HTTP proxy caches; and the unsolved problems, like the flawed matchup between the verbs in HTTP and the verbs in SQL. He will give a demo of mod_ndb, and introduce the future directions for the project, including the new SQL parser, a JDBC-like API, and reuse of Apache's plugin structure.
Implementation of the new SQL parser in mod_ndb - Tulay Muezzinoglu
mod_ndb ( http://mod-ndb.googlecode.com/ ) runs inside an Apache web server, providing a platform to manage data in MySQL Cluster using pure HTTP, REST, JSON, and XML. Tulay Muezzinoglu created the SQL parser as a Google Summer of Code project, and will discuss its design and implementation in a lightning talk.
Developing a Universal Storage Engine API - Bradley C. Kuszmaul
I'm interested in developing an improved storage engine API. In particular, I'd like to see an API with
- A better specification the current plugin API.
- A test suite.
- Suitable for a wide variety of storage engines.
- A reference implementation.
- Suitable for use with other open source databases. (That might mean that the API should be specified in C instead of C++.)
Perhaps the API could be layered in a stack (like the ISO networking stack). For example, one layer might be a key-value store like Berkeley DB. Another layer might be a relational store like InnoDB. (One difference is that InnoDB knows that the names and types of the columns are.) At another layer there might be mechanisms such as query pushdown. Perhaps we should delay the higher levels on the stack.
The reference implementation might include a key-value store using Berkeley DB, a relational store using InnoDB, and a connector from the relational layer to the key-value layer
I'm willing to put some work into this.
I'll be prepared to share a draft of a concrete proposal. I'd like a session to discuss it, hear feedback, and to brainstorm about how best to proceed.
xtrabackup - tricks beyond backup , Vadim Tkachenko
This talk will be about XtraBackup - an Open Source hot backup utility for the InnoDB storage engine for MySQL.
I am going to be showing you what features are in xtrabackup (besides just a "full backup"), and how it can make your life easier. This includes incremental/differential backups, setting up replication, copying tables between servers and InnoDB table statistics.
What MySQL is bad at - Morgan Tocker
This session is not about bashing MySQL, it's about playing to your strengths and minimizing your weaknesses. At the present time there are a number of problems that are best not solved by MySQL. To list a few; session storage, fulltext search, GIS/spacial functions and message queuing systems.
I'll be explaining why MySQL is bad at these problems, if there are any answers on the development road map, and what other tools you are better off using instead, and how to best integrate those tools with your MySQL application. Unconference style discussion, disagreements, and plug-your-open-source-product pitches are welcome!
memcached and the database - Patrick Galbraith
One of the more common approaches to scaling sites is to add a caching layer, typically with memcached. Frequently, after that is added, developers go a step or two further, using the cache as a shared, non-persistent store for appropriate application data and going further with integration of updates and invalidation from the persistent store. This latter use case can be exploited with MySQL/Drizzle UDF, pgmemcache or JavaDB UDF. One can also use moxi, the memcached proxy, to help insulate applications from the management and some failure modes in dealing with servers which speak memcached protocol.
Patrick will cover some of these use cases and talk about the latest happenings with memcached and moxi.
Comparing non-relational databases: CouchDB, Tokyo Tyrant, MongoDB
Part presentation, part group discussion. The presentation will offer an overview and comparison of open source non-relational database options, and how they stack up against popular relational databases. The talk will discuss each non-relational database, describe recommended use cases, pro's and con's, and demonstrate sample code to give you a taste of their approach. Although slides will be shown, audience participation will be strongly encouraged to better discuss these varied tools and their practical application.
A completely versioned schema - Ben Hengst
What has learned while building a schema where every bit of data knows when and where it came from. I'll explain the reasons why I started this project, limitations that we needed to build with in, and what hurdles have been found and how we solved them. My project was done in MySQL but the concepts presented are likely applicable to any datastore that allows relationships.
An Introduction to MongoDB - Mike Dirolf
MongoDB is an open-source, high-performance, schema-free, document-oriented database. I'll talk a little bit about what all of those buzzwords mean and why they combine to make MongoDB a great solution for many problems. The presentation will focus on introducing MongoDB and explaining motivations for it's use. There will also be example use cases and some sample code.
There will be slides and a presentation, but discussion and interaction with the audience will be highly encouraged throughout the talk.
Maatkit's Swiss Army Knife: mk-query-digest - Baron Schwartz
mk-query-digest was born as a log parser and aggregator, but many features have since been added. It can now gather and analyze queries in many different ways, and it's possible to do almost anything you want with them. For example, you can sniff TCP traffic and ask it to inspect the mysql protocol (in much more complex ways than just grabbing the text of queries, which has been done many times before). And recently we added memcached support to it, making it the world's first tool for analyzing and profiling memcached usage and performance. This session will move from basic to advanced usage, showing you how to make the most of this powerful tool.
Solving problems by writing Drizzle Plugins - Monty Taylor
The Drizzle Plugin Interface is getting better and better all the time. I'll do a walkthrough of making a plugin or two. I'll also come back in a bit and fill in more detail here on what I mean...
Operational/Sys-Admin Aspects of Databases in the Cloud - Joe Williams
A session on the operational and sysadmin aspects of the cloud, automation, configuration management, scaling, big data, nosql and sql. While it's not specific to databases we still need to manage, scale and tune the systems our databases live on. This will be a discussion on past experiences and cool stuff going on in the space.
- automation
- use a configuration management system, like Puppet or Chef
- Cloudant extended Chef (using the Chef merb slice) to manage EC2, EBS, Slicehost
- performance and tuning
- Cloudant compiles Erlang to native. Non-portable, but doesnt matter in EC2
- it would be nice if there was a way to predict, smooth, autoscale
- that is what RightScale does
- Use sysctl to expand filehandle space, tcpip settings
- pain points
- EC2 network is not all that great
- EC2 nodes can have radically difference performance
- EBS is slow as heck
- Hit a bug where an EBS was detached in the API, but still in /dev
- scaling
- anyone using RDS? no, not yet
- if a node fails, throw it away, dont try to save it
An Introduction to Cassandra -- Eric Evans
Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. It is also one of the many alternative databases carrying the NoSQL moniker, and NoSQL is the new black.
This talk will provide a brief introduction to Apache Cassandra.
Database Structures That Make Us Crazy -- Panel Discussion
Comic relief.
Databases and Flash -- Peter Zaitsev
How Database systems should be optimized to get the most benefits of Flash/SSD drives.
Goal Driven Performance Optimization -- Peter Zaitsev
A session on approach of focused performance Optimization. Way to often there are many things which look "wrong" in the system - low hit ratios or table structure which does not look perfect. However how do you find what changes can provide you the most immediate improvement ? The answer is simple you focus on your Perfromance goals and analyze what do you need to do to get there. The session is about how to set the goals and make sure you reach them.
XtraDB and performance -- Yasufumi Kinoshita
This session is about XtraDB. I will talk about its specific features from InnoDB. And explain both merits and demerits of each options for that you can choose proper options and settings for your best performance of XtraDB.
BlackRay Data Engine -- Felix Schupp
A brief session about the BlackRay in-memory data engine. BlackRay is a relational database system, with relational and SQL support. Connectivity is made possible via the PostgreSQL protocol as well as an object oriented API. The session will focus on the engine design and the near future roadmap.
Spatial Operators and Datatypes in PostgreSQL: a Tutorial -- Webb Sprague
A tutorial on how to use PostGIS to store, manipulate, and query geographic information in a database. We will cover three scenarios -- calculating polygon centroids, calculating distances between points, and joining tables using geographic criteria. We will be working at the psql command line, with a few pictures to motivate the SQL. The basic datasets involved will be zip code boundaries and county boundaries for Oregon. The code (somewhat rough) is here: http://github.com/wsprague/demoglib/blob/master/demography/postgis/tutorial.sql
Drizzle Client Community Requirements Discussion -- PSU Capstone Team C
A discussion and brainstorming session centered around the Drizzle/MySQL/MariaDB client under development by a team of 6 developers at Portland State University. The base set of requirements we've gathered will be presented to the session. We will then solicit your opinions, ideas, and imagination to determine what features will make a killer open source database client.
State of MariaDB -- Monty Widenius
For MariaDB 5.1 we have pulled in many of the patches and storage engines that have been floating around the MySQL community for years. In future releases we will continue with new features that are in the pipeline and requests that we are seeing from the community.
Firewater : Scaling out LucidDB, a purpose built BI Database (Column Store) -- Nicholas Goodman
At the core of LucidDB is column store technology and other technical optimizations for doing BI and analytics. Prior to Firewater, LucidDB was constrained to a single server; while it's fast and works well on smaller ( < 500GB datasets ) it had not had a scale out solution allowing it to take advantage of additional CPUs/Compute resources. We'll do a technical sojourn through Firewater, including a live demo of Firewater running on a multi CPU EC2 instance if connectivity allows. Bring your "nosql" flames as well and we'll tussle. j/k we'll discuss what the 100% Open Source Firewater / LucidDB are good for.
The slides from this talk are available at [Firewater OpenSQLcamp PPT] This is a direct link to a powerpoint file. You have been warned.
Computation Engines : Using MySQL storage engine API to perform graph calculations for OQGraph
OQGraph is a fascinating twist on the use of the MySQL storage engine interface. Data may be inserted into tables and then queries can then be performed where the rows returned are a result of the user's desired calculation.
