SummerOfCode2007Ideas

Google Summer of Code 2007

MySQL intends to participate in Google Summer of Code 2007, our first.

If you're applying, please look at the Application Template. If your project is accepted, we will assign an appropriate mentor to it. If you have any questions, please don't hesitate to contact the Project Administrator, Colin Charles (colin@mysql.com), or the community team at community@mysql.com.

We also have a cool logo to go with it - look to your left, that's Sakila hacking by the beach!

Here are some projects we think are relevant and that you'll find interesting to work on. If you'd like something more free-form, we also have worklog items and priority requested bugs for you to look at!

Ideas

Upon successful completion of an idea/task, these features will end up in the next alpha community tree!

Test Suite Development

mysql-test-run.pl and mysqltest (which it uses) make it impossible to write certain types of tests, and they largely rely on static tests (rather than dynamically generated but verifiable data, tables, etc.).

A good task would be to start development of a set of tools similar to the HUGO classes used in testing MySQL Cluster, but generic to the MySQL Server.

Some people think that "Part of the challenge is that the extended tools should be backward compatible." I do not. Ignore any compatibility and just make something that totally rocks (and finds bugs).

Mentor: StewartSmith

Test case development

MySQL has an excellent test suite, but it could be extended further.

Code coverage improvement

MySQL regression test suite code coverage is high, but it could be better. Improving the coverage means creating tests that exercise parts of the code not currently reached by the regular tests.

System Tests - Load Tests and Long-Running Tests

Countless businesses and projects use MySQL as a back end. The test suite does not currently cover any real-world application deployment scenarios.

It would be nice to have a set of system tests and a framework for deploying and running such tests in both load test and long-running test configurations. We imagine that these system tests would be applications such as popular forum or blog software, or ETL and BI tools using a data warehouse from an existing business. The goal is to test MySQL and community software (server and connectors) against a realistic workload, with data and traffic as close as possible to the real world, and to be able to repeat such a test when new versions are released.

Test creation tools

Another possible improvement of the testing tools involves test creation. Currently, to test some feature, you need to write a test script using SQL intermingled with testing directives. This is not user friendly. It would be desirable to use your normal SQL tools (like SQL browser, or the CLI), and when you are satisfied with the results, you could ask your tool to generate the test case from the outcome of your session. The creation tool could be either a plug-in for existing SQL management applications, or a separate tool, which could parse the regular output of the application and produce a test suitable for the test suite framework.
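As a rough illustration of the "generate the test case from your session" idea, the sketch below turns a recorded session into the text of a test file and an expected-result file. The record layout and the function name session_to_test are assumptions made for this example, and the output is only loosely modelled on the statement-plus-tab-separated-rows style used by the existing test suite; it is not a description of any existing tool.

```python
from typing import List, Sequence, Tuple

# One recorded step: the SQL text, the column names, and the returned rows.
# This record layout is an assumption made for the sketch.
Step = Tuple[str, Sequence[str], Sequence[Sequence[object]]]


def session_to_test(steps: List[Step]) -> Tuple[str, str]:
    """Return (test_text, result_text) for a recorded interactive session."""
    test_lines = []
    result_lines = []
    for sql, columns, rows in steps:
        # Normalise every statement so it ends with exactly one semicolon.
        statement = sql.strip().rstrip(";") + ";"
        test_lines.append(statement)
        # The expected result echoes the statement, then any result set.
        result_lines.append(statement)
        if columns:
            result_lines.append("\t".join(columns))
            for row in rows:
                result_lines.append(
                    "\t".join("NULL" if v is None else str(v) for v in row)
                )
    return "\n".join(test_lines) + "\n", "\n".join(result_lines) + "\n"
```

A plug-in for an SQL browser or a wrapper around the CLI would only need to collect the steps; the conversion itself stays independent of the front end.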

Mentor: Giuseppe Maxia

Benchmarking the MySQL Server

Ever wondered how fast the Optimizer really was, when EXPLAIN didn't cut it?

Instance Manager fixes (DEPRECATED)

Make the Instance Manager deterministic, test it, and prepare it for integration with NDB. Refer to the documentation on mysqlmanager or even a blog entry titled "enabling and using the MySQL Instance Manager".

Integrate MySQL Cluster with Instance Manager (DEPRECATED)

Make the Instance Manager able to connect to a Cluster Management Server and find out which processes it needs to run on this node. Refer to the documentation on mysqlmanager or even a blog entry titled "enabling and using the MySQL Instance Manager".

INFORMATION_SCHEMA tables for MySQL Cluster status

Implement INFORMATION_SCHEMA table plugins to do things like show the status of the cluster.

MySQL Based Atom Store

The goal is to implement Atom as the wire format for accessing a MySQL database. This way, any program that handles Atom requests and responses, say a browser, becomes a MySQL client. This should allow for easier data integration and stream manipulation. The plan is to proceed in two steps:

  1. Implement the GData API on top of the MySQL client C API.
  2. Implement the GData API directly on the server side.
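To give a feel for what "Atom as the wire format" could mean for a single row, here is a minimal sketch that serialises one row as an Atom entry. The function name row_to_entry, the urn:example identifier scheme, and the col=value content layout are all assumptions made for illustration; only the Atom namespace and the id, title, updated and content elements come from the Atom format itself. It says nothing about how the GData API would actually be layered on the client or the server.

```python
import xml.etree.ElementTree as ET
from typing import Mapping

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialise Atom elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", ATOM_NS)


def _atom(tag: str) -> str:
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def row_to_entry(table: str, key: object, row: Mapping[str, object],
                 updated: str) -> str:
    """Serialise one table row as an Atom <entry> (hypothetical mapping)."""
    entry = ET.Element(_atom("entry"))
    # The identifier scheme below is made up for this sketch.
    ET.SubElement(entry, _atom("id")).text = f"urn:example:{table}:{key}"
    ET.SubElement(entry, _atom("title")).text = f"{table} row {key}"
    ET.SubElement(entry, _atom("updated")).text = updated
    content = ET.SubElement(entry, _atom("content"), {"type": "text"})
    content.text = "\n".join(f"{col}={val}" for col, val in row.items())
    return ET.tostring(entry, encoding="unicode")
```

A feed for a whole result set would wrap such entries in an Atom feed element; that part is left out to keep the sketch short.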

Replication and Backup Development

MySQL replication is a very powerful tool used to create mirrors of servers across a network. We have many features we are working on and many more things we would like to improve. If you’re interested in working with near-real-time code, take a look at this list of projects.

Information Schema

The MySQL Information Schema needs to include information about replication on the master and slaves. This project includes making information schemas for SHOW SLAVE STATUS, SHOW BINLOG EVENTS, SHOW MASTER STATUS, SHOW SLAVE HOSTS, and other replication related SHOW commands.

Replication Test Suite

There is a need for a test system that can set up any form of topology for replication testing. The project encompasses designing a test suite for replication topologies. It is preferred that the code be written as a Perl module, but most scripting languages are acceptable.
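One possible starting point for "any form of topology" is to describe a topology as a list of (master, slave) links between numbered servers and let the test system derive its setup from that list. The sketch below is in Python purely for illustration (the project text prefers Perl), and the function names chain, star, ring and slaves_by_master are hypothetical.

```python
from typing import Dict, List, Tuple

# A link (m, s) means: server s replicates from server m.
Link = Tuple[int, int]


def _check(n: int) -> None:
    if n < 2:
        raise ValueError("a replication topology needs at least two servers")


def chain(n: int) -> List[Link]:
    """1 -> 2 -> ... -> n."""
    _check(n)
    return [(i, i + 1) for i in range(1, n)]


def star(n: int) -> List[Link]:
    """Server 1 is the master of every other server."""
    _check(n)
    return [(1, i) for i in range(2, n + 1)]


def ring(n: int) -> List[Link]:
    """Circular replication: 1 -> 2 -> ... -> n -> 1."""
    _check(n)
    return [(i, i % n + 1) for i in range(1, n + 1)]


def slaves_by_master(links: List[Link]) -> Dict[int, List[int]]:
    """Group the links so each master maps to the slaves that follow it."""
    grouped: Dict[int, List[int]] = {}
    for master, slave in links:
        grouped.setdefault(master, []).append(slave)
    return grouped
```

Because every topology reduces to the same list of links, the code that starts servers and points each slave at its master would not need to know which shape it is building.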

SQL-level Backup

The mysqldump program allows users to create SQL-level copies of their data. This program can be improved by a redesign and/or a rewrite to improve efficiency and expand its options.

Monitoring Tools

There is a need for a replication monitoring application that can plug into the MySQL Enterprise tools for monitoring replication (a graphical option would be nice).

Mentor: Chuck Bell.

MySQL Auditing Software (DONE)

Building on the concept of the "poor man's query profiler" at http://forge.mysql.com/snippets/view.php?id=15 , this project entails developing a process to listen to network packets to find MySQL packets destined for a server, and keeping a copy of the packet to use in the auditing of what commands are sent to that MySQL server. This project can accommodate a light or heavy workload, and one or more students.

The separate aspects of the software are:

Network process: process to intercept network packets on a separate machine (in the same network range) for full access to information being sent to the database without causing any load on the MySQL server or interference with the queries. This process should also parse the queries and be able to store them in its own database.

Graphs and reports: This module will show statistics and graphs. This is obviously a very open-ended part of the project, great for students who are worried about time constraints on their summer.

Administrative interface: This will take user input and write to the configuration file that the "network process", described above, will read to determine which queries to keep and which to not worry about. It will also configure the "graphs and reports" desired by the user. This may actually be 2 separate interfaces, and therefore can become 2 separate tasks for students.
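The "which queries to keep and which to not worry about" decision could be as simple as classifying each captured statement by its leading keyword and comparing that against the configured categories. The sketch below assumes the statement text has already been extracted from the network packets (packet capture and protocol decoding are not shown), and the names classify and filter_for_audit are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Leading keywords used to bucket statements. Treating SELECT as DML is a
# common convention but still an assumption of this sketch.
DML = {"SELECT", "INSERT", "UPDATE", "DELETE", "REPLACE"}
DDL = {"CREATE", "ALTER", "DROP", "TRUNCATE", "RENAME"}

_FIRST_WORD = re.compile(r"\s*([A-Za-z]+)")


def classify(statement: str) -> str:
    """Return 'DML', 'DDL' or 'OTHER' based on the first keyword."""
    match = _FIRST_WORD.match(statement)
    if match is None:
        return "OTHER"
    keyword = match.group(1).upper()
    if keyword in DML:
        return "DML"
    if keyword in DDL:
        return "DDL"
    return "OTHER"


def filter_for_audit(statements: Iterable[str], keep: Set[str]) -> List[str]:
    """Keep only statements whose category is listed in the configuration."""
    return [s for s in statements if classify(s) in keep]
```

In this arrangement the administrative interface would only have to write the set of categories to keep, and the network process would read it before filtering.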

Students should have some familiarity with the different types of DML and DDL. A semester-long course using SQL will be sufficient, and a course that used SQL in only one part may be sufficient too; apply, because you never know! Students should also be familiar with writing programs in at least one of Perl, PHP, or shell scripting. Preference is given to students who know more than one and can make reasonable decisions about which functions are best written in which language.

Mentor: Sheeri Kritzer

The "anti-profiler" - a performance analysis tool for database engines (DONE)

As a database engine scales up, the performance characteristics change. The potential for bottlenecks increases and where they occur will differ depending on amount of memory available, disk speed and number of CPUs.

This task involves finding a way to analyze performance and locate bottlenecks. My idea, subject to revision, is to stub out all potential conflict/wait points: semaphores, memory allocation, disk access, and then to record access frequency, wait times and conflicts that occur. The result must then be presented in such a way that hot-spots can be identified.

A profiler tells you how much time is spent executing your code. This tool is a kind of "anti-profiler". It gives me the information that a profiler does not give me, namely: how much time the program spends NOT executing my code!

The task will have two parts. The first part involves creating components that are compiled into the program. They must be capable of gathering and dumping the statistics. The second involves creating a program that analyses and presents the results.
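The two parts can be illustrated with a small sketch: a wrapper that records how often a lock is taken, how often the caller had to wait, and for how long, plus a report that sorts the wrapped locks by total wait time. This is Python for illustration only; the real components would be compiled into the engine, threading.Lock merely stands in for the engine's semaphores, and the names InstrumentedLock and hot_spots are hypothetical.

```python
import threading
import time
from typing import Iterable, List, Tuple


class InstrumentedLock:
    """A lock that counts acquisitions, conflicts and time spent waiting."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.acquisitions = 0
        self.conflicts = 0
        self.total_wait = 0.0
        self._lock = threading.Lock()

    def __enter__(self) -> "InstrumentedLock":
        start = time.perf_counter()
        # Try without blocking first so that a conflict can be detected.
        contended = not self._lock.acquire(blocking=False)
        if contended:
            self._lock.acquire()
        waited = time.perf_counter() - start
        # The counters are updated while holding the lock, so only one
        # thread at a time ever modifies them.
        self.acquisitions += 1
        if contended:
            self.conflicts += 1
            self.total_wait += waited
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._lock.release()
        return False


def hot_spots(locks: Iterable[InstrumentedLock]) -> List[Tuple[str, int, int, float]]:
    """Second part: rank wait points, worst total wait time first."""
    rows = [(l.name, l.acquisitions, l.conflicts, l.total_wait) for l in locks]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

The same pattern of wrapping a wait point and accumulating counters could be applied to memory allocation and disk access, which the task description also lists as potential conflict points.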

My own interest is to have the anti-profiler built into the PBXT storage engine. The programmer of this task will work on the basis of the PBXT source code. However, I see no reason why the tool cannot be used to help with the performance analysis of MySQL engines and database engines in general.

Mentor: Paul McCullagh

Further Ideas for students to get inspired by

Simple, P4 bugs/features involving options/flags

At last query, the MySQL bugs system had over 1,556 bugs filed in a Priority 4 state, which means they're Feature Requests (Severity 4). You can see such a list at the Feature Requests (Severity 4) page.

There's plenty of opportunity to make your mark. We suggest implementing isolated changes to the Server, related utilities, or clients.

MySQL may also group some of these tasks logically.

IPv6 support for MySQL (DONE)

IPv4 is the way of the luddite, and IPv6 is being adopted more and more widely. It would be wise for MySQL to support IPv6 as well; #8836: IPv6 support is the bug that tracks this. The task also requires writing test suites and ensuring a high level of QA for the new feature.

IPv6 datatype (DONE)

This feature will enable the IPv6 datatype. Refer to #15127: Possibility to store IPv6 addresses and #3318: Add support for IPv6 addresses.
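For background on what storing an IPv6 address involves: the textual form varies in length and can be abbreviated, while the address itself is always 128 bits. The sketch below uses Python's standard ipaddress module to convert between the text form and a fixed 16-byte value. It only illustrates the representation problem; it is an assumption of this example, not a description of how the datatype in the referenced bug reports is or should be implemented, and the function names are hypothetical.

```python
import ipaddress


def ipv6_to_bytes(text: str) -> bytes:
    """Parse an IPv6 address in text form into its 16-byte binary form."""
    return ipaddress.IPv6Address(text).packed


def bytes_to_ipv6(raw: bytes) -> str:
    """Convert 16 raw bytes back to the compressed text form."""
    return str(ipaddress.IPv6Address(raw))
```

Different spellings of the same address map to the same 16 bytes, which is what makes a binary representation convenient for comparison and indexing.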

"point" column makes MyISAM row_format dynamic

#23568: "point" column makes MyISAM row_format dynamic

Indexes for ARCHIVE Storage Engine

#24663: Possiblity of indexes for ARCHIVE engine

Memory tables with dynamic rows format

#25007: memory tables with dynamic rows format is the bug. Please also refer to a posting on the internals list that will allow a student to profit from work already done: BLOB implementation for heap engine

Controlling data file fragmentation in MyISAM

#17077: MyISAM: controlling data files fragmentation

Tool for converting offline from old ISAM format

#23550: tool for converting offline from old ISAM format

Simple features from Worklog

Worklog is the way MySQL plans for new features in releases. Refer to Worklog, and look for simple tasks that are in the Un-Assigned status. If these tasks require no further explanation, start implementing them!

More importantly, MySQL may also group these tasks logically.

Error Message Prefix

Add a prefix to error messages, and document it. #WL3417: Error Messages Prefix

Resources

Miscellaneous

About the Summer of Code Logo

Retrieved from "http://forge.mysql.com/wiki/SummerOfCode2007Ideas"

This page has been accessed 4,677 times. This page was last modified 09:58, 18 March 2008.
