My.sys library and algorithms
← Back to MySQL University main pagePresenter: SergeiGolubchik Time: Thursday 1. March 2007, at 6am PST = 9am EST = 15 CET = 16 EET Estimated length of Session: 1-1.5 hours Moderator: Patrik Backman Scribe: Paul DuBois Attendees: Rafal Somla Jörg Brühe Alexander Nozdrin Andrey Hristov Marc Alff Ingo Strüwing Hartmut Holzgraefe Giuseppe Maxia Jeffrey Pugh Dmitri Lenev Axel Schwenke Kristofer Pettersson Oleksandr Byelkin Chuck Bell Andrei Elkin Georgi Kodinov Ann Harrison Chris Powers ChadMiller Mats Kindahl JayPipes Paul DuBois Omer BarNir User:TimSmith Timour Katchaounov IgnacioGalarza Martin Hansson User:JohnDavidDuncan Susann Wolfgram Kristian Nielsen Kaj Arnö User:DanielFischer User:GuilhemBichot User:AlexeyKopytov
Contents |
[edit] mysys library and algorithms
[edit] What is libmysys
- http://dev.mysql.com/doc/internals/en/mysys-functions.html
- Portability layer and algorithm library
- Contains:
- Wrappers for many "standard" functions
- Utility functions
- Implementations of useful algorithms
- I'll also cover libmystrings
[edit] Why mysys wrappers
- Portability
- Speed (sometimes, in libmystrings)
- Usability, additional features:
- MyFlags:
- error reporting
MY_WME,MY_FAE, ... -
MY_ZEROFILL,MY_SYNC_DIR, ...
- error reporting
-
fillerargument formy_chize() - DBUG
- MyFlags:
[edit] What mysys wrappers
-
my_access,my_chsize,my_malloc,my_mmap,my_pwrite,my_rw_rdlock,sem_destroy, ... -
my_gethwaddr,my_getsystime,my_msync,my_large_malloc,my_dir,my_multi_malloc, ... - safe_mutex, safemalloc
[edit] Algorithm and utility library
- Filename conversion/normalization
-
fn_format,fn_ext,to_unix_path,unpack_dirname,convert_dirname...
-
- my_tmpdir, my_bitmap, my_bit, dynarray, dynstring, typelib, my_getopt, io_cache, mem_root
- md5, sha1, aes, soundex
- Quick sort, radix sort, wildcard compare
- List, hash, tree, queue, trie
- Lock-free allocator, lock-free hash, wait-free dynarray
[edit] libmystrings
- Charsets
-
vsnprintf() - Precision math
- Lightweight xml parser
- Utility functions:
-
bcmp,bfill,bmove,longlong2str,my_strtoll10,strtod,strmake,strxnmov, ...
-
[edit] Some files don't belong to libmysys
- thr_lock.c (sql)
- keycache (myisam)
- thr_alarm
[edit] Most interesting algorithms from mysys
-
TREE,tree.c- Red-black balanced binary tree (with an optional upper memory boundary)
- Note the limited height!
-
QUEUE,queues.c- Priority queue
-
HASH,hash.c- Extensible linear hash
-
TRIE,trie.c- Trie and Aho-Corasick algorithm (locate a set of substrings in a string)
- Not used at the moment
-
LF_ALLOCATOR,lf_alloc-pin.c- Lock-free allocator for fixed-size objects
-
LF_HASH,lf_hash.c- Lock-free extensible hash
[edit] Memory management
-
MEM_ROOT- You cannot free a memory or delete an object that was allocated in a mem_root
- Even with
delete Object - There are objects that are allocated in memroots (
Sql_alloc,Field,Item,List<>) - Can be disguised as
sql_alloc(size), but there's nosql_free()! - The memory is freed when mem_root is freed as a whole
- It's faster if there are many small allocations that should be all freed at once
- For example: The fulltext parser, that parses a text into words, allocates memory for every word in mem_root, and when it's done, frees the mem_root, not individual words. One free of a 100k region is much faster than 10k frees of 10-byte regions.
- Also, it's faster to allocate a memory in mem_root - because mem_root is used in a one thread only, it's not shared like a system heap (where malloc takes the memory from), so there are no mutex locks when you allocate from a mem_root. There are mutex locks when you allocate with malloc.
-
sql_alloc()and objects that are allocated in mem_root (objects wherenewwas redefined to allocate in mem_root) use current_thd's mem_root. This mem_root is freed at the end of the query, you cannot store there anything that you want to use later than that. - Yes, this includes Items and Lists of any objects
- There's a workaround, create your own mem_root and change thd's mem_root to use yours. Allocate your objects and change thd's mem_root back.
- This is done for SQL warnings, and in PS/SP many times - so you should always check in what mem_root you object is allocated.
- Many (but not all) objects that are allocated in mem_root have also a placement
newthat allows to specify mem_root explicitly.
-
TREE-
free_element()callback
-
-
HASH-
free_element()callback
-
-
QUEUE- Manual, but beware
resize_queue()anddelete_queue()
- Manual, but beware
-
LIST- Manual
-
List<>- The data you manage manually
- But list items are allocated with
sql_alloc()!!!
[edit] String handling
-
char *- Try to avoid it
-
LEX_STRING- Use instead of
char * - Saves on
strlen()calls - Good for static strings
- Initialize constants as
LEX_STRING my_name={ STRING_WITH_LEN("Sergei") };
- Use instead of
-
DYNAMIC_STRING- A string that magically grows as more data are appended
- Used in C code, where
class Stringis not available
-
class String- Good for dynamic strings
- Small tricks that helps to avoid mallocs in many cases
- Don't need to call
strlen() - Knows a character set of a string
- Many utility methods (number->string conversion, strstr, replace, fill, copy, charset conversion)
[edit] Session notes
Session notes to be appended here after the Session
See "Questions posted for the Session," where answers are given for the questions asked before and during the presentation.
[edit] Questions posted for the Session
Area where attendees can post questions in advance to the Session
- (Rafal)
- General memory management policies: how to allocate memory, when to deallocate it (who is responsible for doing it).
- How to use lists, hashes and queues (with emphasis on memory allocation issues).
- How to store and manipulate strings (in the code one can see several different approaches to this problem).
- answered above
- If I use priority queue and insert things into it - is it using MEMROOT? Do I need to free memory explicitly when finished with the queue?
- No, priority queue isn't using mem_root. No,
delete_queuewill handle it.
- No, priority queue isn't using mem_root. No,
- What about memory alloc/dealloc for Strings?
- String is using malloc/realloc/free. Not mem_root
- If I initialize String from other String, or char *table or "string constant", is memory allocated/deallocated correctly?
- Yes. String does not call malloc if you initialize it from a constant. And String only calls free() if it called malloc
- Can I create a separate memroot, use it for allocation and then remove it?
- Sure. For example every TABLE structure has its own mem root for data that needs to be allocated when a TABLE itself is allocated and freed when a TABLE is freed - for example Fields
- About allocating object in memroots: if I define my own class X, can I write something like new (memroot) X()? What needs to be done so that it works?
- inherit from Sql_alloc class
- (Jörg) My main interest is in wrappers and the use of the mysys lib in portability:
- Does it hide differences between operating systems (Unix, Windows, Netware, Unix variants) from the using layers of the server code?
- Yes, that's the main reason to have wrappers.
- If so, which ones? Which ones must be handled by the higher layers?
- The goal is to keep all differences in the wrapper, but it's not always possible.
- Does it get all necessary info from "configure", or are there other means?
- Yes, from configure (e.g. HAVE_MMAP), or compiler/OS/arch defines (e.g. _MSC_VER, _WIN64, _PA_RISC1_1). In either case - no manual configuration is necessary.
- Are there any rules what is wrapped, and what isn't?
- Common sense. if reasons from "why mysys wrappers" slide apply, there's a wrapper.
- Is libmysys on the server side only, or also in the client lib?
- Client library, too.
- Second aspect: "Priority queues": For which purposes are they used?
- It's a data structure that supports, basically, two operations "push" and "pop". There's a comparison function defined on elements, so you can check whether element_A is greater than element_B, less than, or equal. Let's say that the "greater" element has higher "priority" - the word comes from the example of OS process scheduler, every process has an associated number, priority. So, you can "push" many elements into the queue. The main queue feature is that when you "pop" an element, you always take the one with the highest priority - that is, the element with the highest priority is always on top of the queue.
- For OS scheduler it's very convenient - it takes the top process, lets it run for a while (which decreases the process' priority) and puts it back in the queue. The process goes down, and the new process on the top is again the one with the highest priority. OS scheduler picks it up, lets it run for a while...
- An example from MySQL world - events. Every event has a time when it should be executed. Let's put them all into a priority queue. The event that we should execute next will be on top (because this is how we defined the comparison function). Sleep until the time arrives, take the event out of the queue, execute it, calculate next time when it should be executed, put it back into the queue. Again, take the event from the top, sleep until it time arrives... And so on.
- Does it hide differences between operating systems (Unix, Windows, Netware, Unix variants) from the using layers of the server code?
- (Giuseppe's questions)
- It's my understanding that my_options (from my_getopt.h) are always used as a static struct. Why so?
- Because a set of command-line options is usually known at compile time.
- Would it be possible or practical to use them inside a function?
- You probably mean "to generate them inside a function". Yes, it is both possible and practical. It is done in a patch for WL#2936.
- Are MySQL application using any standard STL classes and algorithms? If not, why?
- No, they are not. Historically STL implementations were not good enough, they had problems under high load in multi-threaded applications. That's what Monty is saying, at least. Now, presumably, STL got much better. On the other hand, MySQL is supposed to work on many architectures that have different STL implementations, sometimes quite old. So there are reasons for and against STL. Anyway, whatever the reasons are, there was no official decision to start using STL, and thus - don't use it.
- Is there any plan to use the PCRE regular expression library within MySQL products?
- No plans that I know of. On the other hand, we have a old problem of our regex library being unable to handle multi-byte character sets. We need a replacement, and PCRE is a very probable candidate.
- It's my understanding that my_options (from my_getopt.h) are always used as a static struct. Why so?
- (Joro's questions)
- Can we have "typical/recommended usage" examples for the data structures above (preferably from the MySQL code itself)?
- Just grep the source for the name of the structure. Most examples are okay.
- So there's only one memroot in use at any time?
- One "current" mem_root - that is used if you don't specify mem_root explicitly.
- How expensive is to create a MEMROOT?
- Not expensive. Only one malloc, and only if you specified "prealloc size."
- Can we have "typical/recommended usage" examples for the data structures above (preferably from the MySQL code itself)?
- JayPipes' questions
- What are the primary focal areas that external community contributors should be aware of in mysys?
- The primary focal area is to be aware of mysys existance :)
- For outside contributions, what is the tolerance we exhibit for patches to use non-mysys library functions or custom functions which mimic our mysys functionality?
- None. When a patch is getting pushed into the source tree it should not use non-mysys library functions if a mysys wrapper exists. And it should not have custom functions that mimic mysys functionality
- But for small/simple/otherwise good patches we can let our developer who will be committing the patch to fix it, we don't need to reject a patch solely on the reason of using
open()instead ofmy_open()
- Should we encourage patch submitters to rewrite code which "reinvents" something already in mysys?
- Absolutely.
- Does InnoDB or PBXT or Falcon re-implement some of these basic portability functions/layers? To what extent? If so, are efforts planned to align these?
- Yes. Cannot tell. No.
- What are the primary focal areas that external community contributors should be aware of in mysys?
- Ingo
- Is there a way to remember the aggregation of files to prefix groups (mf_, my_, thr_, my_thr_, none)? That is what do the prefixes mean?
- Not that I know of :)
- Suggestion: It would be great if someone had time to fix and complete the quoted internals document.
- Do mysys functions use DBUG?
- Many do.
- Ok. Thanks. I believe I saw link -ldbug -lmysys, not other order
- You're right, there's a circular dependency.
- Is there a way to remember the aggregation of files to prefix groups (mf_, my_, thr_, my_thr_, none)? That is what do the prefixes mean?
- JD
- re. mystrings: how do you recode a string from one character set to another inside mysqld?
-
bool String::copy(const char *s,uint32 arg_length, CHARSET_INFO *csfrom, CHARSET_INFO *csto, uint *errors);
-
- Maybe copy_and_convert() can be moved from sql_string.cc into strings library? It is C, not C++, and it would be useful in NDB API programs (which link with mystrings already).
- Agree. Good idea.
- re. mystrings: how do you recode a string from one character set to another inside mysqld?
- TheK
- If we gain speed for deallocating using a MEM_ROOT and Item_* uses memroot, why are all items freed individaully during clean up phase?
- They are not - you cannot free an object from mem_root. They are deleted individually -
delete some_item- to have desctructors called. But it does not free the memory, it is only freed when mem_root is freed as a whole.
- They are not - you cannot free an object from mem_root. They are deleted individually -
- If we gain speed for deallocating using a MEM_ROOT and Item_* uses memroot, why are all items freed individaully during clean up phase?
- Iggy
- While talking about alloc/dealloc for Strings, could you comment on the usefullness of SAFEMALLOC and how to take advantage of it?
- Safemalloc works absolutely automatically, if the code is compiled with -DSAFEMALLOC, all calls to malloc/free are tracked and safemalloc reports about memory leaks and buffer over/underruns. Because String uses malloc/free, its allocations will be tracked by safemalloc, too.
- While talking about alloc/dealloc for Strings, could you comment on the usefullness of SAFEMALLOC and how to take advantage of it?
- Timour
- Is it possible to use io_cache to implement data structures that automatically spill-over to disk if there is no memory?
- Yes, it's already done. If you create an IO_CACHE without a file descriptor, it'll be a "virtual file" in memory, you can write to it, read it, seek in it. If you overflow the buffer a temporary file will be automatically created as a backing store. For small "temporary caches" IO_CACHE will never touch the disk. This feature is used in many places in MySQL.
- So this is like a memory-mapped file (kind of)? If so, why not use memory mapped files? Because of caching?
- Yes, because of caching, and because we want to avoid disk writes at all, if possible.
- Is
multi_alloc()used to reduce the fragmentation of the memory root or for performance reasons?- Indeed, one can say that it helps to avoid fragmentation, but I never thought about it this way. As far as I understand (and Monty confirms) the main reason to use multi_malloc is to simplify the code - if you need to allocate ten buffers, you'd need to call malloc ten times, and check return value to be not NULL ten times. And later free ten times. With multi_malloc replace "ten" with "one" in the above.
- Have we considered separating the portability functionality from algorithms/data structures?
- Yes, there's a plan to split mysys this way.
- In what cases (if any) do we use the OS heap for memory instead of the memory roots?
- Oh, in many. malloc is used when mem_root cannot be. For example, you have a pool of objects, new objects are constantly allocated and added to the pool, old objects are removed from it. You cannot allocate these objects from mem_root, because you won't be able to free some (old) objects without freeing all of them. Also, you should not use mem root for occasional big allocations - for example, for blobs.
- Is allocation in memory roots faster than malloc?
- Yes. It does not go into the system heap, so it does not need any synchronization mechanisms - read, mutexes - to protect system heap control data. Assuming you don't share your mem_root between threads, of course - or you'd need your own mutex.
- Is it possible to use io_cache to implement data structures that automatically spill-over to disk if there is no memory?
- Jeffrey
- Which of the mysys areas do you think would benefit most from rewrite/refactoring?
- Removing obsolete code. mysys has some very old functions (20 years?) which are now full of garbage, completely dead code, and workarounds that became obsolete more than ten years ago.
- Logically organizing it - splitting in two, renaming files and functions (mf_/my_/none file and function prefixes, typelib).
- Which of the mysys areas do you think would benefit most from rewrite/refactoring?
- Alik
- What do you think about refactoring/redesigning mysys in a way to make it C++ classes library? And use (only) them for the new code.
- Not a good idea, how would we use such a library from C code?
- What about having C++-wrappers
- Well, this is another question. I think it's a reasonable thing to do. Create a libmysys++ library with C++ wrappers. Furthermore, other useful "utility and algorithms" in C++ can go there, instead of cluttering sql/ directory - I mean String, List, Bitmap, for example.
- What do you think about refactoring/redesigning mysys in a way to make it C++ classes library? And use (only) them for the new code.
