|
CSE-4411M
Database Management Systems
York University
Winter 2013
|
Project #1
Buffer Pool Manager
Frequently Asked Questions (FAQ)
|
|
|
|
Do
we implement anything other than the methods listed
in the assignment page?
Are we permitted to create more classes,
such as bufferHashTable class,
to support the bufMgr class?
Yes,
you will also be building supporting classes within the bufmgr
package, like
FrameDesc,
BufHTEntry,
BufHashTbl,
and
Replacer
(implement the LRU strategy).
It is up to you what you want to call these
and how to organize them.
Indeed,
it would be quite awkward to pull this off with only sub-classes
within bufMgr,
so introducing other classes when appropriate is fine.
The exceptions you are to handle
will also require you to have small auxiliary classes
(and files to hold them).
|
|
What
is flushAllPages supposed to do?
The
method flushAllPages is called on clean-up by Minibase
when things are shut down
to flush all dirty pages to disk.
|
|
|
|
Compiling, Make, The Code Base, etc.
|
|
If
I want to say import global *, import diskmgr.*
in my code,
do I need to download these packages somehow?
No.
You have effectively "downloaded" these
when you copied the directory over.
There is a JAR file at
- /cs/dept/course/2012-13/W/4411/bufmgr/lib/bufmgrAssign.jar
It contains all the components of Java Minibase
(the packages global and diskmgr)
that your part, bufmgr, needs to compile.
Add this onto your CLASSPATH,
or use the Makefile to do your compilation.
|
|
So
when I follow the instructions and do, for example,
- % cp -rp /cs/dept/course/2012-13/W/4411/bufmgr/src proj
what all am I copying over?
You
are copying a directory with three sub-directories:
- bufmgr/
- diskmgr/
- tests/
As you have copied it,
these are within a new directory of yours now called proj/.
So far, there is nothing in bufmgr/ but a Makefile.
This is the sub-directory where you will be putting the code you write
to implement the bufmgr.
This is the only sub-directory you need to change really.
The sub-directory diskmgr/ is just there for your convenience.
It is the source code for the disk space manager.
You do not really need this for your project;
seeing the API for the diskmgr/ in the JAVADOC is enough.
However,
it may be useful to see how things are done within the diskmgr/.
Caution: The source code for the diskmgr/
that you have copied is not what is compiled
with you code!
The code in the JAR for diskmgr/
(which is on your CLASSPATH)
is what is being used.
The source for diskmgr/ you have copied is
just for reference.
There is a Makefile in diskmgr/
for compiling that code,
but you are not using it.
The sub-directory tests/ is important.
It is wrapper code (with a main
that runs a number of tests on the code
(that you are writing).
So you do not have to design and build your own testing code.
Use what is here.
You are welcome to modify this,
add new tests, etc.
When we test your final submitted code,
we will be using the original version of this tester.
|
|
When
I attempt to compile,
the compiler complains it cannot find the package bufmgr.
What is wrong?
As
your buffer manager code is in its own directory (bufmgr/),
it is expected that the code is encapsulated in a java package
bufmgr.
Be certain to declare that at the beginning of your BufMgr.java
file.
- package bufmgr;
|
|
What
files do I need to have in my bufmgr/ directory?
Your
main code for the buffer pool will be
in a file BufMgr.java.
Here is a template file to get you started:
BufMgr.java.
This will house the class BufMgr.
You may have auxiliary classes in the same file.
Of course,
you are allowed to put these in other files in the same directory,
if you prefer.
You will need a file per exception that you need to declare too.
This will make for a number of small files.
Remember, all files will be of the package bufmgr/,
and you need to declare that at the top of each,
as discussed above.
|
|
What
is the deal with the Makefile?
How do I compile my code?
To
compile your code,
javac needs to see packages global and diskmgr
(the parts of the existing Java Minibase code base)
that your code employs.
There is a Makefile both in bufmgr/ and tests/
for the UNIX make utility meant to make life easy for compiling
the code.
Do not know anything about make?!
Read the man page on make.
It is a useful utility for managing (compiling, linking, etc.)
code spread over multiple files (classes).
For example,
when in the directory bufmgr/, you say
- % make
it compiles all the java files whose classes either do not exist yet
or whose source (.java) is newer than the object code
(.class).
So make is efficient in that it only performs that work
that is needed.
(Note that the '%' above represents the command-line prompt.)
The make utility reads a file called Makefile
in the current directory (if there is one)
to find its instructions.
The Makefile is just a short ASCII file
and you can easily look it over.
Of course,
you can do your "compilation" by hand, if you prefer.
Most importantly,
you will have to set up your CLASSPATH correctly for this.
(When doing it via make,
the Makefile sets this up.)
Look inside the Makefile to see what you need.
Namely,
you will need something like
- % setenv CLASSPATH ../:/cs/dept/course/2012-13/W/4411/bufmgr/lib/bufmgrAssign.jar:
You need to have that JAR on your path.
The ../ is important for tests;
it puts your bufmgr package on the path.
Now to run,
since BMTest is in a package named tests,
say
- % java tests.BMTest
|
|
|
|
May
we use things from the java utility package?
Such as to do a Hashtable?
Certainly.
(Note that a map might be more appropriate.)
|
|
|
|
Output, Testing, & Grading
|
|
May we have a working copy
of, say, BufMgr.class so we can see
the output of a working version
and how it performs? :-)
I
know that could be useful.
However, no.
Sorry.
(Java is too easy to reverse compile, for one thing.)
I realize that it is difficult when starting the project.
Nothing works yet.
Build things step by step and modularly.
Think carefully what each method needs to do.
It is doable.
|
|
How
should our output look then
when we run bmtest
once things are all working?
Here
is the output as generated by a working bufmgr package:
sample output file.
Note that it says, "Failed as expected,"
and things like that in places.
It is reporting that the BP manager "failed" in the proper way.
This is not saying there is a problem.
So interpret carefully what the output is saying before panicking.
You can read the BMTest.class file to see what tests are being
performed on the BP manager.
Indeed, you can comment out some of the tests while you are working.
Just be sure to uncomment them by the end,
and know your code passes all the tests.
The TestDriver.java file is just a wrapper class
the Minibase folks developed that calls and executes the tests,
in this case, in BMTest.class.
|
|
When
grading,
will you be testing our BP manager with additional tests
not in bmtest?
No.
So if your manager is passing bmtest fine,
you're in great shape.
That said,
I do look at your code when grading.
So things need to be implemented in the right way.
(For instance, writing a dummy package
that simply prints out right looking results
would not pass!)
However,
I will not go looking for bugs in your BP manager beyond
what bmtest tests for.
|
|
|
|
What
does the emptyPage mean in the pinPage() method?
Boy
you people are nosey! :-)
The assignment specifies,
"You can assume that emptyPage==false for this assignment."
Anyway,
it is used to indicate that we know the pageID has just been created, and
we are now pinning the (brandnew) page in the buffer-pool.
Because the page is new, there is nothing written on it yet.
So the bufmgr does not have to ask the diskmgr to read it from disk.
This saves an I/O.
Anything to save an I/O!
|
|
In
the flushPage() method,
if the page is pinned,
will it still be flushed?
Yes.
However, throw a, say, PAGE_PINNED exception when you are done.
|
|
In
the flushPage() method,
if the page is not in buffer pool,
do we need throw a exception?
Since the client of buffer manager doesn't know
whether the page is present or not,
this would seem odd.
Indeed.
Just do nothing if called to flush a page not in the buffer pool.
|
|
Why
bufpool[numbuf][page_size]?
This
is the buffer pool itself.
This is the space in main memory
controlled by the buffer pool manager
to store (main-memory copies of) the database pages from disk.
The array bufpool[numbuf] is the array of "frames".
The buffer pool owns numbuf number of frames in this case.
Each entry here is a reference to an (implicit) array of page_size
number of bytes.
Each such chunk of memory, a contiguous array of page_size
number of bytes, is precisely the size of memory needed to store a page.
You probably recall that, in Java,
an array is always an array of references.
Objects to be placed in the array must be allocated (new)
independently of the array allocation.
So, each array of bytes,
that is really an array of references to bytes?
No.
In the case of the primitive datatype byte
(which is shorter than a pointer / reference),
Java allocates it directly as an array of bytes,
an exception to the general array-allocation rule of Java.
So each array of bytes really is a contiguous chunk of memory!
I
thought you complained that, in Java,
we had to fake the buffer pool,
that we were not doing real memory management.
It seems this trick with the 2d array
lets us do it!
Yes
and no.
Ideally,
we want the entire buffer pool to be one entire, contiguous
span of main memory.
That is because in a real database system
with a real diskspace manager,
the disk controller card can write into main memory directly,
without the CPU being involved.
It is passed an instruction, like,
write the 20 blocks off disk starting at block address x002f98
and place them in main memory starting at address xd80294.
This only works if our frames are contiguous.
We are not guaranteed this here.
Since bufpool[numbuf] is an array of references,
the arrays of bytes (our frames) can be scattered throughout the heap.
Java does not allow us closer control over memory allocation.
However,
our arrays of bytes (our frames) are each contiguous, at least.
(Otherwise, we really couldn't do anything.)
Why
can I not be a good object-oriented citizen and store my frame
descriptor information with the "frame" in bufpool[numbuf]?
Again,
ideally our frames for page placement would be contiguous in memory.
If we made a frame object with the descriptor information and the
page-space together,
we are guarranteeing that this is not the case.
So our engineering requirements overrule our object-oriented sense
in this case.
|
|
What
is meant by "make the buffer manager aware that the replacement
policy is specified by replacerArg" in the constructor?
A
string may be passed in when
the BufMgr constructor is called
that is to indicate the replacement
strategy to be used.
You don't need to do anything with this.
Under different circumstances,
this string might be used to pick one of several implemented strategies.
You are just implementing LRU, though, so just ignore it.
|
|
In
the methods newPage and flushPage,
there are several references to the DB object.
How do I access this object?
Minibase
declares a diskspace manager (DB), a buffer pool manager,
etc., when "started" up.
(Look into the tests code and the diskmgr code to see examples.)
Look in the JavaDoc to see specifics on the SystemDefs package.
For example,
you can access the DB object as in
- SystemDefs.JavabaseDB.read_page(pageno, page);
|
|
Does
bufmgr.freePage really free a page regardless of
whether it is pinned,
or does it issue an error if the page is pinned and not free it?
As
you know, bufmgr.freePage is meant to free a page.
This means it issues a DB call to deallocate the page from the database.
Why does bufmgr have a role in freeing a page?
Well, code calls from higher layers in minibase call bufmgr
to do this rather than calling DB (the disk space manager)
directly because the buffer manager needs a shot
at adjusting the buffer pool
when a page is freed from the database.
(Essentially in the layered architecture,
a layer only gets to see the
layer immediately below it,
and not further below.)
Now back to the original question:
Should bufmgr.freePage free a page regardless or not?
It (bufmgr.freePage) frees a page
if the page is unpinned or if it is pinned once (pin_count == 1).
If it is pinned multiply (pin_count > 1),
it does not free the page but returns a Minibase error code.
So for myself,
I could see justifying either approach.
- The bufmgr refuses to "free" pinned pages;
they must be unpinned first.
- Or the bufmgr happily frees singly pinned pages,
and callers shouldn't be doing this
unless they know what they are doing.
As is, it is the latter (2) that is the case for us.
Now for the other part of that question.
Does it issue an error if the page is pinned?
Good question.
- If the pin_count of the page is zero, of course not.
There is no problem in this case.
- If the pin_count is one, no. What!?
Well, we assume that the caller process that pinned the page is
the one asking for it to be dropped from the database altogether.
Since there is no other process relying on that page (the
pin_count is only one), no other transaction will be adversely
affected by this.
- If the pin_count is greater than one, yes.
In this case, more than one process / transaction has asked for
that page to be in the buffer pool. One of the transactions (or
some unrelated transaction accidently) is asking for it to be
dropped from the database. This can cause problems for the other
transactions that are relying on this page.
I personally find this policy a bit lame.
For instance, we never check in the case that pin_count == 1
that it is the same transaction that pinned the page that is now asking
for it to be freed.
So some other transaction could be accidently trying to free the page.
Anyway, this is design choice made in minibase.
And the tests for the project rely on this assumption.
So...implement it this way.
What steps should be done in bufmgr.freePage?
The following.
- Call DB to deallocate the page from the database.
This should be done if the page is currently not in the buffer pool,
or if it is and is just singly pinned or unpinned.
This is because calls from higher layers call
bufmgr to free (deallocate) pages, regardless.
- If the pin_count of the page was greater than one,
issue an error, don't kill the page, and return.
- Call "replacer.free"
(this depends on how you've coded things),
given the page was in the buffer pool,
so replacer can make the requisite adjustments.
(This is only pertinent if you are implementing a separate
replacer class.)
- Check using the hash whether the page is in the buffer pool.
If so, free the frame that holds the buffer copy of the page.
Note that it does not matter whether the page in the buffer pool is
marked as dirty, since we are destroying the page altogether.
|
|
For
the getNumUnpinnedBuffers method,
what is the definition of an unpinned BUFFER?
I am currently assuming it means that it is either a
frame that is free (that is, has no page in it) or that the page in
the frame has pin_count == 0.
Is that correct?
Yes.
Free frames have their pincounts set to zero also and should be
counted here.
|
|
Why
does bufmgr.newPage
have a second parameter howmany?
This
is in case the X-act (an outside caller to the buffer-pool manager)
wants to request that more than one new disk page be created.
(The default is that howmany = 1.)
So what is BP (the buffer-pool manager) supposed to do?
BP will first call the method allocate_page
of DB (the disk-space manager)
in order to have a new disk page (or disk pages) made.
Method allocate_page takes one or two arguments.
The first argument is a PageID object, start_page_num.
The second, optional argument is an int, runsize.
(This second argument defaults to one if missing.)
PageID is a "holder" object.
It contains one field, pid, of type int.
The pid is for the ID of a disk page.
When calling allocate_page,
BP does not know the ID of the page to be created!
The DB will set start_page_num's pid to the appropriate
value when it has allocated a new page.
BP must coin the object start_page_num before
calling allocate_page,
but BP does not set start_page_num's pid.
Here start_page_num is a holder object we are using to pass back
the page ID.
If the X-act has howmany > 1,
BP should call allocate_page with runsize = howmany.
DB, if it can,
will allocate a physically contiguous sequence of howmany#
of pages on disk.
(If it cannot,
it will throw an exception,
which the BP should catch and chain along.)
The page ID of the first page will be conveyed back via
start_page_num's pid.
What are the page ID's of the subsequent pages?
They are pid + 1,
pid + 2,
etc.
The BP's second job is then to pin the first page
of the newly created pages.
The BP does not pin any of the subsequent pages.
(The X-act might pin them later,
but this is not the BP's business.)
If the BP cannot pin this first page
(say the BP is full and everything is pinned),
it should clean up before failing gracefully
(throwing the appropriate exception).
This means it should call DB to deallocate the pages
just allocated before the BP itself exits.
|
|
|
|
Are
we supposed to implement three replacement strategies,
Clock, LRU, and MRU?
No.
Just implement LRU.
The strategies (e.g., Clock and LRU)
are just as described in the textbook.
So you already should know what you need to know about them.
|
|
|
Parke Godfrey
|
|
|