CS 352 Spring 2007
Programming Assignment 3
Due
Date: SATURDAY 15 December, 2007 8:00AM(extended again)
Sections
Updates
- Because the iLab machines DO
NOT support SSH public key authentication, the script that starts BitTorrent client at multiple machines at a same time
doesn’t work. So, we eliminate the comparison between random piece select
algorithm and rarest-piece-first piece selection algorithm.
The third project will build upon the
first two. The basic idea is the same - open a torrent file, connect to the
tracker, peers, upload and download, but your client must implement all parts
of the BitTorrent protocol. This means the addition
of more requirements (this is not an exhaustive list):
- Your
client must implement the rarest-piece-first piece selection algorithm.
- Your
client must implement choking and optimistic unchoking.
The basics of this are explained on the BT Protocol page.
- Your
client must maintain an open socket listening for incoming connections.
Your client must drop connections that do not follow the BT protocol, and
may refuse connections if it already has more than 50 active connections -
this last portion should not be a problem.
At this point, your client should have some kind of UI. It
doesn't have to be fancy, it doesn't even have to be graphical, but should have
some sort of feedback to users, or receive some inputs from users.
Assignment Specifics
Your assignment should basically do the
following:
- Take as
a command-line argument the name of the .torrent file to be loaded and the
name of the file to save the data to. For example:
java mybtclient project3.torrent myvideo.wmv
- Pass
the name of the .torrent file to the TorrentFileHandler class as a String and store the
returned TorrentFile object.
- Open a TCP socket on the local machine and send an HTTP
GET request to the tracker at the IP address and port specified by the TorrentFile object.
- Capture the response from the tracker and decode it in
order to get the list of peers. From this list of peers, you could
filter your own peers by specific port or a special signature in peer_id in order to test functions among your own
peers. But the final version should be accept
your classmate’s peers.
- Listen in a TCP port to accept incoming connection.
Open TCP connections to the peers and be able to download simultaneously
from several at one time.
- Download one or more pieces of the file and verify its
SHA-1 hash against the hash stored in the TorrentFile
object.
- After a piece is downloaded and verified, the peers are
notified that you have completed the piece.
- Other peers should be able to request pieces from your
client that you have verified, and your client should send those pieces to
the peers.
- Repeat steps 6-8 (using the same TCP connections) for
the rest of the file. Make sure your client has uploaded at least 10% of
the file size before closing your program.
- When the file is finished, you must contact the tracker
and send it the completed event and properly close all connections.
In
addition to the above basic run-through of your program's execution, your
program should be able to detect input from the user at any point (you can
decide what the input should be), and allow the user to close the program,
suspending download of the file. The user should be able to restart the program
at a later time and resume download. Any non-verified pieces/blocks should be
discarded before suspending, and the program should also save how much it has
uploaded and downloaded for that torrent.
Your
client should also monitor the speed of all connections (except the tracker),
and choke connections to the peer with the lowest download bandwidth. You only
need to choke if you have 5 or more connections (meaning you would have 4 unchoked connections), and every 30 seconds you should
optimistically unchoke one of the choked connections.
Also, any connections with transfer rates less than 1KBps for more than 1
minute should be choked.
The files below are intended to help you concentrate
on the networking portion of the project and not have to worry about how to bencode or "unbencode"
the torrent file and some of the communication.
·
TorrentFile.java - Stores the data from
a .torrent file in an easy-to-access format.
·
TorrentFileHandler.java -
Accepts a String corresponding to a filename for a .torrent file and returns a TorrentFile object containing the unencoded
data stored in the file.
·
TorrentFileHandlerTester.java -
An example of how to use the TorrentFileHandler and TorrentFile classes.
- project3.torrent - The torrent file containing
metadata about the file to be downloaded.
- Bencoder.java - Accepts Strings,
Integers, Lists, and Maps (Dictionaries) and returns them as bencoded byte arrays. Also accepts bencoded
byte arrays of Strings, Integers, Lists, and Dictionaries and returns them
as Strings, Integers, Lists and Maps.
Writing maintainable code is just as important as
writing correct code. Consequently, you will be graded on your style. Below is
a list of suggestions to make your code easier to understand and maintain:
- Conditions
- All of your methods must have clearly specified preconditions and postconditions. A precondition is what you assume to
be true before your method is called, and a postcondition
is what you can show to be true after your method has returned.
Trivial preconditions (such as "int a is an integer") should be omitted. The best
place to put the pre and postconditions is in a
comment before the first line of each method.
Example:
· /*
· Precondition: b is non-zero.
· Postcondition: returns a / b.
· */
· double divide(double a, double b)
· {· return (a / b);
· }
- Comments -
Comments for variables and for sections of code that does not have an
obvious purpose. Please do not comment obvious statements or sections of
code. For example,
int i = 0; //i is an integer and the initial value is 0
is not useful and unnecessarily
clutters the code.
- Naming -
Names of classes, methods, and variables should be descriptive. For
example, if a class has a field containing its hash value, naming the
field hash_value is much better
than naming it hv .
- Exceptions
- Catch exceptions and do something useful with them. For example, print a
statement describing what went wrong to System.err.
- Easy to
understand loops/recursion. When possible, it's best to avoid break statements
whenever possible. When using recursion, tail-recursion is often
preferable because "smart" compilers can often translate it into
a loop. Return statements within an if/else clause should be avoided
whenever possible.
- Efficiency
- Avoid being terribly inefficient. For example, if you have to sort 10M
objects, it's better to sort them with a O(n*lg(n)) algorithm than a O(n2) algorithm.
- Encapsulation/Objects
- Encapsulation is a useful feature of object-oriented languages, and
because you are writing your program in an OO language, your program must
separate the functionality into multiple classes. This way you can change
the implementation of a class without affecting the rest of your program.
An example of separation would be to have separate FileHandler,
PeerInterface, and TrackerInterface
classes to manipulate files, interface with peers and interface with the
tracker, respectively. If you do not separate your program into functional
units, you will lose points on the Style portion of the grading.
Bencoding
(Pronounced "Bee Encoding")
Bencoding a method of encoding binary data. Tracker responses, and interpeer
communication will be bencoded. Below is how data
types are bencoded according to the BT protocol. [The
following list is taken from http://www.bittorrent.org/protocol.html
]
- Strings are length-prefixed base ten followed by a
colon and the string. They are encoded in UTF-8. For example 4:spam corresponds to 'spam'.
- Integers are represented by an 'i'
followed by the number in ASCII base 10 followed by an 'e'. For example
i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size
limitation. i-0e is invalid. All encodings with a
leading zero, such as i03e, are invalid, other than i0e, which of course
corresponds to 0.
- Lists are encoded as an 'l' followed by their elements
(also bencoded) followed by an 'e'. For example,
l4:spam4:egse corresponds to ['spam', 'eggs'].
- Dictionaries are encoded as a 'd'
followed by a list of alternating keys and their corresponding values
followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse
corresponds to {'cow':'moo', 'spam':'eggs'} and d4:spaml1:a1:bee corresponds to
{'spam': ['a', 'b']}. Keys must be strings and appear in sorted order (sorted
as raw strings, not alphanumerics).
Your client must take the information supplied by
the TorrentFile object and use it to communicate with
the tracker. The tracker's IP address and port number will be given to you by
the TorrentFile object, and your program must then
contact the tracker. Your program will send an HTTP GET request to the tracker
with the following key/value pairs. Note that these are NOT bencoded,
but must be properly escaped [this list is taken from http://www.bittorrent.org/protocol.html
]:
- info_hash - The 20 byte(160-bit) SHA1 hash of the bencoded
form of the info value from the metainfo file.
This value will almost certainly have to be escaped.
- peer_id - A string of length 20
which this downloader uses as its id. Each downloader generates its own id
at random at the start of a new download. This value will almost certainly
have to be escaped.
- ip - An optional parameter
giving the IP (or DNS name) which this peer is at.
- port - The port number this
peer is listening on. Common behavior is for a downloader to try to listen
on port 6881 and if that port is taken try 6882, then 6883, etc. and give
up after 6889.
- uploaded - The total amount
uploaded so far, encoded in base ten ascii.
- downloaded - The total amount
downloaded so far, encoded in base ten ascii.
- left - The number of bytes
this peer still has to downloaded, encoded in base ten ascii.
This key is important - If you do not specify how much you have
left to download, the tracker assumes you are a seed and will not return
any seeds in the peer list.
- event - This is an optional
key which maps to started , completed , or stopped (or
empty , which is the same as not being present. If not present,
this is one of the announcements done at regular intervals. An
announcement using started is sent when a download first begins,
and one using completed is sent when the download is complete. No completed
is sent if the file was complete when started. Downloaders
send an announcement using stopped when they cease downloading.
The response from the tracker
is a bencoded dictionary and contains two keys:
- interval - Maps to the number of
seconds the downloader should wait between regular rerequests.
- peers - Maps to a list of dictionaries corresponding to
peers, each of which contains the keys peer id , ip , and port , which map to the
peer's self-selected ID, IP address or dns name
as a string, and port number, respectively.
In
addition to what your program did for the last project, your client should also
periodically update its status to the tracker. The update period should be no
less than the interval returned by the tracker. This may
change during execution, so it should be updated each time the tracker is
contacted.
Handshaking between peers begins with character
nineteen (decimal) followed by the string 'BitTorrent
protocol'. After the fixed headers are 8 reserved bytes which
are set to 0. Next is the 20-byte SHA-1 hash of the bencoded
form of the info value from the metainfo (.torrent)
file. The next 20-bytes are the peer id generated by the client. The info_hash should be the same as sent to the tracker, and
the peer_id is the same as sent to the tracker. If
the info_hash is different between two peers, then
the connection is dropped.
- All integers
are encoded as 4-bytes big-endian. e.g. 1,234 should be encoded as (hex)
04D2
- The peer_id should simply be random numbers.
After
the handshake, messages between peers take the form of <length
prefix><message ID><payload> , where length prefix is a 4-byte
big-endian value and message ID is a single decimal character. The payload
depends on the message. Please consult either of the BT-related resources for
detailed information describing these messages. Below is a list of messages
that need to be implemented in the project.
- keep-alive:
<length prefix> is 0. There is no message ID and no payload.
These should be sent around once every 2 minutes to prevent peers from
closing connections. These only need to be sent if no other packets are
sent within a 2-minute interval.
- choke:
<length prefix> is 1 and message ID is 1. There is
no payload.
- unchoke:
<length prefix> is 1 and the message ID is 2. There
is no payload.
- interested:
<length prefix> is 1 and message ID is 2. There is no
payload.
- uninterested:
<length prefix> is 1 and message ID is 3. There is no
payload.
- have:
<length prefix> is 5 and message ID is 4. The payload is a
zero-based index of the piece that has just been downloaded and verified.
- request:
<length prefix> is 13 and message ID is 6. The payload is as
follows:
<index><begin><length>
Where <index> is an integer specifying the zero-based piece
index, <begin> is an integer specifying the zero-based byte
offset within the piece, and <length> is the integer
specifying the requested length. <length>
is typically 2^14 (16384) bytes. A smaller piece should only be used
if the piece length is not divisible by 16384. A peer may close the
connection if a block larger than 2^14 bytes is requested.
- piece:
<length prefix> is 9+X and message ID is 7. The payload is as
follows:
<index><begin><block>
Where <index> is an integer specifying the zero-based piece
index, <begin> is an integer specifying the zero-based byte
offset within the piece, and <block> which is a block of
data, and is a subset of the piece specified by <index> .
During the first project, a lot of people were
confused about the order of messages after the handshake. Below is an example
of what would take place between two peers setting-up a connection and starting
sharing.
- The local
host opens a TCP Socket to the remote peer and sends the handshake packet.
The local host then listens for the remote peer to respond.
- Upon
receiving the handshake packet and verifying the info_hash,
the remote peer responds with a similar handshake packet (except with its peer_id). The remote peer then listens for the local
host to send a bitfield or other packet. The
remote host can send a bitfield packet to the
local host at this time.
- Upon
receiving the handshake and verifying the info_hash,
the local host then (optionally) sends a bitfield
packet which tells the remote peer which pieces it has downloaded and
verified so far.
- If the local
host is interested in what the remote peer has downloaded, then it sends
an interested packet, otherwise it sends an uninterested packet. If the
remote peer is interested in what the local host has downloaded, then it
sends an interested packet, otherwise it sends an uninterested packet.
- When the
local host, or the remote peer, is ready to download/upload to the other,
it will send an unchoke packet.
Please
note that clients will, and your client should, ignore any request packets
received while a remote peer is choked. A client should only upload to another
client if the connection is unchoked AND the remote
peer is interested. This means that a remote peer will not reply to the local hosts's request packets unless you have expressed interest
AND the remote peer has sent an unchoke packet. If
the remote peer sends a choke packet during data transfer, any outstanding
requests will be discarded and unanswered - they should be re-requested after
the next unchoke packet.
The
Write-Up
- Include
your name as it appears on the roster and your student ID.
- A
high-level written description of how your program works. Specifically,
describe the high-level interactions between the classes. This is a good way
to make sure that your program doesn't have any classes depending on the
implementations of other classes. You may include diagrams or
illustrations if you feel it would aid your explanation or if you have a
hard time explaining how your program works. However, you MUST include
some written description of your program. If you know how to use LaTeX, this should be fairly easy to diagram. Also, OpenOffice.org's Draw program has an Export to PDF
feature that is very simple to use. This section should be around 100
words, or 2 paragraphs.
- A brief
(about 50 words or 1 paragraph) description of each class in the program.
Describe what it does in general, the main public methods it provides, and
any public fields it exposes. Note that classes should generally NOT have
public fields unless they are Static (e.g. constants).
- Optionally,
you may include a section on feedback about the program. Please only
provide constructive/useful comments or insights. What parts of the
project were the most challenging and which were the easiest? Did you have
trouble finding resources for the project? Were any parts confusing to
you? Including feedback will help shape the direction of the next project
as well as help in pointing-out trouble spots to look at during grading.
The project should be submitted through Sakai.
- Make sure your name is in comments at the top of every
file submitted.
- The "main" starting method should be in a
file called RUBTClient.java.
- The write-up in RTF, HTML or PDF format saved as a writeup.<html/pdf/rtf>. Please do not submit a write-up in any
other formats
- All files should be submitted either as a .zip or .tar
file. This file should be named "<NetID>.<zip/tar>",
where <NetID> is your eden NetID. If you are
submitting as a group, only one person needs to submit, but the file
should be named with both NetIDs, e.g.
"<NetID1>.<NetID2>.<zip/tar>"
- Late submissions will not be accepted.
If you have not completed the project, submit what you have and you will
be graded for partial credit.
The portion of the grade devoted to the write-up
and programming style have been increased to reflect the importance of good
documentation and readable code for larger projects.
- Programming
Style: 10%
- Ability
to simultaneously upload/download from multiple peers: 10%
- Correctly
implementing the rarest-piece-first piece selection algorithm: 20%
- Correctly
monitoring bandwidth on all connections and choking/unchoking
peers: 20%
- Correctly
downloading and verifying the file: 10%
- Handle
user inputs (restart, suspend/resume download): 15%
- Write-up
15%
It is strongly recommended that you
bookmark or download the Sun Java 1.5.0 API , as well as read the following pages:
Please email tphan@cs.rutgers.edu concerning any questions or problems with
this page. Please visit the Sakai website for questions concerning this
project.