CS 352 Fall 2006
Programming Assignment 2
Due Date: Tue November 13th , 8:00pm (Extended again)
Sections
Updates
-
If the handin system doesn't work, you could send your project 2 to TA: tphan@cs.rutgers.edu.
- 2 more leeches are running at "booberry"(128.6.13.143 port 6880) and "cornpops" (128.6.13.135 port 6880).
- Clarifications:
- The leeches are used for the first phase uploading test only. Your program should connect to seeds, leeches and others peer (from your classmates). Your program should also accept incomming connection from other peers (from your classmates, or you may run your program in other machines later). So, the peer connections is kind of dynamic.
- You can not download from the leeches, because they don't send you the "bitfield" or "have" messages. But after downloading a piece and verifying, you should advertise the pieces by sending a "have "message to all connected peers (including leeches). The leeches will send a request message for the piece your program has just downloaded. It may need to send an interested message to unchoke. Your program should send "piece" when receiving request from leeches (or other peers). That's what I mean "first phase uploading test".
- For the final test, you should run your program at multiple iLab machines simultaneously. We recommend you to download pieces in randomized order. Your program should start downloading some pieces from seeds, and later may download from other peers (from your program or your classmate's program in other machines). So, even the leeches may be down, your program can be able to upload to other peers, at least it could upload to your programs running in other machines.
- Your program should be compiled and run at ILAB machines. Make sure that your submitting version can be graded in ILAB environment. The Bencoder.java doesn't work with LANG=en_US.UTF8, therefore, remember to set the LANG environment variable to "en_US" by following commands:
For C shell: setenv LANG en_US
For Bourne shell: export LANG=en_US
- Leeches are listening at port 6880 but leeches are used for the "first phase" uploading test only. You should run your final version program at multiple machines to test downloading and uploading.
- As of October 15, there are now 2 seeds running on the
president cluster: johnson(128.6.60.137) and tyler(128.6.60.131.
We also have 2 leeches at washington(128.6.60.123) and jefferson(128.6.60.125).
These leeches will accept incoming connections from your BitTorrent clients. Right after they receive a “have”
message from your clients, if your client un-chokes the connection, they
will send a “request” message for the corresponding piece; if your client
chokes the connection, they will send a “interested”
message and then a “request” message.
The second project will expand upon what
you did for the first. Your client must now open a .torrent file, contact the
tracker as well as multiple peers and be able to upload and download to the
peers simultaneously. This means you will have to manage the state of several
connections and work with multiple threads. Your client should connect to 2
seeds and 2 leeches mentioned above; and should also accept incoming
connections from your classmate’s clients at iLab
machines (128.6.13.*).
If your code for project 1 was well
thought-out, it should be easy to expand your program for multiple downloads,
and uploading should not be much more difficult than listening for more types
of packets and sending new ones. You will not have to worry about choking any
peers, but like the first project, you will have to make sure to unchoke them and keep them up to date on what pieces your
client has downloaded.
Your assignment should basically do the
following:
- Take as
a command-line argument the name of the .torrent file to be loaded and the
name of the file to save the data to. For example:
java mybtclient somefile.torrent myvideo.wmv
- Open the
.torrent file and parse the data inside. You may use the Bencoder class to decode the data.
- Open a TCP socket on the local machine and send an HTTP
GET request to the tracker at the IP address and port specified by the TorrentFile object.
- Capture the response from the tracker and decode it in
order to get the list of peers. From this list of peers, use only a
pre-selected list of peers to be specified . You
must extract these IP addresses from the list, hard-coding it is not
acceptable, except hard-coding the comparison.
- Open TCP connections to the peers and be able to
download simultaneously from several at the same time.
- Download one or more pieces of the file and verify its
SHA-1 hash against the hash stored in the .torrent file.
- After a piece is downloaded and verified, the peers are
notified that you have completed the piece.
- Other peers should be able to request pieces from your
client that you have verified, and your client should send those pieces to
the peers, according to the protocol.
- Repeat steps 3-7 (using the same TCP connections) for
the rest of the file. Make sure your client has uploaded at least 10% of
the file size before closing your program.
- When the file is finished, you must contact the tracker
and send it the completed event and properly close all connections.
In
addition to the above basic run-through of your program's execution, your
program should be able to detect input from the user at any point (you can
decide what the input should be), and allow the user to close the program,
suspending download of the file. The user should be able to restart the program
at a later time and resume download. Any non-verified pieces/blocks should be
discarded before suspending, and the program should also save how much it has
uploaded and downloaded for that torrent.
How
you choose to implement saving the state of your program is up to you, but an
intuitive approach might be to allocate the total space to disk before
downloading, and then writing the appropriate pieces into the file as they
complete. You can keep track of valid pieces using a java.util.BitSet.
The
files below are intended to help you concentrate on the networking portion of
the project and not have to worry about how to bencode
or "unbencode" the torrent
file and some of the communication.
·
TorrentFile.java - Stores the data from a .torrent
file in an easy-to-access format.
·
TorrentFileHandler.java - Accepts a String
corresponding to a filename for a .torrent file and returns a TorrentFile object containing the unencoded
data stored in the file.
·
TorrentFileHandlerTester.java - An
example of how to use the TorrentFileHandler and TorrentFile classes.
- project2.torrent - The torrent
file containing metadata about the file to be downloaded.
- Bencoder.java - Accepts
Strings, Integers, Lists, and Maps (Dictionaries) and returns them as bencoded byte arrays. Also accepts bencoded
byte arrays of Strings, Integers, Lists, and Dictionaries and returns them
as Strings, Integers, Lists and Maps.
Writing maintainable code is just as important as
writing correct code. Consequently, you will be graded on your style. Below is
a list of suggestions to make your code easier to understand and maintain:
- Conditions
- All of your methods must have clearly specified preconditions and postconditions. A precondition is what you assume to
be true before your method is called, and a postcondition
is what you can show to be true after your method has returned.
Trivial preconditions (such as "int a is an integer") should be omitted. The best
place to put the pre and postconditions is in a
comment before the first line of each method.
Example:
· /*
· Precondition: b is non-zero.
· Postcondition: returns a / b.
· */
· double divide(double a, double b)
· {· return (a / b);
· }
- Comments -
Comments for variables and for sections of code that does not have an obvious
purpose. Please do not comment obvious statements or sections of code. For
example,
· int i = 0; //i is an integer and the initial value is 0
is not useful and unnecessarily
clutters the code.
- Naming -
Names of classes, methods, and variables should be descriptive. For
example, if a class has a field containing its hash value, naming the
field hash_value is much better
than naming it hv .
- Exceptions
- Catch exceptions and do something useful with them. For example, print a
statement describing what went wrong to System.err.
- Easy to
understand loops/recursion. When possible, it's best to avoid break statements
whenever possible. When using recursion, tail-recursion is often
preferable because "smart" compilers can often translate it into
a loop. Return statements within an if/else clause should be avoided
whenever possible.
- Efficiency
- Avoid being terribly inefficient. For example, if you have to sort 10M
objects, it's better to sort them with a O(n*lg(n)) algorithm than a O(n2) algorithm.
- Encapsulation/Objects
- Encapsulation is a useful feature of object-oriented languages, and
because you are writing your program in an OO language, your program must
separate the functionality into multiple classes. This way you can change
the implementation of a class without affecting the rest of your program.
An example of separation would be to have separate FileHandler,
PeerInterface, and TrackerInterface
classes to manipulate files, interface with peers and interface with the
tracker, respectively. If you do not separate your program into functional
units, you will lose points on the Style portion of the grading.
Bencoding
(Pronounced "Bee Encoding")
Bencoding a method of encoding binary data. Tracker responses, and interpeer
communication will be bencoded. Below is how data
types are bencoded according to the BT protocol. [The
following list is taken from http://www.bittorrent.org/protocol.html
]
- Strings are length-prefixed base ten followed by a
colon and the string. They are encoded in UTF-8. For example 4:spam corresponds to 'spam'.
- Integers are represented by an 'i'
followed by the number in ASCII base 10 followed by an 'e'. For example
i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size
limitation. i-0e is invalid. All encodings with a
leading zero, such as i03e, are invalid, other than i0e, which of course
corresponds to 0.
- Lists are encoded as an 'l' followed by their elements
(also bencoded) followed by an 'e'. For example,
l4:spam4:egse corresponds to ['spam', 'eggs'].
- Dictionaries are encoded as a 'd'
followed by a list of alternating keys and their corresponding values
followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse
corresponds to {'cow':'moo', 'spam':'eggs'} and d4:spaml1:a1:bee corresponds to
{'spam': ['a', 'b']}. Keys must be strings and appear in sorted order
(sorted as raw strings, not alphanumerics).
Your client must take the information supplied by
the TorrentFile object and use it to communicate with
the tracker. The tracker's IP address and port number will be given to you by
the TorrentFile object, and your program must then
contact the tracker. Your program will send an HTTP GET request to the tracker
with the following key/value pairs. Note that these are NOT bencoded,
but must be properly escaped [this list is taken from http://www.bittorrent.org/protocol.html
]:
- info_hash - The 20 byte(160-bit) SHA1 hash of the bencoded
form of the info value from the metainfo file.
This value will almost certainly have to be escaped.
- peer_id - A string of length 20
which this downloader uses as its id. Each downloader generates its own id
at random at the start of a new download. This value will almost certainly
have to be escaped.
- ip - An optional parameter
giving the IP (or DNS name) which this peer is at.
- port - The port number this
peer is listening on. Common behavior is for a downloader to try to listen
on port 6881 and if that port is taken try 6882, then 6883, etc. and give
up after 6889.
- uploaded - The total amount
uploaded so far, encoded in base ten ascii.
- downloaded - The total amount
downloaded so far, encoded in base ten ascii.
- left - The number of bytes
this peer still has to downloaded, encoded in base ten ascii.
This key is important - If you do not specify how much you have
left to download, the tracker assumes you are a seed and will not return any
seeds in the peer list.
- event - This is an optional
key which maps to started , completed , or stopped (or
empty , which is the same as not being present. If not present,
this is one of the announcements done at regular intervals. An
announcement using started is sent when a download first begins,
and one using completed is sent when the download is complete. No completed
is sent if the file was complete when started. Downloaders
send an announcement using stopped when they cease downloading.
The response from the tracker
is a bencoded dictionary and contains two keys:
- interval - Maps to the number of
seconds the downloader should wait between regular rerequests.
- peers - Maps to a list of dictionaries corresponding to peers,
each of which contains the keys peer id , ip
, and port , which map to the peer's self-selected ID, IP
address or dns name as a string, and port
number, respectively.
In
addition to what your program did for the last project, your client should also
periodically update its status to the tracker. The update period should be no
less than the interval returned by the tracker. This may
change during execution, so it should be updated each time the tracker is
contacted. In the event that the interval value is excessively large,
you may cap it at 180 seconds.
Handshaking between peers begins with character
nineteen (decimal) followed by the string 'BitTorrent
protocol'. After the fixed headers are 8 reserved bytes which
are set to 0. Next is the 20-byte SHA-1 hash of the bencoded
form of the info value from the metainfo (.torrent)
file. The next 20-bytes are the peer id generated by the client. The info_hash should be the same as sent to the tracker, and
the peer_id is the same as sent to the tracker. If
the info_hash is different between two peers, then
the connection is dropped.
- All integers
are encoded as 4-bytes big-endian. e.g. 1,234 should be encoded as (hex)
00 00 04 d2
- The peer_id should simply be random numbers.
After
the handshake, messages between peers take the form of <length
prefix><message ID><payload> , where length prefix is a 4-byte
big-endian value and message ID is a single decimal character. The payload
depends on the message. Please consult either of the BT-related resources for
detailed information describing these messages. Below is a list of messages
that need to be implemented in the project.
- bitfield: <len=0001+X><id=5><bitfield>
The bitfield message
may only be sent immediately after the handshaking sequence is completed, and
before any other messages are sent. It is optional, and need not be sent if a
client has no pieces.
The bitfield message
is variable length, where X is the length of the bitfield.
The payload is a bitfield representing the pieces
that have been successfully downloaded. The high bit in the first byte
corresponds to piece index 0. Bits that are cleared indicated a missing piece,
and set bits indicate a valid and available piece. Spare bits at the end are
set to zero.
- keep-alive:
<length prefix> is 0. There is no message ID and no payload.
These should be sent around once every 2 minutes to prevent peers from
closing connections. These only need to be sent if no other packets are
sent within a 2-minute interval.
- choke:
<length prefix> is 1 and message ID is 0. There is
no payload.
- unchoke:
<length prefix> is 1 and the message ID is 1. There
is no payload.
- interested:
<length prefix> is 1 and message ID is 2. There is no
payload.
- uninterested:
<length prefix> is 1 and message ID is 3. There is no
payload.
- have:
<length prefix> is 5 and message ID is 4. The payload is a
zero-based index of the piece that has just been downloaded and verified.
- request:
<length prefix> is 13 and message ID is 6. The payload is as
follows:
<index><begin><length>
Where <index> is an integer specifying the zero-based piece
index, <begin> is an integer specifying the zero-based byte
offset within the piece, and <length> is the integer
specifying the requested length. <length>
is typically 2^14 (16384) bytes. A smaller piece should only be used
if the piece length is not divisible by 16384. A peer may close the
connection if a block larger than 2^14 bytes is requested.
- piece:
<length prefix> is 9+X and message ID is 7. The payload is as
follows:
<index><begin><block>
Where <index> is an integer specifying the zero-based piece
index, <begin> is an integer specifying the zero-based byte
offset within the piece, and <block> which is a block of
data, and is a subset of the piece specified by <index> .
Below is an example of what would take place between
two peers setting-up a connection and starting sharing.
- The local
host opens a TCP Socket to the remote peer and sends the handshake packet.
The local host then listens for the remote peer to respond.
- Upon
receiving the handshake packet and verifying the info_hash,
the remote peer responds with a similar handshake packet (except with its peer_id). The remote peer then listens for the local
host to send a bitfield or other packet. The
remote host can send a bitfield packet to the
local host at this time.
- Upon
receiving the handshake and verifying the info_hash,
the local host then (optionally) sends a bitfield
packet which tells the remote peer which pieces it has downloaded and
verified so far.
- If the local
host is interested in what the remote peer has downloaded, then it sends
an interested packet, otherwise it sends an uninterested packet. If the
remote peer is interested in what the local host has downloaded, then it
sends an interested packet, otherwise it sends an uninterested packet.
- When the
local host, or the remote peer, is ready to download/upload to the other,
it will send an unchoke packet.
Please note that clients will, and your client
should, ignore any request packets received while a remote peer is choked. A
client should only upload to another client if the connection is unchoked AND the remote peer is interested. This means that
a remote peer will not reply to the local hosts's
request packets unless you have expressed interest AND the remote peer has sent
an unchoke packet. If the remote peer sends a choke
packet during data transfer, any outstanding requests will be discarded and
unanswered - they should be re-requested after the next unchoke
packet.
- Include your
name as it appears on the roster and your student ID.
- A
high-level written description of how your program works. Specifically,
describe the high-level interactions between the classes. This is a good
way to make sure that your program doesn't have any classes depending on
the implementations of other classes. You may include diagrams or
illustrations if you feel it would aid your explanation or if you have a
hard time explaining how your program works. However, you MUST include
some written description of your program. If you know how to use LaTeX, this should be fairly easy to diagram. Also, OpenOffice.org's Draw program has an Export to PDF
feature that is very simple to use. This section should be around 100
words, or 2 paragraphs.
- A brief
(about 50 words or 1 paragraph) description of each class in the program.
Describe what it does in general, the main public methods it provides, and
any public fields it exposes. Note that classes should generally NOT have
public fields unless they are Static (e.g. constants).
- Optionally,
you may include a section on feedback about the program. Please only
provide constructive/useful comments or insights. What parts of the
project were the most challenging and which were the easiest? Did you have
trouble finding resources for the project? Were any parts confusing to
you? Including feedback will help shape the direction of the next project
as well as help in pointing-out trouble spots to look at during grading.
The project should be submitted through the Computer Science Handin Server.
- Make sure your name is in comments at the top of every
file submitted.
- The "main" starting method should be in a
file called RUBTClient.java.
- The write-up in RTF, HTML or PDF format saved as a writeup.<html/pdf/rtf>. Please do not submit a write-up in any
other formats
- All files should be submitted either as a .zip or .tar
file. This file should be named "<NetID>.<zip/tar>",
where <NetID> is your eden NetID. If you are
submitting as a group, only one person needs to submit, but the file
should be named with both NetIDs, e.g.
"<NetID1>.<NetID2>.<zip/tar>"
- Late submissions will not be accepted.
If you have not completed the project, submit what you have and you will
be graded for partial credit.
- Programming
Style: 15%
- Correctly
interfacing with the tracker (retrieving peer list, periodic updates,
after completing download): 10%
- Correctly
interfacing with the peers (handshaking, maintaining state, keep-alive):
10%
- Correctly
downloading from at least 2 peers simultaneously: 20%
- Correctly
uploading to at least 2 peers simultaneously: 20%
- Correctly
downloading and verifying the file: 15%
- Write-up
10%
It is strongly recommended that you
bookmark or download the Sun Java 1.5.0 API , as well as read the following pages:
Please email
tphan@cs.rutgers.edu concerning any questions or problems with this page.