Tryy With Java: August 2013

Wednesday, August 14, 2013

DataTurbine Minimum System Requirements

http://www.dataturbine.org/content/system-requirements

System Requirements
What Do I Need to Run DataTurbine

DataTurbine is designed to run on any device from an industry-grade server to a low-powered smartphone. It is both scalable and portable. Once the minimum requirements are satisfied, additional constraints may be imposed by the needs of the specific project.
The Minimum Requirements:

    Java Runtime Environment
        JRE 1.5+ is required for the server
        Sources/Sinks may have different requirements
    Network Capabilities

Example of Compatible Systems

    Personal Computer
        Desktop, Laptop, Netbook
        Windows, Linux, Mac
        32-bit, 64-bit
    Server
        Windows, Linux, Solaris, etc...
        32-bit, 64-bit
    Micro-computers
        Gumstix Device
        ARM devices
    Cell phone
        Android Device

DataTurbine Study Notes 3: Sink, Real-Time

http://www.dataturbine.org/content/sink
http://www.dataturbine.org/content/real-time

---------------------------------------------------------------
DataTurbine Sink
Introduction

A DataTurbine Sink (also referred to as an 'off-ramp') is simply a program that takes data from a DataTurbine server and uses it, for example by bringing it up in MATLAB or Real-time Data Viewer, or by putting it into a relational database or file for permanent storage.

Just like a source, a sink runs independently from the server as a separate application and uses the network to communicate. It can run on the same machine as the server or on a machine across the world.
The Sink's Perspective

From the sink's point of view it no longer needs to know where the data came from or how it got there. It can query all the sources and channels to find out what is available or specify a single channel via its name and name of its source.

The data is heterogeneous, and the sink can access any type of data seamlessly. It decides how to display and interpret the data based on its data type (byte array, 32-bit float, 32-bit int, etc.) as well as the MIME type specified by the source.

A sink can issue a request to pull data from the server for a given time frame. A sink can also subscribe to a specific set of channels, getting data as it becomes available.

Example: A sink could get a listing of all the sources available on a server, pick only the temperature channels, perform some analysis and, based on the result, bring up the images for the corresponding channels at significant time indexes.
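The selection and time-window steps in that example can be sketched in a few lines. This is only an illustration in plain Python, not the DataTurbine API: the "source/channel" naming, the function names `select_channels` and `fetch_window`, and all sample values are assumptions made up for the sketch.

```python
def select_channels(listing, keyword):
    """Return the full names of channels whose channel part contains keyword."""
    selected = []
    for full_name in listing:
        # Assumed naming convention for this sketch: "source/channel".
        _source, _, channel = full_name.partition("/")
        if keyword.lower() in channel.lower():
            selected.append(full_name)
    return selected


def fetch_window(samples, start, duration):
    """Return the (time, value) pairs with start <= time < start + duration."""
    return [(t, v) for (t, v) in samples if start <= t < start + duration]


# Hypothetical listing, as a sink might see it after querying a server.
listing = ["MetTower/temperature", "MetTower/humidity", "FieldStation/temperature"]
temperature_channels = select_channels(listing, "temperature")

# Hypothetical samples for one channel: (seconds, degrees Celsius).
samples = [(0.0, 21.5), (60.0, 21.7), (120.0, 22.1), (180.0, 22.4)]
window = fetch_window(samples, start=60.0, duration=120.0)

print(temperature_channels)
print(window)
```

The sink never needs to know how the samples reached the server; it only works with names and time ranges.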
Common Types of Sinks

    Viewer: An application that can be used to access and interact with the streaming data
    Ex: Real-time Data Viewer (RDV), Google Earth, etc...
    Web Server: An application that serves the data as web content for public display
    Ex: Graphs on a public web site
    Analysis: Takes the data and performs some kind of manual or automated analysis
    Ex: MATLAB, R, ESPER, etc...
    Export: Exports the data into a file or set of files for distribution or integration
    Ex: CSV files, Excel, etc...
    Storage: Permanent storage in a database or as a series of files.
    Ex: Storage in a relational database
    Other: It is easy to code any kind of sink that utilizes the data

Practical Example (Continued):

Going back to the example used for the source: imagine a simple meteorological tower that measures temperature and humidity on top of a hill. Nearby is a field station that is also measuring temperature. We put this data into DataTurbine on a laptop at the field station and now want to view it and make sure that it is placed in permanent storage.

    Start a DataTurbine server on the laptop (rbnb.jar)
    Start a source on the laptop reading data from the meteorological tower
    Start a source on the laptop reading data from the field station
    Start a sink to view the data as it is collected in real-time. In this case we will use Real-time Data Viewer (RDV)
    Start a sink to put the data into permanent storage in a MySQL database.

Our laptop would now have five independent lightweight programs running (1 server, 2 sources, 2 sinks). We will probably keep the server, sources, and the permanent storage sink running at all times. But we will start and stop the viewer sink as we need it.

Now we have a very basic but complete deployment running. But we are not sharing the data and not really utilizing the power of a real-time system (aside from viewing the data as it is collected). Fear not: this will be discussed in further sections as we build on our example.

Power of Real-time
DataTurbine as a Real-time Data System

If you read through the previous sections you can see some of the benefits of DataTurbine as a "black box" system, separating the sources from the sinks and handling heterogeneous data types in a unified system. However, the primary reason to use DataTurbine is the ability to interact with data in real-time or near real-time.

DataTurbine is built around this capability, and its limitations for historical data are a direct consequence of its strength and speed at working with streaming real-time data.

In addition to working with live data, DataTurbine can stream archived data as if it were live, re-utilizing common data viewers and infrastructure for post-test data analysis and review.
What is Real-Time Data

Real-time data refers to delivering data as soon as it is collected. There is no delay in the timeliness of the information provided. This is in contrast to an archival system that stores data until a later date.

DataTurbine can handle data sampled millions of times a second or as infrequently as once a century. In practice many uses are somewhere in between with data sampling every second, minute or hour.

As many remote sites can have drastic communication delays and do not require a strict time constraint, it would be more correct to refer to those systems as providing near real-time data, but for the sake of simplicity they are often also grouped into the real-time category.

Also note that when we talk about real-time we are focusing on the availability of data. This is not to be confused with real-time computing, which focuses on guaranteed response within strict time constraints.
Benefits of Real-time Data

    Interactive:
        Failure: The most direct benefit of real-time data is the ability to respond to factors on the fly. If a sensor goes bad, the system registers it immediately and the sensor can be fixed (before potentially months of data are ruined).
        Important Event: If an event of importance occurs, a team can be dispatched immediately to gather additional samples and observe the occurrence firsthand.
        Sampling: With a real-time system it's possible to change sampling rates and activate and deactivate sensors based on the data they receive.
        Example: If one sensor detects an important event, perhaps the sensors in that region need to increase their sampling rate temporarily or a camera needs to be activated.
    Analysis: There is a lot of analysis that can be performed on real-time data, and in certain cases this is actually the more efficient route. Averages, correlations, and mathematical operations can be performed in real-time with ease. The derived data can be put back into DataTurbine and further utilized. The end result is that summary and analytic data is available on the fly, giving an overview of the health of the system and the experiment.
    Public Consumption: Real-time also gives added value to the data. Data can be published publicly as it is gathered. The same sensor network that is monitoring an ecosystem for scientific research can display the tides and temperature of the water, the wind speed and direction, even a video feed showing the view of the forest.
    Portable: Streaming data is very portable. Adding destinations or applications is easy and transparent. Since data is contained as tuples (time, value, source), it is easy for any system to accept it, and doing so requires significantly less overhead than trying to read from a rigid structure such as a database. Once a streaming system is set up, raw data as well as automated analysis and quality assurance and quality control (QA/QC) results are available to any application and destination that the provider specifies, the second they are available. Any additional analysis (which could take weeks or months) can then be added later.
    Funding Compliance: There is increasing pressure from funding agencies for data providers to publish data publicly in a timely manner. A real-time system can help satisfy that requirement.
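The "Analysis" point in the list above (averages computed on the fly, with the derived data fed back in) can be illustrated with a small sliding-window mean over (time, value, source) tuples. This is a minimal sketch in plain Python; the class name `RunningMean`, the "/mean" suffix for the derived stream, and the sample values are all hypothetical.

```python
from collections import deque


class RunningMean:
    """Mean over the most recent `window` values of a stream."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._values = deque(maxlen=window)  # oldest value drops out automatically

    def update(self, value):
        """Add one value and return the mean of the current window."""
        self._values.append(value)
        return sum(self._values) / len(self._values)


# Hypothetical stream of (time, value, source) tuples.
stream = [
    (0.0, 20.0, "MetTower"),
    (1.0, 22.0, "MetTower"),
    (2.0, 24.0, "MetTower"),
    (3.0, 26.0, "MetTower"),
]

mean = RunningMean(window=3)
# The derived tuples keep the original timestamps and could be written
# back as a new stream alongside the raw one.
derived = [(t, mean.update(v), source + "/mean") for (t, v, source) in stream]

for row in derived:
    print(row)
```

Because each update only touches the current window, the summary is available as soon as the newest sample arrives.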

Limitations of Real-Time Data

    Not a Replacement: A real-time data system would ideally be an addition not a replacement for an archival system. It should add to a system but makes a poor replacement for operations that are best suited to an archive such as a relational database.
    Data Quality: Data coming directly from sensors will have inherent imperfections which have to be cleaned away before consumption. Unlike an archival system, which often just provides the cleanest, most annotated data, a real-time system would ideally have multiple data levels of progressively cleaner data.
        Automated Cleaning: Automated QA/QC can be performed on a real-time stream to identify obvious inconsistencies and potentially problematic parts of the data.
        Levels of Assurance: Different applications require a different level of assurance. For example a local weather site could use nearly raw data, while an intricate carbon dioxide absorption experiment would utilize manually cleaned and validated data.
    Different Paradigm: While traditional analysis would still work on archived data, utilizing the real-time aspect of data often requires a different approach than analysis on archived data.
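As a rough illustration of the "Automated Cleaning" idea in the list above, the sketch below splits incoming samples into a plausible level and a suspect level using a simple range check. It is plain Python under stated assumptions: the function name `flag_out_of_range`, the limits, and the sample values (including -999.0 as a typical error code) are invented for the example, and real QA/QC would use more than a range test.

```python
def flag_out_of_range(samples, low, high):
    """Split (time, value) samples into plausible and suspect lists."""
    clean, suspect = [], []
    for t, v in samples:
        if low <= v <= high:
            clean.append((t, v))
        else:
            suspect.append((t, v))
    return clean, suspect


# Hypothetical raw air-temperature readings in degrees Celsius.
raw = [(0.0, 21.5), (60.0, -999.0), (120.0, 22.1), (180.0, 85.0)]

# Assumed plausible range for this sensor; a real deployment would tune it.
clean, suspect = flag_out_of_range(raw, low=-40.0, high=60.0)

print("clean:", clean)
print("suspect:", suspect)
```

Keeping both lists, rather than dropping suspect points, matches the idea of several data levels with different levels of assurance.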


DataTurbine Study Notes 2: Server, Source

Sources
http://www.dataturbine.org/content/server
http://www.dataturbine.org/content/source

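The official site has diagrams, which make this easier to follow.

Roughly, these are the things to know when defining a data source:
Name: the name of the source.
Target Server: you do need to know where the server is.
Channel: there can be several. My project mainly reads data, so finding the right channel is what matters.
Cache Size: not much of a concern here, since I am not setting up a server for others to use.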

---------------------------------------------------

DataTurbine Server

What is RBNB?

The DataTurbine server is contained in rbnb.jar. It is the core of DataTurbine and serves as a center point that applications (sources and sinks) interface with.

It is not a replacement for a database and is designed for speed. Because of this, although it is possible to store years of data in a DataTurbine server, for most applications data is also archived in permanent storage in a database.

The acronym RBNB stands for Ring Buffered Network Bus, and is the technology inside the DataTurbine server. To data sources (applications that generate data), it acts as a middleware ring buffer which stores heterogeneous time-sequenced data. To data sinks (applications that read data), it acts as a consolidated repository of data. Key to RBNB scalability is that each source (ring buffer) and sink (network bus connection) acts independently of the others.
The DataTurbine Server

The server can be thought of as a series of rotating disks (ring buffers), with new data being added and old data removed when the archive becomes full.

Sources (applications that add data to the server) each specify their own archive size and cache size.

The archive size specified by a source determines the size of its ring buffer and how much data is buffered before it is discarded. DataTurbine can use as much storage as a system's physical drives allow. A good value depends on the storage space of the device the server is running on and the needs of the project.

The cache size specified by a source determines how much of its ring buffer is held in memory (RAM). This is again determined by the nature of the system the server is running on and of the applications. A cache can increase speed, but a bigger cache does not necessarily mean a faster system.

This approach allows applications to interact with data in near real-time. Sinks can read data as it is collected and display it online, in Matlab, or other applications. Sinks can also interact with the data and move it into permanent storage.

The server is agnostic to the data it receives and can accept heterogeneous data types including numerical, video, audio, text, or any other digital medium. It acts as a black box with sources adding in data and sinks reading the data out.

The server expects an accurate timestamp for every data point. One limitation of this is that data cannot be back-loaded into the server. That means that data has to be entered sequentially, so for a given source each data point has to have a timestamp that is greater than the previous timestamp on record.
What is a Frame

Sizes are specified in the number of frames. Each time a source application flushes data it adds one frame. A frame is a data structure of one or more channels, with one or more data objects per channel. Thus the size of a frame may range from small to large and may vary from frame to frame.
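To make the ring buffer and the frame-based sizes concrete, here is a toy model in plain Python. It is not how rbnb.jar is implemented: the class name `FrameRingBuffer`, the dict-shaped frame, and the rule that the archive is at least as large as the cache are simplifying assumptions of this sketch.

```python
from collections import deque


class FrameRingBuffer:
    """Toy per-source ring buffer; both sizes are counted in frames."""

    def __init__(self, cache_frames, archive_frames):
        # Simplification of this sketch: the archive holds everything the
        # cache holds, so it must be at least as large.
        if cache_frames < 1 or archive_frames < cache_frames:
            raise ValueError("need 1 <= cache_frames <= archive_frames")
        self.cache = deque(maxlen=cache_frames)      # newest frames, "in RAM"
        self.archive = deque(maxlen=archive_frames)  # retained frames, "on disk"

    def flush(self, frame):
        """Add one frame; the oldest frame is discarded once a buffer is full."""
        self.cache.append(frame)
        self.archive.append(frame)


buf = FrameRingBuffer(cache_frames=2, archive_frames=4)
for i in range(6):
    # One frame per flush: here a single channel with a single (time, value).
    buf.flush({"temperature": [(float(i), 20.0 + i)]})

print(len(buf.archive), "frames archived, oldest time:",
      buf.archive[0]["temperature"][0][0])
print("cached times:", [f["temperature"][0][0] for f in buf.cache])
```

After six flushes into a four-frame archive, the two oldest frames are gone, which is the "old data removed when the archive becomes full" behavior described above.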

DataTurbine Source
Introduction

A DataTurbine Source (also referred to as an 'on-ramp') is a program that takes data from a target (for example a sensor or file) and puts it into a DataTurbine server.

A source runs independently from the server as a separate application and uses the network to communicate. It can run on the same machine as the server or across the world.

Each source can contain multiple channels, each with its own data type. It controls its own server-side memory and hard drive space allocation.
Anatomy of a Source

    Name: Identifies the source
    Target Server: The server the source sends data to
    Cache Size: Each source specifies how many frames of data to buffer for itself in the server's memory (RAM).
    Archive Size: Each source specifies how many frames of data to store on the server's hard drive.
    Multiple Channels: Data stream containing one type of data (for example numeric or video).

    In turn, each channel consists of:
        Name: Identifies the specific channel
        MIME Type: Media type the applications can use to make decisions about the data they are receiving. Each channel can only store one type of data.
        Data: Series of data points consisting of a time and value

Practical Example:

For example, let us imagine a simple meteorological tower that measures temperature and humidity on top of a hill. Nearby is a field station that is also measuring temperature. We want to get this data into DataTurbine on a laptop at the field station. Let's go over what we would do.

Assume we have custom sources for our instrumentation.

    Start a DataTurbine Server on the laptop (rbnb.jar)
    Start a source on the laptop targeting our server that reads data from the meteorological tower and puts it into DataTurbine. This source would contain two channels (temperature & humidity)
    Start another source on the laptop that reads from the local field station and puts the data into the server. This source would contain a single channel (temperature)
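The two sources in the steps above, together with the "Anatomy of a Source" fields, can be written down as plain data structures. This is a sketch, not the DataTurbine API: the class names, the "localhost:3333" address, the frame counts, and the "application/octet-stream" MIME type are placeholder assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str        # identifies the channel within its source
    mime_type: str   # hint for sinks on how to interpret the data
    data: list = field(default_factory=list)  # (time, value) pairs


@dataclass
class Source:
    name: str            # identifies the source
    target_server: str   # server the source sends data to
    cache_frames: int    # frames buffered in the server's RAM
    archive_frames: int  # frames stored on the server's disk
    channels: dict = field(default_factory=dict)

    def add_channel(self, name, mime_type):
        """Register a channel; each channel carries exactly one data type."""
        self.channels[name] = Channel(name, mime_type)
        return self.channels[name]


# Placeholder values throughout; only the channel layout follows the example.
tower = Source("MetTower", "localhost:3333", cache_frames=100, archive_frames=10000)
tower.add_channel("temperature", "application/octet-stream")
tower.add_channel("humidity", "application/octet-stream")

station = Source("FieldStation", "localhost:3333", cache_frames=100, archive_frames=10000)
station.add_channel("temperature", "application/octet-stream")

for src in (tower, station):
    print(src.name, "->", sorted(src.channels))
```

Both sources target the same server but keep their own cache and archive sizes, which is what lets each one be tuned independently.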

Our laptop would now have three independent lightweight programs running. Now that we have the data in the server, we need a way to access it. This is discussed in the next section.
PlugIns

PlugIns are a specialized on-request type of data source. Whereas regular sources proactively push data to the DT server, plugins reply with data in response to sink requests forwarded to them via their plugin server connection.
Things to Keep in Mind

Each channel can only have one data type associated with it. Also remember that data cannot be back-loaded into the server. For each channel, data has to be entered sequentially, so for a given channel each data point has to have a timestamp that is greater than the previous timestamp on record.
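The ordering rule can be expressed as a small guard on the writing side. The sketch below is plain Python with a hypothetical `ChannelWriter` class; it only demonstrates the "strictly increasing timestamp per channel" check and says nothing about how the real server reacts to out-of-order data.

```python
class ChannelWriter:
    """Accepts (timestamp, value) points only in strictly increasing time order."""

    def __init__(self, name):
        self.name = name
        self.points = []

    def put(self, timestamp, value):
        # Reject anything that is not newer than the last recorded point.
        if self.points and timestamp <= self.points[-1][0]:
            raise ValueError(
                "timestamp %r is not after %r on channel %s"
                % (timestamp, self.points[-1][0], self.name)
            )
        self.points.append((timestamp, value))


writer = ChannelWriter("temperature")
writer.put(100.0, 21.5)
writer.put(160.0, 21.7)

try:
    writer.put(130.0, 21.6)  # older than the last point: a back-load attempt
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)

print(writer.points)
```

Checking on the source side like this makes a gap in the data visible early, instead of discovering it later in the archive.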


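DataTurbine Study Notes 1

Based on the official site
http://www.dataturbine.org/content/documentation

and the mighty Google Translate... there is no wiki for this one.

Below I pick out a few items that I personally consider key points and note them down with the help of Google Translate.

DataTurbine is a real-time data streaming engine positioned as middleware. It is built mainly on Java technology and is an open source project.

The project itself includes a server part and a UI part.
The server handles the data streams: delivery and relaying.
For the UI there is a separate package, RDV, which is an executable jar.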

---------------------------------------

Introduction

DataTurbine is a robust real-time streaming data engine that lets you quickly stream live data from experiments, labs, web cams and even Java-enabled cell phones. It acts as a "black box" to which applications and devices send and receive data. Think of it as express delivery for your data, be it numbers, video, sound or text.

DataTurbine is a buffered middleware, not simply a publish/subscribe system. It can receive data from various sources (experiments, web cams, etc.) and send data to various sinks. It has TiVo-like functionality that lets applications pause and rewind live streaming data.

DataTurbine is open source and free. There is also an active developer and user community that continues to evolve the software and assist in application development. This guide is designed as a first step to learning and deploying DataTurbine.
Why Use DataTurbine

    Extendable: It is a free Open Source project with an extensive well documented API.
    Scalable: It uses a hierarchical design that allows a network structure that grows with the requirements of your application
    Portable: DataTurbine runs on devices ranging from phones & buoys to multicore servers.
    Dependable: Using a Ring Buffered Network Bus, it provides tunable persistent storage at key network nodes to facilitate reliable data transport.
    Community: There is also an active developer and user community that continues to evolve the software and assist in application development.

Understanding DataTurbine
The Goal

Let’s say you have some data you’re collecting. Could be, say, weather data. Could be load readings from a bridge, pictures from a security camera, GPS-tagged biometrics from a tracked tiger, chlorophyll readings from a lake buoy, or pretty much anything else you can think of. Add in data from another system, and now stir in the requirement of multiple viewers. In other words, you have a system with lots of disparate data that you want to see, share and process.

DataTurbine is an excellent solution. It’s probably even answering needs that you didn’t know you had! Briefly, DataTurbine lets you stream data and see it in real-time. But it also lets you TiVo through old and new data, share it with anyone over the network, do real-time processing of the streams and more.
A Free Open Source Solution

In 2007, DataTurbine was transitioned from commercial to open source under the Apache 2.0 license. All code and documentation are public and available from the project web site. Current DataTurbine related research includes projects sponsored by NSF, NASA, and the Gordon and Betty Moore Foundation.
What DataTurbine Does Best

    Reliable Data Transfer
    Real-Time Data
        Streaming
        Analysis
        Visualization
        Publication
    Cleanly works with heterogeneous data types
    Separates data acquisition (sources) from data utilization (sinks)
    Seamlessly access historical and real time data
    Synchronized access across disparate data channels

What DataTurbine Is Not Good At

    Replacing a database (DataTurbine should be used with a database)
    Out of order data (Data is accepted chronologically)
    Back-loading data

The Parts

DataTurbine consists of one or more servers accepting data from sources and serving them up to sinks. Each component can be located on the same machine or different machines, allowing for flexibility in the deployment.


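DataTurbine DT2DB install

Skipping the backstory: there is an interface that has to be connected through RBNB. Before starting I honestly had no idea what this thing was.

Anyway,
start by looking things up on the relevant official sites.
DataTurbine
http://www.dataturbine.org/

To take the data in and write it to a DB, another package is needed: DT2DB
oss-dataturbine
http://code.google.com/p/oss-dataturbine/
Check out their SVN repository and you get the DT2DB source.

The person in charge talked someone who had set this up before into giving a brief walkthrough.
(I can't help feeling that a lot of academic types are rather petty... luckily the person in charge really wanted to learn and asked a lot of questions.)

Required system environment:
OS: Windows or Linux both work, and both are easy to install on (development was on Windows, testing on CentOS).
DB: PostgreSQL. Since the data gets written to a DB, you need one; remember to create the database and tables first.
Java: I personally use Sun JDK 1.6.
Jython: lets you run Python code on Java; needed because DT2DB is written with it.
           http://www.jython.org/downloads.html

Changing the code means editing the .py and .xml files, so it comes down to your own skills.

The original source code has a lot of print statements. They flood System.out almost immediately, so I ended up commenting them out one after another.

Finding out how to start it took a lot of googling. Maybe it is considered too common to document.

Set startTime.txt to the start time for data synchronization. The program keeps fetching data from that start time up to the end time (I later changed the end time to the system time as needed, so it keeps pulling in new data). The start time value is a floating-point time counted from the standard 1970 epoch, but in seconds ("s"), not milliseconds.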
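Since the seconds-versus-milliseconds detail is easy to get wrong when coming from Java, here is a minimal sketch of computing such a start value in Python. The function name `to_epoch_seconds` and the chosen date are made up for the example, and the exact layout DT2DB expects inside startTime.txt is not shown in this post, so the sketch only computes the number.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment):
    """Convert a timezone-aware datetime to float seconds since 1970-01-01 UTC."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware to avoid local-time surprises")
    return moment.timestamp()


# Hypothetical start of synchronization: midnight UTC on the day of this post.
start = datetime(2013, 8, 14, 0, 0, 0, tzinfo=timezone.utc)

start_seconds = to_epoch_seconds(start)   # the unit the post describes
start_millis = int(start_seconds * 1000)  # what Java's currentTimeMillis() would use

print(start_seconds)
print(start_millis)
```

A value taken from Java code would therefore need to be divided by 1000 before being used as the start time.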

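I did not use any other advanced settings; the first goal was just to get it running and writing data.

What follows is purely installation and operation.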

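0. Install an OS (CentOS 6)
    Install a DB (PostgreSQL 9.1)

1. Install Java
    It looks like the JRE alone is enough, but since this is a Java system to begin with, installing the JDK is standard.

2. Install Jython
    I used the version I was given, 2.2.1. The installer needs a GUI, so it cannot be done over plain SSH.
    A regular user account is sufficient.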

   ### install Jython (needs GUI/desktop)
   cd /etc/xxx/rbnb/install
   java -jar jython_installer-2.2.1.jar
   ## choose the Standard install
   ## install path --> /etc/xxx/jython2.2.1

3. Install the program
A plain file copy is all it takes. I put everything in a single folder (there are not many files).
    file list:
    src/
        configReader$py.class
        configReader.py
        DataGenSrc.py
        DBOperator$py.class
        DBOperator.py                          //parses the data and writes it to the DB (where the SQL is built)
        postgresql-8.4-701.jdbc4.jar     //JDBC driver for the DB
        rbnb.jar
        row_xxx.xml                             //config file
        row_xxx_test.xml                     //config file for testing
        runRbnb.bat                             //launcher for Windows
        runRbnb.sh                              //launcher for Linux
        SinkClientHelper.py
        SinkClientManager.py              //reads the config file / connects to RBNB; main process
        SinkProxy.java
        SinkTest.py
        startTime.txt                           //start time for data synchronization
        __init__.py

   ### upload files to home
   ### tar files
   ### copy to workdir
   cd /etc/xxx/
   mkdir rbnb
   cp ~/rbnb.0814.tgz .
   ### unzip
   tar -xzvf rbnb.0814.tgz

4. Run
     ./runRbnb.sh
     ## contents of the .sh script
     cd /etc/xxx/rbnb/src
     java -Dpython.home=/etc/xxx/jython2.2.1 -classpath ".:/etc/xxx/jython2.2.1/jython.jar:postgresql-8.4-701.jdbc4.jar" org.python.util.jython SinkClientManager.py row_xxx.xml > aa.log
