Data Thinker: Brief Introduction and Performance Overview (LQI:DTSUMM)
10-16-2017, 02:10 PM (This post was last modified: 04-19-2018 08:08 PM by rayluk.)
Post: #1
LQI: DTSUMM


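Current cloud-based big data technology can be divided into two generations. The first generation is built around technologies such as MapReduce, Hadoop, GFS, and HDFS. The second generation greatly improves computing power mainly through architectural innovation and judicious use of memory; typical systems include Spark and RAMCloud. At present, cloud-based big data processing technology in China largely follows open-source software, with a small amount of imitation. In particular, the GFS/HDFS and MapReduce/Hadoop technologies from Google and Yahoo are used, imitated, and improved by almost all mainstream domestic companies (including Baidu, Alibaba, and NetEase) as the foundation and core of their big data computing capability. However, we believe this path of imitation and catch-up will ultimately fail, for three reasons. First, GFS/MapReduce has serious technical flaws of its own, and domestic companies are not capable of replicating the technologies that overcome them (such as Google's Megastore and Microsoft's Dryad). Second, the United States government granted Google a patent on MapReduce in 2010, so building one's own technology stack on top of it carries potential legal risk; in particular, once such a stack achieves commercial success, it could trigger unbearable legal consequences. Third, imitation and catch-up products cannot match mainstream MapReduce/Hadoop products in quality, maturity, or breadth of application, and will therefore be stuck at a long-term technical disadvantage. To build a large-scale graph data processing platform that supports military applications, it is necessary to take an independent and controllable path, break through the key core technologies, and own the core intellectual property.
In summary, the core technology for a new generation of large-scale graph data processing must not only surpass MapReduce/Hadoop systems in computing power and efficiency; more importantly, it must not depend on any existing open-source or copyrighted software and must be developed entirely independently. Data Thinker is a fully independent and controllable big data processing software system developed with these considerations in mind.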
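The system design of Data Thinker solves two key problems in big data systems: the coupling of storage and computation, and automatic switching and optimization between main memory and secondary storage. Its advanced design lets the system handle very large data volumes while responding quickly, use large-capacity secondary storage while automatically optimizing through main memory, and remain easy to program while supporting highly parallel computation. Comparing the architecture of the Hadoop/HBase stack used by Facebook with that of Data Thinker, Data Thinker's design is better suited to supporting a wider range of applications with better performance.
The design and implementation of Data Thinker ensure excellent performance, allowing it to achieve good results in many application scenarios.
In addition, the D-thinker technology was developed in-house by our company, which holds full intellectual property rights to it. The system integrates the CPU, memory, and disk resources of many computers to build a high-performance, low-latency, Turing-complete computing system in an economical, efficient, and scalable way. It provides storage, search, mining, learning, and business intelligence processing for data at the GB to PB scale. Its performance is 2-70 times higher than Hadoop and 1.6-5 times higher than Spark, and it can implement almost all big data algorithms and applications.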
   
Figure 1-1: D-thinker system overview


English version of a part
The system design of Data Thinker solves two key problems in big data systems: the coupling of storage and computation, and seamless switching and optimization between main memory and secondary storage. Its advanced design makes the system highly scalable with low latency. It can use large-capacity secondary storage while automatically optimizing performance through main memory. Compared to Hadoop/HBase, Data Thinker supports a wider range of applications with better performance.

The design of Data Thinker ensures high efficiency, enabling good performance in many scenarios.

In addition, we hold full intellectual property rights to the D-thinker technology. The system coordinates the CPUs, memory, and HDDs of multiple computers to build a high-performance, low-latency, inexpensive, and scalable computing system. It supports storage, search, data mining, learning, and business intelligence on petabytes of data. Its performance is 2-70 times higher than Hadoop and 1.6-5 times higher than Spark, and nearly all big data algorithms and applications can be implemented on D-thinker.

------
20180419/rayluk: added a partial English version
10-16-2017, 02:24 PM
Post: #2
RE: Data Thinker: Brief Introduction and Performance Overview (LQI:DTSUMM)
(10-16-2017 02:10 PM)User_Yang Wrote:  DTSUMM
Source: http://tab.d-thinker.org/showthread.php?tid=9605&pid=5204

"Source" is a defined term in the rules.
Judging from your post's description, the "source" here should be this post itself, right? That link can only count as a "reference link".
10-16-2017, 03:04 PM
Post: #3
RE: Data Thinker: Brief Introduction and Performance Overview (LQI:DTSUMM)
(10-16-2017 02:24 PM)YU_Xinjie Wrote:  
(10-16-2017 02:10 PM)User_Yang Wrote:  DTSUMM
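Source: http://tab.d-thinker.org/showthread.php?tid=9605&pid=5204

"Source" is a defined term in the rules.
Judging from your post's description, the "source" here should be this post itself, right? That link can only count as a "reference link".

By "source" here, I mean the document at the link.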
10-16-2017, 03:13 PM (This post was last modified: 10-16-2017 03:15 PM by YU_Xinjie.)
Post: #4
RE: Data Thinker: Brief Introduction and Performance Overview (LQI:DTSUMM)
(10-16-2017 03:04 PM)User_Yang Wrote:  By "source" here, I mean the document at the link.

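If the "source" in your sentence refers to the "source" as defined in the rules, then:

A. If by "the document at the link" you mean the Word document in thread 9605:

then you intend to define the "source" of this post's text in the Word document, in which case:
1. You cannot use an ambiguous link like this.
2. The Word document itself should be clearly marked with the LQI.

B. If by "the document at the link" you mean thread 9605 itself, then you have misunderstood the definition of "source".

If the "source" in your sentence has a meaning of your own invention, then please follow my suggestion in #2 and change "source" to "reference link".

In general, it is recommended to define the "source" in the post rather than in a Word document, which makes discussion and revision easier.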
03-02-2018, 01:12 PM (This post was last modified: 03-02-2018 01:12 PM by rayluk.)
Post: #5
RE: Data Thinker: Brief Introduction and Performance Overview (LQI:DTSUMM)
I am translating the following passage from post #1 into English, as it is needed in the recruitment leaflet ( http://tab.d-thinker.org/showthread.php?tid=9677&pid=6618 )
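Quote: The system design of Data Thinker solves two key problems in big data systems: the coupling of storage and computation, and automatic switching and optimization between main memory and secondary storage. Its advanced design lets the system handle very large data volumes while responding quickly, use large-capacity secondary storage while automatically optimizing through main memory, and remain easy to program while supporting highly parallel computation. Comparing the architecture of the Hadoop/HBase stack used by Facebook with that of Data Thinker, Data Thinker's design is better suited to supporting a wider range of applications with better performance.
The design and implementation of Data Thinker ensure excellent performance, allowing it to achieve good results in many application scenarios.
In addition, the D-thinker technology was developed in-house by our company, which holds full intellectual property rights to it. The system integrates the CPU, memory, and disk resources of many computers to build a high-performance, low-latency, Turing-complete computing system in an economical, efficient, and scalable way. It provides storage, search, mining, learning, and business intelligence processing for data at the GB to PB scale. Its performance is 2-70 times higher than Hadoop and 1.6-5 times higher than Spark, and it can implement almost all big data algorithms and applications.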
03-02-2018, 06:00 PM
Post: #6
RE: Data Thinker: Brief Introduction and Performance Overview (LQI:DTSUMM)
RR zma,

The underlined part is something I am not sure about. I think this part can be placed under the same LQI for now.

The system design of Data Thinker has solved two key problems in big data systems: the coupling between storage and computation, and the switching and optimization between main memory and secondary storage. Its advanced design makes the system highly scalable with short delay. It can access secondary storage with performance optimized by main memory. Compared with the Hadoop/HBase used by Facebook, Data Thinker supports a wider range of applications with better performance.
The design of Data Thinker ensures good computation power, empowering it to have good performance in many scenarios.
In the meantime, we have full intellectual property rights to the D-thinker technology. The system coordinates CPU, memory, and HDD from multiple computers to build a high-performance, short-delay, inexpensive, and extensible computing system. The system can do storage, searching, data mining, learning, and business processing on up to several PB of data. Its performance is 2-70 times better than Hadoop and 1.6-5 times better than Spark. Nearly all big data algorithms and applications can be implemented on it.
03-02-2018, 06:07 PM
Post: #7
RE: Data Thinker: Brief Introduction and Performance Overview (LQI:DTSUMM)
The system design of Data Thinker solves two key problems in big data systems: the coupling of storage and computation, and seamless switching and optimization between main memory and secondary storage. Its advanced design makes the system highly scalable with low latency. It can use large-capacity secondary storage while automatically optimizing performance through main memory. Compared to Hadoop/HBase, Data Thinker supports a wider range of applications with better performance.

The design of Data Thinker ensures high efficiency, enabling good performance in many scenarios.

In addition, we hold full intellectual property rights to the D-thinker technology. The system coordinates the CPUs, memory, and HDDs of multiple computers to build a high-performance, low-latency, inexpensive, and scalable computing system. It supports storage, search, data mining, learning, and business intelligence on petabytes of data. Its performance is 2-70 times higher than Hadoop and 1.6-5 times higher than Spark, and nearly all big data algorithms and applications can be implemented on D-thinker.