What does the fault-tolerance mechanism of a MapReduce program look like?

  This post explains in detail how to translate a Hadoop MapReduce program into a Spark application. Those interested may use it as a reference:
The key to getting the most out of Spark is to understand the differences between its RDD API and the original Mapper and Reducer API.
Venerable MapReduce has been Apache Hadoop's work-horse computation paradigm since its inception. It is ideal for the kinds of work for which Hadoop was originally designed: large-scale log processing, and batch-oriented ETL (extract-transform-load) operations.
As Hadoop’s usage has broadened, it has become clear that MapReduce is not the best framework for all computations. Hadoop has made room for alternative architectures by extracting resource management into its own first-class component, YARN. And so, projects like Impala have been able to use new, specialized non-MapReduce architectures to add interactive SQL capability to the platform, for example.
Today, Apache Spark is another such alternative, and is said by many to succeed MapReduce as Hadoop’s general-purpose computation paradigm. But if MapReduce has been so useful, how can it suddenly be replaced? After all, there is still plenty of ETL-like work to be done on Hadoop, even if the platform now has other real-time capabilities as well.
Thankfully, it’s entirely possible to re-implement MapReduce-like computations in Spark. They can be simpler to maintain, and in some cases faster, thanks to Spark’s ability to optimize away spilling to disk. For MapReduce, re-implementation on Spark is a homecoming. Spark, after all, mimics Scala‘s functional programming style and APIs. And the very idea of MapReduce comes from the functional programming language LISP.
Although Spark’s primary abstraction, the RDD (Resilient Distributed Dataset), plainly exposes map() and reduce() operations, these are not the direct analog of Hadoop’s Mapper or Reducer APIs. This is often a stumbling block for developers looking to move Mapper and Reducer classes to Spark equivalents.
Viewed in comparison with classic functional language implementations of map() and reduce() in Scala or Spark, the Mapper and Reducer APIs in Hadoop are actually both more flexible and more complex as a result. These differences may not even be apparent to developers accustomed to MapReduce, but, the following behaviors are specific to Hadoop’s implementation rather than the idea of MapReduce in the abstract:
Mappers and Reducers always use key-value pairs as input and output.
A Reducer reduces values per key only.
A Mapper or Reducer may emit 0, 1 or more key-value pairs for every input.
Mappers and Reducers may emit any arbitrary keys or values, not just subsets or transformations of those in the input.
Mapper and Reducer objects have a lifecycle that spans many map() and reduce() calls. They support a setup() and cleanup() method, which can be used to take actions before or after a batch of records is processed.
This post will briefly demonstrate how to recreate each of these within Spark — and also show that it’s not necessarily desirable to literally translate a Mapper and Reducer!
Key-Value Pairs as Tuples
Let’s say we need to compute the length of each line in a large text input, and report the count of lines by line length. In Hadoop MapReduce, this begins with a Mapper that produces key-value pairs in which the line length is the key, and count of 1 is the value:
public class LineLengthMapper
    extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
  @Override
  protected void map(LongWritable lineNumber, Text line, Context context)
      throws IOException, InterruptedException {
    context.write(new IntWritable(line.getLength()), new IntWritable(1));
  }
}
It’s worth noting that Mappers and Reducers only operate on key-value pairs. So the input to LineLengthMapper, provided by a TextInputFormat, is actually a pair containing the line as value, with position within the file thrown in as a key, for fun. (It’s rarely used, but, something has to be the key.)
The Spark equivalent is:
lines.map(line => (line.length, 1))
In Spark, the input is an RDD of Strings only, not of key-value pairs. Spark’s representation of a key-value pair is a Scala tuple, created with the (a,b) syntax shown above. The result of the map() operation above is an RDD of (Int,Int) tuples. When an RDD contains tuples, it gains more methods, such as reduceByKey(), which will be essential to reproducing MapReduce behavior.
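To make this concrete, below is a minimal sketch of how the lines RDD and the pair RDD fit together, assuming a SparkContext named sc and a hypothetical HDFS input path:
val lines = sc.textFile("hdfs:///data/input.txt")          // RDD[String]: values only, no keys
val lineLengthPairs = lines.map(line => (line.length, 1))  // RDD[(Int, Int)]: key-value pairs as tuples
// Because lineLengthPairs holds tuples, pair-specific methods such as reduceByKey()
// become available on it (via the PairRDDFunctions implicit conversion; older Spark
// versions need import org.apache.spark.SparkContext._ for this).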
Reducer and reduce() versus reduceByKey()
To produce a count of line lengths, it’s necessary to sum the counts per length in a Reducer:
public class LineLengthReducer
    extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
  @Override
  protected void reduce(IntWritable length, Iterable<IntWritable> counts, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable count : counts) {
      sum += count.get();
    }
    context.write(length, new IntWritable(sum));
  }
}
The equivalent of the Mapper and Reducer above together is a one-liner in Spark:
val lengthCounts = lines.map(line => (line.length, 1)).reduceByKey(_ + _)
Spark’s RDD API has a reduce() method, but it will reduce the entire set of key-value pairs to one single value. This is not what Hadoop MapReduce does. Instead, Reducers reduce all values for a key and emit a key along with the reduced value. reduceByKey() is the closer analog. But, that is not even the most direct equivalent in Spark; see groupByKey() below.
It is worth pointing out here that a Reducer’s reduce() method receives a stream of many values, and produces 0, 1 or more results. reduceByKey(), in contrast, accepts a function that turns exactly two values into exactly one — here, a simple addition function that maps two numbers to their sum. This associative function can be used to reduce many values to one for the caller. It is a simpler, narrower API for reducing values by key than what a Reducer exposes.
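As a small illustration of that narrower contract, the sketch below spells out the anonymous _ + _ as a named two-argument function; it assumes the lineLengthPairs RDD from the earlier sketch:
def add(a: Int, b: Int): Int = a + b                   // associative (and commutative) combiner
val lengthCounts = lineLengthPairs.reduceByKey(add)    // one (length, totalCount) pair per key
// By contrast, reduce() on a plain RDD collapses everything to a single value:
val totalLines = lineLengthPairs.map(_._2).reduce(add)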
Mapper and map() versus flatMap()
Now, instead consider counting the occurrences of only words beginning with an uppercase character. For each line of text in the input, a Mapper might emit 0, 1 or many key-value pairs:
public class CountUppercaseMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  protected void map(LongWritable lineNumber, Text line, Context context)
      throws IOException, InterruptedException {
    for (String word : line.toString().split(" ")) {
      if (Character.isUpperCase(word.charAt(0))) {
        context.write(new Text(word), new IntWritable(1));
      }
    }
  }
}
The equivalent in Spark is:
lines.flatMap(
  _.split(" ").filter(word => Character.isUpperCase(word(0))).map(word => (word, 1))
)
map() will not suffice here, because map() must produce exactly one output per input, but unlike before, one line needs to yield potentially many outputs. Again, the map() function in Spark is simpler and narrower compared to what the Mapper API supports.
The solution in Spark is to first map each line to an array of output values. The array may be empty, or have many values. Merely map()-ing lines to arrays would produce an RDD of arrays as the result, when the result should be the contents of those arrays. The result needs to be “flattened” afterward, and flatMap() does exactly this. Here, the array of words in the line is filtered and converted into tuples inside the function. In a case like this, it’s flatMap() that’s required to emulate such a Mapper, not map().
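The two shapes can be seen side by side in the sketch below; the nonEmpty guard is an assumption added here to avoid calling word(0) on empty strings produced by split(), and is not part of the original example:
// map() keeps the per-line nesting: RDD[Array[(String, Int)]]
val nested = lines.map(
  _.split(" ").filter(word => word.nonEmpty && Character.isUpperCase(word(0))).map(word => (word, 1))
)
// flatMap() flattens it into the desired RDD[(String, Int)]
val flat = lines.flatMap(
  _.split(" ").filter(word => word.nonEmpty && Character.isUpperCase(word(0))).map(word => (word, 1))
)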
groupByKey()
It’s simple to write a Reducer that then adds up the counts for each word, as before. And in Spark, again, reduceByKey() could be used to sum counts per word. But what if for some reason the output has to contain the word in all uppercase, along with a count? In MapReduce, that’s:
public class CountUppercaseReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable count : counts) {
      sum += count.get();
    }
    context.write(new Text(word.toString().toUpperCase()), new IntWritable(sum));
  }
}
But reduceByKey() by itself doesn’t quite work in Spark, since it preserves the original key. To emulate this in Spark, something even more like the Reducer API is needed. Recall that Reducer’s reduce() method receives a key and Iterable of values, and then emits some transformation of those. groupByKey() and a subsequent map() can achieve this:
... .groupByKey().map { case (word, ones) => (word.toUpperCase, ones.sum) }
groupByKey() merely collects all values for a key together, and does not apply a reduce function. From there, any transformation can be applied to the key and Iterable of values. Here, the key is transformed to uppercase, and the values are directly summed.
Be careful! groupByKey() works, but also collects all values for a key into memory. If a key is associated with many values, a worker could run out of memory. Although this is the most direct analog of a Reducer, it’s not necessarily the best choice in all cases. For example, Spark could have simply transformed the keys after a call to reduceByKey:
... .reduceByKey(_ + _).map { case (word, total) => (word.toUpperCase, total) }
It’s better to let Spark manage the reduction rather than ask it to collect all values just for us to manually sum them.
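As a usage sketch, assuming wordPairs stands for the RDD of (word, 1) tuples produced by the flatMap above, and that the result should be written out rather than collected, the preferred pipeline might look like this (the output path is hypothetical):
val upperCounts = wordPairs
  .reduceByKey(_ + _)
  .map { case (word, total) => (word.toUpperCase, total) }
upperCounts.saveAsTextFile("hdfs:///data/uppercase-word-counts")   // hypothetical output location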
setup() and cleanup()
In MapReduce, a Mapper and Reducer can declare a setup() method, called before any input is processed, to perhaps allocate an expensive resource like a database connection, and a cleanup() method to release the resource:
public class SetupCleanupMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private Connection dbConnection;
  @Override
  protected void setup(Context context) {
    dbConnection = ...;
  }
  @Override
  protected void cleanup(Context context) {
    dbConnection.close();
  }
}
The Spark map() and flatMap() methods only operate on one input at a time though, and provide no means to execute code before or after transforming a batch of values. It looks possible to simply put the setup and cleanup code before and after a call to map() in Spark:
val dbConnection = ...
lines.map(... dbConnection.createStatement(...) ...)
dbConnection.close() // Wrong!
However, this fails for several reasons:
It puts the object dbConnection into the map function’s closure, which requires that it be serializable (for example, by implementing java.io.Serializable). An object like a database connection is generally not serializable.
map() is a transformation, rather than an action, and is lazily evaluated. The connection can’t be closed immediately here.
Even so, it would only close the connection on the driver, not necessarily freeing resources allocated by serialized copies.
In fact, neither map() nor flatMap() is the closest counterpart to a Mapper in Spark — it’s the important mapPartitions() method. This method does not map just one value to one other value, but rather maps an Iterator of values to an Iterator of other values. It’s like a “bulk map” method. This means that the mapPartitions() function can allocate resources locally at its start, and release them when done mapping many values.
Adding setup code at the start of such a function is straightforward, but adding cleanup code is harder because it remains difficult to detect when the transformed iterator has been fully evaluated. For example, this does not work:
lines.mapPartitions { valueIterator =>
  val dbConnection = ... // OK
  val transformedIterator = valueIterator.map(... dbConnection ...)
  dbConnection.close() // Still wrong! May not have evaluated iterator
  transformedIterator
}
A more complete formulation (HT Tobias Pfeiffer) is roughly:
lines.mapPartitions { valueIterator =>
  if (valueIterator.isEmpty) {
    Iterator[...]()
  } else {
    val dbConnection = ...
    valueIterator.map { item =>
      val transformedItem = ...
      if (!valueIterator.hasNext) {
        dbConnection.close()
      }
      transformedItem
    }
  }
}
Although decidedly less elegant than previous translations, it can be done.
There is no flatMapPartitions() method. However, the same effect can be achieved by calling mapPartitions(), followed by a call to flatMap(a => a) to flatten.
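For example, a minimal sketch of that pattern, assuming each input line should simply be expanded into its individual words:
val words = lines.mapPartitions { valueIterator =>
  // any per-partition setup could go here
  valueIterator.map(line => line.split(" ").toSeq)   // Iterator[Seq[String]]
}.flatMap(a => a)                                    // flattened back to a plain RDD[String]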
The equivalent of a Reducer with setup() and cleanup() is just a groupByKey() followed by a mapPartitions() call like the one above. Take note of the caveat about using groupByKey() above, though.
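Putting those pieces together, a rough sketch of the uppercase-count Reducer with a per-partition resource might look like the following; wordPairs and createConnection() are assumptions standing in for the real pair RDD and resource setup:
wordPairs.groupByKey().mapPartitions { groupedIterator =>
  if (groupedIterator.isEmpty) {
    Iterator.empty
  } else {
    val dbConnection = createConnection()       // hypothetical setup, run once per partition
    groupedIterator.map { case (word, ones) =>
      val result = (word.toUpperCase, ones.sum)
      if (!groupedIterator.hasNext) {
        dbConnection.close()                    // cleanup after the last key in the partition
      }
      result
    }
  }
}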
But Wait, There’s More
MapReduce developers will point out that there is yet more to the API that hasn’t been mentioned yet:
MapReduce supports a special type of Reducer, called a Combiner, that can reduce shuffled data size from a Mapper.
It also supports custom partitioning via a Partitioner, and custom grouping for purposes of the Reducer via grouping Comparator.
The Context objects give access to a Counter API for accumulating statistics.
A Reducer always sees keys in sorted order within its lifecycle.
MapReduce has its own Writable serialization scheme.
Mappers and Reducers can emit multiple outputs at once.
MapReduce alone has tens of tuning parameters.
There are ways to implement or port these concepts into Spark, using APIs like the Accumulator, methods like groupBy() and the partitioner argument in various of these methods, Java or Kryo serialization, caching, and more. To keep this post brief, the remainder will be left to a follow-up post.
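As one small example of those counterparts, the rough analog of a MapReduce Counter is a Spark accumulator. The sketch below uses the Spark 1.x API (sc.accumulator; later versions offer sc.longAccumulator) and assumes a SparkContext named sc:
val emptyLines = sc.accumulator(0)   // counter-like shared variable for empty lines
val nonEmpty = lines.filter { line =>
  if (line.isEmpty) emptyLines += 1  // incremented on the workers
  line.nonEmpty
}
nonEmpty.count()            // an action forces evaluation; only then is the accumulator populated
println(emptyLines.value)   // read back on the driver
// Note: updates made inside transformations may be re-applied if tasks are retried.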
The concepts in MapReduce haven’t stopped being useful. It just now has a different and potentially more powerful implementation on Hadoop, and in a functional language that better matches its functional roots. Understanding the differences between Spark’s RDD API, and the original Mapper and Reducer APIs, helps developers better understand how all of them truly work and how to use Spark’s counterparts to best advantage.
This article is reproduced from: /blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/