Errors when connecting Tableau to a Hortonworks Hadoop Hive installation

Statistical File
Other Files (such as Tableau .tde, .tds, .twbx)
Tableau Server
Actian Matrix
Actian Vectorwise
Amazon Aurora
Amazon EMR
Amazon Redshift
Aster Database
Cloudera Hadoop
DataStax Enterprise
Google Analytics
Google BigQuery
Google Cloud SQL
Hortonworks Hadoop Hive
HP Vertica
IBM BigInsights
IBM PDA (Netezza)
MapR Hadoop Hive
Microsoft Analysis Services
Microsoft PowerPivot
Microsoft SQL Server
Oracle Essbase
Pivotal Greenplum Database
Progress OpenEdge
SAP NetWeaver Business Warehouse
SAP Sybase ASE
SAP Sybase IQ
Teradata OLAP Connector
Web Data Connector
Other Databases (ODBC)
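Because that document shows support for connecting to Spark SQL, I found the corresponding Spark SQL plugin on the official website, installed it, and connected; it worked as expected. (Looking at the underlying technique, it mainly works through HiveServer2.)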
On the 'Select Stack' page, under 'Advanced Repository Options', I checked only 'redhat6', which shows '400: Bad request'.
Then I checked 'Skip Repository Base URL validation' and proceeded.
Then I added the hostnames and the id_rsa file (of the host where Ambari is running, which will also be used as the NameNode) and clicked Next.
Three hosts (non-Ambari) failed earlier than the other one. The following is the log for one of them:
Creating target directory…
Command start time
Connection to l1033lab.sss. closed.
SSH command execution finished
host=l1033lab.sss., exitcode=0
Command end time
Copying common functions script…
Command start time
scp /usr/lib/python2.6/site-packages/ambari_commons
host=l1033lab.sss., exitcode=0
Command end time
Copying OS type check script…
Command start time
scp /usr/lib/python2.6/site-packages/ambari_server/
host=l1033lab.sss., exitcode=0
Command end time
Running OS type check…
Command start time
Cluster primary/cluster OS type is redhat6 and local/current OS type is redhat6
Connection to l1033lab.sss. closed.
SSH command execution finished
host=l1033lab.sss., exitcode=0
Command end time
Checking ‘sudo’ package on remote host…
Command start time
Connection to l1033lab.sss. closed.
SSH command execution finished
host=l1033lab.sss., exitcode=0
Command end time
Copying repo file to ‘tmp’ folder…
Command start time
scp /etc/yum.repos.d/ambari.repo
host=l1033lab.sss., exitcode=0
Command end time
Moving file to repo dir…
Command start time
Connection to l1033lab.sss. closed.
SSH command execution finished
host=l1033lab.sss., exitcode=0
Command end time
Copying setup script file…
Command start time
scp /usr/lib/python2.6/site-packages/ambari_server/
host=l1033lab.sss., exitcode=0
Command end time
Running setup agent script…
Command start time
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
http://public-repo-/ambari/centos6/1.x/updates/1.7.0/repodata/repomd.xml: [Errno 12] Timeout on http://public-repo-/ambari/centos6/1.x/updates/1.7.0/repodata/repomd.xml: (28, ‘connect() timed out!’)
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: Updates-ambari-1.7.0. Please verify its path and try again
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
http://public-repo-/ambari/centos6/1.x/updates/1.7.0/repodata/repomd.xml: [Errno 12] Timeout on http://public-repo-/ambari/centos6/1.x/updates/1.7.0/repodata/repomd.xml: (28, ‘connect() timed out!’)
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: Updates-ambari-1.7.0. Please verify its path and try again
/bin/sh: /usr/sbin/ambari-agent: No such file or directory
{'exitstatus': 1, 'log': ('', None)}
Connection to l1033lab.sss. closed.
SSH command execution finished
host=l1033lab.sss., exitcode=1
Command end time
ERROR: Bootstrap of host l1033lab.sss. fails because previous action finished with non-zero exit code (1)
ERROR MESSAGE: tcgetattr: Invalid argument
Connection to l1033lab.sss. closed.
STDOUT: This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
http://public-repo-/ambari/centos6/1.x/updates/1.7.0/repodata/repomd.xml: [Errno 12] Timeout on http://public-repo-/ambari/centos6/1.x/updates/1.7.0/repodata/repomd.xml: (28, ‘connect() timed out!’)
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: Updates-ambari-1.7.0. Please verify its path and try again
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
http://public-repo-/ambari/centos6/1.x/updates/1.7.0/repodata/repomd.xml: [Errno 12] Timeout on http://public-repo-/ambari/centos6/1.x/updates/1.7.0/repodata/repomd.xml: (28, ‘connect() timed out!’)
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: Updates-ambari-1.7.0. Please verify its path and try again
/bin/sh: /usr/sbin/ambari-agent: No such file or directory
{'exitstatus': 1, 'log': ('', None)}
Connection to l1033lab.sss. closed.
The last host to fail (the one where Ambari runs) had the following log:
Creating target directory…
Command start time
Connection to l1032lab.sss. closed.
SSH command execution finished
host=l1032lab.sss., exitcode=0
Command end time
Copying common functions script…
Command start time
scp /usr/lib/python2.6/site-packages/ambari_commons
host=l1032lab.sss., exitcode=0
Command end time
Copying OS type check script…
Command start time
scp /usr/lib/python2.6/site-packages/ambari_server/
host=l1032lab.sss., exitcode=0
Command end time
Running OS type check…
Command start time
Cluster primary/cluster OS type is redhat6 and local/current OS type is redhat6
Connection to l1032lab.sss. closed.
SSH command execution finished
host=l1032lab.sss., exitcode=0
Command end time
Checking ‘sudo’ package on remote host…
Command start time
Connection to l1032lab.sss. closed.
SSH command execution finished
host=l1032lab.sss., exitcode=0
Command end time
Copying repo file to ‘tmp’ folder…
Command start time
scp /etc/yum.repos.d/ambari.repo
host=l1032lab.sss., exitcode=0
Command end time
Moving file to repo dir…
Command start time
Connection to l1032lab.sss. closed.
SSH command execution finished
host=l1032lab.sss., exitcode=0
Command end time
Copying setup script file…
Command start time
scp /usr/lib/python2.6/site-packages/ambari_server/
host=l1032lab.sss., exitcode=0
Command end time
Running setup agent script…
Command start time
Automatic Agent registration timed out (timeout = 300 seconds). Check your network connectivity and retry registration, or use manual agent registration.
The machines have Internet access, so I presume there is no need to configure local repositories. Are there any mandatory steps before one can install Ambari and proceed?
Best How To :
After spending plenty of time, I concluded that, despite having Internet connectivity, local repositories were needed. I installed Apache server and made my repositories accessible as per the documentation. Then, in 'Advanced Repository Options', I replaced the web URL with the local repository URL, and the hosts registered. I'm still not sure why local repos are needed (even the documentation says they are needed only in case of limited or no Internet connectivity).
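Since the bootstrap log shows yum timing out on repodata/repomd.xml, one way to narrow this kind of problem down is to test from each host whether the repository metadata can be fetched at all, for both the public and the local base URL. The following is a minimal Python sketch of that check, not part of the original answer; the function names are illustrative and the URL in the usage comment is a hypothetical placeholder.

```python
import urllib.error
import urllib.request


def repomd_url(base_url):
    """Return the URL of repodata/repomd.xml under a yum repository base URL."""
    return base_url.rstrip("/") + "/repodata/repomd.xml"


def can_fetch(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with HTTP 200 within the timeout.

    The opener parameter exists only so the check can be exercised
    without network access; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False


# Usage (hypothetical local repository host, replace with your own):
#   can_fetch(repomd_url("http://repo.example.internal/ambari/centos6/1.x/updates/1.7.0"))
```

If the check fails on the hosts but succeeds in a browser, a proxy or firewall between the hosts and the repository is a likely suspect.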
I am not certain whether the examples were still shipped with HDP. You would want to search for hadoop-examples\*.jar. If you were unable to locate the examples jar then you might need to download it. Unfortunately it appears the version 2 hadoop does not have maven for it?? /artifact/org.apache.hadoop/hadoop-examples ...
Generally speaking, oozie has several advantages here: Generate a DAG each time so you can have a direct view on your workflow. Easier access to the log files for each action such as hive, pig, etc. You will have your full history for each run of your task. Better schedule...
Try this : sqoop import --connect "jdbc:sqlserver://;database=AdventureWorksLT2012;username=password=test" --table ProductModel --hive-import -- --schema SalesLT --incremental append --check-column ProductModelID --last-value "128" ...
I have been using Hortonworks, and you need to add the file/jar within the same session - as you have discovered.
It turned out to be a proxy issue. To access the internet I had to add my proxy details to the file /var/lib/ambari-server/ by adding: export AMBARI_JVM_ARGS=$AMBARI_JVM_ARGS' -Xms512m -Xmx2048m -Dhttp.proxyHost=theproxy -Dhttp.proxyPort=80' When Ganglia was trying to access each node in the cluster, the request was going via the proxy...
First, if you want a DR solution you need to store this data somewhere outside of the main production site. This implies that the secondary site should have at least the same storage capacity as the main one. Now remember that the main ideas that lead to HDFS were moving...
See if this JIRA helps with the issue you are hitting...
Yes, you can specify each time you run a Giraph job via using option -Dgiraph.zkList=localhost:2181 Also you can set it up in Hadoop configs and then you don't have to pass on this option each time you submit a Giraph job. For that add the following line in conf/core-site.xml file...
I was using Hortonworks Sandbox v2.2. After a long time debugging, I found that there were conflicts between the Spark version I had installed manually (v1.2) and the Hortonworks Sandbox libraries, so I decided to use Cloudera QuickStart 5.3.0, and now everything is working fine.
This great blog post helped us: http://www./2015/01/rename-host-in-ambari-170.html Basically you will need to log into Ambari's database. (Not the GUI, the actual backend database). It's best to read the blog post in its entirety, but I am appending the important secret sauce that actually makes things happen. If you're on mysql:...
I got an answer. First, we need to go to Hive and enter the Hive query: grant SELECT on table "table-name", where "table-name" is the table you want user "hue" to view....
Wrong user used during the installation process. This solved the problem: sudo -u oozie /usr/lib/oozie/bin/ create -sqlfile /usr/lib/oozie/oozie.sql -run Instead of: sudo /usr/lib/oozie/bin/ create -sqlfile /usr/lib/oozie/oozie.sql -run...
Hive doesn't have the feature of unique identifier of each row (rowid). But if you don't have any primary key or unique key values, you can use the analytical function row_number.
A simple count operation involves a map reduce job at the back end. And that involves 10 million rows in your case. Look here for a better explanation. Well this is just for the things happening at the background and execution time and not your question regarding memory requirements. Atleast,...
Okay, found it (had to debug the full python stack to understand). It's not really advertised, but some hue.ini parameter names have changed: beeswax_server_host --> hive_server_host, beeswax_server_port --> hive_server_port. It was defaulting hive_server_host to localhost, which is not correct on a secure cluster....
I suspect this is due to memory. Your memory should be at least 4096 MB.
(Disclaimer: I work at WANdisco.) My view is that the products are complementary. Falcon does a lot of things besides data transfer, like setting up data workflow stages. WANdisco's products do active-active data replication (which means that data can be used equivalently from both the source and target clusters). In...
Hello fellow Hortonworker! Mahout is in the HDP repositories, but it's not available in the Ambari install wizard (i.e. Services -> Add Service). Therefore the only way to install it is via: yum install mahout As noted here, you should only install it on the master node. Also note that Mahout is...
It's in /usr/hdp/
Consider decreasing your channel's capacity and transactionCapacity settings: capacity 100 The maximum number of events stored in the channel transactionCapacity 100 The maximum number of events the channel will take from a source or give to a sink per transaction These settings are responsible for controlling how many events get...
Yes, you can do it via ssh. The Hortonworks Sandbox comes with ssh support preinstalled. You can execute the sqoop command via an ssh client on Windows. Or, if you want to do it programmatically (that's what I have done in Java), you have to follow these steps. Download sshxcute java...
Issue is fixed. There was a permission issue with user folder i.e. /home/hduser. Somehow permission got changed.
Use a FQHN (fully-qualified-hostname) on both ambari server and all its client nodes.
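A quick way to see what name a node reports for itself is Python's socket.getfqdn(). The sketch below is only an illustration of the advice above, under the assumption that "fully qualified" means a name with at least two non-empty dot-separated labels; the helper names are hypothetical and not part of Ambari.

```python
import socket


def looks_fully_qualified(hostname):
    """Heuristic: at least two non-empty labels, ignoring one trailing dot."""
    labels = hostname.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def local_fqdn():
    """Return the fully qualified name this host resolves for itself."""
    return socket.getfqdn()


# Usage on each node (server and agents):
#   name = local_fqdn()
#   print(name, looks_fully_qualified(name))
```

Running this on the Ambari server and on every agent host shows whether any of them only knows itself by a short name.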
Moving the NameNode to the same network as the DataNodes solved the problem. The DataNodes are in the 192.1.5.* network; the NameNode was in the 192.1.4.* network. Moving the NameNode to 192.1.5.* did the trick in my case....
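The network mismatch described here can be checked mechanically with Python's standard ipaddress module. This is a minimal sketch; the /24 prefix length is an assumption that matches the 192.1.5.* notation, and the host addresses in the usage comment are made up for illustration.

```python
import ipaddress


def same_network(namenode_ip, datanode_ips, prefix_len=24):
    """Return True if every DataNode address lies in the NameNode's network.

    prefix_len=24 is an assumption; use your real netmask.
    """
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{namenode_ip}/{prefix_len}", strict=False)
    return all(ipaddress.ip_address(ip) in network for ip in datanode_ips)


# Usage (hypothetical addresses):
#   same_network("192.1.4.10", ["192.1.5.11", "192.1.5.12"])  # different networks
#   same_network("192.1.5.10", ["192.1.5.11", "192.1.5.12"])  # same network
```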
Partially fixed: it is necessary to stop all the HDFS services (JournalNodes, NameNodes and DataNodes) before editing the hdfs-site.xml file. Then, of course, the Ambari "start button" cannot be used because the configuration would be overwritten, so it is necessary to restart all the services manually. This is not the...
hdfs dfs -get /hdfs/path /local/path hdfs dfs -put /local/path /hdfs/path...
For formatting the NameNode, you can use the following command run as the 'hdfs' admin user: /usr/bin/hdfs namenode -format For starting up the NameNode daemon, use the script: /usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/ start namenode "-config $HADOOP_CONF_DIR" is an optional parameter here in case you want to reference a specific Hadoop configuration directory....
bdutil in fact is designed to suppo you can certainly edit an existing one for an easy way to get started, but the recommended best-practice is to create your own "" extension which can be mixed in with other bdutil extensions if necessary. This way you can more...
Apache Falcon simplifies the configuration of data motion with:
lineage and traceability. This provides data governance consistency across Hadoop components. Falcon replication is asynchronous with delta changes. Recovery is done by running a process and swapping the source and target. Data loss – Delta data may...
So here is the way I solved it. We are using CDH5 to run Camus to pull data from Kafka. We run CamusJob, which is responsible for getting data from Kafka, using the command line: hadoop jar... The problem is that new hosts didn't get the so-called "yarn-gateway". Cloudera names pack of...
Do you have a table already created in Hbase ? You will first have to create a table in Hbase with 'd' as a column family and then you can import this tsv file into that table.
First delete all contents from hdfs folder: Value of hadoop.tmp.dir rm -rf /grid/hadoop/hdfs Make sure that dir has right owner and permission (Username according to your system) sudo chown hduser:hadoop -R /grid/hadoop/hdfs sudo chmod 777 -R /grid/hadoop/hdfs Format the namenode: hadoop namenode -format Try this: sudo chown -R hdfs:hadoop /grid/hadoop/hdfs/dn...
The Hortonworks Hadoop companion files contain an oozie-site.xml property with a missing entry, the one that enables ShareLibService. As a result, the new Shared Lib feature doesn't work, because the endpoint is not registered. To fix this, add the org.apache.oozie.service.ShareLibService entry to the list. Be careful: the services are not independent, so the order matters!...
Check the value of the property fs.defaultFS in core-site.xml. It contains the IP address or hostname and the port that the NameNode daemon should bind to when it starts up. I see that you are using the Hortonworks sandbox; here is the property in core-site.xml, which is located at /etc/hadoop/conf/core-site.xml: <property> <name>fs.defaultFS</name> <value>hdfs://:8020</value> </property> So, you...
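To read the fs.defaultFS value programmatically instead of by eye, the property can be pulled out of core-site.xml with Python's standard XML parser. This is a minimal sketch under the assumption that the file uses the usual configuration/property/name/value layout; the host name in the sample is a hypothetical placeholder, not the value from the original post.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_default_fs(core_site_xml):
    """Return the fs.defaultFS value from core-site.xml text, or None if absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None


def host_and_port(default_fs):
    """Split a URI such as hdfs://host:8020 into (host, port)."""
    parsed = urlparse(default_fs)
    return parsed.hostname, parsed.port


# Hypothetical sample; the host name is a placeholder for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://sandbox.example.com:8020</value>
  </property>
</configuration>"""
```

On a real node the text would come from reading /etc/hadoop/conf/core-site.xml, and the resulting host and port are what clients must be able to reach.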
What hostname have you specified in /etc/ambari-agent/conf/ambari-agent.ini? I assume that it is ''

Hortonworks, Hadoop, Stinger and Hive
I chatted yesterday with the Hortonworks gang. The main subject was Hortonworks’ approach to SQL-on-Hadoop, commonly called Stinger, but at my request we cycled through a bunch of other topics as well. Company-specific notes include:
Hortonworks founder J. Eric “Eric14” Baldeschwieler is no longer at Hortonworks, although I imagine he stays closely in touch. What he’s doing next is unspecified, except by the general phrase “his own thing”. (Derrick Harris has
John Kreisa still is at Hortonworks, just not as marketing VP. Think instead of partnerships and projects.
~250 employees.
~70-75 subscription customers.
Our deployment and use case discussions were a little confused, because a key part of Hortonworks’ strategy is to support and encourage the idea of combining use cases and workloads on a single cluster. But I did hear:
10ish nodes for a typical starting cluster.
100ish nodes for a typical “data lake” committed adoption.
Teradata UDA (Unified Data Architecture)* customers sometimes (typically?) jumping straight to a data lake scenario.
A few users in the 10s of 1000s of nodes. (Obviously Yahoo is one.)
HBase used in >50% of installations.
Hive probably even more than that.
Hortonworks is seeing a fair amount of interest in Windows Hadoop deployments.
*By the way — Teradata seems serious about pushing the UDA as a core message.
Ecosystem notes, in Hortonworks’ perception, included:
Cloudera is obviously Hortonworks’ biggest distro competitor. Next is IBM, presumably in its blue-forever installed base. MapR is barely Pivotal’s likely rise hasn’t yet hit sales reports.
Hortonworks evidently sees a lot of MicroStrategy and Tableau, and some Platfora and Datameer, the latter two at around the same level of interest.
Accumulo is a big deal in the Federal government, and has gotten a few health care wins as well. Its success is all about security. (Note: That’s all consistent with what I hear elsewhere.)
I also asked specifically about OpenStack. Hortonworks is a member of the OpenStack project, contributes nontrivially to Swift and other subprojects, and sees Rackspace as an important partner. But despite all that, I think strong Hadoop/OpenStack integration is something for the indefinite future.
Hortonworks’ views about Hadoop 2.0 start from the premise that its goal is to support running a multitude of workloads on a single cluster. (See, for example, what I previously posted on the subject.) Timing notes for Hadoop 2.0 include:
It’s been in preview/release candidate/commercial beta mode for weeks.
Q3 H2 is the emphatic goal.
Yahoo’s been in production with YARN >8 months, and has no MapReduce 1 clusters left. (Yahoo has >35,000 Hadoop nodes.)
The last months of delays have been mainly about sprucing up various APIs and protocols, which may need to serve for a similar multi-year period as Hadoop 1’s have. But there also was some YARN stabilization into May.
Frankly, I think Cloudera’s earlier and necessarily
was a better choice than Hortonworks’ later big bang, even though the core-mission aspect of Hadoop 2.0 is what was least ready. HDFS (Hadoop Distributed File System) performance, NameNode failover and so on were well worth having, and it’s more than a year between Cloudera starting supporting them and when Hortonworks is offering Hadoop 2.0.
Hortonworks’ approach to doing SQL-on-Hadoop can be summarized simply as “Make Hive into as good an analytic RDBMS as possible, all in open source”. Key elements include:
a Hive-friendly execution environment in Hadoop 2.0.
For example, this seems to be a main point of Tez, although Tez is also meant to support Pig and so on as well. (Recall the close relationship between Hortonworks and Pig fan Yahoo.)
a Hive-friendly HDFS file format, ORC. To a first approximation, ORC sounds a lot like Cloudera Impala’s preferred format, Parquet.
Improving Hive itself, notably in:
SQL functionality.
Query planning and optimization.
Vectorized execution (Microsoft seems to be helping significantly with that).
Specific notes include:
Some of the Hive improvements (e.g. SQL windowing, better query planning over MapReduce 1) came out in May.
Others (e.g. the Tez port) seem to be coming soon.
Yet others, notably a true cost-based optimizer, haven’t even been designed yet.
Hive apparently often takes 4-5 seconds to plan a query, with a lot of the problem being slowness in the metadata store. (I hope that that’s already improved in
, but I didn’t think to ask.) Hortonworks thinks 100 milliseconds would be a better number.
Other SQL functionality that got mentioned was UDFs (User Defined Functions) and sub-queries. In general, it sounds as if the Hive community is determined to someday falsify the “Hive supports a distressingly small subset of SQL” complaint.
As for ORC:
ORC manages data in 256 megabyte chunks of rows. Within such chunks, ORC is columnar.
Hortonworks asserts that ORC is ahead of Parquet in such areas as indexing and predicate pushdown, and only admits a Parquet advantage in one area: the performance advantages of being written in C.
The major contributors to ORC are Hortonworks, Microsoft, and Facebook. There are ~10 contributors in all.
ORC has a 2-tiered compression story.
“Lightweight” type-specific compression is mandatory, for example:
Dictionary/tokenization, for single columns within chunks.
Run-length encoding for integers.
Block-level compression on top of that is optional, via a collection of usual-suspect algorithms.
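To make the two "lightweight" techniques named above concrete, here is a minimal Python sketch of run-length encoding and dictionary (token) encoding. It illustrates the general ideas only; it is not ORC's actual on-disk format, and the function names are made up for this example.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return [tuple(run) for run in runs]


def dictionary_encode(values):
    """Replace each value with an integer token; return (dictionary, tokens)."""
    dictionary = []  # token -> original value
    index = {}       # original value -> token
    tokens = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        tokens.append(index[value])
    return dictionary, tokens
```

Both schemes pay off on columnar data because values within one column of a chunk tend to repeat, which is also why a general block-level compressor can still find redundancy on top of them.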
Finally, I asked Hortonworks what it sees as a typical or default Hadoop node these days. Happily, the answers seemed like straightforward upgrades to
. Specifics included:
2 x 6 = 12 cores.
12 or so disks, usually 2-3 terabytes each. 4 TB disks are beginning to show up in “outlier” cases.
Usually 72 gigs or more of RAM. 128 gigs is fairly common. 256 sometimes happens.
10GigE is showing up at some web companies, but Hortonworks groaned a bit about the expense. Hearing that, I didn’t even ask about Infiniband, its use in certain
Hortonworks isn’t seeing much solid-state drive adoption yet, some NameNodes excepted. No doubt that’s a cost issue.
Hortonworks sees GPUs only for “outlier” cases.
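As a rough worked example of what the disk figures above imply per node, the following Python sketch computes raw capacity and an approximate usable HDFS capacity. The replication factor of 3 is an assumption (it is HDFS's default, but clusters may be configured differently), and the estimate ignores space reserved for the operating system, logs and intermediate data.

```python
def raw_capacity_tb(disks, tb_per_disk):
    """Raw storage per node in terabytes."""
    return disks * tb_per_disk


def usable_capacity_tb(raw_tb, replication=3):
    """Approximate unique data per node's worth of disk, given block replication.

    replication=3 is an assumption; check dfs.replication on your cluster.
    """
    return raw_tb / replication


# 12 disks of 2 to 3 TB each, as quoted above:
low = raw_capacity_tb(12, 2)    # 24 TB raw
high = raw_capacity_tb(12, 3)   # 36 TB raw
```

Under these assumptions a node with 24 to 36 TB of raw disk holds roughly 8 to 12 TB of unique data.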