PingCAP
  • Docs
  • Success Stories
  • Blog
  • About
  • Free Consultation
PingCAP
  • Docs
  • Success Stories
  • Blog
  • About
  • Free Consultation

Contact

中文
Documentation
  • About TiDB
    • TiDB Introduction
    • TiDB Architecture
  • Quick Start
    • TiDB Quick Start Guide
    • Basic SQL Statements
    • Bikeshare Example Database
  • TiDB User Guide
    • TiDB Server Administration
      • The TiDB Server
      • The TiDB Command Options
      • The TiDB Data Directory
      • The TiDB System Database
      • The TiDB System Variables
      • The Proprietary System Variables and Syntax in TiDB
      • The TiDB Server Logs
      • The TiDB Access Privilege System
      • TiDB User Account Management
      • Use Encrypted Connections
    • SQL Optimization
      • Understand the Query Execution Plan
      • Introduction to Statistics
    • Language Structure
      • Literal Values
      • Schema Object Names
      • Keywords and Reserved Words
      • User-Defined Variables
      • Expression Syntax
      • Comment Syntax
    • Globalization
      • Character Set Support
      • Character Set Configuration
      • Time Zone Support
    • Data Types
      • Numeric Types
      • Date and Time Types
      • String Types
      • JSON Types
      • The ENUM data type
      • The SET Type
      • Data Type Default Values
    • Functions and Operators
      • Function and Operator Reference
      • Type Conversion in Expression Evaluation
      • Operators
      • Control Flow Functions
      • String Functions
      • Numeric Functions and Operators
      • Date and Time Functions
      • Bit Functions and Operators
      • Cast Functions and Operators
      • Encryption and Compression Functions
      • Information Functions
      • JSON Functions
      • Aggregate (GROUP BY) Functions
      • Miscellaneous Functions
      • Precision Math
    • SQL Statement Syntax
      • Data Definition Statements
      • Data Manipulation Statements
      • Transactions
      • Database Administration Statements
      • Prepared SQL Statement Syntax
      • Utility Statements
      • TiDB SQL Syntax Diagram
    • JSON Functions and Generated Column
    • Connectors and APIs
    • TiDB Transaction Isolation Levels
    • Error Codes and Troubleshooting
    • Compatibility with MySQL
    • TiDB Memory Control
    • Slow Query Log
    • Advanced Usage
      • Read Data From History Versions
      • Garbage Collection (GC)
  • TiDB Operations Guide
    • Hardware and Software Requirements
    • Deploy
      • Ansible Deployment (Recommended)
      • Offline Deployment Using Ansible
      • Docker Deployment
      • Docker Compose Deployment
      • Cross-Region Deployment
      • Kubernetes Deployment
    • Configure
      • Configuration Flags
      • Configuration File Description
      • Modify Component Configuration Using Ansible
      • Enable TLS Authentication
      • Generate Self-signed Certificates
    • Monitor
      • Overview of the Monitoring Framework
      • Key Metrics
      • Monitor a TiDB Cluster
    • Scale
      • Scale a TiDB Cluster
      • Scale Using Ansible
    • Upgrade
      • Upgrade the Component Version
      • TiDB 2.0 Upgrade Guide
    • Tune Performance
    • Backup and Migrate
      • Backup and Restore
      • Migrate
        • Migration Overview
        • Migrate All the Data
        • Migrate the Data Incrementally
    • TiDB-Ansible Common Operations
    • Troubleshoot
  • TiDB Enterprise Tools
    • Syncer
    • mydumper
    • Loader
    • TiDB-Binlog
    • PD Control
    • PD Recover
    • TiKV Control
    • TiDB Controller
  • TiKV Documentation
  • TiSpark Documentation
    • Quick Start Guide
    • User Guide
  • Frequently Asked Questions (FAQ)
  • TiDB Best Practices
  • Releases
    • 2.1 RC3
    • 2.1 RC2
    • 2.0.7
    • 2.1 RC1
    • 2.0.6
    • 2.0.5
    • 2.1 Beta
    • 2.0.4
    • 2.0.3
    • 2.0.2
    • 2.0.1
    • 2.0
    • 2.0 RC5
    • 2.0 RC4
    • 2.0 RC3
    • 2.0 RC1
    • 1.1 Beta
    • 1.0.8
    • 1.0.7
    • 1.1 Alpha
    • 1.0.6
    • 1.0.5
    • 1.0.4
    • 1.0.3
    • 1.0.2
    • 1.0.1
    • 1.0
    • Pre-GA
    • RC4
    • RC3
    • RC2
    • RC1
  • TiDB Adopters
  • TiDB Roadmap
  • Connect with us
  • More Resources
    • Frequently Used Tools
    • PingCAP Blog
    • Weekly Update

TiSpark Quick Start Guide

To make it easy to try TiSpark, the TiDB cluster installed using TiDB-Ansible integrates Spark, TiSpark jar package and TiSpark sample data by default.

Deployment information

  • Spark is deployed by default in the spark folder in the TiDB instance deployment directory.
  • The TiSpark jar package is deployed by default in the jars folder in the Spark deployment directory.

    spark/jars/tispark-SNAPSHOT-jar-with-dependencies.jar
  • TiSpark sample data and import scripts are deployed by default in the TiDB-Ansible directory.

    tidb-ansible/resources/bin/tispark-sample-data

Prepare the environment

Install JDK on the TiDB instance

Download the latest version of JDK 1.8 from Oracle JDK official download page. The version used in the following example is jdk-8u141-linux-x64.tar.gz.

Extract the package and set the environment variables based on your JDK deployment directory.

Edit the ~/.bashrc file. For example:

export JAVA_HOME=/home/pingcap/jdk1.8.0_144
export PATH=$JAVA_HOME/bin:$PATH

Verify the validity of JDK:

$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

Import the sample data

Assume that the TiDB cluster is started. The service IP of one TiDB instance is 192.168.0.2, the port is 4000, the user name is root, and the password is null.

cd tidb-ansible/resources/bin/tispark-sample-data

Edit the TiDB login information in sample_data.sh. For example:

mysql -h 192.168.0.2 -P 4000 -u root < dss.ddl

Run the script:

./sample_data.sh

Note: You need to install the MySQL client on the machine that runs the script. If you are a CentOS user, you can install it through the command yum -y install mysql.

Log into TiDB and verify that the TPCH_001 database and the following tables are included.

$ mysql -uroot -P4000 -h192.168.0.2
MySQL [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| PERFORMANCE_SCHEMA |
| TPCH_001           |
| mysql              |
| test               |
+--------------------+
5 rows in set (0.00 sec)

MySQL [(none)]> use TPCH_001
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MySQL [TPCH_001]> show tables;
+--------------------+
| Tables_in_TPCH_001 |
+--------------------+
| CUSTOMER           |
| LINEITEM           |
| NATION             |
| ORDERS             |
| PART               |
| PARTSUPP           |
| REGION             |
| SUPPLIER           |
+--------------------+
8 rows in set (0.00 sec)

Use example

First start the spark-shell in the spark deployment directory:

$ cd spark
$ bin/spark-shell
import org.apache.spark.sql.TiContext
val ti = new TiContext(spark)

// Mapping all TiDB tables from `TPCH_001` database as Spark SQL tables
ti.tidbMapDatabase("TPCH_001")

Then you can call Spark SQL directly:

scala> spark.sql("select count(*) from lineitem").show

The result is:

+--------+
|count(1)|
+--------+
|   60175|
+--------+

Now run a more complex Spark SQL:

scala> spark.sql(
      """select
        |   l_returnflag,
        |   l_linestatus,
        |   sum(l_quantity) as sum_qty,
        |   sum(l_extendedprice) as sum_base_price,
        |   sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
        |   sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
        |   avg(l_quantity) as avg_qty,
        |   avg(l_extendedprice) as avg_price,
        |   avg(l_discount) as avg_disc,
        |   count(*) as count_order
        |from
        |   lineitem
        |where
        |   l_shipdate <= date '1998-12-01' - interval '90' day
        |group by
        |   l_returnflag,
        |   l_linestatus
        |order by
        |   l_returnflag,
        |   l_linestatus
      """.stripMargin).show

The result is:

+------------+------------+---------+--------------+--------------+
|l_returnflag|l_linestatus|  sum_qty|sum_base_price|sum_disc_price|
+------------+------------+---------+--------------+--------------+
|           A|           F|380456.00|  532348211.65|505822441.4861|
|           N|           F|  8971.00|   12384801.37| 11798257.2080|
|           N|           O|742802.00| 1041502841.45|989737518.6346|
|           R|           F|381449.00|  534594445.35|507996454.4067|
+------------+------------+---------+--------------+--------------+
(Continued)
-----------------+---------+------------+--------+-----------+
       sum_charge|  avg_qty|   avg_price|avg_disc|count_order|
-----------------+---------+------------+--------+-----------+
 526165934.000839|25.575155|35785.709307|0.050081|      14876|
  12282485.056933|25.778736|35588.509684|0.047759|        348|
1029418531.523350|25.454988|35691.129209|0.049931|      29181|
 528524219.358903|25.597168|35874.006533|0.049828|      14902|
-----------------+---------+------------+--------+-----------+

See more examples.

"TiSpark Quick Start Guide" was last updated Aug 31 2018: op-guide: update tikv rolling udpate policy (#592) (965cd1d)
Improve this page

What’s on this page

Product

  • TiDB
  • TiSpark
  • Roadmap
  • TiDB Cloud

Docs

  • Quick Start
  • Best Practices
  • FAQ
  • TiDB Utilities
  • Release Notes

Resources

  • Blog
  • Weekly
  • GitHub
  • TiDB Academy
  • TiDB Community

Company

  • About
  • Careers
  • News
  • Contact Us
  • Privacy Policy
  • Terms of Service

Connect

  • Twitter
  • LinkedIn
  • Reddit
  • Google Group
  • Stack Overflow

© 2018 PingCAP. All Rights Reserved.

中文