deploy spark in a server and connect to it locally #72811

Open
1Edtrujillo1 opened this issue Sep 30, 2024 · 4 comments
Labels: spark, tech-issues, triage

Comments

1Edtrujillo1 commented Sep 30, 2024

Name and Version

docker.io/bitnami/spark:3.5

What architecture are you using?

None

What steps will reproduce the bug?

Hi, I have deployed the following docker-compose file on an Ubuntu VM:

services:
  spark:
    image: docker.io/bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
    ports:
      - '8080:8080'
      - '7077:7077'
  spark-worker:
    image: docker.io/bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://<ubuntu ip>:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark

I tested it and it works correctly on the server, but it fails when I try to connect from my local computer.

Port 7077 is reachable from my local computer:

nc -zv 7077
Connection to port 7077 [tcp/*] succeeded!
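nc -zv only shows that a TCP connection to the master's port can be opened; it does not exercise Spark's RPC protocol. A minimal Python equivalent of that reachability check, using only the standard library (the function name port_is_open is hypothetical):

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened (like `nc -zv host port`)."""
    try:
        # create_connection resolves the host and tries each address until one connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, port_is_open("<ubuntu ip>", 7077), with the placeholder replaced by the VM's address. Judging by the stack trace below (the exception is raised in Utils.getCurrentUserName during SparkContext initialization), the failure appears to happen before any connection to the master is attempted, so a successful port check is consistent with the error.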

What do you see instead?

from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .appName("SparkTest") \
        .master("spark://<ubuntu ip>:7077") \
        .getOrCreate()

Running this from my local computer, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/Users/etru/anaconda3/lib/python3.11/site-packages/pyspark/sql/session.py", line 497, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/etru/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 515, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/Users/etru/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 203, in __init__
    self._do_init(
  File "/Users/etru/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 296, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/etru/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 421, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/etru0005/anaconda3/lib/python3.11/site-packages/py4j/java_gateway.py", line 1587, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/Users/etru/anaconda3/lib/python3.11/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.UnsupportedOperationException: getSubject is supported only if a security manager is allowed
        at java.base/javax.security.auth.Subject.getSubject(Subject.java:347)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:577)
        at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2416)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2416)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:329)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:501)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:485)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:1575)

1Edtrujillo1 added the tech-issues label on Sep 30, 2024
github-actions bot added the triage label on Sep 30, 2024
carrodher added the spark label on Oct 1, 2024
carrodher (Member) commented

Hi, the issue may not be directly related to the Bitnami container image or Helm chart, but rather to how the application is being utilized or configured in your specific environment, or to a particular scenario that is not easy to reproduce on our side.

If you think that's not the case and want to contribute a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

If you have any questions about the application, customizing its content, or technology and infrastructure usage, we highly recommend that you refer to the forums and user guides provided by the project responsible for the application or technology.

With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.

cpbotha commented Oct 8, 2024

I am seeing exactly this error with a similarly minimal example: Bitnami's docker-compose.yml and images (the only change is exposing port 7077 on the Spark master), and in my case connecting to port 7077 on localhost for local development:

spark = SparkSession.builder.appName("HelloWorld").master("spark://localhost:7077").getOrCreate()

Regarding "how the application is being utilized": the only usage here is the single-line invocation above, which raises the exception; everything else is the Bitnami images and the docker-compose file. Is there anything else we can try?

I am using Docker Desktop 4.34.2 (167172) on macOS 15.0.1 on an M1 Pro; the pulled images are ARM builds.

cpbotha commented Oct 8, 2024

This looks like it could be because Subject.getSubject() was deprecated for removal in JDK 17, but the Hadoop libraries packaged with Spark still use this API.

See https://issues.apache.org/jira/browse/HADOOP-19212 and https://issues.apache.org/jira/browse/CALCITE-6590 and https://openjdk.org/jeps/411

I can see the Bitnami image ships JDK 17, so I'm not sure why it raises the getSubject / security manager exception when that JDK version is supposed to only warn.

As a workaround, I've tried passing -Djava.security.manager=allow through various environment variables and even through the SparkSession builder config, but to no avail.

I would appreciate any tips here.
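On the question of why the image's JDK 17 should not matter: the Python frames in the traceback point at local /Users/... paths, which suggests the exception is raised by the JVM that PySpark launches on the client machine, not by the JDK inside the Bitnami image. A minimal sketch to check which Java version that client-side JVM would be. It assumes PySpark's launcher scripts use $JAVA_HOME/bin/java when JAVA_HOME is set and otherwise `java` from the PATH; all helper names are hypothetical, and the "JDK 23 or newer" threshold is taken from this thread (23 failed, 21 only warned), not from an exhaustive compatibility matrix.

```python
import os
import re
import subprocess


def pyspark_java_cmd() -> str:
    """Simplified mirror of how Spark's bin/spark-class picks a JVM:
    $JAVA_HOME/bin/java if JAVA_HOME is set, otherwise `java` from the PATH."""
    java_home = os.environ.get("JAVA_HOME")
    return os.path.join(java_home, "bin", "java") if java_home else "java"


def parse_java_major(version_output: str) -> int:
    """Extract the major version from `java -version` output.

    Handles the legacy "1.8.0_392" scheme (major 8) and the modern
    "17.0.12" / "21.0.4" / "23" scheme.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError(f"unrecognised `java -version` output: {version_output!r}")
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        # Java 8 and earlier report "1.x"; the real major version is x.
        major = int(match.group(2))
    return major


def client_java_major() -> int:
    """Run `java -version` for the JVM PySpark would launch and return its major version."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        [pyspark_java_cmd(), "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)


def likely_hits_getsubject_error(major: int) -> bool:
    """Heuristic from this thread: JDK 23 raised the exception, JDK 21 only warned."""
    return major >= 23
```

For example, likely_hits_getsubject_error(client_java_major()) returning True would point at the client's JDK rather than the cluster's.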

cpbotha commented Oct 8, 2024

The problem was the JDK on my client. When I downgraded it from JDK 23 to JDK 21, the security manager issue became a warning instead of an exception, as expected.
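Following that resolution, one way to keep a newer system JDK installed while making PySpark use JDK 21 is to set JAVA_HOME for the Python process before the session is created. A minimal sketch, assuming a JDK 21 installation exists locally and that PySpark reads JAVA_HOME when it launches its JVM gateway (normally once per process); the helper names and the path are hypothetical. On macOS, `/usr/libexec/java_home -v 21` prints the home directory of an installed JDK 21.

```python
import os


def pin_java_home(java_home: str) -> None:
    """Point PySpark's JVM launcher at a specific JDK installation.

    Must run before the first SparkSession/SparkContext in this Python process,
    because PySpark picks up JAVA_HOME when it starts its JVM gateway.
    """
    java_bin = os.path.join(java_home, "bin", "java")
    if not os.path.exists(java_bin):
        raise FileNotFoundError(f"no java executable at {java_bin}")
    os.environ["JAVA_HOME"] = java_home


def build_session(master_url: str, java_home: str):
    """Create a SparkSession whose local driver JVM runs on the given JDK."""
    # Imported lazily so pin_java_home can be used without PySpark installed.
    from pyspark.sql import SparkSession

    pin_java_home(java_home)
    return (
        SparkSession.builder
        .appName("SparkTest")
        .master(master_url)
        .getOrCreate()
    )
```

For example, build_session("spark://<ubuntu ip>:7077", "/path/to/jdk-21"), with both placeholders replaced. Exporting JAVA_HOME in the shell before starting Python should have the same effect.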


3 participants